Network errors when booting up a Mixtile Blade 3 as a cluster node

Please try the Debian image that includes the new MIOP driver, available at the link below:
https://downloads.mixtile.com/blade3/image/clusterbox-debian-mixtile-blade3-rockchip-format-20250523.img.xz

OK, thank you. I’ve just flashed all four nodes, but they still can’t connect to the PCIe network. On the control board I now get this error message:

Sun Jun  8 17:40:04 2025 kern.warn kernel: [ 2390.943924] miop 0000:03:00.0: TX[2]: Queue is full.
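
To see whether this warning is limited to one queue or spreads across several, the kernel log can be tallied per device and TX queue. The following is only a minimal Python sketch: the function name count_queue_full is made up for illustration, and the pattern is inferred from the single message quoted above, not from any driver documentation.

```python
import re
from collections import Counter

# Pattern inferred from the one warning quoted above (an assumption, not a
# documented log format). "miop-ep" is accepted too, since the nodes log
# under that name.
FULL_RE = re.compile(r"miop(?:-ep)? (\S+): TX\[(\d+)\]: Queue is full")


def count_queue_full(log_text):
    """Count 'Queue is full' warnings per (device, TX queue index)."""
    return Counter((dev, int(queue)) for dev, queue in FULL_RE.findall(log_text))


SAMPLE = """\
Sun Jun  8 17:40:04 2025 kern.warn kernel: [ 2390.943924] miop 0000:03:00.0: TX[2]: Queue is full.
[   16.843284] miop-ep fe150000.pcie: TX[0]: Queue is ready.
"""

for (dev, queue), hits in count_queue_full(SAMPLE).items():
    print(f"{dev} TX[{queue}]: {hits} warning(s)")
```

Feeding it the full log from the control board would show whether only 0000:03:00.0 is affected.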

Nor do I get a DHCP lease on any of the nodes. Apparently the discover packet doesn’t even reach the control board:

root@blade3:~# dhclient -v pci0
Internet Systems Consortium DHCP Client 4.4.3-P1
Copyright 2004-2022 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/pci0/02:9f:f1:8e:cb:0a
Sending on   LPF/pci0/02:9f:f1:8e:cb:0a
Sending on   Socket/fallback
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 8
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 7
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 19
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 9
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 15
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 3
No DHCPOFFERS received.
No working leases in persistent database - sleeping.

A DHCP server does run on the control board, though:

mixtile@ClusterBox:~$ ps | grep dnsmasq
 7848 root      2908 S    {dnsmasq} /sbin/ujail -t 5 -n dnsmasq -u -l -r /bin/ubus -r /etc/TZ -r /etc/dnsmas
 7850 dnsmasq   1720 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmas
 7879 mixtile   1404 S    grep dnsmasq

Even giving the node a static IP (here: 10.20.0.11) using nmcli, as described in “nmcli set static ip address without the DHCP?” on ServerFault, did not help:

root@blade3:~# ping 10.20.0.1
PING 10.20.0.1 (10.20.0.1) 56(84) bytes of data.
From 10.20.0.11 icmp_seq=1 Destination Host Unreachable

Please send us the dmesg output from both the Blade 3 and the ClusterBox.

Now the nodes do get a network link, but I still run into the instability issue we already know about:

root@blade3:/# apt-get update
Get:1 http://mirrors.ustc.edu.cn/debian bookworm InRelease [151 kB]
Get:2 http://mirrors.ustc.edu.cn/debian-security bookworm-security InRelease [48.0 kB]
Get:3 http://mirrors.ustc.edu.cn/debian bookworm-updates InRelease [55.4 kB]
Get:4 http://mirrors.ustc.edu.cn/debian bookworm-backports InRelease [59.4 kB]
Get:5 http://mirrors.ustc.edu.cn/debian bookworm/main Sources [9,494 kB]
13% [5 Sources 438 kB/9,494 kB 5%] 

root@blade3:/# client_loop: send disconnect: Broken pipe

At this point, the network connection breaks.
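
To put a number on how much data went through before the link dropped, the apt-get lines above can be summed. This is a rough Python sketch with made-up function names; it assumes that items 1 to 4 had finished downloading and that the progress line shows the last state of item 5 before the stall.

```python
import re

# "Get:N ... [size unit]" announces an item and its full size.
GET_RE = re.compile(r"^Get:(\d+) .*\[([\d.,]+) (B|kB|MB)\]\s*$")
# "[N name received unit/total unit ...]" shows an item still in progress.
PROGRESS_RE = re.compile(r"\[(\d+) \S+ ([\d.,]+) (B|kB|MB)/")
UNIT_KB = {"B": 0.001, "kB": 1.0, "MB": 1000.0}


def to_kb(number, unit):
    """Convert an apt size such as '9,494' + 'kB' to kilobytes."""
    return float(number.replace(",", "")) * UNIT_KB[unit]


def received_kb(apt_output):
    """Estimate kilobytes received: full size for finished items,
    the partial count for any item named in a progress line."""
    announced, partial = {}, {}
    for line in apt_output.splitlines():
        line = line.strip()
        match = GET_RE.match(line)
        if match:
            announced[int(match.group(1))] = to_kb(match.group(2), match.group(3))
            continue
        for item, number, unit in PROGRESS_RE.findall(line):
            partial[int(item)] = to_kb(number, unit)
    return sum(partial.get(item, size) for item, size in announced.items())


SAMPLE = """\
Get:1 http://mirrors.ustc.edu.cn/debian bookworm InRelease [151 kB]
Get:2 http://mirrors.ustc.edu.cn/debian-security bookworm-security InRelease [48.0 kB]
Get:3 http://mirrors.ustc.edu.cn/debian bookworm-updates InRelease [55.4 kB]
Get:4 http://mirrors.ustc.edu.cn/debian bookworm-backports InRelease [59.4 kB]
Get:5 http://mirrors.ustc.edu.cn/debian bookworm/main Sources [9,494 kB]
13% [5 Sources 438 kB/9,494 kB 5%]
"""

print(f"{received_kb(SAMPLE):.1f} kB received before the stall")
```

Under those assumptions the link dropped after well under 1 MB had been received.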

Here is the dmesg output of the control board:

mixtile@ClusterBox:~$ sudo dmesg | tail -80
[   31.946607] pci 0000:02:00.0: BAR 9: assigned [mem 0x20000000-0x22ffffff 64bit pref]
[   31.954556] pci 0000:02:04.0: BAR 9: assigned [mem 0x23000000-0x25ffffff 64bit pref]
[   31.962495] pci 0000:02:08.0: BAR 9: assigned [mem 0x26000000-0x28ffffff 64bit pref]
[   31.970423] pci 0000:02:0c.0: BAR 9: assigned [mem 0x29000000-0x2bffffff 64bit pref]
[   31.978359] pci 0000:02:00.0: BAR 8: assigned [mem 0x2c000000-0x2c0fffff]
[   31.985328] pci 0000:02:04.0: BAR 8: assigned [mem 0x2c100000-0x2c1fffff]
[   31.992296] pci 0000:02:08.0: BAR 8: assigned [mem 0x2c200000-0x2c2fffff]
[   31.999247] pci 0000:02:0c.0: BAR 8: assigned [mem 0x2c300000-0x2c3fffff]
[   32.006216] pci 0000:03:00.0: BAR 0: assigned [mem 0x20000000-0x21ffffff 64bit pref]
[   32.014180] pci 0000:03:00.0: BAR 4: assigned [mem 0x22000000-0x220fffff 64bit pref]
[   32.022142] pci 0000:03:00.0: BAR 6: assigned [mem 0x2c000000-0x2c00ffff pref]
[   32.029540] pci 0000:02:00.0: PCI bridge to [bus 03]
[   32.034648] pci 0000:02:00.0:   bridge window [mem 0x2c000000-0x2c0fffff]
[   32.041610] pci 0000:02:00.0:   bridge window [mem 0x20000000-0x22ffffff 64bit pref]
[   32.049557] pci 0000:04:00.0: BAR 0: assigned [mem 0x24000000-0x25ffffff 64bit pref]
[   32.057521] pci 0000:04:00.0: BAR 4: assigned [mem 0x23000000-0x230fffff 64bit pref]
[   32.065486] pci 0000:04:00.0: BAR 6: assigned [mem 0x2c100000-0x2c10ffff pref]
[   32.072894] pci 0000:02:04.0: PCI bridge to [bus 04]
[   32.077986] pci 0000:02:04.0:   bridge window [mem 0x2c100000-0x2c1fffff]
[   32.084946] pci 0000:02:04.0:   bridge window [mem 0x23000000-0x25ffffff 64bit pref]
[   32.092905] pci 0000:05:00.0: BAR 0: assigned [mem 0x26000000-0x27ffffff 64bit pref]
[   32.100862] pci 0000:05:00.0: BAR 4: assigned [mem 0x28000000-0x280fffff 64bit pref]
[   32.108823] pci 0000:05:00.0: BAR 6: assigned [mem 0x2c200000-0x2c20ffff pref]
[   32.116234] pci 0000:02:08.0: PCI bridge to [bus 05]
[   32.121329] pci 0000:02:08.0:   bridge window [mem 0x2c200000-0x2c2fffff]
[   32.128289] pci 0000:02:08.0:   bridge window [mem 0x26000000-0x28ffffff 64bit pref]
[   32.136245] pci 0000:06:00.0: BAR 0: assigned [mem 0x2a000000-0x2bffffff 64bit pref]
[   32.144209] pci 0000:06:00.0: BAR 4: assigned [mem 0x29000000-0x290fffff 64bit pref]
[   32.152175] pci 0000:06:00.0: BAR 6: assigned [mem 0x2c300000-0x2c30ffff pref]
[   32.159584] pci 0000:02:0c.0: PCI bridge to [bus 06]
[   32.164691] pci 0000:02:0c.0:   bridge window [mem 0x2c300000-0x2c3fffff]
[   32.171652] pci 0000:02:0c.0:   bridge window [mem 0x29000000-0x2bffffff 64bit pref]
[   32.179590] pci 0000:01:00.0: PCI bridge to [bus 02-06]
[   32.184955] pci 0000:01:00.0:   bridge window [mem 0x2c000000-0x2c3fffff]
[   32.191918] pci 0000:01:00.0:   bridge window [mem 0x20000000-0x2bffffff 64bit pref]
[   32.199855] pci 0000:00:00.0: PCI bridge to [bus 01-06]
[   32.205223] pci 0000:00:00.0:   bridge window [mem 0x2c000000-0x2c3fffff]
[   32.212185] pci 0000:00:00.0:   bridge window [mem 0x20000000-0x2bffffff pref]
[   32.219715] shpchp 0000:00:00.0: card - bus=0x0, slot = 0x0 irq=0
[   32.226099] shpchp 0000:01:00.0: card - bus=0x1, slot = 0x0 irq=4
[   32.232488] shpchp 0000:02:00.0: card - bus=0x2, slot = 0x0 irq=4
[   32.238837] shpchp 0000:02:04.0: card - bus=0x2, slot = 0x0 irq=4
[   32.245235] shpchp 0000:02:08.0: card - bus=0x2, slot = 0x0 irq=4
[   32.251602] shpchp 0000:02:0c.0: card - bus=0x2, slot = 0x0 irq=4
[   32.257993] miop 0000:03:00.0: card - bus=0x3, slot = 0x0 irq=4
[   32.264132] miop 0000:03:00.0: probing MIOP node on bus:03
[   33.082914] miop 0000:03:00.0: PCIe bus number 3 mapped to MIOP node id: 2
[   33.091173] miop 0000:03:00.0: pci_alloc_irq_vectors() only alloc 1 vectors
[   33.105089] miop 0000:03:00.0: miop irq on tx ready
[   33.141599] miop 0000:03:00.0: MIOP node[2] on bus:03 is online
[   33.148044] miop 0000:04:00.0: card - bus=0x4, slot = 0x0 irq=4
[   33.154205] miop 0000:04:00.0: probing MIOP node on bus:04
[   33.159822] miop 0000:04:00.0: PCIe bus number 4 mapped to MIOP node id: 3
[   33.167993] miop 0000:04:00.0: pci_alloc_irq_vectors() only alloc 1 vectors
[   33.182751] miop 0000:04:00.0: miop irq on tx ready
[   33.275704] miop 0000:04:00.0: MIOP node[3] on bus:04 is online
[   33.282179] miop 0000:05:00.0: card - bus=0x5, slot = 0x0 irq=4
[   33.288307] miop 0000:05:00.0: probing MIOP node on bus:05
[   33.293962] miop 0000:05:00.0: PCIe bus number 5 mapped to MIOP node id: 1
[   33.302151] miop 0000:05:00.0: pci_alloc_irq_vectors() only alloc 1 vectors
[   33.319961] miop 0000:05:00.0: miop irq on tx ready
[   33.390438] miop 0000:05:00.0: MIOP node[1] on bus:05 is online
[   33.396919] miop 0000:06:00.0: card - bus=0x6, slot = 0x0 irq=4
[   33.403071] miop 0000:06:00.0: probing MIOP node on bus:06
[   33.408686] miop 0000:06:00.0: PCIe bus number 6 mapped to MIOP node id: 0
[   33.416888] miop 0000:06:00.0: pci_alloc_irq_vectors() only alloc 1 vectors
[   33.432606] miop 0000:06:00.0: miop irq on tx ready
[   33.477450] miop 0000:06:00.0: MIOP node[0] on bus:06 is online
[   34.077900] 8021q: adding VLAN 0 to HW filter on device eth0
[   34.113901] device eth0 entered promiscuous mode
[   34.134275] br-lan: port 1(eth0.1) entered blocking state
[   34.139942] br-lan: port 1(eth0.1) entered disabled state
[   34.145884] device eth0.1 entered promiscuous mode
[   35.138888] IPv6: ADDRCONF(NETDEV_CHANGE): pci0: link becomes ready
[   38.212820] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   38.227171] br-lan: port 1(eth0.1) entered blocking state
[   38.232797] br-lan: port 1(eth0.1) entered forwarding state
[   38.238937] IPv6: ADDRCONF(NETDEV_CHANGE): eth0.2: link becomes ready
[   38.375307] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[   38.428611] mtk_soc_eth 10100000.ethernet eth0: port 5 link up (100Mbps/Full duplex)

This comes from node #1, which has a static IP:

root@blade3:/# dmesg | tail -50
[    4.634762] pcie_ep_rk35: module license 'MIXTILE' taints kernel.
[    4.634785] Disabling lock debugging due to kernel taint
[    4.635296] Mixtile TCP/IP over PCIe device driver initializing
[    4.644666] Mixtile TCP/IP over PCIe endpoint netdevice driver initializing
[    4.654131] Mixtile TCP/IP over PCIe EP driver probe
[    4.708539] r8169 0002:24:00.0 enP2p36s0: Link is Down
[    4.718145] hrtimer: interrupt took 2917 ns
[    4.718598] miop-ep fe150000.pcie: assigned reserved memory node miop_dma@0x0e000000
[    4.736094] miop-ep fe150000.pcie: PCIe Linking...0, LTSSM is 0x0
[    6.800110] miop-ep fe150000.pcie: PCIe Link up, LTSSM is 0x230011
[    6.803638] ttyFIQ ttyFIQ0: tty_port_close_start: tty->count = 1 port count = 2
[    6.813290] systemd-journald[291]: File /var/log/journal/bad8ecfc142e4799bdbb18271a523621/user-1001.journal corrupted or uncleanly shut down, renaming and replacing.
[    6.909079] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.909122] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.909155]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.921110] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.921142] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.921168]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.932773] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.932798] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.932816]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.947822] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.947844] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.947856]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    7.070412] rk_pcie_establish_link: 272 callbacks suppressed
[    7.070435] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.091504] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.112104] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.132222] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.153335] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.174440] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.195556] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.216671] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.237722] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.258835] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.468793] rk-pcie fe180000.pcie: PCIe Link Fail, LTSSM is 0x3, hw_retries=1
[    8.492234] rk-pcie fe180000.pcie: failed to initialize host
[   16.840306] miop-ep fe150000.pcie: Descriptor PCI address: 2a000000
[   16.840354] miop-ep fe150000.pcie: MSI not enabled, check on legacy interrupt.
[   16.840367] miop-ep fe150000.pcie: irq line: 4
[   16.840380] miop-ep fe150000.pcie: Node online: 80000000
[   16.840496] IPv6: ADDRCONF(NETDEV_CHANGE): pci0: link becomes ready
[   16.843284] miop-ep fe150000.pcie: TX[0]: Queue is ready.
[   16.885310] miop-ep fe150000.pcie: Node online: 1
[   16.890238] miop-ep fe150000.pcie: Node online: 2
[   16.891421] miop-ep fe150000.pcie: TX[1]: Queue is ready.
[   16.894286] miop-ep fe150000.pcie: Node online: 3
[   16.895231] miop-ep fe150000.pcie: TX[3]: Queue is ready.
[   16.896329] miop-ep fe150000.pcie: TX[2]: Queue is ready.
[   17.004845] platform mtd_vendor_storage: deferred probe pending

And this comes from node #2, which still has a dynamic IP:

root@blade3:/# dmesg | tail -50
[    4.680195] pcie_ep_rk35: module license 'MIXTILE' taints kernel.
[    4.680223] Disabling lock debugging due to kernel taint
[    4.680651] Mixtile TCP/IP over PCIe device driver initializing
[    4.682864] r8169 0002:24:00.0 enP2p36s0: Link is Down
[    4.692633] Mixtile TCP/IP over PCIe endpoint netdevice driver initializing
[    4.705309] Mixtile TCP/IP over PCIe EP driver probe
[    4.706902] hrtimer: interrupt took 4083 ns
[    4.707451] miop-ep fe150000.pcie: assigned reserved memory node miop_dma@0x0e000000
[    4.726673] miop-ep fe150000.pcie: PCIe Linking...0, LTSSM is 0x1
[    6.770503] miop-ep fe150000.pcie: PCIe Link up, LTSSM is 0x230011
[    6.774708] ttyFIQ ttyFIQ0: tty_port_close_start: tty->count = 1 port count = 2
[    6.784338] systemd-journald[289]: File /var/log/journal/0df7273170e34a89874b152862130c7e/user-1001.journal corrupted or uncleanly shut down, renaming and replacing.
[    6.815704] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.815729] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.815749]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.827242] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.827264] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.827283]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.838652] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.838668] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.838678]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.851695] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.851713] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.851728]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    7.054559] rk_pcie_establish_link: 271 callbacks suppressed
[    7.054581] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.075718] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.096834] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.117938] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.139040] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.160154] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.181253] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.202356] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.223469] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.244582] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.475563] rk-pcie fe180000.pcie: PCIe Link Fail, LTSSM is 0x3, hw_retries=1
[    8.498506] rk-pcie fe180000.pcie: failed to initialize host
[   16.722539] miop-ep fe150000.pcie: Descriptor PCI address: 26000000
[   16.722589] miop-ep fe150000.pcie: MSI not enabled, check on legacy interrupt.
[   16.722603] miop-ep fe150000.pcie: irq line: 4
[   16.722617] miop-ep fe150000.pcie: Node online: 80000001
[   16.722732] IPv6: ADDRCONF(NETDEV_CHANGE): pci0: link becomes ready
[   16.725616] miop-ep fe150000.pcie: TX[1]: Queue is ready.
[   16.796712] miop-ep fe150000.pcie: Node online: 2
[   16.799702] miop-ep fe150000.pcie: TX[2]: Queue is ready.
[   16.800735] miop-ep fe150000.pcie: Node online: 3
[   16.802266] miop-ep fe150000.pcie: TX[3]: Queue is ready.
[   16.880502] miop-ep fe150000.pcie: Node online: 0
[   16.887371] miop-ep fe150000.pcie: TX[0]: Queue is ready.
[   16.979036] platform mtd_vendor_storage: deferred probe pending
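
For anyone comparing these logs: the bus-to-node mapping on the control board can be cross-checked against the descriptor address each node prints. Below is a minimal Python sketch, not part of any Mixtile tooling. The function name node_bar0_map is made up, and it assumes that a node's "Descriptor PCI address" equals the BAR 0 base assigned to its endpoint on the control board. The logs above are consistent with that, but I have not seen it documented.

```python
import re

# BAR 0 assignments of the endpoints (bus number is hexadecimal in the PCI address).
BAR0_RE = re.compile(r"pci 0000:([0-9a-f]{2}):00\.0: BAR 0: assigned \[mem 0x([0-9a-f]+)-")
# Bus-to-node mapping printed by the miop driver (bus number assumed decimal here).
NODE_RE = re.compile(r"PCIe bus number (\d+) mapped to MIOP node id: (\d+)")


def node_bar0_map(host_dmesg):
    """Return {MIOP node id: BAR 0 base address} from control-board dmesg text."""
    bar0 = {int(bus, 16): int(addr, 16) for bus, addr in BAR0_RE.findall(host_dmesg)}
    nodes = {int(bus): int(node) for bus, node in NODE_RE.findall(host_dmesg)}
    return {node: bar0[bus] for bus, node in nodes.items() if bus in bar0}


SAMPLE = """\
[   32.006216] pci 0000:03:00.0: BAR 0: assigned [mem 0x20000000-0x21ffffff 64bit pref]
[   32.049557] pci 0000:04:00.0: BAR 0: assigned [mem 0x24000000-0x25ffffff 64bit pref]
[   32.092905] pci 0000:05:00.0: BAR 0: assigned [mem 0x26000000-0x27ffffff 64bit pref]
[   32.136245] pci 0000:06:00.0: BAR 0: assigned [mem 0x2a000000-0x2bffffff 64bit pref]
[   33.082914] miop 0000:03:00.0: PCIe bus number 3 mapped to MIOP node id: 2
[   33.159822] miop 0000:04:00.0: PCIe bus number 4 mapped to MIOP node id: 3
[   33.293962] miop 0000:05:00.0: PCIe bus number 5 mapped to MIOP node id: 1
[   33.408686] miop 0000:06:00.0: PCIe bus number 6 mapped to MIOP node id: 0
"""

for node, base in sorted(node_bar0_map(SAMPLE).items()):
    print(f"node {node}: BAR 0 at 0x{base:08x}")
```

Node 0 comes out at 0x2a000000 and node 1 at 0x26000000, which matches the descriptor addresses 2a000000 and 26000000 in the two node logs, so the address assignment itself looks consistent on both sides.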

I saw that you ran “apt-get update” on the Blade 3. We recommend connecting the Blade 3 to the external network via wired Ethernet and connecting the four Blade 3 boards to each other via the PCIe network, for example with all four devices in the 10.20.0.x network segment. Another point to check is the default gateway setting.
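
As an illustration of the network segment check, here is a small Python sketch using the standard ipaddress module. It assumes that "10.20.0.x" means the 10.20.0.0/24 network; the function name same_segment is made up. Only 10.20.0.1 and 10.20.0.11 come from this thread; the third address is a hypothetical example of a misconfigured node.

```python
import ipaddress


def same_segment(addresses, network="10.20.0.0/24"):
    """Map each address to True if it lies inside the given network.

    The /24 prefix is an assumption derived from "10.20.0.x".
    """
    net = ipaddress.ip_network(network)
    return {addr: ipaddress.ip_address(addr) in net for addr in addresses}


# 10.20.0.1 and 10.20.0.11 appear in this thread; 192.168.1.5 is hypothetical.
for addr, ok in same_segment(["10.20.0.1", "10.20.0.11", "192.168.1.5"]).items():
    print(f"{addr}: {'in segment' if ok else 'outside segment'}")
```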

Maybe you have noticed that the crash occurred during a download of only about 9 MB. That is not much. In operation, the nodes typically exchange gigabytes or even terabytes of data, so your PCIe-based network switch must be able to handle such volumes.

And: I have already looked into the default gateway settings.

I’m not the only one still waiting for a stable version of the MIOP driver, and I’m not the only one suggesting that you open-source the driver code so that more eyes can review it and more brains can improve it. Please think about it.

One of your colleagues suggested that I resell your hardware. I can’t do that until it runs without errors and crashes.