Network errors when booting up a Mixtile Blade 3 as a cluster node

Please try the Debian image that includes the new MIOP driver, available at the link below:
https://downloads.mixtile.com/blade3/image/clusterbox-debian-mixtile-blade3-rockchip-format-20250523.img.xz

OK, thank you. I’ve just flashed all four nodes, but they still can’t connect to the PCIe network. On the control board I now get this error message:

Sun Jun  8 17:40:04 2025 kern.warn kernel: [ 2390.943924] miop 0000:03:00.0: TX[2]: Queue is full.
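
To see whether the warning is limited to one TX queue or affects several, the log can be tallied per queue. This is only a sketch: the function name count_queue_full is made up for illustration, and it assumes the kernel log uses exactly the message format shown above.

```shell
# Count "Queue is full" warnings per TX queue from kernel log text on stdin.
# Assumption: lines look like "... miop 0000:03:00.0: TX[2]: Queue is full."
count_queue_full() {
    awk '
        /TX\[[0-9]+\]: Queue is full/ {
            q = $0
            sub(/.*TX\[/, "", q)   # drop everything up to the queue index
            sub(/\].*/, "", q)     # drop everything after the index
            count[q]++
        }
        END { for (q in count) printf "TX[%s] %d\n", q, count[q] }
    ' | sort
}

# Usage on the control board (assumption: run with root privileges):
#   dmesg | count_queue_full
```

Each output line names a queue followed by the number of warnings seen for it, so a single stuck queue is easy to tell apart from a general overload.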

I don’t get a DHCP lease for any of the nodes either. Apparently the discover packet doesn’t even reach the control board:

root@blade3:~# dhclient -v pci0
Internet Systems Consortium DHCP Client 4.4.3-P1
Copyright 2004-2022 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/pci0/02:9f:f1:8e:cb:0a
Sending on   LPF/pci0/02:9f:f1:8e:cb:0a
Sending on   Socket/fallback
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 8
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 7
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 19
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 9
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 15
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 3
No DHCPOFFERS received.
No working leases in persistent database - sleeping.

A DHCP server does run on the control board, though:

mixtile@ClusterBox:~$ ps | grep dnsmasq
 7848 root      2908 S    {dnsmasq} /sbin/ujail -t 5 -n dnsmasq -u -l -r /bin/ubus -r /etc/TZ -r /etc/dnsmas
 7850 dnsmasq   1720 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmas
 7879 mixtile   1404 S    grep dnsmasq

Even giving the node a static IP (here: 10.20.0.11) with nmcli, as described in “nmcli set static ip address without the DHCP?” on ServerFault, did not help:

root@blade3:~# ping 10.20.0.1
PING 10.20.0.1 (10.20.0.1) 56(84) bytes of data.
From 10.20.0.11 icmp_seq=1 Destination Host Unreachable
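
For reference, a static address can be set with nmcli roughly as follows. This is a sketch and not necessarily the exact commands used here: the helper name set_static_ip, the connection name pci0 and the /24 prefix are all assumptions.

```shell
# Assign a static IPv4 address to an existing NetworkManager connection.
# Arguments: connection name, address in CIDR notation.
set_static_ip() {
    con=$1
    addr=$2
    nmcli connection modify "$con" ipv4.method manual ipv4.addresses "$addr" &&
        nmcli connection up "$con"
}

# Usage (connection name "pci0" and the /24 prefix are assumptions):
#   set_static_ip pci0 10.20.0.11/24
#   ip -4 addr show dev pci0   # confirm that the address was applied
#   ip neigh show dev pci0     # FAILED or INCOMPLETE entries mean ARP gets no reply
```

A "Destination Host Unreachable" reported from the node's own address usually means that ARP resolution for the target failed, which would fit frames not reaching the control board at all.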

Please send us the dmesg output of both the blade3 and the ClusterBox.

Now the nodes do get a network link, but I still run into the instability issue we already know about:

root@blade3:/# apt-get update
Get:1 http://mirrors.ustc.edu.cn/debian bookworm InRelease [151 kB]
Get:2 http://mirrors.ustc.edu.cn/debian-security bookworm-security InRelease [48.0 kB]
Get:3 http://mirrors.ustc.edu.cn/debian bookworm-updates InRelease [55.4 kB]
Get:4 http://mirrors.ustc.edu.cn/debian bookworm-backports InRelease [59.4 kB]
Get:5 http://mirrors.ustc.edu.cn/debian bookworm/main Sources [9,494 kB]
13% [5 Sources 438 kB/9,494 kB 5%] 

root@blade3:/# client_loop: send disconnect: Broken pipe

At this point, the network connection breaks.

Here is the dmesg output of the control board:

mixtile@ClusterBox:~$ sudo dmesg | tail -80
[   31.946607] pci 0000:02:00.0: BAR 9: assigned [mem 0x20000000-0x22ffffff 64bit pref]
[   31.954556] pci 0000:02:04.0: BAR 9: assigned [mem 0x23000000-0x25ffffff 64bit pref]
[   31.962495] pci 0000:02:08.0: BAR 9: assigned [mem 0x26000000-0x28ffffff 64bit pref]
[   31.970423] pci 0000:02:0c.0: BAR 9: assigned [mem 0x29000000-0x2bffffff 64bit pref]
[   31.978359] pci 0000:02:00.0: BAR 8: assigned [mem 0x2c000000-0x2c0fffff]
[   31.985328] pci 0000:02:04.0: BAR 8: assigned [mem 0x2c100000-0x2c1fffff]
[   31.992296] pci 0000:02:08.0: BAR 8: assigned [mem 0x2c200000-0x2c2fffff]
[   31.999247] pci 0000:02:0c.0: BAR 8: assigned [mem 0x2c300000-0x2c3fffff]
[   32.006216] pci 0000:03:00.0: BAR 0: assigned [mem 0x20000000-0x21ffffff 64bit pref]
[   32.014180] pci 0000:03:00.0: BAR 4: assigned [mem 0x22000000-0x220fffff 64bit pref]
[   32.022142] pci 0000:03:00.0: BAR 6: assigned [mem 0x2c000000-0x2c00ffff pref]
[   32.029540] pci 0000:02:00.0: PCI bridge to [bus 03]
[   32.034648] pci 0000:02:00.0:   bridge window [mem 0x2c000000-0x2c0fffff]
[   32.041610] pci 0000:02:00.0:   bridge window [mem 0x20000000-0x22ffffff 64bit pref]
[   32.049557] pci 0000:04:00.0: BAR 0: assigned [mem 0x24000000-0x25ffffff 64bit pref]
[   32.057521] pci 0000:04:00.0: BAR 4: assigned [mem 0x23000000-0x230fffff 64bit pref]
[   32.065486] pci 0000:04:00.0: BAR 6: assigned [mem 0x2c100000-0x2c10ffff pref]
[   32.072894] pci 0000:02:04.0: PCI bridge to [bus 04]
[   32.077986] pci 0000:02:04.0:   bridge window [mem 0x2c100000-0x2c1fffff]
[   32.084946] pci 0000:02:04.0:   bridge window [mem 0x23000000-0x25ffffff 64bit pref]
[   32.092905] pci 0000:05:00.0: BAR 0: assigned [mem 0x26000000-0x27ffffff 64bit pref]
[   32.100862] pci 0000:05:00.0: BAR 4: assigned [mem 0x28000000-0x280fffff 64bit pref]
[   32.108823] pci 0000:05:00.0: BAR 6: assigned [mem 0x2c200000-0x2c20ffff pref]
[   32.116234] pci 0000:02:08.0: PCI bridge to [bus 05]
[   32.121329] pci 0000:02:08.0:   bridge window [mem 0x2c200000-0x2c2fffff]
[   32.128289] pci 0000:02:08.0:   bridge window [mem 0x26000000-0x28ffffff 64bit pref]
[   32.136245] pci 0000:06:00.0: BAR 0: assigned [mem 0x2a000000-0x2bffffff 64bit pref]
[   32.144209] pci 0000:06:00.0: BAR 4: assigned [mem 0x29000000-0x290fffff 64bit pref]
[   32.152175] pci 0000:06:00.0: BAR 6: assigned [mem 0x2c300000-0x2c30ffff pref]
[   32.159584] pci 0000:02:0c.0: PCI bridge to [bus 06]
[   32.164691] pci 0000:02:0c.0:   bridge window [mem 0x2c300000-0x2c3fffff]
[   32.171652] pci 0000:02:0c.0:   bridge window [mem 0x29000000-0x2bffffff 64bit pref]
[   32.179590] pci 0000:01:00.0: PCI bridge to [bus 02-06]
[   32.184955] pci 0000:01:00.0:   bridge window [mem 0x2c000000-0x2c3fffff]
[   32.191918] pci 0000:01:00.0:   bridge window [mem 0x20000000-0x2bffffff 64bit pref]
[   32.199855] pci 0000:00:00.0: PCI bridge to [bus 01-06]
[   32.205223] pci 0000:00:00.0:   bridge window [mem 0x2c000000-0x2c3fffff]
[   32.212185] pci 0000:00:00.0:   bridge window [mem 0x20000000-0x2bffffff pref]
[   32.219715] shpchp 0000:00:00.0: card - bus=0x0, slot = 0x0 irq=0
[   32.226099] shpchp 0000:01:00.0: card - bus=0x1, slot = 0x0 irq=4
[   32.232488] shpchp 0000:02:00.0: card - bus=0x2, slot = 0x0 irq=4
[   32.238837] shpchp 0000:02:04.0: card - bus=0x2, slot = 0x0 irq=4
[   32.245235] shpchp 0000:02:08.0: card - bus=0x2, slot = 0x0 irq=4
[   32.251602] shpchp 0000:02:0c.0: card - bus=0x2, slot = 0x0 irq=4
[   32.257993] miop 0000:03:00.0: card - bus=0x3, slot = 0x0 irq=4
[   32.264132] miop 0000:03:00.0: probing MIOP node on bus:03
[   33.082914] miop 0000:03:00.0: PCIe bus number 3 mapped to MIOP node id: 2
[   33.091173] miop 0000:03:00.0: pci_alloc_irq_vectors() only alloc 1 vectors
[   33.105089] miop 0000:03:00.0: miop irq on tx ready
[   33.141599] miop 0000:03:00.0: MIOP node[2] on bus:03 is online
[   33.148044] miop 0000:04:00.0: card - bus=0x4, slot = 0x0 irq=4
[   33.154205] miop 0000:04:00.0: probing MIOP node on bus:04
[   33.159822] miop 0000:04:00.0: PCIe bus number 4 mapped to MIOP node id: 3
[   33.167993] miop 0000:04:00.0: pci_alloc_irq_vectors() only alloc 1 vectors
[   33.182751] miop 0000:04:00.0: miop irq on tx ready
[   33.275704] miop 0000:04:00.0: MIOP node[3] on bus:04 is online
[   33.282179] miop 0000:05:00.0: card - bus=0x5, slot = 0x0 irq=4
[   33.288307] miop 0000:05:00.0: probing MIOP node on bus:05
[   33.293962] miop 0000:05:00.0: PCIe bus number 5 mapped to MIOP node id: 1
[   33.302151] miop 0000:05:00.0: pci_alloc_irq_vectors() only alloc 1 vectors
[   33.319961] miop 0000:05:00.0: miop irq on tx ready
[   33.390438] miop 0000:05:00.0: MIOP node[1] on bus:05 is online
[   33.396919] miop 0000:06:00.0: card - bus=0x6, slot = 0x0 irq=4
[   33.403071] miop 0000:06:00.0: probing MIOP node on bus:06
[   33.408686] miop 0000:06:00.0: PCIe bus number 6 mapped to MIOP node id: 0
[   33.416888] miop 0000:06:00.0: pci_alloc_irq_vectors() only alloc 1 vectors
[   33.432606] miop 0000:06:00.0: miop irq on tx ready
[   33.477450] miop 0000:06:00.0: MIOP node[0] on bus:06 is online
[   34.077900] 8021q: adding VLAN 0 to HW filter on device eth0
[   34.113901] device eth0 entered promiscuous mode
[   34.134275] br-lan: port 1(eth0.1) entered blocking state
[   34.139942] br-lan: port 1(eth0.1) entered disabled state
[   34.145884] device eth0.1 entered promiscuous mode
[   35.138888] IPv6: ADDRCONF(NETDEV_CHANGE): pci0: link becomes ready
[   38.212820] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   38.227171] br-lan: port 1(eth0.1) entered blocking state
[   38.232797] br-lan: port 1(eth0.1) entered forwarding state
[   38.238937] IPv6: ADDRCONF(NETDEV_CHANGE): eth0.2: link becomes ready
[   38.375307] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[   38.428611] mtk_soc_eth 10100000.ethernet eth0: port 5 link up (100Mbps/Full duplex)
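
Since the miop messages on the control board are tagged with PCI addresses while the queues are numbered by node id, it can help to pull the bus-to-node mapping out of the log. This is a sketch that assumes the "mapped to MIOP node id" wording shown above; the function name miop_node_map is made up for illustration.

```shell
# Print the PCIe bus -> MIOP node id mapping found in kernel log text on stdin.
# Assumption: lines look like
#   "miop 0000:03:00.0: PCIe bus number 3 mapped to MIOP node id: 2"
miop_node_map() {
    sed -n 's/.*PCIe bus number \([0-9][0-9]*\) mapped to MIOP node id: \([0-9][0-9]*\).*/bus \1 -> node \2/p'
}

# Usage on the control board:
#   sudo dmesg | miop_node_map
```

For the log above this gives bus 3 -> node 2, bus 4 -> node 3, bus 5 -> node 1 and bus 6 -> node 0.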

This comes from node #1, which has a static IP:

root@blade3:/# dmesg | tail -50
[    4.634762] pcie_ep_rk35: module license 'MIXTILE' taints kernel.
[    4.634785] Disabling lock debugging due to kernel taint
[    4.635296] Mixtile TCP/IP over PCIe device driver initializing
[    4.644666] Mixtile TCP/IP over PCIe endpoint netdevice driver initializing
[    4.654131] Mixtile TCP/IP over PCIe EP driver probe
[    4.708539] r8169 0002:24:00.0 enP2p36s0: Link is Down
[    4.718145] hrtimer: interrupt took 2917 ns
[    4.718598] miop-ep fe150000.pcie: assigned reserved memory node miop_dma@0x0e000000
[    4.736094] miop-ep fe150000.pcie: PCIe Linking...0, LTSSM is 0x0
[    6.800110] miop-ep fe150000.pcie: PCIe Link up, LTSSM is 0x230011
[    6.803638] ttyFIQ ttyFIQ0: tty_port_close_start: tty->count = 1 port count = 2
[    6.813290] systemd-journald[291]: File /var/log/journal/bad8ecfc142e4799bdbb18271a523621/user-1001.journal corrupted or uncleanly shut down, renaming and replacing.
[    6.909079] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.909122] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.909155]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.921110] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.921142] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.921168]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.932773] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.932798] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.932816]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.947822] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.947844] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.947856]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    7.070412] rk_pcie_establish_link: 272 callbacks suppressed
[    7.070435] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.091504] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.112104] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.132222] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.153335] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.174440] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.195556] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.216671] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.237722] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.258835] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.468793] rk-pcie fe180000.pcie: PCIe Link Fail, LTSSM is 0x3, hw_retries=1
[    8.492234] rk-pcie fe180000.pcie: failed to initialize host
[   16.840306] miop-ep fe150000.pcie: Descriptor PCI address: 2a000000
[   16.840354] miop-ep fe150000.pcie: MSI not enabled, check on legacy interrupt.
[   16.840367] miop-ep fe150000.pcie: irq line: 4
[   16.840380] miop-ep fe150000.pcie: Node online: 80000000
[   16.840496] IPv6: ADDRCONF(NETDEV_CHANGE): pci0: link becomes ready
[   16.843284] miop-ep fe150000.pcie: TX[0]: Queue is ready.
[   16.885310] miop-ep fe150000.pcie: Node online: 1
[   16.890238] miop-ep fe150000.pcie: Node online: 2
[   16.891421] miop-ep fe150000.pcie: TX[1]: Queue is ready.
[   16.894286] miop-ep fe150000.pcie: Node online: 3
[   16.895231] miop-ep fe150000.pcie: TX[3]: Queue is ready.
[   16.896329] miop-ep fe150000.pcie: TX[2]: Queue is ready.
[   17.004845] platform mtd_vendor_storage: deferred probe pending

And this comes from node #2, which still has a dynamic IP:

root@blade3:/# dmesg | tail -50
[    4.680195] pcie_ep_rk35: module license 'MIXTILE' taints kernel.
[    4.680223] Disabling lock debugging due to kernel taint
[    4.680651] Mixtile TCP/IP over PCIe device driver initializing
[    4.682864] r8169 0002:24:00.0 enP2p36s0: Link is Down
[    4.692633] Mixtile TCP/IP over PCIe endpoint netdevice driver initializing
[    4.705309] Mixtile TCP/IP over PCIe EP driver probe
[    4.706902] hrtimer: interrupt took 4083 ns
[    4.707451] miop-ep fe150000.pcie: assigned reserved memory node miop_dma@0x0e000000
[    4.726673] miop-ep fe150000.pcie: PCIe Linking...0, LTSSM is 0x1
[    6.770503] miop-ep fe150000.pcie: PCIe Link up, LTSSM is 0x230011
[    6.774708] ttyFIQ ttyFIQ0: tty_port_close_start: tty->count = 1 port count = 2
[    6.784338] systemd-journald[289]: File /var/log/journal/0df7273170e34a89874b152862130c7e/user-1001.journal corrupted or uncleanly shut down, renaming and replacing.
[    6.815704] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.815729] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.815749]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.827242] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.827264] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.827283]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.838652] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.838668] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.838678]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.851695] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.851713] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.851728]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    7.054559] rk_pcie_establish_link: 271 callbacks suppressed
[    7.054581] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.075718] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.096834] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.117938] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.139040] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.160154] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.181253] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.202356] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.223469] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.244582] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.475563] rk-pcie fe180000.pcie: PCIe Link Fail, LTSSM is 0x3, hw_retries=1
[    8.498506] rk-pcie fe180000.pcie: failed to initialize host
[   16.722539] miop-ep fe150000.pcie: Descriptor PCI address: 26000000
[   16.722589] miop-ep fe150000.pcie: MSI not enabled, check on legacy interrupt.
[   16.722603] miop-ep fe150000.pcie: irq line: 4
[   16.722617] miop-ep fe150000.pcie: Node online: 80000001
[   16.722732] IPv6: ADDRCONF(NETDEV_CHANGE): pci0: link becomes ready
[   16.725616] miop-ep fe150000.pcie: TX[1]: Queue is ready.
[   16.796712] miop-ep fe150000.pcie: Node online: 2
[   16.799702] miop-ep fe150000.pcie: TX[2]: Queue is ready.
[   16.800735] miop-ep fe150000.pcie: Node online: 3
[   16.802266] miop-ep fe150000.pcie: TX[3]: Queue is ready.
[   16.880502] miop-ep fe150000.pcie: Node online: 0
[   16.887371] miop-ep fe150000.pcie: TX[0]: Queue is ready.
[   16.979036] platform mtd_vendor_storage: deferred probe pending

I see that you ran “apt-get update” on the node (root@blade3:/# apt-get update). We recommend connecting each blade3 to the external network via wired Ethernet and using the PCIe network only between the four blade3 boards, for example with all four devices in the 10.20.0.x network segment. Another point to check is the default gateway settings.
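
One way to check which interface external traffic actually leaves through is to look at the default route on a node. This is a sketch: the helper name default_route_dev is made up, and it assumes the usual "default via ... dev ..." output format of the ip command.

```shell
# Print the interface(s) carrying the default route, given "ip route" output on stdin.
default_route_dev() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "dev") print $(i + 1)
    }'
}

# Usage on a node:
#   ip -4 route show | default_route_dev
#   ip route get 10.20.0.1     # shows the interface used to reach the control board
```

If the first command prints pci0, downloads such as apt-get update go over the PCIe network; if it prints the wired interface (enP2p36s0 in the node logs above), they bypass it.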

You may have noticed that the crash occurs after less than 9 MB of data has been transferred. That is not much. In operation, the nodes typically exchange gigabytes or even terabytes of data, so your PCIe-based network switch must be able to handle such volumes.

Also, I have already checked the default gateway settings.