Network errors when booting up a Mixtile Blade 3 as cluster node

This is the output of:

sudo tcpdump -i pci0

09:33:11.323700 IP blade3-N1.41122 > blade3-N4.ssh: Flags [.], ack 45113, win 512, options [nop,nop,TS val 1305005630 ecr 2533616080], length 0
09:33:11.324439 IP blade3-N4.ssh > blade3-N1.41122: Flags [P.], seq 45113:53377, ack 4269, win 509, options [nop,nop,TS val 2533616082 ecr 1305005630], length 8264
09:33:11.324763 IP blade3-N4.ssh > blade3-N1.41122: Flags [.], seq 53377:86017, ack 4269, win 509, options [nop,nop,TS val 2533616082 ecr 1305005630], length 32640

Can anyone tell me whether these values are abnormal?
This is routing between blades, i.e. traffic over the PCIe connection. It seems that excessive traffic triggers this kind of error.

Well, I haven’t reached the point where I could test the speed of the network connection between nodes, but my experience is that when pulling an OS update via apt-get upgrade (Internet → control board → node), the connection already stalls after some 5 MB of data, so it’s not really “excessive traffic”.

Good news.

Prof.,
I hadn’t noticed this; however, I was already thinking of creating a route to the gateway via the Blade 3’s 2.5 Gb/s network card.

As I was saying:
I have partially solved the problem of the connection dropping after an excessive number of errors like the one described. It is a palliative, but for the moment it seems to guarantee a minimum of stability when forwarding connections.

As recommended by our good boss ChatGPT: it is a matter of enabling RPS (Receive Packet Steering) to spread the network receive processing load across different CPUs instead of leaving it all on the CPU that services the NIC’s IRQ.

Solution (on every Blade, as root, not via sudo):

echo 2 > /sys/class/net//queues/rx-0/rps_cpus
reboot
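
For reference, here is a slightly more general sketch of the same idea that covers every receive queue of the interface; the interface name and the CPU mask are assumptions (on the blades the PCIe virtual NIC shows up as pci0, and mask 2 means CPU 1 only):

# Apply an RPS CPU mask to every receive queue of one interface (run as root).
IFACE=pci0   # assumed name of the PCIe virtual NIC on the nodes
MASK=2       # hex bitmask of allowed CPUs; 2 = CPU 1 only, e = CPUs 1-3
for q in /sys/class/net/$IFACE/queues/rx-*/rps_cpus; do
    echo $MASK > "$q"
done
grep . /sys/class/net/$IFACE/queues/rx-*/rps_cpus   # verify the masks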

By the way, it could also be useful on the control board, but I don’t feel like touching that one.

Thank you so far.

Your command (or the one you got from ChatGPT) does not work on my nodes, as the subdirectory /sys/class/net/queues/ does not exist. BTW, the double slash // in the path you stated is obviously wrong: an interface name is missing there.

UPDATE: The correct path reads as follows: /sys/class/net/pci0/queues/rx-0/rps_cpus. And no, I still have network issues, with error messages like these two on the control board side:

  • [ 660.751012] miop-ep fe150000.pcie: TX[0]: Queue is full.
  • [ 505.970795] miop 0000:06:00.0: DMA timeout, restart DMA controller.

Hi, there is an update for the TCP/IP over PCIe (MIOP) driver of the ClusterBox product, which resolves the following two issues:

  1. Fix random driver crashes.
  2. Resolve the issue of a full PCIe queue.

Please refer to the following link for the update details and usage tutorial:
ClusterBox MIOP Driver Update Instructions | Mixtile

By “ClusterBox MT7620A”, do you mean the control board?

Yes, it indeed refers to the control board. I sincerely apologize for the confusion caused. We will promptly revise the instructions and ensure they are more accurate in the future.

OK, I’ve now updated the MIOP driver on the control board following your instructions. About the new Debian image, I’ve found a mismatch between the version you recommend in your firmware upgrade guide: image-release-blade3-debian11-20230505.img

…and the version I’ve already got on my blades: Linux blade3n1 5.10.66 #127 SMP Mon Oct 30 14:11:23 CST 2023 aarch64

Additionally, I get many lines of an obscure error message in the syslog of the control board, which look like this:

[ 504.173322] miop 0000:06:00.0: DMA timeout, restart DMA controller.

So the image you let me download is even slightly older (2023-05-05) than the one I’ve already got (2023-10-30). Or do you generally recommend installing the Ubuntu image instead?

We highly recommend installing the Ubuntu image via the link provided.
Blade3 Firmware Download Link
For the Debian image, we are currently resolving some issues. The updated version, which will include the new MIOP driver, is scheduled for release by this Friday.

OK, thank you. But the last error message (the DMA issue) I sent you came from the control board, not from any of the nodes. Does that mean the current version of the MIOP driver for the control board also has an issue?

When a Blade 3 accesses the internet via the control board, it may briefly encounter DMA timeout errors. It is recommended to use the 2.5GbE port on each Blade 3 directly for internet access, as this provides faster and more stable connectivity.

No, this is not the case: in fact, the nodes don’t even see each other, nor do they see the control board. DHCP requests don’t arrive at the control board. Despite that, I get the DMA error every three seconds, even when the nodes aren’t trying to access the network at all.

I can try installing the Ubuntu image on the nodes, but I am skeptical that this will solve my issue.

Please check the Debian image, which includes the new MIOP driver, at the link below:
https://downloads.mixtile.com/blade3/image/clusterbox-debian-mixtile-blade3-rockchip-format-20250523.img.xz

OK, thank you. I’ve just flashed all four nodes, but they still can’t connect to the PCIe network. On the control board I now get error messages like this one:

Sun Jun  8 17:40:04 2025 kern.warn kernel: [ 2390.943924] miop 0000:03:00.0: TX[2]: Queue is full.

Nor do I get a DHCP lease on any of the nodes. Apparently the DHCPDISCOVER packets don’t even reach the control board:

root@blade3:~# dhclient -v pci0
Internet Systems Consortium DHCP Client 4.4.3-P1
Copyright 2004-2022 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/pci0/02:9f:f1:8e:cb:0a
Sending on   LPF/pci0/02:9f:f1:8e:cb:0a
Sending on   Socket/fallback
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 8
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 7
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 19
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 9
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 15
DHCPDISCOVER on pci0 to 255.255.255.255 port 67 interval 3
No DHCPOFFERS received.
No working leases in persistent database - sleeping.
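
One way to narrow this down would be to capture on the control board side and see whether the DHCPDISCOVER packets arrive over the PCIe link at all; a hedged suggestion, assuming tcpdump is (or can be made) available there and that the PCIe-facing interface on the control board is also called pci0:

# On the control board: watch for DHCP traffic on the PCIe interface.
# (tcpdump may first need to be installed, e.g. sudo opkg install tcpdump)
sudo tcpdump -i pci0 -n 'udp and (port 67 or port 68)'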

A DHCP server does run on the control board, though:

mixtile@ClusterBox:~$ ps | grep dnsmasq
 7848 root      2908 S    {dnsmasq} /sbin/ujail -t 5 -n dnsmasq -u -l -r /bin/ubus -r /etc/TZ -r /etc/dnsmas
 7850 dnsmasq   1720 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmas
 7879 mixtile   1404 S    grep dnsmasq
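
To rule out a server-side configuration issue, one could also check which interfaces and address ranges this dnsmasq instance actually serves; a rough check only, using the generated config file path visible in the ps output above:

# On the control board: inspect the generated dnsmasq config and its listening sockets.
sudo grep -E 'interface|dhcp-range' /var/etc/dnsmasq.conf.cfg01411c
sudo netstat -ulnp | grep ':67 '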

Even giving the node a static IP (here: 10.20.0.11) via nmcli, as described in “nmcli set static ip address without the DHCP?” at ServerFault, did not help (see the sketch after the ping output):

root@blade3:~# ping 10.20.0.1
PING 10.20.0.1 (10.20.0.1) 56(84) bytes of data.
From 10.20.0.11 icmp_seq=1 Destination Host Unreachable
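
For completeness, the static setup was done roughly along these lines; a sketch only, in which the connection name is a placeholder that must match whatever nmcli con show lists for the pci0 device, and the gateway is assumed to be the control board at 10.20.0.1:

# On node #1: hypothetical static configuration of the PCIe interface via nmcli.
nmcli con show                                   # find the connection bound to pci0
nmcli con modify "<pci0-connection>" ipv4.method manual \
    ipv4.addresses 10.20.0.11/24 ipv4.gateway 10.20.0.1
nmcli con up "<pci0-connection>"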

You can send us the dmesg information of the Blade 3 nodes and the ClusterBox.

Now the nodes do get a network link, but I still run into the instability issue we already know about:

root@blade3:/# apt-get update
Get:1 http://mirrors.ustc.edu.cn/debian bookworm InRelease [151 kB]
Get:2 http://mirrors.ustc.edu.cn/debian-security bookworm-security InRelease [48.0 kB]
Get:3 http://mirrors.ustc.edu.cn/debian bookworm-updates InRelease [55.4 kB]
Get:4 http://mirrors.ustc.edu.cn/debian bookworm-backports InRelease [59.4 kB]
Get:5 http://mirrors.ustc.edu.cn/debian bookworm/main Sources [9,494 kB]
13% [5 Sources 438 kB/9,494 kB 5%] 

root@blade3:/# client_loop: send disconnect: Broken pipe

At this point, the network connection breaks down.

Here is the dmesg output of the control board:

mixtile@ClusterBox:~$ sudo dmesg | tail -80
[   31.946607] pci 0000:02:00.0: BAR 9: assigned [mem 0x20000000-0x22ffffff 64bit pref]
[   31.954556] pci 0000:02:04.0: BAR 9: assigned [mem 0x23000000-0x25ffffff 64bit pref]
[   31.962495] pci 0000:02:08.0: BAR 9: assigned [mem 0x26000000-0x28ffffff 64bit pref]
[   31.970423] pci 0000:02:0c.0: BAR 9: assigned [mem 0x29000000-0x2bffffff 64bit pref]
[   31.978359] pci 0000:02:00.0: BAR 8: assigned [mem 0x2c000000-0x2c0fffff]
[   31.985328] pci 0000:02:04.0: BAR 8: assigned [mem 0x2c100000-0x2c1fffff]
[   31.992296] pci 0000:02:08.0: BAR 8: assigned [mem 0x2c200000-0x2c2fffff]
[   31.999247] pci 0000:02:0c.0: BAR 8: assigned [mem 0x2c300000-0x2c3fffff]
[   32.006216] pci 0000:03:00.0: BAR 0: assigned [mem 0x20000000-0x21ffffff 64bit pref]
[   32.014180] pci 0000:03:00.0: BAR 4: assigned [mem 0x22000000-0x220fffff 64bit pref]
[   32.022142] pci 0000:03:00.0: BAR 6: assigned [mem 0x2c000000-0x2c00ffff pref]
[   32.029540] pci 0000:02:00.0: PCI bridge to [bus 03]
[   32.034648] pci 0000:02:00.0:   bridge window [mem 0x2c000000-0x2c0fffff]
[   32.041610] pci 0000:02:00.0:   bridge window [mem 0x20000000-0x22ffffff 64bit pref]
[   32.049557] pci 0000:04:00.0: BAR 0: assigned [mem 0x24000000-0x25ffffff 64bit pref]
[   32.057521] pci 0000:04:00.0: BAR 4: assigned [mem 0x23000000-0x230fffff 64bit pref]
[   32.065486] pci 0000:04:00.0: BAR 6: assigned [mem 0x2c100000-0x2c10ffff pref]
[   32.072894] pci 0000:02:04.0: PCI bridge to [bus 04]
[   32.077986] pci 0000:02:04.0:   bridge window [mem 0x2c100000-0x2c1fffff]
[   32.084946] pci 0000:02:04.0:   bridge window [mem 0x23000000-0x25ffffff 64bit pref]
[   32.092905] pci 0000:05:00.0: BAR 0: assigned [mem 0x26000000-0x27ffffff 64bit pref]
[   32.100862] pci 0000:05:00.0: BAR 4: assigned [mem 0x28000000-0x280fffff 64bit pref]
[   32.108823] pci 0000:05:00.0: BAR 6: assigned [mem 0x2c200000-0x2c20ffff pref]
[   32.116234] pci 0000:02:08.0: PCI bridge to [bus 05]
[   32.121329] pci 0000:02:08.0:   bridge window [mem 0x2c200000-0x2c2fffff]
[   32.128289] pci 0000:02:08.0:   bridge window [mem 0x26000000-0x28ffffff 64bit pref]
[   32.136245] pci 0000:06:00.0: BAR 0: assigned [mem 0x2a000000-0x2bffffff 64bit pref]
[   32.144209] pci 0000:06:00.0: BAR 4: assigned [mem 0x29000000-0x290fffff 64bit pref]
[   32.152175] pci 0000:06:00.0: BAR 6: assigned [mem 0x2c300000-0x2c30ffff pref]
[   32.159584] pci 0000:02:0c.0: PCI bridge to [bus 06]
[   32.164691] pci 0000:02:0c.0:   bridge window [mem 0x2c300000-0x2c3fffff]
[   32.171652] pci 0000:02:0c.0:   bridge window [mem 0x29000000-0x2bffffff 64bit pref]
[   32.179590] pci 0000:01:00.0: PCI bridge to [bus 02-06]
[   32.184955] pci 0000:01:00.0:   bridge window [mem 0x2c000000-0x2c3fffff]
[   32.191918] pci 0000:01:00.0:   bridge window [mem 0x20000000-0x2bffffff 64bit pref]
[   32.199855] pci 0000:00:00.0: PCI bridge to [bus 01-06]
[   32.205223] pci 0000:00:00.0:   bridge window [mem 0x2c000000-0x2c3fffff]
[   32.212185] pci 0000:00:00.0:   bridge window [mem 0x20000000-0x2bffffff pref]
[   32.219715] shpchp 0000:00:00.0: card - bus=0x0, slot = 0x0 irq=0
[   32.226099] shpchp 0000:01:00.0: card - bus=0x1, slot = 0x0 irq=4
[   32.232488] shpchp 0000:02:00.0: card - bus=0x2, slot = 0x0 irq=4
[   32.238837] shpchp 0000:02:04.0: card - bus=0x2, slot = 0x0 irq=4
[   32.245235] shpchp 0000:02:08.0: card - bus=0x2, slot = 0x0 irq=4
[   32.251602] shpchp 0000:02:0c.0: card - bus=0x2, slot = 0x0 irq=4
[   32.257993] miop 0000:03:00.0: card - bus=0x3, slot = 0x0 irq=4
[   32.264132] miop 0000:03:00.0: probing MIOP node on bus:03
[   33.082914] miop 0000:03:00.0: PCIe bus number 3 mapped to MIOP node id: 2
[   33.091173] miop 0000:03:00.0: pci_alloc_irq_vectors() only alloc 1 vectors
[   33.105089] miop 0000:03:00.0: miop irq on tx ready
[   33.141599] miop 0000:03:00.0: MIOP node[2] on bus:03 is online
[   33.148044] miop 0000:04:00.0: card - bus=0x4, slot = 0x0 irq=4
[   33.154205] miop 0000:04:00.0: probing MIOP node on bus:04
[   33.159822] miop 0000:04:00.0: PCIe bus number 4 mapped to MIOP node id: 3
[   33.167993] miop 0000:04:00.0: pci_alloc_irq_vectors() only alloc 1 vectors
[   33.182751] miop 0000:04:00.0: miop irq on tx ready
[   33.275704] miop 0000:04:00.0: MIOP node[3] on bus:04 is online
[   33.282179] miop 0000:05:00.0: card - bus=0x5, slot = 0x0 irq=4
[   33.288307] miop 0000:05:00.0: probing MIOP node on bus:05
[   33.293962] miop 0000:05:00.0: PCIe bus number 5 mapped to MIOP node id: 1
[   33.302151] miop 0000:05:00.0: pci_alloc_irq_vectors() only alloc 1 vectors
[   33.319961] miop 0000:05:00.0: miop irq on tx ready
[   33.390438] miop 0000:05:00.0: MIOP node[1] on bus:05 is online
[   33.396919] miop 0000:06:00.0: card - bus=0x6, slot = 0x0 irq=4
[   33.403071] miop 0000:06:00.0: probing MIOP node on bus:06
[   33.408686] miop 0000:06:00.0: PCIe bus number 6 mapped to MIOP node id: 0
[   33.416888] miop 0000:06:00.0: pci_alloc_irq_vectors() only alloc 1 vectors
[   33.432606] miop 0000:06:00.0: miop irq on tx ready
[   33.477450] miop 0000:06:00.0: MIOP node[0] on bus:06 is online
[   34.077900] 8021q: adding VLAN 0 to HW filter on device eth0
[   34.113901] device eth0 entered promiscuous mode
[   34.134275] br-lan: port 1(eth0.1) entered blocking state
[   34.139942] br-lan: port 1(eth0.1) entered disabled state
[   34.145884] device eth0.1 entered promiscuous mode
[   35.138888] IPv6: ADDRCONF(NETDEV_CHANGE): pci0: link becomes ready
[   38.212820] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   38.227171] br-lan: port 1(eth0.1) entered blocking state
[   38.232797] br-lan: port 1(eth0.1) entered forwarding state
[   38.238937] IPv6: ADDRCONF(NETDEV_CHANGE): eth0.2: link becomes ready
[   38.375307] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[   38.428611] mtk_soc_eth 10100000.ethernet eth0: port 5 link up (100Mbps/Full duplex)

This comes from node #1, which has a static IP:

root@blade3:/# dmesg | tail -50
[    4.634762] pcie_ep_rk35: module license 'MIXTILE' taints kernel.
[    4.634785] Disabling lock debugging due to kernel taint
[    4.635296] Mixtile TCP/IP over PCIe device driver initializing
[    4.644666] Mixtile TCP/IP over PCIe endpoint netdevice driver initializing
[    4.654131] Mixtile TCP/IP over PCIe EP driver probe
[    4.708539] r8169 0002:24:00.0 enP2p36s0: Link is Down
[    4.718145] hrtimer: interrupt took 2917 ns
[    4.718598] miop-ep fe150000.pcie: assigned reserved memory node miop_dma@0x0e000000
[    4.736094] miop-ep fe150000.pcie: PCIe Linking...0, LTSSM is 0x0
[    6.800110] miop-ep fe150000.pcie: PCIe Link up, LTSSM is 0x230011
[    6.803638] ttyFIQ ttyFIQ0: tty_port_close_start: tty->count = 1 port count = 2
[    6.813290] systemd-journald[291]: File /var/log/journal/bad8ecfc142e4799bdbb18271a523621/user-1001.journal corrupted or uncleanly shut down, renaming and replacing.
[    6.909079] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.909122] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.909155]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.921110] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.921142] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.921168]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.932773] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.932798] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.932816]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.947822] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.947844] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.947856]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    7.070412] rk_pcie_establish_link: 272 callbacks suppressed
[    7.070435] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.091504] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.112104] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.132222] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.153335] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.174440] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.195556] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.216671] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.237722] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.258835] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.468793] rk-pcie fe180000.pcie: PCIe Link Fail, LTSSM is 0x3, hw_retries=1
[    8.492234] rk-pcie fe180000.pcie: failed to initialize host
[   16.840306] miop-ep fe150000.pcie: Descriptor PCI address: 2a000000
[   16.840354] miop-ep fe150000.pcie: MSI not enabled, check on legacy interrupt.
[   16.840367] miop-ep fe150000.pcie: irq line: 4
[   16.840380] miop-ep fe150000.pcie: Node online: 80000000
[   16.840496] IPv6: ADDRCONF(NETDEV_CHANGE): pci0: link becomes ready
[   16.843284] miop-ep fe150000.pcie: TX[0]: Queue is ready.
[   16.885310] miop-ep fe150000.pcie: Node online: 1
[   16.890238] miop-ep fe150000.pcie: Node online: 2
[   16.891421] miop-ep fe150000.pcie: TX[1]: Queue is ready.
[   16.894286] miop-ep fe150000.pcie: Node online: 3
[   16.895231] miop-ep fe150000.pcie: TX[3]: Queue is ready.
[   16.896329] miop-ep fe150000.pcie: TX[2]: Queue is ready.
[   17.004845] platform mtd_vendor_storage: deferred probe pending

And this comes from node #2, which still has a dynamic IP:

root@blade3:/# dmesg | tail -50
[    4.680195] pcie_ep_rk35: module license 'MIXTILE' taints kernel.
[    4.680223] Disabling lock debugging due to kernel taint
[    4.680651] Mixtile TCP/IP over PCIe device driver initializing
[    4.682864] r8169 0002:24:00.0 enP2p36s0: Link is Down
[    4.692633] Mixtile TCP/IP over PCIe endpoint netdevice driver initializing
[    4.705309] Mixtile TCP/IP over PCIe EP driver probe
[    4.706902] hrtimer: interrupt took 4083 ns
[    4.707451] miop-ep fe150000.pcie: assigned reserved memory node miop_dma@0x0e000000
[    4.726673] miop-ep fe150000.pcie: PCIe Linking...0, LTSSM is 0x1
[    6.770503] miop-ep fe150000.pcie: PCIe Link up, LTSSM is 0x230011
[    6.774708] ttyFIQ ttyFIQ0: tty_port_close_start: tty->count = 1 port count = 2
[    6.784338] systemd-journald[289]: File /var/log/journal/0df7273170e34a89874b152862130c7e/user-1001.journal corrupted or uncleanly shut down, renaming and replacing.
[    6.815704] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.815729] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.815749]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.827242] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.827264] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.827283]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.838652] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.838668] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.838678]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    6.851695] rk_hdmirx fdee0000.hdmirx-controller: hdmirx_audio_startup: device is no connected or audio is off
[    6.851713] hdmi-audio-codec hdmi-audio-codec.8.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[    6.851728]  rockchip-hdmiin: ASoC: error at __soc_pcm_open on rockchip-hdmiin: -19
[    7.054559] rk_pcie_establish_link: 271 callbacks suppressed
[    7.054581] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.075718] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.096834] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.117938] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.139040] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.160154] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.181253] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.202356] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.223469] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.244582] rk-pcie fe180000.pcie: PCIe Linking... LTSSM is 0x3
[    7.475563] rk-pcie fe180000.pcie: PCIe Link Fail, LTSSM is 0x3, hw_retries=1
[    8.498506] rk-pcie fe180000.pcie: failed to initialize host
[   16.722539] miop-ep fe150000.pcie: Descriptor PCI address: 26000000
[   16.722589] miop-ep fe150000.pcie: MSI not enabled, check on legacy interrupt.
[   16.722603] miop-ep fe150000.pcie: irq line: 4
[   16.722617] miop-ep fe150000.pcie: Node online: 80000001
[   16.722732] IPv6: ADDRCONF(NETDEV_CHANGE): pci0: link becomes ready
[   16.725616] miop-ep fe150000.pcie: TX[1]: Queue is ready.
[   16.796712] miop-ep fe150000.pcie: Node online: 2
[   16.799702] miop-ep fe150000.pcie: TX[2]: Queue is ready.
[   16.800735] miop-ep fe150000.pcie: Node online: 3
[   16.802266] miop-ep fe150000.pcie: TX[3]: Queue is ready.
[   16.880502] miop-ep fe150000.pcie: Node online: 0
[   16.887371] miop-ep fe150000.pcie: TX[0]: Queue is ready.
[   16.979036] platform mtd_vendor_storage: deferred probe pending

I saw that you ran “root@blade3:/# apt-get update”. It is recommended that each Blade 3 connects to the external network via its own wired network port, and that the four Blade 3s communicate with each other via the PCIe network, for example with all four devices in the 10.20.0.x network segment. Another point to pay attention to is the default gateway settings.
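
In practice, that could look roughly like this on each blade; a sketch only, in which the interface and connection names are assumptions (the 2.5GbE port appears as enP2p36s0 in the dmesg output above) and the upstream LAN is assumed to hand out addresses via DHCP:

# Internet via the 2.5GbE port, cluster traffic via the PCIe network, and no
# default route over pci0. Check the real names first with: nmcli device
nmcli con add type ethernet ifname enP2p36s0 con-name uplink ipv4.method auto
nmcli con modify "<pci0-connection>" ipv4.method manual \
    ipv4.addresses 10.20.0.11/24 ipv4.never-default yes   # use a different host address per node
nmcli con up uplink
nmcli con up "<pci0-connection>"
ip route show default   # the default route should now point at the 2.5GbE uplink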

Maybe you have noticed that the crash occurs after only about 9 MB of data has been transferred. That is not much. Typically, the nodes exchange gigabytes or even terabytes of data when in operation, so your PCIe-based network switch must be able to handle such amounts of data.

And: I have already looked into the default gateway settings.

It’s not only me who’s still waiting for a stable version of the MIOP driver. And it’s not only me who suggests that you open-source the driver code so that more eyes can review it and more brains can improve it. Please think about it.

As for one of your colleagues suggesting that I resell your hardware: I can’t do that before it runs without errors and crashes.

OK, I’ve now got at least a workaround, which may not be the ultimate solution, but it works: I installed a web proxy (Squid) on the control board and then made the four nodes (which now run Ubuntu 24) use the proxy for all HTTP and HTTPS traffic. Step by step (a consolidated sketch of steps 3 to 5 follows the list):

  1. Install Squid on the control board (via LuCI, or run sudo opkg install squid on the command line).
  2. Log into node #1 via the serial console (nodectl console -n 1).
  3. Edit the environment file (sudo nano /etc/environment) and add the necessary proxy settings:
export http_proxy=10.20.0.1:3128
export https_proxy=10.20.0.1:3128
  4. Create a proxy settings file for APT (sudo nano /etc/apt/apt.conf.d/proxy.conf):
Acquire::http::Proxy "http://10.20.0.1:3128/";
Acquire::https::Proxy "http://10.20.0.1:3128/";
  5. Finally, make APT perform a system update: sudo apt-get update, then sudo apt-get upgrade.
  6. Repeat steps 2 through 5 on the remaining nodes.
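
For convenience, steps 3 to 5 can be bundled into one small script that is run once per node; a sketch under the same assumptions as above (Squid on the control board listening on 10.20.0.1:3128). Note that /etc/environment is normally parsed as plain KEY=value lines, so the export keyword can be left out there:

#!/bin/sh
# Hypothetical per-node helper: point the environment and APT at the Squid proxy
# on the control board, then pull the pending updates. Run as root on each node.
PROXY=http://10.20.0.1:3128

cat >> /etc/environment <<EOF
http_proxy=$PROXY
https_proxy=$PROXY
EOF

cat > /etc/apt/apt.conf.d/proxy.conf <<EOF
Acquire::http::Proxy "$PROXY/";
Acquire::https::Proxy "$PROXY/";
EOF

apt-get update && apt-get upgrade -y

Run it once per node instead of doing steps 3 to 5 by hand, then continue with step 6.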