ClusterBox MIOP Driver Update Instructions

Seems to be a file from your package!

mixtile@mixtile-blade3:~$ dpkg -S /lib/modules/miop/miop.ko 
miop: /lib/modules/miop/miop.ko
mixtile@mixtile-blade3:~$ dpkg -l | grep miop
ii  miop                                          1.0.0-1                                                arm64        miop TCP/IP over PCIe device driver.
mixtile@mixtile-blade3:~$ modinfo /lib/modules/miop/miop.ko 
filename:       /lib/modules/miop/miop.ko
license:        MIXTILE
version:        1.0
description:    Mixtile TCP/IP over PCIe host driver
author:         Martin Liu <martin@mixtile.com>
srcversion:     6F8B498A16A68EE999F9633
alias:          pci:v00004586d0000B6F2sv*sd*bc*sc*i*
depends:        
name:           miop
vermagic:       6.1.0-1016-rockchip SMP mod_unload modversions aarch64
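
For reference, the vermagic line in the modinfo output above names the kernel release the module was built for (6.1.0-1016-rockchip here). A minimal sketch of comparing that against the running kernel follows. The function names vermagic_kernel and check_module_kernel are made up for illustration, and comparing only the release string is a simplification of what the kernel checks when loading a module.

```shell
#!/bin/sh
# Print the kernel release a module was built for, read from
# `modinfo` output on stdin (second field of the "vermagic:" line).
vermagic_kernel() {
    awk '$1 == "vermagic:" { print $2; exit }'
}

# Compare a module's vermagic kernel release with a kernel release string.
# $1: path to the .ko file, $2: kernel release (defaults to `uname -r`).
check_module_kernel() {
    built_for="$(modinfo "$1" | vermagic_kernel)"
    running="${2:-$(uname -r)}"
    if [ "$built_for" = "$running" ]; then
        echo "match: $built_for"
    else
        echo "mismatch: module built for '$built_for', kernel is '$running'"
        return 1
    fi
}

# Example (run on the Blade 3 node):
#   check_module_kernel /lib/modules/miop/miop.ko
```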

Good morning.

I’m wondering whether the operation opkg install miop_1.0_mipsel_24kc.ipk is reversible.
I see that Blade3s that aren’t updated might experience kernel panics on startup.
I’d like to be able to get back to work if something goes wrong.

Thank you very much

Unfortunately, the operation opkg install miop_1.0_mipsel_24kc.ipk isn’t reversible.
Once installed, it overwrites existing driver files, and opkg doesn’t keep a backup of the originals.

If something goes wrong (like a kernel panic or boot failure), the only reliable way to recover is to reflash or update the full system firmware.

It’s best to make a complete firmware backup or keep a copy of the previous .ipk before installing, just in case you need to roll back.

Thanks for the reply.

Luckily, I haven’t done that yet.

Is it possible to flash the firmware via USB?
The last time I tried with the power button pressed, I think I failed.

Could you possibly tell me the best way to back up and restore the firmware?

Is miop_xx_...ipk the name of the .ipk I should look for?
Could you tell me if there are any dependencies I should consider?
How do I find my .ipk to back it up?
Do you have the .ipk for my kernel version?

Thank you !!

You can directly replace the driver in the directory /lib/modules/5.15.150/miop.ko
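
Since opkg keeps no copy of the original files, it seems prudent to save the existing module before overwriting it. A minimal sketch, assuming the module path given above; the helper name backup_file and the default backup directory /root are illustrative choices, not part of any Mixtile tooling.

```shell
#!/bin/sh
# Copy a file to <destdir>/<name>.bak-<timestamp> and print the backup path.
# $1: file to back up, $2: backup directory (assumed default: /root).
backup_file() {
    src="$1"
    destdir="${2:-/root}"
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    dest="$destdir/$(basename "$src").bak-$(date +%Y%m%d-%H%M%S)"
    cp -p "$src" "$dest" && echo "$dest"
}

# Example (on the control board, as root), before copying a new miop.ko in:
#   backup_file /lib/modules/5.15.150/miop.ko
# To roll back, copy the printed backup path over
# /lib/modules/5.15.150/miop.ko again and reboot.
```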

Updating miop only makes things worse. Installing it from the IPK file ends with this message:

Package miop (1.0) installed in root is up to date.

Replacing the kernel module with the one mentioned in the last post leaves me with many DMA failures like this one:

[  131.123979] miop 0000:06:00.0: DMA timeout, restart DMA controller.

I would like to ask whether both steps in this link (1. Installing the MIOP Driver on Controller Board, 2. Updating Blade3 Firmware) have been carried out using the firmware provided in the link.

Yeah. I have already done the firmware update on the nodes. Please note that the error message I stated appeared on the control board, not on the nodes.

Please confirm that the PCIe device is recognized and that the negotiated link speed and width are normal, for example:

root@ClusterBox:~# lspci -s 03:00.0 -vvv|grep LnkSta
LnkSta: Speed 8GT/s, Width x4 
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+ 

Do you mean only on the control board, or also on the nodes?

Run this command on the control board to check the link status of buses 3, 4, 5, and 6. For example, to check bus 3:

lspci -s 03:00.0 -vvv|grep LnkSta
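
To check all four buses in one go, the command can be wrapped in a loop. This is only a sketch: the function names link_status and check_all_links are made up here, and the bus numbers 03 to 06 are taken from the description above. Discarding stderr hides unrelated lspci warnings.

```shell
#!/bin/sh
# Summarise the "LnkSta:" line of `lspci -vvv` output (stdin) as
# "<speed> <width> ok" or "<speed> <width> downgraded".
link_status() {
    awk '/LnkSta:/ {
        speed = "?"; width = "?"
        # "Speed " and "Width " are 6 characters long; skip past them.
        if (match($0, /Speed [^ ,]+/))  speed = substr($0, RSTART + 6, RLENGTH - 6)
        if (match($0, /Width x[0-9]+/)) width = substr($0, RSTART + 6, RLENGTH - 6)
        state = ($0 ~ /downgraded/) ? "downgraded" : "ok"
        print speed, width, state
        exit
    }'
}

# Query buses 03..06 on the control board and print one line per bus.
check_all_links() {
    for bus in 03 04 05 06; do
        status="$(lspci -s "$bus:00.0" -vvv 2>/dev/null | link_status)"
        echo "$bus:00.0 ${status:-not found}"
    done
}

# Example (on the control board, as root):
#   check_all_links
```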

Ah, buses 4 to 6 as well! I’ve just run the command as you described, but I’ve got some errors:

mixtile@ClusterBox:~$ sudo lspci -s 03:00.0 -vvv|grep LnkSta
                LnkSta: Speed 8GT/s, Width x2 (downgraded)
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
lspci: Unable to load libkmod resources: error -2
mixtile@ClusterBox:~$ sudo lspci -s 04:00.0 -vvv|grep LnkSta
                LnkSta: Speed 8GT/s, Width x2 (downgraded)
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
lspci: Unable to load libkmod resources: error -2
mixtile@ClusterBox:~$ sudo lspci -s 05:00.0 -vvv|grep LnkSta
                LnkSta: Speed 8GT/s, Width x2 (downgraded)
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
lspci: Unable to load libkmod resources: error -2
mixtile@ClusterBox:~$ sudo lspci -s 06:00.0 -vvv|grep LnkSta
                LnkSta: Speed 8GT/s, Width x2 (downgraded)
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
lspci: Unable to load libkmod resources: error -2

When I run the command under strace, I get even more errors:

mixtile@ClusterBox:~$ sudo strace -e file lspci -s 03:00.0 -vvv |& grep ENOENT
open("/etc/ld-musl-mipsel-sf.path", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/lib/libpci.so.3", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libpci.so.3", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/lib/libkmod.so.2", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libkmod.so.2", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/lib/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/root/.pciids-cache", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/sys/bus/pci/devices/0000:03:00.0/label", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/sys/bus/pci/devices/0000:03:00.0/numa_node", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
readlink("/sys/bus/pci/devices/0000:03:00.0/iommu_group", 0x7fb2b39c, 1024) = -1 ENOENT (No such file or directory)
readlink("/sys/bus/pci/devices/0000:03:00.0/of_node", 0x7fb2b39c, 1024) = -1 ENOENT (No such file or directory)
open("/sys/module/compression", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
statx(AT_FDCWD, "/etc/modprobe.d", AT_STATX_SYNC_AS_STAT, STATX_BASIC_STATS, 0x7fb2a580) = -1 ENOENT (No such file or directory)
statx(AT_FDCWD, "/run/modprobe.d", AT_STATX_SYNC_AS_STAT, STATX_BASIC_STATS, 0x7fb2a580) = -1 ENOENT (No such file or directory)
statx(AT_FDCWD, "/usr/local/lib/modprobe.d", AT_STATX_SYNC_AS_STAT, STATX_BASIC_STATS, 0x7fb2a580) = -1 ENOENT (No such file or directory)
statx(AT_FDCWD, "/lib/modprobe.d", AT_STATX_SYNC_AS_STAT, STATX_BASIC_STATS, 0x7fb2a580) = -1 ENOENT (No such file or directory)
open("/lib/modules/5.15.150/modules.softdep", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/lib/modules/5.15.150/modules.dep.bin", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)

The lspci output shows Speed 8GT/s and Width x2, so the link has negotiated only 2 lanes. Is an M.2 SSD connected to each Blade3? The error message printed by lspci can be ignored for now; it does not affect the result.

No, there are (still) no M.2 SSDs installed in the cluster box, but I’m planning to do that.

BTW, the speed of the control board’s LAN port has suddenly dropped to only 100 Mbit/s.

Judging from the logs and descriptions you provided, your four Blade3s have been flashed with https://downloads.mixtile.com/cluster-box/blade3-ubuntu-images/ubuntu-24.04-preinstalled-desktop-arm64-mixtile-blade3-rockchip-format.zip and no SSD is connected. In that case the PCIe link should be recognized as Speed 8GT/s, Width x4. It currently shows Speed 8GT/s, Width x2, which indicates a problem. I suggest checking the Blade3s for contact issues, or re-flashing the firmware as a test.

Already done that. No problems found.

Nope. In fact, I’ve got a version of Debian Bookworm. Does the version of Ubuntu you mentioned fully support the Cluster Box’s MIOP? Does this also apply to your server version?

root@blade3:/# neofetch
       _,met$$$$$gg.          root@blade3 
    ,g$$$$$$$$$$$$$$$P.       ----------- 
  ,g$$P"     """Y$$.".        OS: Debian GNU/Linux 12 (bookworm) aarch64 
 ,$$P'              `$$$.     Host: Mixtile Blade 3 v1.0.1 
',$$P       ,ggs.     `$$b:   Kernel: 6.1.99 
`d$$'     ,$P"'   .    $$$    Uptime: 5 mins 
 $$P      d$'     ,    $$P    Packages: 1460 (dpkg) 
 $$:      $$.   -    ,d$$'    Shell: bash 5.2.15 
 $$;      Y$b._   _,d$P'      WM: Xfwm4 
 Y$$.    `.`"Y$$$$P"'         Theme: Adwaita [GTK3] 
 `$$b      "-.__              Icons: Adwaita [GTK3] 
  `Y$$                        CPU: (8) @ 1.800GHz 
   `Y$$.                      Memory: 486MiB / 15947MiB 
     `$$b.
       `Y$$b.                                         
          `"Y$b._                                     
              `"""

Exactly. But: I’d like to do that once the cluster works as expected.

Does the nodectl flash command work now?

Currently, the Ubuntu desktop version offers better support, so we recommend using that firmware. The nodectl flash function is not yet available, so you still need to flash each device individually.

OK, I’ve just flashed all four nodes with the Ubuntu distro you sent me, but:

  1. Apparently, you must set the node into MaskROM mode before flashing, as otherwise the node won’t be recognised.
  2. The new distro does not solve the original problem of the connection breaking when downloading something to a node. Additionally, the control board now becomes very slow in some circumstances.

Node 1:

mixtile@mixtile-ubuntu:~$ sudo apt-get update
Hit:1 http://ports.ubuntu.com noble InRelease
Get:2 http://ports.ubuntu.com noble-updates InRelease [126 kB]
Get:3 http://ports.ubuntu.com noble-backports InRelease [126 kB]
Get:4 https://ppa.launchpadcontent.net/jjriek/panfork-mesa/ubuntu noble InRelease [17.8 kB]
Get:5 http://ports.ubuntu.com noble-security InRelease [126 kB]      
0% [5 InRelease 2,572 B/126 kB 2%]client_loop: send disconnect: Broken pipe

Control board:

[  308.581738] miop 0000:06:00.0: DMA timeout, restart DMA controller.
[  309.591778] miop 0000:06:00.0: DMA timeout, restart DMA controller.
[  310.601900] miop 0000:06:00.0: DMA timeout, restart DMA controller.
[  311.611940] miop 0000:06:00.0: DMA timeout, restart DMA controller.

BTW, this is what I get in the control board’s kernel log directly after booting, just after I had received the broken pipe:

mixtile@ClusterBox:~$ sudo dmesg | tail -35
[   33.273584] miop 0000:03:00.0: probing MIOP node on bus:03
[   34.036673] miop 0000:03:00.0: PCIe bus number 3 mapped to MIOP node id: 2
[   34.044954] miop 0000:03:00.0: pci_alloc_irq_vectors() only alloc 1 vectors
[   34.062477] miop 0000:03:00.0: miop irq on tx ready
[   34.111196] miop 0000:03:00.0: MIOP node[2] on bus:03 is online
[   34.117678] miop 0000:04:00.0: card - bus=0x4, slot = 0x0 irq=4
[   34.123844] miop 0000:04:00.0: probing MIOP node on bus:04
[   34.129460] miop 0000:04:00.0: PCIe bus number 4 mapped to MIOP node id: 3
[   34.137619] miop 0000:04:00.0: pci_alloc_irq_vectors() only alloc 1 vectors
[   34.155713] miop 0000:04:00.0: miop irq on tx ready
[   34.244668] miop 0000:04:00.0: MIOP node[3] on bus:04 is online
[   34.251110] miop 0000:05:00.0: card - bus=0x5, slot = 0x0 irq=4
[   34.257278] miop 0000:05:00.0: probing MIOP node on bus:05
[   34.262901] miop 0000:05:00.0: PCIe bus number 5 mapped to MIOP node id: 1
[   34.271059] miop 0000:05:00.0: pci_alloc_irq_vectors() only alloc 1 vectors
[   34.289820] miop 0000:05:00.0: miop irq on tx ready
[   34.359149] miop 0000:05:00.0: MIOP node[1] on bus:05 is online
[   34.365618] miop 0000:06:00.0: card - bus=0x6, slot = 0x0 irq=4
[   34.371761] miop 0000:06:00.0: probing MIOP node on bus:06
[   34.377377] miop 0000:06:00.0: PCIe bus number 6 mapped to MIOP node id: 0
[   34.385556] miop 0000:06:00.0: pci_alloc_irq_vectors() only alloc 1 vectors
[   34.404051] miop 0000:06:00.0: miop irq on tx ready
[   34.443187] miop 0000:06:00.0: MIOP node[0] on bus:06 is online
[   35.089768] 8021q: adding VLAN 0 to HW filter on device eth0
[   35.146707] device eth0 entered promiscuous mode
[   35.171010] br-lan: port 1(eth0.1) entered blocking state
[   35.176613] br-lan: port 1(eth0.1) entered disabled state
[   35.182607] device eth0.1 entered promiscuous mode
[   36.120473] IPv6: ADDRCONF(NETDEV_CHANGE): pci0: link becomes ready
[   39.240293] mtk_soc_eth 10100000.ethernet eth0: port 5 link up (100Mbps/Full duplex)
[   39.265169] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   39.290115] br-lan: port 1(eth0.1) entered blocking state
[   39.295711] br-lan: port 1(eth0.1) entered forwarding state
[   39.301857] IPv6: ADDRCONF(NETDEV_CHANGE): eth0.2: link becomes ready
[   39.393188] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
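
The two kinds of miop messages shown above (the bus-to-node mapping at probe time and the repeated DMA timeouts) can be pulled out of the kernel log mechanically, which makes it easier to see which node a timeout belongs to. A rough sketch, assuming the message wording stays exactly as in the logs above; the function names miop_node_map and dma_timeouts are made up for illustration.

```shell
#!/bin/sh
# Print "bus <n> -> node <id>" for each mapping line in dmesg output (stdin).
miop_node_map() {
    sed -n 's/.*PCIe bus number \([0-9][0-9]*\) mapped to MIOP node id: \([0-9][0-9]*\).*/bus \1 -> node \2/p'
}

# Count "DMA timeout" messages per PCI device in dmesg output (stdin).
# Output: one "<pci address> <count>" line per device, sorted by address.
dma_timeouts() {
    awk '/miop .*DMA timeout/ {
        for (i = 1; i < NF; i++)
            if ($i == "miop") {
                dev = $(i + 1)
                sub(/:$/, "", dev)   # strip the colon after the address
                count[dev]++
                break
            }
    }
    END { for (d in count) print d, count[d] }' | sort
}

# Example (on the control board):
#   dmesg | miop_node_map
#   dmesg | dma_timeouts
```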

Has the miop driver on the controller board been updated to this one? ClusterBox MIOP Driver Update Instructions

I have already done that, @Buyuliang ! As stated in another thread, I’m now at least able to work around this issue by running an HTTP proxy on the control board and making all blades use it when contacting external web servers. It’s not the best solution, but at least it works.
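
For anyone wanting to try the same workaround for apt on the blades, a sketch of the client side follows. The proxy address 192.168.1.1, the port 3128 and the file name 95proxy are placeholders that depend on how the proxy on the control board is set up, and the function name apt_proxy_conf is made up for illustration.

```shell
#!/bin/sh
# Print an apt configuration snippet that routes HTTP and HTTPS
# downloads through an HTTP proxy at $1 (host) and $2 (port).
apt_proxy_conf() {
    printf 'Acquire::http::Proxy "http://%s:%s/";\n' "$1" "$2"
    printf 'Acquire::https::Proxy "http://%s:%s/";\n' "$1" "$2"
}

# Example on a blade (address, port and file name are placeholders):
#   apt_proxy_conf 192.168.1.1 3128 | sudo tee /etc/apt/apt.conf.d/95proxy
```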