I’m experiencing severe problems with the internal network of the Cluster Box. Accessing the external network from the control board usually works fine, but when I access the external network from a cluster node (Blade 3, no matter which one), the transfer hangs as soon as I download more than 1 MB of data. I then rebooted one of the nodes via the control board, logged into it, and got a bunch of error messages that seem to be related to a networking issue:
root@blade3n1:~# dmesg | tail -35
[ 57.720852] r8169 0002:24:00.0: Unable to load firmware rtl_nic/rtl8125b-2.fw (-2)
[ 57.747464] RTL8226B_RTL8221B 2.5Gbps PHY r8169-2-2400:00: attached PHY driver [RTL8226B_RTL8221B 2.5Gbps PHY] (mii_bus:phy_addr=r8169-2-2400:00, irq=IGNORE)
[ 57.867650] r8169 0002:24:00.0 enP2p36s0: Link is Down
[ 58.228952] usb_gadget_probe_driver udc_name=fc000000.usb, dev_name=fc000000.usb
[ 58.360878] android_work: did not send uevent (0 0 0000000000000000)
[ 58.517711] mali fb000000.gpu: Loading Mali firmware 0x1010000
[ 58.518162] mali fb000000.gpu: Protected memory allocator not found, Firmware protected mode entry will not be supported
[ 58.518174] mali fb000000.gpu: Protected memory allocator not found, Firmware protected mode entry will not be supported
[ 58.518181] mali fb000000.gpu: Protected memory allocator not found, Firmware protected mode entry will not be supported
[ 60.216618] systemd-journald[281]: File /var/log/journal/24b0e758b78949088e06ee4bfcbcef83/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.
[ 62.425928] ttyFIQ ttyFIQ0: tty_port_close_start: tty->count = 1 port count = 2
[ 65.120067] rockchip-csi2-dphy0: No link between dphy and sensor
[ 65.120090] rkcif-mipi-lvds2: rkcif_update_sensor_info: stream[0] get remote terminal sensor failed!
[ 65.120094] stream_cif_mipi_id0: update sensor info failed -19
[ 65.120255] rockchip-csi2-dphy0: No link between dphy and sensor
[ 65.120268] rkcif-mipi-lvds2: rkcif_update_sensor_info: stream[1] get remote terminal sensor failed!
[ 65.120271] stream_cif_mipi_id1: update sensor info failed -19
[ 65.120339] rockchip-csi2-dphy0: No link between dphy and sensor
[ 65.120345] rkcif-mipi-lvds2: rkcif_update_sensor_info: stream[2] get remote terminal sensor failed!
[ 65.120347] stream_cif_mipi_id2: update sensor info failed -19
[ 65.121224] rockchip-csi2-dphy0: No link between dphy and sensor
[ 65.121256] rkcif-mipi-lvds2: rkcif_update_sensor_info: stream[3] get remote terminal sensor failed!
[ 65.121267] stream_cif_mipi_id3: update sensor info failed -19
[ 65.121915] rockchip-csi2-dphy0: No link between dphy and sensor
[ 65.121936] rkcif-mipi-lvds2: rkcif_update_sensor_info: stream[0] get remote terminal sensor failed!
[ 65.121943] rkcif_scale_ch0: update sensor info failed -19
[ 65.123307] rockchip-csi2-dphy0: No link between dphy and sensor
[ 65.123328] rkcif-mipi-lvds2: rkcif_update_sensor_info: stream[1] get remote terminal sensor failed!
[ 65.123345] rkcif_scale_ch1: update sensor info failed -19
[ 65.123553] rockchip-csi2-dphy0: No link between dphy and sensor
[ 65.123570] rkcif-mipi-lvds2: rkcif_update_sensor_info: stream[2] get remote terminal sensor failed!
[ 65.123577] rkcif_scale_ch2: update sensor info failed -19
[ 65.124184] rockchip-csi2-dphy0: No link between dphy and sensor
[ 65.124200] rkcif-mipi-lvds2: rkcif_update_sensor_info: stream[3] get remote terminal sensor failed!
[ 65.124209] rkcif_scale_ch3: update sensor info failed -19
What’s wrong here? The MIOP firmware installed on the nodes (miop-control-blade3-arm64-v0.0.3-20240523) is already up to date, and I am using the revised version of the control board (the one with the Oculink connectors at the back).
UPDATE: On another node, I additionally get a PCIe-related error message that keeps repeating afterwards:
root@blade3n4:~# dmesg | tail -35
[ 64.577332] rockchip-csi2-dphy0: No link between dphy and sensor
[ 64.577337] rkcif-mipi-lvds2: rkcif_update_sensor_info: stream[3] get remote terminal sensor failed!
[ 64.577340] stream_cif_mipi_id3: update sensor info failed -19
[ 64.577543] rockchip-csi2-dphy0: No link between dphy and sensor
[ 64.577551] rkcif-mipi-lvds2: rkcif_update_sensor_info: stream[0] get remote terminal sensor failed!
[ 64.577554] rkcif_scale_ch0: update sensor info failed -19
[ 64.577632] rockchip-csi2-dphy0: No link between dphy and sensor
[ 64.577638] rkcif-mipi-lvds2: rkcif_update_sensor_info: stream[1] get remote terminal sensor failed!
[ 64.577640] rkcif_scale_ch1: update sensor info failed -19
[ 64.577735] rockchip-csi2-dphy0: No link between dphy and sensor
[ 64.578001] rkcif-mipi-lvds2: rkcif_update_sensor_info: stream[2] get remote terminal sensor failed!
[ 64.578005] rkcif_scale_ch2: update sensor info failed -19
[ 64.578110] rockchip-csi2-dphy0: No link between dphy and sensor
[ 64.578116] rkcif-mipi-lvds2: rkcif_update_sensor_info: stream[3] get remote terminal sensor failed!
[ 64.578118] rkcif_scale_ch3: update sensor info failed -19
[ 71.867113] miop-ep fe150000.pcie: TX[0]: Queue is full.
[ 71.867175] miop-ep fe150000.pcie: TX[1]: Queue is full.
[ 88.938474] miop-ep fe150000.pcie: TX[0]: Queue is full.
[ 88.938535] miop-ep fe150000.pcie: TX[1]: Queue is full.
[ 121.786155] miop-ep fe150000.pcie: TX[0]: Queue is full.
[ 121.786217] miop-ep fe150000.pcie: TX[1]: Queue is full.
[ 190.392068] miop-ep fe150000.pcie: TX[0]: Queue is full.
[ 190.392130] miop-ep fe150000.pcie: TX[1]: Queue is full.
[ 319.472388] miop-ep fe150000.pcie: TX[0]: Queue is full.
[ 319.472451] miop-ep fe150000.pcie: TX[1]: Queue is full.
[ 329.097643] miop-ep fe150000.pcie: TX[0]: Queue is full.
[ 329.097704] miop-ep fe150000.pcie: TX[1]: Queue is full.
[ 350.139620] ttyFIQ ttyFIQ0: tty_port_close_start: tty->count = 1 port count = 3
[ 350.485478] ttyFIQ ttyFIQ0: tty_port_close_start: tty->count = 1 port count = 2
[ 609.315517] miop-ep fe150000.pcie: TX[0]: Queue is full.
[ 609.315578] miop-ep fe150000.pcie: TX[1]: Queue is full.
[ 1153.024487] miop-ep fe150000.pcie: TX[0]: Queue is full.
[ 1153.024548] miop-ep fe150000.pcie: TX[1]: Queue is full.
[ 2233.728447] miop-ep fe150000.pcie: TX[0]: Queue is full.
[ 2233.728508] miop-ep fe150000.pcie: TX[1]: Queue is full.
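To check whether the "Queue is full" messages line up with the stalled downloads, a small helper that extracts their timestamps from a saved dmesg log and prints the gaps between them may help. This is only a sketch, assuming the dmesg line format shown above; the script name queue_full.py and the helper names are hypothetical.

```python
import re
import sys

# Matches lines such as (assumed format, taken from the log above):
# [   71.867113] miop-ep fe150000.pcie: TX[0]: Queue is full.
PATTERN = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+miop-ep\s+\S+:\s+TX\[(\d+)\]: Queue is full\."
)


def queue_full_events(lines):
    """Return (timestamp_seconds, tx_queue) tuples for each 'Queue is full' line."""
    events = []
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:
            events.append((float(match.group(1)), int(match.group(2))))
    return events


def gaps(events, queue=0):
    """Seconds between consecutive 'Queue is full' events on one TX queue."""
    times = [t for t, q in events if q == queue]
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    # Hypothetical usage: dmesg > dmesg.txt && python3 queue_full.py dmesg.txt
    if len(sys.argv) != 2:
        print("usage: queue_full.py DMESG_LOG_FILE")
    else:
        with open(sys.argv[1]) as fh:
            evts = queue_full_events(fh)
        for t, q in evts:
            print(f"{t:10.3f}s  TX[{q}]")
        print("gaps on TX[0] (s):", gaps(evts, 0))
```

Running it right after a download stalls shows whether a new TX[0]/TX[1] pair was logged at that moment.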