Dis/connection problem: "Queue is full."

From the ClusterBox console, typing:

nodectl poweroff -n X

the blade in node X powers off as expected; after this command, on the still-connected blade, I start to receive a series of dmesg messages similar to this:

[ 78.080918] miop-ep fe150000.pcie: TX[1]: Queue is full.
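As a side note, a simple way to watch these messages arrive in real time on the still-connected blade is a plain dmesg follow (nothing ClusterBox-specific, just a generic filter on the text of the error):

sudo dmesg --follow | grep "Queue is full"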

The condition worsens when I try to bring Blade3 back online using the command:

nodectl poweron -n X

Following this command, in fact, I receive a flood of errors similar to the one reported above:

[ 462.974521] miop-ep fe150000.pcie: TX[1]: Queue is full.
[ 462.974625] miop-ep fe150000.pcie: TX[1]: Queue is full.
[ 488.753665] miop-ep fe150000.pcie: TX[1]: Queue is full.
[ 489.760317] miop-ep fe150000.pcie: TX[1]: Queue is full.

This error makes it impossible to interface with the system, and the only way out is to reboot the ClusterBox.

In fact, everything seems to work correctly until I power off the connected blades.

Is this problem related to some configuration error on my side, and what is the correct way to disconnect/reconnect (poweroff/poweron) a Blade3 inside the ClusterBox?


Can you tell me the exact configuration of the eth_pci interface shown in Network/Interfaces of the ClusterBox dashboard?
I must have changed the original configuration.

What are the correct values for:

IPv4 netmask
IPv4 broadcast
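
For what it's worth, the current values can be read directly from the ClusterBox shell with the standard ip tool (I'm assuming the interface is named eth_pci as shown in the dashboard; substitute the actual name if it differs):

ip addr show eth_pci

The netmask is reported as the /NN prefix length on the inet line, and the broadcast address right after "brd".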

I can’t restore a configuration file from System/Backup in the ClusterBox dashboard. I see that this process is different from a factory reset; where do I find that function?

In the meantime I suggest reading the topic:

in which I describe a CRITICAL PROBLEM I ran into.

I thought the full queue problem could be solved by:

•manually assigning the network configuration of each blade (even through the NetworkManager GUI)
•deleting (or commenting out with #) from the ClusterBox's /etc/config/dhcp file, at the end of the file, the entries regarding the address assignment (host and MAC address) of the blades previously set (via the WebUI) as having a static address (a sketch of what these entries look like follows below)

This way the address should hopefully remain available, and this approach should not cause problems with excessive requests piling up in the queue after reconnection.
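
For reference, the /etc/config/dhcp path suggests the ClusterBox dashboard is OpenWrt/LuCI-based, in which case the static-lease entries at the end of the file are standard "config host" sections like the sketch below (names, MAC and IP values are placeholders, not my real ones):

# tail of /etc/config/dhcp – static lease added via the WebUI (placeholder values)
config host
        option name 'blade2'
        option mac 'AA:BB:CC:DD:EE:02'
        option ip '10.20.0.12'

# the same entry commented out to release the reservation
# config host
#         option name 'blade2'
#         option mac 'AA:BB:CC:DD:EE:02'
#         option ip '10.20.0.12'

After editing the file, the DHCP service has to be restarted (on OpenWrt that is typically /etc/init.d/dnsmasq restart) for the change to take effect.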

At the time of writing I have actually solved the problem, or rather mitigated it, as far as the temporary manual disconnection of the Blade3's pci0 interface is concerned.

As for the loss of connectivity after rebooting a single Blade3, I continue to have difficulty reconnecting despite running "ip neigh flush all" and restarting the ClusterBox's eth_pci0 interface (the sequence I use is sketched below). Perhaps, at the moment, the best way to run the ClusterBox with one, two or three blades off is to shut them down with nodectl poweroff -n X, while to bring a blade back it is better to first reboot the ClusterBox (sudo reboot) and then power off the blades to be excluded (nodectl poweroff -n X).
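
For completeness, this is what I mean by restarting the eth_pci0 interface on the ClusterBox, using standard iproute2 commands (eth_pci0 is the interface name I see on my unit):

# flush stale neighbour (ARP) entries, then bounce the PCIe virtual NIC
ip neigh flush all
ip link set eth_pci0 down
sleep 2
ip link set eth_pci0 up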

Regardless of the configuration used (auto or manual), I am finding it useful to reboot the ClusterBox with the following command:

ssh 10.20.0.1 nodectl reboot -all && sudo reboot

This is because a simple reboot does not seem to restore correct connectivity.
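
After this double reboot I wait until the ClusterBox answers again before touching the blades; a trivial loop like the one below is enough (10.20.0.1 is the ClusterBox address used above, eth_pci0 the interface name mentioned earlier):

# wait until the ClusterBox is reachable again, then check the PCIe link state
until ping -c 1 -W 1 10.20.0.1 > /dev/null 2>&1; do sleep 2; done
ssh 10.20.0.1 ip link show eth_pci0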