[ROS2] Node Becomes Zombie, All Pubs/Subs "Dead" But Node Remains Active
I have a node which is responsible for communicating with an Arduino via USB Serial. It seems to have problems only when it is receiving subscription data. I have another node acting as a PID speed controller publishing PWM values, and the Arduino node has two subscribers for these values (one per motor).
Everything seems to start fine, but after reaching some unknown condition the Arduino node just simply stops responding. I cannot find any errors, warnings, etc. in the logs. I haven't observed any caught or uncaught exceptions. In fact, the node still catches keyboard interrupt exceptions and prints a corresponding "I was caught" info message to the ROS log.
Running ros2 node list
shows that the node is still alive.
Running ros2 topic list -t
shows that subscribers of other nodes are still listening to topics for messages that the Arduino node would normally publish.
Running ros2 doctor --report | less
shows that all of the publishers and subscribers within the Arduino node are completely missing.
This is further corroborated by running ros2 topic echo <arduino_published_topic> <message_type>
and seeing absolutely nothing whereas previously messages were being published quite frequently. Additionally, echoing topics from other concurrently-running nodes does produce a steady stream of messages.
None of this is running in a simulated manner. I have a robot which contains a RaspberryPi 4 and the Arduino, and I interact with them over wifi. Running all of the above commands on the RaspberryPi 4, as opposed to my main desktop computer, on my home network (i.e. congestion or network management can be ruled-out), shows no difference in behavior.
I'm not sure how much I should copy/paste from my node as it has grown quite long, but below are some relevant tidbits. Let me know if I should elaborate further on anything!
Also one final thing I will mention is that the PID Velocity controller producing the subscribed messages has a much lower publish frequency than any of the publishers and timers in this node (0.2 sec per publish vs 0.05 sec per encoder publish from serial_arbiter, for example).
# "Arduino node", called serial_arbiter.py
class SerialArbiter(Node):
def __init__(self):
super().__init__('serial_arbiter')
# Omitting most parts where I declare params, retrieve param values, spit them to the logger
# Declare Parameters and default values
self.declare_parameter('serial_port_path', '/dev/ttyACM0')
self.declare_parameter('serial_baud_rate', 115200)
self.arduino = SerialComms(self.serial_port_path, self.serial_baud_rate)
self.arduino.reset_encoders()
# Publishers and timers are declared here, but these are the only two subscribers.
self.subscriber_motor_right = self.create_subscription(Int16, 'set_motor_power_right', self.subscribed_motor_right_cb, 10)
self.subscriber_motors_left = self.create_subscription(Int16, 'set_motor_power_left', self.subscribed_motor_left_cb, 10)
# Omitting probably-irrelevant callback functions
def subscribed_motor_right_cb(self, message):
try:
self.arduino.set_motor_power(self.arduino.RIGHT, int(message.data))
except Exception as e:
self.get_logger().error("Failed to update Motor 1 power: '{}'".format(e))
pass
def subscribed_motor_left_cb(self, message):
try:
self.arduino.set_motor_power(self.arduino.LEFT, int(message.data))
except Exception as e:
self.get_logger().error("Failed to update Motor 2 power: '{}'".format(e))
pass ...