ROS2 python node randomly shuts down

asked 2021-12-22 15:48:14 -0600

Gintecc gravatar image

Dear,

I have some ros2 nodes developed with rclpy and python that randomly stop working. I'm pretty sure that the problems lies not with the code since the nodes can survive up to a few hours without issues but then stop working. Even more, there are always two nodes of the same type running and one lives much longer than the other.

I'm not sure if the nodes are actually shut down or are just not receiving any messages anymore since the process is still running and no exceptions are visible in the logs. We've had the same problem over different types of hardware (x86 and ARM) and the behaviour is not really consistent, sometimes it will work for 20min and other times it will work for 3days.

We've checked the system loads and these are not unreasonable (50-80%cpu 20-30%RAM, even at shutdown). The problems is hard to debug since no information is available (No exception, no ros message, ...).

Does any have a clue what might be the cause of this weird problem?

Kind regards.

ROS specs: Ubuntu 20.04 ROS2 galactic (Fast DDS, same effect with CyclonDDS) (Same behaviour with ROS2 foxy)

edit retag flag offensive close merge delete

Comments

Are you able to post your source code so we can try to reproduce the problem?

Geoff gravatar image Geoff  ( 2021-12-22 16:41:03 -0600 )edit

` def __init__(self): self.subscription = self.create_subscription( Image, '/input', self.ps_single_image, qos_profile_sensor_data )

self.publisher = self.create_publisher(ClassificationOutput, '/output', 10)

def ps_single_image(self, data: Image): cv_img = self.bridge.imgmsg_to_cv2(data, desired_encoding="rgb8") label, conf = self.classify_img(cv_img) self.get_logger().info(f"Classified {label} with confidence: {conf}") self.publisher.publish(ClassificationOutput(label=label, conf=conf)) `

This is not the entire source code, but the most important piece I thinks. It's mostly a simple Input/Output class that classifies an images. We have the same problem with a node that reads data from a GigE camera Node.

` def publish_img(self, img, publisher):

if img.shape[-1] == 3:
    ros_msg = self.bridge.cv2_to_imgmsg(img, encoding='rgb8')
elif img.shape[-1] == 1:
    ros_msg = self.bridge.cv2_t
Gintecc gravatar image Gintecc  ( 2021-12-23 03:56:18 -0600 )edit

For some reason the message won't format properly, here is the pastebin for the first fragment: https://pastebin.com/duXXD1JA

Gintecc gravatar image Gintecc  ( 2021-12-23 04:00:02 -0600 )edit

You should not post code in comments. Edit your original question.

gvdhoorn gravatar image gvdhoorn  ( 2021-12-23 05:16:06 -0600 )edit