[ROS2] Random delays in message delivery time

asked 2020-07-20 04:56:27 -0500

ypicchi

Hello. I have only a few weeks of experience with ROS and could use some help.

I am trying to build a tool that can profile roughly any node by intercepting one of its input messages and measuring how long the node takes to produce a reply. It works with ROS 1, but after porting it to ROS2 an odd, random delay appears.
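A minimal sketch of the measurement idea, in plain Python with no ROS dependency: the hypothetical `RoundTripTimer` class records a monotonic timestamp when a request is published and computes the elapsed time when the matching reply arrives. Pairing requests and replies by an integer id is an assumption; the post doesn't say how the profiler matches them.

```python
import time
from typing import Callable, Dict, List, Optional


class RoundTripTimer:
    """Hypothetical helper: pairs published requests with replies by id."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._pending: Dict[int, float] = {}  # id -> publish time (seconds)
        self.samples_ms: List[float] = []     # completed round trips

    def on_publish(self, msg_id: int) -> None:
        # Call right after publishing the intercepted message.
        self._pending[msg_id] = self._clock()

    def on_reply(self, msg_id: int) -> Optional[float]:
        # Call from the reply subscription callback.
        start = self._pending.pop(msg_id, None)
        if start is None:
            return None  # reply for an unknown or already-handled id
        elapsed_ms = (self._clock() - start) * 1000.0
        self.samples_ms.append(elapsed_ms)
        return elapsed_ms
```

`time.monotonic` is used because it cannot jump backwards, and wall-clock time isn't needed when both timestamps are taken in the same process. The `clock` parameter only exists so the timer can be driven by a fake clock when testing.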

If I make a test node take 100 ms to run (simulated with a sleep), I would expect to measure around 100-110 ms, and that is what I get about 80% of the time. The other 20% of the time, however, it takes much longer: anywhere from 300 ms to 15 seconds more. If I print the time right after publishing and the time right after receiving, I see several seconds of difference, so the delay seems to come from the message's transit. Both the profiler and the test node are spinning, so I don't understand why a message would get stuck. Both nodes also run on the same machine, so there should not be any networking issue.
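To put numbers on the 80/20 split described above, a small helper can summarize the collected samples and report what fraction exceeds the expected ceiling. The 110 ms default comes from the expected range in the post; the function name and the keys of the returned dictionary are hypothetical.

```python
import statistics
from typing import Dict, List


def summarize_latencies(samples_ms: List[float],
                        expected_max_ms: float = 110.0) -> Dict[str, float]:
    """Summarize round-trip samples and report the share of slow outliers."""
    if not samples_ms:
        raise ValueError("no samples collected")
    # Anything above the expected ceiling counts as an outlier.
    outliers = [s for s in samples_ms if s > expected_max_ms]
    return {
        "count": len(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "max_ms": max(samples_ms),
        "outlier_fraction": len(outliers) / len(samples_ms),
    }
```

Reporting the median alongside the maximum keeps a few multi-second outliers from hiding the typical latency, which a plain mean would not.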

I tried adjusting the QoS settings, but the only noticeable change related to this issue is that using the best_effort reliability policy instead of reliable makes the receiver get no messages at all.
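One detail that may be relevant: in ROS2, the reliability policy is matched between publisher and subscription, and a best-effort publisher cannot serve a subscription that requests reliable delivery, so no messages flow in that pairing. The sketch below only models that matching rule in plain Python. It does not use rclpy, and the `Reliability` enum and `reliability_compatible` function are hypothetical names.

```python
from enum import Enum


class Reliability(Enum):
    RELIABLE = "reliable"
    BEST_EFFORT = "best_effort"


def reliability_compatible(publisher: Reliability,
                           subscription: Reliability) -> bool:
    """Model of the ROS2 reliability matching rule.

    A subscription that requests RELIABLE cannot be matched with a
    BEST_EFFORT publisher; every other combination connects.
    """
    return not (
        publisher is Reliability.BEST_EFFORT
        and subscription is Reliability.RELIABLE
    )
```

In this model, switching only the publisher to best_effort while the subscription keeps its reliable setting is the one combination that yields no connection at all.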

I've been searching online for anything related to this issue for a few days but haven't found a lead so far. If anyone has an idea of why this is happening or how to fix it, I'd like to hear it.

Notes:

  • The profiling node takes a few dozen samples, so it is not just the first one that has this issue.
  • The message is a point cloud of ~25k points, so around 500 kB of data.
  • The profiling node and the test node are both in Python; the real nodes I want to profile are in C++.
  • I tried it on x86 Ubuntu and on aarch64 Ubuntu and get the same issue on both.
  • This is part of an existing project I just joined. I'm not aware of anything special about my ROS2 or DDS configuration, but I can't rule out that possibility.
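As a sanity check on the ~500 kB figure in the notes above: the `data` field of a `sensor_msgs/PointCloud2` message holds `row_step * height` bytes, and for a dense cloud `row_step = width * point_step`. The 20-byte `point_step` below is an assumption chosen to match the stated size (for example, five 4-byte float fields). The post doesn't give the actual field layout, and `pointcloud2_data_bytes` is a hypothetical helper.

```python
def pointcloud2_data_bytes(width: int, height: int, point_step: int) -> int:
    """Size of the PointCloud2 `data` field, assuming row_step = width * point_step."""
    row_step = width * point_step
    return row_step * height


# Assumed layout: 25,000 points in an unorganized cloud (height = 1),
# 20 bytes per point (e.g. five 4-byte float fields).
size = pointcloud2_data_bytes(width=25_000, height=1, point_step=20)
print(f"{size} bytes = {size / 1000:.0f} kB")  # prints "500000 bytes = 500 kB"
```

With a different layout, such as 16 bytes per point for x, y, z and intensity alone, the same 25k points would come to 400 kB.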