Robotics StackExchange | Archived questions

Node not running at constant frequency

Hi there :)

I've got a little problem running ROS on two machines, and I don't really know how to track down the cause.

Setup: I'm running nodes on two machines at different but constant frequencies (e.g. IMU at 50 Hz, controller at 10 Hz, ...). Most of the time this works perfectly well, but occasionally all nodes seem to stop running for short instants (see the plot of IMU measurements here: https://postimg.org/image/g3p3n4xtz/). In the plot you can see that everything runs well until about 10.5 seconds, then slows down, and then there's a fairly long gap in which no data is published or received.

I suspect one of two causes:

  1. Incorrect time synchronization between the machines (I'm using chrony, but I can't guarantee that everything is set up perfectly)
  2. CPU load that is too high for all nodes to be executed (which would be weird, since everything runs fine at other times)

Have you ever observed a similar problem or do you know what could cause it? And if not - which tools or methods would you suggest to find the problem?

Any help will be much appreciated, thank you very much in advance!

Best, Max

Asked by maxb on 2016-11-15 17:44:58 UTC
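
One way to narrow this down is to quantify the gaps before looking for their cause. The sketch below is a minimal, ROS-independent example: it assumes the message receive times have already been logged as a list of timestamps in seconds (how they are logged is left open), and the function name `find_gaps` and the `tolerance` factor are illustrative choices, not part of any ROS API.

```python
from typing import List, Tuple


def find_gaps(
    stamps: List[float], nominal_hz: float, tolerance: float = 2.0
) -> List[Tuple[float, float]]:
    """Return (time of last message before the gap, gap length in seconds)
    for every interval longer than `tolerance` nominal periods."""
    period = 1.0 / nominal_hz
    gaps = []
    for prev, cur in zip(stamps, stamps[1:]):
        dt = cur - prev
        if dt > tolerance * period:
            gaps.append((prev, dt))
    return gaps


# Synthetic 50 Hz stream (made-up data) with one stall after t = 0.06 s.
stamps = [0.00, 0.02, 0.04, 0.06, 0.50, 0.52]
gaps = find_gaps(stamps, nominal_hz=50.0)

for start, length in gaps:
    print(f"gap of {length:.2f} s after t = {start:.2f} s")
```

Running the same check on timestamps recorded on the publishing machine and on the receiving machine can hint at whether a stall originates in the publisher or on the link between the machines.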

Comments

Do you have multiple callbacks (subscriber or service callbacks) in a single-threaded node? Also check for any sleep in your callbacks; it might block the main thread and prevent other callbacks from being triggered.

Asked by DavidN on 2016-11-16 01:21:39 UTC
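
The effect described in this comment can be illustrated without ROS. The following is a simplified simulation, not roscpp or rospy code: it assumes a single thread that processes callbacks strictly in arrival order, uses made-up callback costs in milliseconds, and the names `simulate_single_thread`, `imu` and `slow` are hypothetical.

```python
from typing import Dict, List, Tuple


def simulate_single_thread(
    events: List[Tuple[int, str]], cost_ms: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """Run callbacks one at a time in arrival order on a virtual clock.

    events:  (arrival time in ms, topic name) pairs
    cost_ms: how long the callback for each topic takes, in ms
    Returns (topic, arrival time, waiting time before the callback started).
    """
    clock = 0
    result = []
    for arrival, topic in sorted(events):
        start = max(clock, arrival)  # wait until the thread is free
        result.append((topic, arrival, start - arrival))
        clock = start + cost_ms[topic]  # thread is busy until the callback returns
    return result


# Hypothetical workload: a 50 Hz IMU topic (one message every 20 ms, 1 ms callback)
# plus a single 100 ms callback arriving at t = 45 ms.
imu_events = [(t, "imu") for t in range(0, 180, 20)]
events = imu_events + [(45, "slow")]
result = simulate_single_thread(events, {"imu": 1, "slow": 100})

for topic, arrival, waited in result:
    print(f"{topic:>4} arrived at {arrival:3d} ms, waited {waited:2d} ms")
```

In this toy run, the IMU callbacks that arrive while the slow callback is executing are all served late, in a burst once it returns, which looks like a pause followed by catch-up. Whether that is what happens in the nodes discussed here is not established by this sketch.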

Any details on how the two nodes are connected to one another? Wifi, for example, is known to cause lag.

Asked by Humpelstilzchen on 2016-11-16 03:44:13 UTC

I do have multiple callbacks in almost all nodes (7 nodes with up to 5 callbacks each), but there are no sleeps or other time-consuming functions in the callbacks. The nodes are connected by ssh over Wifi, so this might be the reason. I also checked that the machines are properly synchronized.

Asked by maxb on 2016-11-16 19:06:28 UTC

Assuming the wifi is indeed the problem: Do you know any tricks to make the connection more reliable? Both in-ROS (maybe decrease number of topics/messages) and 'outside' of ROS?

Asked by maxb on 2016-11-16 21:05:17 UTC

I had the problem that bad reception combined with a lot of traffic resulted in low throughput (a low Bit Rate in iwconfig). For my robot a bigger antenna helped.

Asked by Humpelstilzchen on 2016-11-17 01:40:06 UTC

Provided your application allows for it (can cope with lost msgs) and your msgs aren't too large (smaller than a datagram) you could try and see whether UDPROS works better. See Transport Hints for info.

Asked by gvdhoorn on 2016-11-17 02:59:24 UTC

Personally, I never understood the transport hints. Does one really need to patch all nodes to use UDP? I'd wish for a global setting...

Asked by Humpelstilzchen on 2016-11-17 04:01:34 UTC

Btw, we are indeed not sure yet whether Wifi is the problem. I'd recommend checking whether the problems also occur with Ethernet.

Asked by Humpelstilzchen on 2016-11-17 04:02:47 UTC

re: finding the root cause: of course. A bit implied perhaps, but that is always the first thing to do. I merely suggested that using UDP is always a better choice when you have lossy links between nodes. If, however, you are tunnelling over TCP, as the OP described, that is not going to work.

Asked by gvdhoorn on 2016-11-17 05:19:58 UTC

Answers