Node not running at constant frequency
Hi there :)
I have a problem running ROS on two machines and I don't know how to track down the cause.
Setup: I'm running nodes on two machines at different but constant frequencies (e.g. IMU at 50 Hz, controller at 10 Hz, ...). Most of the time this works perfectly well, but occasionally all nodes seem to stop running for a short moment (see the plot of IMU measurements here: https://postimg.org/image/g3p3n4xtz/). In the plot you can see that everything runs well until about 10.5 seconds, then slows down, and then there is a fairly long gap with no data being published or received.
I suspect one of two causes:
- Wrong synchronization of the machines (I'm using chrony but I can't guarantee that everything is set up perfectly right)
- CPU load that is too high to let all nodes execute on time (which would be odd, since everything runs fine at other times)
Have you ever observed a similar problem or do you know what could cause it? And if not - which tools or methods would you suggest to find the problem?
Any help will be much appreciated, thank you very much in advance!
Best, Max
Asked by maxb on 2016-11-15 17:44:58 UTC
Comments
Do you have multiple callbacks (subscriber or service callbacks) in a single-threaded node? Also, check for any sleep call in your callbacks: it might block the main thread and prevent other callbacks from being triggered.
Asked by DavidN on 2016-11-16 01:21:39 UTC
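The effect described in this comment can be illustrated with a toy model. This is a minimal sketch in plain Python, not ROS code: the function name simulate_single_threaded and the timing numbers are made up for illustration, and it only assumes that one thread handles callbacks strictly one after another.

```python
def simulate_single_threaded(events):
    """Model a single thread that runs callbacks one at a time.

    events: list of (arrival_time, callback_duration) in seconds,
            sorted by arrival_time.
    Returns the delay (start time minus arrival time) of each callback.
    """
    delays = []
    free_at = 0.0  # time at which the thread finishes its current callback
    for arrival, duration in events:
        start = max(arrival, free_at)  # wait until the thread is free
        delays.append(start - arrival)
        free_at = start + duration
    return delays


# Hypothetical workload: 50 Hz messages, each handled in 1 ms ...
events = [(i * 0.02, 0.001) for i in range(10)]
# ... except the third callback, which blocks for 100 ms (e.g. a sleep).
events[2] = (events[2][0], 0.1)

delays = simulate_single_threaded(events)
for (arrival, _), delay in zip(events, delays):
    print(f"arrival {arrival:.2f} s, delayed by {delay * 1000:.0f} ms")
```

In this model a single slow callback delays every message that arrives while it is running, and the backlog only clears a few periods later. That would look like a short stall followed by a burst, so it is worth ruling out before blaming the network.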
Any details on how the two nodes are connected to one another? e.g. Wifi is known to create lags.
Asked by Humpelstilzchen on 2016-11-16 03:44:13 UTC
I do have multiple callbacks in almost all nodes (7 nodes with up to 5 callbacks each), but there are no sleep calls or other time-consuming functions in the callbacks. The nodes are connected by ssh over Wifi, so this might be the reason. I also checked that the machines are properly synchronized.
Asked by maxb on 2016-11-16 19:06:28 UTC
Assuming the wifi is indeed the problem: Do you know any tricks to make the connection more reliable? Both in-ROS (maybe decrease number of topics/messages) and 'outside' of ROS?
Asked by maxb on 2016-11-16 21:05:17 UTC
I had the problem that bad reception with a lot of traffic ended in low throughput (Low Bit Rate in iwconfig). For my robot a bigger antenna helped.
Asked by Humpelstilzchen on 2016-11-17 01:40:06 UTC
Provided your application allows for it (can cope with lost msgs) and your msgs aren't too large (smaller than a datagram), you could try and see whether UDPROS works better. See Transport Hints for info.
Asked by gvdhoorn on 2016-11-17 02:59:24 UTC
Personally I never understood the transport hints. Does someone really need to patch all nodes for UDP usage? I'd wish for a global setting...
Asked by Humpelstilzchen on 2016-11-17 04:01:34 UTC
Btw, we are indeed not sure yet whether Wifi is the problem. I'd recommend checking whether the problems also occur over Ethernet.
Asked by Humpelstilzchen on 2016-11-17 04:02:47 UTC
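One way to make the Wifi versus Ethernet comparison concrete is to log the receive time of each message in both setups and look for intervals much longer than the expected period. The following is a minimal sketch in plain Python; the function name find_gaps, the tolerance factor and the sample timestamps are assumptions for illustration, not part of any ROS API.

```python
def find_gaps(stamps, expected_period, tolerance=2.0):
    """Return (start_time, gap_length) for every interval between
    consecutive timestamps that exceeds tolerance * expected_period.

    stamps: receive times in seconds, in ascending order.
    expected_period: nominal period in seconds (0.02 for a 50 Hz topic).
    """
    gaps = []
    for prev, cur in zip(stamps, stamps[1:]):
        dt = cur - prev
        if dt > tolerance * expected_period:
            gaps.append((prev, dt))
    return gaps


# Made-up receive times of a 50 Hz topic with one dropout after 0.06 s.
stamps = [0.00, 0.02, 0.04, 0.06, 0.30, 0.32]
gaps = find_gaps(stamps, expected_period=0.02)
for start, length in gaps:
    print(f"gap of {length:.2f} s starting at t = {start:.2f} s")
```

If the gaps disappear over Ethernet but show up over Wifi with the same nodes and load, the wireless link is the likely culprit. If they appear in both runs, the cause is more likely on the machines themselves.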
re: finding the root cause: of course. A bit implied perhaps, but that is always the first thing to do. I merely suggested that UDP is generally the better choice when you have lossy links between nodes. If, however, you are tunnelling over TCP, as the OP described, that is not going to work.
Asked by gvdhoorn on 2016-11-17 05:19:58 UTC