Node not running at constant frequency
Hi there :)
I've got a problem running ROS on two machines and I don't really know how to track down the cause.
Setup: I'm running nodes on two machines at different but constant frequencies (e.g. IMU at 50 Hz, controller at 10 Hz, ...). Most of the time this works perfectly well, but occasionally all nodes seem to stop running for a short moment (see plot of IMU measurements here: https://postimg.org/image/g3p3n4xtz/ ). In the plot you can see that everything runs well until about 10.5 seconds, then messages arrive more slowly, and then there is a fairly long gap in which no data is published/received. I suspect one of two problems:
- Faulty time synchronization between the machines (I'm using chrony but I can't guarantee that everything is set up perfectly right)
- CPU load too high for all nodes to be executed (which would be weird, since everything runs fine at other times)
Have you ever observed a similar problem or do you know what could cause it? And if not - which tools or methods would you suggest to find the problem?
Any help will be much appreciated, thank you very much in advance!
Best, Max
Do you have multiple callbacks (subscriber or service callbacks) in a single-threaded node? Also, check for any sleep in your callbacks; it might block the main thread and prevent other callbacks from being triggered.
Any details on how the two nodes are connected to one another? E.g. Wifi is known to create lags.
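To illustrate the blocking-callback point: here is a minimal pure-Python sketch (no ROS needed) of a single-threaded callback queue. The callback names and costs are made-up numbers, and a real ROS queue can also drop messages once its queue size is exceeded, which this toy model ignores. It only shows how one slow callback can stall a 50 Hz stream and then release the backlog in a burst.

```python
# Toy model of a single-threaded callback queue (all times in milliseconds).
# Names and costs are hypothetical; this is not ROS code.

def simulate_single_thread(events, cost_ms):
    """events: iterable of (arrival_ms, callback_name).
    cost_ms: dict mapping callback_name -> execution time in ms.
    Returns a list of (callback_name, arrival_ms, start_ms, delay_ms)."""
    free_at = 0  # time at which the single thread becomes idle again
    timeline = []
    for arrival, name in sorted(events):
        start = max(arrival, free_at)  # wait until the thread is free
        free_at = start + cost_ms[name]
        timeline.append((name, arrival, start, start - arrival))
    return timeline


def demo_events():
    # IMU messages every 20 ms (50 Hz) plus one slow callback arriving at 100 ms.
    events = [(t, "imu") for t in range(0, 300, 20)]
    events.append((100, "slow"))
    return events


COST_MS = {"imu": 1, "slow": 100}  # assumed per-callback execution times

if __name__ == "__main__":
    for name, arrival, start, delay in simulate_single_thread(demo_events(), COST_MS):
        print(f"{name:5s} arrived {arrival:4d} ms, ran at {start:4d} ms, delay {delay:3d} ms")
```

If the delays in your data look like this (a pause followed by a burst), running callbacks on several threads may help, e.g. with ros::AsyncSpinner or ros::MultiThreadedSpinner in roscpp.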
I do have multiple callbacks in almost all nodes (7 nodes and up to 5 callbacks) but there are no sleep calls or other time-consuming functions in the callbacks. The nodes are connected by ssh over Wifi, so this might be the reason. I also checked and found that the machines are properly synchronized.
Assuming the wifi is indeed the problem: Do you know any tricks to make the connection more reliable? Both in-ROS (maybe decrease number of topics/messages) and 'outside' of ROS?
I had the problem that bad reception combined with a lot of traffic resulted in low throughput (low Bit Rate in iwconfig). For my robot a bigger antenna helped.
Provided your application allows for it (can cope with lost msgs) and your msgs aren't too large (smaller than a datagram), you could try and see whether UDPROS works better. See Transport Hints for info.
Personally I never understood the transport hints. Does someone really need to patch all nodes for UDP usage? I'd wish for a global setting...
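Regarding the "smaller than a datagram" condition: a rough back-of-the-envelope check in Python, assuming the IMU topic carries a standard sensor_msgs/Imu message (the thread does not say which type is used) and assuming a 1500-byte MTU. The frame_id "imu_link" is a hypothetical example, and the estimate ignores the extra bytes added by the transport itself.

```python
# Rough size estimate for a serialized sensor_msgs/Imu message.
# Assumptions: standard Imu layout, 1500-byte MTU, IPv4 without options.

FLOAT64 = 8   # bytes
UINT32 = 4    # bytes

# Largest UDP payload that avoids IP fragmentation under the assumed MTU:
# MTU minus 20-byte IPv4 header minus 8-byte UDP header.
MAX_UNFRAGMENTED_PAYLOAD = 1500 - 20 - 8


def imu_serialized_size(frame_id):
    """Approximate serialized size in bytes of one sensor_msgs/Imu message."""
    # Header: seq (uint32), stamp (2 x uint32), frame_id (uint32 length + bytes)
    header = UINT32 + 2 * UINT32 + UINT32 + len(frame_id.encode("utf-8"))
    # Body: quaternion (4), two 3-vectors, three fixed 3x3 covariances (9 each)
    body = (4 + 9 + 3 + 9 + 3 + 9) * FLOAT64
    return header + body


def fits_in_one_datagram(size_bytes):
    return size_bytes <= MAX_UNFRAGMENTED_PAYLOAD


if __name__ == "__main__":
    size = imu_serialized_size("imu_link")  # hypothetical frame name
    print(size, "bytes, fits in one datagram:", fits_in_one_datagram(size))
```

Under these assumptions an IMU message is far below the limit, so size alone would not rule out UDPROS here; images or point clouds would be a different story.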
Btw, we are indeed not sure yet whether Wifi is the problem. I'd recommend checking whether the problems also occur with Ethernet.
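To compare a Wifi run against an Ethernet run, it may help to quantify the dropouts rather than eyeball the plot. Below is a minimal sketch, assuming you have exported the receive times of the IMU messages as a list of seconds (e.g. from a rosbag or from rostopic echo -p). The function name and the factor of 3 are arbitrary choices for illustration, not an established threshold.

```python
# Flag gaps in a list of message timestamps (seconds).
# The threshold factor is an arbitrary assumption; tune it for your data.

def find_dropouts(stamps, nominal_period, factor=3.0):
    """Return (previous_stamp, current_stamp, gap) for every gap that is
    longer than factor * nominal_period."""
    dropouts = []
    for prev, cur in zip(stamps, stamps[1:]):
        gap = cur - prev
        if gap > factor * nominal_period:
            dropouts.append((prev, cur, gap))
    return dropouts


if __name__ == "__main__":
    # Made-up example: a 50 Hz stream with one long pause after 0.06 s.
    example = [0.0, 0.02, 0.04, 0.06, 0.5, 0.52]
    for prev, cur, gap in find_dropouts(example, nominal_period=0.02):
        print(f"gap of {gap:.3f} s between t={prev:.2f} and t={cur:.2f}")
```

Running the same check on timestamps recorded on the publishing machine and on the receiving machine should show whether the gaps already exist at the source (pointing to CPU load or blocked callbacks) or only appear after the network hop.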