Simple nodes taking up 100% CPU
I have a bunch of nodes running on a robot, and sometimes (I'm not sure after how much runtime) very simple nodes such as topic_tools/relay or robot_pose_publisher/robot_pose_publisher take up 100% CPU. Restarting the system brings them back to the normal state (less than 1% CPU for the same process).
I attached a gdb session to the relay node while it was consuming 100% CPU, and every time I interrupted it, it was in CallbackQueue::callAvailable() (as expected; it is in the same place at normal CPU usage).
The system does not appear to be under heavy load when this happens; memory usage is fine and the number of open file descriptors seems reasonable.
Any idea what else I could check?
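One further thing that could be checked is which thread inside the process is actually burning the CPU, since interrupting in gdb only shows where the currently selected thread happens to be. `top -H -p <pid>` shows this interactively; the sketch below does the same by sampling /proc/<pid>/task/<tid>/stat twice. It is a minimal illustration rather than a ROS tool: the function names (parse_stat_line, snapshot, busiest_threads) are made up for this example, and it assumes a Linux /proc filesystem.

```python
import os
import sys
import time


def parse_stat_line(line):
    """Return (thread_name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line.

    The name sits between the first '(' and the last ')' and may itself
    contain spaces, so the remaining fields are split only after it.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    name = line[open_paren + 1:close_paren]
    fields = line[close_paren + 2:].split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are
    # fields 14 and 15, i.e. indices 11 and 12 here.
    return name, int(fields[11]) + int(fields[12])


def snapshot(pid):
    """Map each thread id of `pid` to (thread_name, cpu_ticks)."""
    task_dir = "/proc/{}/task".format(pid)
    ticks = {}
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as stat_file:
                ticks[int(tid)] = parse_stat_line(stat_file.read())
        except (IOError, OSError):
            continue  # the thread exited between listdir() and open()
    return ticks


def busiest_threads(before, after, interval, ticks_per_second):
    """Return (percent_cpu, tid, name) tuples, busiest thread first."""
    usage = []
    for tid, (name, ticks_after) in after.items():
        if tid not in before:
            continue  # thread started during the sampling interval
        delta = ticks_after - before[tid][1]
        usage.append((100.0 * delta / (ticks_per_second * interval), tid, name))
    return sorted(usage, reverse=True)


if __name__ == "__main__":
    pid = int(sys.argv[1])       # pid of the node that is at 100% CPU
    interval = 2.0               # seconds between the two samples
    first = snapshot(pid)
    time.sleep(interval)
    second = snapshot(pid)
    hz = os.sysconf("SC_CLK_TCK")
    for percent, tid, name in busiest_threads(first, second, interval, hz):
        print("{:6.1f}%  tid={}  {}".format(percent, tid, name))
```

The thread ids printed here are the same LWP numbers that gdb lists under `info threads`, so the busy thread can then be selected in gdb and its backtrace inspected instead of whichever thread gdb stopped in.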
EDIT:
A little more info about the system:
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.3 LTS"
$ uname -a
Linux beta4 3.13.0-61-generic #100-Ubuntu SMP Wed Jul 29 11:21:34 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$ dpkg -s ros-indigo-roscpp | grep Version
Version: 1.11.13-0trusty-20150522-1157-+0000
EDIT2:
Apparently this has something to do with wifi. We have robots at specific locations where we observe this behavior and when we disable the wifi interfaces, the system load goes down again. The nodes that take up CPU don't transmit or receive anything over wifi.
Asked by Stephan on 2015-08-26 06:39:28 UTC
Comments
Asked by dornhege on 2015-08-26 10:18:29 UTC
The relay is passing LaserScan messages at 15 Hz. Once the problem starts, the node appears to stay at 100% for a long time. It occurs at random: at some point, one or more C++ nodes grab 100% CPU, both standard ROS nodes and custom (very simple) ones.
Asked by Stephan on 2015-08-27 04:54:26 UTC
Will it make any difference if you reduce the message queue length?
Asked by Boris on 2015-08-27 17:01:52 UTC
This seems similar to https://github.com/ros/ros_comm/issues/914
Asked by ahendrix on 2017-06-26 10:51:39 UTC
That's possible. Thanks!
Asked by Stephan on 2017-06-26 10:59:54 UTC