Simple nodes taking up 100% CPU [closed]

asked 2015-08-26 06:39:28 -0600

Stephan

updated 2017-06-26 10:37:33 -0600

I have a bunch of nodes running on a robot, and sometimes (I'm not sure after how much runtime) very simple nodes such as topic_tools/relay or robot_pose_publisher/robot_pose_publisher take up 100% CPU. Restarting the system brings them back to normal (less than 1% CPU for the same process). I attached gdb to the relay node while it was consuming 100% CPU, and every time I interrupted it, it was in CallbackQueue::callAvailable() (as expected; it is in the same place at normal CPU usage).

The system does not appear to be under heavy load when this happens; memory usage is fine and the number of open file descriptors seems reasonable.
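Since an interrupted gdb session only shows where one thread happened to be, a per-thread CPU breakdown might help narrow down what is actually spinning. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names are made up for illustration and are not part of ROS. It takes two snapshots of the utime and stime counters in /proc/<pid>/task/<tid>/stat and reports each thread's CPU share over the interval.

```python
import os
import time


def parse_stat_line(line):
    """Return (name, utime_ticks, stime_ticks) from one /proc/.../stat line."""
    # The name sits in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    name = line[line.index("(") + 1:line.rindex(")")]
    rest = line[line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so fields 14 (utime) and 15 (stime)
    # are rest[11] and rest[12].
    return name, int(rest[11]), int(rest[12])


def thread_cpu_ticks(pid):
    """Map thread id -> (name, utime + stime in clock ticks)."""
    ticks = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                name, utime, stime = parse_stat_line(f.read())
        except (IOError, OSError):
            continue  # thread exited between listdir() and open()
        ticks[int(tid)] = (name, utime + stime)
    return ticks


def cpu_percent_by_thread(before, after, interval, ticks_per_sec):
    """CPU share per thread between two snapshots, busiest first."""
    usage = []
    for tid, (name, total) in after.items():
        if tid in before:
            delta = total - before[tid][1]
            usage.append((tid, name, 100.0 * delta / (ticks_per_sec * interval)))
    return sorted(usage, key=lambda row: row[2], reverse=True)


def sample(pid, interval=2.0):
    """Snapshot the process twice and return per-thread CPU percentages."""
    before = thread_cpu_ticks(pid)
    time.sleep(interval)
    after = thread_cpu_ticks(pid)
    return cpu_percent_by_thread(before, after, interval,
                                 os.sysconf("SC_CLK_TCK"))
```

Calling sample() with the PID of the busy relay process returns (tid, name, percent) tuples with the busiest thread first. That thread ID can then be selected in gdb to get a backtrace of the thread that is actually consuming the CPU, which may not be the one gdb stops in by default.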

Any idea what else I could check?

EDIT:

A little more info about the system:

$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.3 LTS"
$ uname -a
Linux beta4 3.13.0-61-generic #100-Ubuntu SMP Wed Jul 29 11:21:34 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$ dpkg -s ros-indigo-roscpp | grep Version
Version: 1.11.13-0trusty-20150522-1157-+0000

EDIT2:

Apparently this has something to do with wifi. We have robots at specific locations where we observe this behavior, and when we disable the wifi interfaces, the system load goes down again. The nodes that take up the CPU don't transmit or receive anything over wifi.


Closed by tfoote for the following reason: question is not relevant or outdated
close date 2018-03-02 18:59:48.425223

Comments

1. Are you pushing many messages through the relay? 2. Does it hit 100% briefly before going back to 1%, or does it get stuck at 100%?
dornhege ( 2015-08-26 10:18:29 -0600 )

The relay is passing LaserScan messages at 15 Hz. It looks like it gets stuck at 100% for a long time. The failure appears random: at some point, one or more C++ nodes grab 100% CPU, both standard ROS nodes and custom (very simple) ones.

Stephan ( 2015-08-27 04:54:26 -0600 )

Will it make any difference if you reduce the message queue length?

Boris ( 2015-08-27 17:01:52 -0600 )
ahendrix ( 2017-06-26 10:51:39 -0600 )

That's possible. Thanks!

Stephan ( 2017-06-26 10:59:54 -0600 )