rosmaster unresponsive

asked 2012-02-02 23:53:33 -0500

RedundantEntry

updated 2012-02-03 02:59:11 -0500

Hi guys,

I have a somewhat strange problem with rosmaster: under certain conditions it becomes completely unresponsive and uses 100% CPU. The problem occurs with ROS Electric on both 64-bit Ubuntu Natty (kernel 2.6.38-13) and Oneiric (kernel 3.0.0-14).

The test executable p (written in C++) both publishes and subscribes to a single topic t, and sends a message every 100 ms over UDP (UDPROS). After about 10 seconds it calls ros::shutdown() and exits.

The executables are started in the following pattern:

while (true) {
    for (i in 1..10) {
        for (j in 1..10) {
            start i instances of p in parallel
            wait for all of them to terminate
            sleep(0.5 s)
        }
    }
}
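For anyone trying to reproduce this, the launch pattern above can be sketched as a small Python 3 harness. This is an illustrative sketch and not the poster's actual test code. The command used to start p is an assumption, and the package name in the usage comment is hypothetical.

```python
import subprocess
import sys
import time
from typing import Optional, Sequence


def run_pattern(cmd: Sequence[str], max_parallel: int = 10, repeats: int = 10,
                pause: float = 0.5, passes: Optional[int] = None) -> int:
    """Launch `cmd` in the nested pattern described in the question.

    For i = 1..max_parallel, repeat `repeats` times: start i copies of `cmd`
    in parallel, wait for all of them to exit, then sleep `pause` seconds.
    passes=None loops forever like the original while(true); an integer
    stops after that many passes. Returns the number of processes started.
    """
    started = 0
    done = 0
    while passes is None or done < passes:
        for i in range(1, max_parallel + 1):
            for _ in range(repeats):
                procs = [subprocess.Popen(list(cmd)) for _ in range(i)]
                for proc in procs:
                    proc.wait()  # wait for every copy in this batch to terminate
                started += len(procs)
                time.sleep(pause)
        done += 1
    return started


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python launch_pattern.py rosrun my_pkg p   (my_pkg is hypothetical)
    run_pattern(sys.argv[1:])
```

With the defaults, one pass of the outer loop runs the innermost loop 100 times and starts 10 × (1 + 2 + ... + 10) = 550 processes. The roughly 300 inner iterations reported below therefore correspond to about three passes, or on the order of 1,650 node starts.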

After about 300 iterations of the innermost loop, rosmaster becomes unresponsive without printing any error message. Tools such as roswtf or rostopic cannot communicate with it, and they do not report an error either. Meanwhile, top shows rosmaster's CPU usage climbing to 100%. This happens whether or not the nodes call ros::init with the AnonymousName init option.

Is there a recommended way to diagnose this problem? Is there an upper limit on the number of processes/nodes a roscore can manage?
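One way to tell a wedged master apart from an unreachable one, without relying on roswtf or rostopic, is to make a single Master API call with a short timeout. The sketch below uses Python 3's xmlrpc.client; on the Python 2.7 used here, the module is xmlrpclib. It calls getPid on the master and treats any timeout or connection error as "not responding". The default URI, the 2 s timeout, and the caller id "/responsiveness_probe" are assumptions; substitute your ROS_MASTER_URI.

```python
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """HTTP transport for xmlrpc.client whose socket operations time out."""

    def __init__(self, timeout: float):
        super().__init__()
        self._timeout = timeout

    def make_connection(self, host):
        conn = super().make_connection(host)
        conn.timeout = self._timeout  # picked up when the socket is opened
        return conn


def master_responds(master_uri: str = "http://localhost:11311/", timeout: float = 2.0) -> bool:
    """Return True if the ROS master answers a getPid call within `timeout` seconds."""
    try:
        with xmlrpc.client.ServerProxy(master_uri, transport=TimeoutTransport(timeout)) as proxy:
            code, _status, _pid = proxy.getPid("/responsiveness_probe")
    except (OSError, xmlrpc.client.Error):
        # OSError covers refused connections and socket timeouts;
        # xmlrpc.client.Error covers Faults and HTTP protocol errors.
        return False
    return code == 1  # Master API status code 1 means success
```

A master that is stuck at 100% CPU may still accept the TCP connection but never answer. The timeout is what turns that case into a False result instead of a hang.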

Update:

The master.log contains many errors similar to the following:

[rosmaster.threadpool][ERROR] 2012-02-03 12:28:02,770: Traceback (most recent call last):
  File "/opt/ros/electric/stacks/ros_comm/tools/rosmaster/src/rosmaster/threadpool.py", line 218, in run
    result = cmd(*args)
  File "/opt/ros/electric/stacks/ros_comm/tools/rosmaster/src/rosmaster/master_api.py", line 189, in publisher_update_task
    xmlrpcapi(api).publisherUpdate('/master', topic, pub_uris)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1575, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1264, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1297, in single_request
    return self.parse_response(response)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1473, in parse_response
    return u.close()
  File "/usr/lib/python2.7/xmlrpclib.py", line 793, in close
    raise Fault(**self._stack[0])
Fault: <Fault -1: 'publisherUpdate: unknown method name'>
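To check whether one kind of fault dominates the log, you can count the final "Fault:" line of each traceback. The sketch below is minimal and assumes the master.log line format shown above. The regular expression is tailored to these lines and may need adjusting for other ROS versions.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the last line of each traceback as it appears in master.log, e.g.
# "Fault: <Fault -1: 'publisherUpdate: unknown method name'>"
_FAULT_RE = re.compile(r"^Fault: <Fault (-?\d+): '(.*)'>\s*$")


def tally_faults(lines: Iterable[str]) -> Counter:
    """Count XML-RPC Fault lines, keyed by (fault code, fault message)."""
    counts: Counter = Counter()
    for line in lines:
        m = _FAULT_RE.match(line)
        if m:
            counts[(int(m.group(1)), m.group(2))] += 1
    return counts
```

For example, open the master.log found via roscd log and print tally_faults(f).most_common().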

The log ends with the following error:

[rosmaster.threadpool][ERROR] 2012-02-03 12:28:38,945: Traceback (most recent call last):
  File "/opt/ros/electric/stacks/ros_comm/tools/rosmaster/src/rosmaster/threadpool.py", line 218, in run
    result = cmd(*args)
  File "/opt/ros/electric/stacks/ros_comm/tools/rosmaster/src/rosmaster/master_api.py", line 189, in publisher_update_task
    xmlrpcapi(api).publisherUpdate('/master', topic, pub_uris)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1575, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1264, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1292, in single_request
    self.send_content(h, request_body)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1439, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python2.7/httplib.py", line 951, in endheaders
    self ...

Comments

Can you post the master's log file somewhere? (Run roscd log and look for master.log.)
kwc (2012-02-03 00:36:04 -0500)
Sure! The complete log is available at http://das-lab.net/~cn/crash03022012-master.log.tar.gz
RedundantEntry (2012-02-03 02:15:32 -0500)
FYI: still looking into this. I haven't been able to reproduce it yet.
kwc (2012-02-07 10:55:52 -0500)

2 Answers


answered 2012-02-07 13:15:20 -0500

kwc

To answer part of the question: a rosmaster should easily handle hundreds of nodes. Performance does degrade if the nodes have poor connectivity, because the master has to contact the nodes to push updates to them.

The exceptions appear to be a red herring. Because your node both publishes and subscribes to the same topic, there is basically a race condition: the master informs the node (via publisherUpdate) that the publishers for a topic it subscribes to have changed, and that call can fail when it reaches a node that is already shutting down. The exception is logged just in case, but I don't think it is at the heart of the problem.
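The failure mode described above can be illustrated with a small Python 3 sketch of the master side of a publisherUpdate call, which is the call shown in the traceback. This is not rosmaster's implementation. The notify_publisher_update helper and its three-way result are hypothetical. They show two things: a node that answers with a Fault (for example because it is going away) is distinguishable from one that cannot be reached at all, and a timeout bounds how long a badly connected node can hold up a single notification.

```python
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """HTTP transport for xmlrpc.client whose socket operations time out."""

    def __init__(self, timeout: float):
        super().__init__()
        self._timeout = timeout

    def make_connection(self, host):
        conn = super().make_connection(host)
        conn.timeout = self._timeout  # picked up when the socket is opened
        return conn


def notify_publisher_update(node_api: str, topic: str, pub_uris: list,
                            timeout: float = 5.0) -> str:
    """Send publisherUpdate('/master', topic, pub_uris) to one subscriber node.

    Returns "ok"; "fault" if the node answered but rejected the call (like the
    'unknown method name' faults in master.log); or "unreachable" if the
    connection was refused or reset, or no answer came within `timeout` seconds.
    """
    try:
        with xmlrpc.client.ServerProxy(node_api, transport=TimeoutTransport(timeout)) as proxy:
            proxy.publisherUpdate("/master", topic, pub_uris)
    except xmlrpc.client.Fault:
        return "fault"
    except (OSError, xmlrpc.client.ProtocolError):
        return "unreachable"
    return "ok"
```

Treating "fault" as benign matches the view that these exceptions are not the cause. Only an accumulation of "unreachable" results, each costing up to `timeout` seconds, would point at connectivity.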

I wrote a test program that replicated your test node, but I could not trigger the 100% CPU usage in 30 minutes of repeated runs. I would need your actual test code to go any further.

While looking back over the rosmaster code, I made a minor tweak to it. I don't think it fixes your issue, but you're welcome to try it:

https://code.ros.org/trac/ros/changeset/16264


answered 2012-02-09 06:24:53 -0500

seanarm

This may well be unrelated to your problem, but I had a similar one: rosmaster was unresponsive. Running roscore and then "rosnode list" only told me that it could not connect to the master. I did not, however, see the 100% CPU usage, since my problem was not thread-related as yours appears to be.

My problem was a mismatch between the machine on which I was running the ROS nodes and my ROS_IP and ROS_MASTER_URI environment variables. Make sure these are correct for your setup.
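A quick pre-flight check of those two variables can be sketched in Python 3. The checks encode assumptions about what "correct" means here. ROS_MASTER_URI must be an http://host:port URI whose host resolves. ROS_IP, if set, must be an IPv4 address assigned to the local machine, which the sketch tests by binding a socket to it. check_ros_env is a hypothetical helper, not part of ROS.

```python
import os
import socket
from typing import List, Mapping, Optional
from urllib.parse import urlparse


def check_ros_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems with ROS_MASTER_URI / ROS_IP (empty list if none)."""
    env = os.environ if env is None else env
    problems = []

    uri = env.get("ROS_MASTER_URI")
    if not uri:
        problems.append("ROS_MASTER_URI is not set")
    else:
        parsed = urlparse(uri)
        try:
            port = parsed.port
        except ValueError:  # e.g. a non-numeric or out-of-range port
            port = None
        if parsed.scheme != "http" or not parsed.hostname or port is None:
            problems.append(f"ROS_MASTER_URI {uri!r} is not of the form http://host:port")
        else:
            try:
                socket.getaddrinfo(parsed.hostname, port)
            except socket.gaierror:
                problems.append(f"master host {parsed.hostname!r} does not resolve")

    ip = env.get("ROS_IP")
    if ip:
        # Binding only succeeds for an address assigned to this machine (IPv4 only here).
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                s.bind((ip, 0))
        except OSError:
            problems.append(f"ROS_IP {ip!r} is not an IPv4 address of this machine")
    return problems
```

Running `for p in check_ros_env(): print(p)` in the same shell that starts the nodes checks the environment those nodes will actually see.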

Stats

Seen: 1,667 times

Last updated: Feb 09 '12