Ask Your Question
1

Problems launching many nodes on OSX

asked 2016-07-26 10:36:43 -0500

jonfink gravatar image

updated 2016-07-26 10:38:20 -0500

I recently did a fresh install of ROS-jade on OSX El Capitan and for the most part everything is working well. However, I have noticed that when launching large launch files (e.g., 100 nodes), there seems to be problems accessing the ROS master (registering pub/sub, reading params, etc).

In an effort to test things, I wrote a simple test node:

#include <ros/ros.h>
#include <std_msgs/Float64.h>

void onFloatSub(const std_msgs::Float64::ConstPtr& msg)
{

}

int main(int argc, char *argv[])
{
  ros::init(argc, argv, "test_node");

  ros::NodeHandle pnh("~");

  ros::Subscriber sub = pnh.subscribe<std_msgs::Float64>("float_sub", 1, &onFloatSub);
  ros::Publisher pub = pnh.advertise<std_msgs::Float64>("float_pub", 1);

  ros::Rate r(10.0);
  while(ros::ok()) {

    std::string test_param_value;
    if(!pnh.getParam("test_param", test_param_value)) {
      ROS_ERROR("Unable to read local 'test_param'");
    }

    if(test_param_value != std::string("test_param_value_set"))
      ROS_ERROR("Incorrect test_param_value!");

    ros::spinOnce();

    r.sleep();
  }

  return 0;
}

And ran 100 versions of it in a launch file

<node pkg="test_many_node" type="test_many_node" name="test_node_1" output="screen">
    <param name="test_param" value="test_param_value_set"/>
</node>

(with nodes 1 through 100).

Some nodes seem to run ok, but then I see errors like these (which are similar to what I saw in more complicated, real-world examples):

...
process[test_node_53-54]: started with pid [25907]
process[test_node_54-55]: started with pid [25908]
process[test_node_55-56]: started with pid [25909]
process[test_node_56-57]: started with pid [25910]
[ERROR] ros.test_many_node> Unable to read local 'test_param'
[ERROR] ros.test_many_node> Incorrect test_param_value!
[ERROR] ros.test_many_node> Unable to read local 'test_param'
[ERROR] ros.test_many_node> Incorrect test_param_value!
[ERROR] ros.roscpp> [registerService] Failed to contact master at [localhost:11311].  Retrying...
...

I don't think that I have experienced this problem before upgrading to El Capitan - has anyone else seen similar problems or have thoughts on things to try? I did a quick check on a Ubuntu machine (in a virtualized Docker image) and everything worked fine...

edit retag flag offensive close merge delete

1 Answer

Sort by ยป oldest newest most voted
1

answered 2016-07-27 09:33:45 -0500

jonfink gravatar image

Ok, almost 24 hours later and I have a solution (but not quite a full explanation).

It comes down to "something" that changed in the OSX build of python 2.7 on El Capitan with respect to handling sockets. The ROS ThreadingXMLRPCServer python class inherits from the SimpleXMLRPCServer class which inherits the SocketServer.TCPServerclass. The TCPServer class sets a default parameter request_queue_size=5 which controls how big it will allow the incoming connection queue to become (that is, the queue of _new_ connections - this value gets passed down to the python socket.listen() function)

I don't know what changed in OSX 10.11, maybe the rate at which the kernel allows connections or maybe how socket.listen() is implemented. Either way, I found any easy fix to be changing the ROS ThreadingXMLRPCServer class:

class ThreadingXMLRPCServer(socketserver.ThreadingMixIn, SimpleXMLRPCServer):
    """
    Adds ThreadingMixin to SimpleXMLRPCServer to support multiple concurrent
    requests via threading. Also makes logging toggleable.
    """
    def __init__(self, addr, log_requests=1):
        """
        Overrides SimpleXMLRPCServer to set option to allow_reuse_address.
        """
        # allow_reuse_address defaults to False in Python 2.4.  We set it
        # to True to allow quick restart on the same port.  This is equivalent
        # to calling setsockopt(SOL_SOCKET,SO_REUSEADDR,1)
        self.allow_reuse_address = True
        self.request_queue_size = 128
...

To increase the request_queue_size variable. I'll make an issue/pull-request to the ros_comm repos on github.

edit flag offensive delete link more

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

2 followers

Stats

Asked: 2016-07-26 10:36:43 -0500

Seen: 73 times

Last updated: Jul 27 '16