robotzak's profile - activity

2020-10-16 02:36:42 -0500 received badge  Nice Question (source)
2017-04-18 09:25:11 -0500 received badge  Famous Question (source)
2017-04-18 09:25:11 -0500 received badge  Notable Question (source)
2016-07-06 19:35:10 -0500 received badge  Popular Question (source)
2016-06-13 11:59:57 -0500 received badge  Student (source)
2016-06-12 18:33:32 -0500 asked a question Actionlib Delays when Using Multiple Clients/Servers

Hi everyone,

I am observing some interesting behavior when using multiple actionlib clients and servers.

Here is my setup: I have one master process that maintains N SimpleActionClients. I have N separate ROS nodes, each of which has an instance of the same SimpleActionServer (the same ROS node is run under different node names, and each action server is given a different name based on its node name). My use case involves running different scenarios and collecting the results asynchronously, while making sure each server is always working on something (as soon as I receive a completed goal, I send a new scenario to that server).

When N is 1, I find that a single execute callback on the server node takes some amount of time, let's call it X sec.

When N is 10, each server node takes roughly 1.2*X sec.

When N is 20, each server node takes roughly 2*X sec.

My intuition says that there should be some overhead for managing multiple actionlib clients like this, but I didn't expect each individual execute callback to be affected by the number of active actionlib clients and/or servers. I also find it odd that the behavior does not depend on how many servers are running: with all 20 servers always running, I still see the same slowdown pattern when I create and use 1, 10, and 20 clients.

To know when a server has completed a goal, I ask each SimpleActionClient for its state and check whether it is done. I cannot call waitForResult on a single client, since any of the currently active servers could return a result that I can process.
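A minimal sketch of the dispatch pattern described above, written with Python's standard library rather than actionlib so that it runs without ROS. The names `run_scenario` and `dispatch_all` are hypothetical, and the sleeping worker is only a stand-in for a server's execute callback. Note that this sketch blocks until any one job finishes instead of polling each client's state in a loop, which is a different technique from the one in the question; in actionlib, the done callback that SimpleActionClient accepts when sending a goal may serve a similar purpose.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def run_scenario(scenario_id, duration=0.01):
    """Hypothetical stand-in for one server's execute callback."""
    time.sleep(duration)
    return scenario_id * 2


def dispatch_all(scenarios, n_workers):
    """Keep up to n_workers scenarios in flight; refill a slot when one finishes."""
    pending = list(scenarios)
    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        in_flight = {}
        # Give every worker an initial scenario.
        while pending and len(in_flight) < n_workers:
            scenario = pending.pop(0)
            in_flight[pool.submit(run_scenario, scenario)] = scenario
        while in_flight:
            # Block until at least one job is done, rather than busy-polling.
            done, _ = wait(list(in_flight), return_when=FIRST_COMPLETED)
            for future in done:
                scenario = in_flight.pop(future)
                results[scenario] = future.result()
                # Immediately hand the freed worker a new scenario.
                if pending:
                    nxt = pending.pop(0)
                    in_flight[pool.submit(run_scenario, nxt)] = nxt
    return results
```

Timing each `run_scenario` call separately from the time between submitting a job and collecting its result is one way to tell whether the work itself is slowing down or only the detection of completed goals.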

EDIT: I just want to add that I am running this on a machine with Ubuntu 14.04 that has 128GB of RAM and 32 cores, so computational resources aren't being overtaxed, as far as I can measure. The system's CPU usage goes to ~60% when running 20 servers.

My main questions are:

Is this expected behavior that execute callbacks slow down with more active servers and clients?

Is there a way to remove these delays that are occurring with multiple clients?

I will provide any more clarifications as necessary. Thank you in advance for your insight and help with this issue!

2016-03-07 14:12:13 -0500 received badge  Taxonomist
2014-06-12 14:20:27 -0500 received badge  Famous Question (source)
2013-10-10 08:57:08 -0500 received badge  Notable Question (source)
2013-10-04 22:30:56 -0500 received badge  Popular Question (source)
2013-10-04 06:32:26 -0500 received badge  Organizer (source)
2013-10-04 06:31:35 -0500 asked a question Random Node Crashes Over The Network

I am currently running into a problem that only occurs when trying to run nodes over the network.

Let me explain the use case I have. I am currently using Groovy.

I start a roscore on my local machine (a Mac, for what it's worth). Then, I have generated a set of launch files that each load a significant number of parameters onto the Parameter Server (this input defines how the node constructs its objects and the initialization values for them). The number of these files is quite large (about 100). I then start these nodes by roslaunching the launch files on 4 other machines (Ubuntu 12.04.3 LTS).

The problem I am having is that if I start all these nodes at once, a small percentage of them (2-5%) will not execute. The process is terminated before the node even starts. I suspect this has to do with the massive amount of data being processed and served by the parameter server, but I am not certain.

My question is: is there a maximum number of nodes that can be run under the same roscore? Furthermore, are there restrictions on the number of parameters that can be stored at any given time on the parameter server?

Thanks

EDIT: Here is the master's log file: http://www.filedropper.com/master_5

EDIT 2: It may be relevant that each node needs to load on the order of 400 parameters upon launching. So for 100 nodes, this can result in 40000 parameters.

UPDATE: I tried running the same situation as above, except this time I ran the roscore on an Ubuntu 12.04 machine. The number of crashed nodes is almost zero in my original test case. When I ran a larger experiment (many more parameters), 3% of my nodes crashed (15 out of 500). I also noticed that a mutex is used in the master.cpp execute function on OSX and not on Linux machines (there are include guards around it). Is this behavior expected when communicating between an OSX machine and a Linux machine?

2013-10-03 09:49:22 -0500 received badge  Famous Question (source)
2013-07-05 05:39:04 -0500 commented question Launching a node crashes Ubuntu 12.04 LTS Server

It looks like it might be a problem with the NFS system that is set up on the cluster. I'm not entirely sure it's a ROS specific problem.

2013-07-04 12:32:16 -0500 received badge  Notable Question (source)
2013-07-04 06:22:29 -0500 received badge  Editor (source)
2013-07-04 04:02:43 -0500 received badge  Popular Question (source)
2013-07-02 06:13:47 -0500 asked a question Launching a node crashes Ubuntu 12.04 LTS Server

I am currently using ROS Groovy on two different machines in a cluster setup.

I have been trying to set up a system where I start a roscore on my local machine (Mac OSX 10.7.5) and then on a server machine running Ubuntu 12.04 LTS Server, I start another node. This works the first few times, but eventually the server machine goes into some sort of kernel core dump (I don't have access to the physical machine to get this core dump) and requires a manual restart. My local machine seems unaffected.

My question is whether Ubuntu 12.04 LTS Server has problems running ROS nodes, or whether this is an isolated incident specific to my setup. Any help is appreciated.

EDIT:

Some more details about my situation: I am starting only a roscore on my local machine, where I load a set of parameters onto the parameter server. I then start a custom-written node on the server machine using a Sun Grid Engine qsub containing the roslaunch file. The first time I do this, it works okay. The second time, if the node is launched on the same server machine, it causes the error above.

After fiddling with it some more, I found that restarting the roscore on my local machine between each qsub alleviates this problem. Is it possible that repeatedly connecting nodes to the same rosmaster over time is unstable?

I did run a roswtf and got the following error:

ERROR The following nodes should be connected but aren't:
 * /custom_node_name->/rosout (/rosout)

The node was able to read the parameters, so I believe the ROS_MASTER_URI is configured correctly.
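As a rough way to double-check that assumption from the server machine, the sketch below parses ROS_MASTER_URI and tries to open a TCP connection to the master. It uses only the Python standard library; the helper names `parse_master_uri` and `can_connect` are hypothetical, and 11311 is assumed as the fallback port because it is the default master port. This only tests the node-to-master direction; the roswtf error above concerns the connection between the node and /rosout, which this check does not cover.

```python
import os
import socket
from urllib.parse import urlparse


def parse_master_uri(uri):
    """Split a master URI such as http://host:11311 into (host, port)."""
    parsed = urlparse(uri)
    # Fall back to the default master port if the URI omits one.
    return parsed.hostname, parsed.port or 11311


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    uri = os.environ.get("ROS_MASTER_URI", "http://localhost:11311")
    host, port = parse_master_uri(uri)
    print(host, port, can_connect(host, port))
```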