Reasons for slow response to actions, services, messages?

asked 2013-09-07 20:52:10 -0500

updated 2013-09-09 22:10:40 -0500

Hi guys!

I'm trying to debug a tool that responds to service calls and action goals with a lag of 10 to 20 seconds, depending on the load of the machine (50% - 90%). I determined this time lag from the debug output, which shows when the action server receives the action goal.

What could be the reasons for this significant time lag?

In my use case - pick & place using MoveIt - the node running the action servers and service providers is causing most of the load. So my first guess is that something is blocking the respective callbacks.

Is this just a matter of not enough horsepower? Which parts of the code could cause this blocking/slowing down?

Interestingly, actions, services and messages of nodes running in parallel are processed fine, i.e. there is only a small time difference between sending and receiving action goals, responding to services, and receiving messages.

Thanks for your help!

/edit:

Additional info using the tools recommended in @Adolfo Rodriguez T 's answer:

top
- with point cloud processing (pcp): ~40% idle, load average: ~2.5 (quad-core CPU: i5-2500 CPU @ 3.30GHz), move_group process at ~130%
- w/o pcp: ~90% idle, load average: ~0.5
iftop
- with pcp: ~2Gb/s
- w/o pcp: ~5Mb/s
sockets
- When idling, a few (~5) sockets show up every few seconds (5 - 10). For each motion plan action, a few more sockets pop up.

Comments

As I mentioned above, the load is usually around 50% ~ 70% (desktop) / 50% ~ 90% (robot), measured with top. iftop is an interesting tool! It shows me that the local traffic (lo interface) goes up to 2Gb/s when I start processing point clouds.

bit-pirate ( 2013-09-08 22:12:52 -0500 )

Which process is generating the load? Is it the roscore, or one or more of the started nodes?

Dirk Thomas ( 2013-09-09 11:34:48 -0500 )

The rosmaster load is negligible. The main load comes from MoveIt's move_group node (more details added in the question). And it is only that node's topics, services and actions that are processed extremely slowly. The other nodes run fine, probably because the CPU is not fully used.

bit-pirate ( 2013-09-09 16:33:42 -0500 )

6

answered 2013-09-08 21:57:56 -0500

Adolfo Rodriguez T

updated 2013-09-09 21:41:10 -0500

More than an answer, this may serve as a first diagnostics step.

How loaded is your system when this happens? I recommend a first sweep with the following tools:

- top to check idle CPU resources and load average.
- iftop (may require sudo) to query the traffic on your network interfaces, e.g. sudo iftop -i lo for loopback only.
- It might also be good to check the number of sockets with a given status, e.g. netstat | grep ESTABLISHED | wc -l. ESTABLISHED and CONNECTED correspond to sockets currently in use, while TIME_WAIT and CLOSE_WAIT are pending to close. Pay special attention to the latter two, as large counts here can indicate lots of short-lived sockets, which usually occur in ROS environments when you frequently query the master (non-persistent service calls or parameter reads). Many socket opening/closing operations will increase your system CPU load (shown in top under Cpu(s) .... sy).
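The per-state socket count can also be gathered in one pass instead of one grep per state. The following is only a minimal sketch: it parses text in the layout printed by netstat -tan, and the SAMPLE text is made up for illustration, not taken from the machine in question. Python 3 is assumed.

```python
from collections import Counter

# TCP states that netstat prints in its "State" column.
TCP_STATES = {
    "ESTABLISHED", "TIME_WAIT", "CLOSE_WAIT", "LISTEN", "SYN_SENT",
    "SYN_RECV", "FIN_WAIT1", "FIN_WAIT2", "LAST_ACK", "CLOSING",
}


def count_socket_states(netstat_output):
    """Count TCP sockets per state in the text printed by `netstat -tan`."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows start with "tcp" or "tcp6" and end with the state.
        if fields and fields[0].startswith("tcp") and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts


# Hypothetical sample output, made up for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.1:11311         0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:45678         127.0.0.1:11311         TIME_WAIT
tcp        0      0 127.0.0.1:45680         127.0.0.1:11311         TIME_WAIT
tcp        0      0 127.0.0.1:51234         127.0.0.1:40001         ESTABLISHED
"""

if __name__ == "__main__":
    for state, n in count_socket_states(SAMPLE).most_common():
        print(state, n)
```

On a live machine the input could come from subprocess.check_output(["netstat", "-tan"], text=True). Running it every few seconds and watching whether the TIME_WAIT count keeps growing gives a rough picture of how many short-lived connections are being created.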

Edit: From the updated question details.

For completeness, could you post the CPU load and network traffic values with pointcloud perception disabled?

It seems that the pointcloud messages are taking up a lot of bandwidth, and (de)serializing + processing them (coordinate system change, self-filtering, object detection, etc.) is in turn consuming significant CPU resources (maxing out a core, leaving the scheduler no room to process all incoming messages).
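As a rough sanity check on the bandwidth figure, the raw payload of a point cloud stream can be estimated from its dimensions. The values below are assumptions, not taken from the question: a VGA-resolution organized cloud, 32 bytes per point (a common padded XYZRGB layout) and a 30 Hz publishing rate.

```python
def cloud_bandwidth_bits_per_s(width, height, bytes_per_point, rate_hz):
    """Raw payload bandwidth of an organized point cloud stream, in bits/s."""
    return width * height * bytes_per_point * rate_hz * 8


# Assumed values, not taken from the question: VGA resolution, 30 Hz,
# 32 bytes per point (a common padded XYZRGB layout).
bps = cloud_bandwidth_bits_per_s(640, 480, 32, 30)
print("%.2f Gb/s" % (bps / 1e9))  # prints "2.36 Gb/s"
```

Under these assumptions a single unfiltered stream to a single subscriber already lands in the same range as the ~2Gb/s seen on the lo interface, which would be consistent with raw sensor clouds being fed to move_group. Message headers and additional subscribers are ignored here.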

What kind of pointcloud input are you feeding move_group? If it's the raw input from a Kinect-like RGBD sensor, that might indeed prove prohibitive. Preprocessing the pointcloud might help. These are some indicative numbers I took some time ago:

- Original cloud contains 200k-300k valid points.
- Crop to a bounding volume of interest (~1 order of magnitude fewer points).
- Downsample with octomap (additional ~2 orders of magnitude reduction).
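The two reduction steps above can be sketched in plain Python. Note that this is not octomap: the second step uses a simple voxel-grid centroid filter as a stand-in, and the function names, box limits and voxel size are all hypothetical. A real pipeline would more likely use PCL or octomap on the actual message data.

```python
import math


def crop_to_box(points, lower, upper):
    """Keep only points inside the axis-aligned box [lower, upper].

    Points with NaN coordinates are dropped as well, because every
    comparison against NaN evaluates to False.
    """
    return [
        p for p in points
        if all(lo <= c <= hi for c, lo, hi in zip(p, lower, upper))
    ]


def voxel_downsample(points, voxel_size):
    """Keep one representative point (the centroid) per occupied voxel."""
    voxels = {}
    for p in points:
        key = tuple(int(math.floor(c / voxel_size)) for c in p)
        voxels.setdefault(key, []).append(p)
    return [
        tuple(sum(axis) / len(group) for axis in zip(*group))
        for group in voxels.values()
    ]


# Tiny made-up cloud: two nearby points, one farther away, one outside
# the box, and one invalid (NaN) point.
nan = float("nan")
cloud = [
    (0.10, 0.10, 0.10),
    (0.12, 0.11, 0.10),
    (0.90, 0.90, 0.90),
    (5.00, 0.00, 0.00),
    (nan, nan, nan),
]
cropped = crop_to_box(cloud, lower=(0.0, 0.0, 0.0), upper=(1.0, 1.0, 1.0))
reduced = voxel_downsample(cropped, voxel_size=0.5)
print(len(cloud), len(cropped), len(reduced))
```

Cropping first is the cheaper operation, so doing it before the downsampling step keeps the per-voxel bookkeeping small.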

Finally, if you need point clouds at discrete time instances (as opposed to a continuous stream), gate pointcloud traffic through an on-demand snapshotter.
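The snapshotter idea can be illustrated without any ROS dependency. The class below is a hypothetical sketch, not an existing ROS package: in a real node, on_cloud would be the subscriber callback, take_snapshot would be triggered by a service call, and forward would be a publisher's publish method.

```python
import threading


class CloudSnapshotter:
    """Hypothetical gate: caches the newest cloud, forwards one on request."""

    def __init__(self, forward):
        self._forward = forward  # callable that receives the snapshot
        self._latest = None
        self._lock = threading.Lock()

    def on_cloud(self, cloud):
        """Cheap stream callback: only store a reference, no processing."""
        with self._lock:
            self._latest = cloud

    def take_snapshot(self):
        """Forward the newest cloud once; return False if none arrived yet."""
        with self._lock:
            cloud = self._latest
        if cloud is None:
            return False
        self._forward(cloud)
        return True


# Usage: 100 clouds arrive, but only the one requested is passed on.
received = []
gate = CloudSnapshotter(received.append)
assert gate.take_snapshot() is False  # nothing cached yet
for i in range(100):
    gate.on_cloud("cloud %d" % i)
gate.take_snapshot()
print(received)
```

The downstream consumer then only pays the (de)serialization and processing cost for the clouds it actually asks for, instead of for the full stream.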

Comments

Thanks for listing these helpful tools. I added their results to my question. To me, both the CPU load and the sockets look OK. The network traffic looks high, but then I have no idea what counts as "high" or "low" traffic on the lo interface. Do you have any experience with it?

bit-pirate ( 2013-09-09 19:35:31 -0500 )

Extra info added. Your suggestion about preprocessing the point cloud is a good idea. I wonder, however, if there are other ways to improve this situation. There is still 10% of the computation power unused. Also, there are multiple cores available. Can't those be used?

bit-pirate ( 2013-09-09 22:22:34 -0500 )

1

answered 2013-09-09 22:26:59 -0500

bit-pirate

updated 2016-02-24 20:08:19 -0500