Latency when sending goal to move_base and it starting

asked 2014-04-03 06:51:42 -0500

Tom Panzarella
51 ●1 ●4 ●8

updated 2014-04-03 08:52:51 -0500

Hi All,

I'm currently working with the ROS navigation stack. I am using move_base through its actionlib interface. The problem I am experiencing is that I seem to consistently see ~7 seconds of latency between sending the goal through the move_base action client and when the goal begins to execute. Once the server starts executing the goal, the speed is just fine and updates from the action server to client show no significant latency.

I have checked out both the navigation stack and actionlib from github so I could instrument the code to chase down the source of this latency. Specifically, what I have found is that there is a consistent ~7 seconds of latency between the time sendGoal (in simple_action_client.h) returns to when goalCallback (in action_server_base.h) begins to handle the "goal request". I have modified initialize() (in action_server_imp.h) to set TCP_NODELAY on the subscriber to the "goal" topic. One thought was that this message was getting lost in the ROS transport plumbing. Again, making this change seems to have no noticeable effect on this latency issue.

I have a couple of questions:

1) Is this normal / expected behavior? 2) Where should I look next?

If more context is needed, I'm happy to provide. However, we are not doing anything "exotic" in how we are using the advertised functionality. Any help is greatly appreciated.

Best, Tom

EDIT1: I should also note that I have tried instantiating the action client both with and without the extra thread to enable/disable using the global callback queue. Again, no difference.

EDIT2: I'm getting somewhere. My hypothesis is that, in my setup, move_base is subscribed to too many "high frequency / large data" topics. Specifically, 4 sensors sources publishing QQVGA point clouds at 30 Hz. My assumption is that the /move_base/goal topic is sharing the same message queue as these sensor sources (??). Empirically, when I remove two of the input sensor sources, the above described latency drops from ~7 seconds to ~3 seconds. What tipped me off to this was that in a separate shell, I can rostopic echo /move_base/goal and I see the goal published from the action client immediately whereas in the move_base process, I saw the ~7 seconds of latency.

I plan to turn my attention to tuning my sensor sources to get the responsiveness from move_base that I need while not sacrificing fidelity and our perception subsystem. In the meantime, is there a mechanism (I am not aware of one) within ROS that would allow me to partition out the action server's message queue from the sensor topics coming into move_base?

Thanks in advance for any pointers.

edit retag flag offensive close merge delete

Comments

Interesting question. I'm not sure where to start digging further. Can you reproduce this with all nodes being on a single machine (no networking other than loopback)?

demmeln ( 2014-04-03 07:13:47 -0500 )edit

Yes. For the nodes in question, they are located on the same machine.

Tom Panzarella ( 2014-04-03 07:26:10 -0500 )edit

You can change to multithreaded spinning or even provide separate calllback queues for different subscribers, which might help. However you likely have to modify move_base yourself for that. Check http://wiki.ros.org/roscpp/Overview/Callbacks%20and%20Spinning .

demmeln ( 2014-04-04 00:46:33 -0500 )edit

Also, maybe throttling the topics that seem to clog up move_base might help. I.e. only reduce the frequency / amount of data the move_base receives, while other parts of your system receive the full sensor data.

demmeln ( 2014-04-04 00:47:58 -0500 )edit

Lastly, a faster machine might also help dramatically.

demmeln ( 2014-04-04 00:48:13 -0500 )edit

1

@demmeln Thanks for the comments. A combination of using a multi-threaded spinner and (down) throttling my sensors is looking promising right now. Going to do more testing before confirming a definitive resolution.

Tom Panzarella ( 2014-04-04 04:29:31 -0500 )edit

add a comment

3

answered 2014-04-03 09:45:21 -0500

Tom Panzarella
51 ●1 ●4 ●8

updated 2014-04-04 07:27:16 -0500

Please see "EDIT2" listed in my original question. The latency was indeed due to the amount of sensor data being subscribed to by move_base which (I can only presume) shares the same message queue with the /move_base/goal topic. Keeping all four sensor sources in place but dropping the rate from 30 Hz to 15 Hz provides acceptable goal responsiveness from move_base.

EDIT1: Upon further testing, it looks like the latency is due IN PART to the amount of sensor data. However, other factors are involved (e.g., TCP_NODELAY on the /move_base/goal topic, adjusting the queue size, etc.). Once I have this worked out definitively, I'll report back. Sorry for the premature resolution. More testing is required.

EDIT2: I believe I have this issue resolved. On the advice of @demmeln a combination of throttling my sensor sources and using a multi-threaded spinner in move_base has provided acceptable latency for the system. The specifics for my setup that work well are to scale the frequency of my 4 sensor sources down from 30 Hz to 15 Hz and to use a multi-threaded spinner within move_base with 2 threads. I have submitted a pull-request for the move_base changes. Those changes are minimal and available here: https://github.com/ros-planning/navig...

I should note that all changes that I had originally made to actionlib (i.e., TCP_NODELAY and changing the queue_size) were reverted during my testing today. The changes that worked for me were limited to what I describe in the preceding paragraph.

Thanks for the help / input.

edit flag offensive delete link

add a comment

0

answered 2014-04-03 07:02:40 -0500

ahendrix

47576 ●179 ●367 ●662 http://namniart.com/

updated 2014-04-03 11:20:00 -0500

This could be either:

The time it takes for the global planner to generate a plan, before the robot starts moving, or
Something wrong with the action server/client or the transport layer

The easiest way to narrow this down is to try to reproduce it with a simpler action server and client.

You may be able to use the threaded callback spinners in ROS to handle the callbacks from different sensor sources in different threads.

edit flag offensive delete link

Comments

I'm certain it is not the global planner, the latency is occurring between the time when the client sends the goal out and when the server receives the goal in its goal callback -- this all has to happen before the global planner gets called upon to generate the plan.

Tom Panzarella ( 2014-04-03 07:09:47 -0500 )edit

Ok; can you reproduce this with a simpler action client and server? Perhaps using some of the actionlib tutorial nodes?

ahendrix ( 2014-04-03 07:15:21 -0500 )edit

Note sure. Is there something simpler than SimpleActionClient that move_base will respond to?

Tom Panzarella ( 2014-04-03 07:22:57 -0500 )edit

I would try using an actionlib server that isn't move_base; perhaps the fibonacci server and client described in this tutorial? http://wiki.ros.org/actionlib_tutorials/Tutorials/RunningServerAndClient

ahendrix ( 2014-04-03 07:33:06 -0500 )edit

add a comment

Latency when sending goal to move_base and it starting

Comments

2 Answers

Comments

Question Tools

Stats

Related questions

Latency when sending goal to move_base and it starting edit

Comments

2 Answers

Comments

Question Tools

Stats

Related questions

Latency when sending goal to move_base and it starting