
Message Size-dependent Queueing/Latency

asked 2018-04-02 12:58:29 -0500

shnomin

ROS Indigo (and/or Lunar), Ubuntu 14.04 / Docker (ros-indigo-base, ros-lunar-base)

I've noticed delays of up to 1/publish_frequency for medium-sized messages. I've reproduced the behavior using just rostopic:

rostopic pub -r 50 /test std_msgs/String "data: '`printf "%0.s-" {1..100}`'"
rostopic hz -w 50 /test

average rate: 49.983
    min: 0.018s max: 0.022s std dev: 0.00043s window: 50

Pretty normal, what you might expect. Now how about 1000 bytes of hyphens:

rostopic pub -r 50 /test std_msgs/String "data: '`printf "%0.s-" {1..1000}`'"

average rate: 50.033
    min: 0.019s max: 0.020s std dev: 0.00018s window: 50
average rate: 49.994
    min: 0.019s max: 0.021s std dev: 0.00019s window: 50
average rate: 49.990
    min: 0.019s max: 0.021s std dev: 0.00021s window: 50
average rate: 49.973
    min: 0.019s max: 0.021s std dev: 0.00021s window: 50
average rate: 49.027
    min: 0.000s max: 0.040s std dev: 0.01368s window: 50
average rate: 51.041
    min: 0.000s max: 0.041s std dev: 0.01996s window: 50
average rate: 51.035
    min: 0.000s max: 0.044s std dev: 0.01963s window: 50

Notice that after about 3 seconds, we see a min of 0 seconds and a max of 0.04 seconds.

Now for the weird part: this doesn't seem to happen with big messages. 100,000 bytes of hyphens:

rostopic pub -r 50 /test std_msgs/String "data: '`printf "%0.s-" {1..100000}`'"

average rate: 50.037
    min: 0.019s max: 0.020s std dev: 0.00020s window: 50
average rate: 49.994
    min: 0.019s max: 0.021s std dev: 0.00027s window: 50
average rate: 49.979
    min: 0.019s max: 0.021s std dev: 0.00020s window: 50
average rate: 50.020
    min: 0.018s max: 0.022s std dev: 0.00042s window: 50
average rate: 49.995
    min: 0.019s max: 0.021s std dev: 0.00020s window: 50
average rate: 50.007
    min: 0.019s max: 0.021s std dev: 0.00017s window: 50

So, no problem for large messages! At 50 Hz, the message size that seems to trigger this is roughly 600 to 65,000 bytes. Unfortunately for my use case, I can't just make all my messages bigger (or smaller). Introspecting in the specific use case, it looks like queueing: one cycle misses, and then two messages arrive on the next expected cycle. Once it starts, this continues to happen with regularity, unless the messages are made bigger or smaller. Any idea why this is happening?

I've reproduced this in roscpp and rospy, with and without tcpNoDelay, with zero-length queues, and with synchronous publishing (queue_size=None). It always exhibits this same behavior. I've also tried the indigo-ros-core Docker container as well as the lunar-ros-core Docker container. Lastly, I've also adjusted process priority, which has no effect. Obviously, this is all on one machine as ... (more)


Comments

More info: I've found the EXACT byte range that causes this error (at least with String messages). String lengths between 516 bytes and 65,474 bytes cause this issue; lower or higher works! I think this is some hard-coded buffer issue internal to ros_comm.

shnomin  ( 2018-04-02 14:08:27 -0500 )

Also, the problem only exhibits for rates above 25 Hz.

shnomin  ( 2018-04-02 14:09:08 -0500 )

To be clear, this does not seem to be a throughput issue, as the same byte range has caused the same issue for publish rates of 50, 100, and 200 Hz.

shnomin  ( 2018-04-02 14:15:27 -0500 )

If you have a self-contained MWE then this might be something to report over at ros/ros_comm/issues.

gvdhoorn  ( 2018-04-05 14:21:55 -0500 )

Note btw that the ROS middleware gives no guarantees, nor does it make any claims, about message throughput, max/min rates or other types of QoS. I would imagine you'd be in a better place with ROS 2 for those things.

Nevertheless, this is strange and would be good to investigate.

gvdhoorn  ( 2018-04-05 14:23:20 -0500 )

Posted this to ros_comm issues as per gvdhoorn's suggestion:

https://github.com/ros/ros_comm/issue...

shnomin  ( 2018-04-10 07:25:03 -0500 )

1 Answer


answered 2018-04-10 08:31:04 -0500

eddy.scott

I've answered this on your ros_comm issue, but for anyone who gets here later: this behavior is a result of ROS using TCP for inter-process communication by default. Specifically, the behavior above is the result of Nagle's algorithm.

To eliminate the latency, one can supply ros::TransportHints to the subscriber on creation. Specifically, ros::TransportHints().tcpNoDelay() will turn off Nagle's algorithm.

For a more in-depth explanation of TCP_NODELAY and its effect on the network stack, I would encourage reading through this blog post.
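At the socket level, the transport hint boils down to setting the standard TCP_NODELAY option on the connection. A minimal sketch with plain Python sockets (no ROS required) showing the option that tcpNoDelay enables under the hood:

```python
import socket

# Create a TCP socket and disable Nagle's algorithm. This is what
# ros::TransportHints().tcpNoDelay() ultimately requests on the
# subscriber's connection: small writes are sent immediately instead
# of being coalesced while waiting for ACKs of earlier segments.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Verify the option took effect (getsockopt returns nonzero when set).
assert sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

In rospy the equivalent is the subscriber's tcp_nodelay flag, e.g. rospy.Subscriber(topic, msg_type, callback, tcp_nodelay=True).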


Comments

The OP already mentioned disabling Nagle:

with and without tcpNoDelay

It could still be Nagle (in one direction), but that is something the OP would need to verify.

gvdhoorn  ( 2018-04-10 08:36:07 -0500 )

@shnomin: don't forget to mark your question as answered here as well.

gvdhoorn  ( 2018-04-11 01:22:13 -0500 )

Just for future visitors: this must be enabled on the subscriber. Setting the transport hint on the publisher does not work, which is a bit confusing.

shnomin  ( 2018-04-11 10:10:39 -0500 )



Stats

Asked: 2018-04-02 12:58:29 -0500

Seen: 588 times

Last updated: Apr 10 '18