Message Size-dependent Queueing/Latency

asked 2018-04-02 12:58:29 -0600

shnomin
3 ●1 ●2 ●2

ROS Indigo (and/or Lunar), Ubuntu 14.04 / Docker (ros-indigo-base, ros-lunar-base)

I've noticed some delays up to 1/(publish_frequency) for medium sized messages. I've reproduced the behavior just using rostopic below:

rostopic pub -r 50 /test std_msgs/String "data: '`printf "%0.s-" {1..100}`'"
rostopic hz -w 50 /test

average rate: 49.983
    min: 0.018s max: 0.022s std dev: 0.00043s window: 50

Pretty normal, what you might expect. Now how about 1000 bytes of hyphens:

rostopic pub -r 50 /test std_msgs/String "data: '`printf "%0.s-" {1..1000}`'"

average rate: 50.033
    min: 0.019s max: 0.020s std dev: 0.00018s window: 50
average rate: 49.994
    min: 0.019s max: 0.021s std dev: 0.00019s window: 50
average rate: 49.990
    min: 0.019s max: 0.021s std dev: 0.00021s window: 50
average rate: 49.973
    min: 0.019s max: 0.021s std dev: 0.00021s window: 50
average rate: 49.027
    min: 0.000s max: 0.040s std dev: 0.01368s window: 50
average rate: 51.041
    min: 0.000s max: 0.041s std dev: 0.01996s window: 50
average rate: 51.035
    min: 0.000s max: 0.044s std dev: 0.01963s window: 50

Notice that after about 3 seconds, we see a min of 0 seconds and a max of .04 seconds.

Now for a weirdness, this doesn't seem to happen with big messages. 100,000 bytes of hyphens:

rostopic pub -r 50 /test std_msgs/String "data: '`printf "%0.s-" {1..100000}`'"

average rate: 50.037
    min: 0.019s max: 0.020s std dev: 0.00020s window: 50
average rate: 49.994
    min: 0.019s max: 0.021s std dev: 0.00027s window: 50
average rate: 49.979
    min: 0.019s max: 0.021s std dev: 0.00020s window: 50
average rate: 50.020
    min: 0.018s max: 0.022s std dev: 0.00042s window: 50
average rate: 49.995
    min: 0.019s max: 0.021s std dev: 0.00020s window: 50
average rate: 50.007
    min: 0.019s max: 0.021s std dev: 0.00017s window: 50

So, no problem for large messages! At 50 Hz, the size of message that seems to do this is ~600 bytes - 65,000 bytes. Unfortunately for my use case, I can't just make all my messages bigger (or smaller). Introspecting in the specific use case, it seems like queueing in that there will be a miss and then 2 messages the next expected cycle. However, after it starts, this continues to happen with regularity, unless the messages are bigger or smaller. Any idea why this is happening?

I've reproduced in roscpp and rospy, with and without tcdnodelay, zero queues, and sychonous publishing (queu_size=None). It always exhibits this same behavior. I've also tried this in the indigo-ros-core docker container as well as the lunar-ros-core docker container. Lastly, I've also adjusted process priority, which also has no effect. Obviously, this is all on one machine as ... (more)

edit retag flag offensive close merge delete

Comments

More info: I've found the EXACT byte range that will cause this error (at least with string messages) Strings lengths between 516 bytes and 65,474 bytes will cause this issue. Lower or higher works! I think this is some hard-coded buffer issue internal to ros_comm.

shnomin ( 2018-04-02 14:08:27 -0600 )edit

Also, the problem only exhibits for rates above 25 Hz.

shnomin ( 2018-04-02 14:09:08 -0600 )edit

To be clear, this does not seem to be a throughput issue, as the same byte range has caused the same issue for publish rates of 50, 100, and 200 Hz.

shnomin ( 2018-04-02 14:15:27 -0600 )edit

If you have a self-contained MWE then this might be something to report over at ros/ros_comm/issues.

gvdhoorn ( 2018-04-05 14:21:55 -0600 )edit

Note btw that the ROS middleware gives no guarantees, nor does it make any claims, about message throughput, max/min rates or other types of QoS. I would imagine you'd be in a better place with ROS 2 for those things.

Nevertheless, this is strange and would be good to investigate.

gvdhoorn ( 2018-04-05 14:23:20 -0600 )edit

Posted this to ros_comm issues as per gvdhoorn's suggestion:

https://github.com/ros/ros_comm/issue...

shnomin ( 2018-04-10 07:25:03 -0600 )edit

add a comment

answered 2018-04-10 08:31:04 -0600

eddy.scott
36 ●1 ●3 ●3

I've answered this on your ros_comm issue but for anyone who gets here later this behavior is a result of ROS using TCP for interprocess communication by default. Specifically the behavior above is the result of Nagle's algorithm.

To eliminate the latency one can supply ros::TransportHints to the subscriber on creation. Specifically specifying ros::TransportHints::tcpNoDelay will turn off Nagle's algorithm.

For a more in-depth explanation of TCP_NODELAY and its effect on the network stack I would encourage reading through this blog post

edit flag offensive delete link

Comments

The OP mentioned disabling Nagle already:

with and without tcdnodelay

it could still be Nagle (in one direction), but that is something the OP would need to verify.

gvdhoorn ( 2018-04-10 08:36:07 -0600 )edit

@shnomin: don't forget to mark your question as answered here as well.

gvdhoorn ( 2018-04-11 01:22:13 -0600 )edit

Just for future visitors, this must be enabled by the subscriber. Setting the transport hint on the publisher does not work, which is a bit confusing.

shnomin ( 2018-04-11 10:10:39 -0600 )edit

add a comment

Message Size-dependent Queueing/Latency

Comments

1 Answer

Comments

Question Tools

Stats

Related questions

Message Size-dependent Queueing/Latency edit

Comments

1 Answer

Comments

Question Tools

Stats

Related questions

Message Size-dependent Queueing/Latency