Publisher grouping messages, with no clear reason why.

asked 2018-05-01 08:30:48 -0500

edixon

Hi,

I'm experiencing an odd issue and I'm completely at a loss as to what could be causing it; hopefully someone here has some pointers for me. In short, some publishers are grouping their messages into bunches of four, whilst other publishers called from the same function are not.

I have a node that converts data from a GPS system into several messages that are published on different topics. The publishers for each topic are within the same function, and each publishes a slightly different form of the same data from the GPS packet. A packet comes in from the GPS every 0.01 s and is passed to a handler function, in which I move some data between structures, do some small calculations, and then publish from 6 publishers that are called one after another in the code.

I would expect each topic to hit the ROS bus at more or less the same moment, at a steady rate of 100 Hz, with a constant gap of approximately 10 ms between messages. I see this for four of the topics, but not for the other two. Those two bunch up their messages, so that four messages are published within around 0.1 ms, followed by a break of about 40 ms, then another four within 0.1 ms. This is not at all what I expected: all of the publishers are called at basically the same point in the code and use the same data, so I would think their behaviour should be roughly identical.

I can't find anything in my code that could be causing this, so does anyone know of an error I could have made, or a behaviour of ROS that I should be aware of?

Many thanks, Eliot.
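
One way to put numbers on the bunching described in the question is to record an arrival time for every message (for example in a subscriber callback) and group the arrivals into bursts. The sketch below is plain Python rather than ROS code; the function names and the 5 ms threshold are assumptions picked for a nominal 100 Hz stream, not anything taken from the original node.

```python
def group_bursts(stamps, gap_threshold=0.005):
    """Split sorted arrival times (in seconds) into bursts.

    A new burst starts whenever the gap to the previous arrival
    exceeds gap_threshold (hypothetical default: 5 ms).
    """
    bursts = []
    for t in stamps:
        if bursts and t - bursts[-1][-1] <= gap_threshold:
            bursts[-1].append(t)   # close to the previous arrival: same burst
        else:
            bursts.append([t])     # large gap (or first sample): new burst
    return bursts


def summarize(stamps, gap_threshold=0.005):
    """Return (burst sizes, gaps between consecutive bursts in seconds)."""
    bursts = group_bursts(stamps, gap_threshold)
    sizes = [len(b) for b in bursts]
    gaps = [nxt[0] - prev[-1] for prev, nxt in zip(bursts, bursts[1:])]
    return sizes, gaps
```

A healthy 100 Hz topic should give bursts of size 1 with gaps near 10 ms, while the affected topics would show bursts of size 4 with gaps near 40 ms. Note that timestamps taken on the subscriber side also include any delay introduced by the subscriber itself.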


Comments


I'm going to make a guess and say that this sounds like Nagle. Are there any significant differences in msg size between the publishers that do and those that don't show this 'behaviour'?

Note also that depending on the rate at which a subscriber processes events, you may see this sort of thing.

gvdhoorn (2018-05-01 09:58:03 -0500)
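
For readers unfamiliar with the term: Nagle's algorithm lets the TCP stack hold back small writes and coalesce them into fewer segments, and it is switched off per socket with the `TCP_NODELAY` option. The sketch below shows that option on a plain Python socket, outside of ROS. The helper names are made up for illustration, and how ROS configures its own sockets is not shown here.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 asks the stack to send small writes immediately
    # instead of waiting to coalesce them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nagle_disabled(sock):
    """Return True if TCP_NODELAY is set on the given socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

Disabling Nagle trades a little bandwidth efficiency for lower latency, which is usually the right trade for small, periodic sensor messages.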

That sounds like a good shout. Does this page describe the issue correctly? There is very little difference in size between the messages; I believe that one of the ones that shows the bunching is only around 8 bytes larger than one that does not.

edixon (2018-05-01 10:41:51 -0500)

You don't need to recompile (just yet) to disable Nagle. Just try the TransportHints first.

And this is just a guess. Probably not a correct one.

gvdhoorn (2018-05-01 15:45:11 -0500)
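
In roscpp the hint mentioned above is passed on the subscriber side as `ros::TransportHints().tcpNoDelay()`. The thread does not say which client library the subscribers use, so the sketch below assumes rospy, where the equivalent is the `tcp_nodelay` argument of `rospy.Subscriber`. The helper name, the topic `/gps/fix`, and the `NavSatFix` message type are hypothetical placeholders.

```python
def subscribe_without_nagle(topic, msg_type, callback, queue_size=10):
    """Create a rospy subscriber that requests TCP_NODELAY on its connection."""
    import rospy  # imported lazily so this file loads without a ROS install

    return rospy.Subscriber(
        topic,
        msg_type,
        callback,
        queue_size=queue_size,
        tcp_nodelay=True,  # ask the publisher side to disable Nagle
    )


def main():
    """Example entry point; run this inside a ROS environment."""
    import rospy
    from sensor_msgs.msg import NavSatFix  # assumed message type

    rospy.init_node("gps_listener")
    subscribe_without_nagle(
        "/gps/fix", NavSatFix, lambda msg: rospy.loginfo("fix received")
    )
    rospy.spin()
```

Because the hint is requested per subscriber, it can be tried on just the affected topics without rebuilding the publishing node.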

I tried this out and the issue was indeed fixed by enabling tcpNoDelay. Thank you very much for the suggestion. There is still some slight jitter in the delay between messages, so I will need to investigate further, but the issue is essentially solved as far as I need it to be. Thanks!

edixon (2018-05-08 04:09:47 -0500)

1 Answer


answered 2018-05-03 06:11:47 -0500

As already commented by @gvdhoorn, this sounds a lot like Nagle's algorithm at work. It's sometimes a bit mysterious exactly when these delays happen, as also documented in Q/A 220129, where messages of >= 520 bytes were found to exhibit grouping while smaller ones did not. I'm still very much interested in an explanation myself :)


Comments

@edixon could see whether requesting 'unreliable' publishers (i.e. UDP) changes anything. The messages are probably small enough for that. If it does, that would be one more indication that something in the TCP stack is getting in the way here.

gvdhoorn (2018-05-03 06:24:28 -0500)


Last updated: May 03 '18