Publisher grouping messages for no clear reason
Hi,
I'm experiencing an odd issue and I'm completely at a loss as to what could be causing it; hopefully someone here has some pointers for me. In short, some of my publishers are grouping messages into bunches of 4, whilst other publishers in the same function are not.
I have a node that converts data from a GPS system into several messages, each published on a different topic. The publishers are all called from the same function, and each publishes a slightly different form of the same data from the GPS packet. A packet arrives from the GPS every 0.01 s and is passed to a handler function, which moves data between structures, does some small calculations, and then publishes from the 6 publishers, one after another.
I would expect every topic to hit the ROS bus at more or less the same moment, at a steady 100 Hz with a roughly constant gap of 10 ms between messages. That is what I see for 4 of the topics, but not for the other 2. Those 2 bunch up their messages: four messages are published within about 0.1 ms, then there is a break of about 40 ms, then another four within 0.1 ms. I did not expect this at all, since all the publishers are called at essentially the same point in the code with the same data, so I would have thought they would behave roughly identically.
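To pin down a pattern like this, it can help to record the time each message arrives in the subscriber callback and summarize the gaps between arrivals. The sketch below is a hypothetical helper: `gap_stats` and its 1 ms burst threshold are my own choices, not a ROS API, and it assumes you have already collected receive times in seconds (for example with `time.monotonic()` inside a rospy callback). One thing worth noting: a burst of 4 messages every 40 ms still averages 10 ms per message, so the mean gap alone will not reveal the bunching; the spread, the maximum gap and the count of near-zero gaps do.

```python
from statistics import mean, pstdev


def gap_stats(stamps_s, burst_threshold_s=0.001):
    """Summarize inter-arrival gaps for a list of receive times in seconds.

    burst_threshold_s is an arbitrary (hypothetical) cut-off: any gap
    shorter than it is counted as part of a burst.
    """
    if len(stamps_s) < 2:
        raise ValueError("need at least two timestamps")
    # Gap between each message and the one before it.
    gaps = [later - earlier for earlier, later in zip(stamps_s, stamps_s[1:])]
    return {
        "mean": mean(gaps),
        "stdev": pstdev(gaps),
        "max": max(gaps),
        "burst_gaps": sum(1 for g in gaps if g < burst_threshold_s),
    }


# Synthetic data: a steady 100 Hz stream vs. bursts of 4 messages
# spaced 0.03 ms apart, with a new burst starting every 40 ms.
steady = [i * 0.01 for i in range(12)]
bunched = [k * 0.04 + j * 0.00003 for k in range(3) for j in range(4)]

print(gap_stats(steady))
print(gap_stats(bunched))
```

Using receive times rather than the time each message was filled in on the publisher side matters here, because the grouping happens after the publish call.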
I can't find anything in my code that could be causing this, so does anyone know of a mistake I could have made, or a behaviour of ROS that I should be aware of?
Many thanks, Eliot.
I'm going to guess that this is Nagle's algorithm. Are there any significant differences in message size between the publishers that do and those that don't show this 'behaviour'?
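To see how Nagle's algorithm could produce exactly this symptom, here is a deliberately simplified model (my own sketch, not how any TCP stack is actually implemented). It keeps only the core rule: a small write goes out immediately if no earlier data is awaiting acknowledgement; otherwise it is buffered, and everything buffered leaves as a single segment when the outstanding ACK comes back. It assumes the receiver delays every ACK by a fixed 40 ms (Linux's minimum delayed-ACK timeout is 40 ms), that the network adds no delay, and it ignores the rule that a full-size (MSS) segment is always sent at once.

```python
def simulate_nagle(period_ms, ack_delay_ms, n_messages, nagle=True):
    """Toy model of Nagle's algorithm with a fixed delayed-ACK time.

    Returns a list of (send_time_ms, [message indices]) tuples, one per
    TCP segment. Assumptions: every message is far smaller than one MSS,
    the network adds no delay, and each segment is ACKed exactly
    ack_delay_ms after it is sent.
    """
    segments = []
    buffer = []     # messages written by the app but held back by Nagle
    ack_due = None  # when the in-flight segment is ACKed; None = nothing in flight

    for i in range(n_messages):
        t = i * period_ms
        if not nagle:
            # TCP_NODELAY: every write goes out as its own segment.
            segments.append((t, [i]))
            continue
        # Process any ACKs that arrive at or before this write.
        while ack_due is not None and ack_due <= t:
            if buffer:
                # The ACK releases everything buffered as one segment.
                segments.append((ack_due, buffer))
                buffer = []
                ack_due += ack_delay_ms
            else:
                ack_due = None
        if ack_due is None:
            # Nothing unacknowledged: send immediately.
            segments.append((t, [i]))
            ack_due = t + ack_delay_ms
        else:
            buffer.append(i)

    if buffer:
        # Flush whatever is left when the last outstanding ACK arrives.
        segments.append((ack_due, buffer))
    return segments


for send_time, msgs in simulate_nagle(period_ms=10, ack_delay_ms=40, n_messages=12):
    print(f"t={send_time:3d} ms  messages {msgs}")
```

With a write every 10 ms, the model settles into segments of 4 messages every 40 ms. The segment released at t = 40 ms holds only 3 because of how the model orders the ACK and the write that land at the same instant. With `nagle=False` every message becomes its own segment.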
Note also that, depending on the rate at which a subscriber processes events, you may see this sort of thing.
That sounds like a good shout; does this page describe the issue correctly? There is very little difference in size between the messages: I believe one of those showing the problem is only around 8 bytes larger than one that does not.
You don't need to recompile (just yet) to disable Nagle. Just try the TransportHints (tcpNoDelay) on the subscriber first.
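For reference, in roscpp the hint is passed as the last argument when subscribing, e.g. `nh.subscribe("gps/fix", 10, callback, ros::TransportHints().tcpNoDelay());`, and rospy's `Subscriber` accepts `tcp_nodelay=True`; the topic name, queue size and callback are placeholders. As I understand it, the request is forwarded to the publisher, which then sets the standard `TCP_NODELAY` socket option on that connection, disabling Nagle's algorithm. The sketch below only illustrates that socket option on a loopback connection using Python's standard library; it is not ROS code, and `enable_nodelay` / `nodelay_enabled` are hypothetical helper names.

```python
import socket


def enable_nodelay(sock: socket.socket) -> None:
    """Disable Nagle's algorithm: small writes are sent immediately
    instead of being held until earlier data has been ACKed."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def nodelay_enabled(sock: socket.socket) -> bool:
    # getsockopt returns a non-zero int when the option is set.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


# Loopback pair standing in for a publisher/subscriber connection.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
client = socket.create_connection(server.getsockname())
publisher_side, _ = server.accept()

print("default:", nodelay_enabled(publisher_side))
enable_nodelay(publisher_side)
print("after enable_nodelay:", nodelay_enabled(publisher_side))

for s in (publisher_side, client, server):
    s.close()
```

Setting the option per connection, as the hint does, leaves other subscribers to the same topic unaffected.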
And this is just a guess. Probably not a correct one.
I tried this out, and it was indeed fixed by enabling tcpNoDelay; thank you very much for the suggestion. There is still some slight jitter in the delay between messages, so I will need to investigate further, but the issue is essentially solved as far as I need it to be. Thanks!