Optionally disable ROS publisher buffering

asked 2017-06-20 05:07:15 -0500

bobismijnnaam

updated 2017-06-20 05:48:36 -0500

Hi everyone,

We are using ROS as the basis for our robotics/AI project. It has been great so far, with all the tools and code it provides, but today I ran into a small (big?) problem.

First, a short conceptual explanation of how our system is designed. We receive input on a socket port, which is basically visual data. A ROS node (the visual node) reads from this port, does some processing, and publishes the data on a topic, ready for consumption. This happens at 60Hz. Another node, the control node, consumes this visual data and sends instructions out over a serial port. This also happens at 60Hz. So far so good; this is standard ROS stuff, and besides some minor caveats we've had no problems using this model.

Now here comes my problem. Some buffering takes place in the visual node when publishing the processed data. At 60Hz we receive visual data from the socket, and at 60Hz we call publisher.publish(processedVisualData). What sometimes happens is that new data arrives from the socket before the previously published data has actually been sent to the control node. I know this for sure because the control node sometimes shows the following sequence (function calls and output reconstructed from print statements throughout my program):

  • "Start spinning..."
  • ros::spinOnce()
  • visualUpdateCallback();
  • "Ros spinning complete"
  • runControl()
  • "Control finished."
  • "Start spinning..."
  • ros::spinOnce()
  • "Ros spinning complete"
  • runControl()
  • "Control finished."

What happens here is that, because of the buffering in the visual node (or in the TCP layer of the subscriber? I've been looking at the ROS source code a lot lately and I've seen buffers in literally every class I look at), the control node sometimes runs without a new visual update, even though that update was already handed to the publisher in the visual node. So even though both nodes run at 60Hz (I've confirmed this with rostopic hz; the throughput of both visual messages and control messages is almost 60Hz), the control node effectively runs at 30Hz, because every other cycle it has to reuse "stale" data.

There are a few possible solutions I've found:

  • Set the queue size of the publisher in the visual node to one. That ensures that every time a message is sent, it is guaranteed to be the most up-to-date one. The buffering delay is somehow still in place, however, so control still runs at 30Hz. (I tested this by only running the control function when new data arrived. Its output was 30Hz even though the visual update frequency was 60Hz.)
  • Produce more messages. I guess this can work, but there are two limitations:
    • The visual input really runs strictly at 60Hz. I can't make it produce more data.
    • Buffering is still in place. So even if I could crank the visual data up to 100Hz, then my control node will still run at a percentage of that. That's bad because it makes performance unpredictable: for some topics, if ...
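To make the first option in the list above concrete, here is a minimal sketch of a bounded queue that throws away the oldest message when it is full, which is how I understand roscpp queues to behave. `DropOldestQueue` is a hypothetical name for illustration and is not roscpp code. With a capacity of one, a reader always gets the newest message, but nothing in the queue itself makes that message arrive any sooner.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a bounded message queue (not roscpp code).
// Assumption: when the queue is full, the oldest message is discarded.
template <typename T>
class DropOldestQueue {
public:
    // A capacity of 0 is treated as 1 to keep the sketch simple.
    explicit DropOldestQueue(std::size_t capacity)
        : capacity_(capacity == 0 ? 1 : capacity) {}

    // Add a message. Returns true if an older message had to be dropped.
    bool push(const T& msg) {
        bool dropped = false;
        if (items_.size() >= capacity_) {
            items_.pop_front();  // discard the oldest entry
            dropped = true;
        }
        items_.push_back(msg);
        return dropped;
    }

    // Take the oldest remaining message. Returns false if the queue is empty.
    bool pop(T& out) {
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop_front();
        return true;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::size_t capacity_;
    std::deque<T> items_;
};
```

Pushing 1 and then 2 into a `DropOldestQueue<int>` of capacity one leaves only 2 in the queue, so freshness is guaranteed while delivery latency is untouched.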

Comments

"Change line 215 on [link] from true to false"

I'm confused here: according to this and the page you link, immediate_write is already false.

gvdhoorn (2017-06-20 05:13:41 -0500)

Whoops, that's on me. Of course I meant to write true. Changing it now.

bobismijnnaam (2017-06-20 05:14:41 -0500)

And a question: have you looked into the transport hints? Especially tcpNoDelay?

gvdhoorn (2017-06-20 05:15:18 -0500)
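For context on the tcpNoDelay hint mentioned in this comment: at the socket level, "no delay" normally means setting the TCP_NODELAY option, which turns off Nagle's algorithm so that small writes are sent immediately instead of being coalesced. The sketch below uses plain POSIX calls and assumes a Linux-like system. The helper names `setTcpNoDelay` and `getTcpNoDelay` are made up for illustration and are not part of roscpp.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: enable or disable TCP_NODELAY on a TCP socket.
// Returns true if setsockopt succeeded.
bool setTcpNoDelay(int fd, bool enabled) {
    int flag = enabled ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)) == 0;
}

// Hypothetical helper: read the option back.
// Returns 1 if set, 0 if not set, -1 on error.
int getTcpNoDelay(int fd) {
    int flag = 0;
    socklen_t len = sizeof(flag);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) != 0) {
        return -1;
    }
    return flag != 0 ? 1 : 0;
}
```

In roscpp itself you would not touch the socket directly. The subscriber asks for this behaviour by passing `ros::TransportHints().tcpNoDelay()` as the transport hints argument of `NodeHandle::subscribe`.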

I assumed tcpNoDelay is true always and would only be false if you want to test your application under dire circumstances. Is this not the case?

bobismijnnaam (2017-06-20 05:16:47 -0500)

As can be seen on http://docs.ros.org/api/roscpp/html/classros_1_1TransportHints.html#a03191a9987162fca0ae2c81fa79fcde9, I suppose I'm right? Please correct me if I'm wrong.

bobismijnnaam (2017-06-20 05:19:09 -0500)

And a high-level comment: I think what you are seeing is the classic event-based vs. polling (or periodic) system clash, i.e. on the one hand you have an async comms pattern (pub-sub) and on the other a sync, periodic 'control system'. Marrying those is always a bit of a challenge.

gvdhoorn (2017-06-20 05:20:34 -0500)
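The clash described in this comment can be illustrated with a small simulation that needs no ROS at all. It is only one possible explanation of the trace in the question, not a diagnosis. The function names and the timestamps in the usage note are invented for illustration. The idea: messages are produced at a fixed rate but arrive with some jitter, and a periodic loop counts how many arrived since its previous poll.

```cpp
#include <cstddef>
#include <vector>

// For each poll time, count the messages that arrived after the previous
// poll and up to (and including) this one. Both inputs must be sorted
// in ascending order and use the same time unit.
std::vector<int> arrivalsPerPoll(const std::vector<long>& arrivals,
                                 const std::vector<long>& polls) {
    std::vector<int> counts;
    std::size_t next = 0;  // index of the first arrival not yet consumed
    for (long poll : polls) {
        int n = 0;
        while (next < arrivals.size() && arrivals[next] <= poll) {
            ++n;
            ++next;
        }
        counts.push_back(n);
    }
    return counts;
}

// Number of polls that saw no new message, i.e. cycles where a periodic
// control loop would have to reuse stale data.
int countEmptyPolls(const std::vector<int>& counts) {
    int empty = 0;
    for (int n : counts) {
        if (n == 0) ++empty;
    }
    return empty;
}
```

With made-up arrival times {1, 12, 19, 31, 39} and polls at {10, 20, 30, 40, 50}, `arrivalsPerPoll` returns {1, 2, 0, 2, 0}: five messages in five periods, so the average rate matches, yet two of the five polls see nothing new. With a queue size of one, the polls that see two messages would deliver only the latest of them.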

"I assumed tcpNoDelay is true always and would only be false if you want to test your application under dire circumstances. Is this not the case?"

AFAIK, tcpNoDelay is not enabled by default. That is also not what the code shows: the function argument has a default value of true, but that is a different thing from the hint being enabled by default.

gvdhoorn (2017-06-20 05:22:31 -0500)
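The distinction made in this comment, between a default argument on a setter and the default state of the object, can be shown with a small sketch. `HintsSketch` is a hypothetical class that only imitates the general shape of a hints builder. It is not the roscpp implementation, and the `"tcp_nodelay"` key is an assumption made for the example.

```cpp
#include <map>
#include <string>

// Hypothetical class for illustration only (not ros::TransportHints).
class HintsSketch {
public:
    // The default argument applies only when the setter is called without
    // an argument: tcpNoDelay() is shorthand for tcpNoDelay(true).
    HintsSketch& tcpNoDelay(bool nodelay = true) {
        options_["tcp_nodelay"] = nodelay ? "true" : "false";
        return *this;
    }

    // If the setter was never called, the option is absent and the
    // getter reports false.
    bool getTCPNoDelay() const {
        auto it = options_.find("tcp_nodelay");
        return it != options_.end() && it->second == "true";
    }

private:
    std::map<std::string, std::string> options_;
};
```

A freshly constructed `HintsSketch` reports false from `getTCPNoDelay()`. It reports true only after `tcpNoDelay()` has been called, which is the point being made about the `true` seen in the function signature.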

About the high-level comment: I agree. With the information I have now, after almost a year of development, I'm not sure I would pick ROS's publisher/subscriber system again. At the very least I would go with nodelets, at least for the critical stuff. (For debugging infrastructure, pub/sub is great.)

bobismijnnaam (2017-06-20 05:24:45 -0500)