
Is a single process ROS2 system always reliable?

asked 2019-02-21 10:47:37 -0500 by alsora (updated 2019-02-21 10:47:58 -0500)

Hi,

I have a single process application which contains several nodes, publishers and subscribers.

Assume that the subscribers spin as fast as possible, while the publishers publish at 100 Hz.
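
For concreteness, here is a minimal rclcpp sketch of the kind of setup I mean (recent API; node, topic, and message type are just placeholders):

    #include <chrono>
    #include <memory>

    #include "rclcpp/rclcpp.hpp"
    #include "std_msgs/msg/string.hpp"

    using namespace std::chrono_literals;

    int main(int argc, char ** argv)
    {
      rclcpp::init(argc, argv);

      // Two nodes living in the same process.
      auto pub_node = std::make_shared<rclcpp::Node>("producer");
      auto sub_node = std::make_shared<rclcpp::Node>("consumer");

      auto publisher = pub_node->create_publisher<std_msgs::msg::String>("data", 10);

      // Publish at 100 Hz.
      auto timer = pub_node->create_wall_timer(10ms, [publisher]() {
        std_msgs::msg::String msg;
        msg.data = "tick";
        publisher->publish(msg);
      });

      // The subscriber just consumes as fast as the executor lets it.
      auto subscription = sub_node->create_subscription<std_msgs::msg::String>(
        "data", 10,
        [](std_msgs::msg::String::UniquePtr msg) { (void)msg; });

      // One executor spinning both nodes keeps everything in a single process.
      rclcpp::executors::SingleThreadedExecutor executor;
      executor.add_node(pub_node);
      executor.add_node(sub_node);
      executor.spin();

      rclcpp::shutdown();
      return 0;
    }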

Are there any differences, from the reliability point of view, between setting the Quality of Service options to Reliable or Best Effort?

Does enabling intraprocess communication have any influence on this (regardless of whether I'm publishing shared pointers or not)?

Thank you


1 Answer


answered 2019-02-21 18:31:10 -0500 by Geoff

Even if your entire application is contained within a single process, it may not be guaranteed that all data will be delivered, depending on the DDS implementation you use.

If you use FastRTPS (the default), then even when nodes are in the same process, data transfers still go through the operating system's network stack. This is because:

  • By default, a UDPv4 transport endpoint is created for each publisher/subscriber.
  • FastRTPS does not yet have a shared memory transport. It's on their roadmap and they may even be getting close, though I have no further information than that.

As a consequence of going through the operating system's network stack, even though the loopback interface will probably be used, operating system buffers get used. If your publishers are publishing data faster than the subscribers can read it, those buffers may overflow and data may be lost. FastRTPS's internal buffers may also cause data loss if they overflow.

If you use Connext DDS, then it does have a shared memory transport option. This is used automatically by Connext DDS if it is enabled (which it is by default) and if it is usable. Usability is defined as "on the same computing node", so it should certainly be used if all your nodes are in the same process. In this case operating system network buffers are not involved, so only the DDS implementation's own buffers are relevant. There is still a risk of data loss if those buffers overflow.

In both cases, you can reduce the risk of data loss, even without setting the reliability QoS to Reliable, by setting the history QoS to keep all samples and by making sure your buffers are large enough for your application.
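
As a sketch, that setting looks like this with the rclcpp::QoS API (assuming an rclcpp version that provides it; the topic name, message type, and helper function are illustrative):

    #include "rclcpp/rclcpp.hpp"
    #include "std_msgs/msg/string.hpp"

    // Keep every sample in history while staying best effort; loss then only
    // happens if the middleware's own resource limits are exceeded.
    rclcpp::Publisher<std_msgs::msg::String>::SharedPtr
    make_keep_all_publisher(rclcpp::Node & node)
    {
      rclcpp::QoS qos(rclcpp::KeepAll());  // history: KEEP_ALL
      qos.best_effort();                   // reliability deliberately left best effort
      return node.create_publisher<std_msgs::msg::String>("data", qos);
    }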

However, if you want to be really sure that data loss won't happen, then set the QoS to Reliable (and make sure your DDS buffers are big enough to handle spikes). This is good practice anyway because it makes it clear that you want all data. If someone later comes along and changes the application's behaviour so that it publishes faster, or changes its structure so that it goes across a network, then you can still expect all data to be received. Remember, that someone might be you in two years wondering why you built things the way you did!
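
The reliable variant would look something like this (again a sketch with placeholder names; the depth of 100 is an arbitrary value you would size for your expected bursts):

    #include "rclcpp/rclcpp.hpp"
    #include "std_msgs/msg/string.hpp"

    // Reliable delivery with a bounded history sized for bursts. Note that the
    // subscription side should also request Reliable for the guarantee to hold.
    rclcpp::Publisher<std_msgs::msg::String>::SharedPtr
    make_reliable_publisher(rclcpp::Node & node)
    {
      rclcpp::QoS qos(rclcpp::KeepLast(100));  // depth sized for expected spikes
      qos.reliable();
      return node.create_publisher<std_msgs::msg::String>("data", qos);
    }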

I'm not sure how complete the intraprocess communication implementation is for things like buffering, but currently the documentation states that publish/subscribe when using intraprocess communication is best effort, which implies that there may be data loss if you don't run the nodes in lock-step.
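
For reference, this is how you opt a node into intraprocess communication (a sketch assuming an rclcpp version that provides rclcpp::NodeOptions; the node name is a placeholder):

    #include <memory>

    #include "rclcpp/rclcpp.hpp"

    int main(int argc, char ** argv)
    {
      rclcpp::init(argc, argv);

      // Nodes created with this option exchange messages with other
      // intra-process-enabled nodes in the same process without going
      // through the DDS layer.
      auto options = rclcpp::NodeOptions().use_intra_process_comms(true);
      auto node = std::make_shared<rclcpp::Node>("my_node", options);

      rclcpp::spin(node);
      rclcpp::shutdown();
      return 0;
    }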

