How does ROS 2 implement its network design?

asked 2019-03-22 07:10:02 -0500

huchaohong

updated 2019-03-24 07:36:54 -0500

lucasw

In "Why ROS 2.0", there is a design goal for the ROS 2 network:

we want ROS to behave as well as is possible when network connectivity degrades due to loss and/or delay, from poor-quality WiFi to ground-to-space communication links.

I am curious about how ros achieves this goal? If the network is poor, what will ros2 do, discard messages?


2 Answers


answered 2019-03-22 12:56:32 -0500

William

I am curious about how ros achieves this goal?

The issue with lossy networks and ROS 1 was that it used TCP almost exclusively. If you lost data, TCP would try to resend it, which would further stress the network, and you could end up saturating the network and not keeping up at all. This hurt especially because the common use case was streaming some sensor data over WiFi to a workstation to visualize it in rviz, in which case you don't care if you miss a few messages. ROS 1 does have a UDP transport, but it had several issues, for example being unreliable for large data and not being supported uniformly (Python never supported it).

DDS has both unreliable ("best effort" in DDS terms) and reliable communication, and it degrades gracefully: a reliable publisher can send data to an unreliable subscriber (but not the other way around). More importantly, DDS's reliable communication happens over UDP with a custom protocol on top (DDSI-RTPS), which has the advantage over TCP that you can control things like how long it will retry sending data, how long it will wait for a NAK, and how it will buffer data before sending (like Nagle's algorithm).
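A minimal sketch of the matching rule described above, assuming the usual DDS convention that a publisher must offer at least the reliability a subscriber requests. The `Reliability` enum and `reliability_compatible` function are hypothetical names for illustration, not part of any ROS 2 or DDS API.

```python
from enum import IntEnum


class Reliability(IntEnum):
    """Hypothetical stand-in for a DDS reliability setting (higher is stricter)."""
    BEST_EFFORT = 0  # "unreliable": samples may be lost
    RELIABLE = 1     # lost samples are resent


def reliability_compatible(offered: Reliability, requested: Reliability) -> bool:
    """Return True if a publisher offering `offered` can match a
    subscriber requesting `requested` (offered must be at least as strict)."""
    return offered >= requested


if __name__ == "__main__":
    # Print the full compatibility table for the two settings.
    for pub in Reliability:
        for sub in Reliability:
            ok = reliability_compatible(pub, sub)
            print(f"publisher={pub.name:<11} subscriber={sub.name:<11} match={ok}")
```

The only pairing that fails is a best-effort publisher with a reliable subscriber, which is the "not the other way around" case.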

Basically, the idea is that DDS's configuration options allow it to be many things between TCP and simple UDP, including a more flexible version of TCP, which in turn allows you to fine tune your communication settings to better work on lossy networks.
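To make the "between TCP and simple UDP" idea concrete, here is a rough back-of-the-envelope model. It assumes each transmission is lost independently with a fixed probability and that acknowledgements are never lost, which real networks do not guarantee. The function names and the `max_retries` parameter are hypothetical and do not correspond to a specific DDS setting.

```python
def delivery_probability(loss_rate: float, max_retries: int) -> float:
    """Probability that a message arrives within 1 + max_retries attempts,
    assuming independent losses (an idealized model)."""
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    # The message is lost only if every attempt fails.
    return 1.0 - loss_rate ** (max_retries + 1)


def expected_transmissions(loss_rate: float, max_retries: int) -> float:
    """Average number of sends per message when stopping at first success.
    Attempt k+1 happens only if the first k failed: sum of p^k for k=0..n."""
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    return (1.0 - loss_rate ** (max_retries + 1)) / (1.0 - loss_rate)


if __name__ == "__main__":
    # max_retries=0 behaves like plain UDP; a large budget approaches TCP-like behavior.
    for retries in (0, 1, 2, 5):
        p = delivery_probability(0.2, retries)
        n = expected_transmissions(0.2, retries)
        print(f"retries={retries}: delivered={p:.4f}, avg sends={n:.3f}")
```

With zero retries the model is UDP-like (some loss, no extra traffic), while a larger retry budget trades extra traffic for a higher delivery rate. Being able to pick that trade-off is the flexibility described above.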

This comes at the cost of complexity and some performance (TCP on the local host is _really_ good), but should allow knowledgeable users to get good results in more situations.

If the network is poor, what will ros2 do, discard messages?

To answer this more directly, I'll cop out and say "it depends". If you're using unreliable, the messages will be discarded. If you're using reliable then, just like ROS 1 and TCP, it will try to send them until your system resource limits are reached, at which point it will discard them. The only difference, as I mentioned above, is that with DDS you can know when they are discarded and have more control over when they will be discarded and how it will try to resend them.
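As a toy illustration of "try to send until a limit is reached, then discard, and let the user know", here is a sketch of a bounded buffer of unacknowledged messages. The `BoundedResendQueue` class and its `on_drop` callback are hypothetical; real DDS implementations differ in detail and, depending on configuration, may block the publisher instead of dropping.

```python
from collections import deque
from typing import Callable, Deque, List


class BoundedResendQueue:
    """Toy model of a reliable sender that keeps at most `depth`
    unacknowledged messages and reports any message it has to discard."""

    def __init__(self, depth: int, on_drop: Callable[[str], None]) -> None:
        if depth < 1:
            raise ValueError("depth must be >= 1")
        self._depth = depth
        self._on_drop = on_drop
        self._unacked: Deque[str] = deque()

    def publish(self, msg: str) -> None:
        """Queue a message; if the buffer is full, discard the oldest one
        and notify the caller through on_drop."""
        if len(self._unacked) >= self._depth:
            self._on_drop(self._unacked.popleft())
        self._unacked.append(msg)

    def acknowledge(self, msg: str) -> None:
        """Remove a message once the receiver confirms it (no-op if absent)."""
        try:
            self._unacked.remove(msg)
        except ValueError:
            pass

    def pending(self) -> List[str]:
        """Messages still waiting for acknowledgement, oldest first."""
        return list(self._unacked)


if __name__ == "__main__":
    dropped: List[str] = []
    queue = BoundedResendQueue(depth=2, on_drop=dropped.append)
    for m in ("a", "b", "c"):  # nothing is acknowledged, as on a dead link
        queue.publish(m)
    print("pending:", queue.pending(), "dropped:", dropped)
```

The point of the callback is the "you can know when they are discarded" part: the application gets a chance to react instead of silently losing data.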

Hope that answers your questions somewhat.



Thanks for your detailed explanation, I can understand the design more clearly now.

huchaohong ( 2019-03-22 19:13:51 -0500 )

Is there anything in the works that could make future ROS2 versions achieve that really good ROS1 TCP performance level for inter-process localhost communications?

lucasw ( 2019-03-24 08:24:21 -0500 )

You can use the intraprocess transport, which is zero-copy for nodes in the same process, or you could direct your DDS implementation to use TCP if it has support for it (Connext DDS does, for example). Depending on the implementation you may still have marshalling (again, Connext does), but it should use the loopback interface. If you must have no marshalling and local-only maximum performance, then intraprocess is the way to go.

Geoff ( 2019-03-25 18:41:48 -0500 )

It would be nice if TCP could be selected on a topic by topic basis (not clear from looking at - is a node a 'domain participant'? Or a topic?). Or even better use tcp locally by default and udp when needed to communicate with other systems even on the same topic.

I suppose if the tools (especially python ones) for intra-process become nearly seamless it's less of an issue. Making the command line, rviz, and gui tools work intra process seems challenging- each one would spawn nodes inside of every local process they are trying to interact with to get at the data without having to use the udp dds?

lucasw ( 2019-03-26 12:20:05 -0500 )

Yes, there are "locators" for TCP in addition to UDP or UDP multicast or even shared memory, it just depends on the implementation if it supports it. I think Fast-RTPS also has a TCP option https://eprosima-fast-rtps.readthedoc... . However, it's not as awesome as you'd like because it still has to do the RTPS framing which is redundant with a lot of what is in the TCP headers. Depending on the implementation and the size of the messages, it might be much better or only marginally better. More work needs to be done here to see what the benefit might be.

William ( 2019-03-28 01:24:28 -0500 )

answered 2019-03-22 07:33:47 -0500

gvdhoorn

updated 2019-03-22 07:36:59 -0500

I guess it's not really a (complete) answer, but: ROS 2 uses DDS (RTPS really though) as its default middleware.

So essentially the answer to your question would be: whatever DDS does to deal with lossy links / poor network performance.

Edit: related article on the ROS 2 design site: ROS on DDS (note the admonition at the top of the article).



I know ROS2 uses DDS but I am not familiar with DDS. I didn't get any clues from ROS on DDS. I think there should be more explanation on how network works according to ROS2's design instead of telling people ROS2 uses DDS and you should refer to DDS.

huchaohong ( 2019-03-22 08:15:20 -0500 )

Just to make sure: are you asking how ROS 2 domain concepts (ie: publishers, subscribers, etc) map onto DDS domain concepts?

I think there should be more explanation on how network works according to ROS2's design instead of telling people ROS2 uses DDS and you should refer to DDS.

What I was trying to avoid was writing a long answer. By telling you that DDS is used, I implied that it would make sense to look up documentation about how DDS works, as that would avoid us here on ROS Answers duplicating all that information.

You would probably agree that duplicating information is not a good idea, right?

gvdhoorn ( 2019-03-22 08:20:55 -0500 )

Perhaps some of the main ROS 2 authors can add something here.

gvdhoorn ( 2019-03-22 08:29:46 -0500 )

No, I am not asking about how publishers, subscribers, etc. map onto DDS domain concepts. ROS2 says it supports poor-quality WiFi while ROS1 can't. I want to know how ROS2 handles a poor network and why ROS1 can't. As a user, I am familiar with publishers and subscribers, but when ROS2 talks about poor-quality WiFi, I realized that I had never thought about network capability. If this means I have to look up the DDS documentation, I will try.

huchaohong ( 2019-03-22 08:34:20 -0500 )

I've tried finding some resources that explain this clearly, but either my google fu is weakened, or the Friday 5pm effect is setting in. RTI has some good articles on how DDS can deal with lossy networks, so you may want to start looking there.

There is a ROS 2 document on Quality of Service, which may offer some insight (but it's more documentation of a demo than an overview article): Use quality-of-service settings to handle lossy networks.

gvdhoorn ( 2019-03-22 12:05:47 -0500 )

Thanks for sharing, I will dig more into DDS to learn its design.

huchaohong ( 2019-03-22 19:11:28 -0500 )


Asked: 2019-03-22 07:10:02 -0500

Seen: 1,620 times

Last updated: Mar 24 '19