how to get asynchronous publish working on ROS2 Galactic with FastRTPS
I'm observing the publisher block when sending over a slow link. This behavior is unlike any other pub/sub middleware I've used before (including ROS1). Just running a subscriber on a remote node causes the publish rate of the publisher to drop:
[1676142719.758977019] [test_publisher]: size: 1000000 rate: 40.00 bw: 319.995 Mbits/s
[1676142720.783972503] [test_publisher]: size: 1000000 rate: 40.00 bw: 320.002 Mbits/s
[1676142721.817557893] [test_publisher]: size: 1000000 rate: 26.12 bw: 208.979 Mbits/s
[1676142722.849145421] [test_publisher]: size: 1000000 rate: 12.60 bw: 100.823 Mbits/s
[1676142723.903613207] [test_publisher]: size: 1000000 rate: 14.22 bw: 113.796 Mbits/s
Yes, this is over a Wi-Fi link, but the Wi-Fi is working just fine; it is simply out of bandwidth. In this case I would expect the publisher to drop messages, not block in publish(). The latter behavior is really bad if you want to e.g. transmit camera images that the robot is also using for state estimation. After first seeing this in ROS2 Galactic with the default Cyclone DDS, I switched over to fastrtps using asynchronous mode, following these instructions. No matter what I try, the publisher always blocks when there is insufficient network bandwidth. I played around with QoS policies and kernel buffer memory settings, but to no avail. Here is the code for creating the publisher; it looks pretty standard:
rclcpp::QoS qos(1);
pub_ = create_publisher<StringMsg>("test_string", qos);
The variables were set as follows:
export RMW_IMPLEMENTATION=rmw_fastrtps_cpp
export FASTRTPS_DEFAULT_PROFILES_FILE=`pwd`/SyncAsync.xml
export RMW_FASTRTPS_USE_QOS_FROM_XML=1
export RMW_FASTRTPS_PUBLICATION_MODE=ASYNCHRONOUS
and this is the xml config file:
<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <!-- default publisher profile -->
    <publisher profile_name="default_publisher" is_default_profile="true">
        <historyMemoryPolicy>DYNAMIC</historyMemoryPolicy>
        <qos>
            <publishMode>
                <kind>ASYNCHRONOUS</kind>
            </publishMode>
        </qos>
    </publisher>
</profiles>
The code can be found in a little GitHub repo here
[Edit] With new keywords for searching provided by one of the answers, I've discovered that this has been noticed by others already: https://github.com/ros2/rmw_fastrtps/issues/460 [/Edit]
Asked by Bernd Pfrommer on 2023-02-11 15:03:37 UTC
Answers
In this case I would expect the publisher to drop messages, but not block in publish().
Fast DDS allows configuring the UDP transport with a non_blocking_send flag in the UDP transport descriptor.
More information about how to configure a custom UDP transport can also be found in the Fast DDS documentation.
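As a minimal sketch (assuming the Fast DDS XML schema for transport descriptors; the transport_id and profile names are arbitrary), the flag can be enabled by declaring a custom UDP transport in the profiles file and attaching it to the default participant:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <transport_descriptors>
        <transport_descriptor>
            <!-- custom UDP transport that drops data instead of blocking on send -->
            <transport_id>udp_non_blocking</transport_id>
            <type>UDPv4</type>
            <non_blocking_send>true</non_blocking_send>
        </transport_descriptor>
    </transport_descriptors>
    <participant profile_name="default_participant" is_default_profile="true">
        <rtps>
            <userTransports>
                <transport_id>udp_non_blocking</transport_id>
            </userTransports>
            <!-- disable the builtin transports so only the custom one is used -->
            <useBuiltinTransports>false</useBuiltinTransports>
        </rtps>
    </participant>
</profiles>
```

With RMW_FASTRTPS_USE_QOS_FROM_XML=1 and FASTRTPS_DEFAULT_PROFILES_FILE pointing at this file, the participant should pick up the non-blocking transport; verify element names against the Fast DDS version shipped with your ROS 2 distribution.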
Asked by JLBuenoLopez-eProsima on 2023-02-13 01:55:49 UTC
Comments
Could you add a hint on how the OP could do this in a ROS 2 context? I don't believe that flag is exposed by any infrastructure in rclcpp.
Asked by gvdhoorn on 2023-02-13 05:16:22 UTC
This flag can be set in the XML configuration file. The link to the documentation above has an example. This configuration should be added to the XML configuration file the user is already using.
Asked by JLBuenoLopez-eProsima on 2023-02-13 05:33:57 UTC
Thanks for the pointers! By using this xml file I can indeed decouple sender and receiver. Alas, the UDP protocol is not suitable for transferring image frames with a size of 4.4MB over a link that drops packets (because of congestion control or otherwise). Is something similar possible with TCP? This worked in ROS1. Can fastrtps be configured to perform similarly?
Asked by Bernd Pfrommer on 2023-02-13 15:09:34 UTC
The TCP transport has its own reliability protocol and Fast DDS does not handle it. Setting a Flow Controller may help in your use case. I have pointed to the Flow Controllers version applying to ROS 2 Galactic (Fast DDS v2.3.x branch). Please be aware that Flow Controllers were refactored in Fast DDS v2.4.0, so the latest Fast DDS documentation does not apply to ROS 2 Galactic.
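If I recall the Fast DDS v2.3.x schema correctly, a flow controller was configured per publisher via a throughputController element; a sketch (the byte/period values below are purely illustrative) might look like:

```xml
<publisher profile_name="throttled_publisher" is_default_profile="true">
    <qos>
        <publishMode>
            <kind>ASYNCHRONOUS</kind>
        </publishMode>
    </qos>
    <!-- cap the publisher at roughly 10 MB/s: 100 KB every 10 ms -->
    <throughputController>
        <bytesPerPeriod>102400</bytesPerPeriod>
        <periodMillisecs>10</periodMillisecs>
    </throughputController>
</publisher>
```

Note this limits bandwidth proactively; it does not by itself recover packets lost on a lossy link, so it is a complement to, not a substitute for, reliability QoS.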
Asked by JLBuenoLopez-eProsima on 2023-02-14 01:53:26 UTC
@JLBuenoLopez-eProsima: if UDP packets are lost due to whatever reason, would flow controllers still work? Wireless networks are prime examples of links which drop packets/frames due to other reasons than congestion, which flow controllers appear to address mostly (by artificially limiting the maximum bandwidth).
Should @Bernd Pfrommer not be looking at the QoS configuration?
Asked by gvdhoorn on 2023-02-14 02:39:59 UTC
Working with DDS (fastrtps or others) makes me feel dumb. For instance now I cannot even get a basic tcp transport working. The config file I'm using has a transport like below, but when I use it "ros2 node list" no longer shows any nodes (even after starting/stopping ros2 daemons). I suppose that means discovery is no longer working, but why? I desperately need a working example, not pages of docs, sorry.
<transport_descriptors>
    <transport_descriptor>
        <transport_id>nonblocking_tcp_transport</transport_id>
        <type>TCPv4</type>
        <maxMessageSize>10000</maxMessageSize>
        <sendBufferSize>92160</sendBufferSize>
        <listening_ports>
            <port>7400</port>
        </listening_ports>
        <interfaceWhiteList>
            <address>10.42.0.1</address>
            <address>127.0.0.1</address>
        </interfaceWhiteList>
        <wan_addr>10.42.0.1</wan_addr>
    </transport_descriptor>
</transport_descriptors>
Asked by Bernd Pfrommer on 2023-02-14 13:53:53 UTC
The ROS 2 daemon should also be configured with the TCP transport in order to discover the nodes and topics. With the TCP transport there is no multicast, so there is no out-of-the-box discovery of new entities if those entities do not know where to ping. I take note about adding a tutorial on how to configure a TCP transport in a ROS 2 environment. I know you are not looking for more documentation, but right now it is the only way I can help you. You may also consider looking at these tutorials that use eProsima's ROS 2 Router.
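A sketch of what the client-side participant might look like (assuming the Fast DDS locator XML schema; the address and port reuse the values from the transport snippet above), pointing the participant at the server's listening port via the initial peers list since multicast discovery is unavailable over TCP:

```xml
<participant profile_name="tcp_client" is_default_profile="true">
    <rtps>
        <userTransports>
            <transport_id>nonblocking_tcp_transport</transport_id>
        </userTransports>
        <useBuiltinTransports>false</useBuiltinTransports>
        <builtin>
            <!-- no multicast over TCP: tell this participant where the server listens -->
            <initialPeersList>
                <locator>
                    <tcpv4>
                        <address>10.42.0.1</address>
                        <physical_port>7400</physical_port>
                    </tcpv4>
                </locator>
            </initialPeersList>
        </builtin>
    </rtps>
</participant>
```

The daemon and the ros2 CLI tools would need to be started with the same FASTRTPS_DEFAULT_PROFILES_FILE so they join the same TCP discovery graph.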
Asked by JLBuenoLopez-eProsima on 2023-02-15 01:16:49 UTC
That link takes me to Vulcanexus, and now I'm no longer certain which parts of that documentation are specific to Vulcanexus and which apply to ROS2. I realize that I'll have to gain a fairly deep understanding of FastRTPS just to pull a stream of images over wifi. Not what I'd hoped for, but I'll sink the time in. To be continued...
Asked by Bernd Pfrommer on 2023-02-15 11:21:47 UTC
Nope, I just gave up. At the moment I do not have the time to figure out how to configure async fastrtps via tcp.
Asked by Bernd Pfrommer on 2023-08-03 15:35:53 UTC
Comments
I can't find any references right now, but isn't async publishing the default for at least FastRTPS? AFAIK, Cyclone doesn't support it.
re: even with async you see 'blockages': according to this, default QoS would be: keep last, queue depth (history) of 10, reliable, volatile. Even if the RMW has an internal queue it uses to serialise messages to, QoS might still be causing the behaviour you describe.
Have you tried configuring a sensor QoS? That could result in dropped messages (especially with large payloads), but should not block the sender. Edit: according to ros2/rmw_fastrtps/README.md@galactic: [..] so the config .xml would not change anything. [..]
Asked by gvdhoorn on 2023-02-12 02:53:25 UTC
[..] Note that it has changed. The Humble default is SYNCHRONOUS. And an observation: pedantic perhaps, but DDS QoS is rather complex. It's perfectly possible for this to happen. I believe the default in many DDS implementations is fully synchronous, and behaviour with lossy links is entirely dependent on QoS parameters.
This seems to describe what you observe:
[..]
Asked by gvdhoorn on 2023-02-12 03:00:35 UTC
[..] And the (main) author of Cyclone provides an insightful overview of the behaviour of a (hypothetical) dds_publish(..) in this comment.
Asked by gvdhoorn on 2023-02-12 03:11:40 UTC
I tried replacing the QoS line above with:
rclcpp::SensorDataQoS qos;
and set RMW_FASTRTPS_PUBLICATION_MODE=ASYNCHRONOUS, and for good measure also provided an XML file, but the call still blocks. I'll next try the non_blocking_send flag as suggested in the answer.
Asked by Bernd Pfrommer on 2023-02-13 12:02:33 UTC