how to get asynchronous publish working on ROS2 Galactic with FastRTPS

asked 2023-02-11 14:03:37 -0500

Bernd Pfrommer

updated 2023-02-13 09:40:59 -0500

I'm observing the publisher blocking when sending over a slow link. This behavior is unlike any other pub/sub middleware I've used before (including ROS1): just running a subscriber on a remote node causes the publish rate of the publisher to drop:

 [1676142719.758977019] [test_publisher]: size: 1000000 rate: 40.00 bw:  319.995 Mbits/s
 [1676142720.783972503] [test_publisher]: size: 1000000 rate: 40.00 bw:  320.002 Mbits/s
 [1676142721.817557893] [test_publisher]: size: 1000000 rate: 26.12 bw:  208.979 Mbits/s
 [1676142722.849145421] [test_publisher]: size: 1000000 rate: 12.60 bw:  100.823 Mbits/s
 [1676142723.903613207] [test_publisher]: size: 1000000 rate: 14.22 bw:  113.796 Mbits/s
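The bw column appears to be simply message size × rate × 8 bits. A quick sketch (the helper name is hypothetical, not taken from the OP's repo) reproduces the numbers: at 40 Hz the 1 MB messages need about 320 Mbit/s, roughly three times what the link sustained in the last lines of the log.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bandwidth in Mbit/s (1 Mbit = 1e6 bits) needed for
// messages of `size_bytes` published at `rate_hz`.
double bandwidth_mbits(std::size_t size_bytes, double rate_hz) {
  assert(rate_hz >= 0.0);
  const double bits_per_message = static_cast<double>(size_bytes) * 8.0;
  return bits_per_message * rate_hz / 1e6;
}
```

For example, `bandwidth_mbits(1000000, 12.60)` gives about 100.8, matching the fourth log line.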

Yes, this is over a WiFi link, but the WiFi is working just fine; it is simply out of bandwidth. In this case I would expect the publisher to drop messages, but not to block in publish(). The latter behavior is really bad if you want to, e.g., transmit camera images that the robot is also using for state estimation. After first seeing this in ROS2 Galactic with the default Cyclone DDS, I switched over to FastRTPS using asynchronous mode, following these instructions. No matter what I try, the publisher always blocks when there is insufficient network bandwidth. I played around with QoS policies and kernel buffer memory settings, but to no avail. Here is the code for creating the publisher. Looks pretty standard:

  // depth 1 (KEEP_LAST); the remaining settings come from the default profile (reliable, volatile)
  rclcpp::QoS qos(1);
  pub_ = create_publisher<StringMsg>("test_string", qos);
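The "drop, don't block" behaviour expected here can be sketched as a single-slot "latest value" mailbox: publish() only overwrites the slot and returns, while a separate (slow) sender thread takes whatever is newest when the link is free. This is a generic illustration with hypothetical names, not how rclcpp or Fast DDS implement asynchronous publishing.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical single-slot mailbox: publish() never waits for the network.
// If the sender has not taken the previous message yet, it is overwritten
// (dropped) and counted.
class LatestOnlyMailbox {
 public:
  void publish(std::string msg) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (slot_) ++dropped_;  // previous message was never sent
      slot_ = std::move(msg);
    }
    cv_.notify_one();
  }

  // Called by the sender thread; only the sender ever waits here.
  std::string take() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return slot_.has_value(); });
    std::string msg = std::move(*slot_);
    slot_.reset();
    return msg;
  }

  std::size_t dropped() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return dropped_;
  }

 private:
  mutable std::mutex mutex_;
  std::condition_variable cv_;
  std::optional<std::string> slot_;
  std::size_t dropped_ = 0;
};
```

A camera callback would call `publish()` at full rate, while a sender thread loops on `take()` and writes to the network; a slow link then lowers the delivered rate instead of stalling the callback.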

The variables were set as follows:

export RMW_IMPLEMENTATION=rmw_fastrtps_cpp
export FASTRTPS_DEFAULT_PROFILES_FILE=`pwd`/SyncAsync.xml
export RMW_FASTRTPS_USE_QOS_FROM_XML=1
export RMW_FASTRTPS_PUBLICATION_MODE=ASYNCHRONOUS
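According to the rmw_fastrtps README quoted in the comments, an unset RMW_FASTRTPS_PUBLICATION_MODE behaves as ASYNCHRONOUS on Galactic, while the Humble default is SYNCHRONOUS. A small sketch of that rule, assuming only the two distros discussed in this thread (the function and its `distro` parameter are hypothetical; rmw_fastrtps reads the variable itself):

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper mirroring the README rule quoted in the comments:
// an explicit SYNCHRONOUS/ASYNCHRONOUS value wins; if the variable is unset,
// Galactic behaves as ASYNCHRONOUS and Humble as SYNCHRONOUS.
std::string effective_publication_mode(const char* env_value,
                                       const std::string& distro) {
  if (env_value != nullptr) {
    const std::string value(env_value);
    if (value == "SYNCHRONOUS" || value == "ASYNCHRONOUS") return value;
    return "OTHER";  // any other value: check the README for your distro
  }
  if (distro == "galactic") return "ASYNCHRONOUS";
  if (distro == "humble") return "SYNCHRONOUS";
  return "UNKNOWN";  // distros not discussed in this thread
}

// Reads the variable from the current process environment.
std::string effective_publication_mode_from_env(const std::string& distro) {
  return effective_publication_mode(
      std::getenv("RMW_FASTRTPS_PUBLICATION_MODE"), distro);
}
```

On Galactic, then, exporting ASYNCHRONOUS explicitly should make no difference compared to leaving the variable unset.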

and this is the xml config file:

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <!-- default publisher profile -->
    <publisher profile_name="default_publisher" is_default_profile="true">
        <historyMemoryPolicy>DYNAMIC</historyMemoryPolicy>
        <qos>
            <publishMode>
                <kind>ASYNCHRONOUS</kind>
            </publishMode>
        </qos>
    </publisher>
</profiles>

The code can be found in a little GitHub repo here.

[Edit] With new keywords for searching provided by one of the answers, I've discovered that this has been noticed by others already: https://github.com/ros2/rmw_fastrtps/... [/Edit]


Comments

I can't find any references right now, but isn't async publishing the default for at least FastRTPS? AFAIK, Cyclone doesn't support it.

re: even with async you see 'blockages': according to this, default QoS would be: keep last, queue depth (history) of 10, reliable, volatile. Even if the RMW has an internal queue it uses to serialise messages to, QoS might still be causing the behaviour you describe.

Have you tried configuring a sensor QoS? That could result in dropped messages (especially with large payloads), but should not block the sender.

Edit: according to ros2/rmw_fastrtps/README.md@galactic:

If RMW_FASTRTPS_PUBLICATION_MODE is not set, then both rmw_fastrtps_cpp and rmw_fastrtps_dynamic_cpp behave as if it were set to ASYNCHRONOUS.

so the config .xml would not change anything.

[..]

gvdhoorn ( 2023-02-12 01:53:25 -0500 )edit
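To make the profiles in this comment concrete, here is a plain-C++ stand-in (not the rclcpp types) for the three settings in play: the OP's `rclcpp::QoS qos(1)`, which sets depth 1 on top of the default profile and so stays reliable and volatile; the default profile as described above (keep last 10, reliable, volatile); and `rclcpp::SensorDataQoS` (keep last 5, best effort, volatile). The struct and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

enum class Reliability { Reliable, BestEffort };
enum class Durability { Volatile, TransientLocal };

// Plain-C++ stand-in for the QoS fields discussed in the thread
// (history is KEEP_LAST in all three cases, so only the depth is stored).
struct QosSketch {
  std::size_t keep_last_depth;
  Reliability reliability;
  Durability durability;
};

// rclcpp::QoS qos(1): depth 1 on top of the default profile.
constexpr QosSketch kOpProfile{1, Reliability::Reliable, Durability::Volatile};
// Default profile as described in the comment: keep last 10, reliable, volatile.
constexpr QosSketch kDefaultProfile{10, Reliability::Reliable, Durability::Volatile};
// rclcpp::SensorDataQoS: keep last 5, best effort, volatile.
constexpr QosSketch kSensorDataProfile{5, Reliability::BestEffort, Durability::Volatile};

// A reliable writer tracks acknowledgements and retransmits lost samples;
// a best-effort writer sends each sample once and never waits for acks.
constexpr bool waits_for_acks(const QosSketch& q) {
  return q.reliability == Reliability::Reliable;
}
```

Only the sensor data profile avoids acknowledgement traffic entirely, which is why it is the usual first suggestion for large, high-rate payloads.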

[..] Note that it has changed. The Humble default is SYNCHRONOUS. And an observation:

This behavior is unlike any other pub/sub middleware I've used before (including ROS1).

Pedantic perhaps, but DDS QoS is rather complex, and it's perfectly possible for this to happen. I believe the default in many DDS implementations is fully synchronous, and behaviour with lossy links is entirely dependent on QoS parameters.

This seems to describe what you observe:

Nevertheless, the ASYNCHRONOUS publishing mode could also block if the History is filled and it is not possible to add a new change. To prevent this from happening, it is important to tune correctly the Quality of Service Policies related to the History management (Resource Limits, History, Reliability and Durability) [..]

[..]

gvdhoorn ( 2023-02-12 02:00:35 -0500 )edit
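The quoted Fast DDS note can be illustrated with a toy model of a writer history (entirely hypothetical; it does not reproduce Fast DDS's internal logic). When the history is full and the oldest sample may not be discarded, there is no room for a new sample and a real writer would have to wait, even in ASYNCHRONOUS mode. When the QoS allows discarding, the oldest sample is replaced instead.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

enum class AddResult { Added, ReplacedOldest, WouldBlock };

// Toy writer history. `can_evict_oldest` stands in for whatever the QoS
// allows (e.g. KEEP_LAST may drop old samples, while a reliable KEEP_ALL
// writer has to hold them until they are acknowledged).
class ToyHistory {
 public:
  ToyHistory(std::size_t capacity, bool can_evict_oldest)
      : capacity_(capacity), can_evict_oldest_(can_evict_oldest) {
    assert(capacity_ > 0);
  }

  AddResult add(int sample) {
    if (samples_.size() < capacity_) {
      samples_.push_back(sample);
      return AddResult::Added;
    }
    if (can_evict_oldest_) {
      samples_.pop_front();  // oldest sample is dropped
      samples_.push_back(sample);
      return AddResult::ReplacedOldest;
    }
    return AddResult::WouldBlock;  // a real writer would wait here
  }

  // Simulates a reader acknowledging the oldest outstanding sample.
  void acknowledge_oldest() {
    if (!samples_.empty()) samples_.pop_front();
  }

  std::size_t size() const { return samples_.size(); }

 private:
  std::size_t capacity_;
  bool can_evict_oldest_;
  std::deque<int> samples_;
};
```

On a saturated link the acknowledgements arrive slowly, so in the non-evicting case every `add()` past the capacity has to wait for `acknowledge_oldest()`, which is the blocking the OP observes.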

[..] And the (main) author of Cyclone provides an insightful overview of the behaviour of a (hypothetical) dds_publish(..) in this comment.

gvdhoorn ( 2023-02-12 02:11:40 -0500 )edit

I tried replacing the QoS line above with:

rclcpp::SensorDataQoS qos;

and set RMW_FASTRTPS_PUBLICATION_MODE=ASYNCHRONOUS and for good measure also provided an XML file, but the call is still blocking. I'll next try the non_blocking_send flag as suggested in the answer.

Bernd Pfrommer ( 2023-02-13 11:02:33 -0500 )edit


Comments on the answer

Could you add a hint on how the OP could do this in a ROS 2 context? I don't believe that flag is exposed by any infrastructure in rclcpp.

gvdhoorn ( 2023-02-13 04:16:22 -0500 )edit

This flag can be set in the XML configuration file. The link to the documentation above has an example. This configuration should be added to the XML configuration file the user is already using.

JLBuenoLopez-eProsima ( 2023-02-13 04:33:57 -0500 )edit
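For readers wondering what a non-blocking send means at the socket level: on a non-blocking socket, a send that finds the kernel buffer full fails immediately with EAGAIN/EWOULDBLOCK instead of waiting, so the datagram is effectively dropped and the caller keeps running. A minimal POSIX sketch for a connected datagram socket (helper names are hypothetical; this says nothing about how Fast DDS implements its non_blocking_send option internally):

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

enum class SendResult { Sent, DroppedBufferFull, Error };

// Switches an existing socket to non-blocking mode; returns false on failure.
bool set_nonblocking(int fd) {
  int flags = ::fcntl(fd, F_GETFL, 0);
  return flags >= 0 && ::fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Sends one datagram on a connected socket without ever waiting: if the
// kernel buffer is full, the datagram is dropped and the caller moves on.
SendResult try_send(int fd, const void* data, std::size_t len) {
  ssize_t n = ::send(fd, data, len, 0);
  if (n >= 0) return SendResult::Sent;
  if (errno == EAGAIN || errno == EWOULDBLOCK) return SendResult::DroppedBufferFull;
  return SendResult::Error;
}
```

The trade-off is the one the OP runs into next: the sender is decoupled, but whatever does not fit into the buffer is lost.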

Thanks for the pointers! By using this XML file I can indeed decouple sender and receiver. Alas, the UDP protocol is not suitable for transferring image frames with a size of 4.4 MB over a link that drops packets (because of congestion control or otherwise). Is something similar possible with TCP? This worked in ROS1. Can FastRTPS be configured to perform similarly?

Bernd Pfrommer ( 2023-02-13 14:09:34 -0500 )edit
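A back-of-the-envelope calculation shows why large frames suffer on a lossy link when nothing is retransmitted: a 4.4 MB frame must be split into many fragments, and the frame is only usable if every fragment arrives. Assuming 64,000-byte fragments (a hypothetical size; the real value depends on the transport's maximum message size) that are each lost independently with probability p, the chance of a complete frame is (1 - p)^n.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Number of fragments needed for a message of `message_bytes` when each
// fragment carries at most `fragment_bytes` of payload (ceiling division).
std::size_t fragments_needed(std::size_t message_bytes, std::size_t fragment_bytes) {
  assert(fragment_bytes > 0);
  return (message_bytes + fragment_bytes - 1) / fragment_bytes;
}

// Probability that all `n` fragments arrive when each one is lost
// independently with probability `p` and nothing is retransmitted.
double prob_complete(std::size_t n, double p) {
  assert(p >= 0.0 && p <= 1.0);
  return std::pow(1.0 - p, static_cast<double>(n));
}
```

Under these assumptions a 4.4 MB frame needs 69 fragments, and 1% fragment loss already leaves only about half of all frames complete. A reliable writer recovers the missing fragments by retransmitting them, which in turn adds load to a link that is already saturated.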

TCP transport has its own reliability protocol and Fast DDS does not handle it. Maybe setting a Flow Controller may help in your use case. I have pointed to the Flow Controllers version applying to ROS 2 Galactic (Fast DDS v2.3.x branch). Please, be aware that Flow Controllers were refactored in Fast DDS v2.4.0 so the latest Fast DDS documentation does not apply in ROS 2 Galactic.

JLBuenoLopez-eProsima ( 2023-02-14 00:53:26 -0500 )edit
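Conceptually, a flow controller caps how many bytes a writer puts on the wire per time period so that the link is not flooded. The toy limiter below (hypothetical names, a fixed-window model, not Fast DDS's implementation or configuration parameters) shows the idea: a budget of bytes per period, with sends that exceed it deferred to the next period.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy fixed-window throughput limiter: at most `bytes_per_period` bytes
// may be sent in each period of `period_ms` milliseconds.
class ThroughputLimiter {
 public:
  ThroughputLimiter(std::size_t bytes_per_period, std::uint64_t period_ms)
      : bytes_per_period_(bytes_per_period), period_ms_(period_ms) {
    assert(period_ms_ > 0);
  }

  // Returns true if `bytes` may be sent at time `now_ms`; otherwise the
  // caller should defer the send to a later period.
  bool try_consume(std::size_t bytes, std::uint64_t now_ms) {
    const std::uint64_t window = now_ms / period_ms_;
    if (window != current_window_) {  // new period: the budget refills
      current_window_ = window;
      used_ = 0;
    }
    if (used_ + bytes > bytes_per_period_) return false;
    used_ += bytes;
    return true;
  }

 private:
  std::size_t bytes_per_period_;
  std::uint64_t period_ms_;
  std::uint64_t current_window_ = 0;
  std::size_t used_ = 0;
};
```

A budget of 1,000,000 bytes every 100 ms corresponds to 80 Mbit/s, below the roughly 100 Mbit/s the link sustained in the OP's log. Note that a message larger than the per-period budget never passes this toy; a real controller would have to send it fragment by fragment.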

@JLBuenoLopez-eProsima: if UDP packets are lost due to whatever reason, would flow controllers still work? Wireless networks are prime examples of links which drop packets/frames due to other reasons than congestion, which flow controllers appear to address mostly (by artificially limiting the maximum bandwidth).

Should @Bernd Pfrommer not be looking at the QoS configuration?

gvdhoorn ( 2023-02-14 01:39:59 -0500 )edit

Working with DDS (FastRTPS or others) makes me feel dumb. For instance, now I cannot even get a basic TCP transport working. The config file I'm using has a transport like the one below, but when I use it, "ros2 node list" no longer shows any nodes (even after stopping and restarting the ros2 daemon). I suppose that means discovery is no longer working, but why? I desperately need a working example, not pages of docs, sorry.

  <transport_descriptors>
    <transport_descriptor>
      <transport_id>nonblocking_tcp_transport</transport_id>
      <type>TCPv4</type>
      <maxMessageSize>10000</maxMessageSize>
      <sendBufferSize>92160</sendBufferSize>
      <listening_ports>
        <port>7400</port>
      </listening_ports>
      <interfaceWhiteList>
        <address>10.42.0.1</address>
        <address>127.0.0.1</address>
      </interfaceWhiteList>
      <wan_addr>10.42.0.1</wan_addr>
    </transport_descriptor>
  </transport_descriptors>

Bernd Pfrommer ( 2023-02-14 12:53:53 -0500 )edit

The ROS 2 daemon should also be configured with the TCP transport in order to discover the nodes and topics. With the TCP transport there is no multicast, so new entities are not discovered out of the box unless they know which peers to contact. I have made a note to add a tutorial on how to configure a TCP transport in a ROS 2 environment. I know you are not looking for more documentation, but right now it is the only way I can help you. You may consider looking at these tutorials that use eProsima's ROS 2 Router.

JLBuenoLopez-eProsima ( 2023-02-15 00:16:49 -0500 )edit

That link takes me to Vulcanexus, and now I'm no longer certain which parts of that documentation are specific to Vulcanexus and which apply to ROS2. I realize that I'll have to gain a fairly deep understanding of FastRTPS just to pull a stream of images over WiFi. Not what I'd hoped for, but I'll sink the time in. To be continued...

Bernd Pfrommer ( 2023-02-15 10:21:47 -0500 )edit
