
How to get asynchronous publishing working on ROS 2 Galactic with FastRTPS

asked 2023-02-11 14:03:37 -0500

Bernd Pfrommer

updated 2023-02-13 09:40:59 -0500

I'm observing that the publisher blocks when sending over a slow link. This behavior is unlike any other pub/sub middleware I've used before (including ROS1). Just running a subscriber on a remote node causes the publisher's publish rate to drop:

 [1676142719.758977019] [test_publisher]: size: 1000000 rate: 40.00 bw:  319.995 Mbits/s
 [1676142720.783972503] [test_publisher]: size: 1000000 rate: 40.00 bw:  320.002 Mbits/s
 [1676142721.817557893] [test_publisher]: size: 1000000 rate: 26.12 bw:  208.979 Mbits/s
 [1676142722.849145421] [test_publisher]: size: 1000000 rate: 12.60 bw:  100.823 Mbits/s
 [1676142723.903613207] [test_publisher]: size: 1000000 rate: 14.22 bw:  113.796 Mbits/s
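As a sanity check on the log, the bw column appears to be size times rate times 8 bits, in decimal megabits. The helper below (a hypothetical name, not part of the linked test code) recomputes it from the logged size and rate:

```cpp
#include <cassert>
#include <cstddef>

// Throughput implied by one line of the publisher log: message size in
// bytes times publish rate in Hz, times 8 bits per byte, in Mbit/s
// (1 Mbit = 1e6 bits).
double megabits_per_second(std::size_t message_bytes, double rate_hz) {
    assert(rate_hz >= 0.0);  // a publish rate cannot be negative
    return static_cast<double>(message_bytes) * rate_hz * 8.0 / 1e6;
}
```

For example, 1000000 bytes at 40 Hz gives exactly 320 Mbit/s, matching the first two lines, while 12.60 Hz gives about 100.8 Mbit/s (slightly below the logged 100.823 because the printed rate is rounded). The bandwidth drop is therefore entirely the publish rate falling, presumably to whatever the Wi-Fi link can carry.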

Yes, this is over a Wi-Fi link, but the Wi-Fi is working just fine; it is simply out of bandwidth. In this case I would expect the publisher to drop messages, but not to block in publish(). The latter behavior is really bad if you want to, e.g., transmit camera images that the robot is also using for state estimation. After first seeing this in ROS 2 Galactic with the default Cyclone DDS, I switched over to FastRTPS in asynchronous mode following these instructions. No matter what I try, the publisher always blocks when there is insufficient network bandwidth. I played around with QoS policies and kernel buffer memory settings, but to no avail. Here is the code for creating the publisher. It looks pretty standard:
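The behavior asked for here (drop rather than block) can be pictured as a bounded "keep last N" outbox. The sketch below is a hypothetical, single-threaded model (KeepLastOutbox is a made-up name); it is not how rclcpp or FastRTPS implement publish():

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// publish() never waits: when the outbox is full because the slow link has
// not drained it yet, the oldest queued message is dropped. Not thread-safe.
class KeepLastOutbox {
public:
    explicit KeepLastOutbox(std::size_t depth) : depth_(depth) { assert(depth > 0); }

    // Returns immediately; evicts the oldest message if the outbox is full.
    void publish(std::string msg) {
        if (queue_.size() == depth_) {
            queue_.pop_front();
            ++dropped_;
        }
        queue_.push_back(std::move(msg));
    }

    // Called by the (slow) transport when it can send the next message.
    std::optional<std::string> take_for_send() {
        if (queue_.empty()) return std::nullopt;
        std::string msg = std::move(queue_.front());
        queue_.pop_front();
        return msg;
    }

    std::size_t dropped() const { return dropped_; }

private:
    std::size_t depth_;
    std::deque<std::string> queue_;
    std::size_t dropped_ = 0;
};
```

With depth 1, publishing three messages before the transport takes one drops two of them, and the transport then sends only the newest: the publisher's rate is unaffected by the link.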

  rclcpp::QoS qos(1);  // keep last, history depth 1
  pub_ = create_publisher<StringMsg>("test_string", qos);

The environment variables were set as follows:

export RMW_IMPLEMENTATION=rmw_fastrtps_cpp
export FASTRTPS_DEFAULT_PROFILES_FILE=`pwd`/SyncAsync.xml
export RMW_FASTRTPS_USE_QOS_FROM_XML=1
export RMW_FASTRTPS_PUBLICATION_MODE=ASYNCHRONOUS

And this is the XML config file:

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <!-- default publisher profile -->
    <publisher profile_name="default_publisher" is_default_profile="true">
        <historyMemoryPolicy>DYNAMIC</historyMemoryPolicy>
        <qos>
            <publishMode>
                <kind>ASYNCHRONOUS</kind>
            </publishMode>
        </qos>
    </publisher>
</profiles>

The code can be found in a small GitHub repo here.

[Edit] With new search keywords provided by one of the answers, I've discovered that others have already noticed this: https://github.com/ros2/rmw_fastrtps/... [/Edit]


Comments

I can't find any references right now, but isn't async publishing the default for at least FastRTPS? AFAIK, Cyclone doesn't support it.

Re: seeing 'blockages' even with async: according to this, the default QoS is keep last, history (queue) depth 10, reliable, volatile. Even if the RMW serialises messages into an internal queue, the QoS might still be causing the behaviour you describe.

Have you tried configuring a sensor QoS? That could result in dropped messages (especially with large payloads), but should not block the sender.
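To make the QoS comparison concrete, here is a plain-data summary of the three profiles in play. The values are my understanding of rmw_qos_profile_default and rmw_qos_profile_sensor_data (which rclcpp::SensorDataQoS uses); the types are hypothetical stand-ins, not the rclcpp API:

```cpp
#include <cassert>
#include <cstddef>

enum class History { KeepLast, KeepAll };
enum class Reliability { Reliable, BestEffort };
enum class Durability { Volatile, TransientLocal };

struct QosSketch {
    History history;
    std::size_t depth;
    Reliability reliability;
    Durability durability;
};

// Default profile: keep last 10, reliable, volatile.
constexpr QosSketch kDefaultQos{History::KeepLast, 10, Reliability::Reliable, Durability::Volatile};
// Sensor data profile: keep last 5, best effort, volatile.
constexpr QosSketch kSensorDataQos{History::KeepLast, 5, Reliability::BestEffort, Durability::Volatile};
// rclcpp::QoS qos(1) from the question: only the depth changes.
constexpr QosSketch kQuestionQos{History::KeepLast, 1, Reliability::Reliable, Durability::Volatile};

// A reliable writer keeps samples until readers acknowledge them, which is
// the situation in which a full history can stall publish().
constexpr bool waits_for_acknowledgements(const QosSketch& q) {
    return q.reliability == Reliability::Reliable;
}

static_assert(waits_for_acknowledgements(kQuestionQos),
              "qos(1) lowers the depth but stays reliable");
```

The point of the comparison: the question's `rclcpp::QoS qos(1)` is still reliable, whereas the sensor data profile is best effort.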


Edit: according to ros2/rmw_fastrtps/README.md@galactic:

If RMW_FASTRTPS_PUBLICATION_MODE is not set, then both rmw_fastrtps_cpp and rmw_fastrtps_dynamic_cpp behave as if it were set to ASYNCHRONOUS.

so the config .xml would not change anything.

[..]

gvdhoorn (2023-02-12 01:53:25 -0500)

[..] Note that this has changed: the Humble default is SYNCHRONOUS. And an observation:
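The two defaults mentioned here can be summarised as a small decision function. It is a hypothetical helper (nothing like it exists in rmw_fastrtps) that only encodes what these comments state: when RMW_FASTRTPS_PUBLICATION_MODE is unset, Galactic behaves as ASYNCHRONOUS and Humble as SYNCHRONOUS.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class PublicationMode { Synchronous, Asynchronous };
enum class Distro { Galactic, Humble };

// Behaviour when RMW_FASTRTPS_PUBLICATION_MODE is unset.
PublicationMode default_mode(Distro distro) {
    return distro == Distro::Galactic ? PublicationMode::Asynchronous
                                      : PublicationMode::Synchronous;
}

// env_value is the variable's value, or nullptr if it is unset
// (the convention std::getenv uses).
std::optional<PublicationMode> effective_mode(const char* env_value, Distro distro) {
    if (env_value == nullptr) return default_mode(distro);
    const std::string value(env_value);
    if (value == "ASYNCHRONOUS") return PublicationMode::Asynchronous;
    if (value == "SYNCHRONOUS") return PublicationMode::Synchronous;
    return std::nullopt;  // other values are not modeled here
}
```

On Galactic, calling `effective_mode(std::getenv("RMW_FASTRTPS_PUBLICATION_MODE"), Distro::Galactic)` (std::getenv is in `<cstdlib>`) yields Asynchronous whether the variable is unset or set to ASYNCHRONOUS, which is why exporting it made no difference in the question.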

This behavior is unlike any other pub/sub middleware I've used before (including ROS1).

Pedantic perhaps, but DDS QoS is rather complex, and it's perfectly possible for this to happen. I believe the default in many DDS implementations is fully synchronous, and behaviour over lossy links depends entirely on QoS parameters.

This seems to describe what you observe:

Nevertheless, the ASYNCHRONOUS publishing mode could also block if the History is filled and it is not possible to add a new change. To prevent this from happening, it is important to tune correctly the Quality of Service Policies related to the History management (Resource Limits, History, Reliability and Durability) [..]
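The quoted passage can be illustrated with a simplified model of a reliable writer history: samples stay in the history until the reader acknowledges them, and when every slot holds an unacknowledged sample, a write waits up to a maximum blocking time (the idea behind DDS's reliability max_blocking_time). The class and its names are hypothetical; this is not Fast DDS's code.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

class ReliableHistorySketch {
public:
    ReliableHistorySketch(std::size_t depth, std::chrono::milliseconds max_blocking)
        : depth_(depth), max_blocking_(max_blocking) { assert(depth > 0); }

    // Blocks until a slot is free or max_blocking elapses; returns false on
    // timeout. This is where publish() stalls even if sending is asynchronous.
    bool write(std::string sample) {
        std::unique_lock<std::mutex> lock(mutex_);
        const bool has_room = slot_freed_.wait_for(
            lock, max_blocking_, [this] { return unacked_.size() < depth_; });
        if (!has_room) return false;
        unacked_.push_back(std::move(sample));
        return true;
    }

    // Called when the remote reader acknowledges the oldest sample.
    void acknowledge_oldest() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!unacked_.empty()) unacked_.pop_front();
        }
        slot_freed_.notify_one();
    }

private:
    std::size_t depth_;
    std::chrono::milliseconds max_blocking_;
    std::mutex mutex_;
    std::condition_variable slot_freed_;
    std::deque<std::string> unacked_;
};
```

With depth 1, a second write before any acknowledgement waits for the full timeout and fails; once the first sample is acknowledged, the next write goes through immediately. Over a saturated link, acknowledgements arrive late, so every write waits.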

[..]

gvdhoorn (2023-02-12 02:00:35 -0500)

[..] And the (main) author of Cyclone provides an insightful overview of the behaviour of a (hypothetical) dds_publish(..) in this comment.

gvdhoorn (2023-02-12 02:11:40 -0500)

I tried replacing the QoS line above with:

rclcpp::SensorDataQoS qos;

and set RMW_FASTRTPS_PUBLICATION_MODE=ASYNCHRONOUS, and for good measure also provided an XML file, but the call still blocks. I'll next try the non_blocking_send flag suggested in the answer.

Bernd Pfrommer (2023-02-13 11:02:33 -0500)

1 Answer


answered 2023-02-13 00:55:49 -0500

JLBuenoLopez-eProsima

In this case I would expect the publisher to drop messages, but not block in publish().

Fast DDS allows configuring the UDP transport with a non_blocking_send flag in the UDP transport descriptor. More information about configuring a custom UDP transport can also be found in the Fast DDS documentation.
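As I understand it, what non_blocking_send buys is that a send on the transport's socket no longer waits for kernel buffer space: when the buffer is full, the call fails immediately and the datagram is dropped. The sketch below shows that OS-level difference only. It uses a local AF_UNIX datagram socket pair as a stand-in for a UDP socket to a slow peer (an assumption so it runs without a network) and requests non-blocking behaviour per call with MSG_DONTWAIT; Fast DDS instead configures its own sockets from the XML transport descriptor, and its internals are not shown here.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

struct SendStats {
    std::size_t sent = 0;
    std::size_t dropped = 0;
};

// Sends `datagrams` datagrams that nobody reads, so the kernel buffer fills.
// With MSG_DONTWAIT, send() fails with EAGAIN/EWOULDBLOCK instead of
// blocking, and the datagram is simply counted as dropped.
SendStats flood_without_blocking(std::size_t datagrams, std::size_t datagram_size) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) != 0) return {};
    std::vector<char> payload(datagram_size, 'x');
    SendStats stats;
    for (std::size_t i = 0; i < datagrams; ++i) {
        const ssize_t n = send(fds[0], payload.data(), payload.size(), MSG_DONTWAIT);
        if (n >= 0) {
            ++stats.sent;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            ++stats.dropped;  // buffer full: drop instead of waiting
        } else {
            break;  // unexpected error: stop the demo
        }
    }
    close(fds[0]);
    close(fds[1]);
    return stats;
}
```

With a blocking send, the same loop would hang as soon as the buffer filled, which is the socket-level analogue of publish() stalling.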


Comments

Could you add a hint on how the OP could do this in a ROS 2 context? I don't believe that flag is exposed by any infrastructure in rclcpp.

gvdhoorn (2023-02-13 04:16:22 -0500)

This flag can be set in the XML configuration file; the documentation linked above has an example. This configuration should be added to the XML configuration file the user is already using.

JLBuenoLopez-eProsima (2023-02-13 04:33:57 -0500)

Thanks for the pointers! By using this XML file I can indeed decouple sender and receiver. Alas, UDP is not suitable for transferring 4.4 MB image frames over a link that drops packets (because of congestion control or otherwise). Is something similar possible with TCP? This worked in ROS1. Can FastRTPS be configured to perform similarly?
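A back-of-the-envelope model shows why large frames suffer over a lossy link without retransmission. The sketch assumes a frame is split into fixed-size packets, each lost independently with the same probability, and that lost packets are never resent (roughly the best-effort case). The function name and the numbers used below are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Probability that every packet of a frame arrives, i.e. that the frame can
// be reassembled, under independent losses and no retransmission.
double frame_delivery_probability(std::size_t frame_bytes, std::size_t packet_bytes,
                                  double loss_rate) {
    assert(packet_bytes > 0 && loss_rate >= 0.0 && loss_rate <= 1.0);
    // Number of packets, rounding up for a partial last packet.
    const std::size_t packets = (frame_bytes + packet_bytes - 1) / packet_bytes;
    return std::pow(1.0 - loss_rate, static_cast<double>(packets));
}
```

For instance, with an assumed 1% loss rate and 1500-byte packets, a 4.4 MB frame needs 2934 packets, and the chance that all of them arrive is on the order of 1e-13. A reliable protocol (TCP, or reliable DDS) avoids this by resending lost packets, at the cost of delay and back-pressure on the sender.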

Bernd Pfrommer (2023-02-13 14:09:34 -0500)

The TCP transport has its own reliability protocol, and Fast DDS does not handle it. A Flow Controller may help in your use case. I have linked the Flow Controllers documentation for the version used by ROS 2 Galactic (Fast DDS v2.3.x branch). Please be aware that Flow Controllers were refactored in Fast DDS v2.4.0, so the latest Fast DDS documentation does not apply to ROS 2 Galactic.
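Conceptually, a flow controller caps how many bytes a writer may put on the wire per time period. The sketch below is a simplified bytes-per-period limiter with hypothetical names; it illustrates the idea only and is neither the Fast DDS flow-controller implementation nor its configuration syntax.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

class PeriodBudget {
public:
    PeriodBudget(std::size_t bytes_per_period, std::chrono::milliseconds period)
        : bytes_per_period_(bytes_per_period), period_(period),
          remaining_(bytes_per_period) {
        assert(bytes_per_period > 0 && period.count() > 0);
    }

    // Try to send `bytes` at time `now` (measured from an arbitrary start).
    // Returns true and consumes budget if it fits in the current period.
    bool try_send(std::size_t bytes, std::chrono::milliseconds now) {
        const auto index = now / period_;  // which period `now` falls in
        if (index != period_index_) {      // new period: refill the budget
            period_index_ = index;
            remaining_ = bytes_per_period_;
        }
        if (bytes > remaining_) return false;  // wait for the next period
        remaining_ -= bytes;
        return true;
    }

private:
    std::size_t bytes_per_period_;
    std::chrono::milliseconds period_;
    std::size_t remaining_;
    std::chrono::milliseconds::rep period_index_ = 0;
};
```

For scale: a 4.4 MB frame is 35.2 Mbit, so a link sustaining about 100 Mbit/s (roughly what the log in the question shows) can carry at most about 2.8 such frames per second before protocol overhead. A cap set below the link's real capacity keeps the writer from overrunning it, but it does nothing about packets lost for reasons other than congestion.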

JLBuenoLopez-eProsima (2023-02-14 00:53:26 -0500)

@JLBuenoLopez-eProsima: if UDP packets are lost for whatever reason, would flow controllers still work? Wireless networks are prime examples of links that drop packets/frames for reasons other than congestion, which is mostly what flow controllers appear to address (by artificially limiting the maximum bandwidth).

Should @Bernd Pfrommer not be looking at the QoS configuration?

gvdhoorn (2023-02-14 01:39:59 -0500)

Working with DDS (FastRTPS or others) makes me feel dumb. For instance, I now cannot even get a basic TCP transport working. The config file I'm using has a transport like the one below, but when I use it, "ros2 node list" no longer shows any nodes (even after stopping and restarting the ros2 daemon). I suppose that means discovery is no longer working, but why? I desperately need a working example, not pages of docs, sorry.

  <transport_descriptors>
    <transport_descriptor>
      <transport_id>nonblocking_tcp_transport</transport_id>
      <type>TCPv4</type>
      <maxMessageSize>10000</maxMessageSize>
      <sendBufferSize>92160</sendBufferSize>
      <listening_ports>
        <port>7400</port>
      </listening_ports>
      <interfaceWhiteList>
        <address>10.42.0.1</address>
        <address>127.0.0.1</address>
      </interfaceWhiteList>
      <wan_addr>10.42.0.1</wan_addr>
    </transport_descriptor>
  </transport_descriptors>

Bernd Pfrommer (2023-02-14 12:53:53 -0500)

The ROS 2 daemon must also be configured with the TCP transport in order to discover the nodes and topics. The TCP transport has no multicast, so there is no out-of-the-box discovery of new entities unless they know which peers to contact. I have taken note to add a tutorial on configuring a TCP transport in a ROS 2 environment. I know you are not looking for more documentation, but right now it is the only way I can help you. You may consider looking at these tutorials that use eProsima's ROS 2 Router.

JLBuenoLopez-eProsima (2023-02-15 00:16:49 -0500)

That link takes me to Vulcanexus, and now I'm no longer certain which parts of that documentation are specific to Vulcanexus and which apply to ROS 2. I realize that I'll have to gain a fairly deep understanding of FastRTPS just to pull a stream of images over Wi-Fi. Not what I'd hoped for, but I'll sink the time in. To be continued...

Bernd Pfrommer (2023-02-15 10:21:47 -0500)

