ROS2 Multithreaded executor fails to invoke subscriber callback

asked 2020-09-16 14:28:02 -0500

cxrandolph

updated 2020-09-16 14:37:23 -0500

Problem

I have created a simple ROS2 application with a set of timers, publishers, and subscribers. Shortly after launch, one subscriber callback stops being called, and this happens specifically with the rclcpp MultiThreadedExecutor. The example is provided at the end of my question.

Application

  • The application uses a single executor. All topics carry std_msgs::msg::Int64 as a dummy test message.
  • Node Radar: Publishes to "radar_topic" at 0.5Hz. Publisher QoS: reliable
  • Node Lidar: Publishes to "lidar_topic" at 0.125Hz. Publisher QoS: reliable
  • Node Control:
    1. Callback obstacle_detect subscribed to "radar_topic" -> Busy waits for some seconds then publishes to "topic_brake"
    2. Callback local_planning subscribed to "lidar_topic" -> Busy waits for some seconds (does no publishing)
    3. Callback on_brake subscribed to "topic_brake" -> No publishing or busy waiting (just prints)
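For readers unfamiliar with the term, the busy wait described for obstacle_detect and local_planning can be sketched in standard C++ roughly as follows. This is a minimal illustration and not the code from the demo package: the helper name busy_wait_for and the choice of std::chrono::steady_clock are assumptions made for this sketch.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not taken from the demo package): spin on the CPU
// until `duration` has elapsed, without sleeping or yielding.
// Returns the time actually spent spinning.
inline std::chrono::steady_clock::duration
busy_wait_for(std::chrono::steady_clock::duration duration)
{
  assert(duration >= std::chrono::steady_clock::duration::zero());
  const auto start = std::chrono::steady_clock::now();
  auto elapsed = std::chrono::steady_clock::duration::zero();
  while (elapsed < duration) {
    // Re-read the monotonic clock on every iteration.
    elapsed = std::chrono::steady_clock::now() - start;
  }
  return elapsed;
}
```

Unlike std::this_thread::sleep_for, a loop like this keeps the calling thread fully occupied, so a thread running such a callback cannot do any other work until the loop ends.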

I expect to see the callback on_brake invoked now and then, since its source callback is triggered at 0.5Hz (every 2 seconds), while the other, unrelated callback is triggered at only 0.125Hz (every 8 seconds).
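To make the timing explicit, the publish rates can be converted to periods with a one-line helper. This is only a sketch of the arithmetic; the function name period_seconds is an assumption and does not appear in the demo package.

```cpp
#include <cassert>

// Hypothetical helper: convert a publish rate in Hz to the period
// between consecutive messages, in seconds.
inline double period_seconds(double rate_hz)
{
  assert(rate_hz > 0.0);
  return 1.0 / rate_hz;
}
```

period_seconds(0.5) gives 2.0 and period_seconds(0.125) gives 8.0, so "radar_topic" receives a message every 2 seconds and "lidar_topic" every 8 seconds, i.e. four radar messages for each lidar message.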

Outcome

Only the SingleThreadedExecutor keeps running on_brake, and even then only occasionally. The MultiThreadedExecutor may run it once or twice, but then stalls permanently and never runs it again, although it continues to run the other callbacks.

Testing

+-------------+---------+------+------------+------------------------+---------+
| OS (Ubuntu) | Install | ROS  | DDS        | Executor Class         | Success |
+-------------+---------+------+------------+------------------------+---------+
| 20.04.1     | Bin     | Foxy | FastRTPS   | MultiThreadedExecutor  | N       |
+-------------+---------+------+------------+------------------------+---------+
| 20.04.1     | Bin     | Foxy | FastRTPS   | SingleThreadedExecutor | Y       |
+-------------+---------+------+------------+------------------------+---------+
| 20.04.1     | Bin     | Foxy | CycloneDDS | MultiThreadedExecutor  | N       |
+-------------+---------+------+------------+------------------------+---------+
| 20.04.1     | Bin     | Foxy | CycloneDDS | SingleThreadedExecutor | Y       |
+-------------+---------+------+------------+------------------------+---------+
| 18.04.5     | Src     | Foxy | FastRTPS   | MultiThreadedExecutor  | N       |
+-------------+---------+------+------------+------------------------+---------+
| 18.04.5     | Src     | Foxy | FastRTPS   | SingleThreadedExecutor | Y       |
+-------------+---------+------+------------+------------------------+---------+

Summary

MultiThreadedExecutor ceases to invoke the on_brake callback after several iterations, despite the load being quite light.

Reproducibility

The demo is a C++ package that goes into your workspace and can be compiled with colcon build --packages-select eventchains. You may run it with ros2 run eventchains executor_1

On a higher-end system, the MultiThreadedExecutor may invoke on_brake() up to about ten times before it stops; on a lower-end system, it never executes the callback at all. Only the SingleThreadedExecutor eventually executes the callback.

The website does not allow ZIP attachments (for obvious security reasons), so I am providing a link to the GitHub folder containing the package instead.


Comments

Could you clarify whether the messages are actually being published? Are there any problems with the timers? There were some issues with multithreaded executors and timers in Dashing (e.g., https://github.com/ros2/rclcpp/issues...). I can confirm that Eloquent was affected too. In Foxy, however, timers work with the MultiThreadedExecutor (except for this issue in Foxy).

Here is a suggestion to use callback groups, but honestly I cannot see how to apply it when there are several nodes.

rrrand (2020-09-23 06:41:39 -0500)

@rrrand Well, in this example the callbacks doing the publishing were indeed completing. However, some of the callbacks subscribed to those topics were never called. Usually they would run a few times at first, then fall quiet. The program then keeps looping through the callbacks that do the publishing, but never runs those subscribed to the topics. It seems as if some kind of starvation or priority issue is occurring.

cxrandolph (2020-09-25 12:50:27 -0500)