ROS2 Multithreaded executor fails to invoke subscriber callback
Problem
I have created a simple ROS2 application with a set of timers, publishers, and subscribers. However, one callback stops being called shortly after launch, specifically when using the rclcpp MultiThreadedExecutor. The example is provided at the end of my question.
Application
- The application uses a single executor. The message type is std_msgs::msg::Int64 (a dummy test message).
- Node Radar: publishes to "radar_topic" at 0.5 Hz. Publisher QoS: reliable.
- Node Lidar: publishes to "lidar_topic" at 0.125 Hz. Publisher QoS: reliable.
- Node Control:
  - Callback obstacle_detect, subscribed to "radar_topic" -> busy waits for some seconds, then publishes to "topic_brake".
  - Callback local_planning, subscribed to "lidar_topic" -> busy waits for some seconds (does no publishing).
  - Callback on_brake, subscribed to "topic_brake" -> no publishing or busy waiting (just prints).

I expect to see the callback on_brake invoked now and then, seeing as the source callback for it runs at 0.5 Hz, and the unrelated other callbacks run at 0.125 Hz (every 8 seconds only).
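To make the timing concrete, here is a minimal sketch in plain C++ (no ROS dependency) of the two ingredients described above: converting a publish rate to a timer period, and a busy wait that keeps the calling thread occupied. The helper names `period_from_hz` and `busy_wait` are hypothetical and are not taken from the demo package; the actual callbacks may be implemented differently.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: convert a publish rate in Hz to a timer period.
// 0.5 Hz gives 2000 ms, 0.125 Hz gives 8000 ms.
inline std::chrono::milliseconds period_from_hz(double hz)
{
  return std::chrono::milliseconds(static_cast<std::int64_t>(1000.0 / hz));
}

// Hypothetical helper: spin on the CPU until `duration` has elapsed.
// Unlike std::this_thread::sleep_for, the thread never yields, so an
// executor thread running this inside a callback stays fully occupied.
inline std::uint64_t busy_wait(std::chrono::milliseconds duration)
{
  const auto deadline = std::chrono::steady_clock::now() + duration;
  std::uint64_t iterations = 0;
  while (std::chrono::steady_clock::now() < deadline) {
    ++iterations;  // count loop passes; the value itself is not important
  }
  return iterations;
}
```

With these rates, obstacle_detect is triggered every 2 seconds and local_planning every 8 seconds, so on_brake should have something to process roughly every 2 seconds as long as obstacle_detect completes.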
Outcome
Only the SingleThreadedExecutor runs on_brake, and then only occasionally. The MultiThreadedExecutor may run it once or twice, but then permanently chokes and never runs it again. However, it continues to run the other callbacks.
Testing
+-------------+---------+------+------------+------------------------+---------+
| OS (Ubuntu) | Install | ROS  | DDS        | Executor Class         | Success |
+-------------+---------+------+------------+------------------------+---------+
| 20.04.1     | Binary  | Foxy | FastRTPS   | MultiThreadedExecutor  | N       |
+-------------+---------+------+------------+------------------------+---------+
| 20.04.1     | Binary  | Foxy | FastRTPS   | SingleThreadedExecutor | Y       |
+-------------+---------+------+------------+------------------------+---------+
| 20.04.1     | Binary  | Foxy | CycloneDDS | MultiThreadedExecutor  | N       |
+-------------+---------+------+------------+------------------------+---------+
| 20.04.1     | Binary  | Foxy | CycloneDDS | SingleThreadedExecutor | Y       |
+-------------+---------+------+------------+------------------------+---------+
| 18.04.5     | Source  | Foxy | FastRTPS   | MultiThreadedExecutor  | N       |
+-------------+---------+------+------------+------------------------+---------+
| 18.04.5     | Source  | Foxy | FastRTPS   | SingleThreadedExecutor | Y       |
+-------------+---------+------+------------+------------------------+---------+
Summary
MultiThreadedExecutor ceases to invoke the on_brake callback after several iterations, despite the load being quite light.
Reproducibility
The demo is a C++ package that goes into your workspace and can be compiled with colcon build --packages-select eventchains. You may run it with ros2 run eventchains executor_1.
On a higher-end system, the MultiThreadedExecutor might invoke on_brake() up to ten times before it stops; on a lower-end system it never executes the callback at all. Only the SingleThreadedExecutor eventually executes the callback.
The website does not allow ZIP attachments, for (obvious) security reasons, so I am providing a link to the GitHub folder containing the package.
Could you clarify whether messages are being published? Are there any problems with timers? There were some issues with multithreaded executors and timers in Dashing (e.g., https://github.com/ros2/rclcpp/issues...). I can confirm that Eloquent was affected too. But in Foxy, timers work with the MultiThreadedExecutor (except for this issue in Foxy).
Here is a suggestion to use callback groups, but honestly I cannot grasp how to apply it when there are several nodes.
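As a rough illustration of the callback-group idea, here is a toy model in plain C++; it is not rclcpp code, and the class name `ToyExclusiveGroup` is made up for this sketch. By default, all callbacks of an rclcpp node belong to one mutually exclusive callback group, so an executor will not run two callbacks of the same node at the same time, however many threads it has. The model below mimics only that rule: a group can be claimed by one callback at a time. Whether this is what causes the behaviour reported in the question is an assumption, not something verified here.

```cpp
#include <atomic>
#include <functional>

// Toy stand-in for a mutually exclusive callback group (not rclcpp code).
class ToyExclusiveGroup
{
public:
  // Try to run `callback`. Returns false, without running it, if another
  // callback of this group is still executing.
  bool try_run(const std::function<void()> & callback)
  {
    // Claim the group: exchange() returns the previous value of the flag.
    if (!can_be_taken_from_.exchange(false)) {
      return false;
    }
    callback();
    can_be_taken_from_.store(true);  // release the group again
    return true;
  }

private:
  std::atomic<bool> can_be_taken_from_{true};
};
```

In this model, a callback placed in its own group can still be started while a long-running callback holds the node's default group. In rclcpp itself, a separate group is obtained with `create_callback_group(...)` on the node and handed to a subscription through the `callback_group` field of `rclcpp::SubscriptionOptions`; each node creates and assigns its own groups, so several nodes can be handled the same way, one node at a time.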
@rrand Well, in this example the callbacks doing the publishing were indeed completing. However, some of the callbacks listening to those topics were never called. Usually they would run a few times at first, then fall quiet. The program then loops, running the publishing callbacks over and over, but never running the ones subscribed to the topics. It seems as if some kind of starvation or priority issue is occurring.