[ROS2][Humble] async_send_request service result occasionally never returns when using executors. [closed]

asked 2022-06-23 10:11:20 -0600 by MarcoB

updated 2022-06-23 11:06:31 -0600

Hi guys, I am dealing with the following strange behaviour:

One node periodically (every 100 ms, from a dedicated thread) and sequentially calls N different services (4 in my example) using N clients. Each call is made with async_send_request and waited for with wait_for on the returned future. The node is added to an executor (tested with both MultiThreaded and SingleThreaded).

Everything apparently works fine for a while, but occasionally wait_for hangs and never gets the result, even though the server generated a response. This happens much more frequently if the load on the system is increased (using, for example, an online multi-thread stress test like silver.urih) or on slow machines.

Adding each client to a different callback group doesn't solve the issue, even with a multithreaded executor (though it reduces the frequency). Using a timer to execute the calls periodically instead of a thread doesn't help either. If the service is called again after the wait_for timeout expires, it will usually respond properly.

I am currently running on a docker container (based on osrf/ros:humble-desktop).


Here is an example of client implementation running into the issue (the whole code can be found on https://github.com/MarcoMatteoBassa/r...)

#include <memory>
#include <string>
#include <thread>
#include <vector>

#include "example_interfaces/srv/add_two_ints.hpp"
#include "rclcpp/rclcpp.hpp"

using namespace std::placeholders;
using namespace std::chrono_literals;

class ServiceClientTest : public rclcpp::Node {
 public:
  ServiceClientTest(const rclcpp::NodeOptions& options) : rclcpp::Node("service_client", options) {
    auto servers = declare_parameter("servers_names", std::vector<std::string>{});
    for (auto& server_name : servers) {
      // Adding one callback group per client, or one for all clients, doesn't solve the issue.
      // auto callback_group_client = create_callback_group(
      //     rclcpp::CallbackGroupType::MutuallyExclusive);  // Using Reentrant doesn't help
      // callback_groups_clients_.push_back(callback_group_client);
      rclcpp::Client<example_interfaces::srv::AddTwoInts>::SharedPtr client =
          create_client<example_interfaces::srv::AddTwoInts>(
              server_name, rmw_qos_profile_services_default);  //, callback_group_client);
      clients_.push_back(client);
    }
    node_test_thread_ = std::make_shared<std::thread>(&ServiceClientTest::callThread, this);
  }

  ~ServiceClientTest() {
    if (node_test_thread_->joinable()) node_test_thread_->join();
  }

 private:
  // Using a timer instead of a thread doesn't help.
  void callThread() {
    while (rclcpp::ok()) {
      for (auto& client : clients_) {
        if (!client->wait_for_service(3s))
          RCLCPP_ERROR_STREAM(get_logger(), "Waiting service failed");  // Never happens
        auto request = std::make_shared<example_interfaces::srv::AddTwoInts::Request>();
        RCLCPP_INFO_STREAM(get_logger(), "Calling service " << client->get_service_name());
        auto result = client->async_send_request(request);  // Specifying a callback doesn't help
        if (result.wait_for(3s) != std::future_status::ready) {  // More time doesn't help
          RCLCPP_ERROR_STREAM(
              get_logger(), "Waiting result failed for server " << client->get_service_name());
        } else {
          RCLCPP_INFO_STREAM(
              get_logger(), "Waiting succeeded for server " << client->get_service_name());
        }
      }
      std::this_thread::sleep_for(100ms);  // call the services every 100 ms
    }
  }

  std::vector<rclcpp::Client<example_interfaces::srv::AddTwoInts>::SharedPtr> clients_;
  std::vector<rclcpp::CallbackGroup::SharedPtr>
      callback_groups_clients_;  // Using only one cb group for all the clients also doesn't help
  std::shared_ptr<std::thread> node_test_thread_;
};

int main(int argc, char** argv) {
  rclcpp::init(argc, argv);
  // The more threads are assigned, the less likely the deadlock is when using multiple
  // callback groups. Confirmed to happen with 5 threads.
  rclcpp::executors::MultiThreadedExecutor executor(rclcpp::ExecutorOptions(), 4);
  auto test_node = std::make_shared<ServiceClientTest>(rclcpp::NodeOptions());
  executor.add_node(test_node);
  executor.spin();
  rclcpp::shutdown();
  return 0;
}
The servers code isn't ... (more)


Closed as a duplicate question by MarcoB (close date 2022-06-24 06:03:36)


There have been some issues with FastRTPS in Humble (see this discourse thread and ros2/rmw_fastrtps#613).

You could test using Cyclone DDS and see whether the behaviour changes.

Note: this is not a solution. It might just provide an extra data point.
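For anyone trying this suggestion: the RMW implementation can be selected at runtime via the RMW_IMPLEMENTATION environment variable, assuming the Cyclone DDS RMW package is installed (on Humble, the apt package is ros-humble-rmw-cyclonedds-cpp):

```shell
# Select Cyclone DDS as the RMW implementation for this shell,
# then re-run the failing node to see whether the behaviour changes.
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
echo "$RMW_IMPLEMENTATION"  # → rmw_cyclonedds_cpp
```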

gvdhoorn (2022-06-23 11:06:04 -0600)

Thank you for the reply. We tested using Cyclone DDS too, but unfortunately the issue persists. We did more testing and could only reproduce the issue when running inside a Docker container (that's why I opened https://github.com/osrf/docker_images...). I'll close this and follow up there.

MarcoB (2022-06-24 06:02:48 -0600)