Robotics StackExchange | Archived questions

[ROS2][Humble] async_send_request service result seldomly never returns using executors.

Hi Guys, I am writing because I am dealing with the following strange behaviour:

One node calls periodically (from a thread each 100ms) and sequentially N different services (4 in my example) using N clients. Each call is executed calling asyncsendrequest and waited for using "waitfor" on the returned future. The node is added to an executor (tested with both Multithreaded and Singlehtreaded). Everything apparently works fine for a while, but seldomly the waitfor will hang and never get the result, even if the server generated a response. This happens much more frequently if the load on the system is increased (using for example an online multi-thread stress test like silver.urih) or on slow machines. Adding each client to a different callback group doesn't solve the issue, even with a multithreaded executor (but reduces its frequency). Using a timer to execute the calls periodically instead of a thread doesn't help either. If the service is called again after the timeout of wait_for expired, it will usually respond properly.

I am currently running on a docker container (based on osrf/ros:humble-desktop).

env | grep ROS: ROSVERSION=2 ROSPYTHONVERSION=3 ROSLOCALHOSTONLY=0 ROSDISTRO=humble

Here is an example of client implementation running into the issue (the whole code can be found on https://github.com/MarcoMatteoBassa/reproduce_client_bug)

#include <memory>
#include <thread>

#include "example_interfaces/srv/add_two_ints.hpp"

    #include "rclcpp/rclcpp.hpp"

using namespace std::placeholders;
using namespace std::chrono_literals;

class ServiceClientTest : public rclcpp::Node {
 public:
  ServiceClientTest(const rclcpp::NodeOptions& options) : rclcpp::Node("service_client", options) {
    auto servers = declare_parameter("servers_names", std::vector<std::string>{});
    for (auto& server_name : servers) {
      // Adding one callback per client, or one callback for all clients, doesn't solve
          // auto callback_group_client = create_callback_group(
      //     rclcpp::CallbackGroupType::MutuallyExclusive);  // Using Reentrant doesn't help
      // callback_groups_clients_.push_back(callback_group_client);
      rclcpp::Client<example_interfaces::srv::AddTwoInts>::SharedPtr client =
          create_client<example_interfaces::srv::AddTwoInts>(
              server_name, rmw_qos_profile_services_default);  //, callback_group_client);
      clients_.push_back(client);
    }
    node_test_thread_ = std::make_shared<std::thread>(&ServiceClientTest::callThread, this);
  }

      ~ServiceClientTest() {
    if (node_test_thread_->joinable()) node_test_thread_->join();
  }

  // Using a timer instead of a thread doesn't help.
  void callThread() {
    while (rclcpp::ok()) {
      for (auto& client : clients_) {
        if (!client->wait_for_service(3s))
          RCLCPP_ERROR_STREAM(get_logger(), "Waiting service failed");  // Never happens
        auto request = std::make_shared<example_interfaces::srv::AddTwoInts::Request>();
    RCLCPP_INFO_STREAM(get_logger(), "Calling service " << client->get_service_name());
auto result = client->async_send_request(request);  // Specifying a callback doesn't help
if (result.wait_for(3s) != std::future_status::ready) {  // More time doesn't help
  RCLCPP_ERROR_STREAM(
      get_logger(), "Waiting result failed for server " << client->get_service_name());
  return;
} else {
  RCLCPP_INFO_STREAM(
      get_logger(), "Waiting succeeded for server " << client->get_service_name());
    }
  }
  get_clock()->sleep_for(rclcpp::Duration(100ms));
    }
  }

 private:
  std::vector<rclcpp::Client<example_interfaces::srv::AddTwoInts>::SharedPtr> clients_;
  std::vector<rclcpp::CallbackGroup::SharedPtr>
      callback_groups_clients_;  // Using only one cb group for all the clients also doesn't help
  std::shared_ptr<std::thread> node_test_thread_;
};
int main(int argc, char** argv) {
  rclcpp::init(argc, argv);
  // The more threads are assigned and the less likely it is to get a deadlock if using multiple
  // callback groups. Confirmed to happen with 5 threads.
  rclcpp::executors::MultiThreadedExecutor executor(rclcpp::ExecutorOptions(), 4);
  auto test_node = std::make_shared<ServiceClientTest>(rclcpp::NodeOptions());
  executor.add_node(test_node->get_node_base_interface());
  executor.spin();
  rclcpp::shutdown();
  return 0;
}

The servers code isn't very relevant (the response is generated), but I'll add it for completeness.

#include <memory>

#include "example_interfaces/srv/add_two_ints.hpp"
#include "rclcpp/rclcpp.hpp"

using namespace std::placeholders;

class ServiceServerTest : public rclcpp::Node {
 public:
  ServiceServerTest(const rclcpp::NodeOptions& options) : rclcpp::Node("service_server", options) {


   service_name_ = declare_parameter("service_name", std::string{});
    service_ = create_service<example_interfaces::srv::AddTwoInts>(
        service_name_, std::bind(&ServiceServerTest::add, this, _1, _2));
  }

  void add(const std::shared_ptr<example_interfaces::srv::AddTwoInts::Request> request,
      std::shared_ptr<example_interfaces::srv::AddTwoInts::Response> response) {
    response->sum = request->a + request->b;
    RCLCPP_INFO_STREAM(get_logger(), "Server " << service_name_ << "Responding");
  }

     private:

  rclcpp::Service<example_interfaces::srv::AddTwoInts>::SharedPtr service_;
  std::string service_name_;
};

int main(int argc, char** argv) {
  rclcpp::init(argc, argv);

  rclcpp::executors::SingleThreadedExecutor executor;


    auto fixture = std::make_shared<ServiceServerTest>(rclcpp::NodeOptions());

  executor.add_node(fixture->get_node_base_interface());
  executor.spin();
  rclcpp::shutdown();

  return 0;
}

Everything can be launched with a launchfile like:

from launch import LaunchDescription
from launch_ros.actions import Node

def generate_launch_description():
    return LaunchDescription([
        Node(
        package='iw_lifecycle_manager',
    executable='service_server',
    name='service_server_1',
    parameters=[{'service_name': 'service_server_1'}]
    ),
    Node(
        package='iw_lifecycle_manager',
    executable='service_server',
    name='service_server_2',
    parameters=[{'service_name': 'service_server_2'}]
),
Node(
        package='iw_lifecycle_manager',
        executable='service_server',
        name='service_server_3',
    parameters=[{'service_name': 'service_server_3'}]
),
    Node(
    package='iw_lifecycle_manager',
        executable='service_server',
        name='service_server_4',
        parameters=[{'service_name': 'service_server_4'}]
    ),
    Node(
        package='iw_lifecycle_manager',
        executable='service_client',
       name='service_client',
    parameters=[{'servers_names': ['service_server_1', 'service_server_2', 'service_server_3', 'service_server_4']}]
    )
])

An example of output is:

[service_client-5] [INFO] [1655994199.332906692] [service_client]: Calling service /service_server_1
[service_server-1] [INFO] [1655994199.333261599] [service_server_1]: Server service_server_1Responding
[service_client-5] [INFO] [1655994199.333474084] [service_client]: Waiting succeeded for server /service_server_1
[service_client-5] [INFO] [1655994199.333529454] [service_client]: Calling service /service_server_2
[service_server-2] [INFO] [1655994199.333676860] [service_server_2]: Server service_server_2Responding
[service_client-5] [INFO] [1655994199.333871010] [service_client]: Waiting succeeded for server /service_server_2
[service_client-5] [INFO] [1655994199.333898254] [service_client]: Calling service /service_server_3
[service_server-3] [INFO] [1655994199.334040428] [service_server_3]: Server service_server_3Responding
[service_client-5] [INFO] [1655994199.334178310] [service_client]: Waiting succeeded for server /service_server_3
[service_client-5] [INFO] [1655994199.334209875] [service_client]: Calling service /service_server_4
[service_server-4] [INFO] [1655994199.334348107] [service_server_4]: Server service_server_4Responding
[service_client-5] [INFO] [1655994199.334464206] [service_client]: Waiting succeeded for server /service_server_4
[service_client-5] [INFO] [1655994199.434629161] [service_client]: Calling service /service_server_1
[service_server-1] [INFO] [1655994199.434977860] [service_server_1]: Server service_server_1Responding
[service_client-5] [INFO] [1655994199.435268193] [service_client]: Waiting succeeded for server /service_server_1
[service_client-5] [INFO] [1655994199.435379712] [service_client]: Calling service /service_server_2
[service_server-2] [INFO] [1655994199.435540431] [service_server_2]: Server service_server_2Responding
[service_client-5] [ERROR] [1655994202.435855214] [service_client]: Waiting result failed for server /service_server_2

Did anyone run into similar issue? Could it be an issue with the executors implementation in humble?

Asked by MarcoB on 2022-06-23 10:11:20 UTC

Comments

There have been some issues with FastRTPS in Humble (see this discourse thread and ros2/rmw_fastrtps#613).

You could test using Cyclone DDS and see whether the behaviour changes.

Note: this is not a solution. It might just provide an extra data point.

Asked by gvdhoorn on 2022-06-23 11:06:04 UTC

Thank you for the reply, we tested using Cyclone DDS too but unfortunately the issue perists. We did more testing and we could only reproduce the issue if running from a docker container (that's why I opened https://github.com/osrf/docker_images/issues/628). I'll close this and follow up there.

Asked by MarcoB on 2022-06-24 06:02:48 UTC

Answers