Robotics StackExchange | Archived questions

Can ROS2 services still be expected to be 'flakey'?

Hello,

I have stumbled upon a problem, when trying to use services in ROS2 humble. I haven't tried other distros.

In short: My service-clients occasionally doesn't receive a service-response, when I have multiple clients calling the same service at the same time. It can also happen with 1 client however it occurs more rarely and also depends on which DDS/RTPS vendor I use.

I have scoured varies ROS2-related packages, issue lists and so forth. I found this issue, which seems to be a similar issue. From the history, then I can see that they have added 'flakey' services to the list of known issues for ROS2 foxy. Yet, the later released distributions' release notes doesn't contain this as a known issue. I can see that some of the related subissues are open.
Are anyone aware whether this behavior is still to be expected?

If yes, are you aware of any good workarounds? Currently, the only way I have found is to wait for a timeout, prune pending requests and try again. However, this is not a particular great pattern for my application.

Thanks in advance! Let me know if you need more details.

Recreate issue To recreate the issue, then compile the provided package cpp_pubsub and execute the following commands in 3 different terminals:

ros2 run cpp_pubsub service --ros-args -r __node:=service --log-level debug

ros2 run cpp_pubsub client --ros-args -r __node:=client1 --log-level debug

ros2 run cpp_pubsub client --ros-args -r __node:=client2 --log-level debug

My platform

Ubuntu 22 kernel 5.15.0-58-generic
I also use rmwcycloneddscpp as the RMW_implementation

Alternatively you can see the package files here

clientmemberfunction.cpp:

#include <chrono>
#include <functional>
#include <memory>
#include <string>
#include "rclcpp/rclcpp.hpp"
#include <cpp_pubsub/srv/node_call.hpp>
int main(int argc, char * argv[])
{
    rclcpp::init(argc, argv);
    std::shared_ptr<rclcpp::Node> node = rclcpp::Node::make_shared("client");
    auto client =node->create_client<cpp_pubsub::srv::NodeCall>("/service");
    client->wait_for_service();
    uint limit = 500000;
    for(size_t i = 0; i < limit ; i ++)
    {
    auto request = std::make_shared<cpp_pubsub::srv::NodeCall::Request>();
    request->node_name = node->get_name();
    client->wait_for_service();
    auto shared_future = client->async_send_request(request);
   if (rclcpp::spin_until_future_complete(node, shared_future, std::chrono::seconds(10)) == rclcpp::FutureReturnCode::SUCCESS)
    {
        std::stringstream str;
        str << "Received response " << i;
        RCLCPP_INFO(rclcpp::get_logger("rclcpp"), str.str().c_str());
    } else {
        RCLCPP_ERROR(rclcpp::get_logger("rclcpp"), "Failed to call service. Stopping execution now");
        return 1;
    }
   }
   rclcpp::shutdown();
   return 0;
 }

servicememberfunction.cpp

#include <functional>
#include <memory>
#include "rclcpp/rclcpp.hpp"
#include <cpp_pubsub/srv/node_call.hpp>
class MinimalService : public rclcpp::Node
{
public:
    MinimalService()
    : Node("Service")
    {
        service_ = create_service<cpp_pubsub::srv::NodeCall>("/service", std::bind(&MinimalService::callback,this, std::placeholders::_1,std::placeholders::_2));
    }

private:
    void callback(cpp_pubsub::srv::NodeCall::Request::ConstSharedPtr req ,  cpp_pubsub::srv::NodeCall::Response::SharedPtr)
    {
        RCLCPP_INFO(get_logger(), "Received request from %s",req->node_name.c_str());
    }
    rclcpp::Service<cpp_pubsub::srv::NodeCall>::SharedPtr service_;

};

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<MinimalService>();
  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}

CMakeLists.txt

cmake_minimum_required(VERSION 3.8)
project(cpp_pubsub)

if(CMAKE_COMPILER_IS_GNUCXX OR CMAKE_CXX_COMPILER_ID MATCHES "Clang")
  add_compile_options(-Wall -Wextra -Wpedantic)
endif()

# find dependencies
find_package(ament_cmake REQUIRED)
# uncomment the following section in order to fill in
# further dependencies manually.
find_package(rclcpp REQUIRED)
find_package(std_msgs REQUIRED)
find_package(std_srvs REQUIRED)
find_package(rosidl_default_generators REQUIRED)

rosidl_generate_interfaces(${PROJECT_NAME}
  "srv/NodeCall.srv"
  #DEPENDENCIES builtin_interfaces
)
rosidl_get_typesupport_target(cpp_typesupport_target "${PROJECT_NAME}" "rosidl_typesupport_cpp")

add_executable(client src/client_member_function.cpp)
target_link_libraries(client "${cpp_typesupport_target}")
ament_target_dependencies(client rclcpp std_msgs std_srvs )

add_executable(service src/service_member_function.cpp)
target_link_libraries(service "${cpp_typesupport_target}")
ament_target_dependencies(service rclcpp std_msgs std_srvs)

install(TARGETS
  client
  service
  DESTINATION lib/${PROJECT_NAME})

ament_export_dependencies(rosidl_default_runtime)
ament_package()

package.xml

<?xml version="1.0"?>
<?xml-model href="http://download.ros.org/schema/package_format3.xsd" schematypens="http://www.w3.org/2001/XMLSchema"?>
<package format="3">
  <name>cpp_pubsub</name>
  <version>0.0.0</version>
  <description>TODO: Package description</description>
  <maintainer email="jp@todo">nordbo</maintainer>
  <license>TODO: License declaration</license>

  <buildtool_depend>ament_cmake</buildtool_depend>

  <test_depend>ament_lint_auto</test_depend>
  <test_depend>ament_lint_common</test_depend>
  <depend>rclcpp</depend>
  <depend>std_msgs</depend>
  <depend>std_srvs</depend>

  <build_depend>rosidl_default_generators</build_depend>
  <exec_depend>rosidl_default_runtime</exec_depend>
  <member_of_group>rosidl_interface_packages</member_of_group>

  <export>
    <build_type>ament_cmake</build_type>
  </export>
</package>

NodeCall.srv:

string node_name
---

Asked by jbp on 2023-01-26 08:09:08 UTC

Comments

I can confirm that your code is not working reliably on a default ROS2-humble installation with FastDDS or CyclonDDS (using docker). After a few seconds of running it, it starts to "stutter", sometimes one or both clients even crash with the error message: [ERROR] [1674831280.798867979] [rclcpp]: Failed to call service. Stopping execution now after receiving some 1000-messages successfully before.

I am also very interested in a workaround. Your code does not compile on foxy, but I would like to test that as well.

Asked by brean on 2023-01-27 10:06:38 UTC

Thanks for trying it out, Brean.

By examining the logs I found that the client never receives the response as an event. The server reports that it was sent. However, the stuck client would still receive responses from the second node that's still running.

It's an odd behaiviour and I expect it to be a bug in the implementation. With the current interfaces for client/services I find it difficult to detect if it should occur unless I implement a timeout that waits 'long enough'. However this approach is not particular great in a application, where timing is critical.

I tried looking into accessing the rmw_request_id_t that follows the request, but I haven't found a way to access it on the client side(expect the sequence number). If I could get it, then it would be easy to wrap the service/client in a class that would additionally publish/subscribe to this information, so I would have an extra method that could detect loss of packages.
However, it would be a hotfix.

Asked by jbp on 2023-01-27 10:25:18 UTC

Answers