[ROS2] how does the server's queue work?

asked 2019-02-27 09:47:27 -0600

alsora

updated 2019-02-28 06:29:02 -0600

Hi,

Assuming I have one service server node and multiple service client nodes, how does the server handle all the requests?

My first assumption was that it has a single queue from which it processes all the requests in FIFO order. However, some tests suggest that it does not work like this.

This is my setup (P.S. I know that I should not use services like this, but it is just a test):

  • 1 service server node, default QoS, spinning with rclcpp::spin()
  • 10 service client nodes, default QoS, each sending requests at 100 Hz and waiting for the response with rclcpp::spin_until_future_complete(...)
  • The service interface has request and response fields consisting only of a header message

Assume an experiment lasting 10 seconds. In the ideal case, i.e. the server is fast enough to process all the requests, each client should send 1000 requests and receive 1000 responses.

However, my server node is not that fast, and each client receives only approximately 700 responses. What is really strange to me is that in almost every experiment 1 or 2 clients receive 0 responses.

This is strange because all clients start sending requests at the same time, so how is it possible that some clients are communicating with the server while another one is still waiting for its first response?

At first I thought the problem was the depth of the queue, i.e. the oldest request is removed from the queue before being processed by the server. Is that possible? However, changing the history to KEEP_ALL did not change anything.

I was able to solve the issue by adding a timeout to rclcpp::spin_until_future_complete(...). However, I do not understand why this is needed.
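The queue-depth hypothesis can be made concrete with a toy model. The sketch below is only an illustration in plain C++: it is not how rclcpp, the rmw layer or Fast-RTPS implement service queues, and the function name simulate and all of its parameters are made up for this example. It assumes each client has at most one outstanding request, that depth == 0 stands in for KEEP_ALL, and that a client whose request is evicted waits forever.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Toy model (hypothetical, NOT the real middleware): a single FIFO queue of
// pending requests, identified by client id.
// depth == 0 models KEEP_ALL, depth > 0 models a bounded queue that evicts
// the oldest pending request when it is full.
// Returns how many responses each client received.
std::vector<int> simulate(int clients, int rounds, int served_per_round,
                          std::size_t depth) {
  std::deque<int> queue;
  std::vector<bool> waiting(clients, false);  // client has a request in flight
  std::vector<int> responses(clients, 0);

  for (int round = 0; round < rounds; ++round) {
    // Every client that is not blocked on a previous request sends a new one.
    for (int c = 0; c < clients; ++c) {
      if (waiting[c]) {
        continue;  // still waiting, no timeout in this model
      }
      if (depth != 0 && queue.size() == depth) {
        queue.pop_front();  // oldest request is silently dropped
      }
      queue.push_back(c);
      waiting[c] = true;
    }
    // The server handles a limited number of requests per round, oldest first.
    for (int s = 0; s < served_per_round; ++s) {
      if (queue.empty()) {
        break;
      }
      const int c = queue.front();
      queue.pop_front();
      ++responses[c];
      waiting[c] = false;
    }
  }
  return responses;
}
```

In this model, simulate(3, 10, 3, 0) gives every client 10 responses, while simulate(3, 10, 3, 2) leaves the first client with 0 because its very first request is evicted and it never sends another one. That matches the symptom only under the stated assumptions; it says nothing about what actually happened in the real system.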
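As a rough analogy for what a timeout changes, here is a minimal sketch using only the C++ standard library, not rclcpp. The names WaitResult and wait_with_timeout are hypothetical. It assumes the response is delivered through a std::future and shows that a bounded wait returns control to the caller even if the response never arrives, so the caller can send the next request instead of blocking forever.

```cpp
#include <chrono>
#include <future>

// Hypothetical result type for this sketch only.
enum class WaitResult { Success, Timeout };

// Wait for a response for at most `timeout`.
// Without a bound, a response that never arrives would block the caller forever.
template <typename T>
WaitResult wait_with_timeout(std::future<T> & response,
                             std::chrono::milliseconds timeout) {
  if (response.wait_for(timeout) == std::future_status::ready) {
    return WaitResult::Success;
  }
  return WaitResult::Timeout;
}
```

A caller would check the result and, on Timeout, give up on that request and continue with the next one. Whether a lost request or response was really what blocked the clients here is an assumption, not something the tests above establish.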

EDIT: It looks like the problem is not present when using the master version of Crystal. So if there was a bug, it has been fixed.

Thanks.

Comments

A small example would be useful to reproduce this locally (the timeout for spin_until_future_complete affecting the outcome is puzzling to me without more context).

William ( 2019-02-27 16:18:18 -0600 )

Which rmw implementation are you using? This could be a case where the impl behavior is not consistent. I think Connext may use instances (by using a key for the requester id) which may prevent the FIFO behavior. On the other hand, I don't know if Fast-RTPS provides FIFO either, I know Connext does.

William ( 2019-02-27 16:19:35 -0600 )

I'm using FastRTPS. I solved the problem by compiling from master with the latest FastRTPS release (they improved network usage, so maybe that was the cause). I think we can close this.

alsora ( 2019-02-28 06:30:09 -0600 )