
Limit on concurrent connections between nodes?

asked 2020-01-21 04:02:16 -0600

FabianMene

updated 2020-01-22 02:26:26 -0600

Edit:

In the original question I described what looked like a limit on action client-server connections. It turns out that this limit affected any type of connection (topic, service, action) between nodes.

Original question:

I have a network of several hundred nodes (let's call them server nodes), each implementing an action server. Then there are a handful of client nodes, each of which can potentially open an action client to communicate with each of the server nodes.

In tests I have observed that there appears to be a specific limit on concurrent connections between action client and action server, which appears to be at 196 such connections, as the 197th one consistently fails. In roscpp I can initialize the 197th action client, but waitForServer() will block indefinitely. When sending a goal anyway, the server does not receive it. This occurs with both the ActionServer and SimpleActionServer.

I am interested in the limits to ROS' scalability (in terms of number of nodes and connections). Why is it that action connections appear to be hard-capped at 196 and what other such limits exist?


Comments


Would you not be running into a limit of (the) Linux (network stack)? On 'consumer' Linux distributions (or those configured with a consumer 'profile'), settings such as the maximum number of sockets, file descriptors, etc. are typically set relatively conservatively.

You don't tell us which OS you're using, but it would probably be a good idea to check those.

I'm not claiming there are no further limits caused by how ROS is designed and architected, but the OS is a good source of (artificial) limits that you should first investigate.

Finally:

I have a network of several hundred of nodes [..]

For this sort of question I would suggest being specific. How many exactly?

gvdhoorn ( 2020-01-21 06:27:39 -0600 )
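To make the kind of OS limit mentioned above concrete, here is a minimal Python sketch of what a process sees when it runs out of file descriptors. It assumes Linux or another POSIX system; the function name `count_opens_until_emfile` and the limit of 64 are purely illustrative and not part of ROS. The soft limit is lowered temporarily and restored afterwards.

```python
import errno
import os
import resource


def count_opens_until_emfile(soft_limit=64):
    """Temporarily lower the soft RLIMIT_NOFILE, then open the null device
    repeatedly until the OS refuses with EMFILE. Returns how many opens
    succeeded. Hypothetical helper, for illustration only."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may never exceed the hard limit.
    if hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    fds = []
    try:
        while True:
            try:
                fds.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                if exc.errno == errno.EMFILE:  # "Too many open files"
                    break
                raise
    finally:
        # Release everything and restore the original soft limit.
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
    return len(fds)


if __name__ == "__main__":
    print("opens before EMFILE:", count_opens_until_emfile(64))
```

The count comes out below the limit because descriptors that are already open (stdin, stdout, stderr and so on) count against it as well. Sockets draw from the same per-process budget as regular files.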

Thank you for your response. You seem to be correct that I'm dealing with an OS-side limitation rather than something directly related to ROS. Actions, topics and services all seem to use up the same 'resource' (whatever that may be; I have yet to investigate). I'm running Kubuntu Bionic.

To answer your questions regarding the number of nodes: It depends on the scenario, but let's say 800 server nodes. With 800 nodes, the following combinations of connections reached the limit:

actions    topics    services
156        200       156
176        100       176
194        10        100

FabianMene ( 2020-01-21 09:56:50 -0600 )
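One way to investigate which 'resource' is being used up is to count a process's open file descriptors, and how many of them are sockets. The sketch below assumes Linux with `/proc` mounted; the helper names are hypothetical. Pass the PID of a node instead of `"self"` to inspect another process owned by the same user.

```python
import os
import socket


def count_open_fds(pid="self"):
    """Count entries in /proc/<pid>/fd (Linux only).
    For pid="self" the listing itself briefly holds one descriptor,
    which is included in the count."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def count_sockets(pid="self"):
    """Count the descriptors of a process that refer to sockets."""
    fd_dir = f"/proc/{pid}/fd"
    total = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were iterating
        if target.startswith("socket:"):
            total += 1
    return total


if __name__ == "__main__":
    s = socket.socket()  # opening a socket consumes one descriptor
    print("open fds:", count_open_fds(), "of which sockets:", count_sockets())
    s.close()
```

Watching these numbers while connections are being added should show whether the failure coincides with the count approaching the per-process limit.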

I would first try increasing the maximum number of file descriptors for your user (or the user you're running these experiments with). ulimit -a should show you the current limits.

gvdhoorn ( 2020-01-21 10:18:09 -0600 )
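The same limits can also be read from inside a process. A minimal sketch using Python's standard `resource` module (POSIX only; the helper names are illustrative):

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("open files: soft =", describe(soft), "hard =", describe(hard))
```

The soft limit is the one that is actually enforced. An unprivileged process may raise it, but only up to the hard limit.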

Thank you, that seems to indeed have been the problem. The original limit was set to 1024 user-side; I knocked it up quite a bit and am now unable to reproduce the connection limit.

FabianMene ( 2020-01-22 02:23:19 -0600 )

1 Answer


answered 2020-01-22 02:21:45 -0600

FabianMene

Based on @gvdhoorn's comment, the solution was indeed to increase the file descriptor limit, which was set to 1024 by default:

ulimit -n 32768 (arbitrary choice)
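Note that `ulimit -n` only applies to the current shell and the processes started from it. As an alternative, a process can raise its own soft limit at startup. The following is a minimal Python sketch under that assumption (for example in a rospy node or a launcher script; a roscpp node would need the equivalent C call). The function name is hypothetical, and the target is clamped to the hard limit because an unprivileged process cannot exceed it.

```python
import resource


def raise_nofile_soft_limit(target=32768):
    """Raise the soft limit on open file descriptors towards `target`,
    never above the hard limit and never lowering it.
    Returns the soft limit in effect afterwards."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Only change anything when the current soft limit is finite and lower.
    if soft != resource.RLIM_INFINITY and soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft limit now:", raise_nofile_soft_limit(32768))
```

The new limit is inherited by child processes, so calling this in a script that then launches the nodes would cover them as well.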


Comments

Please note that -- of course -- you are now just delaying the point at which you'll again run into this same limit.

It may be that other limitations will get in the way first (memory, for instance), but it's good to realise this.

And pedantic (but seeing as you seem to be investigating this): the question title as-is suggests a cause, it's not a description of the observation. The question text itself is slightly better, but it still suggests a cause: "Why is it that action connections appear to be hard-capped". They weren't, the only thing that happened was:

In tests I have observed that there appears to be a specific limit on concurrent connections between action client and action server, which appears to be at 196 such connections, as the 197th one consistently fails.

this is already a better description of an observation, but it still says ...(more)

gvdhoorn ( 2020-01-22 02:45:26 -0600 )
