Poor performance during discovery of namespaced robots even with discovery server

asked 2023-04-04 08:29:35 -0500

anders-clement

Hi,

I am part of a group developing a test bed of several mobile robots based on Raspberry Pi 4 (8 GB) boards running Ubuntu 22.04 Server 64-bit with ROS2 Humble and basic hardware. To run multiple robots simultaneously while keeping access to all robots from a central controller, as well as selective communication between robots, we launch each robot in its own namespace.

If a better approach exists, we are open to it. However, I have only been able to find discussions that mention either the namespacing approach, or separating robots by domain ID and building a home-grown solution for inter-robot communication and communication with a central server. A similar approach is mentioned at:

The issue: We experience high CPU spikes on the running robots while other robots start up (CPU usage approximately doubles). The robots only share data on /tf, with each robot publishing a single transform at 10 Hz. All other data is namespaced. (Each robot's own tf tree is under /namespace/tf, and a node relays the transform from base_link -> map on /tf.) The stack is built in release mode, although the nav2 stack binaries are installed with apt (which I presume are built in release mode as well).
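For illustration, the topic layout described above (a per-robot tf tree plus one shared /tf topic) follows from how ROS2 resolves names: relative names are prefixed with the node's namespace, while absolute names are not. The following is a simplified sketch of that rule only; `resolve_topic` and the namespaces `robot1` and `robot2` are hypothetical names, and private (`~`) names and substitutions are not handled.

```python
def resolve_topic(namespace: str, name: str) -> str:
    """Simplified sketch of ROS 2 topic name resolution.

    Absolute names (leading '/') are returned unchanged; relative names
    are prefixed with the node's namespace. Private ('~') names and
    substitutions are deliberately left out of this sketch.
    """
    if name.startswith("/"):
        return name
    ns = "/" + namespace.strip("/")
    if ns == "/":
        # Node lives in the root namespace.
        return "/" + name
    return f"{ns}/{name}"


if __name__ == "__main__":
    for robot in ("robot1", "robot2"):  # hypothetical namespaces
        # Relative 'tf' stays private to the robot; absolute '/tf' is shared.
        print(robot, resolve_topic(robot, "tf"), resolve_topic(robot, "/tf"))
```

With this rule, only nodes that explicitly use the absolute name /tf exchange data across robots; everything published under a relative name stays inside the robot's namespace, although all endpoints are still visible to DDS discovery within the same domain.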

The problem arises with two robots, but is worse when a third robot is introduced. For example, if two robots are running a task, starting a third robot will overload the CPUs of the other two, leading to failures.

Unlike in a previous question (https://answers.ros.org/question/3799...), the CPU spike is spread across all running nodes.

If all robots are run with separate domain IDs, this CPU spike does not occur. However, this does not allow for ROS2 communication between the robots, or even with just a central controller.

This issue has persisted despite a lot of effort, so we are asking for any knowledge or pointers to documentation that can help us understand the cause of this performance issue. Essentially, we would like to know whether this is 'just' the performance cost of the ROS2 architecture, or whether there are options for improving performance.

Network setup: The robots are connected through 5 GHz Wi-Fi to a central laptop, which has a wired connection to the router. Link speed is ~6 MB/s, tested by copying a large file over the network. This is limited by the Raspberry Pi's network card, as the speed is much higher between other devices on the network.

The issue persists with CycloneDDS, with FastRTPS, and with FastRTPS using a discovery server. Using the discovery server does reduce the size of the CPU spike to some extent, but does not eliminate it.
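For reference, the middleware variants compared above are typically selected per process through environment variables: `ROS_DOMAIN_ID`, `RMW_IMPLEMENTATION`, and, for the Fast DDS discovery server, `ROS_DISCOVERY_SERVER`. The sketch below only builds such an environment; `robot_env` is a hypothetical helper, and the server address `192.168.1.10:11811` is a made-up example, not the address used in this setup.

```python
import os
from typing import Dict, Optional


def robot_env(domain_id: int = 0,
              rmw: str = "rmw_fastrtps_cpp",
              discovery_server: Optional[str] = None) -> Dict[str, str]:
    """Build a process environment for one middleware configuration.

    discovery_server is an 'ip:port' string (hypothetical value in the
    example below); None means the default simple discovery is used.
    """
    env = dict(os.environ)
    env["ROS_DOMAIN_ID"] = str(domain_id)
    env["RMW_IMPLEMENTATION"] = rmw
    if discovery_server is not None:
        env["ROS_DISCOVERY_SERVER"] = discovery_server
    else:
        # Make sure a value inherited from the shell does not leak in.
        env.pop("ROS_DISCOVERY_SERVER", None)
    return env


if __name__ == "__main__":
    configs = {
        "cyclonedds": robot_env(rmw="rmw_cyclonedds_cpp"),
        "fastrtps": robot_env(),
        # Assumed example address of the machine running the server.
        "fastrtps+server": robot_env(discovery_server="192.168.1.10:11811"),
    }
    for label, env in configs.items():
        print(label, env["RMW_IMPLEMENTATION"],
              env.get("ROS_DISCOVERY_SERVER", "<simple discovery>"))
```

The returned dictionary can be passed as `env=` to `subprocess.run` when starting a launch file, which makes it easy to repeat the same start-up test under each configuration and compare the CPU spike.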

Software stack: We run a base stack with lidar drivers (10 Hz), a robot_localization EKF filter (50 Hz), and a micro-ROS agent connecting to a Teensy, which subscribes to cmd_vel and publishes odometry (50 Hz) and IMU data (50 Hz). On top of this, we run ...
