This is still a problem with ROS 2 Foxy in 2021. Update: Or so I thought.

I am trying to pull 1600 x 1200 pixel RGB images at 45 fps from a camera using my custom node. Even for local inter-node communication, ROS 2 (FastRTPS) performance is abysmal compared to ROS 1. libfastrtps implicitly creates a copy of the image data, which in my case imposes a bottleneck:

[Figure: gprof visualisation of publishing a unique_ptr of msg::Image]

Update: This may not actually be a bottleneck introduced by ROS 2. It is quite possible that the data transfer from the camera itself is slowing down memcpy.
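To put the numbers in perspective, every extra full-frame copy at this resolution and frame rate moves a considerable amount of memory. The sketch below only does the arithmetic. It assumes a packed 8-bit RGB encoding (rgb8, 3 bytes per pixel, no row padding). The function names are illustrative and do not come from the original node.

```cpp
#include <cstdint>

// Size of one tightly packed frame in bytes.
// Assumption: no row padding, i.e. step == width * bytes_per_pixel.
constexpr std::uint64_t frame_bytes(std::uint64_t width, std::uint64_t height,
                                    std::uint64_t bytes_per_pixel) {
  return width * height * bytes_per_pixel;
}

// Bytes per second that the given number of full-frame copies per frame
// (memcpy, serialization, ...) has to move through memory.
constexpr std::uint64_t copy_bytes_per_second(std::uint64_t frame, std::uint64_t fps,
                                              std::uint64_t copies_per_frame) {
  return frame * fps * copies_per_frame;
}

// Values from the question: 1600 x 1200 pixels, rgb8 (3 bytes per pixel), 45 fps.
constexpr std::uint64_t kFrameBytes = frame_bytes(1600, 1200, 3);
static_assert(kFrameBytes == 5'760'000, "5.76 MB per frame");
static_assert(copy_bytes_per_second(kFrameBytes, 45, 1) == 259'200'000,
              "about 259 MB/s for a single extra copy");
```

A single hidden copy per frame therefore already costs about 259 MB/s (roughly 247 MiB/s) of memory traffic. That makes it plausible for memcpy to stand out in a profile, whether the copy happens in the middleware or while fetching the frame from the camera.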

I suspect that using composition to keep all image processing in one process is a viable solution: with it, ROS 2 performance seems to be on par with ROS 1 inter-node communication.
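Composition can help because of ownership. Inside one process, a std::unique_ptr can be handed from publisher to subscriber, which moves the pixel buffer instead of copying it. Any path that leaves the process has to copy the bytes into some transport buffer.

The sketch below models that difference in plain C++. It is not the rclcpp API. ImageStub, hand_over and serialize_copy are hypothetical names. As far as I know, the corresponding ROS 2 set-up is to publish a std::unique_ptr<sensor_msgs::msg::Image> from composed nodes that have intra-process communication enabled via rclcpp::NodeOptions().use_intra_process_comms(true).

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for sensor_msgs::msg::Image; only the pixel buffer matters here.
struct ImageStub {
  std::vector<std::uint8_t> data;
};

// Models in-process delivery: ownership of the whole message moves to the
// receiver, so the pixel buffer stays where it is (no copy).
std::unique_ptr<ImageStub> hand_over(std::unique_ptr<ImageStub> msg) {
  return msg;
}

// Models delivery to another process: the bytes have to be copied into a
// separate buffer (standing in for serialization into the transport).
std::vector<std::uint8_t> serialize_copy(const ImageStub& msg) {
  return std::vector<std::uint8_t>(msg.data.begin(), msg.data.end());
}
```

In use, the buffer address survives hand_over unchanged, while serialize_copy always produces a second buffer of the full frame size.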


Update September 2021

I did some more extensive benchmarking, and all I can say is that the results are inconclusive. Bulk data transport in ROS 2 does not seem significantly slower than in ROS 1.

Some more details: My real-world use case involves a camera from Basler, whose SDK is called pylon. Since the official ROS node does not support ROS 2 yet, I created my own (https://github.com/fhwedel-hoe/pylon_usb-instant-camera/tree/ros2, compatible with the standard image_proc::RectifyNode). With my node, the camera runs considerably faster, yielding a higher data rate from the get-go. I am using a system with four cores, so the total CPU usage can reach 400 %. Under some circumstances, some nodes maxed out their CPU while still maintaining top speed; nonetheless, I assume this skews the results. I ended up comparing apples to oranges for quite some time.

I wanted to eliminate the camera from the equation and came up with a relatively simple test featuring a talker and a listener. The listener reads some values of the image data to simulate work being done, standing in for the actual computer_vision node. When the data was not touched, the compiler tended to optimize away parts of the implementation, and I was not sure whether that affected the data transport.
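One way to make the listener's simulated work survive optimization is to fold a sample of the pixel bytes into a value the compiler must treat as observable. This is a minimal sketch under two assumptions. First, the callback receives the raw bytes as a std::vector<std::uint8_t>, which is what rosidl generates for sensor_msgs::msg::Image::data with the default allocator. Second, touch_frame, on_image, g_sink and the 4096-byte stride are hypothetical choices, not taken from the original test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical "work": read every `stride`-th byte and sum them up.
std::uint64_t touch_frame(const std::vector<std::uint8_t>& data, std::size_t stride) {
  assert(stride > 0);  // a zero stride would never advance
  std::uint64_t sum = 0;
  for (std::size_t i = 0; i < data.size(); i += stride) {
    sum += data[i];
  }
  return sum;
}

// Storing into a volatile object is a side effect the optimizer has to keep,
// so the reads feeding the checksum cannot be removed as dead code.
volatile std::uint64_t g_sink = 0;

// Hypothetical listener callback body.
void on_image(const std::vector<std::uint8_t>& data) {
  g_sink = g_sink + touch_frame(data, 4096);
}
```

Sampling keeps the simulated work cheap. A stride of 1 would read every byte and make the listener's CPU figure include a full pass over the frame.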

I dare to infer: In ROS 2, having a stand-alone subscriber (as in "not in the composite container") incurs some kind of overhead in the publisher. Keep in mind that rqt, rviz and ros2 topic are also stand-alone subscribers, so the overall performance may change depending on which tools you use to examine the set-up.
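The inference can be expressed as a rough lower bound. The model below is an assumption for illustration, not rclcpp's implementation, and the function name is hypothetical. It assumes intra-process communication is enabled for the composed nodes, so their subscribers can be served without serialization. Once at least one stand-alone subscriber (rviz, rqt, ros2 topic echo/hz, ...) is matched, the publisher has to serialize every frame at least once more, no matter how many composed subscribers there are.

```cpp
#include <cstdint>

// Lower bound on extra serialized bytes per second the publisher produces
// under this simplified model. Serialization is counted once per frame,
// regardless of how many stand-alone subscribers are matched.
constexpr std::uint64_t min_serialized_bytes_per_second(std::uint64_t frame_bytes,
                                                        std::uint64_t fps,
                                                        unsigned stand_alone_subscribers) {
  return stand_alone_subscribers > 0 ? frame_bytes * fps : 0;
}

// 1600 x 1200 pixels, rgb8, no row padding (assumption).
constexpr std::uint64_t kFrame = 1600ull * 1200ull * 3ull;

// Composed talker and listener only: no extra serialization under this model.
static_assert(min_serialized_bytes_per_second(kFrame, 45, 0) == 0, "");
// Same pipeline while a single stand-alone tool subscribes: at least ~259 MB/s more.
static_assert(min_serialized_bytes_per_second(kFrame, 45, 1) == 259'200'000ull, "");
```

This matches the observation that simply opening an inspection tool can change the measured load, although the real overhead may well be higher than this bound.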

ROS 1, official Basler driver

Camera running at 25 fps (not fit for comparison).

%CPU CMD
78.1 camera (includes rectification)
66.4 computer_vision

ROS 1, fhwedel-hoe instant Pylon camera driver

Camera running at 45 fps.

%CPU CMD
31.3 camera
78.9 image_proc (ROS 1 rectification)
 107 computer_vision

ROS 2, fhwedel-hoe instant Pylon camera driver

Camera running at 45 fps. component_container maxed out (not fit for comparison).

%CPU CMD
 105 component_container (camera and image_proc)
 109 computer_vision

ROS 2, talker-listener test, stand-alone nodes

Publishing 45 images per second (same image size as camera).

%CPU CMD
19.5 talker
21.0 listener

ROS 2, talker-listener test, composited nodes

Publishing 45 images per second (same image size as camera).

%CPU CMD
20.1 component_container (talker and listener)
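The %CPU figures are easier to compare once they are summed per configuration and related to the four-core machine. This is a small helper sketch with hypothetical names. It assumes the values are per-core percentages as shown by top in its default (Irix) mode, where a four-core machine tops out at 400 %.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Fraction of the whole machine used by a group of processes.
double machine_share(const std::vector<double>& percent_cpu, int cores) {
  assert(cores > 0);
  const double total = std::accumulate(percent_cpu.begin(), percent_cpu.end(), 0.0);
  return total / (100.0 * cores);
}

// CPU cost of configuration b relative to configuration a (both summed).
double relative_cost(const std::vector<double>& a, const std::vector<double>& b) {
  const double total_a = std::accumulate(a.begin(), a.end(), 0.0);
  const double total_b = std::accumulate(b.begin(), b.end(), 0.0);
  assert(total_a > 0.0);
  return total_b / total_a;
}
```

Talker-listener test:

- Stand-alone pair: 40.5 % in total, about 10 % of the machine.
- Composited container: 20.1 %, about 5 %.

Composition therefore roughly halves the CPU cost in that test.

Camera pipelines:

- ROS 1 with the fhwedel-hoe driver: 217.2 % (about 54 %).
- ROS 2: 214 %, but that run is marked as not fit for comparison.

A single process at about 100 % usually means one core is saturated. As far as I know, the stock component_container runs its components on a single-threaded executor, and component_container_mt is the multi-threaded variant. That would be consistent with the container maxing out, but it was not verified here.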