This is still a problem with ROS 2 Foxy in 2021. Update: Or so I thought.
I am trying to pull 1600 x 1200 pixel RGB images at 45 fps from a camera using my custom node. Even for local inter-node communication, ROS 2 (FastRTPS) performance is abysmal compared to ROS 1. libfastrtps implicitly creates a copy of the image data, which in my case imposes a bottle-neck: gprof visualisation of publishing unique_ptr of msg::Image
Update: This being a bottle-neck introduced by ROS 2 may actually not be the case. It is quite possible that the data transfer from the camera itself is slowing down memcpy.
I suspect using composition to keep all image processing in one process is a viable solution. ROS 2 performance seems to be on par with ROS 1 inter-node communication.
I did some more extensive benchmarking and all I can say is that the results are inconclusive. Bulk data transport in ROS 2 does not seem significantly slower than in ROS 1.
Some more details: My real-world use-case involves a camera from Basler. The SDK is called pylon. Since the official ROS node does not support ROS 2 yet, I created my own. With my node, the camera runs considerably faster, yielding a higher data-rate from the get-go. I am using a system with four cores (so the total percentage of CPU in use can reach 400). Under some circumstances, some nodes maxed out on CPU while still maintaining top speed. Nonetheless, I assume the results would be skewed. I ended up comparing apples to oranges for quite some time.
I wanted to eliminate the camera from the equation and came up with a relatively simple test featuring a talker and a listener. The listener accesses some values of the image data to simulate work being done, replacing the actual computer_vision node. Without touching the data, the compiler liked to optimize away parts of the implementation, and I was not sure whether the data transport was affected.
I dare to infer: in ROS 2, having a stand-alone (as in "not in the composite container") subscriber incurs some kind of overhead in the publisher. One needs to keep in mind that rqt, rviz and ros2 topic are also stand-alone subscribers, so the overall performance may change depending on what tools you use to examine the set-up.
Camera running at 25 fps (not fit for comparison).
%CPU CMD
78.1 camera (includes rectification)
66.4 computer_vision
Camera running at 45 fps.
%CPU CMD
31.3 camera
78.9 image_proc (ROS 1 rectification)
107 computer_vision
Camera running at 45 fps. component_container maxed out (not fit for comparison).
%CPU CMD
105 component_container (camera and image_proc)
109 computer_vision
Publishing 45 images per second (same image size as camera).
%CPU CMD
19.5 talker
21.0 listener
Publishing 45 images per second (same image size as camera).
%CPU CMD
20.1 component_container (talker and listener)