
ros2 megapixel image pub/sub cpu usage is very high

asked 2019-01-16 12:30:30 -0500

lucasw

updated 2019-03-18 12:42:03 -0500

When I try to publish and subscribe to rgb8 Images of around 1000x1000 at 20 Hz (or even 1 Hz), the cpu usage seems very high. (TODO: make an apples-to-apples comparison with lower-resolution images, maybe 4x the update rate at 1/4 the pixels?)
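For scale, here is a rough sketch of the raw pixel bandwidth involved. It assumes rgb8 (3 bytes per pixel), tightly packed rows, and ignores the message header and serialization overhead. It also checks that the proposed comparison (4x the rate at 1/4 the pixels) keeps bytes per second constant, so it would separate per-message cost from per-byte cost.

```cpp
#include <cstdint>

// Raw pixel bandwidth of an image stream in bytes per second.
// Assumes tightly packed rows (step == width * bytes_per_pixel) and ignores
// the message header and any serialization overhead.
constexpr std::uint64_t image_bytes_per_second(std::uint64_t width,
                                               std::uint64_t height,
                                               std::uint64_t bytes_per_pixel,
                                               std::uint64_t rate_hz) {
  return width * height * bytes_per_pixel * rate_hz;
}

// rgb8 packs 3 bytes (R, G, B) per pixel.
constexpr std::uint64_t kRgb8BytesPerPixel = 3;

// 1024x1024 rgb8 at 20 Hz: 62,914,560 bytes/s, roughly 63 MB/s.
static_assert(image_bytes_per_second(1024, 1024, kRgb8BytesPerPixel, 20) ==
                  62914560u,
              "1024x1024 rgb8 at 20 Hz");

// The proposed comparison: 1/4 the pixels (512x512) at 4x the rate (80 Hz)
// moves the same bytes per second, but four times as many messages.
static_assert(image_bytes_per_second(512, 512, kRgb8BytesPerPixel, 80) ==
                  image_bytes_per_second(1024, 1024, kRgb8BytesPerPixel, 20),
              "same bandwidth, 4x the message rate");
```

If the extra ros2 cost is mostly per message rather than per byte, the 512x512 at 80 Hz case would be expected to look worse than 1024x1024 at 20 Hz despite the equal bandwidth.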

ROS1 vs. ROS2


I've made identical nodes in ros1 and ros2 that do nothing but publish images and subscribe to them. The publishers look equivalent in cpu usage, but the ros2 subscriber uses 3 times as much cpu as the ros1 one: 9% cpu for 1024x1024 at 20 Hz vs. 3% in ros1.

The ros1 code is in the ros1 branch of

ros2 launch ros2_cpp_py frame_rate:=20


roslaunch ros2_cpp_py pub_sub.launch frame_rate:=20

Testing with more frame rates, it looks like there is just a nearly constant extra overhead; it doesn't scale with higher frame rates. (TODO: what about higher resolution?)


Some manual timing characterization shows that none of my 'user space' code in timer callbacks takes more than a few milliseconds, which suggests the time is all spent inside the underlying Node spinning?

gprof shows output like:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 50.00      0.01     0.01      797     6.27     6.27  std::_Sp_counted_ptr_inplace<sensor_msgs::msg::Image_<std::allocator<void> >, std::allocator<sensor_msgs::msg::Image_<std::allocator<void> > >, (__gnu_cxx::_Lock_policy)2>::_M_dispose()
 50.00      0.01     0.01                             rclcpp::Subscription<sensor_msgs::msg::Image_<std::allocator<void> >, std::allocator<void> >::create_serialized_message()

(TODO revisit what gprof output actually means, re-run to see if output is similar)

I'm using to experiment with. I'm holding on to the same shared_ptr after creating it at init and republishing it, maybe I should create it anew for every publish? But if it is getting copied anyhow then that shouldn't be any different.
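A minimal sketch of the reasoning in the last sentence, using a hypothetical stand-in for the inter-process transport (this is not the rclcpp or Fast RTPS API): if every publish serializes the message into a separate buffer, the bytes copied per publish are identical whether one shared_ptr is reused or a new one is allocated each time. Reuse could only save the allocation and fill of the source buffer, not the transport copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for sensor_msgs::msg::Image: only the pixel buffer.
struct FakeImage {
  std::vector<std::uint8_t> data;
};

// Hypothetical inter-process transport that serializes (copies) every
// message into its own wire buffer, as out-of-process delivery has to.
class CopyingTransport {
 public:
  void publish(const std::shared_ptr<const FakeImage>& msg) {
    assert(msg != nullptr);
    wire_.assign(msg->data.begin(), msg->data.end());  // full payload copy
    bytes_copied_ += msg->data.size();
  }
  std::size_t bytes_copied() const { return bytes_copied_; }

 private:
  std::vector<std::uint8_t> wire_;
  std::size_t bytes_copied_ = 0;
};

// Publish `count` frames, either reusing one message or allocating a new one
// per frame, and report how many bytes the transport copied in total.
std::size_t bytes_copied_for(bool reuse_message, std::size_t frame_bytes,
                             int count) {
  CopyingTransport transport;
  std::shared_ptr<const FakeImage> reused = std::make_shared<FakeImage>(
      FakeImage{std::vector<std::uint8_t>(frame_bytes, 0)});
  for (int i = 0; i < count; ++i) {
    if (reuse_message) {
      transport.publish(reused);
    } else {
      transport.publish(std::make_shared<FakeImage>(
          FakeImage{std::vector<std::uint8_t>(frame_bytes, 0)}));
    }
  }
  return transport.bytes_copied();
}
```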

There isn't much cpu usage without a node receiving the image; when I do subscribe, I observe the following with top:

  • ros2 topic echo /image_raw shows a ros2 process at 104% cpu (!). (possibly related )
  • rqt_image_view shows a python3 process going up to 55% cpu. (In general I've stayed away from using python in ros2 for anything except transitory initialization scripts because cpu usage seems generally high; this is no exception.)
  • rviz2 Image viewer only takes about 35% cpu with same settings.
  • My home-grown imgui_ros takes about 25% cpu, though it has a lot going on within it and isn't a good example to dig into: the dependency on imgui and the usage of raw OpenGL may be irrelevant to the issue but make the code more complicated.

And in all cases, the image publisher's cpu jumps from 1 or 2% to 10%.

With RMW_QOS_POLICY_RELIABILITY_BEST_EFFORT, no images are received at all in rqt_image_view, but clicking Unreliable in rviz2 makes them show up. ros2 topic echo will show the ...
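This observation is consistent with the DDS "requested vs. offered" matching rule for reliability, assuming the publisher was the side set to best effort and that rqt_image_view's subscription requests reliable delivery: a reliable subscription does not match a best-effort publisher, while a best-effort subscription (rviz2's Unreliable option) matches either. The enum and function below are a hypothetical model of that rule, not rclcpp or rmw types.

```cpp
// Hypothetical model of the DDS reliability QoS (not rclcpp/rmw types).
// Ordered so that a larger value is a stronger delivery guarantee.
enum class Reliability { BestEffort = 0, Reliable = 1 };

// A subscription matches a publisher only if the publisher offers at least
// the reliability the subscription requests.
constexpr bool reliability_compatible(Reliability offered_by_publisher,
                                      Reliability requested_by_subscriber) {
  return static_cast<int>(offered_by_publisher) >=
         static_cast<int>(requested_by_subscriber);
}
```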



Maybe related to - though non-intra-process performance here is worse than non-nodelet performance in ros1 as I found in

lucasw (2019-01-22 16:44:46 -0500)

The number of threads created by each ros2 process may be related, though most of them are at zero cpu; possibly they contribute to the high memory usage.

lucasw (2019-02-14 10:47:23 -0500)

From the answer to #q319218

[ROS2 DDS] comes at the cost of complexity and some performance ([ROS 1] TCP on the local host is _really_ good), but should allow knowledgeable users to get good results in more situations.

lucasw (2019-03-24 07:55:12 -0500)

"Is ROS2 slower than ROS1 or is there something I’m missing?", also "at some point in the ROS2 system we have built we are dropping messages"

lucasw (2021-05-02 11:45:57 -0500)

2 Answers


answered 2021-04-21 07:48:43 -0500

fhwedel-hoe

updated 2021-09-02 05:12:47 -0500

This is still a problem with ROS 2 foxy in 2021. Update: Or so I thought.

I am trying to pull 1600 x 1200 pixel RGB images at 45 fps from a camera using my custom node. Even for local inter-node communication, ROS 2 (FastRTPS) performance is abysmal compared to ROS 1. libfastrtps implicitly creates a copy of the image data, which in my case imposes a bottleneck (gprof visualisation of publishing a unique_ptr of msg::Image).
Update: This bottleneck may actually not be introduced by ROS 2. It is quite possible that the data transfer from the camera itself is slowing down memcpy.
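One way to check whether a single extra copy per frame can matter is to time a plain memcpy of one frame on the target machine and compare it with the frame period. A 1600 x 1200 RGB frame is 5,760,000 bytes, about 259 MB/s at 45 fps, and the frame period is 1/45 s, roughly 22.2 ms. This is a sketch that copies between two ordinary heap buffers, so it does not reproduce the camera-side effect mentioned in the update; results vary a lot between machines.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Average time, in microseconds, to memcpy one frame of `frame_bytes`,
// measured over `iterations` copies between two heap buffers.
double average_copy_microseconds(std::size_t frame_bytes, int iterations) {
  assert(frame_bytes > 0 && iterations > 0);
  std::vector<std::uint8_t> src(frame_bytes, 0x5a);
  std::vector<std::uint8_t> dst(frame_bytes, 0);
  std::uint64_t checksum = 0;

  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    std::memcpy(dst.data(), src.data(), frame_bytes);
    // Read back from the destination so each copy has a use.
    checksum += dst[static_cast<std::size_t>(i) % frame_bytes];
  }
  const auto stop = std::chrono::steady_clock::now();

  // Make the result observable, which discourages the compiler from
  // discarding the loop (inspect the generated code to be sure).
  volatile std::uint64_t sink = checksum;
  (void)sink;

  const std::chrono::duration<double, std::micro> elapsed = stop - start;
  return elapsed.count() / iterations;
}
```

Calling `average_copy_microseconds(1600 * 1200 * 3, 100)` and dividing by 22,222 µs gives the fraction of each frame period that one copy consumes.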

I suspect using composition to keep all image processing in one process is a viable solution. With composition, ROS 2 performance seems to be on par with ROS 1 inter-node communication.
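The reason composition can help: when publisher and subscriber live in the same process, a message published as a std::unique_ptr can be handed over by moving the pointer, so the pixel buffer is never copied. The sketch below models only that ownership transfer with a hypothetical single-consumer queue; as I understand it, rclcpp's intra-process machinery is more involved (for example, it still has to copy when several subscriptions each need ownership).

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an image message.
struct Frame {
  std::vector<std::uint8_t> pixels;
};

// Hypothetical single-consumer, in-process channel: ownership of each frame
// moves from publisher to subscriber, so the pixels are never copied.
class InProcessChannel {
 public:
  void publish(std::unique_ptr<Frame> frame) {
    assert(frame != nullptr);
    queue_.push_back(std::move(frame));
  }

  // Returns the oldest frame, or nullptr if nothing is queued.
  std::unique_ptr<Frame> take() {
    if (queue_.empty()) return nullptr;
    std::unique_ptr<Frame> frame = std::move(queue_.front());
    queue_.pop_front();
    return frame;
  }

 private:
  std::deque<std::unique_ptr<Frame>> queue_;
};
```

The received frame's `pixels.data()` is the same address the publisher filled, which is the zero-copy property; an out-of-process subscriber (rqt, rviz2, ros2 topic) cannot receive it this way.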

Update September 2021

I did some more extensive benchmarking, and all I can say is that the results are inconclusive. Bulk data transport in ROS 2 does not seem significantly slower than in ROS 1.

Some more details: My real-world use case involves a Basler camera. The SDK is called pylon. Since the official ROS node does not support ROS 2 yet, I created my own. With my node, the camera runs considerably faster, yielding a higher data rate from the get-go. I am using a system with four cores, so total CPU usage can reach 400%. Under some circumstances, some nodes maxed out on CPU while still maintaining top speed. Nonetheless, I assume the results would be skewed. I ended up comparing apples to oranges for quite some time.

I wanted to eliminate the camera from the equation and came up with a relatively simple test featuring a talker and a listener. The listener reads some values of the image data to simulate work being done, replacing the actual computer_vision node. Without touching the data, the compiler tended to optimize away parts of the implementation, and I was not sure whether the data transport was affected.
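A sketch of the "touch the data" idea for a hypothetical listener callback: fold a strided sample of the bytes into a checksum and store the result somewhere the compiler must treat as observable, so the work cannot be dropped as dead code. The 64-byte stride and the volatile global sink are illustrative choices, not details of the original benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Observable sink: writing the result to a volatile global means the
// checksum has to be computed rather than discarded as dead code.
volatile std::uint64_t g_checksum_sink = 0;

// Sum every `stride`-th byte, a cheap stand-in for real per-frame work.
std::uint64_t sample_checksum(const std::vector<std::uint8_t>& data,
                              std::size_t stride) {
  assert(stride > 0);
  std::uint64_t sum = 0;
  for (std::size_t i = 0; i < data.size(); i += stride) {
    sum += data[i];
  }
  return sum;
}

// Body of a hypothetical listener callback receiving the image bytes.
void on_image(const std::vector<std::uint8_t>& data) {
  g_checksum_sink = g_checksum_sink + sample_checksum(data, 64);
}
```

A larger stride keeps the simulated work light so that the measured cpu is dominated by the transport rather than by the listener itself.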

I dare to infer: In ROS2, having a stand-alone (as in "not in the composite container") subscriber incurs some kind of overhead in the publisher. One needs to keep in mind that rqt, rviz and ros2 topic are also stand-alone subscribers, so the overall performance may change depending on what tools you use to examine the set-up.

ROS1, official Basler driver

Camera running at 25 fps (not fit for comparison).

78.1% camera (includes rectification)
66.4% computer_vision

ROS1, fhwedel-hoe instant Pylon camera driver

Camera running at 45 fps.

31.3% camera
78.9% image_proc (ROS1 rectification)
 107% computer_vision

ROS2, fhwedel-hoe instant Pylon camera driver

Camera running at 45 fps. component_container maxed out (not fit for comparison).

 105% component_container (camera and image_proc)
 109% computer_vision

ROS2, talker-listener test, stand-alone nodes

Publishing 45 images per second (same image size as camera).

19.5% talker
21.0% listener

ROS2, talker-listener test, composited nodes

Publishing 45 images per second (same image size as camera ...



You might have enough karma now?

The composition solution breaks as soon as any command line or gui tools are introduced, like an image viewer, because they would have to be injected into an already running process; maybe there's a solution for that now? I'd also want a solution that performs on par with ros1 on a small, reliable LAN. I shouldn't have to run tools that interact with the system on the same computer, and again there's the simple use case of wanting to view an image.

I haven't spent any time on ros2 since running into this performance problem. What would make it compelling to try again is a set of benchmarks comparing ros1 to ros2 performance, run on a regular basis with the latest versions. It would make for super useful example source code if there is a solution for ros2 but ...

lucasw (2021-04-21 08:57:55 -0500)

I take back my original accusations of ROS2 being slow. I ran some benchmarks and added the results to the answer.

fhwedel-hoe (2021-09-02 05:07:06 -0500)

answered 2019-02-01 09:14:00 -0500

lucasw

What I've done is slapped together a pub/sub framework for Images that optionally publishes out to ros2 or only passes shared_ptrs directly to subscribers in the same process using node composition. Every publisher can be switched over to doing ros2 publishes with gui checkboxes; later, config scripts could set whether to publish only internally or not.

It doesn't fully work with existing image processing nodes (though I don't happen to need any, not even image_rect): they can be brought in, but then the full transport costs are incurred. It has a built-in image viewer, and I can easily integrate other capabilities. Other ros tools can be temporarily made functional via the ros2 enable checkboxes.

It's also error prone (both in the framework itself and in the possibility of concurrent Image reads/writes in different nodes) and lacks important features (like pub/sub objects being destroyed when no longer in use), but I need it for the image processing bandwidth.
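A minimal sketch of how such an in-process publisher might be structured. All names are hypothetical; this is neither the author's framework nor an rclcpp API. Subscribers receive a std::shared_ptr<const Image>, so no pixels are copied, and the const helps against the concurrent read/write problem as long as the publisher allocates a new Image per frame instead of writing into one it has already published. Subscriptions are held as std::weak_ptr, so a destroyed subscriber is pruned automatically, and a per-publisher flag stands in for the "also publish to ros2" checkbox, where the full transport copy would happen.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical image type shared between nodes in one process.
struct Image {
  int width = 0;
  int height = 0;
  std::vector<std::uint8_t> data;
};
using ImageConstPtr = std::shared_ptr<const Image>;

// A subscriber object owns its callback; the publisher only observes it.
struct Subscriber {
  std::function<void(const ImageConstPtr&)> callback;
};

class LocalPublisher {
 public:
  // Mirrors the gui checkbox: when true, also hand the image to the
  // (hypothetical) ros2 path, which costs a full copy.
  bool publish_to_ros2 = false;

  void add_subscriber(const std::shared_ptr<Subscriber>& sub) {
    assert(sub != nullptr);
    subs_.push_back(sub);  // weak_ptr: the publisher doesn't keep it alive
  }

  // Delivers to live subscribers and returns how many received the image.
  std::size_t publish(const ImageConstPtr& image) {
    // Drop subscribers that have been destroyed.
    subs_.erase(std::remove_if(subs_.begin(), subs_.end(),
                               [](const std::weak_ptr<Subscriber>& w) {
                                 return w.expired();
                               }),
                subs_.end());
    // Iterate over a snapshot so a callback may add subscribers safely.
    const std::vector<std::weak_ptr<Subscriber>> snapshot = subs_;
    std::size_t delivered = 0;
    for (const auto& weak : snapshot) {
      if (auto sub = weak.lock()) {
        if (sub->callback) sub->callback(image);  // same pointer, no copy
        ++delivered;
      }
    }
    if (publish_to_ros2) {
      Image wire_copy = *image;  // a real ros2 publish would serialize here
      (void)wire_copy;
      ++external_copies_;
    }
    return delivered;
  }

  std::size_t subscriber_count() const { return subs_.size(); }
  std::size_t external_copy_count() const { return external_copies_; }

 private:
  std::vector<std::weak_ptr<Subscriber>> subs_;
  std::size_t external_copies_ = 0;
};
```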



Shared memory using dds (much preferred over a home-grown solution); need to look at this

lucasw (2019-02-04 13:26:26 -0500)


Seen: 1,573 times

Last updated: Sep 02 '21