
ros2 megapixel image pub/sub cpu usage is very high

asked 2019-01-16 12:30:30 -0600

lucasw

updated 2019-03-18 12:42:03 -0600

When I try to publish and subscribe to rgb8 Images of around 1000x1000 at 20 Hz (or even 1 Hz), the cpu usage seems very high. (TODO make an apples-to-apples comparison with lower resolution images, maybe 4x the update rate at 1/4 the pixels?)

ROS1 vs. ROS2


I've made identical nodes in ros1 and ros2 that do nothing but publish images and subscribe to them. The publishers look equivalent in cpu usage, but the ros2 subscriber uses 3 times as much cpu as the ros1 one: 9% cpu for 1024x1024 at 20 Hz vs. 3% in ros1.
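For scale, the raw payload behind those numbers can be worked out directly. This is a minimal sketch, assuming uncompressed rgb8 at 3 bytes per pixel and ignoring message headers and any middleware overhead; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one uncompressed image; rgb8 uses 3 bytes per pixel.
inline std::uint64_t image_bytes(std::uint64_t width, std::uint64_t height,
                                 std::uint64_t bytes_per_pixel = 3) {
  assert(bytes_per_pixel > 0);
  return width * height * bytes_per_pixel;
}

// Pixel payload per second at a given frame rate.
// Headers and transport overhead are deliberately left out.
inline std::uint64_t payload_bytes_per_second(std::uint64_t width,
                                              std::uint64_t height,
                                              std::uint64_t rate_hz,
                                              std::uint64_t bytes_per_pixel = 3) {
  return image_bytes(width, height, bytes_per_pixel) * rate_hz;
}
```

For 1024x1024 at 20 Hz this gives 3,145,728 bytes per image and 62,914,560 bytes per second. 512x512 at 80 Hz moves the same number of bytes per second, which is the equal-bandwidth comparison mentioned in the TODO at the top.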

The ros1 code is in the ros1 branch of

ros2 launch ros2_cpp_py frame_rate:=20


roslaunch ros2_cpp_py pub_sub.launch frame_rate:=20

Testing with more frame rates, it looks like there is just a nearly constant extra overhead that doesn't scale with higher framerates. (TODO what about higher resolution?)


Some manual characterization of timing shows that none of my 'user space' code in timer callbacks is taking more than a few milliseconds, which means it is all inside the underlying Node spinning?
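The kind of manual timing described above can be sketched as follows. This is only an illustration using std::chrono, not the code that was actually used; `elapsed_ms` is a hypothetical helper name, and it measures wall-clock time for a single call, not cpu time.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Runs callback() once and returns how long the call took, in milliseconds.
template <typename Callback>
double elapsed_ms(Callback&& callback) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Callback>(callback)();
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::milli> elapsed = stop - start;
  assert(elapsed.count() >= 0.0);  // steady_clock is monotonic
  return elapsed.count();
}
```

Wrapping the body of a timer or subscription callback in a lambda and passing it to `elapsed_ms` gives a per-call figure. If those figures stay small while top still reports high cpu, the remaining time is being spent outside the callbacks.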

gprof shows output like:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 50.00      0.01     0.01      797     6.27     6.27  std::_Sp_counted_ptr_inplace<sensor_msgs::msg::Image_<std::allocator<void> >, std::allocator<sensor_msgs::msg::Image_<std::allocator<void> > >, (__gnu_cxx::_Lock_policy)2>::_M_dispose()
 50.00      0.01     0.01                             rclcpp::Subscription<sensor_msgs::msg::Image_<std::allocator<void> >, std::allocator<void> >::create_serialized_message()

(TODO revisit what gprof output actually means, re-run to see if output is similar)

I'm using to experiment with. I'm holding on to the same shared_ptr after creating it at init and republishing it, maybe I should create it anew for every publish? But if it is getting copied anyhow then that shouldn't be any different.
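To make the shared_ptr question concrete, here is a small sketch of the difference between sharing an image and copying it. The `Image` struct is a hypothetical stand-in for sensor_msgs::msg::Image, and nothing here claims to show what rclcpp does internally on publish.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for sensor_msgs::msg::Image (only the fields used here).
struct Image {
  std::uint32_t width = 0;
  std::uint32_t height = 0;
  std::vector<std::uint8_t> data;
};

// Allocates a zero-filled rgb8 image (3 bytes per pixel).
inline std::shared_ptr<Image> make_rgb8_image(std::uint32_t width,
                                              std::uint32_t height) {
  auto image = std::make_shared<Image>();
  image->width = width;
  image->height = height;
  image->data.assign(static_cast<std::size_t>(width) * height * 3, 0);
  return image;
}

// True when both pointers refer to the same pixel buffer.
inline bool shares_pixels(const std::shared_ptr<Image>& a,
                          const std::shared_ptr<Image>& b) {
  assert(a && b);
  return a->data.data() == b->data.data();
}

// Deep copy: duplicates every pixel byte into a new buffer.
inline std::shared_ptr<Image> deep_copy(const Image& source) {
  return std::make_shared<Image>(source);
}
```

Copying the shared_ptr itself only bumps a reference count, so reusing one pointer or allocating a fresh image per publish differs mainly in allocation cost. If the transport serializes or deep-copies the message anyway, that per-byte copy is paid in both cases.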

There isn't much cpu usage without a node receiving the image, when I do subscribe I observe the following with top:

  • ros2 topic echo /image_raw shows a ros2 process at 104% cpu (!). (possibly related )
  • rqt_image_view shows a python3 process going up to 55% cpu. (In general I've stayed away from using python in ros2 for anything except transitory initialization scripts because cpu usage seems generally high, this is no exception)
  • rviz2 Image viewer only takes about 35% cpu with same settings.
  • My home grown imgui_ros takes about 25% cpu, though it has a lot going on within it that isn't a good example to dig into - the dependency on imgui and usage of raw OpenGL may be irrelevant to the issue but make the code more complicated.

And in all of these cases the image publisher cpu jumps from 1 or 2% to 10%.

RMW_QOS_POLICY_RELIABILITY_BEST_EFFORT results in no images received at all in rqt_image_view, but clicking Unreliable in rviz2 makes them show up. ros2 topic echo will show the ...



Maybe related to - though non-intra-process performance here is worse than non-nodelet performance in ros1 as I found in

lucasw (2019-01-22 16:44:46 -0600)

The number of threads created by each ros2 process may be related, though most of them are at zero cpu. Possibly they contribute to the high memory usage.

lucasw (2019-02-14 10:47:23 -0600)

From the answer to #q319218

[ROS2 DDS] comes at the cost of complexity and some performance ([ROS 1] TCP on the local host is _really_ good), but should allow knowledgeable users to get good results in more situations.

lucasw (2019-03-24 07:55:12 -0600)

1 Answer


answered 2019-02-01 09:14:00 -0600

lucasw

What I've done is slap together a pub/sub framework for Images that optionally publishes out to ros2 or only passes shared_ptrs directly to subscribers in the same process using node composition. Every publisher can be switched over to doing ros2 publishes with gui checkboxes; later, config scripts could set whether to publish only internally or not.

It doesn't fully work with existing image processing nodes (though I don't happen to need any, not even image_rect): they can be brought in, but then the full transport costs are incurred. It has a built-in image viewer and I can easily integrate other capabilities. Other ros tools can be temporarily made functional via the ros2 enable checkboxes.

It's also error prone (both in the framework itself and in the possibility of concurrent Image reads/writes in different nodes) and lacks important features (like pub/sub objects being destroyed when no longer in use), but I need it for the image processing bandwidth.
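The core idea of passing pointers within one process can be sketched roughly like this. It is not the actual framework code: `LocalImageTopic` and `Image` are hypothetical names, there is no ros2 publishing path, and the class is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for sensor_msgs::msg::Image.
struct Image {
  std::uint32_t width = 0;
  std::uint32_t height = 0;
  std::vector<std::uint8_t> data;
};

// Pointer-to-const, so subscribers cannot write to the shared pixels.
using ImageConstPtr = std::shared_ptr<const Image>;

// Hypothetical in-process topic: publish() hands the same pointer to every
// subscriber instead of serializing or copying the image.
class LocalImageTopic {
 public:
  using Callback = std::function<void(const ImageConstPtr&)>;

  void subscribe(Callback callback) {
    assert(static_cast<bool>(callback));  // reject empty callbacks
    callbacks_.push_back(std::move(callback));
  }

  // Calls every subscriber synchronously; returns how many were called.
  std::size_t publish(const ImageConstPtr& image) const {
    for (const auto& callback : callbacks_) {
      callback(image);
    }
    return callbacks_.size();
  }

 private:
  std::vector<Callback> callbacks_;
};
```

Handing out `shared_ptr<const Image>` stops subscribers from modifying the image, but it does not stop the publisher from writing to it after publishing, so the concurrent read/write risk remains unless the publisher allocates a new image per frame.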



Shared memory using dds is much preferred over a home-grown solution; I need to look at this.

lucasw (2019-02-04 13:26:26 -0600)


Asked: 2019-01-16 12:30:30 -0600

Seen: 311 times

Last updated: Mar 18 '19