ROS dropping messages - How to predict it
Good day.
Situation: On an RPi 4 (8 GB RAM, Ubuntu 18.04 server), I'm recording a rosbag with three different topics: an IMU at 200 Hz with messages of 0.35 kB; a lidar at 10 Hz with messages of 190 kB; and video (compressed images at 30 Hz, at different resolutions). All of them use C++/ROS packages.
When recording the bag with 640×480 video, there is no problem during replay: the frequency is correct for all topics. BUT, when recording at 1920×1080 (400 kB per message), set at 30 Hz, the recorded bag actually contains the video at only 11 Hz, and the IMU topic is also affected, dropping to only 160 Hz.
I'm using a SanDisk Extreme mini SD card (up to 70 MB per second), and I need to record all messages in the right order, so increasing the buffer size will not help.
Questions:
- ROS related: Is there a way in ROS to indicate or predict the limiting factor (e.g. RAM usage)?
- Additional question: Are there any good solutions for this? Would increasing the RAM solve this problem?
Thanks in advance!
Asked by camilo.chiang on 2021-10-24 14:08:25 UTC
Answers
Just to be clear: you are recording the image_raw/compressed topic, and the compression type is set to JPEG, correct? I believe the CPU power required for JPEG compression on the Raspberry Pi is the bottleneck, so it has nothing to do directly with ROS, writing to the SD card, rosbag, or the amount of RAM. You could check this by running rostopic hz on the compressed image topic on the live system. Also, running top or htop could help: it should show you whether the CPU is maxed out.
Unfortunately, saving the uncompressed images is probably also not an option, since the data rate is on the order of 1920×1080×3×30/(1024×1024) = 178 MB/s. If you are only going to record very short rosbags, you can increase the amount of RAM buffer that rosbag uses with the command line option rosbag record --buffsize=SIZE, where SIZE is the buffer size in MB (default: 256 MB), but at that data rate it will only last a minute or so.
I guess recording a 30 Hz image stream in 1080p is just not going to work on a Raspi4.
Edit in response to comment:
Online compression to JPEG for an even bigger image stream (3840×2160@30Hz) will be a challenge on most "embedded" CPUs. You can benchmark your own system by installing the benchmark package (sudo apt install libjpeg-turbo-test) and running tjbench. On my laptop CPU (Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz), I get a compression throughput of 233 Megapixels/sec and a decompression throughput of 272 Megapixels/sec. The DJI Manifold 2-C has an Intel Core i7-8550U, which gets a decompression throughput of 176 Megapixels/sec according to this page, so the compression throughput should be about 150 Megapixels/sec (around 3 times faster than the Raspberry Pi 4).
You would need:
- for 640×480@30Hz: 9.2 Megapixels/sec
- for 1920×1080@30Hz: 62.2 Megapixels/sec
- for 3840×2160@30Hz: 248.8 Megapixels/sec
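These figures are simply width × height × frame rate. A minimal, non-ROS sketch of that calculation (the resolutions and frame rate are just the ones discussed above):

```cpp
#include <cstdio>

// Required JPEG compression throughput in Megapixels/sec
// for a given resolution and frame rate.
double requiredMegapixelsPerSec(int width, int height, double fps)
{
  return width * height * fps / 1e6;
}

int main()
{
  std::printf("640x480@30Hz:   %.1f Megapixels/sec\n", requiredMegapixelsPerSec(640, 480, 30.0));
  std::printf("1920x1080@30Hz: %.1f Megapixels/sec\n", requiredMegapixelsPerSec(1920, 1080, 30.0));
  std::printf("3840x2160@30Hz: %.1f Megapixels/sec\n", requiredMegapixelsPerSec(3840, 2160, 30.0));
  return 0;
}
```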
Note that the values from tjbench are for the libjpeg-turbo library, but OpenCV (which is used by the ROS image_transport_plugins for compression) uses the slower libjpeg library, which can be 2-6x slower. It is possible to build OpenCV with libjpeg-turbo and then rebuild all ROS packages that depend on OpenCV. I have done it, but it's a huge hassle and I wouldn't recommend it if it can be avoided.
The Raspberry Pi's ARMv8 Cortex-A72 has a decompression throughput of 61 Megapixels/sec according to this page, so let's guess that the compression throughput is around 52 Megapixels/sec. What you have actually observed is 1920×1080@11Hz = 22.8 Megapixels/sec. This could be explained by the libjpeg vs. libjpeg-turbo issue discussed above, so allow for a large margin of error when speccing your CPU. In other words, I guess that the DJI Manifold 2-C realistically has a compression throughput closer to 60 Megapixels/sec, so it can just barely handle 1920×1080@30Hz and will definitely not be able to handle 3840×2160@30Hz.
Maybe one thing you can try: image_transport_plugins has a plugin called theora_image_transport, which compresses image topics using the Theora video codec. Maybe it can compress faster than JPEG; I haven't tested that yet. You can easily switch to Theora by running rosrun rqt_reconfigure rqt_reconfigure, so you don't even need to change any code.
Asked by Martin Günther on 2021-10-25 08:02:45 UTC
Comments
Hei Martin!
1. Thanks for your answer. You were right: after rechecking the live system, the CPU usage is above 100% at 30 and 15 Hz. I can still use it at 6 Hz (25% CPU usage), but that is not ideal.
2. Then I will assume that there is no standard way in ROS to report that the CPU is too slow to reach the required frequency. My option is to change the code and check the return value of loop_rate.sleep(); if the loop cannot keep up with the requested rate, I will add a warning.
3. Non-ROS related: Do you have any suggestions on how I could work out which CPU would be able to do this? I'm planning to use higher resolutions in the future (3840×2160 at 30 Hz; I have been thinking about the DJI Manifold 2-C), but I'm afraid that the new CPU might not manage it.
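A minimal sketch of the check I mean for point 2, assuming a standard roscpp loop (ros::Rate::sleep() returns false when the loop body took longer than the requested cycle time; the 30 Hz rate is just a placeholder):

```cpp
#include <ros/ros.h>

int main(int argc, char** argv)
{
  ros::init(argc, argv, "rate_check_example");
  ros::NodeHandle nh;

  ros::Rate loop_rate(30.0);  // target publishing rate
  while (ros::ok())
  {
    // ... grab, compress and publish the image here ...

    ros::spinOnce();

    // sleep() returns false if the loop body took longer than the requested
    // cycle time, i.e. the node cannot keep up with the target rate.
    if (!loop_rate.sleep())
    {
      ROS_WARN_THROTTLE(5.0, "Cannot keep up with 30 Hz (last cycle took %.3f s)",
                        loop_rate.cycleTime().toSec());
    }
  }
  return 0;
}
```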
Asked by camilo.chiang on 2021-10-25 08:54:41 UTC
See my edit for my response!
Asked by Martin Günther on 2021-10-25 10:01:42 UTC
This needs more upvotes.
Instead of vague statements based on intuition and previous experience, you actually used a representative benchmark and described some objective measurements.
This question (+1 @camilo.chiang for asking) and your answer Martin are a really nice example of how something may seem like it's bottlenecked "on ROS" or "message passing", but where it turns out even the basic IO of (de)compression is just too much for the system used.
I guess that the DJI Manifold 2-C realistically has a compression throughput closer to 60 Megapixels/sec, so it can just barely handle 1920×1080@30Hz [..]
Even stronger: it can barely handle a single stream. The OP describes several more, with different resolutions.
Edit: perhaps switching to something like swri-robotics/imagezero_transport (or some other alternative compression) could help. But it will all depend on the performance of the system used.
Asked by gvdhoorn on 2021-10-25 10:05:48 UTC
And if the system has hw support for jpeg/png/x264 compression, there are image_transport plugins which can use those (for JPEG you could even just package the already compressed payload).
Asked by gvdhoorn on 2021-10-25 10:11:12 UTC
Yes, using HW support actually seems like the best way to do it. Many cameras have the option of outputting an MJPEG stream. E.g., this node reads the MJPEG stream from a network camera and directly publishes it as a compressed image: https://github.com/chadrockey/mjpeg_network_camera . Many USB cameras can also output an MJPEG stream. If you can get an MJPEG stream from your camera and directly copy the bitstream into a CompressedImage message, that would eliminate any need for JPEG compression on the CPU and should easily work with huge images.
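As a rough illustration of that idea, here is a minimal sketch that publishes an already compressed JPEG bitstream as a sensor_msgs/CompressedImage without recompressing anything; getNextJpegFrame() is a hypothetical placeholder for whatever camera/MJPEG API actually delivers the frames:

```cpp
#include <ros/ros.h>
#include <sensor_msgs/CompressedImage.h>

#include <cstdint>
#include <vector>

// Hypothetical placeholder: in a real node this would return one complete
// JPEG frame read from the camera's MJPEG stream (USB, network, ...).
std::vector<uint8_t> getNextJpegFrame()
{
  return {};
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "mjpeg_passthrough_example");
  ros::NodeHandle nh;

  ros::Publisher pub =
      nh.advertise<sensor_msgs::CompressedImage>("image_raw/compressed", 10);

  ros::Rate rate(30.0);
  while (ros::ok())
  {
    sensor_msgs::CompressedImage msg;
    msg.header.stamp = ros::Time::now();
    msg.format = "jpeg";
    // Copy the camera's JPEG bitstream straight into the message:
    // no decompression or recompression on the CPU.
    msg.data = getNextJpegFrame();

    pub.publish(msg);
    ros::spinOnce();
    rate.sleep();
  }
  return 0;
}
```

(The exact contents of the format field may need to match what your downstream consumers expect.)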
Asked by Martin Günther on 2021-10-27 02:16:47 UTC
In #q379644 we discussed some (hardware accelerated via FFMPEG's NVENC support, for instance) x264 image_transport plugins. That may also be an option.
Asked by gvdhoorn on 2021-10-27 02:42:04 UTC
Thanks guys :) I learned so much from this! I really appreciate it.
Asked by camilo.chiang on 2021-10-27 03:00:45 UTC