Slow reading from many rosbags in python. No speedup from parallelization.
Hi everyone, I've run into a performance issue with slow read speeds from many rosbag files. For context, all these results were gathered on my personal machine with a WD SN750 SSD (quoted max sequential read speed around 3400 MB/s) and a 4-core Intel i7 (8 threads). I am using ROS Noetic on Ubuntu 20.04 for the read experiments described below, but the rosbag files were generated in a Docker image running ROS Kinetic on Ubuntu 16.04.
I've set up a small Python test program which sequentially opens 81 different bag files, reads all messages from a single odometry topic in each file, and stores the data in a Python list. It does no other processing with the data. Here are the results when I run that program single-threaded:
- Number of workers: 1
- Bag file count, total size (MB), avg size (MB): 81, 532.4, 6.6
- Total time (s): 9.639
- Time per file (s): 0.119
- Throughput (MB/s): 55.2
Note that the "throughput" reported here is likely a gross overestimate of the true read rate, since I am only reading a very small subset of the data stored in each bag file. It was computed as total_size / total_time.
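For reference, the single-threaded test is essentially this pattern (a sketch, not the exact script; the bag directory and `/odom` topic name are placeholders for my actual paths/topics):

```python
import time

def read_odometry(bag_path, topic="/odom"):  # topic name is a placeholder
    """Read all messages on one topic from a single bag file."""
    import rosbag  # ROS Noetic Python API; assumes a ROS environment is sourced
    msgs = []
    with rosbag.Bag(bag_path) as bag:
        for _, msg, _ in bag.read_messages(topics=[topic]):
            msgs.append(msg)
    return msgs

def benchmark(paths, reader=read_odometry):
    """Read every bag sequentially and time the whole loop."""
    start = time.perf_counter()
    data = [reader(p) for p in paths]
    return data, time.perf_counter() - start

if __name__ == "__main__":
    import glob
    paths = sorted(glob.glob("/path/to/bags/*.bag"))  # placeholder directory
    _, elapsed = benchmark(paths)
    print(f"{len(paths)} bags in {elapsed:.3f} s "
          f"({elapsed / max(len(paths), 1):.3f} s/file)")
```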
I monitored CPU and I/O load with htop and iotop respectively while this ran. The I/O impact appeared almost negligible, but CPU usage was pinned at 100% of a single CPU thread; the 7 remaining threads were idle (besides some light background use). Since this looked like a CPU bottleneck, I tried parallelizing the operation with Python's concurrent.futures library, using a separate thread to load and read each bag file. Here are the results:
- Number of workers: 8
- Bag file count, total size (MB), avg size (MB): 81, 532.4, 6.6
- Total time (s): 10.985
- Time per file (s): 0.136
- Throughput (MB/s): 48.5
So parallelizing the operation actually resulted in slightly decreased performance, even though each bag file is entirely independent of the others. I also still saw 100% utilization of only a single CPU thread while the other 7 sat idle. However, that utilization was now divided over the 8 Python threads, with each consuming roughly 12.5%.
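The threaded variant is essentially this pattern (again a sketch; `read_bag` stands in for the per-file reading function, which in my script opens the bag and reads one topic):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def read_bag(path):
    # Placeholder for the per-file rosbag reading function.
    ...

def benchmark_threaded(paths, reader=read_bag, workers=8):
    """Read the bags with a pool of worker threads and time the whole run."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        data = list(pool.map(reader, paths))  # preserves input order
    return data, time.perf_counter() - start
```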
It seems like the Python rosbag API might not support being parallelized this way, but I'm not familiar with those implementation details. In the past I've tried converting to other data formats (e.g. HDF5) and gotten better read performance, but the conversion takes time and has other disadvantages. Any ideas what may be causing this, and/or potential solutions to improve read speed?
I can post the full test script as well if anyone is interested.
Thanks, Charlie