Put many ros1 bags into a database?
I'd like to be able to extract topic data from hundreds of gigabytes of rosbags in a timely fashion through scripts. Looping over a list of bags, loading each one individually, processing it, and moving on to the next doesn't scale beyond a handful of bags, even if parallelized, especially if everything has to be repeated whenever the processing step needs altering or the topic of interest changes.
Processing each bag once and uploading all the data to a database seems like the best path, so how can bags that include custom message types be converted generically? Convert ros1 bags to the sqlite ros2 format, then use generic non-ros tools to upload them to a database of choice? Hopefully the solution doesn't involve playing bags and can instead loop through the contents as quickly as possible (and could be inherently parallelizable, with multiple processes uploading into the same database simultaneously).
All-in-one solutions that have integrated web interfaces seem frequently incomplete, clunky and slow, and are likely to become unmaintained (like many of the answers to the related #q218678 and #q277427). I want to get the data into a place where either existing non-ros generic tools can be applied, or I can quickly write my own.
I'm not sure how relevant it still is (there could have been others created in the meantime), but Working with large ROS bag files on Hadoop and Spark seems to use "non-ros generic tools" (links to: valtech/ros_hadoop).
Similarly, but a commercial product it seems, there is the Spark support for rosbags by autovia. They presented recently at the ROS-Industrial Conf 2019 (search for: "Analytics for Autonomous Driving: Large-scale sensor data processing") and it did look like a very convenient way to scale up bag processing and related operations. They also have a FUSE plugin that exposes rosbags as a file and directory structure. Very convenient for scripting and non-ROS tools.

On a more DIY level: mongodb_store and its provided scripts could probably be massaged into something useful for this. The message_store_node.py script can be changed to take input from a rosbag and store everything in a mongo db instance. After that you'd be free to use whatever tool you'd like.

That's a few levels down from having a hadoop or spark cluster to import data into though.
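
To make the DIY route a bit more concrete, here is a minimal sketch of reading a bag directly (no playback) and writing every message to a database as JSON. It is not the mongodb_store approach itself: sqlite3 from the Python standard library stands in for the database so the example stays self-contained, and the table layout and the helper names (msg_to_plain, iter_bag, store_messages) are made up for illustration. It assumes Python 3 (ROS Noetic) and relies on ROS1 message classes listing their fields in `__slots__`, which is also why custom message types should not need special handling.

```python
import base64
import json
import sqlite3


def msg_to_plain(obj):
    """Recursively turn a ROS1 message into JSON-friendly Python types."""
    if obj is None or isinstance(obj, (str, bool, int, float)):
        return obj
    if isinstance(obj, (bytes, bytearray)):
        # uint8[] fields arrive as bytes under Python 3; base64 keeps them JSON-safe.
        return base64.b64encode(bytes(obj)).decode("ascii")
    if isinstance(obj, (list, tuple)):
        return [msg_to_plain(item) for item in obj]
    if hasattr(obj, "__slots__"):
        # Generated message classes (and time/duration values) list fields in __slots__.
        return {name: msg_to_plain(getattr(obj, name)) for name in obj.__slots__}
    return str(obj)  # fallback for anything unexpected


def iter_bag(path, topics=None):
    """Yield (topic, stamp_ns, message) straight from a bag file, without playback."""
    import rosbag  # imported lazily so the rest of the sketch runs without ROS

    with rosbag.Bag(path) as bag:
        for topic, msg, stamp in bag.read_messages(topics=topics):
            yield topic, stamp.to_nsec(), msg


def store_messages(conn, bag_name, records):
    """Insert (topic, stamp_ns, message) records; return the number of rows added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "bag TEXT, topic TEXT, stamp_ns INTEGER, data TEXT)"
    )
    rows = (
        (bag_name, topic, stamp_ns, json.dumps(msg_to_plain(msg)))
        for topic, stamp_ns, msg in records
    )
    before = conn.total_changes
    with conn:  # one transaction per bag keeps inserts fast
        conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", rows)
    return conn.total_changes - before


# Hypothetical usage (file and topic names are placeholders):
# conn = sqlite3.connect("bags.db")
# store_messages(conn, "run1.bag", iter_bag("run1.bag", topics=["/odom"]))
```

Each bag is independent, so one worker process per bag is a natural way to parallelize this. SQLite only allows one writer at a time, though, so for truly simultaneous uploads a server database (MongoDB as suggested above, or similar) would probably be the better target, with store_messages swapped for that database's insert call.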