rosbag, the bag file format, and the rosbag Python and C++ APIs are optimized for reading data sequentially, one message at a time. The API reads messages from disk only as you request them, so you don't pay for a read until you perform it. Seeking, however, isn't very efficient, so rosbag may not be the best choice if you need to access your data in random order.
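
As a rough sketch of the sequential access pattern described above, the helper below makes a single pass over an iterable of `(topic, message, timestamp)` tuples, which is the shape that `rosbag.Bag.read_messages()` yields in the Python API. The helper name `count_messages_by_topic` and the bag path in the usage comment are made up for illustration.

```python
from collections import Counter


def count_messages_by_topic(messages):
    """Make one sequential pass and count messages per topic.

    `messages` is any iterable of (topic, message, timestamp) tuples.
    """
    counts = Counter()
    for topic, _msg, _stamp in messages:
        counts[topic] += 1
    return counts


# Usage with a real bag (needs the ROS 1 `rosbag` package; the path is hypothetical):
#
#   import rosbag
#   with rosbag.Bag("example.bag") as bag:
#       print(count_messages_by_topic(bag.read_messages()))
```

Because the helper only iterates, each message is read from disk once, in file order, and nothing is kept in memory beyond the counters.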
If you find yourself wishing for something better, consider:
- Your disk is much slower than RAM. If you need the whole dataset and it fits in RAM, the fastest and most efficient approach is to load all of it into memory and do your random accesses there.
- Measure the read speed of your disk drive to get an upper bound on how fast even the most efficient file format could be, and compare that to your existing tools before you invent something new. If there's not much to be gained, it's not worth spending time on it.
- Disks (particularly magnetic drives, but SSDs too) are much better at sequential reads than at random access. If you can store your data in chunks that are processed together, your disk accesses will be faster and more efficient. ROS messages stored in a bag might make good chunks.
- The most effective thing to do may be to invest in a fast SSD to make disk reads faster.
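
Two of the suggestions above can be tried with the Python standard library alone. This is a minimal sketch, not a proper benchmark: `measure_read_throughput` and `load_into_memory` are hypothetical helper names, and the 4 MiB chunk size is an arbitrary assumption. The first function times a front-to-back read of any file (for example a bag file) to estimate sequential throughput. The second pulls an iterable of messages into a list so that later random accesses hit RAM instead of the disk.

```python
import time


def measure_read_throughput(path, chunk_size=4 * 1024 * 1024):
    """Read a file front to back; return (bytes_read, seconds, bytes_per_second)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer layer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate


def load_into_memory(messages):
    """Materialize an iterable of messages as a list for cheap random access."""
    return list(messages)
```

Keep in mind that the operating system caches recently read files, so a second run over the same file will usually report a much higher rate than the disk can actually deliver. Time a file that has not been read recently, then compare the result with how long your existing tools take on the same file.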