# Recover corrupted bag file

I have a large recording (~16GB) of lidar data (point clouds, objects...) in a rosbag .bag file (recorded with rqt_bag).

The file produced an error when playing back:

 Error:   Received an invalid TCPROS header.  Each line must have an equals sign.
at line 103 in /tmp/binarydeb/ros-noetic-cpp-common-0.7.2/src/header.cpp
[FATAL] [1598618534.165082911]: Error reading FILE_HEADER record


I used rosbag reindex, and the resulting file no longer gives any errors. However, no topics seem to be recognized. rosbag info only shows

 version: 2.0
size:    16.0 GB


I then tried to use rosbag fix, which produces an empty bag (4.1kb in size).

During recording, the computer may have gone into sleep mode for a brief moment, which I suspect is the reason for the corruption. When viewing the file in vim, I see that the data should be there, so I hope there is some way of recovering it. The experiment is hard to repeat, so I would be extremely grateful for some hint about what I can do.

I uploaded the first lines of the bag file before reindexing here: https://drive.google.com/file/d/1Tzab...

ROS distro: noetic


## Comments

Corruption with .bags is typically non-trivial to fix, and there aren't really any tools which automate the process. Re-indexing is typically also not something which fixes these kinds of problems.

I'm not saying it's impossible, but unless it's a trivial fix, I would not expect an easy solution.

> During recording, the computer may have gone into sleep mode for a brief moment, which I suspect is the reason for the corruption

What did you do after the PC resumed? A common problem with .bags is that they lack the blocks at the end of the file which rosbag needs in order to read it.

Is the FILE_HEADER error thrown immediately at the beginning, or after some time?

( 2020-08-28 11:25:16 -0500 )

I did not do anything after the PC resumed - I only saw that the visualization in rviz was ok and rosbag was also still running and the file kept growing, so I assumed everything was ok. The FILE_HEADER error is thrown immediately.

Is there some documentation on the required blocks at the end? I only found http://wiki.ros.org/Bags/Format/2.0 on rosbags. I've also uploaded the last lines of the file to https://drive.google.com/file/d/1-ZCp... .

Thanks a lot for your help!

( 2020-08-29 01:49:50 -0500 )

I have a few ideas, and I'm willing to take a look (in the next couple weeks), but I'm going to need (much) more of the .bag and I can't guarantee anything (ie: whether it's fixable or not).

If you're willing/able/allowed to share it: it's not compressed right now (at least not the chunks I could identify), could you try something like this to compress it and see what the resulting size is:

pxz -z -k -v /path/to/corrupted.bag


Make sure you have the pxz package installed.

Note: this will create corrupted.bag.xz in /path/to and use all CPU cores available, so preferably use a high spec machine.

Also: as a first attempt: try removing the bytes from offset 0x0D to 0x50 (ie: remove the sequence of 68 bytes starting at offset 0x0D). I'm not sure what ...(more)
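A minimal Python sketch of that byte removal, for anyone who would rather not do it in a hex editor on a 16GB file. The function name `remove_byte_range` and the file paths are hypothetical; it writes a new copy, so the original stays untouched.

```python
import shutil


def remove_byte_range(src_path, dst_path, start, end_inclusive, bufsize=1 << 20):
    """Copy src_path to dst_path, skipping bytes start..end_inclusive."""
    if end_inclusive < start:
        raise ValueError("end_inclusive must not be smaller than start")
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Keep everything before the range (small here: only 13 bytes).
        dst.write(src.read(start))
        # Jump over the bytes that should be dropped.
        src.seek(end_inclusive + 1)
        # Stream the remainder in chunks so the whole file never sits in memory.
        shutil.copyfileobj(src, dst, bufsize)
```

For the range mentioned above, the call would be `remove_byte_range("/path/to/corrupted.bag", "/path/to/trimmed.bag", 0x0D, 0x50)`, which drops 68 bytes.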

( 2020-08-30 05:48:58 -0500 )

I tried removing the bytes you mentioned. There's a new error:

 [FATAL] [1598810894.826603256]: Error reading from file: wanted 1030058105 bytes, read 4063660 bytes


It compressed to ~500MB and I uploaded it here: https://drive.google.com/file/d/1yFLn...

I'll also keep trying to get a better understanding of bags and fix it and will keep you updated here.

( 2020-08-31 02:26:46 -0500 )

I replaced the first part of the file with that of another bag file with the same setup. There's no error anymore, but it does not recognize any messages. Also tried opening it in spectx, which recognizes a few messages, but only from less than the first second of the recording. rosbag info shows the correct topics, but 0 messages. Don't know if what I'm doing makes much sense, but I want to try removing the first messages from the file. Is there some marker that separates messages?

( 2020-08-31 05:01:57 -0500 )

> I replaced the first part of the file with that of another bag file with the same setup.

That works to 'silence' the errors, but not to make the bag usable. The bag header structure is located in that "first part" and it actually contains important information about the data in the .bag. Without that information, rosbag will not be able to find anything in it.
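To see what that header actually claims, something like the following Python sketch could be used. It assumes the layout described on the Bags/Format/2.0 wiki page (a `#ROSBAG V2.0` magic line, then a length-prefixed bag header record with `op`, `index_pos`, `conn_count` and `chunk_count` fields, all little-endian). The function names and the 4096-byte sanity limit are my own choices, not part of rosbag.

```python
import struct

MAGIC = b"#ROSBAG V2.0\n"


def parse_header_fields(raw):
    """Split a record header into {name: raw value bytes}."""
    fields = {}
    pos = 0
    while pos < len(raw):
        (field_len,) = struct.unpack_from("<I", raw, pos)
        pos += 4
        name, sep, value = raw[pos:pos + field_len].partition(b"=")
        if not sep:
            # Same condition the TCPROS header parser complains about.
            raise ValueError("header field at offset %d has no '='" % pos)
        fields[name.decode("ascii", "replace")] = value
        pos += field_len
    return fields


def read_bag_header(path):
    """Return index_pos, conn_count and chunk_count from the bag header."""
    with open(path, "rb") as f:
        if f.read(len(MAGIC)) != MAGIC:
            raise ValueError("file does not start with the bag v2.0 magic")
        (header_len,) = struct.unpack("<I", f.read(4))
        if header_len > 4096:  # assumed sanity limit, not from the spec
            raise ValueError("implausible header length %d" % header_len)
        fields = parse_header_fields(f.read(header_len))
    if fields.get("op") != b"\x03":
        raise ValueError("first record is not a bag header (op != 0x03)")
    return {
        "index_pos": struct.unpack("<Q", fields["index_pos"])[0],
        "conn_count": struct.unpack("<I", fields["conn_count"])[0],
        "chunk_count": struct.unpack("<I", fields["chunk_count"])[0],
    }
```

A header copied from another bag will parse cleanly here, but its `index_pos` and counts describe the other recording, which would be consistent with rosbag listing topics while finding 0 messages.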

I've tried a few things, and I've gotten a bit further, but there is something really strange about the .bag. It's as if rosbag (or any of the underlying OS systems) decided to distribute the contents (ie: bytes) of some blocks all around the file at random locations. I'm not entirely sure yet, but it's strange to "suddenly" see some bytes (which appear to belong to a CHUNK_INFO record, for instance) appear in the middle of a list of INDEX_DATA records ...(more)

( 2020-09-01 02:17:57 -0500 )

Just a quick update: I've not been able to make much progress. There still appear to be parts in the bag overwritten by others.

The most efficient way forward would likely be to scan the bag for CHUNKs and CONNECTION_INFO headers and then piece together a new bag (sort of what reindex does).
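As a rough starting point for such a scan, here is a Python sketch that only locates candidate records; it does not rebuild a bag. It assumes the v2.0 format, where every record header carries a length-prefixed `op=<byte>` field, with 0x05 for chunks and 0x07 for connection records. The names `OP_MARKERS` and `find_record_candidates` are hypothetical.

```python
import mmap

# 4-byte little-endian field length (4) followed by "op=" and the opcode.
OP_MARKERS = {
    "CHUNK": b"\x04\x00\x00\x00op=\x05",
    "CONNECTION": b"\x04\x00\x00\x00op=\x07",
}


def find_record_candidates(path, markers=OP_MARKERS):
    """Return {record type: [byte offsets of the op field]} for a file."""
    hits = {name: [] for name in markers}
    with open(path, "rb") as f:
        # Memory-map the file so a multi-GB bag is not read into RAM.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for name, pattern in markers.items():
                pos = mm.find(pattern)
                while pos != -1:
                    hits[name].append(pos)
                    pos = mm.find(pattern, pos + 1)
    return hits
```

The offsets point at the `op` field, not at the start of the record, because the order of fields inside a header is not fixed. The same byte pattern can also occur by chance inside message payloads, so each hit would still need to be checked by parsing the surrounding header.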

( 2020-09-06 01:54:14 -0500 )

Thanks for all the effort! I will look into identifying these headers, but at this point, I think it will be simpler to arrange to repeat the experiment. If I do make some progress recovering the file, I will update this post.

( 2020-09-08 06:02:29 -0500 )