rosbag exits recording abruptly after maximum file size is reached [closed]

asked 2014-09-30 14:55:34 -0500

ajain

updated 2014-10-01 12:50:37 -0500

I'm running octomap_mapping on my robot and also want to record data for post-processing. The data mostly consists of /tf messages and /pointCloud2 data. I'm running my code on a real robot, so I ssh into the robot over wireless and then record the data on my remote machine. In the middle of recording, rosbag exits after recording 167 MB of data, prints "Killed", and leaves a .active file.

I have not had this issue before, when running rosbag on the same computer that runs ROS. Could this be happening because I'm recording data on a remote machine rather than on the robot's computer? Can someone confirm this theory and suggest a solution?

Here's a link to someone else having similar issues: http://ros-users.122217.n3.nabble.com/rosbag-exits-abnormally-td3637291.html

Edit: Here is the actual output from the remote machine running the rosbag command over ssh. This time it failed at 13 MB rather than 167 MB. I don't think 167 MB is a significant number; it may just be the size of the chunk of data it was trying to store before it exited.

terminate called after throwing an instance of 'rosbag::BagIOException'
  what():  Error writing to file: writing 13571512 bytes, wrote 7480895 bytes

Also, here is the kernel log:

[13694.190638] wlan0: deauthenticating from d8:67:d9:c2:d0:a0 by local choice (reason=3)
[13694.207447] cfg80211: Calling CRDA to update world regulatory domain
[13694.216192] cfg80211: World regulatory domain updated:
[13694.216198] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[13694.216202] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13694.216205] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13694.216208] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[13694.216211] cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13694.216213] cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13695.752099] PM: Syncing filesystems ... done.
[13695.982204] PM: Preparing system for mem sleep
[13697.548484] Freezing user space processes ... (elapsed 0.002 seconds) done.
[13697.550530] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[13697.551847] PM: Entering mem sleep
[13697.552000] Suspending console(s) (use no_console_suspend to debug)
[13697.552241] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[13697.552364] sd 0:0:0:0: [sda] Stopping disk
[13698.153598] [drm] Disabling audio 0 support
[13698.548633] PM: suspend of devices complete after 995.697 msecs
[13698.548815] PM: late suspend of devices complete after 0.179 msecs
[13698.564658] pcieport 0000:00:04.0: System wakeup enabled by ACPI
[13698.596559] ehci-pci 0000:00:12.2: System wakeup enabled by ACPI
[13698.612545] ohci-pci 0000:00:12.0: System wakeup enabled by ACPI
[13698.612632] PM: noirq suspend of devices complete after 63.761 msecs
[13698.612680] ACPI: Preparing to enter system sleep state S3
[13698.624727] PM: Saving platform NVS memory
[13698.628027] Disabling non-boot CPUs ...
[13698.628477] Broke affinity for irq 17 ...

Closed for the following reason: the question is answered, right answer was accepted by ajain
close date 2014-10-17 01:08:46.172271

Comments

Can you explain what you mean by "maximum file size reached" in the question title? Is the robot's disk full?

ahendrix (2014-09-30 15:02:43 -0500)

When I start recording data on my remote machine, the rosbag process exits after recording 167 MB of data. There is more than enough space on my remote machine to store several GB of data.

ajain (2014-09-30 15:22:22 -0500)

Which command-line options are you passing to rosbag? Is there anything in your kernel log (dmesg) indicating that rosbag was killed because the system ran out of memory?

ahendrix (2014-09-30 16:21:51 -0500)

Default max size of rosbag is 256MB

Use the -b option to specify the bag size and try recording:

-b SIZE, --buffsize=SIZE
Use an internal buffer of SIZE MB (Default: 256, 0 = infinite).

rosbag record -b 1024 /chatter

bvbdort (2014-09-30 17:30:45 -0500)

-b specifies the buffer size, not the maximum bag file size. This is how much memory rosbag will use while recording messages. If the buffer fills up, additional messages will be dropped until some data can be written to disk.

ahendrix (2014-09-30 18:25:50 -0500)
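
To make the buffer behaviour described in the comment above concrete, here is a minimal toy model in Python. It is only a sketch of the idea as described in that comment (a fixed memory budget, with incoming messages dropped while the budget is exhausted). The class name and its methods are hypothetical, this is not rosbag's actual code, and the real recorder's policy for choosing which message to drop may differ.

```python
from collections import deque


class BoundedMessageBuffer:
    """Toy model of a fixed-size recording buffer (not rosbag's real code)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.dropped = 0
        self._queue = deque()

    def push(self, message):
        """Queue a message, or drop it if it would overflow the buffer."""
        if self.used_bytes + len(message) > self.capacity_bytes:
            self.dropped += 1
            return False
        self._queue.append(message)
        self.used_bytes += len(message)
        return True

    def flush_one(self):
        """Simulate the writer moving one queued message to disk."""
        if not self._queue:
            return None
        message = self._queue.popleft()
        self.used_bytes -= len(message)
        return message


buf = BoundedMessageBuffer(capacity_bytes=10)
buf.push(b"12345")            # fits
buf.push(b"12345")            # fits, buffer is now full
accepted = buf.push(b"x")     # no room left, so this one is dropped
buf.flush_one()               # "disk" catches up and frees 5 bytes
buf.push(b"x")                # fits again
print("accepted while full:", accepted, "dropped:", buf.dropped)
```

The point of the sketch is that a larger buffer only buys time: if messages arrive faster than they can be written out, something is eventually dropped regardless of the buffer size.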

I was confused by the same thing. "internal buffer" refers to the internal memory that ROS uses to store data temporarily before dumping it on to the disk.

ajain (2014-09-30 18:43:32 -0500)

Sorry, got blocked on hardware, will post results as soon as robot is working again. Thanks

ajain (2014-09-30 18:44:04 -0500)

This raises another question, though: what does dropping additional messages mean? Will ROS stop recording data? If so, how can I make sure data is recorded continuously (and stored on disk before the internal buffer fills up) so that I don't miss any of it?

ajain (2014-09-30 18:47:42 -0500)

1 Answer


answered 2014-10-01 13:27:47 -0500

ahendrix

updated 2014-10-01 13:29:22 -0500

It doesn't look like rosbag is being killed because it's out of memory; rather it looks like it's dying because it doesn't write the expected number of bytes to disk.

The causes for this sort of failure are fairly obscure. The man page for write(2) suggests a few possible causes:

  1. The disk is full
  2. You've hit your maximum file size limit (RLIMIT_FSIZE)
  3. The call to write was interrupted

You've already indicated that the disk isn't full, which rules out #1. It fails at different sizes on different runs, which probably rules out #2.
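
If you want to double-check causes #1 and #2 rather than infer them, a short Python sketch like the one below reports the free space in a directory and the file-size limit that applies to the current process. The function name is hypothetical, and the script assumes a Unix-like system (the `resource` module is not available elsewhere). Resource limits are per process and inherited, so run it from the same shell, and as the same user, that launches rosbag.

```python
import os
import resource
import shutil


def describe_write_limits(directory="."):
    """Return free disk space and the soft file-size limit for this process."""
    free_bytes = shutil.disk_usage(directory).free
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_FSIZE)
    return {
        "free_bytes": free_bytes,
        # RLIM_INFINITY means no per-file size limit is set.
        "fsize_unlimited": soft_limit == resource.RLIM_INFINITY,
        "fsize_soft_limit": soft_limit,
    }


limits = describe_write_limits(os.getcwd())
print("free space: %.1f MB" % (limits["free_bytes"] / 1e6))
if limits["fsize_unlimited"]:
    print("RLIMIT_FSIZE: unlimited")
else:
    print("RLIMIT_FSIZE: %d bytes" % limits["fsize_soft_limit"])
```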

That leaves #3, which suggests that the call is being interrupted for some reason. I don't really have much insight into what might be interrupting it, and the error reporting within rosbag isn't reporting which errno was set after the failed write, so it's difficult to diagnose further.
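
For illustration only, this is roughly what a write loop looks like when it tolerates short writes and reports the errno on failure. It is a Python sketch of the general technique, not rosbag's implementation (rosbag is written in C++ and, as noted above, does not report errno here); the function name `write_all` is hypothetical.

```python
import errno
import os
import tempfile


def write_all(fd, data):
    """Write every byte of data to fd, retrying after short writes."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        try:
            # os.write may write fewer bytes than requested; it returns the count.
            written = os.write(fd, view[total:])
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, "unknown")
            raise OSError(
                exc.errno,
                "wrote %d of %d bytes, errno=%s (%s)"
                % (total, len(view), name, os.strerror(exc.errno)),
            ) from exc
        total += written
    return total


with tempfile.TemporaryFile() as handle:
    payload = b"\0" * (1024 * 1024)  # 1 MiB of zeros, similar to the dd tests
    bytes_written = write_all(handle.fileno(), payload)
    print(bytes_written)
```

With this kind of reporting, a failure would name the specific error (for example ENOSPC, EFBIG or EIO) instead of only the byte counts seen in the exception above.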

The write man page also suggests that there are other, system-specific reasons that write could fail, but they're even more obscure and difficult to diagnose.

You may want to try some experiments with a simpler file creation tool to determine if the particular chunk size or file systems limits are getting in the way. I would try:

dd if=/dev/zero of=test1.data bs=13M count=1 # write a 13MB chunk to disk. If this fails, the chunk size may be the problem
dd if=/dev/zero of=test2.data bs=1M # write 1MB chunks to disk until it's full. This is similar to how rosbag writes, but much simpler. If this fails for any reason other than the disk being full, that is very interesting
dd if=/dev/zero of=test3.data bs=13M # write 13MB chunks to disk until it's full.

(you should probably remove the test files created between tests)

If you can provide the flags that you're passing to rosbag, those may also help debug the problem further.


Comments

Thanks for the explanation ahendrix. It makes sense. I haven't been using any flags other than -a for these experiments (rosbag record -a). Next thing worth trying would be to simply run rosbag on the robot computer itself rather than recording it on a remote machine over a network using ssh.

ajain (2014-10-01 13:47:43 -0500)

You definitely do not want to record everything remotely. You will saturate your network quickly and lose a lot of information.

tfoote (2014-10-01 18:19:46 -0500)
