Nodelet manager throws ros::serialization::StreamOverrunException
Hi, I haven't been able to find enough online material related to serialization overruns, so I'm asking here for the first time. Feel free to point out any improvements that I can make to the question in order to fit the forum's rules.
Setup
I'm working with ROS kinetic and pcl-1.9.1, running the ros master in a server and launching the nodes/nodelets in a docker container running on the same machine.
I'm working with 3 KinectOne cameras which run simultaneously for 3D object recognition. My first attempt was to join the three pointclouds and use a single processing pipeline. To speed up the process, I ported my nodes to nodelets to exploit the zero-copy transport, as I work with 540x960 pointclouds. Furthermore, I decided to make three processing pipelines, one for each camera, as I need to maintain the organized structure of the pointclouds for recognition purposes. This tripled my number of nodelets, and now when I run the program I get the following error:
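For context, the nodelets are loaded into a single shared manager roughly like this (a sketch of my launch file; the package name `my_pkg` and the nodelet instance names are placeholders, `SurfaceSegmentationNodelet` is one of my own nodelets):

```xml
<!-- Sketch: one nodelet manager, with each processing nodelet loaded into it.
     Package/instance names are placeholders for my actual setup. -->
<launch>
  <node pkg="nodelet" type="nodelet" name="manager" args="manager" output="screen" />

  <!-- one of the 15 custom nodelets, loaded into the shared manager
       so that intra-process (zero-copy) transport is used -->
  <node pkg="nodelet" type="nodelet" name="surface_segmentation_1"
        args="load my_pkg/SurfaceSegmentationNodelet manager" />
</launch>
```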
terminate called after throwing an instance of ros::serialization::StreamOverrunException
what(): Buffer Overrun
[manager-1] process has died [pid 30560, exit code -6, cmd /opt/ros/kinetic/lib/nodelet/nodelet manager __name:=manager __log:=/root/.ros/log/ebd68120-3f4e-11e9-8235-bc305b9d52e9/manager-1.log].
log file: /root/.ros/log/ebd68120-3f4e-11e9-8235-bc305b9d52e9/manager-1*.log
Problem hints
I've tried running the program adding one step of the process at a time, and the problem arises when adding the following nodelet to the manager: SurfaceSegmentationNodelet. At this point I have 15 nodelets loaded under the same manager, and my goal is to end up with 21 nodelets under one manager. 2 nodelets are required to communicate with each KinectOne, and I have written the other 15, all of which have their own dynamic_reconfigure server, and all of them publish at an approximate rate of 6 Hz. All publishers/subscribers have a queue size of 5. I think the problem might come from any of the following sources:
- I'm asking too much for a single nodelet manager
- I need to increase the queue_size (I've seen this approach in other answers, but as I see it, a queue size of five for a publisher that runs at 6 Hz should be enough)
- Having that many reconfigure servers somewhat overloads the manager's communication
Any help will be appreciated, thank you in advance!
UPDATE
Running only 1 pipeline instead of 3 throws no exception. So I guess the problem might be that I'm launching too many nodelets for a single manager. Can anyone confirm that?
UPDATE #2
I ran both the manager and the nodelets under gdb and got the following output (not showing the memory map). Apparently it does come from one of my SurfaceSegmentationNodelets, but I have little to no clue of what is causing the problem.
[pcl::ExtractIndices::applyFilter] The indices size exceeds the size of the input.
[pcl::ExtractIndices::applyFilter] The indices ...
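For anyone wanting to reproduce this: I attached gdb to the manager via roslaunch's `launch-prefix` attribute (a sketch; the node name matches the `manager` process from the error output above, adjust to your own launch file):

```xml
<!-- Sketch: run the nodelet manager under gdb so a backtrace is available
     when it crashes. The name "manager" matches my setup. -->
<node pkg="nodelet" type="nodelet" name="manager" args="manager"
      launch-prefix="gdb -ex run --args" output="screen" />
```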
It would probably be good to know where exactly that exception is thrown. Have you tried running the manager process in gdb and then looking at the backtrace to see what is going on exactly?
@gvdhoorn It seems to come from the suspicious surface segmentation nodelet. However, if I run three managers (one for each camera pipeline), the error does not happen. Any clue?
Sometimes the error is this one, also related to pointers and memory:
[pcl::ExtractIndices::applyFilter] The indices size exceeds the size of the input.
*** Error in `/root/ws/devel/lib/nodelet/nodelet': munmap_chunk(): invalid pointer: 0x00007ffe18013050 ***
======= Backtrace: =========
Looks like indexing into some array or vector is not done correctly.
As to the stacktrace: did you build things with Debug symbols enabled? I'm not seeing any line nrs.
I believe I did enable debugging symbols, as I added -g to the CMAKE_CXX_FLAGS. If it helps, after the memory map I get the following line: […] But I've read that this is just an issue of libraries compiled in other directories.
I'm not too worried about "other libraries". It's your own for which it would be convenient to have line nrs. Typically gdb shows those if it can.
Another (unrelated) question btw: why are you running this as root?
It's likely that this is actually the real issue here. Accessing memory out-of-bounds is a recipe for SEGFAULTs. Are you doing any input scaling, or manually setting up arrays/lists/vectors?