
Nodelet manager throws ros::serialization::StreamOverrunException

asked 2019-03-05 09:36:50 -0500 by jgallostra

updated 2019-03-06 06:23:16 -0500 by gvdhoorn

Hi, I haven't been able to find much online material about serialization overruns, so I'm asking here for the first time. Feel free to point out any improvements I can make to the question so that it better fits the forum's rules.

Setup

I'm working with ROS Kinetic and PCL 1.9.1, running the ROS master on a server and launching the nodes/nodelets in a Docker container on the same machine.

I'm working with 3 KinectOne cameras which run simultaneously for 3D object recognition. My first attempt was to join the three pointclouds and use a single processing pipeline. To speed up the process, I have ported my nodes to nodelets to exploit the zero-copy transport, as I work with 540x960 pointclouds. Furthermore, I decided to split the work into three processing pipelines, one per camera, since I need to maintain the organized structure of the pointclouds for recognition purposes. This tripled my number of nodelets, and now when I run the program I get the following error:

terminate called after throwing an instance of ros::serialization::StreamOverrunException
what():  Buffer Overrun
[manager-1] process has died [pid 30560, exit code -6, cmd /opt/ros/kinetic/lib/nodelet/nodelet manager __name:=manager __log:=/root/.ros/log/ebd68120-3f4e-11e9-8235-bc305b9d52e9/manager-1.log].
log file: /root/.ros/log/ebd68120-3f4e-11e9-8235-bc305b9d52e9/manager-1*.log

Problem hints

I've tried running the program adding one step of the process at a time, and the problem arises when adding the following nodelet to the manager: SurfaceSegmentationNodelet. At this point I have 15 nodelets loaded under the same manager, and my goal is to end up with 21 nodelets under one manager. 2 nodelets are required to communicate with each KinectOne, and I have written the other 15, all of which have their own dynamic_reconfigure server and publish at an approximate rate of 6 Hz. All publishers/subscribers have a queue size of 5 (a minimal sketch of this setup follows the list below). I think the problem might come from any of the following sources:

  • I'm asking too much for a single nodelet manager
  • I need to increase the queue_size (I've seen this suggested in other answers, but as I see it, a queue size of 5 for a publisher that runs at 6 Hz should be enough)
  • Having that many reconfigure servers somewhat overloads the manager's communication
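
For reference, each of my processing nodelets is set up roughly like this minimal sketch (the class and topic names here are hypothetical, but the queue sizes match my actual nodes):

#include <nodelet/nodelet.h>
#include <pluginlib/class_list_macros.h>
#include <ros/ros.h>
#include <sensor_msgs/PointCloud2.h>

namespace example
{
class DemoNodelet : public nodelet::Nodelet
{
    virtual void onInit()
    {
        ros::NodeHandle& nh = getNodeHandle();
        pub_ = nh.advertise<sensor_msgs::PointCloud2>("cloud_out", 5);
        sub_ = nh.subscribe("cloud_in", 5, &DemoNodelet::callback, this);
    }

    void callback(const sensor_msgs::PointCloud2::ConstPtr& msg)
    {
        // Within one manager the ConstPtr points at the very object the
        // upstream nodelet published (zero-copy), so it must not be modified.
        sensor_msgs::PointCloud2::Ptr out(new sensor_msgs::PointCloud2(*msg));
        pub_.publish(out);
    }

    ros::Publisher pub_;
    ros::Subscriber sub_;
};
} // namespace example

PLUGINLIB_EXPORT_CLASS(example::DemoNodelet, nodelet::Nodelet)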

Any help will be appreciated, thank you in advance!

UPDATE

Running only 1 pipeline instead of 3 throws no exception, so I guess the problem might be that I'm launching too many nodelets under a single manager. Can anyone confirm that?

UPDATE #2

I ran both the manager and the nodelets under gdb and got the following output (not showing the memory map). Apparently it does come from one of my SurfaceSegmentationNodelets, but I have little to no clue about what is causing the problem.

[pcl::ExtractIndices::applyFilter] The indices size exceeds the size of the input.
[pcl::ExtractIndices::applyFilter] The indices ...

Comments


It would probably be good to know where exactly that exception is thrown. Have you tried running the manager process in gdb and then looking at the backtrace to see what is going on exactly?

gvdhoorn (2019-03-05 10:07:24 -0500)

@gvdhoorn It seems to come from the suspicious surface segmentation nodelet. However, if I run three managers (one for each camera pipeline), the error does not happen. Any clue?

jgallostra (2019-03-06 05:56:27 -0500)

Sometimes the error is this one instead, also related to pointers and memory:

[pcl::ExtractIndices::applyFilter] The indices size exceeds the size of the input.
*** Error in `/root/ws/devel/lib/nodelet/nodelet': munmap_chunk(): invalid pointer: 0x00007ffe18013050 ***
======= Backtrace: =========

jgallostra (2019-03-06 06:02:05 -0500)

Looks like indexing into some array or vector is not done correctly.

As to the stacktrace: did you build things with debug symbols enabled? I'm not seeing any line numbers.

gvdhoorn (2019-03-06 06:24:05 -0500)

I believe I did enable debugging symbols, as I added -g to the CMAKE_CXX_FLAGS. If it helps, after the memory map I get the following line:

54  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

But I've read that this is just an issue of libraries compiled in other directories.

jgallostra (2019-03-06 07:01:10 -0500)

I'm not too worried about "other libraries". It's your own for which it would be convenient to have line nrs. Typically gdb shows those if it can.

gvdhoorn (2019-03-06 07:42:04 -0500)

Another (unrelated) question btw: why are you running this as root?

gvdhoorn (2019-03-06 07:42:20 -0500)

The indices size exceeds the size of the input.

It's likely that this is actually the real issue here. Accessing memory out of bounds is a recipe for SEGFAULTs. Are you doing any input scaling, or manually setting up arrays/lists/vectors?

gvdhoorn (2019-03-06 07:43:26 -0500)
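
To illustrate how that PCL warning can arise, here is a minimal sketch (hypothetical, not the code from the question) in which indices computed on one cloud are applied to a smaller cloud:

#include <pcl/filters/extract_indices.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

// Indices were computed on a full-resolution cloud but are applied to a
// subsampled (smaller) cloud, so some indices exceed the input size.
void buggyExtract(const pcl::PointCloud<pcl::PointXYZ>::ConstPtr& small_cloud,
                  const pcl::PointIndices::ConstPtr& indices_from_full_cloud)
{
    pcl::ExtractIndices<pcl::PointXYZ> extract;
    extract.setInputCloud(small_cloud);
    extract.setIndices(indices_from_full_cloud);

    pcl::PointCloud<pcl::PointXYZ> output;
    // Prints "[pcl::ExtractIndices::applyFilter] The indices size exceeds
    // the size of the input." when the indices do not fit the input cloud.
    extract.filter(output);
}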

1 Answer


answered 2019-03-07 10:23:45 -0500 by jgallostra

updated 2019-03-07 11:17:27 -0500

The ros::serialization::StreamOverrunException was caused by bad memory management between nodelets. Thanks to @gvdhoorn, debugging the code showed that the exception was caused by:

  • Attempting to access a freed memory location
  • Attempting to free an already freed memory location

In the first nodelet of the pipeline I was publishing a shared pointer to a PointCloud and forgot to re-allocate the pointer before modifying its contents. The subsequent nodelets therefore received a pointer to a memory location whose contents had changed since the message was sent, causing the error. To sum up: when working with nodelets, make sure that you do not modify a message after it has been sent.
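
For illustration, here is a minimal sketch of that failure mode (the names are hypothetical, not my actual code; pointCloud is the typedef used below):

// The same shared pointer is reused every cycle, so the message published on
// the previous cycle is mutated while downstream nodelets, which share it
// via zero-copy transport, may still be reading it.
pointCloud::Ptr cloud(new pointCloud());   // allocated once, reused forever

void processAndPublish(ros::Publisher& pub)
{
    cloud->clear();              // BUG: modifies the message sent last cycle
    fillWithNewData(*cloud);     // hypothetical processing step
    pub.publish(cloud);          // downstream nodelets now share this pointer
}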

EDIT

In my node, I published 4 messages:

// Publish the nodelet's four output messages.
void
SurfaceSegmentationNode::publishMsgs()
{
    this->marker_publisher_.publish(this->output_surfaces_markers_->getMarkerArray());
    this->surface_publisher_.publish(this->output_surfaces_msg_);
    this->full_cloud_publisher_.publish(this->input_cloud_full_);
    this->subsampled_cloud_publisher_.publish(this->input_cloud_subsampled_);
}

so before creating any new messages to send, I had to add

// Re-allocate fresh storage for every message before filling it again, so
// that the pointers published on the previous cycle are never modified.
void
SurfaceSegmentationNode::resetMsgs()
{
    this->input_cloud_full_ = pointCloud::ConstPtr(new pointCloud());
    this->input_cloud_subsampled_ = pointCloud::ConstPtr(new pointCloud());
    this->output_surfaces_msg_ = scene_msgs::SurfaceArray::Ptr(new scene_msgs::SurfaceArray);
    this->output_surfaces_markers_->resetMarkers();
    this->surface_segmentation_.surface_vector.clear();
}

so that each cycle's messages get fresh memory and the previously published ones are left untouched. I hope this clarifies what I meant.


Comments

To sum up, make sure that you do not modify a sent message when working with nodelets.

ehm, that should be impossible, as all msgs are exchanged as ConstPtrs -- which is what is used to enforce the contract between publishers and subscribers.

I'm slightly confused by this statement:

In the first nodelet of the pipeline I was sending a pointer to a PointCloud [..]

Nodelets typically just exchange messages; are you saying that you were actually sending around pointers to memory locations? I would not recommend doing that. Any subscriber that is not hosted by the same manager will be accessing "random memory", and that's just one of the potential problems.

gvdhoorn (2019-03-07 10:29:26 -0500)

Well I thought that in order to make use of the zero-copy transport, nodelet messages should be published as shared pointers (as explained here), and that is what I meant by "sending a pointer to a PointCloud".

As for your first remark, I meant that the nodelet sends a message and then modifies its contents after publishing, without allocating a new shared pointer. This is stated as a practice to avoid in the last paragraph of the link mentioned above.

jgallostra (2019-03-07 10:45:31 -0500)
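
To make the correct pattern explicit, here is a minimal sketch of "publish and forget" for zero-copy nodelet transport (hypothetical names, assuming the pointCloud typedef from the answer): a fresh message is allocated on every cycle and never touched after publish().

void publishCycle(ros::Publisher& pub)
{
    // Allocate a brand-new message each cycle ...
    pointCloud::Ptr msg(new pointCloud());
    fillWithNewData(*msg);    // hypothetical processing step

    pub.publish(msg);
    // ... and never modify *msg after this point: subscribers hosted by the
    // same manager now share this exact object.
}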
