subscriber msg: shared_ptr vs. reference

asked 2015-08-05 07:11:19 -0500

ced

updated 2015-08-05 08:51:32 -0500

Hi!

I have a topic that two different nodes subscribe to. The subscriber callbacks used to take a

const boost::shared_ptr< nav_msgs::Odometry>

as their argument. For several weeks this worked well, until the code started crashing at random, depending on how many lines of code were in the callback. Each time, the node was killed with a

boost::thread_interrupted

error. As an example, I could have three

std::cout << "This is my message" << std::endl;

outputs in the callback without problems, but with only two of them the node would crash. With a single one it was fine again.

I don't know much about what this means or how these pointer "counters" work behind the scenes, but I imagined that there could be a problem when two callback functions in different nodes, i.e. different threads, want to access the same pointer at the same time.

By replacing the

const boost::shared_ptr< nav_msgs::Odometry>

with a reference

const nav_msgs::Odometry&

the problem disappeared (at least so far), which would support my crazy theory.

So the question: Does this explanation remotely make sense?

Thanks!

EDIT: For the sake of completeness, here is the gdb output of the crash.

#0  0x00007ffff5997cc9 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff599b0d8 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff62a2535 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff62a06d6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff62a0703 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff62a0922 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff6e80815 in bool boost::condition_variable::timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000000l> >(boost::unique_lock<boost::mutex>&, boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000000l> const&) () from /opt/ros/indigo/lib/libroscpp.so
#7  0x00007ffff6e7e6bd in ros::CallbackQueue::callAvailable(ros::WallDuration) () from /opt/ros/indigo/lib/libroscpp.so
#8  0x00007ffff6ec24e5 in ros::SingleThreadedSpinner::spin(ros::CallbackQueue*) () from /opt/ros/indigo/lib/libroscpp.so
#9  0x00007ffff6eaaaeb in ros::spin() () from /opt/ros/indigo/lib/libroscpp.so
#10 0x0000000000671323 in main ()



answered 2015-08-17 08:27:40 -0500

ced

OK, problem found! It had absolutely nothing to do with shared_ptr vs. references. The crash happened because two libraries I was including were using different versions of the Boost library, which for some reason caused the nodes to crash. So if you get mysterious node crashes with "boost::thread_interrupted" errors, check whether any of your packages bring in their own copy of Boost.
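
One way to chase this kind of mismatch is to have each library report the BOOST_VERSION it was compiled with and compare the values. BOOST_VERSION (from boost/version.hpp) packs the release as major * 100000 + minor * 100 + patch. Below is a minimal sketch of decoding and comparing two such values without depending on Boost itself. The struct and function names are made up for illustration, and the sample numbers in the usage note are not taken from the actual crash.

```cpp
#include <string>

// Hypothetical helper type; the field names are illustrative only.
struct BoostVersion {
    int major_part;
    int minor_part;
    int patch_part;
};

// Decode an integer in BOOST_VERSION format:
// major * 100000 + minor * 100 + patch.
inline BoostVersion decode_boost_version(int v) {
    return BoostVersion{v / 100000, (v / 100) % 1000, v % 100};
}

// Render a decoded version as "major.minor.patch", e.g. for a log line.
inline std::string to_string(const BoostVersion& v) {
    return std::to_string(v.major_part) + "." +
           std::to_string(v.minor_part) + "." +
           std::to_string(v.patch_part);
}

// True when two libraries report the same Boost release.
inline bool same_boost(int version_a, int version_b) {
    return version_a == version_b;
}
```

For example, same_boost(105400, 105800) is false, and the two values decode to 1.54.0 and 1.58.0. In a real build each library would have to expose the BOOST_VERSION it saw at compile time (say, through a small accessor function), which is an assumption about how the libraries are set up.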



answered 2015-08-05 07:48:27 -0500

paulbovbel

http://www.bovbel.com/

You can see a list of supported callback signatures here:

http://wiki.ros.org/roscpp/Overview/P...

With regard to multiple nodes with const boost::shared_ptr< nav_msgs::Odometry> callbacks: each node performs its own independent deserialization of the ROS message coming in on its socket connection to the publisher, since nodes are independent in that sense. Different nodes are not separate threads; they are separate processes, and ROS doesn't (natively) support any inter-process memory-sharing type of schemes.

According to the roscpp spec, if you had separate threads in one node both processing a callback like that, each callback would actually get a deep copy of the data under the hood. That is because the argument is not a ConstPtr (const boost::shared_ptr< const nav_msgs::Odometry>), so the callback might try to modify the contents of the odometry message.
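
To illustrate the idea only, here is a minimal single-process sketch of that const-vs-mutable distinction. It is not roscpp's implementation: Odometry is a stand-in struct, Dispatcher and its methods are hypothetical names, and std::shared_ptr is used in place of boost::shared_ptr so the example has no dependencies.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for nav_msgs::Odometry; only one field is modelled.
struct Odometry {
    double x = 0.0;
};

using ConstCallback = std::function<void(const std::shared_ptr<const Odometry>&)>;
using MutableCallback = std::function<void(const std::shared_ptr<Odometry>&)>;

// Hypothetical in-process dispatcher (an assumption for illustration,
// not the way roscpp is actually structured).
class Dispatcher {
public:
    void subscribe_const(ConstCallback cb) { const_cbs_.push_back(std::move(cb)); }
    void subscribe_mutable(MutableCallback cb) { mutable_cbs_.push_back(std::move(cb)); }

    void publish(const Odometry& incoming) {
        // One message object, shared read-only by every const subscriber.
        const std::shared_ptr<const Odometry> shared =
            std::make_shared<const Odometry>(incoming);
        for (const auto& cb : const_cbs_) {
            cb(shared);
        }
        // Each mutable subscriber gets its own deep copy, so edits made in
        // one callback cannot be seen by any other callback.
        for (const auto& cb : mutable_cbs_) {
            cb(std::make_shared<Odometry>(*shared));
        }
    }

private:
    std::vector<ConstCallback> const_cbs_;
    std::vector<MutableCallback> mutable_cbs_;
};
```

In this sketch two const subscribers see the same object (same address), while a mutable subscriber sees a different object and can write to it freely. The reference count inside the shared_ptr only tracks how many owners the object has; it does nothing to synchronize access to the message contents.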

As for the exception you're getting, hopefully there's a more mundane explanation in your client code. Feel free to link to a gist or something. Otherwise, if it's something more insidious, it should definitely be reported as a bug.


Comments

For completeness (I'm sure you know this, @paulbovbel):

[..] ROS doesn't (natively) support any inter-process memory-sharing type of schemes.

It does, but only in nodelets, and there concurrent access to msgs can become an issue. But the OP is using nodes, not nodelets.

gvdhoorn ( 2015-08-05 08:07:34 -0500 )

Thanks! I deleted my previous answer as the error must be somewhere else. I ended up getting a boost::thread_interrupted error again, just with a different number of lines. I will try and see if I can fabricate a minimal example which exhibits this behaviour.

ced ( 2015-08-05 08:24:02 -0500 )

AFAIK nodelets are not inter-process, which is why they're loaded into a single nodelet manager process and share a threadpool.