Robotics StackExchange | Archived questions

Node exit code -11

Hello,

I am creating a really simple subscriber to a published message. The message comes from another node that processes images and produces odometry. My node consists of three files: RoverMoveMain.cpp, RoverMoveSimple.cpp, and RoverMoveSimple.h.

RoverMoveSimple.h

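#ifndef ROVER_MOVE_H
#define ROVER_MOVE_H

#include <stdlib.h>
#include <cmath>
#include <iostream>  // for cout, used in RoverMoveSimple.cpp

#include "helper/constant.h"
#include "helper/typedef.h"  // project aliases such as OdometryPtr (file not shown)
#include "helper/helper.h"
#include "boost/atomic.hpp"
#include "tf2/convert.h"
#include "tf2/LinearMath/Matrix3x3.h"

using namespace std;

class RoverMoveSimple {
    protected:
        // Attributes
        NodeHandle n;

        Sub cur_pos_sub;  // subscription to the "odom" topic
    public:
        // Methods
        RoverMoveSimple(NodeHandle* nh);

        // Callback invoked for each incoming odometry message
        void UpdatePosCB(const OdometryPtr& cur_pos);
};

#endif  // ROVER_MOVE_H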

RoverMoveSimple.cpp

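#include "rover_move/RoverMoveSimple.h"

// Subscribe to "odom" with a queue of up to 1000 messages.
RoverMoveSimple::RoverMoveSimple(NodeHandle* nh) {
    this->cur_pos_sub = nh->subscribe("odom", 1000, &RoverMoveSimple::UpdatePosCB, this);
}

// Print the x;y position from each received odometry message.
void RoverMoveSimple::UpdatePosCB(const OdometryPtr& cur_pos){
    cout << cur_pos->pose.pose.position.x << ";" << cur_pos->pose.pose.position.y << "\n";
}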

RoverMoveMain.cpp

#include "rover_move/RoverMoveSimple.h"

int main(int argc, char** argv) {
    ros::init(argc, argv, "RoverMove");

    NodeHandle n;
    RoverMoveSimple rm(&n);
    ros::spin();
    return 0;
}

The problem is that my node runs fine for some time, then stops working with an error. The reported exit code is -11. From what I have checked, this is probably related to a segmentation fault. Is this correct? If so, how can this error happen? This is really just a simple node.

Update: backtrace from gdb. There seems to be a problem in the Boost code used by ROS when it deletes a message or something similar.

#0  0x729012b0 in ?? ()
#1  0x76f85f0c in boost::detail::sp_counted_base::release() ()
   from /home/pi/rover_workspace_for_dat2019/devel/lib/libRoverMove.so
#2  0x76ee5820 in boost::detail::sp_counted_impl_pd<ros::MessageDeserializer*, boost::detail::sp_ms_deleter<ros::MessageDeserializer> >::dispose() () from /opt/ros/kinetic/lib/libroscpp.so
#3  0x76ee9e38 in ros::SubscriptionQueue::call() () from /opt/ros/kinetic/lib/libroscpp.so
#4  0x76e8ed3c in ros::CallbackQueue::callOneCB(ros::CallbackQueue::TLS*) () from /opt/ros/kinetic/lib/libroscpp.so
#5  0x76e8fffc in ros::CallbackQueue::callAvailable(ros::WallDuration) () from /opt/ros/kinetic/lib/libroscpp.so
#6  0x76eee420 in ros::SingleThreadedSpinner::spin(ros::CallbackQueue*) () from /opt/ros/kinetic/lib/libroscpp.so
#7  0x76ed45c0 in ros::spin() () from /opt/ros/kinetic/lib/libroscpp.so
#8  0x00011048 in main ()

Backtrace after building in Debug mode:

Thread 1 "RoverMoveNode" received signal SIGSEGV, Segmentation fault.
0x73001ca8 in ?? ()
(gdb) bt
#0  0x73001ca8 in ?? ()
#1  0x76f86388 in boost::detail::sp_counted_base::release (this=0x73000ad8)
    at /usr/local/include/boost/smart_ptr/detail/sp_counted_base_spin.hpp:103
#2  0x76ee5820 in boost::detail::sp_counted_impl_pd<ros::MessageDeserializer*, boost::detail::sp_ms_deleter<ros::MessageDeserializer> >::dispose() () from /opt/ros/kinetic/lib/libroscpp.so
#3  0x76ee9e38 in ros::SubscriptionQueue::call() () from /opt/ros/kinetic/lib/libroscpp.so
#4  0x76e8ed3c in ros::CallbackQueue::callOneCB(ros::CallbackQueue::TLS*) () from /opt/ros/kinetic/lib/libroscpp.so
#5  0x76e8fffc in ros::CallbackQueue::callAvailable(ros::WallDuration) () from /opt/ros/kinetic/lib/libroscpp.so
#6  0x76eee420 in ros::SingleThreadedSpinner::spin(ros::CallbackQueue*) () from /opt/ros/kinetic/lib/libroscpp.so
#7  0x76ed45c0 in ros::spin() () from /opt/ros/kinetic/lib/libroscpp.so
#8  0x00011048 in main (argc=1, argv=0x7effef54)
    at /home/pi/rover_workspace_for_dat2019/src/rover_move/src/RoverMoveMain.cpp:6

Asked by mrrius on 2019-11-19 21:11:47 UTC

Comments

One thing that would help is if you launch your node in gdb: http://wiki.ros.org/roslaunch/Tutorials/Roslaunch%20Nodes%20in%20Valgrind%20or%20GDB. If you configure your build (catkin, ament) to have -DCMAKE_BUILD_TYPE=RelWithDebInfo (or Debug) it will allow you to get a backtrace in gdb. When xterm opens your node and you get a gdb prompt, type run, wait for a seg fault, then type bt. That will show which line is triggering the seg fault. If you get that output you can edit your question with the new information.

Asked by Thomas D on 2019-11-19 21:39:15 UTC

Did you forget to add an include for nav_msgs/Odometry.h? Otherwise, I don't think you can know what OdometryPtr is.

Asked by Thomas D on 2019-11-19 21:42:18 UTC

I have included it and made an alias for it in typedef.h. The problem is that the node runs for several seconds before it dies, so the error occurs at runtime.

Asked by mrrius on 2019-11-19 22:01:13 UTC

Also, can you explain how to use gdb without xterm? I run the node over ssh, so there is no xterm window. I followed the tutorial from your link, and it seems it will produce a core file at $ROS_HOME/core.pid.

Asked by mrrius on 2019-11-19 22:14:15 UTC

I got the backtrace and added it to the question.

Asked by mrrius on 2019-11-19 22:27:51 UTC

GDB backtraces are nice, but only really informative if you've compiled your code with Debug symbols enabled.

For a Catkin workspace, you can do that either by forcing the Debug (or RelWithDebInfo) CMAKE_BUILD_TYPE in your CMakeLists.txt, or by invoking Catkin with: catkin_make -DCMAKE_BUILD_TYPE=Debug.

Then run your node in gdb again.

Asked by gvdhoorn on 2019-11-20 03:10:27 UTC

I have tried compiling in Debug mode and running the node in gdb. It shows the same error:

Thread 1 "RoverMoveNode" received signal SIGSEGV, Segmentation fault. 0x72901ca8 in ?? ()

and I printed the backtrace. It is still the same, with no new information. Is there any way to get more information? I am connected to the Raspberry Pi over ssh, so I cannot use xterm right now. I cannot set any breakpoints when the node is run from a separate launch.

Asked by mrrius on 2019-11-20 03:23:39 UTC

> It shows the same error:
>
> Thread 1 "RoverMoveNode" received signal SIGSEGV, Segmentation fault. 0x72901ca8 in ?? ()
>
> and I printed the backtrace. It is still the same, with no new information.

Then you haven't built your workspace with Debug symbols enabled.

You must remove your build and devel folders from your workspace (or at least those of the package), otherwise it's likely that nothing gets actually rebuilt.

At the very least, the last line (ie: frame 8 "in main()") should show the exact line number where the SEGFAULT is caused.

Asked by gvdhoorn on 2019-11-20 03:24:57 UTC

OK, I have done that and updated the post. It seems the line where the error occurs is ros::spin().

Asked by mrrius on 2019-11-20 03:59:11 UTC

The gdb output does not clearly show me where the error is. However, a lot of information is missing. In your include, the helper includes are not shown. The NodeHandle, Sub, and OdometryPtr types are not defined. Your package.xml and CMakeLists.txt are not shown, and that can be a source of errors. Based on what is shown I can't find where the seg fault is being triggered. I modified your files locally and fixed up a few things (missing data types, removing unused includes, adding an #endif) and everything ran fine for me. I was able to use rostopic pub to publish on the odom topic and this node printed out position. If you can provide the same setup that is not working for you I would be happy to look again.

Asked by Thomas D on 2019-11-20 08:45:40 UTC

So, I did what you did and published on the odom topic myself. I got the same error. I had set the publisher frequency to 30 Hz, and that seems to be the cause. Probably, because of the high rate, there is a condition where the data is read late, after it has already been released by ROS. If I set it to 10 Hz, it seems to work fine. I am running this on a Raspberry Pi, which probably has limitations.

The weird thing is that if I change the message type to a simple string, there is no problem at 30 Hz. Maybe the message size also affects this. The Odometry type is more complicated and may take more time to process than a string.

Asked by mrrius on 2019-11-20 17:49:35 UTC

Answers