Robotics StackExchange | Archived questions

Node seems to unwantedly automatically unsubcribe from topic

Hi,

Note: Edits have been made and I found a workaround , see EDIT 5, 6 and 7 below.

General information and context: WIndows 8.1 64-bit running Virtual Box VM: Ubuntu 16.04.6 LTS with ROS Kinetic. I'm running a setup where a Husarion Rosbot 2.0 is driving around in the generic maze world, publishing /odom, /scan and listening to /cmdvel among other things. The published odometry and laserscan are used by the obstacledetector package made by tysik (github. This all working fine.

The following is where the issue is: I have written two nodes in python, the first is called obstaclevisualisation and it subscribes to the /obstacles topic that the obstacledetector is publishing, as well as the /odom topic from Gazebo. It filters on range and reclassifies moving obstacles as agents, then publishes this to the /sfmmpdm_obstacles topic.

rostopic echo /sfmmpdm_obstacles continuously shows filtered obstacle data so this seems to be running correct as well.

The second node is called sfmmpdm and it subscribers to the /sfmmpdmobstacles topic as well as the /odom topic. In the callback methods for these subscriptions I have a print statement writing either 'callbackOdom' or 'callbackSFMObs'. The /odom continues to work, but somehow the /sfmmpdm_obstacles topic gets unsubscribed without me telling it to.

The rqt_graph while everything is working looks like:https://imgur.com/V5ooPo3

And when it breaks down it looks like:https://imgur.com/DhPiIgs

Sequentially the printed 'callbackSFMObs' also stop at that moment.

I subscribe to the topics using the following python code:

#!/usr/bin/env python

import rospy;
import numpy as np
import pygame

from obstacle_detector.msg import SFMObstacles
from nav_msgs.msg import Odometry
from geometry_msgs.msg import Twist
from tf.transformations import euler_from_quaternion

def main_loop():
    # Initialize node, publisher and subscribers.
    rospy.init_node('sfm_mpdm')
    rospy.Subscriber("sfmmpdm_obstacles", SFMObstacles, callbackSFMObs)
    rospy.Subscriber("odom", Odometry, callbackOdom)
    pub = rospy.Publisher('cmd_vel', Twist, queue_size = 1)

    # Loop at 'rate' times per second.
    elapsed = 0                 # Initialize in game delta time
    step_count = 0                  # Initialize forward simulation counter
    previous_time = rospy.get_rostime() # Get current time 

    rate = rospy.Rate(100)  # 100 Hz
    while not rospy.is_shutdown():
        # --- Time keeping, run at 100 Hz ---
        now = rospy.get_rostime()
        # Calculate elapsed time between loops in nanoseconds (a.k.a. delta time).
        time_difference =(now - previous_time).to_nsec()    
        previous_time = now
        # Calculate delta time from ms to s.
        seconds = elapsed / 1000.0      
        # Calculate from nanoseconds to milliseconds    
        elapsed = time_difference / 1000000.0   
        # Count the amount of steps in the elapsed time. Default is 10 per second.
        step_count += elapsed / 100.0   


        # Other code that runs in this loop, snipped for readability.

        # --- Set up and publish the /cmd_vel topic to the Rosbot. --- 
        twist = Twist()
        twist.linear.x = twist_linear_x
        twist.linear.y = 0
        twist.linear.z = 0
        twist.angular.x = 0
        twist.angular.y = 0
        twist.angular.z = twist_angular_z
        # pub.publish(twist) 
        rate.sleep()

# --- Starting point ---
if __name__ == '__main__':
    try:
        main_loop()
    except rospy.ROSInterruptException:
        pass

pygame.quit()
quit()

Why does is suddenly stop being subscribed to the /sfmmpdm_obstacles topic (the subscription to the /odom topic continues just fine, which is good).

EDIT: From the feedback in the comments I have adapted my code: I have replaced the pygame time keeping code with ros specific code and added a delta time function to replace the pygame delta time part. The changes have been made in the original code. However, after a while the node still unsubsribes.

EDIT2: As suggested by gvdhoorn I've taken a step back and made a bare-bones node, the full code looks like:

#!/usr/bin/env python

import rospy;
import numpy as np

from obstacle_detector.msg import SFMObstacles
from nav_msgs.msg import Odometry
from geometry_msgs.msg import Twist

# --- Subscriber callback methods ---
def callbackOdom(data):
    print("callbackOdom")

def callbackSFMObs(data):
    print("callbackSFMObs")

# --- End of Subscriber callback methods ---

def main_loop():
    # Initialize node, publisher and subscribers.
    rospy.init_node('sfm_mpdm')
    sfmobs_subscriber = rospy.Subscriber("sfmmpdm_obstacles", SFMObstacles, callbackSFMObs)
    odom_subscriber = rospy.Subscriber("odom", Odometry, callbackOdom)
    pub = rospy.Publisher('cmd_vel', Twist, queue_size = 1)

    # Loop at 'rate' times per second.
    elapsed = 0                         # Initialize in game delta time
    step_count = 0                          # Initialize forward simulation counter
    previous_time = rospy.get_rostime() # Get current time 

    rate = rospy.Rate(100)          # 100 Hz
    while not rospy.is_shutdown():
        # --- Time keeping, run at 100 Hz ---
        '''
        now = rospy.get_rostime()
        time_difference =(now - previous_time).to_nsec()    # Calculate elapsed time between loops in nanoseconds (a.k.a. delta time).
        previous_time = now
        seconds = elapsed / 1000.0          # Calculate delta time from ms to s.
        elapsed = time_difference / 1000000.0   # Calculate from nanoseconds to milliseconds
        step_count += elapsed / 100.0           # Count the amount of steps in the elapsed time. Default is 10 per second.
        '''

        # --- Set up and publish the /cmd_vel topic to the Rosbot. --- 
        twist = Twist()
        twist.linear.x = 0
        twist.linear.y = 0
        twist.linear.z = 0
        twist.angular.x = 0
        twist.angular.y = 0
        twist.angular.z = 0
        # pub.publish(twist)
        rate.sleep()

# --- End of Main ---

# --- Starting point ---
if __name__ == '__main__':

    try:
        main_loop()
    except rospy.ROSInterruptException:
        pass

    quit()

The issue still persist. The odom function works fine, but the sfmmpdm_obstacles doesn't come in anymore. When inspecting it with rosnode info:

$ rosnode info /sfm_mpdm 
--------------------------------------------------------------------------------
Node [/sfm_mpdm]
Publications: 
 * /cmd_vel [geometry_msgs/Twist]
 * /rosout [rosgraph_msgs/Log]

Subscriptions: 
 * /odom [nav_msgs/Odometry]
 * /sfmmpdm_obstacles [obstacle_detector/SFMObstacles]

Services: 
 * /sfm_mpdm/get_loggers
 * /sfm_mpdm/set_logger_level


contacting node http://192.168.1.138:43025/ ...
Pid: 3954    
Connections:
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /cmd_vel
    * to: /serial_node
    * direction: outbound
    * transport: TCPROS
 * topic: /cmd_vel
    * to: /rosbot_ekf
    * direction: outbound
    * transport: TCPROS
 * topic: /odom
    * to: /rosbot_ekf (http://192.168.1.130:36747/)
    * direction: inbound
    * transport: TCPROS

So this feels strange to me. It's not getting anything but it is marked as subscribed.

Similarly in the rostopic info /sfmmpdmobstacles Type: obstacledetector/SFMObstacles

Publishers: 
 * /obstacle_visualisation (http://192.168.1.138:43097/)

Subscribers: 
 * /sfm_mpdm (http://192.168.1.138:43025/)

Just to be sure I also checked it out with rqtgraph, there it does show the sfmmpdm node is not subscribe/connected to the /sfmmpm_obstacles topic.

EDIT 3: I noticed a certain inbound connection is lost when the SFMObs callbacks stop printing:

 * topic: /sfmmpdm_obstacles
    * to: /obstacle_visualisation (http://192.168.1.138:44046/)
    * direction: inbound
    * transport: TCPROS

That one disappears.

Also further testing with the sfmmpdm_obstacle publishing frequency showed that going from 100 Hz to 10 Hz extends the moment of disconnection (feels like 10x but don't pin down me on this) but in the end still disconnects. (addition: Setting the rate in the publisher to 200 Hz makes it halt at 12, 5, 23, 11 and 77 seconds, so on second inspection it doesn't feel very time related).

I have also tried out using timers and rospy.spin() and even putting it in a class instead like this page suggests: roboticsbackend

EDIT 4: I am running two seperate nodes (one is the bare sfm node, the other a copy but with a different name) both listening to the /sfmmpm_obstacles topic. One started well before the other but both stop at exactly the same time. Which would suggest the connection to both is lost at once? Something to do with the publishing node? Or the connection as a whole? Maybe the message being received is 'crashing' them?

EDIT 5: I've been looking into what exactly makes the nodes disconnect and it seems like it's the content of the message in the topic itself. When sending different message types or empty messages the nodes stay connected and the topic thus stays subscribed to, but when I apply some distance filtering, after a (random) time, the subscription fails. I've tried many different combination (starting with an empty message and slowly building up), here are the more interesting results of my test:

The code that takes the list, filters it and then adds it to the message to be send looks as follows:

# Variable initalization
segments_list = []  # Empty list of segments within range
incoming_data   = "empty"               # "Empty" incoming_data 

# Distance to line function
def distanceToLine(A, B, P):
    AP = P - A
    AB = B - A

    magnitudeAB = np.linalg.norm(AB) * np.linalg.norm(AB)
    ABAPproduct = np.dot(AP, AB)
    distance = ABAPproduct / magnitudeAB

    if (distance < 0):
        return np.linalg.norm(AP)
    elif (distance > 1):
        return np.linalg.norm(P - B)
    else:
        return np.linalg.norm(P - (A + AB * distance))

# Callback function
def callbackObs(data):
    global segments_list
    global incoming_data
    incoming_data = data
    # segment_size = len(data.segments)
    segments_list = []
    for segment in data.segments:
        a = np.array([segment.first_point.x, segment.first_point.y])
        b = np.array([segment.last_point.x, segment.last_point.y])
        p = np.array(latest_robot_position)
        if (distanceToLine(a, b, p) < 5):
            segments_list.append(segment)

# Publishing loop
while not rospy.is_shutdown():
    if incoming_data != "empty":
        sfm_obs = SFMObstacles()
        sfm_obs.header = incoming_data.header
        sfm_obs.segments = segments_list
        sfm_obs.circles = incoming_data.circles
        sfm_obs.agents = incoming_data.circles
        pub.publish(sfm_obs)
    rate.sleep()

I'm not sure what makes this fail.. Any idea's? Suggestions?

EDIT 6:

When adding a rospy.loginfo(sfm_obs) line right before publishing for some reason the failure also doesn't occur. Nothing else has been changed from the code above in edit 5. Also removing that line again makes it fail after a while once more. What is going on!? I'm not completely stumped. I thought it was the message content but maybe it's something else once again..

EDIT 7: I've found a work around. By setting the publishing rate lower (e.g. 30 Hz) and then calling rate.sleep() before publishing to the /sfmmpdm_obstacles topic the failure doesn't seem to happen anymore. The changed code part looks as follows:

if incoming_data != "empty":
    sfm_obs = SFMObstacles()
    sfm_obs.header = incoming_data.header
    sfm_obs.segments = segments_list
    sfm_obs.circles = obstacles_list
    sfm_obs.agents = agents_list
    rate.sleep()
    # rospy.loginfo(sfm_obs)
    pub.publish(sfm_obs)
else:
    rate.sleep()

For now I can continue my work, but it does not feel like the issue is really solved nor do I really understand it. Giving it more time before publishing seems to help for some reason..

Btw, this post is getting kind of long due to all the code. Is there a way to hide it in spoiler tags or the like?

Kind regards, Chubb

Asked by Chubb on 2019-12-10 11:29:35 UTC

Comments

You probably should put those rate/sleep calls back in.

Asked by jordan on 2019-12-10 13:58:55 UTC

Is there a way for rate or sleep supply a delta time, like the pygame clock.tick does? Because I need that for the remainder of my code in that node (left out for readability).

Asked by Chubb on 2019-12-10 14:42:47 UTC

I would also recommend to store the reference to your rospy.Subscriber instances in a local variable. This will prevent them from going out of scope, destroying the objects and deregistering the subscriptions.

Can you explain why you don't store the result of calling the rospy.Subscriber(..) constructor in a variable?

Asked by gvdhoorn on 2019-12-10 16:21:05 UTC

To be honest I didn't know you could store the reference to the Subscriber. I added this as such:

sfmobs_subscriber = rospy.Subscriber("sfmmpdm_obstacles", SFMObstacles, callbackSFMObs)
odom_subscriber = rospy.Subscriber("odom", Odometry, callbackOdom)

But the issue still persists; after a while 10~30 seconds sfmmpdm_obstacles is no longer subscribed to. I have yet to implement the rate and sleep instead of the pygame clock.tick. Trying to figure out how to get delta time from it.

Asked by Chubb on 2019-12-10 16:48:54 UTC

That appears to be quite a busy loop you have there. What stops it from pegging the cpu? You probably should have some sort of delay/sleep in there. Does the issue go away when you put those rate/sleep calls back in? I see no reason why you can't have both bits of code running at the same time.

Asked by jordan on 2019-12-10 21:14:34 UTC

I have replaced the pygame time keeping code with ros specific code and added a delta time function to replace the pygame delta time part. It now looks like this:

rate = rospy.Rate(100)          # 100 Hz
    while not rospy.is_shutdown():
        # --- Time keeping, run at 100 Hz ---
        now = rospy.get_rostime()
        time_difference =(now - previous_time).to_nsec()    # Calculate elapsed time between loops in nanoseconds (a.k.a. delta time).
        previous_time = now
        seconds = elapsed / 1000.0          # Calculate delta time from ms to s.
        elapsed = time_difference / 1000000.0   # Calculate from nanoseconds to milliseconds
        step_count += elapsed / 100.0           # Count the amount of steps in the elapsed time. Default is 10 per second.

        # All other code that is in the loop, snipped for readability

        rate.sleep()

However, after a while the node still unsubsribes.

Asked by Chubb on 2019-12-12 05:25:26 UTC

I would suggest to take a step back.

Create a minimal node that only subscribes, but doesn't really do anything.

Use regular rate.sleep().

Then start it in combination with the rest of your system (but of course not start your 'real node').

Now observe what happens.

If it still 'unsubscribes', we need to look into aspects outside your script perhaps.

If it doesn't, that would seem to give an indication that it's your code that causes this.

Post the code here, describe what happens and then we can start looking for a cause.

Right now there are too many variables -- including the rest of your application -- which we have no insight into which makes diagnosing this really difficult.

Finally: please update the code you show in your OP with the current state. Right now it still shows the incorrect version of the code.

Asked by gvdhoorn on 2019-12-12 05:56:55 UTC

I would also not use rqt_graph for this. We're probably only interested in the publications and subscriptions for the particular nodes involved here.

rosnode info and rostopic info would be sufficient for this.

Asked by gvdhoorn on 2019-12-12 05:58:15 UTC

I've made a new discovery with important information which should hopefully help pinpoint the issue; see EDIT 5 in the main topic.

Asked by Chubb on 2019-12-13 05:30:40 UTC

Ok another edit (EDIT 6). I'm now completely stumped. rospy.loginfo() seems to influence the failure as well.

Asked by Chubb on 2019-12-13 07:04:04 UTC

Found a workaround to the issue, adding the rate.sleep() before the publish line and lowering the rate (in this case to 30 Hz) seems to prevent the failure from happening. Can someone explain why this is and if this is indeed a workaround or the real fix?

Asked by Chubb on 2019-12-13 08:40:50 UTC

+1 for coming back and updating your work around.

Asked by billy on 2019-12-13 13:52:11 UTC

Answers