ROS Resources: Documentation | Support | Discussion Forum | Index | Service Status | ros @ Robotics Stack Exchange
Ask Your Question
0

[python]prevent callbacks from slowing down the main loop

asked 2021-01-28 06:05:53 -0600

felixN gravatar image

updated 2021-01-29 10:26:12 -0600

Hi,

I have a rather basic python2 node (it's the interface to the motor controllers)

  • a main loop running (theoretically) at 10Hz (using rate.sleep() )
  • 2 subscribers for velocities commands (only one is used at a given time, depending of the navigation mode) : the frequency should be about 10Hz on the active one (and 0 on the other one). Both have a queu size of 1
  • 1 subscriber for proximity sensors (about 10Hz). Queue_size=1
  • 2 publishers (publishing done from the main loop

Most of the time everything runs OK. However, sometimes, the main loop seems to be "stuck" for sometimes more than 1 second in a random place, while the message callbacks seem to be handled fine.

My guess is that sometimes, the main thread isn't executed for about 1 second, while the callback thread is still running.

From my understanding (please correct me if I'm wrong, I'm quite unsure about it), in Python there is only one thread running at a given time (even if multiple CPU cores are available). From what I understood in python (contrary to C++), the message callbacks are handled on a separate (python) thread).

So it seems that the callback thread gets executed but not the main thread.

NB : this even is rather execeptionnal (about once per minute under very heavy CPU load from other nodes, far less under lighter load).

For this specific use case, The main loop is more important than the message callbacks.

Is there a way to make sure the main loop gets executed (dropping messages in the input queues if needed) rather than processing all incomming messages and not executting fat enough the main loop?

A few potential ideas (but I'm not sure if they are and if so how) :

  • increase the main thread priority (or decrease the message handling thread) : from my understanding it is not possible because python is handling itself it's threads)

  • disable the message processing thread and une something like the C++ spinOnce function

  • have a way to explicitly aks for the latest data (ie not having any callbacks at all). If no new data arrived, getting "None" or getting the last recieved data are both fine to me

Thanks a lot in advance

Best

Felix

EDIT : as asked by mgruhler, I add my code (it's now the full version). NB : The code was started by a collegue who left the company and that I replaced (maybe based on an example he found on the web), but I did several changes afterwards (but I never modified the mutex part, so I'm not even sure if it is usefull).

#!/usr/bin/env python

import rospy
from math import pi as PI
import threading
import time

from roboclaw import Roboclaw
#import roboclaw_driver as roboclaw
from std_msgs.msg import String
from std_msgs.msg import Float64
from std_msgs.msg import Float32
from std_msgs.msg import Bool
from geometry_msgs.msg import Twist
from borobo.msg import capteur_proximite_msg

r = 0.155
e ...
(more)
edit retag flag offensive close merge delete

Comments

From what you described, I'd say this shouldn't be an issue. Are you sure it's the callbacks actually "blocking" the main thread? Maybe you could share your script, so that we could have a look...

mgruhler gravatar image mgruhler  ( 2021-01-29 01:04:37 -0600 )edit

I can't garantee that the callbakcs are blocking the main thread. But for the 3 times I looked in detail into the logs, each times the ros_loginfo printings of the main tread where blocked for a bit over 1 second (the blocking did occure in different places of the main loop each time, but then always continues iin the same order), while the printings from the callback continues to happen normally at 10Hz.

For the code, I will edit to add a simplified version of the code (if you think the full code is needed, I can ask my boss if he agrees that I publish it)

felixN gravatar image felixN  ( 2021-01-29 02:55:06 -0600 )edit

Just a gut feeling, but I'd first make sure that the mutex you are using is not blocking somewhere, especially in combination with the callbacks. I.e. nowhere in a callback should the mutex be used (and thus, also not in any function called from the callbacks) As you only show one but say you have three, I cannot have a look at that. Next thing to check would be everything that has to do with communication (i.e. maybe it is an issue with the serial communication to your robot, that is causing the delays?)

If you are sure that none of that is the issue, I would have a look again at how the callbacks might influence the main loop...

mgruhler gravatar image mgruhler  ( 2021-01-29 04:01:01 -0600 )edit

Thanks, The boss agreed that I publish the complete node, so there it is.

I never touched the part about the mutex, but I have the impression that it is never called from within any callback (I even have the impression that the mutex is useless).

For the serial communication, I will try to look deeper into it.

felixN gravatar image felixN  ( 2021-01-29 10:29:13 -0600 )edit

1 Answer

Sort by ยป oldest newest most voted
0

answered 2021-02-03 11:07:49 -0600

felixN gravatar image

I havn't completly eliminated the issue yet, but it seems that the blocking element is the call to rospy.loginfo (or warn).

If I remove all the rospy.logX calls, instead of the code getting blocked an average of 4.5 times per minute, it gets blocked only 0.2 times per minute (ie once every 5 minutes).

So the problem isn't totally solved, but at least quitte reduced.

I'm however wonderring :

  • why is rospy.log "hanging" at times?
  • what else might hand from time to time (for about 1s)? rate.sleep (with rate =10Hz)? rospy.publish? the callbacks? (or the serial communication to the motor controler?)

Thanks a lor in advance

Best

Felix

edit flag offensive delete link more

Comments

I'm not sure this should have been an answer. Anyways:

Logging to the console is slow, this will definitely slow down your program. As to whether this leads to what you describe as "hanging" I'm not so sure.

Without the real program (and HW, for that matter) it is really hard to diagnose the errors further. I suggest you start by stripping down your program one function at a time to see where the issue might actually be hidden. So basically, get rid of everything unnecessary in the first step, see if it is still appearing, then continue with that. Or you could also use a profiler to measure the time spent in functions. Maybe this will show you where your program is stalling...

Last, if your code is under version control (I do hope so), you are using git and there is a state where everything was ...(more)

mgruhler gravatar image mgruhler  ( 2021-02-04 01:21:29 -0600 )edit

Hi, thanks a lot! I admit that I'm not sure if this is really the cause, but removing the logging reduced the frequency of the problem by more than 20. So indeed, it might be that the logging is not the real issue, and that it is just by slowing down the program that it triggers the real cause.

Trying to use a profiler is a good idea, thanks!

The code is indeed under version control now, but I think that this bug goes back since before I started using git on it (the code was started by my predecessor). It was just that until now I had more important problems to solve. But otherwise, it would have been the best approach.

felixN gravatar image felixN  ( 2021-02-04 03:07:18 -0600 )edit

good luck :-) Hope you track down the issue...

mgruhler gravatar image mgruhler  ( 2021-02-04 03:29:13 -0600 )edit

Question Tools

2 followers

Stats

Asked: 2021-01-28 06:05:53 -0600

Seen: 954 times

Last updated: Feb 03 '21