[python]prevent callbacks from slowing down the main loop
Hi,
I have a rather basic python2 node (it's the interface to the motor controllers)
- a main loop running (theoretically) at 10Hz (using rate.sleep() )
- 2 subscribers for velocities commands (only one is used at a given time, depending of the navigation mode) : the frequency should be about 10Hz on the active one (and 0 on the other one). Both have a queu size of 1
- 1 subscriber for proximity sensors (about 10Hz). Queue_size=1
- 2 publishers (publishing done from the main loop
Most of the time everything runs OK. However, sometimes, the main loop seems to be "stuck" for sometimes more than 1 second in a random place, while the message callbacks seem to be handled fine.
My guess is that sometimes, the main thread isn't executed for about 1 second, while the callback thread is still running.
From my understanding (please correct me if I'm wrong, I'm quite unsure about it), in Python there is only one thread running at a given time (even if multiple CPU cores are available). From what I understood in python (contrary to C++), the message callbacks are handled on a separate (python) thread).
So it seems that the callback thread gets executed but not the main thread.
NB : this even is rather execeptionnal (about once per minute under very heavy CPU load from other nodes, far less under lighter load).
For this specific use case, The main loop is more important than the message callbacks.
Is there a way to make sure the main loop gets executed (dropping messages in the input queues if needed) rather than processing all incomming messages and not executting fat enough the main loop?
A few potential ideas (but I'm not sure if they are and if so how) :
increase the main thread priority (or decrease the message handling thread) : from my understanding it is not possible because python is handling itself it's threads)
disable the message processing thread and une something like the C++ spinOnce function
have a way to explicitly aks for the latest data (ie not having any callbacks at all). If no new data arrived, getting "None" or getting the last recieved data are both fine to me
Thanks a lot in advance
Best
Felix
EDIT : as asked by mgruhler, I add my code (it's now the full version). NB : The code was started by a collegue who left the company and that I replaced (maybe based on an example he found on the web), but I did several changes afterwards (but I never modified the mutex part, so I'm not even sure if it is usefull).
#!/usr/bin/env python
import rospy
from math import pi as PI
import threading
import time
from roboclaw import Roboclaw
#import roboclaw_driver as roboclaw
from std_msgs.msg import String
from std_msgs.msg import Float64
from std_msgs.msg import Float32
from std_msgs.msg import Bool
from geometry_msgs.msg import Twist
from borobo.msg import capteur_proximite_msg
r = 0.155
e ...
From what you described, I'd say this shouldn't be an issue. Are you sure it's the callbacks actually "blocking" the main thread? Maybe you could share your script, so that we could have a look...
I can't garantee that the callbakcs are blocking the main thread. But for the 3 times I looked in detail into the logs, each times the ros_loginfo printings of the main tread where blocked for a bit over 1 second (the blocking did occure in different places of the main loop each time, but then always continues iin the same order), while the printings from the callback continues to happen normally at 10Hz.
For the code, I will edit to add a simplified version of the code (if you think the full code is needed, I can ask my boss if he agrees that I publish it)
Just a gut feeling, but I'd first make sure that the mutex you are using is not blocking somewhere, especially in combination with the callbacks. I.e. nowhere in a callback should the mutex be used (and thus, also not in any function called from the callbacks) As you only show one but say you have three, I cannot have a look at that. Next thing to check would be everything that has to do with communication (i.e. maybe it is an issue with the serial communication to your robot, that is causing the delays?)
If you are sure that none of that is the issue, I would have a look again at how the callbacks might influence the main loop...
Thanks, The boss agreed that I publish the complete node, so there it is.
I never touched the part about the mutex, but I have the impression that it is never called from within any callback (I even have the impression that the mutex is useless).
For the serial communication, I will try to look deeper into it.