Ask Your Question
0

Python SimpleActionServer stops unexpectedly (deadlock!)

asked 2012-12-12 03:13:10 -0600

Miguel S. gravatar image

updated 2014-01-28 17:14:32 -0600

ngrennan gravatar image

Hi all,

I'm writing a small piece of code that leverages actionlib, specifically SimpleActionServer. The main intricacy is that I register a preempt callback, but then I set_preempted on another thread (see code below).

However, when the rate of requests is relatively high (~1Hz) sometimes actionlib stops processings new requests. It does not crash, it does not print nor log anything, it just idles.

Below is a simplified version of my code:

class Node():

    def __init__(self):   
        self.server = actionlib.SimpleActionServer(
            "Topic", 
            Action,
            execute_cb=self.run,
            auto_start=False )

        #Variables so that preemption works
        self.myLock = threading.Lock()
        self.taskID = None

        #register preempt callbacks
        self.server.register_preempt_callback( self.stop )

        #Start actionlib
        self.server.start()


    def run(self, goal):      
        with self.myLock:
            self.taskID =  startTask() #Launch task in another thread

        #Wait for task to be completed or killed
        if not (self.taskID is None):
            wait( self.taskID ) #Wait for task to finish

        #Do not allow task to be preempted from now
        with self.myLock:
            self.taskID = None

        # Get actionlib result
        result = ActionResult()
        getResult(result)

        if not valid(result): #valid() defined elsewhere
            self.server.set_aborted( result )        
        elif self.server.is_preempt_requested() :
            self.server.set_preempted( result )
        else :
            self.server.set_succeeded( result )

        # Sometimes actionlib will stop processing new requests after this!!


    def stop( self ):
        with self.myLock:
            if ( not ( self.taskID is None ) and self.server.is_preempt_requested() ):
                killTask( self.taskID ) # Function defined elsewhere
                self.taskID = None

Am I using actionlib in an invalid way? Could this be an actionlib bug?

Thanks, Miguel S.

edit retag flag offensive close merge delete

Comments

Just curious, which ROS distro are you using?

130s gravatar image130s ( 2012-12-15 13:17:11 -0600 )edit

It's Fuerte on Ubuntu 12.04

Miguel S. gravatar imageMiguel S. ( 2012-12-17 00:18:10 -0600 )edit

Can you interrupt the program and get a backtrace?

tfoote gravatar imagetfoote ( 2012-12-25 14:42:38 -0600 )edit

@tfoote See my answer. @IsaacSaito Also present in Groovy in Ubuntu 12.04.

Miguel S. gravatar imageMiguel S. ( 2013-07-12 09:02:54 -0600 )edit

1 Answer

Sort by ยป oldest newest most voted
2

answered 2013-07-12 09:01:13 -0600

Miguel S. gravatar image

For a few months I sidestepped this problem. But today I've found the deadlock. I reckon there's a bug in simple_action_server.py.

I wasn't able to get the python debugger to produce the stack traces that explained the problem. But after looking at the roslog debug messages and the source code, I came up with the following reconstruction of the events:

Thread A (rospy's message dispatcher?)

  • Deadlocked in: self.execute_condition.acquire()
  • In function: SimpleActionServer.internal_goal_callback() [simple_action_server.py:211] which was called from: ActionServer.internal_goal_callback() [action_server.py:293]
  • This thread has ActionServer.lock and wants to acquire SimpleActionServer.lock (condition variable was initialised with the latter lock).

Thread B (SimpleActionServer's executeLoop thread)

  • Deadlocked in: with self.action_server.lock
  • In function ServerGoalHandle.set_accepted() [server_goal_handle.py:71] which was called from: SimpleActionServer.accept_new_goal()[simple_action_server.py:131] which was called from: SimpleActionServer.executeLoop()[simple_action_server.py:284] which at that point is holding SimpleActionServer.lock
  • This thread wants ActionServer.lock and has SimpleActionServer.lock

In summary, if if a new goal arrives at the same time executeLoop is trying to get a previous (but still new, SimpleActionServer will deadlock.

I suspect the solution involves calling accept_new_goal() [simple_action_server.py:284] without holding SimpleActionServer.lock. My intuition is that simply setting a flag will do, but I will have to study the code a bit more to make sure there no side-effects.

I will try to code the solution myself unless someone fixes it quicker :)

Best, Miguel S.

edit flag offensive delete link more

Comments

Thanks for debugging into it. I've opened a ticket here: https://github.com/ros/actionlib/issues/4

tfoote gravatar imagetfoote ( 2013-07-12 11:19:18 -0600 )edit

Issue has just been fixed! :)

Miguel S. gravatar imageMiguel S. ( 2013-07-23 07:13:36 -0600 )edit

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

Stats

Asked: 2012-12-12 03:13:10 -0600

Seen: 445 times

Last updated: Jul 12 '13