smach concurrence: race condition bug or bad usage

asked 2018-02-09 03:10:53 -0500

knxa

updated 2018-02-12 05:46:41 -0500

I have been hit by a problem with the smach concurrence container. I believe it is a race condition problem.

Scenario:

  1. A preempt is requested on the concurrence state just as its children have completed, while the concurrence state itself is still executing.
  2. The concurrence state then (sometimes) reports that the preempt was serviced, even though none of its children serviced it.

This violates how I intended to use the 'preempt' concept. Am I missing something?
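To make the suspected interleaving concrete, here is a minimal toy model in plain Python. It does not use smach: `ToyChild`, `ToyConcurrence` and `simulate` are hypothetical stand-ins, and the container logic is only an assumption about how the preempt flag might be handled, not smach's actual implementation.

```python
class ToyChild(object):
    """Hypothetical stand-in for a child state (not a smach class)."""

    def __init__(self):
        self.preempt_flag = False
        self.outcome = None

    def request_preempt(self):
        self.preempt_flag = True

    def finish(self):
        # Mimics the end of DelayState.execute(): check the flag once, then return
        if self.preempt_flag:
            self.preempt_flag = False  # the child services the preempt
            self.outcome = 'preempted'
        else:
            self.outcome = 'succeeded'


class ToyConcurrence(object):
    """Hypothetical stand-in for the container (assumed behaviour)."""

    def __init__(self, child):
        self.child = child
        self.preempt_flag = False
        self.children_running = True

    def request_preempt(self):
        self.preempt_flag = True
        # Assumption: the request only reaches children that are still running
        if self.children_running:
            self.child.request_preempt()

    def finish(self):
        # Same rule as outcome_cb in the test script
        outcome = 'preempted' if self.child.outcome == 'preempted' else 'succeeded'
        serviced = False
        # Assumption: the container clears its own flag when no child
        # still has an unserviced preempt pending
        if self.preempt_flag and not self.child.preempt_flag:
            self.preempt_flag = False
            serviced = True
        return outcome, serviced


def simulate(preempt_first):
    """Run one fixed interleaving and return (outcome, preempt_serviced)."""
    child = ToyChild()
    cc = ToyConcurrence(child)
    if preempt_first:
        cc.request_preempt()         # request arrives while the child runs
        child.finish()
        cc.children_running = False
    else:
        child.finish()
        cc.children_running = False
        cc.request_preempt()         # request arrives in the narrow window
    return cc.finish()


print(simulate(preempt_first=True))
print(simulate(preempt_first=False))
```

In the second interleaving the toy container returns 'succeeded' while also clearing its preempt flag, so the parent has no way left to see that a preempt was ever requested. That is the behaviour I suspect in the real container.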

For the sake of completeness, here is a Python script that demonstrates the problem. It only needs the smach Python package; no ROS nodes have to be running. Since the problem is timing dependent, whether the script triggers it may depend on your PC. The script creates the following state machine and tries to preempt it after roughly 0.3 seconds:

  • Sequence(top)
    • Concurrence(Cc)
      • Delay1(0.3 seconds) (for demonstration purposes: the only child of Concurrence)
    • Delay2(0.5 seconds)

Test script:

#!/usr/bin/env python

from time import sleep
from smach import State, Concurrence, Sequence
from threading import Timer


class DelayState(State):
    """Delay state for testing purposes"""

    def __init__(self, delay):
        State.__init__(self, outcomes=['succeeded', 'preempted'])
        self.delay = delay

    def execute(self, userdata):
        # A better delay state should be able to preempt during its sleep state
        sleep(self.delay)
        if self.preempt_requested():
            self.service_preempt()
            return 'preempted'
        return 'succeeded'


def test_concurrence_preempt():
    """test demonstrating race condition

    Creates a state machine:

    - Sequence(top)
      - Concurrence(Cc)
        - Delay1(0.3 seconds) (for demonstration purposes: the only child of Concurrence)
      - Delay2(0.5 seconds)

    When preempting the state machine after ~0.3 seconds, the machine
    is expected to return 'preempted'
    """

    def outcome_cb(outcome_map):
        if 'preempted' in outcome_map.values():
            return 'preempted'
        return 'succeeded'

    cc = Concurrence(
        outcomes=['succeeded', 'preempted'],
        default_outcome='succeeded',
        outcome_cb=outcome_cb)
    with cc:
        Concurrence.add('Delay1', DelayState(delay=0.3))

    top = Sequence(outcomes=['succeeded', 'preempted'],
                   connector_outcome='succeeded')

    with top:
        Sequence.add('Cc', cc)
        Sequence.add('Delay2', DelayState(delay=0.5))

    # Execute the state machine and try to cancel it after a range of delays (in ms)
    for msDelay in range(290, 330, 2):
        print('Cancel after delay {}'.format(msDelay))
        t = Timer(msDelay/1000.0, top.request_preempt)
        t.start()
        if top.execute() != 'preempted':
            # Use format() so the delay is actually substituted into the message
            print('===== TEST FAILED, delay: {}ms ====='.format(msDelay))
            return

    print('===== TEST OK =====')


test_concurrence_preempt()
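As the comment in `DelayState.execute` notes, a better delay state would react to a preempt while it is sleeping. A minimal sketch of that idea follows; `interruptible_sleep` and its `poll_interval` parameter are hypothetical names that are not part of smach, and the helper only assumes it is given a callable such as the state's `preempt_requested` method. It does not address the race in the container itself; it only lets a child notice a preempt sooner.

```python
import time


def interruptible_sleep(duration, preempt_requested, poll_interval=0.01):
    """Sleep for up to `duration` seconds, polling `preempt_requested()`.

    Returns True if a preempt was seen before the time ran out,
    False if the full duration elapsed.
    """
    deadline = time.time() + duration
    while True:
        if preempt_requested():
            return True
        remaining = deadline - time.time()
        if remaining <= 0:
            return False
        # Never sleep past the deadline
        time.sleep(min(poll_interval, remaining))


# Hypothetical use inside DelayState.execute():
#
#     if interruptible_sleep(self.delay, self.preempt_requested):
#         self.service_preempt()
#         return 'preempted'
#     return 'succeeded'
```

Polling keeps the sketch independent of smach internals, at the cost of reacting up to `poll_interval` seconds late.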

The scenario above actually seems to be just one of several cases in which a preempt request gets lost. I believe similar problems occur in these cases:

  1. A preempt is requested on the concurrence state just after it has been told to execute, but before its children are active. The request is then not propagated to the children, and they simply go on executing.
  2. SimpleActionState appears to have similar problems.

My setup:

  • Ubuntu 16.04
  • ROS kinetic
  • ros-kinetic-smach 2.0.1-0xenial-20170608-133042-0800