[Bug in ROS | Solved] Extending SimpleActionServer/-Client to allow REJECTING goals

asked 2016-07-21 10:34:05 -0500

100 ●7 ●11 ●15

updated 2016-07-22 10:40:42 -0500

Update: This is a known bug, refer to my comment in this bug tracker on Github for a possible solution and more details. More information can also be found in my accepted answer below.

Hi folks,

I am trying to extend the functionality of the SimpleActionServer / -Client classes to allow rejecting new goals if (and only if) there are currently none being processed. That is, I would like to replace/disable the PENDING state logically in a sense that when there is currently an ACTIVE goal, it will not be preempted and the very recently received goal will automatically be rejected by the server. In contrast, once a new goal is received and there is no goal being currently processed, the server should accept the new goal. However, I would like to retain the ability to cancel/abort a goal (both by the client and the server) just in case this matters.

To achieve this, I've modified the actionlib tutorials (namely 1, 2, 3, and 4). You can find my modified learning_actionlib package for download here. Modifications are basically in the goalCallback() function, simply search for CHANGED in order_action_server_imp.h. You can compile it in any Indigo/Jade/Kinetic workspace and run it (from your catkin_ws dir) using

roslaunch src/learning_actionlib/launch/test.launch

or

roslaunch src/learning_actionlib/launch/test_avg.launch

The problem with that code is as follows: When one of the above launchfiles is started, a single server and multiple clients are being launched at the 'same' time (-> workload creation). On my slower machine (i7 4600U, VMware player), it regularily happens that 1 or more clients incorrectly timeout while one client always (correctly) returns SUCCEEDED and all the others correctly report REJECTED. However, I would assume that exactly one client succeeds and all others get rejected. It seems that the server correctly publishes the REJECTED states as the following (DEBUG) log shows but the clients don't get these information: http://pastebin.com/m9k5xniD . In particular and according to that log, the server correctly receives all goals, only processes the first (lines 7-12) and rejects new ones (lines 17, 23, 29) -> 4 clients. Note that it doesn't matter if test.launch or test_avg.launch is used, both have the same problem. I guess the problem is related to spinning/multithreading (and parallel code execution in general as this only rarely happens on my fast 12-core machine). Also, if I decrease the #cores in my VM, the probability of occurrence increases. Thus, if you don't have the problem, you may need to restart the launchfiles a few times or increase the number of clients.

Please also note that I am occasionally get a

double free or corruption

(related to a std::vector::push_back used inside a boost::mutex) and

average_server: /usr/include/boost/thread/pthread/recursive_mutex.hpp:110: void boost::recursive_mutex::lock(): Assertion `!pthread_mutex_lock(&m)' failed.

error when terminating roslaunch by Ctrl+C. Not sure if this is related. In general, I am not sure if ... (more)

edit retag flag offensive close merge delete

Comments

You can find my modified learning_actionlib package for download here.

The link is dead, is the code still up anywhere?

abhishek47 ( 2021-06-30 02:59:40 -0500 )edit

No, sorry. Unfortunately, I don't have the code anymore since I re-installed that computer. :-(

If I find a copy though, I will let you know!

CodeFinder ( 2021-07-02 04:42:02 -0500 )edit

add a comment

1

answered 2016-07-22 04:12:06 -0500

CodeFinder

100 ●7 ●11 ●15

Interestingly, I found the solution: presumably, it's a bug (?) in ROS (can confirm it for Indigo and Kinetic)!

The short story: the status_sub_, feedback_sub_, and result_sub_ subscriptions in /opt/$ROS_DIST/include/actionlib/client/action_client.h@initClient() are created with a queue size of only 1. This simply causes messages to get lost.

I can't think of any reason why to set a queue size of 1. IMHO, this may even cause other wired problems with high loads / less ressources between server/clients.

The long story: I've done the following things to track down the problem to its actual origin.

The debug output of the server suggests that it correctly sends the REJECTED states and its associated result (cf. link above). To verify this, I've used rostopic echo /fibonacci/result to check whether I receive 1 SUCCEEDED and N-1 REJECTED result msgs (given N clients).
As this was the case, I've subscribed to /fibonacci/result in the client to check whether these msgs also arrive at the client. (Note that all messages from all clients arrive at each client.) Interestingly, the msgs arrive only if I set the queue size to some large value (not 1). Setting it to 1 (like in action_client.h, see the short story above) causes messages to get lost.
Then I temporarily modified the queue sizes in /opt/$ROS_DIST/include/actionlib/client/action_client.h@initClient() and set them to 1000, and voila: after recompiling, all REJECTED states are received correctly and none of my clients timed out.
Summary: (simple / my) server code is correct, (simple / my) client code should be correct, ROS' ActionClient<T> impl. has a bug

Glad to receive some comments/feedback on this. If it's a bug, it should be fixed in the official sources. Can anyone confirm this (ideally, one of the ROS maintainer/developers)?

edit flag offensive delete link

Comments

If you really feel this is a bug (which I feel technically isn't correct, as a queue size of 1 is valid, and it may just be that the rationale for it is unclear), I suggest you open an issue on the tracker.

gvdhoorn ( 2016-07-22 06:02:08 -0500 )edit

Thanks for pointing me to that site! It is obviously a known bug so that I've added a comment and reference to this post (see https://github.com/ros/actionlib/issu... ). Can you accept my answer as the correct one, please? I am not having enough reputation yet.

CodeFinder ( 2016-07-22 10:09:48 -0500 )edit

It'd be nice if you could edit your answer to insert an update section at the top pointing users to the issue on the github tracker, so they don't have to scroll down and read everything. I'll give you some additional karma, so you can accept your own answer.

gvdhoorn ( 2016-07-22 10:26:21 -0500 )edit

Thanks, all done! ;-)

CodeFinder ( 2016-07-22 10:41:27 -0500 )edit

add a comment

0

answered 2016-07-21 11:15:07 -0500

Chrissi

flag of United Kingdom of Great Britain and Northern Ireland

1642 ●11 ●19 ●26

I think it is because of the mutex lock in the goal callback. As long as one client is processing, the others cannot access the callback and can therefore not be rejected. In general every client executes the goal callback in a new thread so you want to be able to run this function multiple times (even if it is just to reject new clients) to allow for concurrency.

And yes, it is safe to execute longer calculations in the execute callback. This can last as long as you like if the client did not set a timeout for this action.

edit flag offensive delete link

Comments

Thanks for your fast reply! Just to clarify (referring to "As long as one client is processing, the others cannot access the callback..."): a client only initiates an action but the processing is actually done by the server. What mutex are you referring to?

CodeFinder ( 2016-07-21 14:13:44 -0500 )edit

In particular in the Fibonacci example (refer to test.launch), my execute_callback is called from a separate thread (see order_action_server_impl.h@executeLoop() at the bottom) and the mutex "lock_" is released before calling the callback. So, this shouldn't be an issue. What do you think? ;-)

CodeFinder ( 2016-07-21 14:18:05 -0500 )edit

I somehow thought that you reimplemented all of this and questioned the locks, sorry. The only possible problem I could see is a race condition between acknowledging the receival of the goal and the rejection, leading to it not being properly communicated to the client

Chrissi ( 2016-07-21 15:32:31 -0500 )edit

I don't want to pretend, however, that I fully understand the thread management in the action server, so I am afraid I cannot really tell you the exact problem. Sorry for not being helpful.

Chrissi ( 2016-07-21 15:33:40 -0500 )edit

No worries and thanks anyway! Maybe someone else can help. ;-)

CodeFinder ( 2016-07-21 15:37:12 -0500 )edit

add a comment

[Bug in ROS | Solved] Extending SimpleActionServer/-Client to allow REJECTING goals

Comments

2 Answers

Comments

Comments

Question Tools

Stats

Related questions

[Bug in ROS | Solved] Extending SimpleActionServer/-Client to allow REJECTING goals edit

Comments

2 Answers

Comments

Comments

Question Tools

Stats

Related questions

[Bug in ROS | Solved] Extending SimpleActionServer/-Client to allow REJECTING goals