Ask Your Question
0

[Bug in ROS | Solved] Extending SimpleActionServer/-Client to allow REJECTING goals

asked 2016-07-21 10:34:05 -0600

CodeFinder gravatar image

updated 2016-07-22 10:40:42 -0600

Update: This is a known bug, refer to my comment in this bug tracker on Github for a possible solution and more details. More information can also be found in my accepted answer below.

Hi folks,

I am trying to extend the functionality of the SimpleActionServer / -Client classes to allow rejecting new goals if (and only if) there are currently none being processed. That is, I would like to replace/disable the PENDING state logically in a sense that when there is currently an ACTIVE goal, it will not be preempted and the very recently received goal will automatically be rejected by the server. In contrast, once a new goal is received and there is no goal being currently processed, the server should accept the new goal. However, I would like to retain the ability to cancel/abort a goal (both by the client and the server) just in case this matters.

To achieve this, I've modified the actionlib tutorials (namely 1, 2, 3, and 4). You can find my modified learning_actionlib package for download here. Modifications are basically in the goalCallback() function, simply search for CHANGED in order_action_server_imp.h. You can compile it in any Indigo/Jade/Kinetic workspace and run it (from your catkin_ws dir) using

roslaunch src/learning_actionlib/launch/test.launch

or

roslaunch src/learning_actionlib/launch/test_avg.launch

The problem with that code is as follows: When one of the above launchfiles is started, a single server and multiple clients are being launched at the 'same' time (-> workload creation). On my slower machine (i7 4600U, VMware player), it regularily happens that 1 or more clients incorrectly timeout while one client always (correctly) returns SUCCEEDED and all the others correctly report REJECTED. However, I would assume that exactly one client succeeds and all others get rejected. It seems that the server correctly publishes the REJECTED states as the following (DEBUG) log shows but the clients don't get these information: http://pastebin.com/m9k5xniD . In particular and according to that log, the server correctly receives all goals, only processes the first (lines 7-12) and rejects new ones (lines 17, 23, 29) -> 4 clients. Note that it doesn't matter if test.launch or test_avg.launch is used, both have the same problem. I guess the problem is related to spinning/multithreading (and parallel code execution in general as this only rarely happens on my fast 12-core machine). Also, if I decrease the #cores in my VM, the probability of occurrence increases. Thus, if you don't have the problem, you may need to restart the launchfiles a few times or increase the number of clients.

Please also note that I am occasionally get a

double free or corruption

(related to a std::vector::push_back used inside a boost::mutex) and

average_server: /usr/include/boost/thread/pthread/recursive_mutex.hpp:110: void boost::recursive_mutex::lock(): Assertion `!pthread_mutex_lock(&m)' failed.

error when terminating roslaunch by Ctrl+C. Not sure if this is related. In general, I am not sure if ... (more)

edit retag flag offensive close merge delete

2 Answers

Sort by ยป oldest newest most voted
1

answered 2016-07-22 04:12:06 -0600

CodeFinder gravatar image

Interestingly, I found the solution: presumably, it's a bug (?) in ROS (can confirm it for Indigo and Kinetic)!

The short story: the status_sub_, feedback_sub_, and result_sub_ subscriptions in /opt/$ROS_DIST/include/actionlib/client/action_client.h@initClient() are created with a queue size of only 1. This simply causes messages to get lost.

I can't think of any reason why to set a queue size of 1. IMHO, this may even cause other wired problems with high loads / less ressources between server/clients.

The long story: I've done the following things to track down the problem to its actual origin.

  • The debug output of the server suggests that it correctly sends the REJECTED states and its associated result (cf. link above). To verify this, I've used rostopic echo /fibonacci/result to check whether I receive 1 SUCCEEDED and N-1 REJECTED result msgs (given N clients).
  • As this was the case, I've subscribed to /fibonacci/result in the client to check whether these msgs also arrive at the client. (Note that all messages from all clients arrive at each client.) Interestingly, the msgs arrive only if I set the queue size to some large value (not 1). Setting it to 1 (like in action_client.h, see the short story above) causes messages to get lost.
  • Then I temporarily modified the queue sizes in /opt/$ROS_DIST/include/actionlib/client/action_client.h@initClient() and set them to 1000, and voila: after recompiling, all REJECTED states are received correctly and none of my clients timed out.
  • Summary: (simple / my) server code is correct, (simple / my) client code should be correct, ROS' ActionClient<T> impl. has a bug

Glad to receive some comments/feedback on this. If it's a bug, it should be fixed in the official sources. Can anyone confirm this (ideally, one of the ROS maintainer/developers)?

edit flag offensive delete link more

Comments

If you really feel this is a bug (which I feel technically isn't correct, as a queue size of 1 is valid, and it may just be that the rationale for it is unclear), I suggest you open an issue on the tracker.

gvdhoorn gravatar imagegvdhoorn ( 2016-07-22 06:02:08 -0600 )edit

Thanks for pointing me to that site! It is obviously a known bug so that I've added a comment and reference to this post (see https://github.com/ros/actionlib/issu... ). Can you accept my answer as the correct one, please? I am not having enough reputation yet.

CodeFinder gravatar imageCodeFinder ( 2016-07-22 10:09:48 -0600 )edit

It'd be nice if you could edit your answer to insert an update section at the top pointing users to the issue on the github tracker, so they don't have to scroll down and read everything. I'll give you some additional karma, so you can accept your own answer.

gvdhoorn gravatar imagegvdhoorn ( 2016-07-22 10:26:21 -0600 )edit

Thanks, all done! ;-)

CodeFinder gravatar imageCodeFinder ( 2016-07-22 10:41:27 -0600 )edit
0

answered 2016-07-21 11:15:07 -0600

Chrissi gravatar image

I think it is because of the mutex lock in the goal callback. As long as one client is processing, the others cannot access the callback and can therefore not be rejected. In general every client executes the goal callback in a new thread so you want to be able to run this function multiple times (even if it is just to reject new clients) to allow for concurrency.

And yes, it is safe to execute longer calculations in the execute callback. This can last as long as you like if the client did not set a timeout for this action.

edit flag offensive delete link more

Comments

Thanks for your fast reply! Just to clarify (referring to "As long as one client is processing, the others cannot access the callback..."): a client only initiates an action but the processing is actually done by the server. What mutex are you referring to?

CodeFinder gravatar imageCodeFinder ( 2016-07-21 14:13:44 -0600 )edit

In particular in the Fibonacci example (refer to test.launch), my execute_callback is called from a separate thread (see order_action_server_impl.h@executeLoop() at the bottom) and the mutex "lock_" is released before calling the callback. So, this shouldn't be an issue. What do you think? ;-)

CodeFinder gravatar imageCodeFinder ( 2016-07-21 14:18:05 -0600 )edit

I somehow thought that you reimplemented all of this and questioned the locks, sorry. The only possible problem I could see is a race condition between acknowledging the receival of the goal and the rejection, leading to it not being properly communicated to the client

Chrissi gravatar imageChrissi ( 2016-07-21 15:32:31 -0600 )edit

I don't want to pretend, however, that I fully understand the thread management in the action server, so I am afraid I cannot really tell you the exact problem. Sorry for not being helpful.

Chrissi gravatar imageChrissi ( 2016-07-21 15:33:40 -0600 )edit

No worries and thanks anyway! Maybe someone else can help. ;-)

CodeFinder gravatar imageCodeFinder ( 2016-07-21 15:37:12 -0600 )edit

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

1 follower

Stats

Asked: 2016-07-21 10:34:05 -0600

Seen: 468 times

Last updated: Jul 22 '16