# Global planner behavior upon planning failure

Hi,

I am using navigation stack and wondering what is the expected behavior of global planner upon the local planner fails to control robot according to thewaypoints. According to a test that I ran (link), it seems that it enters the global planner over and over again (without taking into account any info from sensors). That is not what I would expect - I expected it to return impossible to find a plan the second time it enters the planner. Is it really so, or did I make a mistake in the setup?

EDIT: I found suitable parameters to control number of time global planner is called - those are planner_patience and max_planning_retries (the latter one is exactly what I was looking for, as found here ).
However, even with this parameters set to smaller values (2 seconds or 4 attempts), makePlan is still invoked multiple times. It seems as if that happens in separate threads, because I get the info that max_planning_retries is exceeded, but makePlan is being called over and over again.
I have no idea how to detect which function or which node calls it multiple times. Does somebody know better?

EDIT2: Additionally, it is clear that infinite loop of calling makePlan starts once the max_planning_retries is exceeded and afterwards it does not check that condition.

EDIT3: disabling recovery behaviors all together keeps number of calls to makePlan lower (although, still higher than max_planning_retries)

edit retag close merge delete

+1

same issue with gavran.

( 2017-02-05 00:07:42 -0600 )edit

Sort by » oldest newest most voted

Hi all, I think the big and arbitrary number of calls to global planner is caused by a bug in move_base.cpp

I pull requested a fix. I would really apreciate if you can give it a try, as move_base.cpp code is quite chaotic and I'm afraid of having broked something else.

Btw, the number of calls to global planner is:

1+max_planning_retries + number_of_recovery_behavior*(1+max_planning_retries)

because max_planning_retries is the times to retry, after a first attempt we always do (so 0 means not retry, just 1 attempt)

Hope this helps!

more

I tested it and it works. However, not really clear what you did there to fix it. I suggest that we continue the discussion under pull request comments, if you would like to explain it a bit.

( 2016-10-05 05:20:33 -0600 )edit

Thanks for testing. I added a more detailed explanation in the PR. Can you please comment there that you tested the PR? I suppose that will speed up merging.

( 2016-10-05 17:17:31 -0600 )edit

oh, I already did (only there my nickname is different, gergia)

( 2016-10-06 02:29:11 -0600 )edit

Let me summarize the solutions I used to fix this problem (however, I won't mark this as accepted answer because I still lack proper understanding of the move_base behavior that caused a problem and would appreciate if somebody can give the explanation).

• in order to control number of attempts to make plan, use parameters max_planning_retries (tells how many times after a failure planner should try to run again, defaults to -1, which mean it is not used) and planner_patience (tells how many seconds global planner should wait till it gets global plan, defaults to 5 seconds). If one of these thresholds is passed, the clearing behaviors are performed.

• however, if recovery behaviors are enabled (recovery_behavior_enabled parameter), number of attempts would be higher than expected: not max_planning_retries + max_planning_retries * number_of_recovery_behaviors, but something greater than that (which I still don't understand)

• all of that also depends on the machine where it is executed: when I would execute on a weaker machine, it happened more often to loop indefinitely before giving up

Therefore, to control how many times makePlan would be called, the safest option is to specify parameter max_planning_retries and disable recovery behaviors (this however, comes at a cost because recovery behaviors are useful). Note that this still does not mean that it would be called max_planning_retries times: it might go few above this number, but it never gets out of control.

more

As far as I know, the robot moves according to the local planner, as the base_local_planner is responsible for computing velocity commands to send to the mobile base of the robot given a high-level plan. And the local trajectory is generated according to the global trajectory and the obstacles within the local costmap. So in my opinion, the global planner just serves for the local planner.

So if the local planner fails, the recovery behaviors(i.e. clear_costmap_recovery, rotate_recovery) will happen, and then if it fails again, the robot will stop, and you need to reset the Goal. According to the move_base doc, after each recovery behavior, move_base will attempt to make a plan. If planning is successful, move_base will continue normal operation. Otherwise, the next recovery behavior in the list(parameter recovery_behaviors) will be executed.

For more details, you can see the base_local_planner document.

As for the recovery behaviors, you can see move_base document.

EDIT

I think the infinite loop is because local costmap is stuck, but the global costmap is not. In global_costmap_params.yaml, there is a param update_frequency, which is the frequency in Hz for the global map to be updated. Generally, if update_frequency is not zero, that loop won't happen.

Did you encounter this infinite loop in your real robot? As I haven't encounter this problem.

EDIT 2

I'm working with P3-DX and ROS navigation stack. I have encoutered these two circumstances in which the nav stack aborts.

1. The robot is too close to the obstacles, it can't even rotate. These two error messages came out and the robot stopped.

[ERROR] [1472631068.711487006]: Rotate recovery can't rotate in place because there is a potential collision. Cost: -1.00

[ERROR] [1472631068.811355441]: Aborting because a valid plan could not be found. Even after executing all recovery behaviors

2. There are many obstacles arround the robot, but it can rotate. Then it rotated several times but still could't find a way out. Finally, it aborted. And this error message came out.

[ERROR] [1472633546.394854371]: Aborting because a valid plan could not be found. Even after executing all recovery behaviors

more

thanks. however, exactly this is a problem: planning is successful -> local planner fails to execute -> again planning is successful -> again local planner fails to execute. and it loops indefinitely. thoughts on that, how to avoid that loop?

( 2016-08-31 00:53:57 -0600 )edit

thanks for the ideas. but I guess it is not the case: my update_frequency is 5.0, yet the same behavior happens. I currently don't have a real robot to work on, only a simulation, but what makes you think that on the real robot there would be a difference?

( 2016-09-19 07:58:49 -0600 )edit