Robotics StackExchange | Archived questions

Robot drifts away from goal and laserscan skews on random interval

I have to apologize for the vague and/or convoluted title. We don't know where the issue is ourselves.

The robot I'm working on makes use of ROS Kinetic, the standard navigation stack with amcl, teb_local_planner, laser_scan_matcher and move_base. It is a front-wheeled differential drive with two caster wheels on the back with a rectangular body/footprint, built on the chassis of an electric wheelchair.

The issue we're facing is happening on a random interval: Whenever we set a goal in rviz, the robot does move towards his goal. After a random amount of time or after a certain distance has been traveled, two errors could occur:

The robot loses track of his position as the local costmap would become skewed, we see this by lines drawn from the LIDAR sensor, indicating cabinets and walls, becoming skewed and/or rotating from their location. This behaviour causes it to continue on "a path" until it does encounter an obstacle. Each of these erratic behaviours is visible within rviz and the console output can mention the errors Map Update/Control loop missed its desired rate <...> which we think might not be related to CPU load, as it is averaging at about 50% including when generating new paths. never has it gone above 90%.

Additionally, the robot would keep on its last set velocity (cmd_vel output) and rviz would no longer be able to set and display a new goal, rendering the robot unresponsive. We're able to clear cmd_vel by using manual control (teleop_twist_joy) and stopping the robot that way. Upon hitting the reset button on the bottom-right in rviz, we progressively lose the local and global costmap per press, accompanied by the warning "No map provided". Rebooting the ROS environment will restore everything until the "drift" occurs again.

We can consistently recreate this behaviour by having the robot drive various distances, believing it appears more often when traversing small distances of approximately 2-3 meters, on longer distances i.e 5-20 meters the errors occure less often, but the laserscan becoming askew is still prevalent. The LIDAR/local costmap becoming askew is easily reproducible by rotating the robot while stationary, when turning both left and right the laser scan and local costmap would rotate in short intervals and progressively move to the right a certain distance within rviz until no longer, where it is still skewed massively.

We've managed to make a quick recording of what happens when the robot is spinning around its axis once, alas, the video is rather shaky. https://www.dropbox.com/s/iywk2r9c6db0wda/VID_20190326_151328.mp4?dl=0 (The link will be removed if it's not allowed to post here.)

Below are the common configuration files requested:

Asked by halberd on 2019-03-22 07:36:00 UTC

Comments

Each of these erratic behaviours is visible within rviz and the console output can mention the errors Map Update/Control loop missed its desired rate <...>

that sounds like a resource issue.

What sort of hw are you running this on (ie: CPU, memory, platform, etc)?

Asked by gvdhoorn on 2019-03-22 07:41:01 UTC

The hardware consists of a laptop with an i5-3210M CPU (2.50GHz x4) and 16GB RAM running Ubuntu 16.04. CPU usage (with ROS) averages at 40-50% idle and 70-80% when given a goal.

Asked by halberd on 2019-03-22 07:59:00 UTC

averages at 40-50% idle and 70-80% when given a goal.

80% is quite close to maximum. Could be a red herring, but it'd be interesting to see what the utilisation level is when your problem manifests itself.

How did you install ROS? All using apt?

Also btw: why so many static_transform_publishers? Can you not create a URDF?

Asked by gvdhoorn on 2019-03-22 08:03:28 UTC

We're the 5th group assigned to this project, therefore we don't know the full details of what has happened where (the documentation was rather incomplete). We can safely assume the packages were installed via apt and our group installed the planner that way. In terms of CPU consumption, one or two of the cores does spike up to 65% or higher when recalculating paths, though we don't see any unusual spikes when the problem manifests. We did capture one such instance of the issue on video. About the URDF model, we can look into that at a later stage.

Asked by halberd on 2019-03-22 08:29:23 UTC

I'm not a navigation expert, so I don't believe I can help you. That's also why I only posted comments (so as to not discourage other board members from posting).

What I could suggest is to see whether this is a resource issue by monitoring CPU and memory usage over time. It could help to see whether a failing control loop (due to lack of resources) is the cause.

Asked by gvdhoorn on 2019-03-25 06:55:23 UTC

Also:

As a result, the robot would keep on its last set velocity (cmd_vel output) and rviz would no longer be able to set and display a new goal, rendering the robot unresponsive. We're able to clear cmd_vel by using manual control (teleop_twist_joy) and stopping the robot that way.

Are Twists still being published by move_base? If not: you should really add a time-out to your mobile base driver on incoming Twists, so the robot stops automatically if none are received withing a certain amount of time. That would avoid runaway robots like this.

Asked by gvdhoorn on 2019-03-25 06:56:57 UTC

In a recent test involving manual and autonomous driving, we're noticing that upon any kind of turn, the local costmap and laser scan skew and warp to the right within rviz. Currently, we made the robot spin in circles to demonstrate this behaviour, and the local costmap appropriately jumps on each revolution. To this point, we haven't yet found any reason why the laser output would skew after turning the robot around. Nor can we think which of the edits made in the past would cause this. We're still poking around and keeping track of what's what.

Asked by halberd on 2019-03-26 07:12:17 UTC

We've just discovered that the bug occurred again, confirming that move_base doesn't publish anything on cmd_vel after the goal has been reached and a message with all zeroes has been sent.

Asked by halberd on 2019-03-28 05:08:19 UTC

Are you rosbaging everything? Could/would help with post-mortem debugging.

re: cmd_vel: sounds like adding a time-out would be a good idea.

Asked by gvdhoorn on 2019-03-28 05:11:22 UTC

We'll look into making a time-out for cmd_vel, as it is currently a rarely occurring issue at the moment. Our priority is on the map skewing issue, for which I've indeed made a rosbag just now.

Asked by halberd on 2019-03-28 05:31:05 UTC

We'll look into making a time-out for cmd_vel, as it is currently a rarely occurring issue at the moment.

having such a time-out is considered a soft safety feature. I would not want to sit in a wheelchair that could potentially become uncontrollable ..

Asked by gvdhoorn on 2019-03-28 06:03:47 UTC

agreed. thankfully we only make use of the base of the wheelchair; nobody can sit on the robot either. On top of it is an emergency button we've used plenty of times as a result. During testing, I've found that there's several map update loop missed its desired refresh rate errors, as such I'll delve into those further, believing it might be related to the skewing issue.

Asked by halberd on 2019-03-28 06:09:57 UTC

It might pay to spend some time installing a monitoring tool. Something like collectd. Or perhaps ethz-asl/ros-system-monitor. That way you could monitor CPU usage time-synced with the other messages in your bag.

During testing, I've found that there's several map update loop missed its desired refresh rate errors

that was what I pointed at in my first comment as well.

Asked by gvdhoorn on 2019-03-28 06:24:35 UTC

indeed. that might help us get closer to discovering the cause. In the past ~20 minutes i've driven the robot about, both manual and autonomous, and all three issues did not occur; the CPU load was still light (maximum 40%), the laserscan did not get skewed and the drifting issue only occured once a person walked in front of it (Might've been recovery behaviour, but it continued going backwards infinitely.) I've recorded this session in a bag file, too.

Asked by halberd on 2019-03-28 06:29:55 UTC

Answers