Robot drifts away from goal and laser scan skews at random intervals

asked 2019-03-22 07:36:00 -0500

halberd

updated 2019-03-26 09:27:25 -0500

I have to apologize for the vague and/or convoluted title. We don't know where the issue is ourselves.

The robot I'm working on makes use of ROS Kinetic, the standard navigation stack with amcl, teb_local_planner, laser_scan_matcher and move_base. It is a front-wheeled differential drive with two caster wheels on the back with a rectangular body/footprint, built on the chassis of an electric wheelchair.

The issue we're facing occurs at random intervals. Whenever we set a goal in rviz, the robot does move towards its goal. After a random amount of time, or after a certain distance has been traveled, two errors can occur:

The robot loses track of its position as the local costmap becomes skewed. We see this in the lines drawn from the LIDAR data, which represent cabinets and walls: they become skewed and/or rotate away from their actual location. As a result, the robot continues on "a path" until it encounters an obstacle. Each of these erratic behaviours is visible within rviz, and the console output can mention the errors Map Update/Control loop missed its desired rate <...>. We think this might not be related to CPU load, which averages about 50%, including when generating new paths, and has never gone above 90%.

Additionally, the robot keeps its last set velocity (cmd_vel output) and rviz is no longer able to set and display a new goal, rendering the robot unresponsive. We're able to clear cmd_vel by using manual control (teleop_twist_joy) and stopping the robot that way. Each time we press the reset button at the bottom-right of rviz, we lose more of the local and global costmaps, accompanied by the warning "No map provided". Rebooting the ROS environment restores everything until the "drift" occurs again.

We can consistently reproduce this behaviour by having the robot drive various distances. We believe it appears more often over short distances of approximately 2-3 meters; over longer distances, i.e. 5-20 meters, the errors occur less often, but the laser scan still frequently becomes skewed. The skewing of the LIDAR data and local costmap is easily reproduced by rotating the robot in place: when turning either left or right, the laser scan and local costmap rotate in short intervals and progressively shift to the right within rviz. The shift eventually stops, but the scan remains massively skewed.

We've managed to make a quick recording of what happens when the robot spins around its axis once; alas, the video is rather shaky. https://www.dropbox.com/s/iywk2r9c6db... (The link will be removed if it's not allowed to post here.)

Below are the common configuration files requested:

  • navigation.launch

    <launch>
      <arg name="sim" default="false"/>
      <arg name="path"/>

    <node if="$(eval sim == 'true')" pkg="rosbag" type="play" name="play" args="$(arg path)/bags/t5_lidar.bag --delay=5 --clock"/>
    
        <node pkg="tf" type="static_transform_publisher" name="base_link_to_laser" args="0.0 0.0 0.0 0.0 0.0 0 ...

Comments

Each of these erratic behaviours is visible within rviz and the console output can mention the errors Map Update/Control loop missed its desired rate <...>

That sounds like a resource issue.

What sort of hardware are you running this on (i.e. CPU, memory, platform, etc.)?

gvdhoorn (2019-03-22 07:41:01 -0500)
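
To make the "missed its desired rate" warnings quoted above more concrete: such a warning is reported when one iteration of a fixed-rate loop takes longer than the loop period, i.e. longer than 1/frequency seconds. The sketch below only illustrates that comparison. The function name `overruns` and the 20 Hz figure are assumptions chosen for illustration, not values taken from the configuration in this thread.

```python
def overruns(cycle_times, frequency):
    """Return (index, seconds) for every cycle that exceeded the loop period.

    cycle_times: measured duration of each loop iteration, in seconds.
    frequency:   desired loop rate in Hz (hypothetical value, not from this thread).
    """
    period = 1.0 / frequency  # time budget for one iteration
    return [(i, t) for i, t in enumerate(cycle_times) if t > period]


# Example: at an assumed 20 Hz rate the budget is 0.05 s per cycle,
# so only the third cycle below counts as a missed rate.
late = overruns([0.03, 0.04, 0.08], 20.0)
```

A single slow cycle is enough to trigger the warning, so an average CPU load well below 100% does not by itself rule out short stalls.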

The hardware consists of a laptop with an i5-3210M CPU (2.50GHz x4) and 16GB RAM running Ubuntu 16.04. CPU usage (with ROS) averages at 40-50% idle and 70-80% when given a goal.

halberd (2019-03-22 07:59:00 -0500)

averages at 40-50% idle and 70-80% when given a goal.

80% is quite close to maximum. Could be a red herring, but it'd be interesting to see what the utilisation level is when your problem manifests itself.

How did you install ROS? All using apt?

Also btw: why so many static_transform_publishers? Can you not create a URDF?

gvdhoorn (2019-03-22 08:03:28 -0500)

We're the 5th group assigned to this project, so we don't know the full details of what was done where (the documentation was rather incomplete). We can safely assume the packages were installed via apt, and our group installed the planner that way. In terms of CPU consumption, one or two of the cores do spike up to 65% or higher when recalculating paths, though we don't see any unusual spikes when the problem manifests. We did capture one such instance of the issue on video. As for the URDF model, we can look into that at a later stage.

halberd (2019-03-22 08:29:23 -0500)

I'm not a navigation expert, so I don't believe I can help you. That's also why I only posted comments (so as to not discourage other board members from posting).

What I could suggest is to see whether this is a resource issue by monitoring CPU and memory usage over time. It could help to see whether a failing control loop (due to lack of resources) is the cause.

gvdhoorn (2019-03-25 06:55:23 -0500)
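
One possible way to monitor CPU and memory usage over time, as suggested above, is a small logger that samples the Linux `/proc/stat` and `/proc/meminfo` files. This is only a sketch under assumptions: all function names are made up for illustration, it relies on the standard Linux `/proc` layout (as on the Ubuntu 16.04 laptop mentioned earlier), and it is not part of any ROS package.

```python
from __future__ import print_function  # ROS Kinetic ships Python 2.7
import time


def parse_cpu_times(stat_text):
    """Map each 'cpu*' line of /proc/stat to (idle_ticks, total_ticks)."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("cpu"):
            continue
        # Columns: user nice system idle iowait irq softirq steal guest guest_nice
        ticks = [int(v) for v in fields[1:]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks[:8])  # guest time is already counted in user/nice
        times[fields[0]] = (idle, total)
    return times


def utilisation(before, after):
    """Percent busy per CPU between two parse_cpu_times() samples."""
    busy = {}
    for name, (idle1, total1) in after.items():
        idle0, total0 = before.get(name, (idle1, total1))
        d_total = total1 - total0
        d_idle = idle1 - idle0
        busy[name] = 100.0 * (d_total - d_idle) / d_total if d_total > 0 else 0.0
    return busy


def mem_available_kb(meminfo_text):
    """Return the MemAvailable value (kB) from /proc/meminfo content, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    return None


def log_usage(samples=60, interval=1.0):
    """Print one timestamped line per interval with per-core load and free memory."""
    with open("/proc/stat") as f:
        previous = parse_cpu_times(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open("/proc/stat") as f:
            current = parse_cpu_times(f.read())
        with open("/proc/meminfo") as f:
            free_kb = mem_available_kb(f.read())
        load = utilisation(previous, current)
        cores = " ".join("%s=%.0f%%" % (k, load[k]) for k in sorted(load))
        print("%.1f %s MemAvailable=%s kB" % (time.time(), cores, free_kb))
        previous = current
```

Calling `log_usage(samples=600, interval=0.5)` in a separate terminal while reproducing the problem gives a timestamped per-core trace that can be lined up with the moment the warnings appear. Per-core values matter here because a single saturated core can hide behind a moderate overall average.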

Also:

As a result, the robot would keep on its last set velocity (cmd_vel output) and rviz would no longer be able to set and display a new goal, rendering the robot unresponsive. We're able to clear cmd_vel by using manual control (teleop_twist_joy) and stopping the robot that way.

Are Twists still being published by move_base? If not: you should really add a time-out to your mobile base driver on incoming Twists, so the robot stops automatically if none are received within a certain amount of time. That would avoid runaway robots like this.

gvdhoorn (2019-03-25 06:56:57 -0500)
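
The time-out on incoming Twists suggested above could look roughly like the following. This is a minimal sketch, not the driver used on this robot: the class name `CmdVelWatchdog`, the 0.5 s default, and the reduction of a Twist to `(linear_x, angular_z)` are all assumptions made for illustration. The logic is kept free of ROS imports so it can be tried on its own.

```python
import time


class CmdVelWatchdog(object):
    """Hold the latest velocity command and zero it once it goes stale."""

    def __init__(self, timeout=0.5, clock=time.time):
        self.timeout = timeout    # seconds a command stays valid (hypothetical default)
        self.clock = clock        # injectable so the logic can be tested without waiting
        self.last_stamp = None    # time of the most recent command, None if none yet
        self.last_cmd = (0.0, 0.0)

    def on_twist(self, linear_x, angular_z):
        """Call this from the cmd_vel subscriber callback of the base driver."""
        self.last_cmd = (linear_x, angular_z)
        self.last_stamp = self.clock()

    def current_command(self):
        """Call this from the motor control loop; returns (0, 0) when stale."""
        if self.last_stamp is None:
            return (0.0, 0.0)
        if self.clock() - self.last_stamp > self.timeout:
            return (0.0, 0.0)
        return self.last_cmd


# Example with a fake clock: the command is honoured for 0.5 s, then dropped.
now = [0.0]
watchdog = CmdVelWatchdog(timeout=0.5, clock=lambda: now[0])
watchdog.on_twist(0.3, 0.1)
now[0] = 0.6
stale = watchdog.current_command()
```

In a rospy-based driver, `on_twist` would be called from the callback of a `rospy.Subscriber` on `cmd_vel`, and the motor loop would send whatever `current_command()` returns. With this in place, the robot stops on its own if move_base goes silent, instead of continuing at its last velocity.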

In a recent test involving manual and autonomous driving, we noticed that upon any kind of turn, the local costmap and laser scan skew and warp to the right within rviz. We then made the robot spin in circles to demonstrate this behaviour, and the local costmap jumps accordingly on each revolution. So far, we haven't found any reason why the laser output would skew after turning the robot around, nor can we think of which of the past edits would cause this. We're still poking around and keeping track of what's what.

halberd (2019-03-26 07:12:17 -0500)

We've just discovered that the bug occurred again, confirming that move_base doesn't publish anything on cmd_vel after the goal has been reached and a message with all zeroes has been sent.

halberd (2019-03-28 05:08:19 -0500)