Robots Halting on Track and Causing Crashes: Help Identifying the Issue

asked 2021-08-07 03:36:48 -0500

hantoo

Hi All, I currently have a live setup with four robots. Each pair of robots (R#) shares a track; see the ASCII diagram below.

--|R4|-----------|R3|--
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
--|R1|-----------|R2|--

We've been testing the setup for a while and it ran fine for a few weeks, but in the last couple of days the robots have been randomly crashing into each other. I've come to ask for advice because the team are running out of ideas. Our planner is set up so that the robots should never come within 4 m of each other. However, it seems that sometimes one of the robots on a track disconnects from the motion server or halts its trajectory, and the other robot on that track then crashes into it. People on site sent photos of the teach pendants showing warning messages saying the motion server disconnected.

We're using RobotWare 5 on our controllers, and the robots are ABB IRB 6640s. On the ROS side we're running ROS1 Melodic on Ubuntu 18.04.
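To make the failure mode concrete: a minimum-distance rule checked at planning time only holds if both robots on a track execute their trajectories as planned. The hypothetical sketch below assumes a single shared track axis in metres and time-aligned position samples (all names are made up for illustration); it shows how one robot halting while its neighbour keeps moving eats into the planned 4 m clearance.

```python
from typing import List

# Assumption: both robots share one linear track axis (metres) and their
# planned positions are sampled at the same timestamps. Names are hypothetical.
MIN_SEPARATION_M = 4.0  # the planner's minimum distance from the question


def min_separation(track_a: List[float], track_b: List[float]) -> float:
    """Smallest distance between two robots over time-aligned track positions."""
    if len(track_a) != len(track_b):
        raise ValueError("trajectories must be sampled at the same timestamps")
    return min(abs(a - b) for a, b in zip(track_a, track_b))


def halt_at(track: List[float], index: int) -> List[float]:
    """Simulate a robot that stops at `index` and holds that position."""
    return track[: index + 1] + [track[index]] * (len(track) - index - 1)


# R1 follows R2 along the track, 5 m behind it, at the same speed.
r1_plan = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
r2_plan = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

planned = min_separation(r1_plan, r2_plan)              # 5.0 m: the plan is "safe"
actual = min_separation(r1_plan, halt_at(r2_plan, 1))   # R2 halts, R1 keeps going
print(f"planned {planned} m, actual {actual} m, limit {MIN_SEPARATION_M} m")
```

This prints "planned 5.0 m, actual 1.0 m, limit 4.0 m": nothing in the plan reacts to the halt, so a plan-time check alone cannot protect against one robot stopping unexpectedly.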

I couldn't attach the ROS logs, so please download them from the link below: https://we.tl/t-wHxDrbnOU7

Any help understanding what causes the robots to randomly halt on the tracks without completing their trajectories would be very helpful, as I'm at a loss now.

Below are some of the errors that stand out to me, although I'm not sure whether they are the reason the robots halt.

rob1-motion_download_interface-3-stdlog | End Of File

[ INFO] [1628300263.261909071]: Empty trajectory received, canceling current trajectory
[ INFO] [1628300263.262132804]: Joint trajectory handler: entering stopping state

rosout.log | Line 1417

1628300263.261922623 INFO /rob1/motion_download_interface [/tmp/binarydeb/ros-melodic-industrial-robot-client-0.7.1/src/joint_trajectory_interface.cpp:138(joint_trajectory_interface::JointTrajectoryInterface::jointTrajectoryCB)] [topics: /rosout] Empty trajectory received, canceling current trajectory

rosout.log | Lines 1421-1422

1628300263.260952750 INFO /move_group [/tmp/binarydeb/ros-melodic-moveit-ros-planning-1.0.7/trajectory_execution_manager/src/trajectory_execution_manager.cpp:1303(TrajectoryExecutionManager::executeThread)] [topics: /rosout, /move_group/planning_scene_monitor/parameter_descriptions, /move_group/planning_scene_monitor/parameter_updates, /move_group/monitored_planning_scene, /move_group/ompl/parameter_descriptions, /move_group/ompl/parameter_updates, /move_group/display_planned_path, /move_group/display_contacts, /rob1/joint_trajectory_action/goal, /rob1/joint_trajectory_action/cancel, /rob2/joint_trajectory_action/goal, /rob2/joint_trajectory_action/cancel, /rob3/joint_trajectory_action/goal, /rob3/joint_trajectory_action/cancel, /rob4/joint_trajectory_action/goal, /rob4/joint_trajectory_action/cancel, /move_group/trajectory_execution/parameter_descriptions, /move_group/trajectory_execution/parameter_updates, /move_group/plan_execution/parameter_descriptions, /move_group/plan_execution/parameter_updates, /move_group/sense_for_plan/parameter_descriptions, /move_group/sense_for_plan/parameter_updates, /execute_trajectory/result, /execute_trajectory/feedback, /execute_trajectory/status, /move_group/result, /move_group/feedback, /move_group/status, /pickup/result, /pickup/feedback, /pickup/status, /place/result, /place/feedback, /place/status] Completed trajectory execution with status TIMED_OUT ...

1628300263.261837014 INFO /move_group [/tmp/binarydeb/ros-melodic-moveit-ros-move-group-1.0.7/src/default_capabilities/execute_trajectory_action_capability.cpp:118(MoveGroupExecuteTrajectoryAction::executePath)] [topics: /rosout, /move_group/planning_scene_monitor/parameter_descriptions, /move_group/planning_scene_monitor/parameter_updates, /move_group/monitored_planning_scene, /move_group/ompl/parameter_descriptions, /move_group/ompl/parameter_updates, /move_group/display_planned_path, /move_group/display_contacts, /rob1/joint_trajectory_action/goal, /rob1/joint_trajectory_action/cancel, /rob2/joint_trajectory_action/goal, /rob2/joint_trajectory_action/cancel, /rob3/joint_trajectory_action/goal, /rob3/joint_trajectory_action/cancel, /rob4/joint_trajectory_action/goal, /rob4/joint_trajectory_action/cancel, /move_group/trajectory_execution/parameter_descriptions, /move_group/trajectory_execution/parameter_updates, /move_group/plan_execution/parameter_descriptions, /move_group/plan_execution/parameter_updates, /move_group/sense_for_plan/parameter_descriptions, /move_group/sense_for_plan ...
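When log entries from several nodes are interleaved, it can help to sort them by their stamp to see which event came first. The sketch below parses the rosout.log line layout visible in the excerpts above (stamp, level, node, source location, topics, message); the regex is written against these excerpts only, and the file paths and topic lists in the sample lines are shortened.

```python
import re
from decimal import Decimal
from typing import NamedTuple, Optional

# Layout of the rosout.log lines quoted above:
# <stamp> <LEVEL> <node> [<file>:<line>(<function>)] [topics: ...] <message>
ROSOUT_RE = re.compile(
    r"^(?P<stamp>\d+\.\d+)\s+(?P<level>[A-Z]+)\s+(?P<node>\S+)\s+"
    r"\[[^\]]*\]\s+\[topics:[^\]]*\]\s+(?P<msg>.*)$"
)


class RosoutEntry(NamedTuple):
    stamp: Decimal  # Decimal keeps the full nanosecond precision
    level: str
    node: str
    msg: str


def parse_rosout_line(line: str) -> Optional[RosoutEntry]:
    """Return the parsed entry, or None if the line does not match the layout."""
    m = ROSOUT_RE.match(line.strip())
    if m is None:
        return None
    return RosoutEntry(Decimal(m["stamp"]), m["level"], m["node"], m["msg"])


# The two entries from the excerpt, with file paths and topic lists shortened.
lines = [
    "1628300263.261922623 INFO /rob1/motion_download_interface "
    "[joint_trajectory_interface.cpp:138(jointTrajectoryCB)] [topics: /rosout] "
    "Empty trajectory received, canceling current trajectory",
    "1628300263.260952750 INFO /move_group "
    "[trajectory_execution_manager.cpp:1303(executeThread)] [topics: /rosout] "
    "Completed trajectory execution with status TIMED_OUT ...",
]

entries = sorted(
    (e for e in map(parse_rosout_line, lines) if e is not None),
    key=lambda e: e.stamp,
)
for e in entries:
    print(e.stamp, e.node, e.msg)
```

On these two entries, MoveIt's TIMED_OUT result is logged roughly 1 ms before the driver reports the empty trajectory, which would fit the cancel being a consequence of the execution timing out rather than its cause.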

Comments


This seems related to #q377749, #q372956 and #q376461, correct?

And I guess it's this setup:

constant gardners

gvdhoorn (2021-08-07 06:59:51 -0500)

When did this start happening? Did the ROS PC install any updates recently? Say on 2021-08-04?

And the log excerpts you show don't really help.

The "empty trajectory received" messages come from the driver, which prints them when it gets a goal abort from MoveIt. MoveIt aborts any running goals when you Ctrl+C the session.


Edit: looking through your logs I notice this (from the rob1-motion_download_interface):

Sending trajectory points, size: 554

that's much longer than the standard limit of 100 points. Did you edit the abb_driver RAPID code?

Could it be the controller is running out of memory?
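If the controller-side point limit is the concern, one option is to send fewer points. The naive decimation below is only a sketch: it assumes a limit of 100 points (taken from the "standard 100 points" above, not verified on this controller) and keeps evenly spaced points including the first and last. Dropping points changes what the controller interpolates between the remaining ones, so any path and separation checks would have to be repeated on the reduced trajectory.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Assumption: the stock abb_driver RAPID code holds at most 100 points per
# trajectory (the "standard 100 points" mentioned above). Verify on your controller.
ASSUMED_MAX_POINTS = 100


def downsample(points: Sequence[T], max_points: int = ASSUMED_MAX_POINTS) -> List[T]:
    """Keep at most `max_points` evenly spaced points, always keeping first and last."""
    if max_points < 2:
        raise ValueError("need room for at least the start and end points")
    n = len(points)
    if n <= max_points:
        return list(points)
    # Integer arithmetic: i = 0 maps to 0, i = max_points - 1 maps to n - 1,
    # and indices strictly increase because (n - 1) / (max_points - 1) > 1.
    return [points[i * (n - 1) // (max_points - 1)] for i in range(max_points)]


trajectory = list(range(554))  # stand-in for the 554-point trajectory in the log
reduced = downsample(trajectory)
print(len(reduced), reduced[0], reduced[-1])  # 100 0 553
```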


Edit 2:

People on site sent photos of the teach pendants showing the warning messages saying motion server disconnected

please show (one of) those pictures.

gvdhoorn (2021-08-07 07:20:04 -0500)

When did this start happening? Did the ROS PC install any updates recently? Say on 2021-08-04?

no, it can't be an update, as the logs show you're still running 0.7.1 of industrial_robot_client.


Edit: something else to check is network connectivity. Are there any wireless segments between the IRC5 and the ROS PC? Any wonky cabling? Visitors walking on cables? abb_driver disconnecting has rarely been observed, and when it did happen, there was a reason for it.

The recently released update (in the form of 0.7.2 of simple_message for Melodic) includes some changes that would improve reconnection behaviour, but updating a deployed application is not necessarily the best thing to do unless you can easily roll back (i.e. with an image backup or something similar).

To figure out which side is disconnecting: run Wireshark and capture the network traffic.
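In Wireshark, a display filter such as "tcp.flags.fin == 1 || tcp.flags.reset == 1" shows which host closed or reset a connection first. To look inside the traffic: abb_driver uses the ROS-Industrial simple_message protocol, where each message starts with a 4-byte length prefix followed by three int32 header fields (message type, communication type, reply code). The sketch below splits one direction of a stream (for example, saved as raw bytes from Wireshark's "Follow TCP Stream") into frames. Assumptions to verify against your capture: little-endian fields, the type codes listed as in simple_message's headers, and abb_driver's default ports 11000 (motion) and 11002 (state) if you didn't change them. The demo bytes are synthetic, not captured data.

```python
import struct
from typing import Iterator, NamedTuple

# Assumption: little-endian int32 fields (check against a real capture).
PREFIX = struct.Struct("<i")    # length of everything that follows the prefix
HEADER = struct.Struct("<iii")  # msg_type, comm_type, reply_code

# Subset of simple_message standard type codes (verify for your version).
MSG_TYPES = {1: "PING", 10: "JOINT_POSITION", 11: "JOINT_TRAJ_PT", 12: "JOINT_TRAJ",
             13: "STATUS", 14: "JOINT_TRAJ_PT_FULL", 15: "JOINT_FEEDBACK"}
COMM_TYPES = {1: "TOPIC", 2: "SERVICE_REQUEST", 3: "SERVICE_REPLY"}


class Frame(NamedTuple):
    msg_type: str
    comm_type: str
    reply_code: int
    body_len: int


def iter_frames(stream: bytes) -> Iterator[Frame]:
    """Split one direction of a reassembled TCP stream into simple_message frames."""
    offset = 0
    while offset + PREFIX.size <= len(stream):
        (length,) = PREFIX.unpack_from(stream, offset)
        start = offset + PREFIX.size
        if length < HEADER.size or start + length > len(stream):
            break  # corrupt length or truncated final frame
        msg_type, comm_type, reply = HEADER.unpack_from(stream, start)
        yield Frame(MSG_TYPES.get(msg_type, f"UNKNOWN({msg_type})"),
                    COMM_TYPES.get(comm_type, f"UNKNOWN({comm_type})"),
                    reply, length - HEADER.size)
        offset = start + length


# Synthetic demo: an empty PING request followed by a frame with an 8-byte body.
demo = (PREFIX.pack(12) + HEADER.pack(1, 2, 0)
        + PREFIX.pack(12 + 8) + HEADER.pack(15, 1, 0) + bytes(8))
for frame in iter_frames(demo):
    print(frame)
```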

gvdhoorn (2021-08-07 07:43:08 -0500)

Trajectory state not received for 1.000000 seconds

this is actually a problem, as it suggests more than just the motion server has disconnected.

In addition to capturing network traffic with Wireshark, you could increase the logging level of the motion_download_interface nodes to DEBUG to see what they're doing when things go wrong.
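To line up the "Trajectory state not received for 1.000000 seconds" warnings with what happened on the wire, one approach is to collect the receive times of one robot's state messages (for example, frame times on the state connection in the Wireshark capture, or stamps from a recording of the robot's feedback) and look for gaps longer than the timeout. The sketch only takes a list of floats; where the stamps come from, and the sample values, are assumptions.

```python
from typing import List, Sequence, Tuple

# The 1 s threshold mirrors the "Trajectory state not received for 1.000000
# seconds" warning above; treat it as a tunable assumption.
STATE_TIMEOUT_S = 1.0


def find_gaps(stamps: Sequence[float], timeout: float = STATE_TIMEOUT_S) -> List[Tuple[float, float]]:
    """Return (last_before, first_after) pairs where consecutive stamps are > timeout apart."""
    ordered = sorted(stamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > timeout]


# Hypothetical receive times (seconds) of state messages from one robot.
stamps = [0.0, 0.1, 0.2, 1.7, 1.8]
print(find_gaps(stamps))  # [(0.2, 1.7)]
```

Comparing the gap boundaries with the times on the teach pendant's event log would then show whether the state stream stopped before or after the disconnect message.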

gvdhoorn (2021-08-07 08:32:30 -0500)

@gvdhoorn You are indeed correct.

This started happening really recently actually. Within the last 5 days.

It could be the controller running out of memory, although we've been running trajectories with nearly 900 points for a few weeks without problems. Of course, we try to keep the point count well below that.

I'll look into capturing network data with Wireshark; that's something I hadn't thought of before. I did check that all wired cables were plugged in securely, and that seems fine.

Out of pure interest, could having the computer connected to the internet that is running RViz and sending the trajectories to the Robots over Ethernet possibly cause a problem as well? My hunch is yes, but a second opinion would be good.

Here are the photos you asked for, showing the error messages: https://photos.app.goo ...

hantoo (2021-08-07 23:15:00 -0500)

Hm. The TP photos seem to indicate the ROS side disconnected. Without more data / information it'll be hard to tell why.

This started happening really recently actually. Within the last 5 days.

just to rule it out you could check various logs (such as /var/log/dpkg.log) on your system to see whether it has auto-updated anything (if auto-updates are turned on: turn them off).
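A quick way to scan dpkg.log for changes in the window when the problems started is sketched below. It relies only on the log's line layout (date, time, action, package, ...) and keeps actions that change what is installed. The sample lines use made-up package names and are not from the poster's system.

```python
from datetime import date, datetime
from typing import Iterable, List, Tuple

# dpkg.log lines start with "<YYYY-MM-DD> <HH:MM:SS> <action> <package> ..."
# Only actions that change what is installed are kept.
CHANGE_ACTIONS = {"install", "upgrade", "remove", "purge"}


def package_changes(lines: Iterable[str], since: date) -> List[Tuple[datetime, str, str]]:
    """Return (timestamp, action, package) for every change on or after `since`."""
    changes = []
    for line in lines:
        parts = line.split()
        if len(parts) < 4 or parts[2] not in CHANGE_ACTIONS:
            continue  # skips "status", "configure", "startup" and malformed lines
        stamp = datetime.strptime(f"{parts[0]} {parts[1]}", "%Y-%m-%d %H:%M:%S")
        if stamp.date() >= since:
            changes.append((stamp, parts[2], parts[3]))
    return changes


# Illustrative lines with made-up packages; on the PC you would iterate over
# open("/var/log/dpkg.log") instead (rotated files such as dpkg.log.1 hold older entries).
sample = [
    "2021-08-01 10:00:00 upgrade libexample1:amd64 1.0-1 1.0-2",
    "2021-08-04 06:25:13 upgrade libexample2:amd64 2.0-1 2.0-2",
    "2021-08-04 06:25:14 status installed libexample2:amd64 2.0-2",
    "2021-08-04 06:30:00 upgrade ros-melodic-example-pkg:amd64 0.1.0-0 0.1.1-0",
]
recent = package_changes(sample, since=date(2021, 8, 2))
ros_changes = [c for c in recent if c[2].startswith("ros-")]
for stamp, action, pkg in recent:
    print(stamp, action, pkg)
```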

Out of pure interest, could having the computer connected to the internet that is running RViz and sending the trajectories to the Robots over ethernet possibly cause a problem as well?

It hasn't been a problem in all my years using abb_driver, but I can't guarantee anything, of course: there are just too many variables. As long as your routing is OK, it should all just work.

State and motion clients disconnecting suggests something isn't responding when it should, but at the socket ...

gvdhoorn (2021-08-08 04:46:06 -0500)

If you have an opportunity, run your regular roslaunch command with --screen appended to it. That would force all output to the screen, which might make debugging easier.

We'll look at also updating to 0.7.2.

well, if things worked before, updating might not be the best thing to do.

What about environment factors? Have trajectories "suddenly" become longer? Has anyone changed any configuration on the IRC5s? Has the trajectory generator (multi_player?) been changed? Any configuration changed on the ROS PC? Etc.

Also:

Looking in common namespaces for param name: rob2/solve_type
Using solve type Speed

From those lines it appears trac_ik is not configured for Distance solving. Distance would make planning much more predictable and avoids configuration changes most of the time. You've probably changed the joint limits already anyway, but I'd use Distance regardless.
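TRAC-IK's documentation describes Speed as returning the first solution found, and Distance as running for the full timeout and returning the solution with the smallest sum-of-squares distance from the seed. The toy below only illustrates that selection rule on made-up joint values; it is not TRAC-IK code. It shows why Speed can return a wrist-flipped configuration while Distance prefers the one close to the current state.

```python
from typing import List, Sequence

Joints = Sequence[float]


def sse(a: Joints, b: Joints) -> float:
    """Sum of squared joint differences (radians squared)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pick_speed(candidates: List[Joints]) -> Joints:
    """'Speed'-style choice: whichever solution was found first."""
    return candidates[0]


def pick_distance(candidates: List[Joints], seed: Joints) -> Joints:
    """'Distance'-style choice: the solution closest to the seed state."""
    return min(candidates, key=lambda q: sse(q, seed))


# Made-up 6-axis joint values; the seed stands in for the current robot state.
seed = [0.0, 0.5, 0.5, 0.0, 1.0, 0.0]
candidates = [
    [0.0, 0.5, 0.5, 3.14159, -1.0, 3.14159],  # wrist-flipped solution found first
    [0.05, 0.52, 0.48, 0.0, 1.0, 0.0],        # solution near the seed
]
print(pick_speed(candidates))           # the flipped configuration
print(pick_distance(candidates, seed))  # the configuration near the seed
```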

gvdhoorn (2021-08-08 04:52:30 -0500)

So I've had a look in dpkg.log, and packages have been updating, but as far as I can see nothing that would affect ROS.
I've now run roslaunch with --screen to see whether any messages help identify the issue. Log files can be found here: https://we.tl/t-V9xAKa7ghB
Will --screen also save the new messages into the ROS log files? I've installed Wireshark as you suggested and will get the team to run it and save the capture tomorrow for me to dig through. Does ROS use a particular header or byte pattern to mark its messages that I could use to search through the data more easily?

Trajectories haven't changed. However, we started getting issues after we swapped out an SMB board on one of the robots, although an ABB engineer did this for us and resynced the ...

hantoo (2021-08-09 09:19:31 -0500)