How to debug this? Callbacks stop.

asked 2019-03-04 13:17:01 -0500

chrisalbertson gravatar image

I have a very standard looking mobile base controller written in Python. It subscribes to cmd_vel messages and publishes odometry. Odometry is published from inside a loop that includes rate.sleep and the cmd_vel callback talks directly to the hardware. Both have lots of debug messages inside so I can know what they are doing. It runs about 10 to 20Hzon a Raspberry Pi 3B and uses about 6% of a processor core. It seems to run fine and the robot base moves and turns as expected.

Here is the problem: It only works for a while, perhaps 5 to 10 minutes of random driving around and then the hardware motor controller never gets another update and the robot just keeps driving with the last motor speeds until power cycled.

At first, I suspect the motor controller but no...

I see (using rostopic echo /cmd_vel) thatcmd_vel messages are being published correctly and no odometry is beibg published. It seems like the base controller is terminated. But no...

"rosnode ping" works as does rosnode info. I se that my controller node is running and subscribed to cmd_vel but the callback is not being called and the loop that runs the odometry publisher is not running either

The effect is as if the node were stuck inside spinOnce, not frozen because ping and info work, just stuck.

The odd thing is that is all works fine for a while, then stops. I start it all with one launch file and "control-C" stops everything

QUESTION: How do I debug this? I have many debug log messages and none are logged because I assume the callback and lop are not running.

The effect is exactly as if by base controler node were stuck in

edit retag flag offensive close merge delete

Comments

As this is running over wireless: can you run a rostopic echo (and/or a rosbag record) of the cmd_vel topic on the robot to see whether that also stops?

gvdhoorn gravatar image gvdhoorn  ( 2019-03-04 14:14:11 -0500 )edit
1

Also:

the hardware motor controller never gets another update and the robot just keeps driving with the last motor speeds until power cycled.

independent of this problem: I would really recommend to implement a command time-out in your mobile base driver. That is something all mobile robots ..

gvdhoorn gravatar image gvdhoorn  ( 2019-03-04 14:14:43 -0500 )edit
1

.. should have and prevents communication issues from doing any real damage with robots repeating or continuing the last received command.

gvdhoorn gravatar image gvdhoorn  ( 2019-03-04 14:15:43 -0500 )edit

Yes, I have a timeout. If no cmd_vel messages are received I send a stop command to the hardware. But as I wrote, the while-rosok, dostuff() rate..sleep() loop does not run so time outs are not detected. (will write a watchdog node that monitors odometry and compares to cmd_vel. kills power)

chrisalbertson gravatar image chrisalbertson  ( 2019-03-04 15:09:49 -0500 )edit

It was not clear to me from your description that event processing / callback queues stop working. I had understood that it just appeared that no callbacks were being called / messages were not being received.

Do you have any other nodes on that system that exhibit the same problem? If you ..

gvdhoorn gravatar image gvdhoorn  ( 2019-03-04 15:14:01 -0500 )edit

Yes. I have monitored cmd_vel. the published cmd_vel is good and reacts to sensors. rosnode info shows the base controller is subscribed to cmd_vel. There is no difference between when it works and not that I can see with rostopic echo cmd_vel. rosnode ping works on all nodes.

chrisalbertson gravatar image chrisalbertson  ( 2019-03-04 15:14:33 -0500 )edit

.. disable parts of the base driver (ie: writing to/reading from hw fi), does this also occur?

How does your driver communnicate with the motor controller? Memory reads/writes? Serial port? Something else? If a serial port read hangs in a callback (for instance), that could explain this.

gvdhoorn gravatar image gvdhoorn  ( 2019-03-04 15:15:18 -0500 )edit

At first, I suspect the motor controller but no...

can you add how you reached this conclusion?

gvdhoorn gravatar image gvdhoorn  ( 2019-03-04 15:18:05 -0500 )edit