Robotics StackExchange | Archived questions

How to debug this? Callbacks stop.

I have a very standard looking mobile base controller written in Python. It subscribes to cmdvel messages and publishes odometry. Odometry is published from inside a loop that includes rate.sleep and the cmdvel callback talks directly to the hardware. Both have lots of debug messages inside so I can know what they are doing. It runs about 10 to 20Hzon a Raspberry Pi 3B and uses about 6% of a processor core. It seems to run fine and the robot base moves and turns as expected.

Here is the problem: It only works for a while, perhaps 5 to 10 minutes of random driving around and then the hardware motor controller never gets another update and the robot just keeps driving with the last motor speeds until power cycled.

At first, I suspect the motor controller but no...

I see (using rostopic echo /cmdvel) thatcmdvel messages are being published correctly and no odometry is beibg published. It seems like the base controller is terminated. But no...

"rosnode ping" works as does rosnode info. I se that my controller node is running and subscribed to cmd_vel but the callback is not being called and the loop that runs the odometry publisher is not running either

The effect is as if the node were stuck inside spinOnce, not frozen because ping and info work, just stuck.

The odd thing is that is all works fine for a while, then stops. I start it all with one launch file and "control-C" stops everything

QUESTION: How do I debug this? I have many debug log messages and none are logged because I assume the callback and lop are not running.

The effect is exactly as if by base controler node were stuck in

Asked by chrisalbertson on 2019-03-04 14:17:01 UTC

Comments

As this is running over wireless: can you run a rostopic echo (and/or a rosbag record) of the cmd_vel topic on the robot to see whether that also stops?

Asked by gvdhoorn on 2019-03-04 15:14:11 UTC

Also:

the hardware motor controller never gets another update and the robot just keeps driving with the last motor speeds until power cycled.

independent of this problem: I would really recommend to implement a command time-out in your mobile base driver. That is something all mobile robots ..

Asked by gvdhoorn on 2019-03-04 15:14:43 UTC

.. should have and prevents communication issues from doing any real damage with robots repeating or continuing the last received command.

Asked by gvdhoorn on 2019-03-04 15:15:43 UTC

Yes, I have a timeout. If no cmd_vel messages are received I send a stop command to the hardware. But as I wrote, the while-rosok, dostuff() rate..sleep() loop does not run so time outs are not detected. (will write a watchdog node that monitors odometry and compares to cmd_vel. kills power)

Asked by chrisalbertson on 2019-03-04 16:09:49 UTC

It was not clear to me from your description that event processing / callback queues stop working. I had understood that it just appeared that no callbacks were being called / messages were not being received.

Do you have any other nodes on that system that exhibit the same problem? If you ..

Asked by gvdhoorn on 2019-03-04 16:14:01 UTC

Yes. I have monitored cmd_vel. the published cmd_vel is good and reacts to sensors. rosnode info shows the base controller is subscribed to cmd_vel. There is no difference between when it works and not that I can see with rostopic echo cmd_vel. rosnode ping works on all nodes.

Asked by chrisalbertson on 2019-03-04 16:14:33 UTC

.. disable parts of the base driver (ie: writing to/reading from hw fi), does this also occur?

How does your driver communnicate with the motor controller? Memory reads/writes? Serial port? Something else? If a serial port read hangs in a callback (for instance), that could explain this.

Asked by gvdhoorn on 2019-03-04 16:15:18 UTC

At first, I suspect the motor controller but no...

can you add how you reached this conclusion?

Asked by gvdhoorn on 2019-03-04 16:18:05 UTC

Yes, I agree it ONLY APPEARS as if event processing / callback queues stop . Only this one node has this problem. If this were C++ I'd suspect a pointer killed code inside a ROS library but this is Python.

My question is what to look at? I'm stumped. Nodes answer pings, cmd_val is published

Asked by chrisalbertson on 2019-03-04 16:21:09 UTC

Please see my before last comment. There's a few things to try.

Asked by gvdhoorn on 2019-03-04 16:26:14 UTC

"If a serial port read hangs in a callback...could explain this." THANKS. It uses serial.

Is spinOnce() multithreaded? Could explain how it can respond to pings and still be hung. I guessed it was not hung because it responds to pings. Where to read how it works -- link? or read source code?

Asked by chrisalbertson on 2019-03-04 16:47:29 UTC

Is spinOnce() multithreaded? Could explain how it can respond to pings and still be hung. I guessed it was not hung because it responds to pings. Where to read how it works -- link? or read source code?

rosnode ping is an XML-RPC call. That's -- afaik -- not handled by spinning.

Asked by gvdhoorn on 2019-03-05 02:03:11 UTC

I would still like to know how you ruled out the motor controller.

Asked by gvdhoorn on 2019-03-05 02:03:40 UTC

Did you figure out what was going on in the end?

Asked by gvdhoorn on 2019-03-13 03:16:26 UTC

Answers