
Why do ROS programs often have difficulty stopping cleanly?

asked 2018-10-05 03:50:27 -0500

Sietse

Hello List,

I am working with different ROS versions, mostly Kinetic and Melodic, but my question pertains to all versions. It is often difficult to stop ROS programs: typing CTRL-C is often not enough. It has to be repeated multiple times, and it also takes a long time to actually react. It usually ends with "escalating to SIGTERM" or something similar. I currently see this with gazebo_ros launch files.

Why is this so? What is the technical reason that a cleaner and faster stop does not seem to be possible?

Regards, Sietse


1 Answer


answered 2018-10-05 04:27:02 -0500

Delb

updated 2018-10-05 04:53:44 -0500

ROS nodes usually fail to exit as requested when you use threads. If a thread runs a while loop and you forgot a condition like ros::isShuttingDown() (or another function monitoring the ROS state), the thread will keep running even though you asked the node to exit (in that case I use the jobs command to check the running processes and kill them with kill %1).
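A minimal sketch of that thread pattern, written in plain Python so it runs without ROS. Here a threading.Event stands in for the ROS shutdown state (in a real node you would check ros::isShuttingDown() in C++ or rospy.is_shutdown() in Python), and the names worker and run_node are hypothetical. The key point is that the loop condition checks the flag: with a bare while True the thread would never end, and since it is not a daemon thread, the Python interpreter would also wait for it forever on exit.

```python
import threading
import time


def worker(shutdown: threading.Event, ticks: list) -> None:
    """Periodic work that stops as soon as shutdown is requested."""
    # shutdown.is_set() plays the role of ros::isShuttingDown() / rospy.is_shutdown().
    while not shutdown.is_set():
        ticks.append(time.monotonic())  # placeholder for the real periodic work
        # Sleep up to 50 ms, but wake up immediately if shutdown is set meanwhile.
        shutdown.wait(0.05)


def run_node(run_for: float = 0.2) -> tuple:
    """Start the worker, let it run, then request shutdown (hypothetical node)."""
    shutdown = threading.Event()
    ticks: list = []
    thread = threading.Thread(target=worker, args=(shutdown, ticks))
    thread.start()
    time.sleep(run_for)       # the node is "spinning"
    shutdown.set()            # what a SIGINT handler would trigger
    thread.join(timeout=1.0)  # returns quickly because the loop checks the flag
    return not thread.is_alive(), len(ticks)


if __name__ == "__main__":
    stopped, iterations = run_node()
    print(f"worker stopped cleanly: {stopped} after {iterations} iterations")
```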

When you stop a ROS node with CTRL-C, the node receives a signal: SIGINT. roslaunch then waits for a default timeout, and if the node takes too long to exit, it sends another signal: SIGTERM.

  • SIGINT interrupts your process, and you can define a handler function to deal with this interruption and do whatever you want (usually stop your threads and anything else that keeps the node from exiting).
  • SIGTERM, by default, terminates the process right away, without going through any function of yours to clean up. (A process can also catch SIGTERM; only SIGKILL, which roslaunch sends as a last resort if SIGTERM does not work either, cannot be caught.)
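The SIGINT side can be sketched the same way. This assumes Python 3.8+ (for signal.raise_signal) and uses a hypothetical handler, on_sigint, that only sets a shutdown flag so that worker threads can finish on their own. roscpp and rospy install their own SIGINT handlers by default, so in a real node you would normally rely on those rather than on this code.

```python
import signal
import threading

shutdown = threading.Event()  # stand-in for "the node is shutting down"
received: list = []           # names of the signals our handler saw


def on_sigint(signum, frame):
    """SIGINT handler: request a clean stop instead of dying immediately."""
    received.append(signal.Signals(signum).name)
    shutdown.set()            # worker threads polling this flag will now exit


def simulate_ctrl_c() -> bool:
    shutdown.clear()
    # Handlers can only be installed from the main thread.
    previous = signal.signal(signal.SIGINT, on_sigint)
    try:
        signal.raise_signal(signal.SIGINT)  # the same signal CTRL-C delivers
        return shutdown.wait(timeout=1.0)   # True once the handler has run
    finally:
        signal.signal(signal.SIGINT, previous)  # put the original handler back


if __name__ == "__main__":
    print("clean shutdown requested:", simulate_ctrl_c(), received)
```

Without such a handler, SIGTERM keeps its default action, which is why a node stopped that way gets no chance to clean up.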

With Gazebo this is a common issue. To run, Gazebo needs the gzserver process.

FYI: there is also the gzclient process, which provides the Gazebo GUI. You can run Gazebo without gzclient; you just won't have the GUI, but if you check in RViz everything will still work. gzserver, on the other hand, is mandatory.

The problem here is that gzserver takes a long time to exit when it receives SIGINT (I don't know why, though). By default roslaunch waits 15.0 seconds, so after that time it sends SIGTERM to stop gzserver. As a workaround (thanks to #q11353), I change that default timeout. In the file:


You can change this line to the value you want (not too low, so that the other nodes still have time to exit cleanly):

_TIMEOUT_SIGINT = 15.0 #seconds

If you want, you can also increase this value to see how long it actually takes to exit cleanly.
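To experiment with the escalation, or to time how long a process takes to stop, here is a POSIX-only Python sketch that follows the same order of signals as roslaunch: SIGINT, wait, SIGTERM, wait, SIGKILL. It is not roslaunch's code. The helper names, the timeouts, and the "stubborn" child process (which ignores SIGINT to mimic a slow-to-stop gzserver) are all hypothetical and chosen for a quick demo.

```python
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for a slow-to-stop process such as gzserver:
# it ignores SIGINT, reports that it is ready, then idles forever.
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGINT, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "while True:\n"
    "    time.sleep(0.1)\n"
)


def start_stubborn_child() -> subprocess.Popen:
    proc = subprocess.Popen([sys.executable, "-c", STUBBORN_CHILD],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # block until the child has installed its SIGINT handler
    return proc


def stop_with_escalation(proc: subprocess.Popen,
                         timeout_sigint: float,
                         timeout_sigterm: float) -> tuple:
    """Send SIGINT, then SIGTERM, then SIGKILL, waiting between steps.

    timeout_sigint plays the role of roslaunch's _TIMEOUT_SIGINT.
    Returns (name of the signal that stopped the process, seconds elapsed).
    """
    start = time.monotonic()
    for sig, timeout in ((signal.SIGINT, timeout_sigint),
                         (signal.SIGTERM, timeout_sigterm)):
        proc.send_signal(sig)
        try:
            proc.wait(timeout=timeout)
            return sig.name, time.monotonic() - start
        except subprocess.TimeoutExpired:
            print(f"{sig.name} ignored for {timeout} s, escalating")
    proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
    proc.wait()
    return "SIGKILL", time.monotonic() - start


if __name__ == "__main__":
    child = start_stubborn_child()
    stopped_by, elapsed = stop_with_escalation(child, timeout_sigint=1.0,
                                               timeout_sigterm=0.5)
    child.stdout.close()
    print(f"stopped by {stopped_by} after {elapsed:.1f} s")
```

Because this child never reacts to SIGINT, it is only stopped by SIGTERM once timeout_sigint has elapsed, which resembles the delay seen with gzserver. A larger timeout gives a process more time to shut down cleanly, at the cost of a slower CTRL-C.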



Thanks for the thorough answer! With launch files and separate processes it is difficult to control everything as a group. If everything were threads spawned from one parent, it would be easier. It would be nice if launch files had similar functionality, but that would probably complicate them too much...

Sietse (2018-10-05 06:29:00 -0500)

IMO, if all the processes were related, debugging would be harder because one process failing would kill all the others (if you've ever used nodelets, you'll know what I mean). Moreover, launching separate processes so that you can add or remove whatever package you want is the very essence of launch files.

Delb (2018-10-05 06:49:26 -0500)

You're absolutely right, especially in an experimental environment. But in a production/industrial environment you probably want something more convenient.

Sietse (2018-10-05 06:54:11 -0500)

You could do what you describe directly in your nodes, by having them create the other threads (see #q217960 for an example if you want), but the code would get really messy and disorganized.

Delb (2018-10-05 06:54:22 -0500)



Seen: 282 times

Last updated: Oct 05 '18