Monitor remote nodes with roslaunch API
I am using the Python API of roslaunch to run several nodes, part of them on a remote machine (configured in the launch files).
Since I need to know when nodes terminate or crash, I set up a process listener that should get called whenever a node dies. This is working well for nodes that run on the same machine. However, for nodes that run on the remote machine, the callback is not called for single nodes but only once the whole launch file terminates.
Is it possible to somehow set it up in a way that the process listener is also called for dying nodes on remote machines?
In case it is relevant, I am using ROS Kinetic on Ubuntu 16.04.
To clarify my problem, I provide a minimal example in the following.
This script is used for launching:
import rospy
import roslaunch
class ProcessListener(roslaunch.pmon.ProcessListener):
def process_died(self, name, exit_code):
rospy.logwarn("%s died with code %s", name, exit_code)
def init_launch(launchfile, process_listener):
uuid = roslaunch.rlutil.get_or_generate_uuid(None, False)
roslaunch.configure_logging(uuid)
launch = roslaunch.parent.ROSLaunchParent(
uuid,
[launchfile],
process_listeners=[process_listener],
)
return launch
if __name__ == "__main__":
rospy.init_node("remote_launcher")
launch_file = "/path/to/my/launchfile.launch"
launch = init_launch(launch_file, ProcessListener())
launch.start()
rospy.spin()
launch.shutdown()
The launch file starts two nodes "node1" and "node2" which I then kill manually using rosnode kill node1
and rosnode kill node2
.
When I configure it to run them on the same machine as the script above, get the following prints:
[WARN] [rosout]: node1-1 died with code 0
[WARN] [rosout]: node2-2 died with code 0
However, when I configure it to run the nodes on a remote machine, the process listener is not called for the first kill but only for the second, giving the following output:
[WARN] [rosout]: remote-hostname-0 died with code None
I think it should be somehow possible to also get a callback for dying remote nodes, as I still get the output from roslaunch itself, saying
[remote-hostname-0]: [node1-1] process has finished cleanly
[remote-hostname-0]: [node2-2] process has finished cleanly
I just figured out that the "process has finished cleanly" output is actually coming from the remote machine, the local machine just shows the remote print but does not necessarily know about the crash. So it seems to be not as easy as I hoped...
It seems to be a bug in roslaunch itself, I was able to fix it: https://github.com/ros/ros_comm/pull/... However, there is response on the pull request, so far, so I am not sure if this is the proper way to fix.