Node launched from a launch file takes 100% CPU and doesn't update

asked 2018-12-18 09:21:22 -0500

lucasw

updated 2018-12-19 15:37:02 -0500

I'm seeing oddities with nodes taking 100% CPU and not working properly. In this example, running static_transform_publisher from the command line works fine, as does running it from a launch file that launches only that node. But when I also launch my own demo.py node (assigned to configure_windows below), the static_transform_publisher process climbs to 100% of a CPU core within a few seconds of launching.

The problem seemed to get worse when I updated from the Bouncy to the Crystal binaries on Ubuntu 18.04 a couple of days ago. I occasionally saw 100% CPU in Bouncy too, but perhaps under somewhat different conditions (different combinations of nodes run from a launch script).

import os

import launch
import launch_ros.actions
import yaml


def generate_launch_description():
    prefix = '/tmp/ros2/imgui_ros_demo/'
    os.makedirs(prefix, exist_ok=True)

    static_tf = launch_ros.actions.Node(
            package='tf2_ros',
            node_executable='static_transform_publisher',
            node_name='static_foo',
            output='screen',
            arguments=['0.1', '0.2', '0.3', '0.4', '.5', '.6', 'map', 'foo'],
            )

    node_name = 'imgui_ros'
    params = dict(
        name='imgui_ros demo',
        width=1440,
        height=800,
        )
    param_file = prefix + node_name + '.yaml'
    with open(param_file, 'w') as outfile:
        print('opened ' + param_file + ' for yaml writing')
        # ROS 2 parameter files nest the values under the node name and ros__parameters
        data = {node_name: dict(ros__parameters=params)}
        yaml.dump(data, outfile, default_flow_style=False)
    imgui_ros = launch_ros.actions.Node(
            package='imgui_ros', node_executable='imgui_ros_node', output='screen',
            node_name=node_name,
            # arguments=[image_manip_dir + "/data/mosaic.jpg"])
            arguments=['__params:=' + param_file],
            remappings=[])

    configure_windows = launch_ros.actions.Node(
            package='imgui_ros', node_executable='demo.py', output='screen')

    return launch.LaunchDescription([
        static_tf,
        imgui_ros,
        # If I disable configure_windows then static_transform_publisher works fine
        configure_windows,
    ])

https://github.com/lucasw/imgui_ros/b...

It's somewhat involved to get the above actually running; I can try to make a smaller reproducible example if needed.
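For reference, the parameter file written by the launch file above nests the values under the node name and then a ros__parameters key. The sketch below rebuilds that structure and renders it with a small hand-written emitter so it runs without PyYAML. The helper names are hypothetical, and the emitter only handles nested dicts of simple scalars (no quoting), so it is meant to show the layout, not to replace yaml.dump.

```python
def param_file_data(node_name, params):
    """Nest parameters the way the launch file does: node name -> ros__parameters -> values."""
    return {node_name: {'ros__parameters': dict(params)}}


def render_block_yaml(data, indent=0):
    """Render nested dicts of simple scalars as block-style YAML lines, keys sorted.

    Only a sketch: strings are written unquoted, so values containing ':' or '#'
    (or other YAML-special characters) would need quoting that this does not do.
    """
    lines = []
    pad = '  ' * indent
    for key in sorted(data):
        value = data[key]
        if isinstance(value, dict):
            lines.append(pad + str(key) + ':')
            lines.extend(render_block_yaml(value, indent + 1))
        else:
            lines.append(pad + str(key) + ': ' + str(value))
    return lines


# Same values as the imgui_ros node in the launch file.
data = param_file_data('imgui_ros', dict(name='imgui_ros demo', width=1440, height=800))
print('\n'.join(render_block_yaml(data)))
# prints:
# imgui_ros:
#   ros__parameters:
#     height: 800
#     name: imgui_ros demo
#     width: 1440
```

PyYAML's yaml.dump also sorts keys by default, so for these values the generated imgui_ros.yaml should have the same shape.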

--- Update ---

The main part of demo.py originally looked like this:

import rclpy

rclpy.init()
demo = Demo()  # Demo is the node class defined earlier in demo.py
demo.run()  # make a few service calls, wait for them to complete
demo.destroy_node()
rclpy.shutdown()

I then added rclpy.spin(demo) before destroy_node(), and that seems to avoid the problem, though now I have to kill the nodes manually instead of having them exit once they have made their service calls:

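import rclpy

rclpy.init()
demo = Demo()  # created before try so finally never sees an undefined name
try:
    demo.run()  # make a few service calls, wait for them to complete
    rclpy.spin(demo)  # keeps the process alive until it is interrupted
except KeyboardInterrupt:
    pass  # Ctrl-C is the expected way to stop it now
finally:
    demo.destroy_node()
    rclpy.shutdown()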

But what was the problem? If one node does something bad, why would it foul up an unrelated node (the two don't share any topics), except perhaps through the launch infrastructure?
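Since adding rclpy.spin() made the problem go away, one hypothesis worth testing (an assumption, not a confirmed cause) is that the short-lived Python process exiting is what disturbs the other nodes, rather than the service calls themselves. A minimal stand-in for demo.py that only creates a node, optionally keeps it alive for a while, and then exits could isolate that. The node name exit_test and the LINGER_SEC constant are hypothetical; rclpy is imported inside main() so the file can be loaded without a sourced ROS 2 environment, and main() would be called from the script's usual if __name__ == '__main__': guard.

```python
import time

LINGER_SEC = 0.0  # hypothetical knob: how long to keep the node alive before exiting


def main(linger_sec=LINGER_SEC):
    # Deferred import so this file can be loaded without a sourced ROS 2 environment.
    import rclpy

    rclpy.init()
    node = rclpy.create_node('exit_test')  # hypothetical node name
    deadline = time.monotonic() + linger_sec
    try:
        # Process callbacks until the deadline (skipped entirely when linger_sec is 0),
        # so "exit immediately" can be compared with "exit after a delay".
        while rclpy.ok() and time.monotonic() < deadline:
            rclpy.spin_once(node, timeout_sec=0.1)
    finally:
        node.destroy_node()
        rclpy.shutdown()
```

Launching this in place of configure_windows, once with LINGER_SEC = 0.0 and once with a larger value, would show whether the timing of the exit matters.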


Comments

I think you'll need a minimal example; there's too much going on here to narrow it down. Does this only happen with a launch file? What happens if you run the nodes separately? Is the launch process itself using the CPU, or is it the nodes it runs? What does "not working properly" mean here?

William (2018-12-19 14:47:32 -0500)

It only happened from a launch file. The individual node processes were the ones going to 100%, as viewed in top. In the case of static_transform_publisher, it stopped publishing on tf_static and stopped looping through the while rclcpp::ok() loop; presumably one of the functions called inside it never returned.

lucasw (2018-12-19 15:33:46 -0500)
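To go beyond top and find which thread inside a spinning process is busy (what top -H shows interactively), a Linux-only sketch can read each thread's utime and stime (fields 14 and 15 of /proc/<pid>/task/<tid>/stat, per proc(5)) twice and diff them. The function names are hypothetical.

```python
import os
import time

CLK_TCK = os.sysconf('SC_CLK_TCK')  # kernel clock ticks per second


def cpu_seconds(pid, tid=None):
    """Return user+system CPU time (seconds) used by a process, or by one of its threads."""
    path = f'/proc/{pid}/stat' if tid is None else f'/proc/{pid}/task/{tid}/stat'
    with open(path) as f:
        text = f.read()
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split only after the last ')'. fields[0] is then field 3 (state).
    fields = text[text.rindex(')') + 2:].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15 in proc(5)
    return (utime + stime) / CLK_TCK


def thread_cpu_percent(pid, interval=1.0):
    """Sample every thread of pid twice; return {tid: percent of one core} over interval."""
    before = {}
    for name in os.listdir(f'/proc/{pid}/task'):
        tid = int(name)
        try:
            before[tid] = cpu_seconds(pid, tid)
        except OSError:
            pass  # the thread exited between listdir and open
    time.sleep(interval)
    usage = {}
    for tid, t0 in before.items():
        try:
            usage[tid] = 100.0 * (cpu_seconds(pid, tid) - t0) / interval
        except OSError:
            pass  # the thread exited during the interval
    return usage
```

For example, thread_cpu_percent(pid) with the PID of static_transform_publisher (from pgrep -f static_transform_publisher) should show one thread near 100 if it is stuck in a busy loop. That thread ID matches the LWP number gdb prints after attaching with gdb -p <pid>, and thread apply all bt would then show where it is looping.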

When my own node (imgui_ros) went to 100%, it appeared to still be operating, but the next time I reproduce this I'll check whether it had stopped receiving callbacks or similar; maybe only the non-ROS parts were still working.

lucasw (2018-12-19 15:35:10 -0500)

All the launch file does is execute the nodes with subprocess and subscribe to their lifecycle topics if they have any, which in this case they do not. So personally I don't think it has anything to do with launch. Again, I think a minimal example of the problem will be needed to debug it.

William (2018-12-19 15:39:55 -0500)
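Following up on running the nodes separately, a rough way to take the launch system out of the picture is to start the same two executables directly with subprocess and let them run for a fixed time. This is a sketch, not how ros2 launch is implemented; run_together is a hypothetical helper, and the ros2 run command lines in the trailing comment assume both packages are built and the workspace is sourced.

```python
import subprocess
import time


def run_together(commands, duration):
    """Start each command as its own process, let them all run for `duration` seconds,
    then stop whatever is still running. Returns {index: return code}."""
    procs = []
    try:
        for cmd in commands:
            procs.append(subprocess.Popen(cmd))
        time.sleep(duration)
    finally:
        for p in procs:
            if p.poll() is None:
                p.terminate()  # SIGTERM on POSIX
        for p in procs:
            try:
                p.wait(timeout=5)
            except subprocess.TimeoutExpired:
                p.kill()
                p.wait()
    return {i: p.returncode for i, p in enumerate(procs)}


# Example (assumes a sourced workspace with both packages built):
# run_together([
#     ['ros2', 'run', 'tf2_ros', 'static_transform_publisher',
#      '0.1', '0.2', '0.3', '0.4', '.5', '.6', 'map', 'foo'],
#     ['ros2', 'run', 'imgui_ros', 'demo.py'],
# ], duration=30.0)
```

If static_transform_publisher still climbs to 100% CPU when started this way, launch can be ruled out; if it does not, the difference is worth narrowing down further.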