
roslaunch and NVidia profiling

asked 2018-09-04 23:45:36 -0500

KenYN

Has anyone had any success getting the NVIDIA profiling tools and ROS to play well together?

At the moment, the best I can do is profile all processes, but that only reports memory copies to and from the host, plus some OpenCV calls (copies between cv::Mat and cv::cuda::GpuMat). My custom kernels are never profiled (yes, I have explicit cudaProfilerStart()/cudaProfilerStop() calls), and using launch-prefix="nvprof" or profiling roslaunch directly gets me nowhere except errors about being unable to load some nodelets.

Any suggestions as to what I might be doing wrong? I'm on Ubuntu 16.04.



Are you running cuda code within nodelets? If your cuda code is running within a nodelet, you may want to try running nvprof on the nodelet manager.

ahendrix ( 2018-09-05 00:17:03 -0500 )

I've tried that too, but no joy. I even have my cudaProfilerStart() called from every thread within the nodelet. Once or twice I have actually managed to capture calls to my CUDA code, but I've never managed to reproduce that...

KenYN ( 2018-09-05 00:41:32 -0500 )

Ah, I've tried again and just noticed an error about being unable to activate Unified Memory Profiling, so using launch-prefix="nvprof --unified-memory-profiling off" gets me further than I've ever got before.

KenYN ( 2018-09-05 01:13:50 -0500 )

@KenYN: what was the answer here? Your last comment?

If so: please post that as an answer and then accept your own answer.

We don't really close questions here on ROS Answers when they have an actual answer.

gvdhoorn ( 2018-09-05 01:19:42 -0500 )

@gvdhoorn Oops, I cannot re-open. Can someone else please? I also discovered how to get final output, so I can actually answer the question now.

KenYN ( 2018-09-05 02:16:29 -0500 )

I've re-opened it for you.

gvdhoorn ( 2018-09-05 02:28:19 -0500 )

3 Answers


answered 2018-09-05 02:36:20 -0500

KenYN

I finally managed to get output, but not very prettily...

In the launch file line for my nodelet manager, I added launch-prefix="nvprof --unified-memory-profiling off --profile-child-processes --profile-from-start off". Then in a suitable callback I added the following:

static bool startedProfile = false;
void MyClass::image_cb(const sensor_msgs::ImageConstPtr image)
{
    if (!startedProfile) {
        cudaProfilerStart(); // required because of --profile-from-start off
        startedProfile = true;
    } else if (image->header.seq > 400) { // 400 frames is enough profiling
        cudaProfilerStop();
        // ... (forced shutdown step, elided)
    }

    // Existing code...
}

This is a very ugly way to finish profiling, but cudaProfilerStop() on its own didn't produce any output and neither did the addition of exit(0). There are other nodelets running other CUDA code on both the same and different GPUs, so perhaps we needed to force every CUDA process to stop to get the profiling results to output?
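The start-once, stop-after-N-frames logic in that callback can also be pulled out into a small helper. What follows is only a sketch: `ProfileWindow`, `on_frame` and the callback parameters are hypothetical names that do not appear in the original post, and the profiler calls are passed in as callbacks so the gating logic can be tried without CUDA or ROS. In a real nodelet one would presumably pass lambdas wrapping cudaProfilerStart() and cudaProfilerStop(). Unlike the callback above, this version fires the stop action only once.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical helper (not from the original post): starts profiling on the
// first frame it sees and stops once the sequence number passes a limit.
class ProfileWindow {
public:
    ProfileWindow(std::uint32_t last_seq,
                  std::function<void()> start,
                  std::function<void()> stop)
        : last_seq_(last_seq), start_(std::move(start)), stop_(std::move(stop)) {
        assert(start_ && stop_);  // both callbacks must be callable
    }

    // Call once per frame, e.g. with image->header.seq.
    void on_frame(std::uint32_t seq) {
        if (state_ == State::Idle) {
            start_();              // e.g. [] { cudaProfilerStart(); }
            state_ = State::Running;
        } else if (state_ == State::Running && seq > last_seq_) {
            stop_();               // e.g. [] { cudaProfilerStop(); }
            state_ = State::Done;  // never restart, never stop twice
        }
    }

    bool running() const { return state_ == State::Running; }
    bool done() const { return state_ == State::Done; }

private:
    enum class State { Idle, Running, Done };
    std::uint32_t last_seq_;
    std::function<void()> start_;
    std::function<void()> stop_;
    State state_ = State::Idle;
};
```

A callback would then hold one `ProfileWindow` (for example as a class member constructed with a limit of 400) and call `on_frame(image->header.seq)` before its existing code. This does not address the problem of getting nvprof to flush its output; it only tidies the gating.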


answered 2021-09-01 08:31:18 -0500

look001

Just in case you have trouble with this as I did: you need sudo for profiling CUDA GPUs. Unfortunately, just adding sudo in front does not work, because the sudo environment has different environment variables, so ROS commands are not available in it. What worked for me is:

  1. sudo -s
  2. nvprof --profile-from-start off --output-profile ./profile.dat '/home/ma/catkin_ws/devel/lib/YOUR_PKG/YOUR_NODE'

If this still does not work, try sourcing your .bashrc after the first step. By the way, if you open profile.dat in the NVIDIA Visual Profiler (nvvp), you get a very nice timeline. You can only open the file if the profiler comes from the same CUDA version as nvprof. Good luck!


answered 2023-04-19 02:19:53 -0500

Forget nvprof; the current method is to use Nsight Systems (nsys).

Just make sure to open the report generated by nsys (i.e. *.nsys-rep) with the corresponding version of nsys-ui.

One can generate the report on a Jetson and view it on a Windows desktop.

I used the following command to generate the report:

nsys profile --profile-from-start=off --trace=cuda,nvtx --capture-range=cudaProfilerApi roslaunch <package_name> *.launch
