roslaunch and NVIDIA profiling
Has anyone had any success getting the NVIDIA profiling tools and ROS to play well together?
At the moment, the best I can do is profile all processes, but that only reports memory copies to and from the host and some OpenCV operations (copies to and from cv::Mat and cv::cuda::GpuMat). My custom kernels are never profiled (yes, I have explicit cudaProfilerStart()/cudaProfilerStop() calls), and trying to use launch-prefix="nvprof" or profiling roslaunch directly never gets me anywhere except errors about being unable to load some nodelets.
Any suggestions as to what I might be doing wrong? I'm on Ubuntu 16.04.
Are you running CUDA code within nodelets? If so, you may want to try running nvprof on the nodelet manager.
I've tried that too, but no joy. I even have cudaProfilerStart() called from every thread within the nodelet. Once or twice I have actually managed to capture calls to my CUDA code, but I've never managed to reproduce that...
Ah, I've tried again and just noticed an error about being unable to activate Unified Memory Profiling, so using launch-prefix="nvprof --unified-memory-profiling off" gets me further than I've ever got before.
@KenYN: what was the answer here? Your last comment?
If so: please post that as an answer and then accept your own answer.
We don't really close questions here on ROS Answers when they have an actual answer.
@gvdhoorn Oops, I cannot re-open. Can someone else please? I also discovered how to get final output, so I can actually answer the question now.
I've re-opened it for you.
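For anyone landing here later, the workaround from the comments above might look roughly like this in a launch file. This is only a sketch, assuming the CUDA code runs in a nodelet loaded into a standalone manager. The node names (cuda_manager, my_cuda_nodelet) and the nodelet type (my_cuda_pkg/MyCudaNodelet) are hypothetical placeholders; only the launch-prefix value comes from this thread.

```xml
<launch>
  <!-- The nodelet manager is the process that actually runs the CUDA code,
       so the nvprof prefix goes here rather than on roslaunch itself.
       "cuda_manager" is a hypothetical name. -->
  <node pkg="nodelet" type="nodelet" name="cuda_manager" args="manager"
        output="screen"
        launch-prefix="nvprof --unified-memory-profiling off" />

  <!-- Hypothetical nodelet containing the custom kernels; it is loaded into
       the profiled manager, so it does not need its own launch-prefix. -->
  <node pkg="nodelet" type="nodelet" name="my_cuda_nodelet"
        args="load my_cuda_pkg/MyCudaNodelet cuda_manager" />
</launch>
```

Turning off unified memory profiling is what got past the activation error mentioned above. Whether this also captures your own kernels may depend on the setup.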