roslaunch and NVIDIA profiling

asked 2018-09-04 23:45:36 -0500

KenYN

Has anyone had any success getting NVIDIA's profiling tools and ROS to play well together?

At the moment, the best I can do is profile all processes, but that only reports memory copies to and from the host, plus some OpenCV operations (copies between cv::Mat and cv::cuda::GpuMat). My custom kernels are never profiled (yes, I have explicit cudaProfilerStart()/cudaProfilerStop() calls). Using launch-prefix="nvprof", or profiling roslaunch directly, never gets me anywhere except errors about being unable to load some nodelets.

Any suggestions as to what I might be doing wrong? I'm on Ubuntu 16.04.

Comments

Are you running CUDA code within nodelets? If so, you may want to try running nvprof on the nodelet manager, since nodelets are loaded into the manager's process.

ahendrix (2018-09-05 00:17:03 -0500)

I've tried that too, but no joy. I even call cudaProfilerStart() from every thread within the nodelet. Once or twice I did manage to capture calls to my CUDA code, but I've never been able to reproduce that...

KenYN (2018-09-05 00:41:32 -0500)

Ah, I've tried again and just noticed an error about being unable to activate Unified Memory Profiling, so using launch-prefix="nvprof --unified-memory-profiling off" gets me further than I've ever got before.

KenYN (2018-09-05 01:13:50 -0500)

@KenYN: what was the answer here? Your last comment?

If so: please post that as an answer and then accept your own answer.

We don't really close questions here on ROS Answers when they have an actual answer.

gvdhoorn (2018-09-05 01:19:42 -0500)

@gvdhoorn Oops, I can't reopen it. Could someone else, please? I've also discovered how to get the final output, so I can actually answer the question now.

KenYN (2018-09-05 02:16:29 -0500)

I've re-opened it for you.

gvdhoorn (2018-09-05 02:28:19 -0500)

1 Answer

answered 2018-09-05 02:36:20 -0500

KenYN

I finally managed to get output, but not very prettily...

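In the launch file, on the nodelet manager's node line, I added launch-prefix="nvprof --unified-memory-profiling off --profile-child-processes --profile-from-start off" (with --profile-from-start off, nvprof collects nothing until cudaProfilerStart() is called). Then, in a suitable callback, I added the following: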

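```cpp
#include <cstdlib>              // std::exit
#include <cuda_profiler_api.h>  // cudaProfilerStart(), cudaProfilerStop()
#include <cuda_runtime.h>       // cudaDeviceReset()
#include <sensor_msgs/Image.h>

static bool startedProfile = false;
// Count frames locally: header.seq is assigned on the publishing side, so it
// does not say how many frames have been seen since profiling started.
static int profiledFrames = 0;

void MyClass::image_cb(const sensor_msgs::ImageConstPtr& image)
{
    if (!startedProfile)
    {
        startedProfile = true;
        cudaProfilerStart();  // nvprof runs with --profile-from-start off
    }
    else if (++profiledFrames > 400)  // 400 frames is enough profiling
    {
        cudaProfilerStop();
        cudaDeviceReset();  // tear down the CUDA context before exiting
        std::exit(0);       // terminates the whole nodelet manager process
    }

    // Existing code...
}
```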

This is a very ugly way to finish profiling, but cudaProfilerStop() on its own didn't produce any output, and neither did adding exit(0) after it; I only got results once cudaDeviceReset() was added as well. Other nodelets in the same manager run other CUDA code on both the same and different GPUs, so perhaps all CUDA work in the process has to be stopped before the profiling results are written out?
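If the callback can be invoked from more than one thread, the plain static bool and counter above could race. The sketch below is one possible tidier variant, not something from the original answer: ProfilingWindow is a hypothetical class (not part of ROS or the CUDA toolkit) that uses std::call_once and an atomic frame counter so the start action runs exactly once and the stop action runs exactly once, after a fixed number of frames. The actions are passed in as callables, which keeps the class free of CUDA headers.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical helper, not part of ROS or CUDA: opens a profiling window on
// the first frame and closes it once max_frames frames have been counted.
class ProfilingWindow {
public:
    ProfilingWindow(int max_frames, std::function<void()> on_start,
                    std::function<void()> on_stop)
        : max_frames_(max_frames),
          on_start_(std::move(on_start)),
          on_stop_(std::move(on_stop)) {
        assert(max_frames > 0);
    }

    // Call once per incoming frame; safe to call from several threads.
    void tick() {
        // Runs on_start_ exactly once; concurrent callers block until it
        // returns, so no frame is counted before profiling has started.
        std::call_once(start_flag_, on_start_);

        // fetch_add hands each count to exactly one caller, so only one
        // thread can observe seen == max_frames_ and run on_stop_.
        const int seen = frames_.fetch_add(1) + 1;
        if (seen == max_frames_) {
            on_stop_();
        }
    }

    int frames() const { return frames_.load(); }

private:
    const int max_frames_;
    std::function<void()> on_start_;
    std::function<void()> on_stop_;
    std::once_flag start_flag_;
    std::atomic<int> frames_{0};
};
```

In image_cb this might be used as `static ProfilingWindow window(400, [] { cudaProfilerStart(); }, [] { cudaProfilerStop(); cudaDeviceReset(); std::exit(0); });` followed by `window.tick();` at the top of the callback. Function-local statics are initialised thread-safely since C++11, so constructing it inside the callback is fine.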

Stats

Asked: 2018-09-04 23:45:36 -0500

Seen: 417 times

Last updated: Sep 05 '18