Profiling an application

This README is a step-by-step guide to using Intel VTune to profile a ROS node.

General Overview

In order to profile your application, you will need to do the following:

  1. Edit the application's CMakeLists.txt so that it is compiled correctly for profiling
  2. Edit the application to add any user instrumentation
  3. Edit the application's launch file so that it runs under the profiler with the desired analysis type
  4. Extend the launcher's shutdown timeouts so that the profiler has time to finalize its results
  5. Configure the kernel settings and environment variables on the machine to allow user profiling
  6. Run the node
  7. View the profiler results with the VTune GUI

We will go through these step by step here.

CMakeLists.txt

A few changes to how your application binary is built are needed before it can be profiled.

#  README: This is an example CMakeLists.txt for adding a target that you intend to profile. It
#  mixes the regular catkin/ROS setup with the extra settings needed for the actual profiling.
#  Each part is explained below.

# Regular ROS stuff, as required for all ROS nodes.
cmake_minimum_required(VERSION 3.13)  # 3.13 is the first version with target_link_directories(), used below.
project(profiler_example)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED YES)

find_package(catkin REQUIRED COMPONENTS
  roscpp
)

set(THREADS_PREFER_PTHREAD_FLAG ON) # Prefer the -pthread flag. This must be set before find_package(Threads) to have
                                    # any effect. On Linux a ROS node will almost certainly use pthreads anyway, and
                                    # these are profilable.
find_package(Threads REQUIRED)      # The profiled target must be linked against the threading library if it uses
                                    # threads. For basically any ROS node this is the case (subscription queues etc.).

catkin_package()

add_executable(test_profiler     # We add our executable as per usual, with all the translation units included. 
    thread_profile_example.cpp
)

target_include_directories(test_profiler PRIVATE ${catkin_INCLUDE_DIRS})  # Normal ROS headers (ros/ros.h etc.).

#  The following 2 lines are very important if you wish to use the user instrumentation API detailed here:
#  https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/api-support/instrumentation-and-tracing-technology-apis/instrumentation-and-tracing-technology-api-reference.html
#  They let you include the ITT header (ittnotify.h) and link against the ittnotify library. For very basic
#  example usage, see thread_profile_example.cpp.
target_include_directories(test_profiler PRIVATE /opt/intel/oneapi/vtune/latest/sdk/include)
target_link_directories(test_profiler PRIVATE /opt/intel/oneapi/vtune/latest/sdk/lib64)

# Each of the compile options and libraries we link against matters, so they are explained one at a time.
target_compile_options(test_profiler PRIVATE
    -g                          # The most important: compiles your application with debug symbols, so VTune can map samples to source.
    -fno-omit-frame-pointer     # Keeps the frame pointer, which lets VTune unwind call stacks more reliably.
    -D_LINUX                    # Defines the _LINUX preprocessor macro; suggested by VTune (its exact effect was not investigated).
    -fno-asm                    # Stops GCC recognizing GNU keywords such as typeof; unlikely to affect the collected trace.
)

target_link_libraries(test_profiler PRIVATE 
    ittnotify           # This is important to link against if you are using the user API, see above include directories
    dl                  # Needed by ittnotify, which loads the VTune collector library with dlopen() at runtime.
    pthread             # This or the below must be linked if you are using threads. Adding both doesn't hurt. 
    Threads::Threads    # See above.
    ${catkin_LIBRARIES} # Normal ros linking.
)

# Below is more regular ROS stuff
add_dependencies(test_profiler ${catkin_EXPORTED_TARGETS})

install(TARGETS 
    test_profiler

    ARCHIVE DESTINATION ${CATKIN_PACKAGE_LIB_DESTINATION}
    LIBRARY DESTINATION ${CATKIN_PACKAGE_LIB_DESTINATION}
    RUNTIME DESTINATION ${CATKIN_PACKAGE_BIN_DESTINATION}
)

These settings were taken from the VTune documentation page on compiler switches for Linux targets. It is also helpful to look at the makefile of the matrix sample that ships with VTune, in /opt/intel/oneapi/vtune/latest/samples/en/C++/matrix.

Adding user API instrumentation

Details about how to do this are in the ITT API reference linked in the CMakeLists.txt comments above. The most useful features so far have been:

  1. One-time user events
  2. Thread naming

But the API can do a lot more than this.
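As a rough sketch of the two features above, the C++ below names a worker thread and emits instant (one-time) markers through the ITT API. It assumes the include and link setup from the CMakeLists.txt above; the domain name profiler_example, the marker labels and run_named_worker are hypothetical, and this is not necessarily what thread_profile_example.cpp does. It uses __itt_marker for the one-time events; the ITT Event API (__itt_event_create / __itt_event_start) is an alternative. The __has_include guard makes the calls compile to no-ops when the VTune SDK is not on the include path.

```cpp
#include <atomic>
#include <thread>

#if __has_include(<ittnotify.h>)
#include <ittnotify.h>          // from /opt/intel/oneapi/vtune/latest/sdk/include
#define PROFILER_HAVE_ITT 1
#else
#define PROFILER_HAVE_ITT 0     // SDK not found: the helpers below become no-ops
#endif

namespace profiling {

#if PROFILER_HAVE_ITT
// A domain groups related markers/tasks; create it once and reuse it.
// "profiler_example" is a hypothetical domain name.
inline __itt_domain* domain() {
  static __itt_domain* d = __itt_domain_create("profiler_example");
  return d;
}
#endif

// Give the calling thread a readable name in VTune's thread list.
inline void name_this_thread(const char* name) {
#if PROFILER_HAVE_ITT
  __itt_thread_set_name(name);
#else
  (void)name;
#endif
}

// Emit an instant marker with global scope on the VTune timeline.
inline void mark(const char* label) {
#if PROFILER_HAVE_ITT
  __itt_marker(domain(), __itt_null, __itt_string_handle_create(label),
               __itt_scope_global);
#else
  (void)label;
#endif
}

}  // namespace profiling

// Hypothetical worker standing in for a node's background thread:
// sums 0..iterations-1 between two markers and returns the result.
inline long run_named_worker(long iterations) {
  std::atomic<long> total{0};
  std::thread worker([&total, iterations] {
    profiling::name_this_thread("example_worker");
    profiling::mark("worker_started");
    long sum = 0;
    for (long i = 0; i < iterations; ++i) sum += i;
    total.store(sum);
    profiling::mark("worker_finished");
  });
  worker.join();
  return total.load();
}
```

Calling run_named_worker(5) returns 10. When the node is launched under VTune with the SDK available, the worker should show up as example_worker with its two markers on the timeline.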

Launching from a launch file

To profile a node that is started by roslaunch, you need to do a couple of things.

The first is to amend the launch file with a prefix to launch the profiler. You do this with:

launch-prefix="
    /opt/intel/oneapi/vtune/2021.4.0/bin64/vtune   # Always give the full path; the environment is not set up the way you think.
    --collect hotspots                             # Specify the type of profile (analysis) you want to make.
    -knob sampling-mode=hw -knob stack-size=0      # Specify extra arguments, e.g. use hardware sampling mode.
    -result-dir /tmp/profiling/little-test-@@@     # Specify the output directory; @@@ is replaced by an auto-incrementing number.
"

Note: in the launch file, this all needs to be a single line without comments:

launch-prefix="/opt/intel/oneapi/vtune/2021.4.0/bin64/vtune --collect hotspots -knob sampling-mode=hw -knob stack-size=0 -result-dir /tmp/profiling/little-test-@@@"

The full command line interface documentation is in the VTune user guide. These are my recommendations:

  - Use hardware (hw) sampling mode: it has a smaller impact on the application and collects more samples.
  - There are many different analysis types you can run; the VTune documentation lists them all.
  - The easiest way to build a complex command line is to select the options you want in the GUI and check the generated command line. When you create a new analysis in the GUI, there is a button next to the start (play) button that shows the command line of what you are about to run.
  - The VTune profiler needs significant time to finalize the result and generate a report. For this to happen, you need to extend the launch timeouts so that roslaunch does not send SIGTERM and kill the profiler while it is still finalizing. You can add --sigint-timeout=60 --sigterm-timeout=60 to any roslaunch command to extend this shutdown grace period. You will likely want to add it to node_manager.py, where most launch files are run.
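Because launch-prefix must be a single line without comments, it can help to keep the annotated parts separate and join them programmatically, for example in a tool that generates launch files. The C++ below is a minimal sketch of that idea; VtunePrefix and build_launch_prefix are hypothetical names, and the only VTune options it emits are the ones shown above.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of the profiler command, one field per annotated part.
struct VtunePrefix {
  std::string vtune_path;          // must be absolute: roslaunch's environment may not have setvars.sh sourced
  std::string collect;             // analysis type, e.g. "hotspots"
  std::vector<std::string> knobs;  // e.g. "sampling-mode=hw", each emitted as "-knob <value>"
  std::string result_dir;          // VTune replaces "@@@" with an auto-incrementing number
};

// Join the parts into the single-line, comment-free value for launch-prefix.
inline std::string build_launch_prefix(const VtunePrefix& p) {
  if (p.vtune_path.empty() || p.vtune_path.front() != '/')
    throw std::invalid_argument("vtune_path must be an absolute path");
  std::string out = p.vtune_path + " --collect " + p.collect;
  for (const auto& knob : p.knobs) out += " -knob " + knob;
  out += " -result-dir " + p.result_dir;
  // A newline or a double quote would break the single-line XML attribute.
  if (out.find_first_of("\n\"") != std::string::npos)
    throw std::invalid_argument("launch-prefix must be a single line without quotes");
  return out;
}
```

Feeding it the path, analysis type, knobs and result directory from the example above reproduces the single-line launch-prefix value, which can then be pasted into the node's launch-prefix attribute.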

Changing your environment

You need several environment variables and kernel settings. You will likely need to adapt the commands below to your specific setup; for example, your user may not always be root.

echo "source /opt/intel/oneapi/setvars.sh" >> /home/user/.bashrc
sed -i 's/ptrace_scope = 1/ptrace_scope = 0/g' /etc/sysctl.d/10-ptrace.conf
echo "echo 0 > /proc/sys/kernel/perf_event_paranoid" >> /home/user/.bashrc
echo "export INTEL_LIBITTNOTIFY32=/opt/intel/oneapi/vtune/latest/lib32/runtime/libittnotify_collector.so" >> /home/user/.bashrc
echo "export INTEL_LIBITTNOTIFY64=/opt/intel/oneapi/vtune/latest/lib64/runtime/libittnotify_collector.so" >> /home/user/.bashrc

The ptrace_scope change in /etc/sysctl.d is only applied at boot, or when you run sudo sysctl --system.
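After changing these settings, a node (or a small standalone tool) could check them before a profiling run. The sketch below reads the same kernel settings back from /proc and checks the ITT environment variables. The function names are hypothetical, and the "warn above 1" threshold for perf_event_paranoid is an assumption based on the values this guide sets (0, or 1 via sudo), not a documented VTune requirement.

```cpp
#include <cstdlib>
#include <fstream>
#include <initializer_list>
#include <optional>
#include <string>
#include <vector>

// Read a single integer from a procfs-style file; nullopt if missing or unparsable.
inline std::optional<int> read_int_file(const std::string& path) {
  std::ifstream in(path);
  int value = 0;
  if (in >> value) return value;
  return std::nullopt;
}

// Collect human-readable warnings about the settings changed in this guide.
inline std::vector<std::string> profiling_preflight_warnings() {
  std::vector<std::string> warnings;

  const auto paranoid = read_int_file("/proc/sys/kernel/perf_event_paranoid");
  if (!paranoid || *paranoid > 1)  // assumed threshold, see lead-in
    warnings.push_back("kernel.perf_event_paranoid is not 0 or 1");

  // This file only exists when the Yama LSM is enabled; absence is not an error.
  const auto ptrace = read_int_file("/proc/sys/kernel/yama/ptrace_scope");
  if (ptrace && *ptrace != 0)
    warnings.push_back("kernel.yama.ptrace_scope is not 0");

  for (const char* var : {"INTEL_LIBITTNOTIFY32", "INTEL_LIBITTNOTIFY64"})
    if (std::getenv(var) == nullptr)
      warnings.push_back(std::string(var) + " is not set (needed for ITT data when attaching by PID)");

  return warnings;
}
```

A node could print each returned warning at startup; an empty vector means the settings from this section appear to be in place.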

Running the stack and getting results

The environment setup above lowers perf_event_paranoid at shell startup (shown here for the root user):

echo "echo 0 > /proc/sys/kernel/perf_event_paranoid" >> /root/.bashrc

Writing to /proc like this requires root. If the user is not privileged, you may need to do this instead:

sudo sysctl -w kernel.perf_event_paranoid=1

If you have set up everything correctly so far, you should simply be able to run the system and find the results at the location you specified in the launch file (with -result-dir).

Now you can use vtune-gui to open the results: press Ctrl+O, go to the result folder and open the .vtune file. Happy profiling!
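With the auto-incrementing @@@ suffix, the result folder accumulates one directory per run. The sketch below picks the highest-numbered result so you know which one to open. It assumes the naming from the -result-dir example (a fixed prefix followed by a number) and that each result directory contains a .vtune file named after the directory; check that assumption against your own results. newest_vtune_result is a hypothetical helper.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Return <parent>/<prefix><N>/<prefix><N>.vtune for the largest N that has a
// .vtune file, or nullopt if there is none (or parent does not exist).
inline std::optional<fs::path> newest_vtune_result(const fs::path& parent,
                                                   const std::string& prefix) {
  std::optional<fs::path> best;
  long best_num = -1;
  std::error_code ec;
  for (const auto& entry : fs::directory_iterator(parent, ec)) {
    std::error_code entry_ec;
    if (!entry.is_directory(entry_ec)) continue;
    const std::string name = entry.path().filename().string();
    if (name.rfind(prefix, 0) != 0) continue;  // must start with the prefix
    const std::string suffix = name.substr(prefix.size());
    if (suffix.empty() || suffix.size() > 9 ||
        suffix.find_first_not_of("0123456789") != std::string::npos)
      continue;  // the part expanded from @@@ must be a plain number
    const long num = std::stol(suffix);
    const fs::path vtune_file = entry.path() / (name + ".vtune");
    if (num > best_num && fs::exists(vtune_file, entry_ec)) {
      best_num = num;
      best = vtune_file;
    }
  }
  return best;
}
```

For the launch-file example, the call would be newest_vtune_result("/tmp/profiling", "little-test-"). Directories without a .vtune file, such as a run that was killed while finalizing, are skipped.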

Attaching to an existing process with PID

Currently this does not fully work (see the current bug report):

  - The collection of user functions and system functions is correct.
  - The collection of user API data is not working, i.e. ITT events and thread names will not appear in the profiler data.

You can use -target-pid to attach to a process that is already running and capture profiler output. The application needs to be started in an environment where the environment variables INTEL_LIBITTNOTIFY32 and INTEL_LIBITTNOTIFY64 are set to the correct collector library path. You can find your process with ps -aux or, if you already know its name, issue a command like:

/opt/intel/oneapi/vtune/2021.4.0/bin64/vtune \
    --collect hotspots -knob sampling-mode=hw -knob sampling-interval=0.5 -knob stack-size=0 \
    -result-dir /tmp/docker_share/profiling/test-@@@ \
    -target-pid $(ps -ef | grep example | head -n 1 | awk '{print $2}')

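The last line finds the PID for you. Note that grep matches the name anywhere in the full command line, so the grep process itself or a roslaunch process whose arguments contain the name can be picked instead. Make sure your process name is unique, or find the PID yourself, e.g. with pgrep -x <name>, which matches the exact process name.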
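A stricter lookup compares the exact process name that the kernel stores in /proc/<pid>/comm instead of searching whole command lines. The C++ below sketches this; pids_by_exact_name is a hypothetical helper, and its proc_root parameter exists only so it can be pointed at a fake directory tree for testing.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// List the PIDs whose /proc/<pid>/comm equals `name` exactly, sorted ascending.
// The kernel truncates comm to 15 characters, so pass at most 15 characters.
inline std::vector<int> pids_by_exact_name(const std::string& name,
                                           const fs::path& proc_root = "/proc") {
  std::vector<int> pids;
  std::error_code ec;
  for (const auto& entry : fs::directory_iterator(proc_root, ec)) {
    const std::string dir = entry.path().filename().string();
    if (dir.empty() || dir.size() > 9 ||
        dir.find_first_not_of("0123456789") != std::string::npos)
      continue;  // not a PID directory (e.g. "self", "sys")
    std::ifstream comm(entry.path() / "comm");  // may fail if the process already exited
    std::string comm_name;
    if (std::getline(comm, comm_name) && comm_name == name)
      pids.push_back(std::stoi(dir));
  }
  std::sort(pids.begin(), pids.end());
  return pids;
}
```

If exactly one PID comes back, it can be passed to -target-pid; if several match, the name is not unique and you still need to choose one yourself.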