Revision history - ROS Answers: Open Source Q&A Forum

As part of the ROS Real-time Working Group (RTWG), we plan to create several guides to help ROS 2 users develop real-time applications. Please take a look at the RTWG documentation and our current roadmap: https://ros-realtime.github.io/. As you can see, some topics are planned but still work in progress, so no information is available for them yet. Hopefully, we will be able to complete all these tasks in the next few months.

In the remainder of this answer, we have collected some guidelines you can follow in the meantime to improve your application's real-time performance.

With respect to the Linux kernel, if your application really needs real-time performance, we suggest that you use the full Preempt-RT kernel. We created a guide that explains how to build and configure a Raspberry Pi 4 with Preempt-RT using the Docker-based tool linux-real-time-kernel-builder. While the guide only covers the RPi 4, it should help you understand the steps to build the kernel and the settings we are using. Additionally, you can take a look at this presentation, which gives a good overview of how to configure the kernel:

https://ogness.net/ese2020/ese2020_johnogness_rtchecklist.pdf
https://www.youtube.com/watch?v=NrjXEaTSyrw&ab_channel=TheLinuxFoundation

Depending on the computing device you are using, you may have to configure additional settings (e.g., disable hyper-threading, disable NMIs, etc.).

Finally, you should use cyclictest (https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/cyclictest/start) to check the real-time performance you are getting with your system and configuration. This is important because it lets you catch issues with your configuration and your system, and it provides a baseline for the performance you can expect.
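To make it clearer what cyclictest reports, here is a minimal sketch of the underlying idea: sleep until an absolute deadline and measure how late the thread actually wakes up. This is not cyclictest's implementation, only a simplified illustration; the function name `measure_max_wakeup_latency_ns` and its parameters are made up for this example.

```cpp
#include <algorithm>
#include <cstdint>
#include <ctime>

// Simplified illustration of the quantity cyclictest measures: the difference
// between the time a thread asked to wake up and the time it actually ran.
// Returns the worst observed wake-up latency in nanoseconds.
std::int64_t measure_max_wakeup_latency_ns(int iterations, long period_ns)
{
  constexpr long kNsPerSec = 1000000000L;
  timespec next{};
  clock_gettime(CLOCK_MONOTONIC, &next);
  std::int64_t max_latency_ns = 0;

  for (int i = 0; i < iterations; ++i) {
    // Compute the next absolute wake-up time.
    next.tv_nsec += period_ns;
    while (next.tv_nsec >= kNsPerSec) {
      next.tv_nsec -= kNsPerSec;
      ++next.tv_sec;
    }
    // Sleep until that absolute time, then check how late we are.
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, nullptr);
    timespec now{};
    clock_gettime(CLOCK_MONOTONIC, &now);
    const std::int64_t latency_ns =
      static_cast<std::int64_t>(now.tv_sec - next.tv_sec) * kNsPerSec +
      (now.tv_nsec - next.tv_nsec);
    max_latency_ns = std::max(max_latency_ns, latency_ns);
  }
  return max_latency_ns;
}
```

For real measurements you should still use cyclictest itself, since it also handles thread priorities, memory locking, and histogram output.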

Using CPU isolation or CPU affinity can help improve the performance of your application. Note that this approach reduces the overall CPU bandwidth available to your system, because your non-real-time applications won't be scheduled on the isolated CPUs. It makes sense if your system has a clear separation between real-time and non-real-time applications and enough CPUs for all of them.
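As an illustration of CPU affinity, the following sketch pins the calling thread to a single CPU with the Linux call `sched_setaffinity`. The helper names `first_allowed_cpu` and `pin_current_thread_to_cpu` are hypothetical, and which CPU you pin to depends on how you isolated CPUs on your system.

```cpp
#include <sched.h>

// Returns the lowest-numbered CPU the calling thread may run on, or -1.
int first_allowed_cpu()
{
  cpu_set_t allowed;
  CPU_ZERO(&allowed);
  if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
    return -1;
  }
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &allowed)) {
      return cpu;
    }
  }
  return -1;
}

// Pins the calling thread to a single CPU. Returns true on success.
bool pin_current_thread_to_cpu(int cpu)
{
  if (cpu < 0 || cpu >= CPU_SETSIZE) {
    return false;
  }
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  // A pid of 0 means "the calling thread".
  return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

You would typically call `pin_current_thread_to_cpu` at the start of each real-time thread, before it enters its processing loop.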

With respect to assigning RT priorities, one thing you could do is separate node tasks into different callback groups depending on their priority and run them in different threads with different priorities. Here is an example: https://github.com/ros2/examples/tree/master/rclcpp/executors/cbg_executor. Additionally, some DDS implementations allow you to fine-tune their internal thread priorities and CPU affinities. For example, CycloneDDS lets you set the stack size, scheduling class, and scheduling priority for each thread (https://github.com/eclipse-cyclonedds/cyclonedds/blob/master/docs/manual/config.rst#thread-configuration). Note that if your real-time application depends on network communication, you will have to tune the kernel's network-related threads according to the priorities you are setting in your application (see https://arxiv.org/pdf/1808.10821.pdf).
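Independently of how the threads are created, giving a thread a real-time priority on Linux could look roughly like the sketch below. It uses the POSIX call `pthread_setschedparam` with the `SCHED_FIFO` policy; the wrapper name `set_thread_fifo_priority` is an assumption for this example and is not taken from the linked cbg_executor code.

```cpp
#include <cerrno>
#include <pthread.h>
#include <sched.h>

// Requests the SCHED_FIFO real-time policy with the given priority for a
// thread. Returns 0 on success or an errno value, for example EPERM when the
// process is not allowed to use real-time scheduling.
int set_thread_fifo_priority(pthread_t thread, int priority)
{
  sched_param param{};
  param.sched_priority = priority;
  return pthread_setschedparam(thread, SCHED_FIFO, &param);
}
```

A thread that spins the executor for a high-priority callback group could call `set_thread_fifo_priority(pthread_self(), 80)` before spinning, while a lower-priority thread uses a smaller value. On Linux, valid SCHED_FIFO priorities range from 1 to 99, and the process usually needs root rights or a suitable rtprio limit.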

Here are some additional points related to memory management, blocking calls, etc.:

Lock the process memory to avoid memory page faults
- https://wiki.linuxfoundation.org/realtime/documentation/howto/applications/memory
- Example: https://gitlab.com/ApexAI/performance_test/-/blob/master/performance_test/src/utilities/rt_enabler.hpp#L73
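A minimal sketch of memory locking, assuming a Linux system, is shown below. It relies on the standard `mlockall` call; the wrapper name `lock_process_memory` is hypothetical, and the code is not a copy of the linked rt_enabler.hpp.

```cpp
#include <cerrno>
#include <sys/mman.h>

// Locks all current and future pages of the process into RAM so that the
// real-time path does not take page faults. Returns true on success; on
// failure errno is set (for example EPERM or ENOMEM).
bool lock_process_memory()
{
  return mlockall(MCL_CURRENT | MCL_FUTURE) == 0;
}
```

This is usually called once during start-up, before any real-time thread runs. Without sufficient privileges or a high enough memlock limit the call fails, so the return value should be checked.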
Allocate the memory before the node transitions into the active state.
- For example, pre-allocate messages and reuse them during the runtime phase: https://gitlab.com/autowarefoundation/autoware.auto/AutowareAuto/-/blob/master/src/perception/filters/point_cloud_filter_transform_nodes/test/test_point_cloud_filter_transform.cpp#L75-95
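The pre-allocation idea can be sketched in plain C++ as follows. The types `PointCloudMsg` and `PreallocatedFilter` are invented for illustration and are not the Autoware.Auto classes from the link; in a real node the constructor would correspond to the configuration phase and `process` to a callback in the active state.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical message type used only for illustration.
struct PointCloudMsg
{
  std::vector<float> points;
};

class PreallocatedFilter
{
public:
  // Configuration phase: allocate once, sized for the worst case.
  explicit PreallocatedFilter(std::size_t max_points)
  : max_points_(max_points)
  {
    msg_.points.reserve(max_points);
  }

  // Runtime phase: reuse the buffer. clear() keeps the capacity and we never
  // insert more than max_points elements, so no reallocation happens here.
  const PointCloudMsg & process(const float * input, std::size_t count)
  {
    msg_.points.clear();
    const std::size_t n = count < max_points_ ? count : max_points_;
    msg_.points.insert(msg_.points.end(), input, input + n);
    return msg_;
  }

private:
  std::size_t max_points_;
  PointCloudMsg msg_;
};
```

Input beyond the pre-allocated capacity is truncated here; whether to truncate, drop, or report an error is an application decision.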
Use a real-time capable allocator
- In Apex.OS, we use memory-pool-based allocators: https://www.youtube.com/watch?v=l14Zkx5OXr4
- In ROS 2, there is one custom real-time capable allocator you could use: https://docs.ros.org/en/foxy/Tutorials/Allocator-Template-Tutorial.html (https://github.com/ros2/realtime_support/tree/master/tlsf_cpp)
Use real-time capable containers https://gitlab.com/ApexAI/apex_containers
Use bounded or fixed-size data types
- In Apex.OS all message types are upper bounded
- In ROS 2 you could patch the messages you are using and set an upper bound depending on your application requirements
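In a ROS 2 interface definition, an upper bound is written with the `<=` syntax, for example `float32[<=100] data`. To show why bounded types help, here is a small sketch of a fixed-capacity container whose storage lives inside the object, so it never touches the heap. `BoundedVector` is a hypothetical name for this example and is unrelated to the apex_containers API.

```cpp
#include <array>
#include <cstddef>

// Fixed-capacity sequence: the upper bound is part of the type and the
// storage is embedded in the object, so push_back never allocates.
template<typename T, std::size_t Capacity>
class BoundedVector
{
public:
  // Returns false instead of growing when the bound is reached.
  bool push_back(const T & value)
  {
    if (size_ == Capacity) {
      return false;
    }
    data_[size_++] = value;
    return true;
  }

  std::size_t size() const { return size_; }
  constexpr std::size_t capacity() const { return Capacity; }
  const T & operator[](std::size_t i) const { return data_[i]; }
  void clear() { size_ = 0; }

private:
  std::array<T, Capacity> data_{};
  std::size_t size_ = 0;
};
```

Because the worst-case size is known at compile time, memory use and copy time are bounded as well.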
Avoid logging macro calls in the real-time code path
- In Apex.OS, we have a real-time logger that avoids potentially blocking calls such as cerr or fprintf and instead uses DDS network communication to send the log data to the non-safety-critical partition, where it is written to a file.
- In ROS 2, you can replace your logging macros with a real-time logger, or remove them from your real-time path and log from a non-real-time thread.
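One way to log from a non-real-time thread is to hand fixed-size log records over through a lock-free single-producer/single-consumer queue. The sketch below shows that pattern under stated assumptions; it is not how the Apex.OS logger works, and `SpscQueue` and `LogEntry` are names invented for this example.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Single-producer/single-consumer ring buffer. One slot is kept empty to
// distinguish "full" from "empty", so it holds at most Capacity - 1 items.
template<typename T, std::size_t Capacity>
class SpscQueue
{
public:
  // Called from the real-time thread only. It never blocks, and it does not
  // allocate as long as copying T does not allocate.
  bool try_push(const T & item)
  {
    const std::size_t head = head_.load(std::memory_order_relaxed);
    const std::size_t next = (head + 1) % Capacity;
    if (next == tail_.load(std::memory_order_acquire)) {
      return false;  // Full: drop the entry instead of blocking.
    }
    slots_[head] = item;
    head_.store(next, std::memory_order_release);
    return true;
  }

  // Called from the non-real-time logger thread only.
  bool try_pop(T & out)
  {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    if (tail == head_.load(std::memory_order_acquire)) {
      return false;  // Empty.
    }
    out = slots_[tail];
    tail_.store((tail + 1) % Capacity, std::memory_order_release);
    return true;
  }

private:
  std::array<T, Capacity> slots_{};
  std::atomic<std::size_t> head_{0};
  std::atomic<std::size_t> tail_{0};
};

// Fixed-size log record: no std::string, so copying it never allocates.
struct LogEntry
{
  int code;
  double value;
};
```

The real-time callback calls `try_push`, while a low-priority thread periodically drains the queue with `try_pop` and performs the actual formatting and I/O.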
In multi-threaded applications, blocking synchronization primitives such as mutexes may suffer from priority inversion
- Apex.OS implements its own threading package with configurable and extended synchronization primitives
- Use POSIX mutexes with priority inheritance enabled
- Avoid locks by using lock-free data-sharing mechanisms (std::atomic, FIFOs, CAS)
- https://github.com/eclipse-iceoryx/iceoryx/tree/master/iceoryx_hoofs#concurrent
- https://www.boost.org/doc/libs/1_76_0/doc/html/lockfree.html
- https://github.com/hogliux/farbot
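For the priority-inheritance option, a POSIX mutex can be configured with `pthread_mutexattr_setprotocol` and `PTHREAD_PRIO_INHERIT`. The wrapper below is a minimal sketch; the class name `PriorityInheritanceMutex` is an assumption for this example and not part of any of the libraries listed above.

```cpp
#include <mutex>
#include <pthread.h>
#include <stdexcept>

// Mutex with the priority-inheritance protocol enabled. It provides lock(),
// try_lock(), and unlock(), so it can be used with std::lock_guard.
class PriorityInheritanceMutex
{
public:
  PriorityInheritanceMutex()
  {
    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) {
      throw std::runtime_error("pthread_mutexattr_init failed");
    }
    const int protocol_rc =
      pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    const int init_rc =
      protocol_rc == 0 ? pthread_mutex_init(&mutex_, &attr) : protocol_rc;
    pthread_mutexattr_destroy(&attr);
    if (init_rc != 0) {
      throw std::runtime_error("could not create priority-inheritance mutex");
    }
  }

  ~PriorityInheritanceMutex() { pthread_mutex_destroy(&mutex_); }

  PriorityInheritanceMutex(const PriorityInheritanceMutex &) = delete;
  PriorityInheritanceMutex & operator=(const PriorityInheritanceMutex &) = delete;

  void lock() { pthread_mutex_lock(&mutex_); }
  bool try_lock() { return pthread_mutex_trylock(&mutex_) == 0; }
  void unlock() { pthread_mutex_unlock(&mutex_); }

private:
  pthread_mutex_t mutex_;
};
```

With this protocol, a low-priority thread holding the mutex is temporarily boosted to the priority of the highest-priority thread waiting for it, which bounds the duration of the inversion. Note that `std::mutex` gives no such guarantee.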
Detect memory allocations and blocking system calls
- Create tests to verify that your real-time functions don't allocate. You can use: https://gitlab.com/ApexAI/apex_test_tools
- You could also insert checks in your code base using:
  - osrf_testing_tools_cpp::memory_tools
  - https://github.com/osrf/osrf_testing_tools_cpp#memory_tools
  - https://gitlab.com/ApexAI/performance_test/-/blob/master/performance_test/src/data_running/data_runner_base.hpp
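To illustrate the idea behind such allocation checks, here is a simplified stand-in that replaces the global `operator new` and counts calls. It is not the osrf_testing_tools_cpp or apex_test_tools API; the names `g_allocation_count` and `count_allocations` are hypothetical, and the sketch does not see direct `malloc` calls or over-aligned allocations.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Counts every call to the global operator new, from any thread.
static std::atomic<std::size_t> g_allocation_count{0};

void * operator new(std::size_t size)
{
  g_allocation_count.fetch_add(1, std::memory_order_relaxed);
  if (void * p = std::malloc(size == 0 ? 1 : size)) {
    return p;
  }
  throw std::bad_alloc();
}

void operator delete(void * p) noexcept { std::free(p); }
void operator delete(void * p, std::size_t) noexcept { std::free(p); }

// Runs fn and returns how many heap allocations happened while it ran.
template<typename Fn>
std::size_t count_allocations(Fn && fn)
{
  const std::size_t before = g_allocation_count.load(std::memory_order_relaxed);
  fn();
  return g_allocation_count.load(std::memory_order_relaxed) - before;
}
```

A unit test can then wrap the real-time function in `count_allocations` and assert that the result is zero. Since the counter is global, other threads should be idle while the check runs.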
Use ROS 2 tracing to measure end-to-end latencies and detect blocking calls. We will show an example in the ROSCon executor workshop (https://www.apex.ai/roscon-21).
- https://gitlab.com/ros-tracing/ros2_tracing
- https://drive.google.com/file/d/1ogc43vQl79TAJBh_-hG20I-oaZ0iHr8n/view

I hope this information helps you improve your application's performance. I would encourage you to attend the RTWG meetings and present your use case there. You will certainly get additional tips and support from the other participants.
