Unreliable wall_timer in ros2
I encountered something rather alarming while using wall_timer in ros2. I created a simple class FrequencyCounter. It's only job is to increment numerical variable when it's only function is called and print frequency every 1000 calls. Then I created a node and put wall_timer that calls function from previously mentioned counter class. The executable that spins the node is here. As you can see I set priority of the process to highest secure value (which is max-1, so it doesn't interfere with system/watchdog/other important processes) and lock memory with mlockall(). I made some tests with it and it seems that for every 1000 calls the timer loses about 200-350 milliseconds. It is almost unnoticeable when I run it with low frequency e.g. when I set timer to 500ms which is 2Hz I get about 1.999Hz. On the other hand when I try to get 500Hz (2ms timer) I only receive 440-460Hz. Putting sleep in the function doesn't seem to affect frequency, I tried both sleep_for from std and rclcpp.
I tried it on both ros2 ardent latest release and master ros2 branch. I have the latest versions as I updated both yesterday. I'm working on Ubuntu 18.04, kernel version 4.15.0-20-generic.
Can this be a timer issue in ros2 or am I doing something wrong? Is it possible that timer doesn't compensate time spent in function, but counts during sleeps somehow?
EDIT: I just tried it on my BeagleBone Green that has linux and 4.4.91-ti-rt-r139 real-time kernel. The effects are the same.
Whatever the answer / resolution, please be aware that without an OS capable of deterministic scheduling, you will never get predictive performance of any timer. This is not a 'ROS thing', but will always be true, even for stuff you write yourself.
without a real-time OS running your code, this doesn't really mean anything.