ROS Resources: Documentation | Support | Discussion Forum | Index | Service Status | ros @ Robotics Stack Exchange
Ask Your Question
0

Measuring Autoware's System-Wide Latency

asked 2020-04-08 05:45:10 -0500

pedroexe gravatar image

updated 2020-04-08 07:14:35 -0500

Knowing the total system latency is essential for self-driving vehicles, for assessing the performance on different platforms, with different algorithms, verify wcet, and so on. Hence:

Quick question

How can I measure Autoware system latency, i.e., the time it takes to transform inputs into the car controlling output?

Long question

I'm looking for some manner to assess autoware's system latency. That is, the time autoware takes to process an input (points_raw, image_raw) into an output (vehicle_cmd car controlling command).

Currently, I'm collecting the timestamp when inputs arrive and the timestamp when twist_gate publishes commands to control the car and comparing them. I was expecting this could give me some system-wide latency idea. However, sometimes I collect many inputs timestamps before any output, or many outputs timestamps before any new input. So Autoware may not have a 1 input : 1 output relation, which makes this measurement complex.

Also, I know that topics may have queues, and that could also explain the behavior I'm getting. So, another option would be tracking input msg until they turn into the final output msg, but I didn't find any clever way to do that either.

Does Autoware or ROS provide any standard tool which is possible to analyze that? Does anyone have an idea on how this could be performed given the autoware architecture?

Thank you.

edit retag flag offensive close merge delete

Comments

Hi @pedroexe, did you find the solution? I'm facing the same problem!

runtao gravatar image runtao  ( 2020-12-12 07:03:08 -0500 )edit

Hi @runtao. It's been a while since I did this, but in the end, I instrumented the code on my own. I defined relevant execution paths I wanted to track down (e.g., image_raw -> vision_detector -> sensor_fusion -> tracking ... etc) and manually collected the time of arrival and departure of messages on these nodes. Then I calculated the latency time with the exit time of the message in the last node I was measuring - the initial time on the first node. Also, notice that some nodes (e.g. tracking) receive messages from more than one source node. To cope with this I kept comparing the header.stamp in both the source and destination nodes (considering it as a 'signature' of the message), and discard messages that were not from the source I was trying to measure.

Unfortunately, I did not found any tool from ROS/Autoware to solve ...(more)

pedroexe gravatar image pedroexe  ( 2020-12-12 13:20:02 -0500 )edit

Hi @pedroexe, thanks, 2 more questions,

  1. as far as I know, timestamp can only record one time value, how did you collect the time of arrival and departure at the same time?
  2. how did you exactly put the timestamp of the first node and the last node together? just record the messages in rosbag then manaully read it? I am thinking if there's a way to pass the timestamp node by node from first one to the last one, so that we can do the calculation in the last node directly?
runtao gravatar image runtao  ( 2020-12-17 08:58:38 -0500 )edit
  1. I collected the arrival/departure times by instrumenting the code (using lib Chrono) to get the time at the very beginning of the callback, and right before the publish(). Hence I had a collection of pairs (begin time, end time) for each node. I could use the very first and the very last values of the given execution path (defined in prev. comment) to compute the 'end-to-end' latency.

  2. I didn't modify the timestamps nor used them as measurement data. Instead, I saw that the headers of the messages (including the timestamp) were being passed along from one node to the next. This way I could track down what was the source of the message at a given node. Say node A callback is triggered by messages from both nodes B and C. If A.msg.timestamp == B.msg.timestamp (and A.msg.timestamp != C.msg.timestamp) then the pair ...

(more)
pedroexe gravatar image pedroexe  ( 2020-12-17 14:43:09 -0500 )edit

@pedroexe sorry for stealing your time, but can you share how did you solve the problem input :output is not 1:1 when computing the 'end-to-end' system latency? So how should pair which exit time of last node with which initial time of first node?

runtao gravatar image runtao  ( 2021-01-03 04:11:49 -0500 )edit

hey @runtao, no problem. as I tried to explain in a previous reply, I'd suggest that you leverage the data in the header of messages together with initial and final times, thus you can identify messages (with the header info) to match them up later.

I used the timestamp field in the header of messages (which was not modified by the nodes I was measuring, I checked). There is also a seq field in headers, but I read it is deprecated and you should not rely on them. I think you can also modify all msg files of your interest and use populate/read to keep track of message sources.

pedroexe gravatar image pedroexe  ( 2021-01-03 07:45:33 -0500 )edit

1 Answer

Sort by ยป oldest newest most voted
0

answered 2022-09-14 15:56:32 -0500

yiming gravatar image

Hi, I read your post and follow the guide to measure the system wide latency. However, I met a problem which confuses me.

I found out that for the same header.stamp, there are cases that an earlier node has a late wall clock compared to a later node, I wonder how that's possible. For example, I'm recording the header.stamp in an earlier node /voxel_grid_filter and the latest node /twist_gate. I found for the same header.stamp, the wall clock for /twist_gate is 20h 35m 19s 300ms, yet the wall clock for /voxel_grid_filter is 20h 35m 19s 309s for example. Is this possible?

edit flag offensive delete link more

Comments

Hi @yiming! Honestly, it's been a while since I looked into that, but I doubt this should be happening. There is a strict ordering when messages are passed along through nodes, so the msg processed by the voxel grid should reach the twist gate later, always. At least that was always the case in my experiments, back in the day :) Perhaps you have multiple messages with the same timestamp which is causing confusion? Perhaps you are not using a high-accuracy library to get the wall clock? Apart from that, all I can think of are logical mistakes when printing/getting the timestamp value.

pedroexe gravatar image pedroexe  ( 2022-09-15 10:53:33 -0500 )edit

Question Tools

3 followers

Stats

Asked: 2020-04-08 05:45:10 -0500

Seen: 179 times

Last updated: Apr 08 '20