Tracking a message from publisher to subscriber

answered 2020-03-11 11:18:14 -0600 by christophebedard (https://christophebedard.com)

updated 2020-03-13 02:29:51 -0600 by gvdhoorn (http://cor.tudelft.nl/)

Submitting an answer to my own question since I didn't get any answers.

I ended up doing pretty much what I described in my question. Full post here: https://christophebedard.com/ros-trac...

Here's a summary/excerpt:

To track a message from publisher to subscriber, as described in my question, we need information on:

connections between publishers and subscribers
subscriber/publisher queue states
network packet exchanges

We first need to know about connections between nodes. The ROS instrumentation includes a tracepoint for new connections (new_connection). It includes the address and port of the host and the destination, with an address:port pair corresponding to a specific publisher or subscription.
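As a rough illustration of how such events could be paired up, here is a minimal Python sketch. It assumes that both ends of a connection emit a new_connection event and that the host/destination pairs are mirrored between the two; the `node` argument and the `ConnectionTable` class are hypothetical names, not part of the instrumentation or of the Trace Compass analysis.

```python
from typing import Dict, List, Optional, Tuple

Endpoint = Tuple[str, int]  # (address, port)


def parse_endpoint(text: str) -> Endpoint:
    """Split an 'address:port' string into (address, port)."""
    address, _, port = text.rpartition(":")
    return address, int(port)


class ConnectionTable:
    """Pairs mirrored new_connection events (hypothetical helper)."""

    def __init__(self) -> None:
        # (host, destination) -> node that reported this side first
        self._pending: Dict[Tuple[Endpoint, Endpoint], str] = {}
        self.links: List[Tuple[str, str]] = []

    def on_new_connection(
        self, node: str, host: str, destination: str
    ) -> Optional[Tuple[str, str]]:
        h, d = parse_endpoint(host), parse_endpoint(destination)
        # The other side of the same connection has host and destination swapped.
        other = self._pending.pop((d, h), None)
        if other is None:
            self._pending[(h, d)] = node
            return None
        link = (other, node)
        self.links.append(link)
        return link
```

Feeding it one event from each side of a connection yields a pair of node names; an event with no mirrored counterpart simply stays pending.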

We also need to build a model of the publisher and subscriber queues, which we can do using the relevant tracepoints. There is a tracepoint for when a message is added to a queue (publisher_message_queued, subscription_message_queued), when it’s dropped from a queue (subscriber_link_message_dropped, subscription_message_dropped), and when it leaves a queue, either by being sent over the network to the subscriber (subscriber_link_message_write) or by being handed over to a callback (subscriber_callback_start). We can therefore visualize the state of a queue over time!
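To give an idea of what such a queue model looks like, here is a minimal Python sketch that replays the tracepoints of a single queue and records its depth over time. The tracepoint names are the ones listed above; the `(timestamp, event_name)` input format and the function name are assumptions made for the example, and the actual Trace Compass analysis is more involved.

```python
from typing import Iterable, List, Tuple

ENQUEUE = {"publisher_message_queued", "subscription_message_queued"}
DROP = {"subscriber_link_message_dropped", "subscription_message_dropped"}
DEQUEUE = {"subscriber_link_message_write", "subscriber_callback_start"}


def queue_depth_over_time(
    events: Iterable[Tuple[int, str]]
) -> List[Tuple[int, int]]:
    """Replay (timestamp, event_name) pairs for ONE queue, sorted by time.

    Returns (timestamp, depth) after each event that changes the queue.
    """
    depth = 0
    timeline: List[Tuple[int, int]] = []
    for timestamp, name in events:
        if name in ENQUEUE:
            depth += 1
        elif name in DROP or name in DEQUEUE:
            # Clamp at zero in case tracing started with messages already queued.
            depth = max(depth - 1, 0)
        else:
            continue  # not a queue-related event
        timeline.append((timestamp, depth))
    return timeline
```

Plotting the returned timeline as a step function gives the kind of queue state view described above.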

Finally, we need information on network packet exchanges. Although this isn’t strictly necessary for this kind of analysis, it allows us to reliably link a published message to the message received by the subscriber. This makes the analysis more robust, and it paves the way for a future critical path analysis based on this message flow analysis.
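The general idea of linking a send to a receive can be sketched in Python as follows. This is only an illustration: the `packet_key` values are hypothetical (they stand for whatever fields identify the same packet on both sides), and the sketch assumes that timestamps from both hosts are comparable, which requires synchronized clocks or a trace synchronization step if two machines are involved.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Event = Tuple[int, Hashable]  # (timestamp, packet_key), packet_key is hypothetical


def match_packets(
    sent: Iterable[Event], received: Iterable[Event]
) -> List[Tuple[Hashable, int, int]]:
    """Match packet queuing events to packet reception events by key.

    Returns (packet_key, send_timestamp, receive_timestamp) tuples.
    """
    pending: Dict[Hashable, int] = {}
    for timestamp, key in sent:
        pending.setdefault(key, timestamp)  # keep the first send for a key

    matches: List[Tuple[Hashable, int, int]] = []
    for timestamp, key in received:
        send_ts = pending.get(key)
        if send_ts is None or timestamp < send_ts:
            continue  # no matching send, or reception precedes the send
        del pending[key]
        matches.append((key, send_ts, timestamp))
    return matches
```

Each match gives the two timestamps needed to connect the publisher side of a message to the subscriber side.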

This requires us to trace both userspace (ROS) and the kernel. Fortunately, we only have to enable 2 kernel events for this (net_dev_queue for packet queuing and netif_receive_skb for packet reception). This saves a lot of disk space, since enabling many events can generate multiple gigabytes of trace data, even when tracing for only a few seconds! Also, as the rate of generated events increases, so does the overhead: more resources have to be allocated to the buffers to properly process those events, otherwise events can get discarded or overwritten.

Result:

[Image: result_analysis_initial_zoom.png]

Some links for actual code/further information:

Trace Compass code for this analysis: https://git.eclipse.org/c/tracecompas... and https://git.eclipse.org/c/tracecompas...
My fork of the original instrumentation fork. I improved and fixed some small things, including adding information about latched messages. https://github.com/christophebedard/r...
My fork of the original tracetools package. https://github.com/christophebedard/t...
Repo with a few test traces and a .repos file to easily set up a workspace to trace ROS. https://github.com/christophebedard/t...


Comments

As there is a good chance your website will go offline in the future (all sites do at some point), it would be great if you could summarise what you did here in your answer. That would make this somewhat less of a link-only answer, and allow it to maintain its value even without your site being operational.