I'm trying to track messages across nodes to eventually build a graph (well, many graphs) using LTTng and the tracetools package.

One of the things I need is to be able to connect the message that a publisher sends and the (same) message that a subscriber receives. For example, looking at this representation of pub/sub queues below, I'd like to link 0x561f5f3d1064 and 0x7f8120003230.

[Image: representation of the pub/sub queues]

Unfortunately, looking at ros_comm and how a message is serialized, the serialized message isn't unique, since the content does not always include a timestamp or a std_msgs/Header. For example, two std_msgs/String messages with the same content serialize to identical bytes. Otherwise this would be too easy!

This is the solution I'm considering right now:

  • Right before the publisher sends the message over the network, we record message_start (the pointer into the serialized buffer). This identifies the message on the publisher's side.
  • We take the very next net_dev_queue event that matches the pub/sub connection (hosts/ports of the TCP connection), and "link" the TCP sequence number (or skbaddr if we're on the same host) to the message_start above.
  • Similarly, for the subscriber, we use the message_start of the very next message that the subscriber receives after the corresponding netif_receive_skb event (with the same sequence number).
  • Thus the two message_start values are linked.
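
To make the steps above concrete, here is a rough sketch of the matching logic in Python. It is only an illustration under assumptions: the Event record and its fields (ts, name, conn, msg, seq) are hypothetical stand-ins for whatever the trace reader yields, and it assumes every event can already be tagged with its pub/sub connection (hosts/ports) and that the network events expose a TCP sequence number.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified trace record (not an actual LTTng/tracetools type)."""
    ts: int                     # timestamp, e.g. in ns
    name: str                   # "message_start", "net_dev_queue" or "netif_receive_skb"
    conn: Tuple                 # assumed key: (src_host, src_port, dst_host, dst_port)
    msg: Optional[int] = None   # message_start pointer (message_start events only)
    seq: Optional[int] = None   # TCP sequence number (network events only)

def link_messages(pub_events, sub_events):
    """Return (publisher message_start, subscriber message_start) pairs."""
    # Publisher side: attach each message_start to the very next
    # net_dev_queue event seen on the same connection.
    seq_to_pub = {}
    pending = {}  # conn -> message_start waiting for its net_dev_queue
    for ev in sorted(pub_events, key=lambda e: e.ts):
        if ev.name == "message_start":
            pending[ev.conn] = ev.msg
        elif ev.name == "net_dev_queue" and ev.conn in pending:
            seq_to_pub[(ev.conn, ev.seq)] = pending.pop(ev.conn)

    # Subscriber side: after a netif_receive_skb with a known sequence
    # number, take the very next message_start on the same connection.
    links = []
    awaiting = {}  # conn -> publisher message_start whose packet just arrived
    for ev in sorted(sub_events, key=lambda e: e.ts):
        if ev.name == "netif_receive_skb":
            key = (ev.conn, ev.seq)
            if key in seq_to_pub:
                awaiting[ev.conn] = seq_to_pub[key]
        elif ev.name == "message_start" and ev.conn in awaiting:
            links.append((awaiting.pop(ev.conn), ev.msg))
    return links

# Made-up connection and sequence number; the pointers are the ones from the example above.
CONN = ("hostA", 45000, "hostB", 11311)
pub = [Event(1, "message_start", CONN, msg=0x561f5f3d1064),
       Event(2, "net_dev_queue", CONN, seq=1000)]
sub = [Event(3, "netif_receive_skb", CONN, seq=1000),
       Event(4, "message_start", CONN, msg=0x7f8120003230)]
links = link_messages(pub, sub)
print([(hex(p), hex(s)) for p, s in links])
```

On a single host, skbaddr could take the place of seq in the same logic. Note that this sketch assumes one message maps to exactly one packet; since TCP is a byte stream, a large message can be split across several segments and small messages can be coalesced into one, which the sketch does not handle.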

I'm not sure how reliable this method is.

Therefore I'd like to hear other ideas. Or maybe a confirmation that this might actually work and be reliable!

