Loaned Messages and zero-copy

asked 2022-05-25 15:50:09 -0500

AspireChef

I'm porting a native Linux camera application to the ROS 2 Galactic API and have been digging into the zero-copy / Loaned Message approach offered by eProsima's Fast DDS middleware. The more I dig, the less I think the zero-copy approach will be directly suitable for what I have in mind. Hopefully someone can point out where I may be mistaken in my approach and/or my understanding of Loaned Messages.

My plan is to divide the application into three nodes, each running in a separate process. The first node (aka "capture") would interface directly with the Linux V4L2 drivers to perform capture. The second node (aka "txform") would remove one of the image data channels. The last node (aka "datalog") would simply log the processed image to storage. The dataflow would essentially be: capture -> txform -> datalog. Later nodes could do other useful things such as further image processing, raw data logging, inferencing, etc. The point of porting my camera app is to get familiar with the latencies involved and really just "kick the ROS 2 tires", since I'm new to ROS 2.

Here's what I'm running into:

1) Loaned Messages don't seem to lend themselves to easily exposing the underlying message they encapsulate. I created a ROS2 message definition that includes a timestamp and a data buffer large enough to accommodate 1920x1080x4 channels worth of pixel data. I thought I'd be able to allocate a Loaned Message from the middleware and then pass the allocated buffer directly to my V4L2 drivers. In this way, my V4L2 drivers directly use the allocated message buffer and all I'd have to do is publish() the Loaned Message as images are captured.
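To make the intended pattern concrete, here is a minimal, ROS-free sketch of "borrow, let the driver fill the buffer in place, then publish". In rclcpp the corresponding calls would be publisher->borrow_loaned_message(), loaned.get() (which returns a reference to the message) and publisher->publish(std::move(loaned)). Everything in the block (Frame, Loan, fake_capture, capture_into_loan) is a hypothetical stand-in, and whether a given V4L2 driver will accept such a pointer (for example through V4L2_MEMORY_USERPTR) is an assumption that depends on the driver.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical fixed-size message: a timestamp plus a small pixel buffer.
// (A real 1920x1080x4 frame would use the same layout with a larger array.)
struct Frame {
  std::uint64_t stamp_ns = 0;
  std::array<std::uint8_t, 16> data{};
};

// Hypothetical stand-in for a loaned message: a move-only owner of one Frame.
class Loan {
public:
  Loan() : msg_(std::make_unique<Frame>()) {}
  Frame & get() { return *msg_; }  // same shape as LoanedMessage::get()
  bool is_valid() const { return msg_ != nullptr; }
private:
  std::unique_ptr<Frame> msg_;
};

// Hypothetical stand-in for the capture driver: writes pixels into dst.
inline void fake_capture(std::uint8_t * dst, std::size_t len) {
  for (std::size_t i = 0; i < len; ++i) {
    dst[i] = static_cast<std::uint8_t>(i);
  }
}

// Borrow, let the "driver" fill the message buffer in place, then hand it on.
inline Loan capture_into_loan(std::uint64_t stamp_ns) {
  Loan loan;
  Frame & frame = loan.get();
  frame.stamp_ns = stamp_ns;
  fake_capture(frame.data.data(), frame.data.size());  // no intermediate copy
  assert(loan.is_valid());
  return loan;  // a real node would publish(std::move(loan)) at this point
}
```

The key assumption is that the message has a fixed size, so its array field is contiguous and data() yields a raw pointer that stays valid for as long as the loan is held.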

I'm starting to think that this is not the way it is intended to be used. I've tried creating a std::vector to maintain a list of borrowed Loaned Messages, but this produces the following error at run time: "[ERROR] [1653510446.349724795] [LoanedMessage]: rcl_deallocate_loaned_message failed: error not set". So I'm not understanding something at a fundamental level.
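As a sketch of what holding several loans at once involves: a loan handle is an owning, move-only object, so a container has to receive it via std::move, and each slot must be given back exactly once. The Pool and Handle types below are hypothetical stand-ins, not the rclcpp API, and this is an illustration of the ownership rules rather than a diagnosis of the error above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical pool that only counts how many slots are currently on loan.
struct Pool {
  std::size_t outstanding = 0;
};

// Hypothetical move-only loan handle: returns its slot exactly once.
class Handle {
public:
  explicit Handle(Pool & pool) : pool_(&pool) { ++pool_->outstanding; }
  Handle(const Handle &) = delete;
  Handle & operator=(const Handle &) = delete;
  Handle(Handle && other) noexcept : pool_(other.pool_) { other.pool_ = nullptr; }
  Handle & operator=(Handle && other) noexcept {
    if (this != &other) {
      release();
      pool_ = other.pool_;
      other.pool_ = nullptr;
    }
    return *this;
  }
  ~Handle() { release(); }
  bool is_valid() const { return pool_ != nullptr; }
private:
  void release() {
    if (pool_ != nullptr) {
      assert(pool_->outstanding > 0);  // a double return would trip this
      --pool_->outstanding;
      pool_ = nullptr;
    }
  }
  Pool * pool_;
};

// Borrow n slots and keep the handles in a vector.
inline std::vector<Handle> borrow_many(Pool & pool, std::size_t n) {
  std::vector<Handle> loans;
  loans.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    Handle h(pool);
    loans.push_back(std::move(h));  // push_back(h) would not compile
  }
  return loans;
}
```

Because the moved-from handle is left empty, its destructor does nothing, and only the handle stored in the vector gives the slot back.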

I don't think I'll be able to directly reference the message within the Loaned Message. At best, I'll need a separate buffer for capture, borrow a Loaned Message, copy that capture buffer into the Loaned Message buffer, and then publish() that. It feels like an extra copy that I was hoping to avoid.

2) Shared data doesn't seem to be protected between the nodes accessing it. I created a pub and sub node. I can pass zero-copy messages from pub -> sub. However, if I put an artificial delay in the sub, it's clear that the message contents can change without the sub realizing it. I was expecting the message access to block while being accessed within the callback of the subscriber but that's not the case. I was thinking the middleware might recognize that one or more subscribers were processing things and take that ...
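The protection expected in point 2 can be modelled as a pool whose slots carry a reader count: the writer may only borrow a slot that no reader still holds. This is a single-threaded, ROS-free sketch of that expectation under assumed names (SlotPool, try_borrow, publish, release). It does not describe what Fast DDS or the RMW layer actually does, and a real implementation would need atomic counters.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical pool of message slots, each with a count of pending readers.
class SlotPool {
public:
  // Writer side: find a slot that is neither being written nor still read.
  // Returns false when every slot is in use.
  bool try_borrow(std::size_t & index) {
    for (std::size_t i = 0; i < slots_.size(); ++i) {
      if (!slots_[i].writing && slots_[i].readers == 0) {
        slots_[i].writing = true;
        index = i;
        return true;
      }
    }
    return false;
  }

  // Writer side: publishing hands the slot to every matched reader at once.
  void publish(std::size_t i, int matched_readers) {
    assert(slots_[i].writing && matched_readers >= 0);
    slots_[i].writing = false;
    slots_[i].readers = matched_readers;
  }

  // Reader side: called when a subscriber callback is done with the slot.
  void release(std::size_t i) {
    assert(slots_[i].readers > 0);
    --slots_[i].readers;
  }

private:
  struct Slot {
    bool writing = false;
    int readers = 0;
  };
  std::array<Slot, 2> slots_{};
};
```

With this model, a slow subscriber keeps its slot out of circulation until it calls release, so the publisher either gets a different slot or fails to borrow; it never overwrites data that is still being read.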


Comments

I've returned to this after a year and am still puzzled by the behavior of loaned messages in ROS2 (outlined in points 2 and 3 in the original question). With Humble, I can borrow a loaned message from the RMW and populate it with data in a publisher process and then ultimately publish it. This then wakes up my subscriber process. However, the subscriber doesn't recognize that it is operating on a loaned message and doesn't seem to have any ownership over it. I introduced an extremely large delay in the subscriber to monitor the loaned message contents before returning from its callback. Since the subscriber does not apparently own the loaned message, the publisher is able to borrow that exact same message again and populate it with new data. I've verified the subscriber can detect this.

This seems like a really bad concurrency issue and ...

AspireChef (2023-03-29 11:09:16 -0500)