How to synchronize rostopic publication and a node's callback function?

asked 2021-06-11 10:16:51 -0500

Dynaflock

updated 2021-06-14 08:49:09 -0500

Hi, I'm working on a drone simulator and I have an important issue regarding callback functions and message delivery.

SUMMARY (as requested, here is a summary of my problem; see the full explanation below for more details).

I have several agents (drones here), each publishing its odometry information to the other agents' communication rostopic (/com_topic), and I need each agent to retrieve these messages in the same order as they were published (to perform some calculations later on).

How it works (loop for communicating):

  • agent_1 publishes its odometry to /agent_2/com_topic, /agent_3/com_topic, etc., then passes its semaphore to agent_2.
  • agent_2 publishes its odometry to /agent_1/com_topic, /agent_3/com_topic, etc., then passes its semaphore to agent_3.
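
The loop above can be sketched as a small pure-Python simulation (no ROS involved). The function name `run_round` and the `inboxes` dictionary are hypothetical stand-ins for the semaphore passing and for each /agent_i/com_topic; the sketch only shows the order in which messages are published, not the order in which subscribers end up receiving them.

```python
from collections import defaultdict


def run_round(agent_ids, odometry):
    """Simulate one communication round in semaphore (token) order.

    The agent holding the semaphore publishes its odometry to every other
    agent's inbox, then the semaphore moves on to the next agent.
    """
    inboxes = defaultdict(list)  # hypothetical stand-in for each com_topic
    for sender in agent_ids:  # semaphore order: agent_1, agent_2, ...
        for receiver in agent_ids:
            if receiver != sender:
                inboxes[receiver].append((sender, odometry[sender]))
    return dict(inboxes)


agents = ["agent_1", "agent_2", "agent_3"]
odom = {"agent_1": (0.0, 0.0), "agent_2": (1.0, 0.0), "agent_3": (0.0, 1.0)}
inboxes = run_round(agents, odom)
publish_order_for_agent_1 = [sender for sender, _ in inboxes["agent_1"]]
```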


So, for each drone, /com_topic holds a list of odometry messages in the same order in which the semaphore was passed from drone to drone.

For instance, /agent_1/com_topic/ will look like:

  • agent_2 odometry message
  • agent_3 odometry message
  • ...

I then use a callback function to get each message, and it runs a certain number of times to collect the messages from all the drones: for 5 drones it runs 4 times, so that each drone gets the odometry info from all its neighbors.

This is where my problem lies: when I use this callback function, I get the messages in a different order from the one in which they were initially published.

So for instance in /agent_1/com_topic/ instead of getting:

  • agent_2 odometry message
  • agent_3 odometry message
  • ...

I can have:

  • agent_3 odometry message
  • agent_3 odometry message
  • agent_2 odometry message
  • ...

It can be very random.

So my question is: is there a way to ensure that messages are delivered in the right order when there are several publishers?

Thank you in advance.


How the simulator is used:

  • each drone is spawned in Gazebo by a launch file and creates a node (drone_1, drone_2, etc.) that comes with 2 main plugins: a controller and a communication one (each drone has a physical model of a controller and an antenna).

  • I use a rosservice call to trigger the communication: each drone in turn casts a ray to check for obstacles; if there is none between it and the other drones, it publishes its odometry information (nav_msgs/Odometry) to the com rostopic. Each drone publishes to all its neighbors in one time step, so all its neighbors receive the odometry in the same time step.

    • at each time step, each node (each drone) computes its command and sends it to its controller plugin, which applies this command in the simulation.


  • at each time step, only one drone can communicate with all the other drones, so each drone needs several time steps to receive all the information from its neighbors.
  • so at each time step, I use a callback function that gets the odometry information from the communication rostopic.

Now, I'm working on a flocking application, so basically I want each drone (each ... (more)



This is quite a bit of text, and your real question is buried somewhere in it.

Could you perhaps summarise what it is you observe, and what you believe should happen instead?

As a quick comment: there are no guarantees about message delivery, nor necessarily about the order in which messages are delivered to Subscribers. Especially not with multiple Publishers and Subscribers all concurrently active.

gvdhoorn ( 2021-06-12 04:41:47 -0500 )

Hi, thanks for answering. Sorry for the long intro, I thought that giving a detailed context would make it easier to understand but I will update my message and add a summary for the next ones :)

Okay, so even if I observe the right order of messages in the rostopic (meaning the publication is working as intended), there is no guarantee that the subscribers will receive the messages in the same order.

Is there a way to work around this behavior, or not at all?


Dynaflock ( 2021-06-14 08:26:13 -0500 )

I still don't clearly understand what you're trying to achieve. If you could describe that in a few sentences, that would help.

Note: it's best if you describe what you want to do, not the approach you've tried to implement so far.

I would need each agent to retrieve theses messages in the same order as they are published

Why? Is order important, or is it important to get a coherent snapshot in time of the system? If the latter: what about synchronising data streams using message_filters?
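
To make the "coherent snapshot" idea concrete, here is a minimal pure-Python sketch of grouping messages by timestamp. It is not the message_filters API; the class `StampSynchronizer`, its method `add`, and the assumption that all drones stamp their messages with the same simulation time are hypothetical and only illustrate the underlying idea.

```python
from collections import defaultdict


class StampSynchronizer:
    """Group messages from several sources by identical timestamp."""

    def __init__(self, sources):
        self.sources = set(sources)
        self.pending = defaultdict(dict)  # stamp -> {source: message}

    def add(self, source, stamp, message):
        """Store a message; return the complete set for `stamp`, else None."""
        bucket = self.pending[stamp]
        bucket[source] = message
        if set(bucket) == self.sources:
            # Every source has reported for this stamp: hand out the snapshot.
            return self.pending.pop(stamp)
        return None


sync = StampSynchronizer(["agent_2", "agent_3"])
first = sync.add("agent_3", 10, "odom_3")   # arrives first, out of order
second = sync.add("agent_2", 10, "odom_2")  # completes the snapshot
```

Arrival order does not matter here: the snapshot is released only once every source has contributed a message for the same stamp.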

Are you trying to implement an algorithm which must have complete information about the distributed system at a specific timestamp perhaps?

Drones publishing to each other's odom topics seems complex and doesn't really scale. Perhaps there is some other way of making this more robust.

gvdhoorn ( 2021-06-14 10:02:28 -0500 )

Hey, the importance of the order is related to the idea of flocking. I need to compute the command of agent_1 according to the odometry of its neighbors. If, in my calculation, I use the odometry of agent_2 twice, for instance, then the calculation is wrong.

In the algorithm I wait for a certain number of time steps (number of time steps waited = number of neighbors = total number of agents - 1) to ensure that I get the odometry of all the neighbors. But if I can't ensure that I get the same order, then all my computations will be wrong, hence generating wrong trajectories.

BTW: the drones actually don't publish in other drones' com topics; they publish the odometry of other drones in their own topic... (just learned about this subtlety), in the order of communication (using the semaphore system).

I didn't know about message_filters ...(more)

Dynaflock ( 2021-06-15 02:10:54 -0500 )

It sounds like it's not the order of messages that's important, but the fact that you need a consistent and coherent snapshot of the state of all the entities in the flock at the same point in time / your simulation.

Using order of messages for that seems rather brittle.

Wouldn't it be possible to gather information about the other drones by looking at the child_frame_id of each message instead?
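
A minimal sketch of that suggestion, in plain Python without ROS: the callback files each message under its child_frame_id instead of relying on arrival order. The class `NeighborSnapshot` and the method `on_odometry` are hypothetical names, and the sketch assumes each drone sets a unique child_frame_id in its nav_msgs/Odometry messages.

```python
class NeighborSnapshot:
    """Collect one odometry message per neighbor, regardless of arrival order."""

    def __init__(self, neighbor_ids):
        self.expected = set(neighbor_ids)
        self.latest = {}

    def on_odometry(self, child_frame_id, odom):
        """Callback body: return the full snapshot once complete, else None."""
        if child_frame_id not in self.expected:
            return None  # not one of our neighbors, ignore
        # A duplicate from the same neighbor overwrites the older entry,
        # so no neighbor is ever counted twice.
        self.latest[child_frame_id] = odom
        if set(self.latest) == self.expected:
            snapshot, self.latest = self.latest, {}
            return snapshot
        return None


collector = NeighborSnapshot(["agent_2", "agent_3"])
# Same arrival pattern as in the question: agent_3 twice, then agent_2.
arrivals = [("agent_3", "odom_3a"), ("agent_3", "odom_3b"), ("agent_2", "odom_2")]
results = [collector.on_odometry(sender, odom) for sender, odom in arrivals]
```

With this approach the number of callbacks needed is no longer fixed at "number of neighbors"; the command is computed whenever a snapshot is returned.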

Or perhaps go to a custom message which includes some sort of way to encode which iteration of the algorithm this is, and individual drones lockstepping with others based on that value.
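
As an illustration of the lockstep idea, here is a hedged pure-Python sketch. The message type `StampedOdom`, its `iteration` field, and the class `LockstepBuffer` are all hypothetical; a real implementation would define a custom ROS message carrying the same information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StampedOdom:
    """Hypothetical custom message: odometry plus an algorithm iteration."""
    sender: str
    iteration: int
    position: tuple


class LockstepBuffer:
    """Release neighbor data only when a whole iteration is complete."""

    def __init__(self, neighbor_ids):
        self.expected = set(neighbor_ids)
        self.iteration = 0       # iteration we are currently waiting for
        self.by_iteration = {}   # iteration -> {sender: message}

    def add(self, msg):
        if msg.sender not in self.expected or msg.iteration < self.iteration:
            return None  # unknown sender or stale iteration: drop it
        self.by_iteration.setdefault(msg.iteration, {})[msg.sender] = msg
        current = self.by_iteration.get(self.iteration, {})
        if set(current) != self.expected:
            return None  # still waiting for some neighbor
        self.iteration += 1
        return self.by_iteration.pop(self.iteration - 1)


buf = LockstepBuffer(["agent_2", "agent_3"])
r1 = buf.add(StampedOdom("agent_2", 0, (1.0, 0.0)))
r2 = buf.add(StampedOdom("agent_2", 1, (1.1, 0.0)))  # early, kept for later
r3 = buf.add(StampedOdom("agent_3", 0, (0.0, 1.0)))  # completes iteration 0
```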

the drones actually don't publish in other drone's com topic, they actually publish in their own topic the odometry of other drones

that seems .. strange.

I didn't know about message_filters or TimeSynchronization Policy and I will explore those path to see how it can ...

gvdhoorn ( 2021-06-15 02:44:59 -0500 )

Okay, you mean accepting a message only if its child_frame_id is the one I'm looking for, and then incrementing to get the info from all of them? I thought about that (and will probably try it), but I'm worried about how random the time each drone needs to get the info from its neighbors is... agent_1 may take 50 ms to get the info of the 4 other drones, while agent_2 may take 100 ms, etc.

Yeah, I'm not the one who coded those communication plugins, and I'm still discovering some of their features... that's why I'm struggling.

OK, I'll have a look at fkie_message_filters, but is the idea to not pass messages that don't match a filter (like child_frame_id) to the callback function, or is it something more?

Thanks for your help, I really appreciate it.

Dynaflock ( 2021-06-15 03:16:55 -0500 )

It sounds like you have some kind of collision arbitration occurring on your communications, similar to CAN bus, where lower node IDs have priority. That said, you really should rethink your flocking algorithm. Swarming agents flock on prediction more than simple detection. There are always sensory and processing delays. The critical information in a received odometry message should not be when it is received, but rather the time stamp showing when it was sent. If your problem is real world, you should expect to lose messages. If you really want to time synchronize, then you allocate each node a specific millisecond timeslot, offset by node ID from a common start time. I suspect your odometry transmission frequency is too high. Consider dropping your transmission frequency and including odometry and intention in a given agent's periodic transmission. Also consider that natural flocking behaviour like murmurations is caused by local reactions to nearby ...(more)
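
The timeslot allocation mentioned above can be sketched as simple arithmetic. The function names `slot_start_ms` and `slot_owner`, the zero-based node IDs, and the default 1 ms slot width are assumptions made for illustration only.

```python
def slot_start_ms(node_id, cycle_index, num_nodes, slot_ms=1):
    """Start of a node's transmit slot, in ms after the common start time.

    Nodes are numbered 0..num_nodes-1 (an assumption); each cycle gives
    every node exactly one slot of `slot_ms` milliseconds.
    """
    if not 0 <= node_id < num_nodes:
        raise ValueError("node_id must be in [0, num_nodes)")
    cycle_ms = num_nodes * slot_ms
    return cycle_index * cycle_ms + node_id * slot_ms


def slot_owner(elapsed_ms, num_nodes, slot_ms=1):
    """ID of the node allowed to transmit at `elapsed_ms` after the start."""
    return (elapsed_ms // slot_ms) % num_nodes


start = slot_start_ms(node_id=2, cycle_index=3, num_nodes=5)
owner = slot_owner(start, num_nodes=5)
```

Each node only transmits inside its own slot, so the sending order is fixed by the clock instead of by message arrival.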

James NT ( 2021-06-16 06:55:30 -0500 )

Thanks for your insights. Before reading your comment, I noticed that decreasing the frequency actually helped my node do the job in the right order. Regarding the flocking, yes, I do know it works using only close neighbors' information, but I wanted to check the overall working of my simulator first. Lastly, do you think that a high-frequency information exchange system could replace a prediction system? Meaning, if I choose not to try to predict my neighbors' behaviour but rather have a high-frequency exchange of information (approaching real time), wouldn't that be viable?

Dynaflock ( 2021-06-18 10:21:16 -0500 )
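
For reference, the simplest form of the prediction discussed in this thread is constant-velocity extrapolation from the send timestamp. The sketch below is only an illustration under that assumption; the function name `predict_position` and the tuple-based position and velocity are hypothetical, and a real flocking controller may need a richer motion model.

```python
def predict_position(position, velocity, stamp_s, now_s):
    """Extrapolate a neighbor's position from the time the message was sent.

    Assumes constant velocity between the message stamp and `now_s`
    (both in seconds on a shared clock, e.g. simulation time).
    """
    dt = now_s - stamp_s  # age of the message, not its arrival delay
    return tuple(p + v * dt for p, v in zip(position, velocity))


predicted = predict_position(
    position=(1.0, 2.0, 0.0),
    velocity=(0.5, 0.0, -1.0),
    stamp_s=10.0,
    now_s=10.5,
)
```

A higher exchange frequency shrinks `dt` and therefore the extrapolation error, but some message age always remains.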