DDS discovery not working with ROS2 Humble and Fast DDS when nodes are on same machine

asked 2022-09-27 12:03:36 -0500

IamNotaRoBot

updated 2022-09-30 09:57:23 -0500

Discovery is not working when two ROS2 nodes run on the same machine using Fast DDS. Discovery from one machine to another works, but on the same machine the two Fast DDS nodes do not discover one another. BTW, the ros2 node list command is used to test discovery results.

The setup here is ROS2 Humble (installed last week) on Ubuntu 22.04, installed on both a VirtualBox VM and a Raspberry Pi4. So Pi4 local-to-local and VM local-to-local do not work, while Pi4-to-VM and VM-to-Pi4 do work. This was also tested with the simple talker/listener demo and it failed in the same way that discovery fails (local-to-local doesn't work but local-to-remote does). Meanwhile on Windows 10 with Humble, discovery and the local-to-local talker/listener demo both work perfectly.

The only solution that works for two nodes on the same machine is switching to CycloneDDS with both nodes. It also "kind of" works to run CycloneDDS on one node and Fast DDS on the other but it's not consistent enough to consider it "working". I can get into details of this inconsistency but mixing DDS like this isn't an ideal solution anyway!
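For reference, a minimal sketch of how the DDS implementation can be selected per node. It assumes the standard RMW_IMPLEMENTATION environment variable and the rmw_cyclonedds_cpp / rmw_fastrtps_cpp package names; the helper names env_for_rmw and start_demo are hypothetical and only for illustration.

```python
import os
import subprocess


def env_for_rmw(rmw_name, base_env=None):
    """Return a copy of the environment with RMW_IMPLEMENTATION set."""
    env = dict(os.environ if base_env is None else base_env)
    env["RMW_IMPLEMENTATION"] = rmw_name
    return env


def start_demo(executable, rmw_name):
    """Start a demo_nodes_cpp executable (talker or listener) with the given RMW.

    Assumes ros2 is on PATH, i.e. the ROS 2 setup script was sourced.
    """
    return subprocess.Popen(
        ["ros2", "run", "demo_nodes_cpp", executable],
        env=env_for_rmw(rmw_name),
    )


# Example (not executed here): run both nodes with CycloneDDS.
# listener = start_demo("listener", "rmw_cyclonedds_cpp")
# talker = start_demo("talker", "rmw_cyclonedds_cpp")
```

Passing the variable through a copied environment keeps the choice local to each process, so the two nodes can be started with the same or different implementations without touching the shell profile.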

I am new to the Humble release (last week) and only have a little experience with the Galactic release but, on Galactic, the default CycloneDDS was used (and without issues). I don't have any prior experience with Fast DDS so it's entirely possible I would have this same issue with Galactic + Fast DDS. But I have spent several hours trying to educate myself on Fast DDS, including looking into XML profile settings that might be causing issues. Changes to XML settings were necessary to get CycloneDDS working correctly so it was expected this might be necessary with Fast DDS. Unfortunately, so far no solutions and no magic XML settings have been discovered.

It was also suspected the issue might be related to multicast local loopback but no solutions were found and it really shouldn't be necessary to fiddle with kernel or network settings. That said, it's possible changes to the OS broke things... though nothing specific comes to mind (I don't normally fiddle). And whatever it might be, it doesn't break CycloneDDS.
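One way to probe the multicast loopback suspicion independently of ROS is a small UDP test. This is only a sketch: it uses 239.255.0.1, the default RTPS discovery multicast group, and port 17400, which is an arbitrary test port chosen here (an assumption) so the probe does not collide with real DDS traffic. A True result only shows that a locally sent multicast datagram is delivered back to the same host; it does not prove that Fast DDS discovery itself is healthy.

```python
import socket
import struct

GROUP = "239.255.0.1"  # default RTPS discovery multicast group
PORT = 17400           # arbitrary test port (assumption), not a DDS port


def multicast_loopback_works(group=GROUP, port=PORT, timeout=2.0):
    """Send one multicast datagram and report whether this host receives it."""
    payload = b"loopback-probe"
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # Receiver: bind to the port and join the multicast group.
        rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        rx.bind(("", port))
        mreq = struct.pack(
            "4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0")
        )
        rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        rx.settimeout(timeout)

        # Sender: keep the datagram on the local segment and loop it back.
        tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
        tx.sendto(payload, (group, port))

        data, _ = rx.recvfrom(1024)
        return data == payload
    except OSError:
        # Covers timeouts and "no route / no multicast interface" errors.
        return False
    finally:
        rx.close()
        tx.close()
```

Calling multicast_loopback_works() on VM1 and on Pi1 gives a quick yes/no answer that can be compared with the Windows machine where local discovery works.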

To summarize:

  1. VM1: VirtualBox VM, Ubuntu 22.04, Humble (default install), FastDDS (default install)
  2. Pi1: Pi4, Ubuntu 22.04, Humble (default install), FastDDS (default install)
  3. PC1: Windows 10, Humble (default install), FastDDS (default install)

This does NOT work:

  • VM1: ros2 run demo_nodes_cpp listener
  • VM1: ros2 run demo_nodes_cpp talker

This does NOT work:

  • Pi1: ros2 run demo_nodes_cpp listener
  • Pi1: ros2 run demo_nodes_cpp talker

This DOES work:

  • VM1: ros2 run demo_nodes_cpp listener
  • Pi1: ros2 run demo_nodes_cpp talker

This DOES work:

  • PC1: ros2 run demo_nodes_cpp listener
  • PC1: ros2 run demo_nodes_cpp talker

This DOES work:

  • PC1: ros2 run demo_nodes_cpp listener
  • VM1, Pi1: ros2 run demo_nodes_cpp talker
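The scenarios above can be checked in a repeatable way by parsing the output of ros2 node list. The sketch below assumes the demo nodes register as /talker and /listener and that the command prints one node name per line; the function names are hypothetical.

```python
import subprocess

# Assumed node names of the demo_nodes_cpp talker/listener demos.
EXPECTED = {"/talker", "/listener"}


def parse_node_list(output):
    """Return the set of node names found in `ros2 node list` output."""
    return {
        line.strip()
        for line in output.splitlines()
        if line.strip().startswith("/")
    }


def missing_nodes(output, expected=EXPECTED):
    """Return the expected nodes that were not discovered, sorted by name."""
    return sorted(set(expected) - parse_node_list(output))


def check_discovery():
    """Run `ros2 node list` (ros2 must be on PATH) and report missing nodes."""
    result = subprocess.run(
        ["ros2", "node", "list"], capture_output=True, text=True, timeout=30
    )
    return missing_nodes(result.stdout)
```

With both demo nodes running, an empty list from check_discovery() means both were discovered from that machine; anything else names the nodes that were not seen.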

Any help would be greatly appreciated!

Andy

Comments

It's slightly confusing when you first write everything is "on the same machine", then later mention a couple of Virtual Machines and then also RPis. Could you perhaps clarify how many actual hosts are involved, what OS they are running, where the VMs come into this, and whether you are trying to run a multi-ROS-version setup or are comparing a previous Foxy install to your current Humble install?

A simple diagram (some boxes with names) could already help clear things up.

> I can get into details of this inconsistency but mixing DDS like this isn't an ideal solution anyway!

It also isn't expected to work. See ros2/rmw_cyclonedds#184 for instance.

gvdhoorn (2022-09-30 01:05:57 -0500)

Sorry, did a brain dump and ended up with spaghetti. What I meant by "VM-to-VM" was "VM-to-same-VM", not to a second VM. I've updated the question and added some scenarios that work and don't work.

As mentioned previously, all of the above work perfectly after switching to CycloneDDS. And mixing flavors of DDS is indeed "expected" to work. The fact that it doesn't is one of the great ironies of the DDS standard!

I find it hard to believe that I'm the only person that's tried the talker/listener demo on nodes on the same Ubuntu 22.04 machine, so clearly it's a problem on my end!

IamNotaRoBot (2022-09-30 09:56:25 -0500)

> And mixing flavors of DDS is indeed "expected" to work. The fact that it doesn't is one of the great ironies of the DDS standard!

No, it's not supposed to work.

DDS != ROS RMW.

Bytes can be exchanged using DDS infrastructure, but if the (de)serialisation doesn't match -- which it doesn't between Cyclone and Fast-DDS -- nothing will work.

That's what the issue I linked documents.

gvdhoorn (2022-09-30 11:17:56 -0500)

Hmmm, it looks like the interoperability issue you linked has both a DDS and RMW component. But it's reported as an issue which usually means it is in fact supposed to work.

Also, I believe (de)serialization is part of the DDS specification which in theory means something calling itself "DDS" should be interoperable with something else calling itself "DDS". I realize that doesn't always happen in practice... and it is nothing new in the DDS world.

Either way, I can't get Fast DDS to talk to Fast DDS on the same machine!

IamNotaRoBot (2022-09-30 12:28:03 -0500)

This could be a networking issue... does "ping localhost" work on the VM1 and the Pi1?

MarcoB (2022-10-04 13:43:08 -0500)

Yes, ping localhost works. Also note that everything works perfectly with Cyclone DDS. That's not to say that Cyclone doesn't just happen to work around a network issue, but it definitely indicates the network isn't completely broken. Also, both VM1 and Pi4 were "clean" installs of Ubuntu 22.04 so there is no reason the network should not work normally.

IamNotaRoBot (2022-10-04 14:10:38 -0500)

Weird... A desperate attempt could be disabling the ufw firewall: https://linuxconfig.org/how-to-enable... I also stumbled upon a weird issue with Humble that I still haven't figured out, but that involved services, not topics: https://answers.ros.org/question/4027... Here is a discussion topic that might be closer to your case: https://discourse.ros.org/t/fastdds-w...

MarcoB (2022-10-04 16:09:12 -0500)

Good catch and, yes, some of those issues are very similar, notably the ones where discovery isn't working normally. Thanks for pointing those out, @MarcoB. Guess I'll wait for Fast-DDS patches in a future Humble release... until then, I'll stick with CycloneDDS.

IamNotaRoBot (2022-10-04 18:40:33 -0500)