ROS2 and network usage
Hi,
I'm using ROS2 Bouncy and Crystal, and I've noticed some very strange behavior.
I'm running several nodes on the same machine (my laptop). The nodes are standard talkers and listeners. The DDS implementation is FastRTPS. My laptop is connected to the internet through both an Ethernet connection and Wi-Fi.
When I start several nodes (one immediately after the other), it's common to see the Ethernet connection drop. This happens more frequently when the number of nodes I run is high (10 to 15 nodes).
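To reproduce the burst start described above in a repeatable way, here is a minimal Python sketch. It assumes the stock `demo_nodes_cpp` talker/listener, the interface name `enp0s31f6` from the logs in this post, and an arbitrarily chosen 10-second settle time; it checks the kernel's carrier flag in `/sys/class/net/<iface>/carrier` after starting all nodes back to back.

```python
import os
import signal
import subprocess
import time
from pathlib import Path
from typing import List

# Interface name taken from the dmesg/syslog output in this post; adjust as needed.
IFACE = "enp0s31f6"


def carrier_up(iface: str, sysfs: Path = Path("/sys/class/net")) -> bool:
    """True if the kernel reports a physical link (carrier) on `iface`."""
    try:
        return (sysfs / iface / "carrier").read_text().strip() == "1"
    except OSError:
        # Missing interface, or reading 'carrier' on an administratively down link.
        return False


def node_commands(n_pairs: int) -> List[List[str]]:
    """One stock demo talker and one listener per pair."""
    cmds = []
    for _ in range(n_pairs):
        cmds.append(["ros2", "run", "demo_nodes_cpp", "talker"])
        cmds.append(["ros2", "run", "demo_nodes_cpp", "listener"])
    return cmds


def burst_start(n_pairs: int, settle_s: float = 10.0) -> bool:
    """Start all nodes back to back, wait, and report whether the link survived."""
    # start_new_session puts each `ros2 run` and the node it spawns in its own
    # process group, so the whole group can be stopped afterwards.
    procs = [subprocess.Popen(cmd, start_new_session=True) for cmd in node_commands(n_pairs)]
    try:
        time.sleep(settle_s)
        return carrier_up(IFACE)
    finally:
        for p in procs:
            try:
                os.killpg(p.pid, signal.SIGINT)  # same as pressing Ctrl-C
            except ProcessLookupError:
                pass  # group already gone
        for p in procs:
            try:
                p.wait(timeout=10)
            except subprocess.TimeoutExpired:
                os.killpg(p.pid, signal.SIGKILL)  # last resort for a hung node
                p.wait()


if __name__ == "__main__":
    # 7 pairs = 14 nodes, inside the 10 to 15 range where the drop shows up most.
    print("link up after starting 14 nodes:", burst_start(n_pairs=7))
```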
EDIT: Linked to this GitHub issue: https://github.com/ros2/rmw_fastrtps/issues/255
EDIT: When the Ethernet connection drops, I see the following logs:
dmesg:
e1000e: enp0s31f6 NIC Link is Down
syslog:
Jan 24 11:26:15 asoragna kernel: [6536.084261] e1000e: enp0s31f6 NIC Link is Down
Jan 24 11:26:17 asoragna ntpd[3727]: Deleting interface #25 enp0s31f6, 10.102.1.49#123, interface stats: received=0, sent=5, dropped=0, active_time=72 secs
Jan 24 11:26:17 asoragna ntpd[3727]: Deleting interface #26 enp0s31f6, fe80::a49b:ede8:e675:83c7%2#123, interface stats: received=0, sent=0, dropped=0, active_time=72 secs
Jan 24 11:26:19 asoragna NetworkManager[785]: <info> [1548329179.3922] device (enp0s31f6): link disconnected (calling deferred action)
Jan 24 11:26:19 asoragna NetworkManager[785]: <info> [1548329179.3928] device (enp0s31f6): state change: activated -> unavailable (reason 'carrier-changed') [100 20 40]
Jan 24 11:26:19 asoragna NetworkManager[785]: <info> [1548329179.4094] dhcp4 (enp0s31f6): canceled DHCP transaction, DHCP client pid 21312
Jan 24 11:26:19 asoragna NetworkManager[785]: <info> [1548329179.4095] dhcp4 (enp0s31f6): state changed bound -> done
Jan 24 11:26:19 asoragna avahi-daemon[772]: Withdrawing address record for 10.102.1.49 on enp0s31f6.
Jan 24 11:26:19 asoragna avahi-daemon[772]: Leaving mDNS multicast group on interface enp0s31f6.IPv4 with address 10.102.1.49.
Jan 24 11:26:19 asoragna avahi-daemon[772]: Interface enp0s31f6.IPv4 no longer relevant for mDNS.
Jan 24 11:26:19 asoragna NetworkManager[785]: <info> [1548329179.4164] manager: NetworkManager state is now CONNECTED_LOCAL
Jan 24 11:26:19 asoragna avahi-daemon[772]: Withdrawing address record for fe80::a49b:ede8:e675:83c7 on enp0s31f6.
Jan 24 11:26:19 asoragna avahi-daemon[772]: Leaving mDNS multicast group on interface enp0s31f6.IPv6 with address fe80::a49b:ede8:e675:83c7.
Jan 24 11:26:19 asoragna avahi-daemon[772]: Interface enp0s31f6.IPv6 no longer relevant for mDNS.
Do you have any hints about what could be causing this?
Asked by alsora on 2019-01-24 05:24:55 UTC
Comments
Is it really disconnecting, or is it just that no other traffic gets through the connection? Those are different things.
Asked by gvdhoorn on 2019-01-24 06:12:41 UTC
It's really disconnecting, I get the popup message from Ubuntu showing it.
Asked by alsora on 2019-01-24 06:15:13 UTC
Does dmesg show anything related at that time? syslog lines?
Asked by gvdhoorn on 2019-01-24 06:16:08 UTC
Updated with dmesg output and partial syslog.
Asked by alsora on 2019-01-24 06:37:18 UTC
What sort of switch / router do you have this connected to?
Simple(r) consumer routers can crash when they are bombarded with too much traffic.
Not saying this is the cause, but something to look at.
Asked by gvdhoorn on 2019-01-24 07:41:36 UTC
Connecting both interfaces at once is a bad idea and causes a lot of problems. Use only one link and your system will run stably. Desktops are not set up to act as routers. Two default routes via two interfaces will cause a lot of pain.
Asked by ChriMo on 2019-01-24 12:08:06 UTC
BTW: how does DDS handle dual-homed hosts?
Asked by ChriMo on 2019-01-24 12:11:22 UTC
As long as one of the interfaces does not have a default route, things should be fine. And even if both do, when the two routes have different metrics, the one with the higher metric will almost never be used.
I agree, though, that such a setup doesn't make much sense to me.
Asked by gvdhoorn on 2019-01-24 12:44:43 UTC
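To check whether both interfaces actually install a default route, and which one wins, a small Python sketch can read the Linux IPv4 routing table from `/proc/net/route`. In that file, addresses are little-endian hex, flags are hex, and the metric is decimal; on Linux, among routes to the same destination the lowest metric is preferred. This is an illustration only; `ip route` gives the same information directly.

```python
import socket
import struct
from pathlib import Path
from typing import List, Tuple

RTF_UP = 0x0001  # route is usable (flag value from linux/route.h)


def hex_to_ipv4(hex_le: str) -> str:
    """Convert a little-endian hex address from /proc/net/route to dotted quad."""
    return socket.inet_ntoa(struct.pack("<L", int(hex_le, 16)))


def default_routes(route_table: str) -> List[Tuple[str, str, int]]:
    """(interface, gateway, metric) for every active IPv4 default route,
    preferred (lowest metric) first."""
    routes = []
    for line in route_table.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, dest, gateway = fields[0], fields[1], fields[2]
        flags, metric, mask = int(fields[3], 16), int(fields[6]), fields[7]
        # A default route has destination 0.0.0.0 and netmask 0.0.0.0.
        if dest == "00000000" and mask == "00000000" and flags & RTF_UP:
            routes.append((iface, hex_to_ipv4(gateway), metric))
    return sorted(routes, key=lambda r: r[2])


if __name__ == "__main__":
    found = default_routes(Path("/proc/net/route").read_text())
    for iface, gw, metric in found:
        print(f"{iface}: default via {gw}, metric {metric}")
    if len(found) > 1:
        print("More than one default route; the first one listed is preferred.")
```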
What I mean is: Wi-Fi is enabled while I'm connected to Ethernet. When the Ethernet drops, the laptop switches to the Wi-Fi connection without having to wait for it to connect; I guess that's the default setup on every laptop. The Wi-Fi keeps working. However, with Wi-Fi disabled the outcome is the same.
Asked by alsora on 2019-01-24 13:25:23 UTC
The main problem is when my developers connect both interfaces to networks with DHCP :-(
Don't allow this!
Asked by ChriMo on 2019-01-24 14:01:13 UTC
No, not really.
You still haven't told us what the rest of your network setup is.
The reason for the disconnect seems to be "carrier loss", either due to the switch/router or because some renegotiation is taking place.
Asked by gvdhoorn on 2019-01-24 14:25:28 UTC
I don't know the router model; however: 1) it's a company router, so it should perform better than a consumer one; 2) I experienced the problem after moving to a new office; 3) in case it's of interest, the disconnection happens even when no messages are published. I will try to get information about the router.
Asked by alsora on 2019-01-31 12:48:50 UTC
Yes, that is certainly important information.
Would you be able to try a different RMW implementation (i.e., a different DDS vendor)?
Asked by gvdhoorn on 2019-01-31 12:57:59 UTC
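In ROS 2 the vendor is selected at run time with the `RMW_IMPLEMENTATION` environment variable. A minimal Python helper follows, assuming the Bouncy/Crystal package names `rmw_fastrtps_cpp`, `rmw_opensplice_cpp` and `rmw_connext_cpp`, and that the ones you pass are actually installed:

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional

# RMW packages for the three DDS vendors discussed here (ROS 2 Bouncy/Crystal
# names); which ones are actually installed depends on your setup.
RMW_IMPLEMENTATIONS = ("rmw_fastrtps_cpp", "rmw_opensplice_cpp", "rmw_connext_cpp")


def env_with_rmw(rmw: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment (or of `base`) with RMW_IMPLEMENTATION set to `rmw`."""
    if rmw not in RMW_IMPLEMENTATIONS:
        raise ValueError(f"unknown RMW implementation: {rmw}")
    env = dict(os.environ if base is None else base)
    env["RMW_IMPLEMENTATION"] = rmw
    return env


if __name__ == "__main__":
    rmw = sys.argv[1] if len(sys.argv) > 1 else "rmw_opensplice_cpp"
    # Same as `RMW_IMPLEMENTATION=<rmw> ros2 run demo_nodes_cpp talker` in a shell.
    subprocess.run(["ros2", "run", "demo_nodes_cpp", "talker"], env=env_with_rmw(rmw))
```

Every node that should talk to the others must be started with the same value, since nodes from different vendors are not guaranteed to interoperate.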
No issues with OpenSplice. I currently have a different problem with Connext, but I will try it as soon as possible.
However, this may already suggest that it's a FastRTPS problem.
Asked by alsora on 2019-01-31 13:30:00 UTC
Yes, that would seem so.
I would suggest posting an issue on the FastRTPS tracker.
Please post a link to that issue here, so we keep things connected.
Asked by gvdhoorn on 2019-01-31 13:32:22 UTC