TCP errors with Kinetic (updated)

asked 2017-08-29 04:38:08 -0500

Sietse

updated 2017-09-19 04:38:16 -0500

Hello,

Please see the 2nd UPDATE below.

I upgraded our systems to Ubuntu 16.04 and ROS from Indigo to Kinetic. Everything works fine, except that communication between the PC (used for roscore, commands and visualisation) and our robots (ARM platform, also running Ubuntu 16.04 and ROS Kinetic) mostly does not work.

General communication over wifi works fine (updating, installing packages and so on). The netcat tests using port 11311 also work fine.
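For anyone wanting to repeat the port check without netcat, here is a minimal sketch in Python 3. The function name `can_connect` and the default timeout are my own choices, not anything from ROS. Note that it only confirms that the TCP handshake to the master port completes; it does not exercise the request/response exchange that hangs later.

```python
import socket


def can_connect(host, port=11311, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection performs the full TCP handshake, then we close again.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

Calling `can_connect("<ip of the PC running roscore>")` on the robot should return True if the master port is reachable.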

But when I start roscore on the PC and run "rosnode list" on the robot, it hangs most of the time. Using Wireshark I see duplicate TCP ACKs and TCP retransmissions. Normally I never see them (the connection is very good). Please see the Wireshark screenshot.

[Screenshot: nexus_wireshark.png]
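A rough way to quantify the retransmissions without Wireshark is to read the kernel's TCP counters before and after a hanging command. The sketch below (Python 3, Linux only) parses the "Tcp:" section of /proc/net/snmp; the helper names are hypothetical. The counter is system-wide, so it only gives a meaningful difference on an otherwise quiet machine.

```python
def parse_tcp_counters(snmp_text):
    """Parse the two 'Tcp:' lines of /proc/net/snmp into a dict of ints."""
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp section found")
    # First line holds the counter names, second line the values.
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def read_retrans_segs(path="/proc/net/snmp"):
    """Return the number of TCP segments retransmitted since boot."""
    with open(path) as f:
        return parse_tcp_counters(f.read())["RetransSegs"]
```

Reading `read_retrans_segs()` once before and once after "rosnode list" and subtracting the two values shows how many segments were retransmitted in between.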

I get the same when doing "rostopic echo <topic>".

What could be wrong here? Why does ROS trigger this when other communication does not?

UPDATE: I extensively tested the connections with the iperf3 program. Both TCP and UDP communication work perfectly.

The problem does NOT arise when I use the wired connection to the robot.

"rosnode list" on the robot almost always hangs. But the command "rosparam list" always works! What does this mean?

The command "rostopic list" works most of the time.

When I roslaunch a simulated robot on the PC, "rosnode list" on the robot DOES work most of the time. How is this triggered? "rostopic list" also works, but "rostopic echo <topic>" on the robot hangs.

============================================================================

2ND UPDATE: A little bit of progress...

Recall that the current USB wifi dongle (chipset RTL8811AU, 2.4 + 5 GHz) worked perfectly on Ubuntu 14.04/Indigo. I now use the same driver, rtl8812au.

I tried 4 other USB dongles, of which only one actually worked on the ARM board. It has the AR9002U chipset and does not need an out-of-kernel driver. But I use an out-of-kernel driver that works perfectly.

With this driver ROS also works perfectly, but it only supports 2.4 GHz and I really need 5 GHz. Still, this suggests that the problem is in the driver.

But I am not convinced. I isolated the problem somewhat and found that it seems to be in an XML-RPC call to roscore. Using "rosnode ping --all" I followed the flow in the code. The call "getSystemState" is made, but sometimes (say 1 out of 4 calls) the response does not come back. In roscore on the PC the call is received and (I think) a response is sent. And as far as I can see, using strace, there is no further network activity on the robot.
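To put a number on the "1 out of 4" estimate, the getSystemState call can be repeated outside of rosnode with an explicit timeout. This is only a sketch under assumptions: it is written for Python 3 (`xmlrpc.client`; a stock Kinetic install uses Python 2, where the module is `xmlrpclib`), the names `TimeoutTransport`, `probe` and `probe_master` are made up for this example, and "/xmlrpc_probe" is an arbitrary caller id.

```python
import http.client
import time
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """XML-RPC transport that gives up after `timeout` seconds instead of hanging."""

    def __init__(self, timeout):
        super().__init__()
        self._timeout = timeout

    def make_connection(self, host):
        conn = super().make_connection(host)
        conn.timeout = self._timeout  # used by http.client when the socket is opened
        return conn


def probe(call, attempts=20, pause=0.2):
    """Invoke `call` repeatedly; return (failure count, latencies of successful calls)."""
    failures, latencies = 0, []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            call()
            latencies.append(time.monotonic() - start)
        except (OSError, xmlrpc.client.Error, http.client.HTTPException):
            # OSError includes socket timeouts and refused or reset connections.
            failures += 1
        time.sleep(pause)
    return failures, latencies


def probe_master(master_uri, timeout=5.0, **kwargs):
    """Call getSystemState on the ROS master at `master_uri` repeatedly."""
    master = xmlrpc.client.ServerProxy(master_uri, transport=TimeoutTransport(timeout))
    return probe(lambda: master.getSystemState("/xmlrpc_probe"), **kwargs)
```

Running `probe_master(os.environ["ROS_MASTER_URI"])` on the robot, once over wifi and once over the wired connection, would show whether the lost responses are reproducible without any other ROS client code involved.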

If I do the same with a simple XML-RPC client/server setup there are never errors! I cannot get networking to fail in my tests, except with ROS as described.

Any ideas on how to test further?

Again, thanks in advance, Sietse


Comments

First make sure you don't have any 'old' ROS network env vars that were only used / needed / correct on your Trusty installation. Especially ROS_HOSTNAME, ROS_IP, ROS_MASTER_URI and related variables.

gvdhoorn ( 2017-08-29 04:53:02 -0500 )
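The environment check suggested in this comment could be scripted roughly as follows. This is a sketch, not an official ROS tool: the function name `check_ros_network_env` and the wording of the warnings are assumptions. The one rule it relies on from the ROS documentation is that ROS_IP and ROS_HOSTNAME are mutually exclusive, with ROS_HOSTNAME taking precedence when both are set.

```python
from urllib.parse import urlparse


def check_ros_network_env(env):
    """Return a list of warnings about the ROS network variables in `env`."""
    warnings = []
    master_uri = env.get("ROS_MASTER_URI")
    if not master_uri:
        warnings.append("ROS_MASTER_URI is not set")
    else:
        parsed = urlparse(master_uri)
        if parsed.scheme != "http" or not parsed.hostname:
            warnings.append("ROS_MASTER_URI should look like http://<host>:<port>")
        elif parsed.hostname in ("localhost", "127.0.0.1"):
            warnings.append("ROS_MASTER_URI points at localhost, "
                            "which only works on the machine running roscore")
    if env.get("ROS_IP") and env.get("ROS_HOSTNAME"):
        warnings.append("both ROS_IP and ROS_HOSTNAME are set; ROS_HOSTNAME takes precedence")
    if not env.get("ROS_IP") and not env.get("ROS_HOSTNAME"):
        warnings.append("neither ROS_IP nor ROS_HOSTNAME is set; "
                        "the machine hostname will be advertised")
    return warnings
```

Passing `os.environ` on both the PC and the robot prints nothing suspicious if the function returns an empty list.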

Thanks for the quick response. I checked again and all seems well. I have only set ROS_IP and ROS_MASTER_URI, both to the correct IP addresses. In fact I only did some s/indigo/kinetic/ in the settings. And both machines are fresh installs.

Sietse ( 2017-08-29 05:59:15 -0500 )

Unless another board member has experienced exactly the same thing, I feel this is going to be difficult to diagnose like this.

In-place upgrades can be the cause of trouble, but this seems like a weird thing to happen. Can you perhaps check with an Ubuntu LiveUSB and see what happens?

gvdhoorn ( 2017-08-30 03:08:45 -0500 )

Also: check for any firewalls that the Xenial installation might have.

gvdhoorn ( 2017-08-30 03:09:16 -0500 )

And: does rostopic list et al. work when you run it locally? So roscore locally on your robot, then rostopic list also locally? And on the PC? Does it keep working if you set ROS_IP and ROS_MASTER_URI to the IP of the robot explicitly (instead of using the localhost default)?

gvdhoorn ( 2017-08-30 03:10:36 -0500 )

If I connect the robot to the router with a wired connection instead of the wireless one, all is well. So it has to do with the wireless connection. I'll investigate further.

Sietse ( 2017-08-30 03:14:13 -0500 )

Hm, ok. Well there are obviously updates to the wireless drivers between Trusty and Xenial, so that could certainly be an issue.

gvdhoorn ( 2017-08-30 03:15:55 -0500 )

Can you test with a different wireless NIC? A USB one perhaps? Just to rule out drivers.

Seeing as some things work some of the time, and only ROS binaries seem affected, it's probably not going to solve anything, but it would provide one more observation.

gvdhoorn ( 2017-09-05 07:07:12 -0500 )