Debugging - TCP/IP connection failed errors

asked 2019-06-29 10:15:57 -0500

jannsta gravatar image

updated 2019-06-29 10:19:08 -0500

I have a codebase that runs reliably on a particular pc with ubuntu 16.04/kinetic, I tried running this code on another PC with 18.04/Melodic and the setup only works (i.e the ros server correctly parses messages between nodes and the robot behaves as expected) intermittently. While trying to diagnose the problem on the new PC I notice that:

"Inbound TCP/IP connection failed: connection from sender terminated before handshake header received. 0 bytes were received. Please check sender for additional details. "

Is often printed in the console. In addition, if I look in the roslaunch logs I can see many of the nodes reporting errors like:

[rosout][DEBUG] 2019-06-27 05:25:42,632: connecting to thehive 34707
[rospy.internal][INFO] 2019-06-27 05:25:42,662: topic impl's ref count is zero, deleting topic /optic_flow_node/matched_filter_out...
[rospy.internal][INFO] 2019-06-27 05:25:42,662: topic[/optic_flow_node/matched_filter_out] removing connection to 
http://thehive:37109/
[rosout][DEBUG] 2019-06-27 05:25:42,663: connecting to thehive 41453
[rospy.topics][ERROR] 2019-06-27 05:25:42,665: Traceback (most recent call last):
File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/topics.py", line 326, in close
c.close()
File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 841, in close
self.socket.close()
AttributeError: 'NoneType' object has no attribute 'close'

[rospy.internal][WARNING] 2019-06-27 05:25:42,666: Unknown error initiating TCP/IP socket to thehive:41453 (http://thehive:37109/): Traceback (most recent call last):
File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 558, in connect
self.read_header()
File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 653, in read_header
self._validate_header(read_ros_handshake_header(sock, self.read_buff, self.protocol.buff_size))
AttributeError: 'NoneType' object has no attribute 'buff_size'

or

[rosout][INFO] 2019-06-27 05:25:42,668: receiving optic flow topic
[rospy.internal][INFO] 2019-06-29 15:15:56,082: topic[/clock] removing connection to http://thehive:41299/    
[rospy.internal][WARNING] 2019-06-29 15:15:56,083: Unknown error initiating TCP/IP socket to thehive:56837 (http://thehive:41299/): Traceback (most recent call last):
File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 558, in connect
self.read_header()
File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 653, in read_header
self._validate_header(read_ros_handshake_header(sock, self.read_buff, self.protocol.buff_size))
AttributeError: 'NoneType' object has no attribute 'buff_size'
[rospy.topics][ERROR] 2019-06-29 15:15:56,084: Traceback (most recent call last):
File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/topics.py", line 326, in close
c.close()
File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 841, in close
self.socket.close()
AttributeError: 'NoneType' object has no attribute 'close'

I'm not 100% sure that these errors are causing the functional problems I am seeing - but so that I can at least rule it out can anyone advise what this type of warning message might suggest, and/or how I ... (more)

edit retag flag offensive close merge delete

Comments

Are you creating Publishers or Subscribers in the bodies of callbacks? That could explain this -- as they'd go out of scope so fast that the middleware has no chance of actually delivering the bytes to clients.

gvdhoorn gravatar image gvdhoorn  ( 2019-06-29 10:33:23 -0500 )edit
1

I'm pretty sure I don't do this - I do some post processing of data in the callbacks so I guess there is a chance that new data would come in while this processing is happening - could this be a problem? I'm trying to pin down if there is a particular topic that is causing the failure. From the traceback above ("topic impl's ref count is zero...") - can I infer that /optic_flow_node/matched_filter_out is causing the TCP/IP error? Or should this information be treated separately?

jannsta gravatar image jannsta  ( 2019-06-29 11:50:36 -0500 )edit

Did you solve this error? I have the same problem.

fabbro gravatar image fabbro  ( 2020-07-09 02:28:42 -0500 )edit
1

Unfortunately I didn't ever find the root cause of this. It's a while since I worked on this task but I would be keen to know what the issue was if you manage to resolve it.

jannsta gravatar image jannsta  ( 2020-07-09 02:38:48 -0500 )edit

I had problem with ROS tcp that data from topics was lost when the network connection was lost. The solution was configure the /etc/hosts file with the right hostname of the machine. Try to check your ros network configuration. http://wiki.ros.org/ROS/NetworkSetup#...

VitorHirozawa gravatar image VitorHirozawa  ( 2021-03-26 10:03:19 -0500 )edit