Nodes stop subscribing over time
I'm running ROS Hydro on Ubuntu 12.04.
I've written a simple Python script that calculates the bandwidth through a network interface and publishes it as an Int16, so there is no data acquisition from an external sensor or other hardware.
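For context, a minimal sketch of what such a node might look like. This is not the actual script: the interface name eth0, the node name, the 1 s sampling period, the kilobit unit and the use of /proc/net/dev are all assumptions made for illustration. Only the topic name bandwidth_msg and the Int16 type come from this post.

```python
def read_bytes(interface, proc_text):
    """Return (rx_bytes, tx_bytes) for one interface from /proc/net/dev text."""
    for line in proc_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        if name.strip() == interface:
            fields = data.split()
            # Field 0 is received bytes, field 8 is transmitted bytes.
            return int(fields[0]), int(fields[8])
    raise ValueError("interface not found: %s" % interface)


def bandwidth_kbps(prev, curr, interval_s):
    """Combined rx+tx rate in kilobits per second between two samples."""
    delta_bytes = (curr[0] - prev[0]) + (curr[1] - prev[1])
    return delta_bytes * 8 / 1000.0 / interval_s


def clamp_int16(value):
    """Truncate to an integer and clamp to the range an Int16 can hold."""
    return max(-32768, min(32767, int(value)))


def main(interface="eth0"):  # interface name is an assumption
    # ROS imports are kept local so the helpers above work without ROS.
    import rospy
    from std_msgs.msg import Int16

    rospy.init_node("bandwidth_monitor")  # hypothetical node name
    pub = rospy.Publisher("bandwidth_msg", Int16, queue_size=10)
    period_s = 1.0  # assumed sampling period
    rate = rospy.Rate(1.0 / period_s)
    with open("/proc/net/dev") as f:
        prev = read_bytes(interface, f.read())
    try:
        while not rospy.is_shutdown():
            rate.sleep()
            with open("/proc/net/dev") as f:
                curr = read_bytes(interface, f.read())
            rate_kbps = bandwidth_kbps(prev, curr, period_s)
            pub.publish(Int16(clamp_int16(rate_kbps)))
            prev = curr
    except rospy.ROSInterruptException:
        pass
```

Calling main() on a machine with ROS sourced would start the publisher. The file is opened and closed on every sample so that no handles accumulate over a long run.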
At the start, everything seems to work fine. I left it running overnight, and the next morning some of my nodes were no longer able to subscribe to the topic, while others were still subscribed. rostopic echo could not subscribe to it either. According to rqt_graph, both the subscribing nodes and the publisher are still alive, and the arrows still show the publish/subscribe connections.
Has anyone encountered this, or does anyone have an idea how I should go about debugging it? I've started looking at the .ros/log folder, but I can't seem to make sense of it. Upgrading to Ubuntu 14.04 and Indigo is an option, but I would like to try other alternatives before an upgrade.
Edit: I've upgraded to Indigo on Ubuntu 14.04 LTS, but I'm still seeing the same issue.
This also doesn't look like a rostopic issue. When the problem occurs, the existing nodes continue to receive the topic, but new nodes and rostopic cannot subscribe. To fix it, I have to kill the troubled node and let it respawn or rerun it; after that, new nodes are able to subscribe.
Edit2:
From Ahendrix's post, I thought I would try ROS_HOSTNAME. Here was my setup:
Main computer (main-computer@192.168.1.2): I edited /etc/hosts so that main-computer resolves to 127.0.0.1.
export ROS_MASTER_URI=http://main-computer:11311
export ROS_HOSTNAME=main-computer
Remote computer (remote-computer@192.168.1.3): I edited /etc/hosts so that main-computer resolves to 192.168.1.2.
export ROS_MASTER_URI=http://main-computer:11311
export ROS_IP=192.168.1.3
The idea was to make sure that even if my DHCP lease expired and a new IP was assigned, all the old and new nodes would still be able to resolve "main-computer" to its local IP address.
Unfortunately, this still didn't work. While "main-computer" had this problem, I logged onto "main-computer" and ran rostopic echo. master.log showed that the master was actually adding the subscriber:
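One way to confirm what each machine actually resolves the name to is a small check like the one below. It is only a sketch: the helper names are made up for illustration, and it uses nothing beyond the standard socket module. With the /etc/hosts entries described above, the main computer would be expected to report a loopback address and the remote computer 192.168.1.2. Whether that difference has anything to do with the problem is not established here.

```python
import socket


def is_loopback(ip):
    """True for IPv4 addresses in 127.0.0.0/8."""
    return ip.split(".")[0] == "127"


def check_name(hostname, resolver=socket.gethostbyname):
    """Resolve a hostname and report whether the result is a loopback address.

    The resolver argument exists only so the function can be exercised
    without touching the real /etc/hosts or DNS.
    """
    ip = resolver(hostname)
    return {"hostname": hostname, "ip": ip, "loopback": is_loopback(ip)}
```

Running check_name("main-computer") on both machines and comparing the results shows whether they agree on the address.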
[rosmaster.master][INFO] 2015-03-27 12:30:47,573: +SUB [/bandwidth_msg] /rostopic_11324_1427484647396 http://main-computer:42724/
I looked at the rostopic_11324_1427484647396.log file. Everything looked normal at first, but shortly afterwards this error message appeared:
[rospy.internal][WARNING] 2015-03-27 12:32:32,352: Unknown error initiating TCP/IP socket to main-computer:45391 (http://main-computer:52459/): Traceback (most recent call last):
File "/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 557, in connect
self.read_header()
File "/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 619, in read_header
self._validate_header(read_ros_handshake_header(sock, self.read_buff, self.protocol.buff_size))
File "/opt/ros/indigo/lib ...
Sounds like it might be some low-level instability. A first debugging step would be to remove your calculations/interfacing and just publish random data. If the problem still occurs, it's some ROS instability. If everything works fine, it has something to do with your code (not necessarily your faul
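The random-data test suggested here could be sketched roughly as follows. The node name and the 1 Hz rate are assumptions. The topic name bandwidth_msg and the Int16 type are taken from the question, so existing subscribers would not need to change.

```python
import random


def random_int16(rng=random):
    """Return a random value covering the full Int16 range."""
    return rng.randint(-32768, 32767)


def main():
    # ROS imports are kept local so random_int16 can be used without ROS.
    import rospy
    from std_msgs.msg import Int16

    rospy.init_node("random_int16_publisher")  # hypothetical node name
    pub = rospy.Publisher("bandwidth_msg", Int16, queue_size=10)
    rate = rospy.Rate(1)  # assumed publishing rate of 1 Hz
    try:
        while not rospy.is_shutdown():
            pub.publish(Int16(random_int16()))
            rate.sleep()
    except rospy.ROSInterruptException:
        pass
```

Calling main() in place of the real script and leaving it running for the same length of time should show whether new subscribers still fail without any of the bandwidth code involved.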
This happened again, but unfortunately I didn't get a chance to try that setup. It might be a rostopic issue. I didn't mention that the node runs with ROS_MASTER_URI set to the machine's local IP address rather than localhost. I saw the rostopic issue appear after the connection was lost and then re-established.
When this happened, I ran rostopic pub on the troubled topic and received a warning:
[WARN] [WallTime: 1423217612.971032] Inbound TCP/IP connection failed: connection from sender terminated before handshake header received. 0 bytes were received. Please check sender for additional details.