rosmaster stops responding

asked 2017-10-04 15:02:49 -0600

davr

At some point rosmaster stops responding to queries. The issue is intermittent, but it has been happening more often as our system grows more complex, with more nodes, more topics, etc. The symptom is that rosmaster accepts TCP connections but never sends any data in reply. Nodes that are already running and already subscribed to topics continue to work, since they talk directly to each other, but anything that needs to subscribe to a new topic or list existing nodes hangs. Here is the end of the output of running strace rosnode list:

socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
connect(4, {sa_family=AF_INET, sin_port=htons(11311), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
sendto(4, "POST /RPC2 HTTP/1.1\r\nHost: 127.0"..., 339, 0, NULL, 0) = 339
recvfrom(4,
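
The trace shows the connection and the request succeeding, with the hang on the read. A minimal sketch of a probe that tells "connected but no reply" apart from "cannot connect" is below. It assumes the master listens on 127.0.0.1:11311 as in the trace, and that the standard Master API call getSystemState is an acceptable query; the function name probe_master, the caller id /master_probe, and the timeout value are hypothetical choices for illustration.

```python
import http.client
import socket
import xmlrpc.client


def probe_master(host="127.0.0.1", port=11311, timeout=5.0):
    """Return 'ok', 'hung' (connected but no reply), or 'unreachable'."""
    # Build the XML-RPC request body by hand so the socket timeout
    # can be controlled through http.client.
    body = xmlrpc.client.dumps(("/master_probe",), methodname="getSystemState")
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.connect()
    except OSError:
        # Refused, unroutable, or timed out during the TCP handshake.
        return "unreachable"
    try:
        conn.request("POST", "/RPC2", body, {"Content-Type": "text/xml"})
        conn.getresponse().read()
        return "ok"
    except socket.timeout:
        # The symptom from the trace: the request was sent, nothing came back.
        return "hung"
    except (OSError, http.client.HTTPException):
        return "unreachable"
    finally:
        conn.close()
```

Calling probe_master() periodically and logging the first time it returns "hung" would give a timestamp to compare against the other log files.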

The problem tends to happen after 20 minutes to an hour of the system running, so it's not super fast to reproduce. I don't see anything obviously wrong in various log files.

Any pointers on where I should be looking for problems would be helpful. Thanks.


Comments

This may be a known issue (perhaps even with a PR already). Check the ros/ros_comm/issues tracker.

gvdhoorn (2017-10-04 15:22:30 -0600)

Thanks @gvdhoorn. I found a mention of CLOSE_WAIT sockets and noticed I had a lot of those. I will try the suggested fix and see if that solves my problem.

davr (2017-10-06 13:00:56 -0600)
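
For anyone wanting to track the CLOSE_WAIT count over time, here is a rough sketch that tallies TCP socket states from the text of /proc/net/tcp. It assumes a Linux host, counts sockets system-wide rather than per process, and does not cover IPv6 sockets (those are listed in /proc/net/tcp6 with the same state codes). The function name count_tcp_states is hypothetical.

```python
from collections import Counter

# State codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp):
    """Count sockets per TCP state, given the contents of /proc/net/tcp."""
    counts = Counter()
    # Skip the header row; each remaining row is one socket.
    for line in proc_net_tcp.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 4:
            # fields: slot, local address, remote address, state, ...
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts
```

Reading the file with open("/proc/net/tcp") and printing count_tcp_states(...)["CLOSE_WAIT"] every minute or so should show whether the number climbs steadily before the master stops responding.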