message is published delayed (inconsistent)

asked 2021-03-08 07:35:01 -0500

crnewton gravatar image

updated 2021-03-09 02:59:44 -0500

I have a problem where pub.publish(...) gets called, but the actual publishing is not done directly. This delay in publishing happens occasionally, so it looks like a race condition. Also, when I add a delay after publishing it works correctly, no problems at all!

def vacuumActivity():
   print("-before publishing-")
   print("-after publishing-")

   pose_goal = GetPoseGoal()
   group.go(pose_goal, wait=True)


in this code pub.publish("SETVACUUM0") will occasionally be published delayed. When this happens it will be published just before pub.publish("MOVEMENT_EXECUTED"), so the next time pub.publish is called.

When I uncomment time.sleep(1) there are no issues and everything is working as expected.

I am publishing to a tcp_client_node, which is connected to a TCP/IP socket and handles messages to/from a PLC over TCP/IP. (I can add the code of that node if necessary, but I'm try to keep this post clean). My function 'vacuumActivity()' is part of 'robotACtivityExecuter()' which is called from my main:

if __name__ == "__main__":
   rate = rospy.rate(10)
   rospy.Subscriber("tcp_received", String, tcp_received_callback)
   pub = rospy.Publisher("tcp_to_send", String, queue_size=10")

   while not rospy.is_shutdown():
       except rospy.ROSInterruptException:

This is not a problem with starting nodes, because this behaviour is not during startup, and it happens inconsistent. This is not a problem with the queue_size; all messages are being published, but sometimes it's not directly.

Anyone who can help me how to get this to work correctly without a delay?

edit retag flag offensive close merge delete


First: you're using Python, which is not known for its determinism. From your mention of "PLC" and the contents of the string ("SETVACUUM0") this seems to be a (traditional) automation context. Perhaps a more direct / less elaborate setup instead of using a separate node to proxy a TCP socket to a PLC via ROS topics and rospy would be worth investigating.

Having written that: it might be you're seeing the effects of Nagle here. See rospy: Publisher Initialisation on how to disable that (tcp_nodelay).

No guarantees though as you have a lot of moving parts (ie: intemediate steps and conversions), so there might be other causes.

Edit: re-reading your question: wouldn't a service-based design be better here? SETVACUUM0 sounds like an action which needs to be initiated, and you have strict time-ordering requirements on actions. That's not something pub-sub was created for.

gvdhoorn gravatar image gvdhoorn  ( 2021-03-09 03:22:12 -0500 )edit

Valid feedback, thank you! I 'more direct' setup is not desired, PLC is already deployed otherwise I'd prefer modbus. (or just add IO to my pc).I looked into services but I don't want to wait untill the tcp/ip call is finished, maybe I will setup a service to test the performance. Nagle sounds like a possible cause, I'll look into it. relevent nagle post

crnewton gravatar image crnewton  ( 2021-03-09 08:36:31 -0500 )edit

If the PLC has modbus support, couldn't you use one of the available modbus packages, or even integrate pymodbus directly into your node?

gvdhoorn gravatar image gvdhoorn  ( 2021-03-09 08:39:18 -0500 )edit

Yes that would be possible, but the PLC has other tcp/ip clients. we want the robot also to be a tcp/ip client so there is one general protocol for all the clients connected to the PLC.

crnewton gravatar image crnewton  ( 2021-03-09 08:53:09 -0500 )edit

If you're happy I'm happy, but it seems strange to me to implement, debug and maintain your own protocol when you can actually use a standardised and well supported fieldbus instead.

In my own applications I always try to go for the least amount of moving parts, as they all introduce additional points of potential failure, increase maintenance overhead and attack surface.

gvdhoorn gravatar image gvdhoorn  ( 2021-03-09 09:16:18 -0500 )edit