Robotics StackExchange | Archived questions

message is published delayed (inconsistent)

I have a problem where pub.publish(...) gets called, but the actual publishing is not done directly. This delay in publishing happens occasionally, so it looks like a race condition. Also, when I add a delay after publishing it works correctly, no problems at all!

def vacuumActivity():
   print("-before publishing-")
   pub.publish("SETVACUUM0")
   #time.sleep(1)
   print("-after publishing-")

   pose_goal = GetPoseGoal()
   group.go(pose_goal, wait=True)
   group.stop()

   pub.publish("MOVEMENT_EXECUTED")
   return

in this code pub.publish("SETVACUUM0") will occasionally be published delayed. When this happens it will be published just before pub.publish("MOVEMENT_EXECUTED"), so the next time pub.publish is called.

When I uncomment time.sleep(1) there are no issues and everything is working as expected.

I am publishing to a tcpclientnode, which is connected to a TCP/IP socket and handles messages to/from a PLC over TCP/IP. (I can add the code of that node if necessary, but I'm try to keep this post clean). My function 'vacuumActivity()' is part of 'robotACtivityExecuter()' which is called from my main:

if __name__ == "__main__":
   rospy.init_node('activity_executer')
   rate = rospy.rate(10)
   rospy.on_shutdown(shutdownHook)
   rospy.Subscriber("tcp_received", String, tcp_received_callback)
   pub = rospy.Publisher("tcp_to_send", String, queue_size=10")

   while not rospy.is_shutdown():
      try:
          robotActivityExecuter()
          rate.sleep()
       except rospy.ROSInterruptException:
          pass

This is not a problem with starting nodes, because this behaviour is not during startup, and it happens inconsistent. This is not a problem with the queue_size; all messages are being published, but sometimes it's not directly.

Anyone who can help me how to get this to work correctly without a delay?

Asked by crnewton on 2021-03-08 08:35:01 UTC

Comments

First: you're using Python, which is not known for its determinism. From your mention of "PLC" and the contents of the string ("SETVACUUM0") this seems to be a (traditional) automation context. Perhaps a more direct / less elaborate setup instead of using a separate node to proxy a TCP socket to a PLC via ROS topics and rospy would be worth investigating.

Having written that: it might be you're seeing the effects of Nagle here. See rospy: Publisher Initialisation on how to disable that (tcp_nodelay).

No guarantees though as you have a lot of moving parts (ie: intemediate steps and conversions), so there might be other causes.


Edit: re-reading your question: wouldn't a service-based design be better here? SETVACUUM0 sounds like an action which needs to be initiated, and you have strict time-ordering requirements on actions. That's not something pub-sub was created for.

Asked by gvdhoorn on 2021-03-09 04:22:12 UTC

Valid feedback, thank you! I 'more direct' setup is not desired, PLC is already deployed otherwise I'd prefer modbus. (or just add IO to my pc).I looked into services but I don't want to wait untill the tcp/ip call is finished, maybe I will setup a service to test the performance. Nagle sounds like a possible cause, I'll look into it. relevent nagle post

Asked by crnewton on 2021-03-09 09:36:31 UTC

If the PLC has modbus support, couldn't you use one of the available modbus packages, or even integrate pymodbus directly into your node?

Asked by gvdhoorn on 2021-03-09 09:39:18 UTC

Yes that would be possible, but the PLC has other tcp/ip clients. we want the robot also to be a tcp/ip client so there is one general protocol for all the clients connected to the PLC.

Asked by crnewton on 2021-03-09 09:53:09 UTC

If you're happy I'm happy, but it seems strange to me to implement, debug and maintain your own protocol when you can actually use a standardised and well supported fieldbus instead.

In my own applications I always try to go for the least amount of moving parts, as they all introduce additional points of potential failure, increase maintenance overhead and attack surface.

Asked by gvdhoorn on 2021-03-09 10:16:18 UTC

Answers