# Weird increasing delay in rosbridge despite queue_size=1 and buff_size=2**24

Hi everyone,

I've got a very weird problem here. I have a topic published on at around 50hz. On the other end there's a subscriber. The callback of the subscriber is a relatively long running process, slower than those 50hz.

That subscriber is initialized with queue_size=1 and buff_size=2**24. That way I want to make sure, messages come in as fast as possible and if there's currently one message processed, further messages get dropped. That's the desired behavior. I don't need all messages, but I need them fast.

When I leave queue_size=1 and buff_size=2**24 out, I see an increasing delay when I take a look at the age of the messages processed (now - header.stamp).

queue_size=1 and buff_size=2**24 work when I use rospy directly. However, it does not with rosbridge. I have already changed rosbridge in a way that the actual rospy.Subscriber is created in the exact same way as descrbed above, including queue_size=1 and buff_size=2**24. it does not help. I still see the increasing delay.

Where do I have to look to make sure I don't have that delay when using rosbridge?

Thanks

# Update 1

Here's a code example with which I can reproduce the problem. I'm just not sure if it's caused by roslibpy or by rosbridge. Maybe someone with a little spare time could help me investigate this behavior. Or tell me that they cannot reproduce it and that I'm the only one seeing this behavior. Or that I'm doing something inherently wrong.

import roslibpy
import time
import json
import logging
logger = logging.getLogger("roslibpy")
handler = logging.StreamHandler()

def to_epoch(stamp):
stamp_secs = stamp["secs"]
stamp_nsecs = stamp["nsecs"]
return stamp_secs + stamp_nsecs*1e-9

def from_epoch(stamp):
stamp_secs = int(stamp)
stamp_nsecs = (stamp - stamp_secs) * 1e9
return {"secs": stamp_secs, "nsecs": stamp_nsecs}

def sub(msg):
# The age of the message should remain constant, but it increases seemingly indef.
age = time.time() - to_epoch(msg["stamp"])
print("Age of message: %6.3f seconds" % age)
time.sleep(1./5.)

ros = roslibpy.Ros("ws://localhost:9090")
ros.run()

tt = roslibpy.Topic(ros, "/test", "std_msgs/Header", queue_size=1)

ts = roslibpy.Topic(ros, "/test", "std_msgs/Header", queue_length=1)
ts.subscribe(sub)

try:
seq = 0
while True:
seq += 1
header = {"frame_id": "test", "stamp": from_epoch(time.time()), "seq": seq}
except KeyboardInterrupt:
ros.terminate()


# Update 2

Three things that I've already ruled out:

1. in rosbridge_library subscribers.py setting rospy.Subscriber(..., queue_size = 1, ...) has no effect
2. in rosbridge_library subscribers.py setting rospy.Subscriber(..., buff_size = 2**24, ...) has no effect
3. in rosbridge_server rosbridge_websocket.py setting listenWS(..., backlog = 1, ...) or backlog = 0 has no effect

# Update 3

I've now tested rosbridge_server/rosbridge_udp. It shows the same behavior! So it is not related to websockets.

# Update 4

As suggested below I've added a sleep to the send loop. It has no effect on my side.

import roslibpy
import time
import json
import ...
edit retag close merge delete

Possible you have buffering not related to ROS, like in a network switch or card. Is data through rosbridge coming through network? I've had more than one issue where images get backed up with frames 10 - 15 seconds old coming appearing, related to location of the image source in the house. Issue only occurs in wifi dead zones, and when coming back in range images from long ago will show up, and not always in the order you would expect.

So while ROS be properly handling the queue = 1, your HW may have other ideas.

( 2019-10-09 01:23:36 -0500 )edit

Cannot rule that out, but the sources are all on the same system, so everything runs via loopback. I've even throttled the topic to 1hz and still the age of the messages keeps increasing at the very same rate, which is exceptionally weird.

It only happens when I use rosbridge_suite in conjunction with roslibpy. If it also happens when I use roslibjs I'll have to test... rospy works fine. I'm just refraining from using rospy because my code has hard python 3 dependencies and rospy appears to be incompatible with python 3, which is a real shame.

I'll soon add a small test script that you might be willing to run in order to verify the behavior.

( 2019-10-09 03:15:45 -0500 )edit

Sort by » oldest newest most voted

So I made a change and got your code to work on my machine. Initially got the same results as you.

Remove time.sleep(1./5.) from the callback and add time.sleep(0.01) to the while true loop.

Perhaps the main needs to give some time for the callbacks to function.

#!/usr/bin/python
import roslibpy
import time
import json
import logging
logger = logging.getLogger("roslibpy")
handler = logging.StreamHandler()

def to_epoch(stamp):
stamp_secs = stamp["secs"]
stamp_nsecs = stamp["nsecs"]
return stamp_secs + (stamp_nsecs*1e-9)

def from_epoch(stamp):
stamp_secs = int(stamp)
stamp_nsecs = (stamp - stamp_secs) * 1e9
return {"secs": stamp_secs, "nsecs": stamp_nsecs}

def sub(msg):
# The age of the message should remain constant, but it increases seemingly indefinetly
age = time.time() - to_epoch(msg["stamp"])
print("Age of message: %6.3f seconds" % age)
#   time.sleep(1./5.)

ros = roslibpy.Ros("ws://localhost:9090")
ros.run()

tt = roslibpy.Topic(ros, "/test", "std_msgs/Header", queue_size=1)

ts = roslibpy.Topic(ros, "/test", "std_msgs/Header", queue_length=1)
ts.subscribe(sub)

try:
seq = 0
while True:
seq += 1
header = {"frame_id": "test", "stamp": from_epoch(time.time()), "seq": seq}
time.sleep(0.01)
except KeyboardInterrupt:
ros.terminate()

Age of message:  0.003 seconds
Age of message:  0.002 seconds
Age of message:  0.003 seconds
Age of message:  0.002 seconds
Age of message:  0.002 seconds
Age of message:  0.003 seconds
Age of message:  0.001 seconds
Age of message:  0.001 seconds
Age of message:  0.003 seconds
Age of message:  0.002 seconds
Age of message:  0.002 seconds

more

Thanks for your effort but the script that I gave was only a minimal working example to reproduce the actual problem. I don't have a time.sleep in my actual code that I could remove. The callback (or rather the process with the callback) of my code just is slow enough to cause this behavior. I can't do much about that fact. I need to find the actual cause, the very root of the problem, and get a fix for it. Or I need to know that this is expected behavior (which it can't be because rospy alone, without rosbridge in between, does not behave like this if I set the queue_sizes accordingly)

( 2019-10-10 04:52:09 -0500 )edit

But it's good to know that I'm not alone and that you can reproduce the behavior

( 2019-10-10 04:53:01 -0500 )edit

Removing the sleep from the callback didn't fix it for me. It was adding the sleep to the while true loop that fixed it. Have you tried that yet?

( 2019-10-10 18:15:37 -0500 )edit

Yes, thanks. Unfortunately, it has no effect.

( 2019-10-11 03:57:49 -0500 )edit