publish/receive large PointCloud2 is very slow
Hi everybody,
I'm porting a python node from ROS1 to ROS2, the node publishes some large PointCloud2 messages and collect statistics.
I explain the setup I did:
Two python node
One node is a dummy python node with one publisher and one subscriber, it listen to an input topic and when it receives something, it publishes on an output topic a dummy message copying the received header into it. In the subscriber callback I print the time elapsed between the time of the reception and the sending time. (The sending time is retrieved by the message header that is filled by the sending node with the sending time)
The second node is a node that publishes a large PointCloud2 to the input topic of the node above, it reads a binary file (from kitti benchmark dataset), put it on a PointCloud2 message, setup the header with the current time and send it. It also listens to the output topic of the dummy node collecting information about the time, simply by doing the difference between the actual time and the time in the header, having so the amount of time between publishing and receiving back a response, we will call it iteration time from now on.
When I was using it with ROS1 I had that one iteration time stands for less than 70 ms, now with ROS2 I see very high numbers, on average 900ms.
I suspect that the problem is on the transmission of large PointCloud2, but I don't understand why it differs so much from ROS1.
Pointcloud frame are on average 1.9Mb each.
Anyone has an idea?
Thank you very much
P.s. I add some code. dummy node:
#! /usr/bin/env python3
import rclpy
from rclpy.qos import QoSProfile, qos_profile_default, qos_profile_sensor_data
from ros2topic.api import get_msg_class
from sensor_msgs.msg import PointCloud2
from autoware_auto_msgs.msg import BoundingBoxArray
class DummyClass(object):
def __init__(self, node, in_topic, out_topic):
self.node = node
self.publisher = node.create_publisher(
BoundingBoxArray,
out_topic,
qos_profile=QoSProfile(depth=10),
callback_group=rclpy.callback_groups.ReentrantCallbackGroup())
self.subscriber = node.create_subscription(
PointCloud2,
in_topic,
self.sub_callback,
qos_profile=QoSProfile(depth=10),
callback_group=rclpy.callback_groups.ReentrantCallbackGroup()
)
def sub_callback(self, msg):
time_now = self.node.get_clock().now().nanoseconds
bbox_array = BoundingBoxArray()
bbox_array.header = msg.header
time_there = rclpy.time.Time()
time_there = time_there.from_msg(msg.header.stamp)
tm = int((time_now - time_there.nanoseconds) // (10**6))
print("time elapsed %d ms" % tm)
self.publisher.publish(bbox_array)
def main(args=None):
# Create a ROS2 node
rclpy.init(args=args)
node = rclpy.create_node("dummy_node")
node.get_logger().info("Dummy_node_started")
dummy = DummyClass(
node,
'/points_nonground',
'/lidar_bounding_boxes'
)
executor = rclpy.executors.MultiThreadedExecutor(num_threads=4)
rclpy.spin(node, executor)
node.destroy_node()
rclpy.shutdown()
if __name__ == '__main__':
main()
The other node:
#! /usr/bin/env python3
import rclpy
from rclpy.qos import QoSProfile
from sensor_msgs.msg import PointCloud2, PointField
from autoware_auto_msgs.msg import BoundingBoxArray
from numpy.random import seed
from numpy.random import randint
class DummyPubClass(object):
def __init__(self, node, in_topic, out_topic):
self.node = node
self.publisher = node.create_publisher(
PointCloud2,
out_topic,
qos_profile=QoSProfile(depth=1)
)
self.subscriber = node ...
Without sharing your code we can't tell you if the high latency has something to do with how you implemented it.
@Dirk Thomas, I've updated the body adding some code
It is difficult to reproduce without also having working code for the publisher side of the point cloud. If you can't share that I would suggest trying to implement
DummyClass
in C++ to see it that is any better.Thanks @Dirk Thomas, well the code from the publisher side basically looks for any file inside a folder and for each file it reads it, fill a PointCloud2 message and send it. The next message is sent after something is received from sub_output_topic subscriber (I set a flag). I will try to implement the dummy node in C++ to see if there is any difference.
I've added an example of a publisher node that reproduces the issue I'm encountering
@Dirk Thomas I've implemented the dummy_node also in C++, the result is the same. I don't know it it's worth to try to implement the sender node int C++ also
For Python, we did a quick analysis here: https://github.com/cyberbotics/webots... and it seems that the bottleneck is
data
setter. It performs conversion frombytes
object toarray
object which takes a big chunk of the time. I am not sure what is the best way to handle this.@lukicdarkoo thanks for your comment, very good analysis, did you found also a good setup for the dds layer?