Robotics StackExchange | Archived questions

publish/receive large PointCloud2 is very slow

Hi everybody,

I'm porting a python node from ROS1 to ROS2, the node publishes some large PointCloud2 messages and collect statistics.

I explain the setup I did:

When I was using it with ROS1 I had that one iteration time stands for less than 70 ms, now with ROS2 I see very high numbers, on average 900ms.

I suspect that the problem is on the transmission of large PointCloud2, but I don't understand why it differs so much from ROS1.

Pointcloud frame are on average 1.9Mb each.

Anyone has an idea?

Thank you very much

P.s. I add some code. dummy node:

#! /usr/bin/env python3

import rclpy
from rclpy.qos import QoSProfile, qos_profile_default, qos_profile_sensor_data
from ros2topic.api import get_msg_class
from sensor_msgs.msg import PointCloud2
from autoware_auto_msgs.msg import BoundingBoxArray


class DummyClass(object):

    def __init__(self, node, in_topic, out_topic):
        self.node = node
        self.publisher = node.create_publisher(
                BoundingBoxArray,
                out_topic,
                qos_profile=QoSProfile(depth=10),
                callback_group=rclpy.callback_groups.ReentrantCallbackGroup())

        self.subscriber = node.create_subscription(
                    PointCloud2,
                    in_topic,
                    self.sub_callback,
                    qos_profile=QoSProfile(depth=10),
                    callback_group=rclpy.callback_groups.ReentrantCallbackGroup()
        )


    def sub_callback(self, msg):
        time_now = self.node.get_clock().now().nanoseconds
        bbox_array = BoundingBoxArray()
        bbox_array.header = msg.header

        time_there = rclpy.time.Time()
        time_there = time_there.from_msg(msg.header.stamp)
        tm = int((time_now - time_there.nanoseconds) // (10**6))
        print("time elapsed %d ms" % tm)

        self.publisher.publish(bbox_array)


def main(args=None):
    # Create a ROS2 node
    rclpy.init(args=args)
    node = rclpy.create_node("dummy_node")

    node.get_logger().info("Dummy_node_started")

    dummy = DummyClass(
            node,
            '/points_nonground',
            '/lidar_bounding_boxes'
    )

    executor = rclpy.executors.MultiThreadedExecutor(num_threads=4)

    rclpy.spin(node, executor)

    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()

The other node:

#! /usr/bin/env python3

import rclpy
from rclpy.qos import QoSProfile
from sensor_msgs.msg import PointCloud2, PointField
from autoware_auto_msgs.msg import BoundingBoxArray
from numpy.random import seed
from numpy.random import randint


class DummyPubClass(object):

    def __init__(self, node, in_topic, out_topic):
        self.node = node
        self.publisher = node.create_publisher(
            PointCloud2,
            out_topic,
            qos_profile=QoSProfile(depth=1)
        )
        self.subscriber = node.create_subscription(
            BoundingBoxArray,
            in_topic,
            self.sub_callback,
            qos_profile=QoSProfile(depth=1)
        )
        self.clearToSend = True
        self.out_topic = out_topic


    def sub_callback(self, msg):
        time_now = self.node.get_clock().now().nanoseconds

        time_there = rclpy.time.Time()
        time_there = time_there.from_msg(msg.header.stamp)
        tm = int((time_now - time_there.nanoseconds) // (10**6))
        print("time elapsed %d ms" % tm)
        self.clearToSend = True


    def publish(self):
        if self.node.count_subscribers(self.out_topic) > 0:
            if self.clearToSend == True:
                # 1.9 Mb pointcloud
                file_size = 1900000

                p_cloud = PointCloud2()
                p_cloud.header.frame_id    = "base_link"
                p_cloud.height             = 1
                p_cloud.point_step         = 16
                p_cloud.width              = int(file_size / p_cloud.point_step)

                # Data has x,y,z and intensity
                p_cloud.fields             = [
                    PointField(name='x', offset=0,
                                datatype=PointField.FLOAT32, count=1),
                    PointField(name='y', offset=4,
                                datatype=PointField.FLOAT32, count=1),
                    PointField(name='z', offset=8,
                                datatype=PointField.FLOAT32, count=1),
                    PointField(name='intensity', offset=12,
                                datatype=PointField.FLOAT32, count=1)
                ]

                p_cloud.is_bigendian      = False
                p_cloud.row_step          = p_cloud.width * p_cloud.point_step
                p_cloud.is_dense          = False

                # Copy the content of the file into the message data
                p_cloud.data              = randint(0, 255, file_size).tolist()
                p_cloud.header.stamp      = self.node.get_clock().now().to_msg()

                self.clearToSend = False
                self.publisher.publish(p_cloud)


def main(args=None):
    # Create a ROS2 node
    rclpy.init(args=args)
    node = rclpy.create_node("dummy_pub")

    seed(1)

    node.get_logger().info("Dummy_pub_started")

    dummy = DummyPubClass(
            node,
            '/lidar_bounding_boxes',
            '/points_nonground'
    )

    # Loop rate at 1000 Hz
    loop_rate_s = float(1.0/1000)
    timer = node.create_timer(
        loop_rate_s,
        dummy.publish
    )

    rclpy.spin(node)

    node.destroy_timer(timer)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()

Asked by luca.fancellu on 2020-10-06 10:20:07 UTC

Comments

Without sharing your code we can't tell you if the high latency has something to do with how you implemented it.

Asked by Dirk Thomas on 2020-10-06 10:25:28 UTC

@Dirk Thomas, I've updated the body adding some code

Asked by luca.fancellu on 2020-10-06 10:51:05 UTC

It is difficult to reproduce without also having working code for the publisher side of the point cloud. If you can't share that I would suggest trying to implement DummyClass in C++ to see it that is any better.

Asked by Dirk Thomas on 2020-10-06 11:09:07 UTC

Thanks @Dirk Thomas, well the code from the publisher side basically looks for any file inside a folder and for each file it reads it, fill a PointCloud2 message and send it. The next message is sent after something is received from sub_output_topic subscriber (I set a flag). I will try to implement the dummy node in C++ to see if there is any difference.

Asked by luca.fancellu on 2020-10-06 11:20:37 UTC

I've added an example of a publisher node that reproduces the issue I'm encountering

Asked by luca.fancellu on 2020-10-07 04:30:07 UTC

@Dirk Thomas I've implemented the dummy_node also in C++, the result is the same. I don't know it it's worth to try to implement the sender node int C++ also

Asked by luca.fancellu on 2020-10-07 06:39:56 UTC

For Python, we did a quick analysis here: https://github.com/cyberbotics/webots_ros2/pull/160#issuecomment-707204002 and it seems that the bottleneck is data setter. It performs conversion from bytes object to array object which takes a big chunk of the time. I am not sure what is the best way to handle this.

Asked by lukicdarkoo on 2020-10-13 02:59:29 UTC

@lukicdarkoo thanks for your comment, very good analysis, did you found also a good setup for the dds layer?

Asked by luca.fancellu on 2020-10-13 03:07:32 UTC

No, but it seems that the default configuration handles like 60-70 messages, of 1MB in size, per second (when messages are prepared before sending).

Asked by lukicdarkoo on 2020-10-13 03:39:13 UTC

For large data, you should change udp setup ex: sudo sysctl -w net.core.rmem_max="20971520" sudo sysctl -w net.core.wmem_max="20971520" sudo sysctl -w net.core.rmem_default="20971520" sudo sysctl -w net.core.wmem_default="20971520"

Asked by sdyy on 2020-10-26 03:50:46 UTC

Answers