publish/receive large PointCloud2 is very slow

asked 2020-10-06 10:20:07 -0500

luca.fancellu gravatar image

updated 2020-10-07 04:28:52 -0500

Hi everybody,

I'm porting a python node from ROS1 to ROS2, the node publishes some large PointCloud2 messages and collect statistics.

I explain the setup I did:

  • Two python node

  • One node is a dummy python node with one publisher and one subscriber, it listen to an input topic and when it receives something, it publishes on an output topic a dummy message copying the received header into it. In the subscriber callback I print the time elapsed between the time of the reception and the sending time. (The sending time is retrieved by the message header that is filled by the sending node with the sending time)

  • The second node is a node that publishes a large PointCloud2 to the input topic of the node above, it reads a binary file (from kitti benchmark dataset), put it on a PointCloud2 message, setup the header with the current time and send it. It also listens to the output topic of the dummy node collecting information about the time, simply by doing the difference between the actual time and the time in the header, having so the amount of time between publishing and receiving back a response, we will call it iteration time from now on.

When I was using it with ROS1 I had that one iteration time stands for less than 70 ms, now with ROS2 I see very high numbers, on average 900ms.

I suspect that the problem is on the transmission of large PointCloud2, but I don't understand why it differs so much from ROS1.

Pointcloud frame are on average 1.9Mb each.

Anyone has an idea?

Thank you very much

P.s. I add some code. dummy node:

#! /usr/bin/env python3

import rclpy
from rclpy.qos import QoSProfile, qos_profile_default, qos_profile_sensor_data
from ros2topic.api import get_msg_class
from sensor_msgs.msg import PointCloud2
from autoware_auto_msgs.msg import BoundingBoxArray

class DummyClass(object):

    def __init__(self, node, in_topic, out_topic):
        self.node = node
        self.publisher = node.create_publisher(

        self.subscriber = node.create_subscription(

    def sub_callback(self, msg):
        time_now = self.node.get_clock().now().nanoseconds
        bbox_array = BoundingBoxArray()
        bbox_array.header = msg.header

        time_there = rclpy.time.Time()
        time_there = time_there.from_msg(msg.header.stamp)
        tm = int((time_now - time_there.nanoseconds) // (10**6))
        print("time elapsed %d ms" % tm)


def main(args=None):
    # Create a ROS2 node
    node = rclpy.create_node("dummy_node")


    dummy = DummyClass(

    executor = rclpy.executors.MultiThreadedExecutor(num_threads=4)

    rclpy.spin(node, executor)


if __name__ == '__main__':

The other node:

#! /usr/bin/env python3

import rclpy
from rclpy.qos import QoSProfile
from sensor_msgs.msg import PointCloud2, PointField
from autoware_auto_msgs.msg import BoundingBoxArray
from numpy.random import seed
from numpy.random import randint

class DummyPubClass(object):

    def __init__(self, node, in_topic, out_topic):
        self.node = node
        self.publisher = node.create_publisher(
        self.subscriber = node ...
edit retag flag offensive close merge delete


Without sharing your code we can't tell you if the high latency has something to do with how you implemented it.

Dirk Thomas gravatar image Dirk Thomas  ( 2020-10-06 10:25:28 -0500 )edit

@Dirk Thomas, I've updated the body adding some code

luca.fancellu gravatar image luca.fancellu  ( 2020-10-06 10:51:05 -0500 )edit

It is difficult to reproduce without also having working code for the publisher side of the point cloud. If you can't share that I would suggest trying to implement DummyClass in C++ to see it that is any better.

Dirk Thomas gravatar image Dirk Thomas  ( 2020-10-06 11:09:07 -0500 )edit

Thanks @Dirk Thomas, well the code from the publisher side basically looks for any file inside a folder and for each file it reads it, fill a PointCloud2 message and send it. The next message is sent after something is received from sub_output_topic subscriber (I set a flag). I will try to implement the dummy node in C++ to see if there is any difference.

luca.fancellu gravatar image luca.fancellu  ( 2020-10-06 11:20:37 -0500 )edit

I've added an example of a publisher node that reproduces the issue I'm encountering

luca.fancellu gravatar image luca.fancellu  ( 2020-10-07 04:30:07 -0500 )edit

@Dirk Thomas I've implemented the dummy_node also in C++, the result is the same. I don't know it it's worth to try to implement the sender node int C++ also

luca.fancellu gravatar image luca.fancellu  ( 2020-10-07 06:39:56 -0500 )edit

For Python, we did a quick analysis here: and it seems that the bottleneck is data setter. It performs conversion from bytes object to array object which takes a big chunk of the time. I am not sure what is the best way to handle this.

lukicdarkoo gravatar image lukicdarkoo  ( 2020-10-13 02:59:29 -0500 )edit

@lukicdarkoo thanks for your comment, very good analysis, did you found also a good setup for the dds layer?

luca.fancellu gravatar image luca.fancellu  ( 2020-10-13 03:07:32 -0500 )edit