ROS Resources: Documentation | Support | Discussion Forum | Index | Service Status | Q&A answers.ros.org

# is transforming (tf) a pointcloud faster than transforming the points one by one

I have about 600 points with the same time stamp and frame id. I am transforming them all one by one by multiplying the transform pre-obtained by lookupTransform with the point:

t = lookupTransform()
for ( point in points )
btVector3 p = point
btVector3 q = t*p
point = q


This is taking quite some time, which is a problem because these are points from the velodyne so I have to do it again and again for millions of them.

I am wondering if there would be an advantage in throwing them all in a point cloud and transforming the whole cloud at once.

I saw that the tf::TransformListener::transfromPointCloud function is using boost::numeric::ublas in the background, which construct one big matrix with all the points and multiplies it with the transform. see http://www.ros.org/doc/api/tf/html/c++/transform__listener_8cpp_source.html#l00288

I also saw that joq's code for his velodyne driver is using pcl_ros to transform the cloud, which ends up calling pcl::transformPointCloud that can be seen here: http://docs.pointclouds.org/trunk/transforms_8hpp_source.html#l00042

At the end, you can't beat the fact that you have to multiply each point by the transform matrix, so it's only a question of computational efficiency... Which would be more efficient then: pcl_ros or ublas, and why?

Of course I will know if I experiment all of those solutions, but I am hoping that someone will save me that pain...

Also I have been thinking about GPUs these days. Is there a GPU implementation of this? Would that make sense? I guess it would for large enough point clouds...

edit retag close merge delete

Sort by » oldest newest most voted

So I gave it a go and used tf::TransformListener::transfromPointCloud and it turned out to be much much faster (CPU usage went from hogging a core to 10%). I haven't tried pcl::transformPointCloud yet, but I guess I will, because that would be interesting to know which one is faster.

I am a bit puzzled though. I am copying my points to a point cloud, then they get copied to a ublas matrix. This matrix is then multiplied by the transform. Then points are copied back to a point cloud and back to my data format. So first of all there are 2 copy operations (but in my own implementation I also had to copy to a btVector3 back and forth ...). Then the number of mathematical operations is still the same, isn't it: the 600 points still have to be multiplied by the transform. So either ublas is exceptionally well optimized, or my implementation was really crappy.

EDIT: Since then, I have switched from diamondback to fuerte. I noticed that the implementation of tf::TransformListener::transfromPointCloud has changed, starting from electric. It is now transforming all the points one by one. See the following API links for a comparison:

I haven't thoroughly timed them, but it seems that it's still doing a good job at it.

more

2

There are all kinds of crazy optimizations that can be done in serious numerical libraries. No sane programmer would do them in a simple loop.

( 2013-01-22 11:38:05 -0600 )edit
1

Have you checked the performance of tf::TransformListener::transfromPointCloud versus pcl::transformPointCloud? I assume they are about equal, but would be interested in the results nevertheless :-)

( 2013-03-04 03:15:34 -0600 )edit

no I haven't tried

( 2013-03-04 05:34:28 -0600 )edit

The only way to be sure is to code it up and measure the time.

Since it seems likely, that would probably be worth the effort.

more

Also I have been thinking about GPUs these days. Is there a GPU implementation of this? Would that make sense? I guess it would for large enough point clouds...

OpenCV 2.4.3 has a set of GPU accelerated matrix operation API. I am not sure if the one that is shipped with ROS is built with GPU support though.

more

( 2013-01-21 10:45:52 -0600 )edit