Ask Your Question

is transforming (tf) a pointcloud faster than transforming the points one by one

asked 2013-01-17 11:08:31 -0500

brice rebsamen gravatar image

updated 2014-01-28 17:14:53 -0500

ngrennan gravatar image

I have about 600 points with the same time stamp and frame id. I am transforming them all one by one by multiplying the transform pre-obtained by lookupTransform with the point:

t = lookupTransform()
for ( point in points )
  btVector3 p = point
  btVector3 q = t*p
  point = q

This is taking quite some time, which is a problem because these are points from the velodyne so I have to do it again and again for millions of them.

I am wondering if there would be an advantage in throwing them all in a point cloud and transforming the whole cloud at once.

I saw that the tf::TransformListener::transfromPointCloud function is using boost::numeric::ublas in the background, which construct one big matrix with all the points and multiplies it with the transform. see

I also saw that joq's code for his velodyne driver is using pcl_ros to transform the cloud, which ends up calling pcl::transformPointCloud that can be seen here:

At the end, you can't beat the fact that you have to multiply each point by the transform matrix, so it's only a question of computational efficiency... Which would be more efficient then: pcl_ros or ublas, and why?

Of course I will know if I experiment all of those solutions, but I am hoping that someone will save me that pain...

Also I have been thinking about GPUs these days. Is there a GPU implementation of this? Would that make sense? I guess it would for large enough point clouds...

edit retag flag offensive close merge delete

3 Answers

Sort by ยป oldest newest most voted

answered 2013-01-21 05:39:21 -0500

brice rebsamen gravatar image

updated 2013-03-04 05:33:59 -0500

So I gave it a go and used tf::TransformListener::transfromPointCloud and it turned out to be much much faster (CPU usage went from hogging a core to 10%). I haven't tried pcl::transformPointCloud yet, but I guess I will, because that would be interesting to know which one is faster.

I am a bit puzzled though. I am copying my points to a point cloud, then they get copied to a ublas matrix. This matrix is then multiplied by the transform. Then points are copied back to a point cloud and back to my data format. So first of all there are 2 copy operations (but in my own implementation I also had to copy to a btVector3 back and forth ...). Then the number of mathematical operations is still the same, isn't it: the 600 points still have to be multiplied by the transform. So either ublas is exceptionally well optimized, or my implementation was really crappy.

EDIT: Since then, I have switched from diamondback to fuerte. I noticed that the implementation of tf::TransformListener::transfromPointCloud has changed, starting from electric. It is now transforming all the points one by one. See the following API links for a comparison:

I haven't thoroughly timed them, but it seems that it's still doing a good job at it.

edit flag offensive delete link more



There are all kinds of crazy optimizations that can be done in serious numerical libraries. No sane programmer would do them in a simple loop.

joq gravatar image joq  ( 2013-01-22 11:38:05 -0500 )edit

Have you checked the performance of tf::TransformListener::transfromPointCloud versus pcl::transformPointCloud? I assume they are about equal, but would be interested in the results nevertheless :-)

Philip gravatar image Philip  ( 2013-03-04 03:15:34 -0500 )edit

no I haven't tried

brice rebsamen gravatar image brice rebsamen  ( 2013-03-04 05:34:28 -0500 )edit

answered 2013-01-18 13:50:38 -0500

joq gravatar image

The only way to be sure is to code it up and measure the time.

Since it seems likely, that would probably be worth the effort.

edit flag offensive delete link more

answered 2013-01-21 09:40:27 -0500

Mani gravatar image

Also I have been thinking about GPUs these days. Is there a GPU implementation of this? Would that make sense? I guess it would for large enough point clouds...

OpenCV 2.4.3 has a set of GPU accelerated matrix operation API. I am not sure if the one that is shipped with ROS is built with GPU support though.

edit flag offensive delete link more


last time I checked (about 1 year ago), it didn't. And providing GPU support was a headache. see

brice rebsamen gravatar image brice rebsamen  ( 2013-01-21 10:45:52 -0500 )edit

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools



Asked: 2013-01-17 11:08:31 -0500

Seen: 2,744 times

Last updated: Mar 04 '13