
Optimal way to gather many transforms

asked 2021-05-19 07:23:25 -0500

xsol-taylor

I need to collect transforms at a certain interval from time N to N + S, to use them in some vectorised calculations.

import rospy

for ind in range(steps):
  # Evenly spaced integer nanosecond stamp (a float step breaks range() in Python 3)
  t_ns = start + ind * (end - start) // steps
  stamp = rospy.Time(0, t_ns)
  listener.waitForTransform(targetFrame, sourceFrame, stamp, timeout)
  translation, rotation = listener.lookupTransform(targetFrame, sourceFrame, stamp)
  # Combine translation and quaternion into a single 4x4 matrix
  transforms[ind] = listener.fromTranslationRotation(translation, rotation)

I am worried, however, about the efficiency of this operation. It takes quite a long time, I assume because of the way I am waiting for the transforms. For argument's sake, the lookup is started well after the TF listener is created, so the TF buffer has a complete picture of the transforms in the time range I am asking for.

Is this the most efficient method to look up a large number of transforms in a known interval? I've scoured through ROS TF and can't find anything that will allow me to return a batch of TFs, only one at a time. Thanks.


1 Answer


answered 2021-05-21 19:24:15 -0500

tfoote

You say it's taking "quite a long time", but you don't provide any numbers, so it's hard to help much. The query time for the lookup itself is on the order of microseconds; I ran the performance tests recently for this question. Querying for each transform is relatively low overhead. There's potentially a slight optimization if you were to build up every single transform in the tree while walking it, but it would be a marginal benefit, only worth doing for a very large tree. I don't think that's where you should worry about optimizing.

You're right that using waitForTransform within a loop is something I would recommend against. You should be able to call waitForTransform once for the most recent stamp and then just query for the previous values, assuming data is being generated in order.
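A minimal sketch of that wait-once pattern, assuming the stamps are sorted oldest to newest and the oldest one is still inside the listener's buffer. `gather_transforms` is a hypothetical helper name, not part of tf; the three listener calls are the same ones used in the question.

```python
def gather_transforms(listener, target_frame, source_frame, stamps, timeout):
    """Look up one matrix per stamp, blocking only once.

    Assumes `stamps` is sorted oldest to newest and that the oldest stamp is
    still buffered (hypothetical helper, not part of tf).
    """
    if not stamps:
        return []
    # Block a single time, on the newest stamp. If data arrives in order,
    # every older stamp should already be available once this returns.
    listener.waitForTransform(target_frame, source_frame, stamps[-1], timeout)

    matrices = []
    for stamp in stamps:
        translation, rotation = listener.lookupTransform(
            target_frame, source_frame, stamp)
        matrices.append(listener.fromTranslationRotation(translation, rotation))
    return matrices
```

This removes the per-iteration wait but still performs one lookup per stamp, so it only helps if the waiting, not the lookup itself, is the slow part.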

Also, I don't know what your query resolution is. But if you're querying for transforms within a very small window (i.e. at very high resolution), tf is doing interpolation for any of those intermediate points. You could likely do the same at a higher level and reduce the number of queries.



Sure, I'll provide a little more context. We're transforming a point cloud of 65k points, with a sensor that gives us odometry data at 200 Hz. Each point is stamped with its time in nanoseconds. Transforming these in a for loop on the CPU would obviously take far too long for real time, so we are trying to accumulate the transform matrices and pass them to a CUDA processor, which will do the transform very quickly. Currently, though, the bottleneck is actually retrieving the transform data.

As you mentioned, I've also done a waitForTransform on the newest point in the dataset and just let the loop run after that. It's a little faster, but not by the orders of magnitude needed here.

xsol-taylor (2021-05-30 21:41:01 -0500)

It sounds like you're trying to compute the transform for each point in your point cloud down to the exact nanosecond and then load it into an array, one per point? That's going to generate a larger data structure than the point cloud itself (the 6-DoF transforms are bigger than x, y, z points), which is going to bog down your system. And if you only have data coming in at 200 Hz, querying the transform data down to the nanosecond doesn't add information: the transform lookup is doing interpolation to give results between the exact samples. To that end, you can do the interpolation in your CUDA code and make it much more accurate. Assuming your characteristic frequency of your system is shorter than the span of your point cloud, the standard approach is to sample the first and last timestamp and use linear interpolation in time to transform ...
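As a rough sketch of interpolating between two sampled transforms, assuming each sample is a (translation, quaternion) pair with the quaternion in tf's (x, y, z, w) order. The function names are hypothetical, and the rotation uses spherical interpolation (slerp), which is a choice made here rather than something stated above; translation is plain linear interpolation.

```python
import math

def lerp(a, b, u):
    """Linear interpolation between two equal-length vectors, u in [0, 1]."""
    return [x + u * (y - x) for x, y in zip(a, b)]

def slerp(q0, q1, u):
    """Spherical interpolation between unit quaternions in (x, y, z, w) order."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip one to take the shorter arc.
        q1 = [-c for c in q1]
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: plain lerp avoids dividing by sin(~0).
        q = lerp(q0, q1, u)
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - u) * theta) / math.sin(theta)
        s1 = math.sin(u * theta) / math.sin(theta)
        q = [s0 * a + s1 * b for a, b in zip(q0, q1)]
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]

def interpolate_pose(t, t0, pose0, t1, pose1):
    """Pose at time t from (translation, quaternion) samples at t0 and t1."""
    u = (t - t0) / float(t1 - t0)
    return lerp(pose0[0], pose1[0], u), slerp(pose0[1], pose1[1], u)
```

With this, only the two end samples need to come from tf; every point stamp in between is handled by `interpolate_pose`, which is simple enough to port to a CUDA kernel.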

tfoote (2021-06-01 20:03:42 -0500)

Thanks, agreed. This is the approach we are going with: sampling at 200 Hz for the point cloud, using CUDA to do the in-between interpolation, and then passing that to the matrix multiplier. I was concerned that doing so many small copies would still take too much time, but I was pleasantly surprised to see that memcpy'ing the data, without the lookup wait, is still quite fast.
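For anyone following along, the per-point work then reduces to finding which two 200 Hz samples bracket each point stamp, plus a fraction between them. A minimal CPU-side sketch, assuming the sample stamps are sorted, strictly increasing, and at least two in number; `bracket` is a hypothetical name, and on the GPU the same search would run per thread.

```python
import bisect

def bracket(sample_times, t):
    """Return (i, u) such that t lies between sample_times[i] and
    sample_times[i + 1], at fraction u of the way from the first to the
    second. Stamps outside the sampled range are clamped to its ends."""
    i = bisect.bisect_right(sample_times, t) - 1
    i = max(0, min(i, len(sample_times) - 2))
    t0, t1 = sample_times[i], sample_times[i + 1]
    u = (t - t0) / float(t1 - t0)
    return i, max(0.0, min(1.0, u))

# Example: samples 5 ms apart (200 Hz), stamps in nanoseconds.
samples_ns = [0, 5000000, 10000000]
index, fraction = bracket(samples_ns, 2500000)  # halfway into the first interval
```

Only the sampled transforms and the point stamps need to be copied to the device; the index and fraction can be computed there.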

xsol-taylor (2021-06-01 20:49:12 -0500)


Last updated: May 21 '21