ROS Resources: Documentation | Support | Discussion Forum | Index | Service Status | ros @ Robotics Stack Exchange
Ask Your Question

Merge two or more depth images

asked 2018-11-05 09:42:12 -0600

Hendrik Wiese gravatar image

updated 2018-11-06 06:08:15 -0600

I would like to merge multiple depth images from at least two RGBD cameras. The poses of the cameras are known as well as their intrinsic parameters. So I would assume that it is a task that could be done in realtime when no feature extraction is necessary.

My current idea is to simply morph/project the image of camera B on the image plane of camera A by an affine transformation. But that leaves out the actual depth values. I'd have to transform them as well, wouldn't I?

New idea: I could also turn the depth images into point clouds, merge them and convert them back to a new depth image. I could imagine that there's some way using depth_image_proc but I'm not sure.

Is there already something that could do this transformation for me? Maybe in OpenCV or even in ROS itself, based on TF frames and the camerainfo messages for the involved cameras?

ROS version is Kinetic Kame on Ubuntu 16.04 LTS

edit retag flag offensive close merge delete

3 Answers

Sort by ยป oldest newest most voted

answered 2018-11-06 06:30:15 -0600

updated 2018-11-06 09:06:42 -0600

Strictly speaking what you're trying to do is not possible. If you convert the depth maps to point clouds then you could easily combine them to a super point cloud, but returning to a depth map representation has a few problems.

The problem is that a depth map represents a special sub-set of point clouds, they are both structured and projectable. Structured means that the points are arranged in a 2D grid pattern from the point of view of the sensor, and projectable means that no points occlude any other points.

If your two depth sensors have a parallax baseline between them, then the combined point cloud may loose the projectable property, meaning that it cannot be represented as a depth map without some data loss. The data that would be lost are the parts of the point cloud which are occluded by other points.

However if you're okay with this data loss then converting back to a depth map could be the solution. You'll need to bear in mind that the combined point cloud will no longer be structured, so you'll have to project each point back into camera space to work out which pixel of the new depth map it corresponds to.

Hope this helps.


I don't know of any ROS nodes that can do the whole thing out of the box. You could use the depth_image_proc package to convert the two depth images into point clouds. You could then make your own node that transformed these two clouds into the same coordinate system and created a super-set of the points in them both.

To convert this super-set back into a point cloud you could follow the final answer on this stackoverflow question. The user shows some example code for projecting points back into camera space using a simple pinhole camera model. For this to do what you want you'll need to define 'virtual' camera intrinsic and extrinsic parameters such that its FOV covers both of the cameras you're trying to merge. Then use these parameters to project the points to image space.

In order to deal with the data loss you'll need to use a z-buffering type approach, where you only add a new point into a pixel if it's closer than the value currently there.

Hope this makes sense, I'm guessing this has turned out more complicated than you were expecting!

edit flag offensive delete link more


The two cameras are close together, just at an angle. The idea is to increase the FOV of a single depth sensors by combining it with others of the same type. I'll be okay with data loss if I get back a valid point for each pixel. Are there already nodes/nodelets available that can do this job?

Hendrik Wiese gravatar image Hendrik Wiese  ( 2018-11-06 06:45:46 -0600 )edit

Not that I know of. I've updated my answer with some pointers to make your own though.

PeteBlackerThe3rd gravatar image PeteBlackerThe3rd  ( 2018-11-06 09:07:02 -0600 )edit

At a first glance this totally makes sense, yes. I'll see how I can make use of your pointers. Thank you very much!

Hendrik Wiese gravatar image Hendrik Wiese  ( 2018-11-06 09:18:51 -0600 )edit

That pinhole camera approach works like a charm. I've written two nodelets for this purpose. One that merges two point clouds based on the known relative poses of the cameras and the other rendering a depth image from the new point cloud based on a pinhole camera model. I'll put them on github.

Hendrik Wiese gravatar image Hendrik Wiese  ( 2018-11-09 10:45:18 -0600 )edit

Great news, I'll have a look when you put them up.

PeteBlackerThe3rd gravatar image PeteBlackerThe3rd  ( 2018-11-09 12:32:56 -0600 )edit

Would love the link to this github. Edit: Found it

mmiles19 gravatar image mmiles19  ( 2020-12-07 17:11:51 -0600 )edit

answered 2018-11-05 13:13:29 -0600

harsha336 gravatar image

updated 2022-05-15 07:41:00 -0600

lucasw gravatar image

probably you could use laser assembler for the task with map as the fixed frame.

edit flag offensive delete link more

answered 2022-05-15 08:23:34 -0600

lucasw gravatar image

You could use depth_image_proc register on one rgbd camera to transform it into the frame (and resolution and instrinsics) of the other rgbd camera (register calls the destination CameraInfo rgb/camera_info), or transform both rgbd cameras into a third frame and combine there.

Then there needs to be a nodelet to take two depth images in the same frame and with the same resolution and intrinsics/distortion (and same timestamp or else additional error) and combine each pixel of each source- pass through the closer valid depth value of the two would make sense, or average or something else. I don't think that nodelet exists anywhere open source (but also haven't looked much), but it wouldn't be hard to adapt the code of another nodelet in depth_image_proc to do it (and ideally upstream it into depth_image_proc).

depth_image_proc/register isn't currently doing the right thing with synchronization and timestamps of the two camera_info inputs so the results will be sloppy, be less correct when the cameras or what they are looking at are moving fast relative to frame rate, unless they are time synchronized at acquisition. It needs to use the destination camera_info to drive the output timestamp, and do a 'full' transform where if transforms the input depth image at the time of the input depth image into the destination frame at the destination camera_info timestamp, using a third frame as a fixed reference (e.g. map or odom).

If the two camera frames aren't that close to each other you may need

edit flag offensive delete link more

Question Tools



Asked: 2018-11-05 09:42:12 -0600

Seen: 2,238 times

Last updated: May 15 '22