I have been looking for an answer for hours, so I am finally asking here.

I am using CMvision to find a specific color in Kinect's sight and I want to find the real world coordinates of the object with that color. I am planning to use CMvision to find the frame coordinates (as X and Y pixel values on the picture) and use these coordinates and the depth value of that pixel to calculate the real world coordinates.

As I understand it, the /camera/depth_registered/points topic already gives the real world coordinates, but I couldn't find out how to retrieve the X, Y, Z values of a specific pixel that I've chosen on the depth (or RGB) image.



I was looking for the exact formula to calculate the real world X, Y, Z values from the depth image, and I finally found it.

First of all, it appears that the values in the depth image do not give the distance between a point in the real world and the Kinect's origin. They give the distance between the point and the Kinect's XY plane (the plane parallel to the front surface of the Kinect). So, if the Kinect is looking at a wall, all the depth values are about the same, regardless of the actual distance between each point on the wall and the Kinect's origin.

I found the calculations in the NuiSkeleton.h file: https://code.google.com/p/stevenhickson-code/source/browse/trunk/blepo/external/Microsoft/Kinect/NuiSkeleton.h?r=14

For the Z axis, in line 625, it says:

FLOAT fSkeletonZ = static_cast<FLOAT>(usDepthValue >> 3) / 1000.0f;


You don't have to worry about the bit-shift operation, because the description of the method says:

/// <param name="usDepthValue">
/// The depth value (in millimeters) of the depth image pixel, shifted left by three bits. The left
/// shift enables you to pass the value from the depth image directly into this function.
/// </param>


So, if your depth value is already in millimeters (i.e. not shifted), you can simply use:

FLOAT fSkeletonZ = static_cast<FLOAT>(usDepthValue) / 1000.0f;


The unit is meters (because of the division by 1000).

For the X and Y axes, lines 633 and 634 give:

FLOAT fSkeletonX = (lDepthX - width/2.0f) * (320.0f/width) * NUI_CAMERA_DEPTH_IMAGE_TO_SKELETON_MULTIPLIER_320x240 * fSkeletonZ;

FLOAT fSkeletonY = -(lDepthY - height/2.0f) * (240.0f/height) * NUI_CAMERA_DEPTH_IMAGE_TO_SKELETON_MULTIPLIER_320x240 * fSkeletonZ;


The calculation of the Y axis starts with a minus sign because in an image (like the depth image) the Y value increases as you go down, but in real world coordinates Y conventionally increases as you go up.

For the NUI_CAMERA_DEPTH_IMAGE_TO_SKELETON_MULTIPLIER_320x240 constant, line 349 defines:

#define NUI_CAMERA_DEPTH_IMAGE_TO_SKELETON_MULTIPLIER_320x240 (NUI_CAMERA_DEPTH_NOMINAL_INVERSE_FOCAL_LENGTH_IN_PIXELS)


#define NUI_CAMERA_DEPTH_NOMINAL_INVERSE_FOCAL_LENGTH_IN_PIXELS (3.501e-3f)


So, using these formulas gives you the real world coordinates within the coordinate frame shown in the attached image.

You can subscribe to the RGB images and depth images from the Kinect using an ApproximateTime synchronizer, so that you can be sure the RGB and depth images are from (almost) the same time instant.

Next, assuming that you know the considered pixel position (u, v) in the RGB image:

// get raw z value (in mm)
uint16_t z_raw = depth_img.at<uint16_t>(v, u);

// z [meters]
double z_mean = z_raw * 0.001;

This is taken from https://github.com/ccny-ros-pkg/rgbdt...

Actually, if I am right, this is what the openni_launch package does: it subscribes to the RGB and depth images, then takes the depth info and creates an [x y z] representation using the camera matrix.

You can have a look at the openni_camera and openni_launch packages for more details, or at the link that I have provided.

Good Luck!
