Coordinates of a specific pixel in depthimage published by Kinect

asked 2014-10-15 09:55:14 -0500

Sekocan

updated 2014-10-16 13:41:29 -0500

paulbovbel

I have been looking for an answer for hours, so I am finally asking here.

I am using CMVision to find a specific color in the Kinect's field of view, and I want to find the real-world coordinates of the object with that color. My plan is to use CMVision to get the image coordinates (X and Y pixel values) and then combine those coordinates with the depth value of that pixel to calculate the real-world coordinates.

As I understand it, the /camera/depth_registered/points topic already gives real-world coordinates, but I couldn't find how to retrieve the X, Y, Z values of a specific pixel that I've chosen on the depth (or RGB) image.

Thanks in advance.


2 Answers


answered 2014-10-20 06:46:34 -0500

Sekocan

I was looking for the exact formula to calculate real-world X, Y, Z values from the depth image, and I finally found it.

First of all, the values in the depth image do not give the distance between a point in the real world and the origin of the Kinect. They give the distance between the point and the Kinect's XY plane (the plane parallel to the front surface of the Kinect). So if the Kinect is looking at a wall, all the depth values are roughly the same, regardless of the actual distance between each point on the wall and the Kinect's origin.

I found the calculations in the NuiSkeleton.h file: https://code.google.com/p/stevenhickson-code/source/browse/trunk/blepo/external/Microsoft/Kinect/NuiSkeleton.h?r=14

For the Z axis, line 625 says:

FLOAT fSkeletonZ = static_cast<FLOAT>(usDepthValue >> 3) / 1000.0f;

You don't have to worry about the bit-shift operation, because the method's documentation says:

/// <param name="usDepthValue">
/// The depth value (in millimeters) of the depth image pixel, shifted left by three bits. The left
/// shift enables you to pass the value from the depth image directly into this function.
/// </param>

So, you can use the equation:

FLOAT fSkeletonZ = static_cast<FLOAT>(usDepthValue) / 1000.0f;

The result is in meters (because of the division by 1000).

For the X and Y axes, lines 633 and 634 give:

FLOAT fSkeletonX = (lDepthX - width/2.0f) * (320.0f/width) * NUI_CAMERA_DEPTH_IMAGE_TO_SKELETON_MULTIPLIER_320x240 * fSkeletonZ;

FLOAT fSkeletonY = -(lDepthY - height/2.0f) * (240.0f/height) * NUI_CAMERA_DEPTH_IMAGE_TO_SKELETON_MULTIPLIER_320x240 * fSkeletonZ;

The Y-axis calculation starts with a minus sign because in an image (such as the depth image) the Y value increases as you go down, while in real-world coordinates Y conventionally increases as you go up.

For the NUI_CAMERA_DEPTH_IMAGE_TO_SKELETON_MULTIPLIER_320x240 constant, line 349 defines:

#define NUI_CAMERA_DEPTH_IMAGE_TO_SKELETON_MULTIPLIER_320x240 (NUI_CAMERA_DEPTH_NOMINAL_INVERSE_FOCAL_LENGTH_IN_PIXELS)

A quick search turns up this page: http://msdn.microsoft.com/en-us/library/hh855368.aspx where it says

NUI_CAMERA_DEPTH_NOMINAL_INVERSE_FOCAL_LENGTH_IN_PIXELS (3.501e-3f)

So, using these formulas gives you the real-world coordinates in the coordinate frame shown in the image below:

[figure: Kinect coordinate frame]

(ref: http://pille.iwr.uni-heidelberg.de/~kinect01/doc/reconstruction.html)


answered 2014-10-15 10:19:33 -0500

sai

updated 2014-10-15 10:20:14 -0500

You can subscribe to the RGB and depth images from the Kinect using an ApproximateTime synchronizer, so that you can be sure the RGB and depth images come from the same place and time instant.

Next, assuming that you know the pixel position (u, v) in the RGB image:

// get raw z value (in mm)
uint16_t z_raw = depth_img.at<uint16_t>(v, u);

// z [meters]
double z_mean = z_raw * 0.001;

This is taken from https://github.com/ccny-ros-pkg/rgbdt...

Actually, if I am right, this is what the openni_launch package does: it subscribes to the RGB and depth images, then uses the depth info together with the camera matrix to create an [x y z] representation.

You can have a look at the openni_camera and openni_launch packages for more details, or at the link I provided.

Good Luck!

