Ask Your Question

doubts about interpreting x,y,z from kinect

asked 2012-05-02 02:10:41 -0500

Moataz Elmasry gravatar image

Hello everyone

I'm implementing a stair detection algorithm using kinect. My understanding is that in ros and openni the y and z of 3D points are reversed. so point.y indicates depth, and point.z indicates depth, while (0,0,1) is the normal to the ground plane

Now when I'm running plane/stair detection on some sample stairs, I have the feeling this is not the case, for example:

`Standing down and looking upstairs:
where Normal (x,y,z) is the angle between plane normal and (1,0,0), (0,0,1), (0,1,0), and center means the center point of the plane

Normal (x,y,z) to plane=8=(85.9741,5.20184,86.7113), center=(-486.011,110.283,1775.32, is vertical
Normal (x,y,z) to plane=17=(86.2233,6.65427,84.5293), center=(97.3194,177.953,1440.98, is vertical
Normal (x,y,z) to plane=21=(87.3236,16.1993,74.0353), center=(54.2511,277.825,1146.79, is vertical
Normal (x,y,z) to plane=22=(90.2841,87.4952,2.52085), center=(-26.3924,329.484,929.445, is horizontal
Normal (x,y,z) to plane=24=(87.1528,12.2501,78.0953), center=(-31.5048,393.672,855.474, is vertical
Normal (x,y,z) to plane=25=(89.4157,90.5968,179.165), center=(-21.418,422.656,675.315, is horizontal

since points are scanned from top left to bottom right, so planes with smaller id´s are scanned first, i.e. in the upper case they are the higher further plane.

Now for the case standing up the stairs and looking downwards, here are my planes:

Normal (x,y,z) to plane=67=(90.5932,89.6205,179.296), center=(-223.516,950.897,2188.82, is horizontal
Normal (x,y,z) to plane=85=(90.4772,93.7257,176.244), center=(70.3257,842.521,1605.57, is horizontal
Normal (x,y,z) to plane=86=(90.4431,94.3491,175.628), center=(32.0055,743.043,1304.14, is horizontal
Normal (x,y,z) to plane=88=(90.4187,78.5517,168.544), center=(20.4919,639.753,996.574, is horizontal
Normal (x,y,z) to plane=90=(90.0278,110.79,20.7901), center=(-1.77436,540.584,736.258

Again small plane id means further (but lower planes in this case) As you can see, the 3rd component of the center (z) starts large and goes smaller in both cases, while the (y) component in the first case is small and goes larger and vice versa in second case, so this tells me that this coordinate system is just a standard coordinate system with y indicates height and z indicates depth one more tip, calculating angle between plane normal and (0,0,1) yields almost 90 degrees for horizontal planes and 0 or 180 degrees for vertical planes, while using the normal (0,1,0) as ... (more)

edit retag flag offensive close merge delete


Slightly related, this paper about stair detection could interest you:

AHornung gravatar image AHornung  ( 2012-05-03 06:14:39 -0500 )edit

yes I'm refering to this papaer for my master thesis. Thanks alot

Moataz Elmasry gravatar image Moataz Elmasry  ( 2012-05-09 23:28:53 -0500 )edit

3 Answers

Sort by » oldest newest most voted

answered 2012-05-03 05:46:11 -0500

A (x, y, z) point only makes sense when the reference coordinate system is specified. ROS contains the TF library to do that. I would suggest you fire up RViz and visualize the TF topic to see what is going on. In your case, there are basically two coordinate systems ("frames" in TF terminology) you might be interested in: the optical frame of the Kinect (usually called openni_depth_optical_frame or such) and the coordinate system of the robot, projected onto the ground plane (usually called base_footprint).

In base_footprint, x means forward, y means left (both relative to the robot) and z means up (i.e., away from gravity). TF frames are all standard right-handed coordinate systems.

In openni_depth_optical_frame, x and y use the standard image processing convention, (i.e., x goes from left to right and y goes from top to bottom). z means depth. Again, a standard right-handed coordinate system.

For your application, it is probably more convenient to work in the robot's base frame (or any other frame in which z always points up), since then you become independent from the orientation of the Kinect; also, it becomes trivial to identify horizontal and vertical planes. TF will take care of the necessary conversions for you.

Without knowing the coordinate system, it's hard to be more specific. Assuming you are working in the optical frame, the fact that z decreases when y increases is expected, no matter whether you look up the stairs or down the stairs, isn't it?

edit flag offensive delete link more


Thank you very much for the detailed reply. From kinect I'm using the topic /camera/rgb/points, so the frame is /openni_rgb_optical_frame This is as I understood left hand coordinate system, x left, y up and z forward This should be converted to right hand coordinate system with x forward, y left a

Moataz Elmasry gravatar image Moataz Elmasry  ( 2012-05-15 05:03:07 -0500 )edit

All coordinate systems in ROS are right-handed.

Martin Günther gravatar image Martin Günther  ( 2012-05-15 05:09:05 -0500 )edit

well, but if in depth_optical_frame, x goes left to right, and y top to bottom, then this is left hand side coordinate. Aside from the name, at the moment there´s no robot, so no base frame. A base frame could be created using tf transformation and make it parent for openni_camera.

Moataz Elmasry gravatar image Moataz Elmasry  ( 2012-05-15 05:19:27 -0500 )edit

It's still right-handed, because z goes away from the camera. Try it; it's a little uncomfortable to bend your right hand into that position, but it works! ;-)

Martin Günther gravatar image Martin Günther  ( 2012-05-15 05:24:31 -0500 )edit

re: base_frame - if you plan on using a real robot at some time, where it would be easy to use the base frame information, it would be a shame to throw it away. But you don't need a base frame if you're going for a free-hand camera version.

Martin Günther gravatar image Martin Günther  ( 2012-05-15 05:33:50 -0500 )edit

so I just checked the tf frames in rviz. and it looks like openni_rgb_optical_frame is x to left, y downwards,z away from camera.while openni_rgb_frame has a "standard" orientation. same rules apply for depth frames.

Moataz Elmasry gravatar image Moataz Elmasry  ( 2012-05-15 07:03:17 -0500 )edit

so if points are represented in optical frame coordinates, this means tf static transformation could be done using following vales "0 0 0 1.57 0 -1.57", which basically is rotating 90 degrees on x and and -90 on z axis.

Moataz Elmasry gravatar image Moataz Elmasry  ( 2012-05-15 07:08:13 -0500 )edit

answered 2012-05-16 05:28:35 -0500

Moataz Elmasry gravatar image

updated 2012-10-18 01:56:44 -0500

So it looks like that the problem has been solved (at least for now) Many thanks to Martin Günther I have been working with pcd files extracted from kinect. Either if pcds are recorded from the rgb or depth frame, this understanding apply (rgb and depth keywords are interchangable)

Points are represented under the frame /openni_rgb_optical_frame, which is a child of openni_rgb_frame

  1. The axis for /openni_rgb_optical_frame are: x-right, y-downwards, z-forward
  2. The axis for /openni_rgb_frame are: x-forward, y-left, z-upward ("standard understanding of coordinates in ros")

This can be seen clearly in /tf topic under rviz

now considering the transformation between the two frames through "rostopic echo /tf", one can see that there's a rotation between the two frames with the following values: x: -0.499999841466 y: 0.499601836645 z: -0.499999841466 w: 0.500398163355

so we need to rotate the point cloud in the opoosite direction of these values. something like that

Quaternion<float> q;
q = Quaternion<float> (-0.500398163355, 0.499999841466, -0.499601836645, 0.499999841466);
pcl::transformPointCloud (tmpCloud, source, Eigen::Vector3f (0, 0, 0), q);

That's it, now it looks like the coordinates are again in the standard ros coordinate system

edit flag offensive delete link more

answered 2020-04-13 14:27:35 -0500

Fookii gravatar image

Thanks for the explanation on optical_frame and base_link, The ros openni wrapper will do these automatically, however,If you are using gazebo simulation, there won't be any optical frames as childer to your camera link. which can leads to confusing situation. If you are using gazebo, then refer to this

edit flag offensive delete link more

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools


Asked: 2012-05-02 02:10:41 -0500

Seen: 4,048 times

Last updated: Apr 13