I have a setup with a bottom camera (a simple webcam pointed at the ground). An IMU is mounted with the camera, so I know the camera's attitude. I also have an estimate of the distance from the camera to the ground.
I'm detecting targets on the ground (I can detect them and get their positions in the image using OpenCV), and I would like to extract their positions in the world frame. However, I'm lost on how to do it.
Is there a ROS package that implements this? How should I do it?
https://answers.ros.org/question/214116/2d-image-point-to-3d/?answer=214209#post-id-214209
It sounds like you have the 3D point that describes the location of the camera, and since you have the IMU with it you know its orientation as well. Since you also know the estimated location of the ground plane, this can be treated as a ray-plane intersection problem, a common problem in computer graphics that solves for the x,y,z point where a ray meets a plane.
The line that starts at the camera and points toward the ground is the ray, and the ground of course is the plane. Since the object you're selecting/detecting in the camera image is not always at the center pixel, you'll need to add a pan and tilt angle to the ray depending on which pixel the center of the object corresponds to. The [image_geometry](http://wiki.ros.org/image_geometry) package has tools to help do just that.
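To make the ray-plane intersection concrete, here is a minimal NumPy sketch. The function name `intersect_ray_plane` and the example numbers (a camera 2 m above a ground plane at z = 0) are made up for illustration and are not part of any ROS package.

```python
import numpy as np

def intersect_ray_plane(origin, direction, plane_point, plane_normal, eps=1e-9):
    """Return the point where a ray hits a plane, or None if it never does."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    plane_point = np.asarray(plane_point, dtype=float)
    plane_normal = np.asarray(plane_normal, dtype=float)

    denom = np.dot(plane_normal, direction)
    if abs(denom) < eps:
        return None  # ray is parallel to the plane
    # Solve n . (origin + t * direction - plane_point) = 0 for t
    t = np.dot(plane_normal, plane_point - origin) / denom
    if t < 0:
        return None  # plane lies behind the ray origin
    return origin + t * direction

# Hypothetical example: camera 2 m above the ground, ray tilted forward along +x
hit = intersect_ray_plane(
    origin=[0.0, 0.0, 2.0],
    direction=[0.5, 0.0, -1.0],
    plane_point=[0.0, 0.0, 0.0],
    plane_normal=[0.0, 0.0, 1.0],
)
print(hit)
```

The direction does not need to be normalized, since `t` simply scales with its length.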
As mig mentioned, you'll want to use tf to help keep track of all the transformation frames, and a urdf can make this even easier.
That's actually what I had in mind, using the plane z=0 (the ground). However I'm unsure how to obtain the ray, since I only have a 3x3 matrix of intrinsic parameters (camera matrix). And how should I get the scale factor? I'm a little bit lost.
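One way to see where the ray and the scale factor come from, as a sketch under some assumptions: a pinhole model with zero skew, an already undistorted pixel $(u, v)$, a rotation $R$ from the camera optical frame to the world frame (from the IMU plus the mounting), and a camera position $c = (c_x, c_y, c_z)$ in the world frame. These symbols are introduced here for illustration. With the intrinsic matrix $K$ holding $f_x, f_y, c_u, c_v$, the ray direction in the world frame is

$$
d = R\,K^{-1}\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
  = R\begin{bmatrix} (u - c_u)/f_x \\ (v - c_v)/f_y \\ 1 \end{bmatrix}.
$$

Every point on the ray is $P(s) = c + s\,d$. Requiring it to lie on the ground plane $z = 0$ fixes the scale factor:

$$
c_z + s\,d_z = 0 \quad\Longrightarrow\quad s = -\frac{c_z}{d_z},
\qquad P = c - \frac{c_z}{d_z}\,d.
$$

So the scale is not something the camera matrix provides; it comes from the known height above the plane. A valid hit needs $d_z < 0$, i.e. the ray must point downward.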
https://answers.ros.org/question/214116/2d-image-point-to-3d/?answer=214174#post-id-214174
You're saying you have the position of the camera in 3D already, right?
Then, you would have to write this yourself, but this is fairly easy with the [`tf` library](http://wiki.ros.org/tf) (best read through the documentation and the tutorials). However, you need to have your camera and IMU set up correctly in the `urdf`.
Static tf is fine. Just assumed you'd have a robot... You say you have an estimate of the distance from the camera to the ground --> `z`.

Otherwise, from a monocular camera you cannot tell the distance to an object (unless you know the exact parameters of the object and estimate the distance from them).
Do I need to use the `urdf` model? Is it not enough to define a static tf between the camera and IMU?

How should I get the `x,y,z` with respect to the camera? I only have the `x,y` in the image, and `camera_calibration` only outputs the intrinsic parameters (from what I can tell).
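As a rough sketch of how the pixel `x,y` plus the intrinsics and the height can give `x,y,z`: the code below assumes a pinhole camera with zero skew, an undistorted pixel, and a ground plane at world `z = 0`. The function names, the intrinsic values, and the straight-down rotation `R_down` are hypothetical placeholders. In practice the rotation would come from the IMU and tf, and `image_geometry`'s `PinholeCameraModel.projectPixelTo3dRay` could take the place of the back-projection step.

```python
import numpy as np

def pixel_to_camera_ray(u, v, K):
    """Back-project an undistorted pixel to a ray in the optical frame (z = 1)."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    return np.array([(u - cx) / fx, (v - cy) / fy, 1.0])  # assumes zero skew

def pixel_to_ground(u, v, K, R_world_cam, cam_pos_world):
    """Return the ground point in the camera frame and in the world frame."""
    ray_cam = pixel_to_camera_ray(u, v, K)
    ray_world = R_world_cam @ ray_cam
    if ray_world[2] >= 0:
        raise ValueError("ray does not point toward the ground")
    # Scale factor that brings the ray down to world z = 0
    s = -cam_pos_world[2] / ray_world[2]
    point_cam = s * ray_cam
    point_world = cam_pos_world + s * ray_world
    return point_cam, point_world

# Hypothetical intrinsics (fx = fy = 500 px, principal point at 320, 240)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
# Camera looking straight down: optical z maps to world -z, optical y to world -y
R_down = np.array([[1.0, 0.0, 0.0],
                   [0.0, -1.0, 0.0],
                   [0.0, 0.0, -1.0]])
cam_pos = np.array([0.0, 0.0, 2.0])  # 2 m above the ground

point_cam, point_world = pixel_to_ground(420, 240, K, R_down, cam_pos)
print(point_cam, point_world)
```

The intrinsics alone only give the direction of the ray; the `z` value comes from the height estimate and the attitude.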
I couldn't find any tutorial relevant to this question over there. Could you point out exactly which one covers 2D-to-3D coordinate conversion?