Robotics StackExchange | Archived questions

Get position of the objects detected by the Yolov3 in ROS

Dear all,

I have a simulated scene with some objects in V-REP, where the robot base is fixed and the robot performs joint movements. I have two camera sources (vision sensors placed on the left and right sides to cover the scene).

On one hand, I have trained YOLOv3 on images from the cameras to identify the objects; this part is done.

On the other hand, I publish the raw image, the depth image, and the camera info to ROS.

Now I want to determine the positions of the detected objects. I have been researching this, but I haven't been able to formulate a clear idea or process for solving it.

I would appreciate any guidance or links to documentation that could help me solve this, or at least give me an idea of how to approach it.

Thank you in advance.

Asked by lmand on 2019-08-08 05:56:14 UTC

Comments

Answers

This is mostly a question of whether your raw image and depth image are in the same frame. If they are, you just look up the corresponding pixel value in the depth image and you get the distance (a sketch of that lookup is below). If they are not, you can do something like the two numbered steps after the sketch.
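For the same-frame case, here is a minimal sketch of the lookup plus the pinhole back-projection, assuming a registered, metric depth image and the 3x3 intrinsic matrix K from the camera's CameraInfo (the function and variable names are placeholders, not from any package):

    import numpy as np

    def pixel_to_point(u, v, depth_image, K):
        """Back-project pixel (u, v) into a 3D point in the camera frame
        using the pinhole model; K is the 3x3 intrinsic matrix."""
        z = depth_image[v, u]               # the depth lookup itself
        if not np.isfinite(z) or z <= 0:
            return None                     # missing / invalid reading
        fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
        return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

    # e.g. query the center of a YOLO bounding box (u_min, v_min, u_max, v_max):
    # point_cam = pixel_to_point((u_min + u_max) // 2, (v_min + v_max) // 2,
    #                            depth_image, K)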

  1. You need to fuse your depth information with the camera image, so you need a fusion package. Here is a good starting point; it works with Melodic and is easy enough to use. Note that this package does not work with depth images but with point clouds, so you can either modify it to work with a depth image (not sure how easy that is, or whether it is possible) or use this to convert your depth image into a point cloud first. This step can be done in two ways: fuse the whole image with the depth image, or fuse only part of it, just the bounding box.
  2. You look at a specific pixel on the object and read the corresponding value from the depth image. Note that the bounding box will also include pixels that are not part of the object, so it might be a good idea to apply a clustering algorithm beforehand and take the mean of the remaining points, take the closest point (which might not always give the correct result), shrink the bounding box so it contains fewer non-object pixels, and so on; get creative with it. A simple robust variant is sketched after this list.
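On the robust-selection idea in step 2, a small sketch, assuming a metric depth image and a YOLO box in pixel coordinates (the shrink factor and helper name are arbitrary choices, not from any package):

    import numpy as np

    def bbox_depth(depth_image, u_min, v_min, u_max, v_max, shrink=0.2):
        """Median depth inside a shrunken bounding box: shrinking drops
        border pixels that likely belong to the background, and the
        median is robust to the remaining outliers."""
        du = int((u_max - u_min) * shrink / 2)
        dv = int((v_max - v_min) * shrink / 2)
        roi = depth_image[v_min + dv:v_max - dv, u_min + du:u_max - du]
        valid = roi[np.isfinite(roi) & (roi > 0)]
        return float(np.median(valid)) if valid.size else None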

Hope this helps :)

Asked by Choco93 on 2019-08-12 05:28:20 UTC

Comments

Thank you so much for your answer.

Yes, I have them in the same frame. The conversion of a depth image to a point cloud is easy thanks to ROS; I have done it before, but I don't think it is necessary here.
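(For reference, that conversion just back-projects every valid pixel; below is a NumPy sketch of roughly what depth_image_proc/point_cloud_xyz does, assuming a metric depth image and the 3x3 intrinsic matrix K.)

    import numpy as np

    def depth_to_cloud(depth_image, K):
        """Back-project a whole depth image into an Nx3 point cloud
        expressed in the camera frame."""
        h, w = depth_image.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth_image
        valid = np.isfinite(z) & (z > 0)
        x = (u - K[0, 2]) * z / K[0, 0]
        y = (v - K[1, 2]) * z / K[1, 1]
        return np.stack([x[valid], y[valid], z[valid]], axis=-1)

The following questions arose from your solution: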

1) If I want to use the output from all the cameras, I need to have them in a common reference frame. Since the mean of the points after the lidar-camera fusion will be a point expressed in the camera coordinate system, how can I fuse the outputs of the different cameras to build a confidence factor for the position of an object in the world?

2) Is there any way I can use the pinhole model, triangulation, or some other method to locate the objects detected by YOLOv3? I am thinking of these because I have all the intrinsic and extrinsic parameter values of the cameras.
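(To make question 2 concrete: the triangulation idea would look roughly like the sketch below, assuming the same detection can be matched across both views, e.g. via the bounding box centers, and that K1, K2 are the intrinsics and T1, T2 the 3x4 world-to-camera extrinsics. All names are placeholders.)

    import cv2
    import numpy as np

    def triangulate_detection(K1, T1, K2, T2, uv1, uv2):
        """Triangulate a 3D world point from one matched pixel pair,
        e.g. the YOLO bounding box centers in the two camera images."""
        P1, P2 = K1 @ T1, K2 @ T2                  # 3x4 projection matrices
        p1 = np.asarray(uv1, dtype=np.float64).reshape(2, 1)
        p2 = np.asarray(uv2, dtype=np.float64).reshape(2, 1)
        X = cv2.triangulatePoints(P1, P2, p1, p2)  # 4x1 homogeneous
        return (X[:3] / X[3]).ravel()              # 3D point, world frame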

Thank you so much in advance. I am looking forward to comments on this :)

Asked by lmand on 2019-08-12 08:05:11 UTC

You are welcome. Getting the depth itself is quite easy, just a lookup. Now, for the questions you asked:

  1. I don't think there is a way to get your images into the same frame directly, since normal transforms don't apply to images; they only carry 2D information. It becomes possible once you fuse both images and transform the resulting XYZRGB points into a common frame (see the sketch after this list). From there you can use the bounding box info from the detection output of both cameras and try to raise your confidence.
  2. I don't think it can be done in general. It is doable if you know the exact dimensions of the objects (as with a chessboard or fiducial tags), but it involves a bunch of math and some OpenCV magic, and it is only possible for very few objects. This is my opinion; maybe there is a way to do it, but you will have to search the web for it.
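A minimal sketch of the common-frame part of point 1, assuming the camera poses are available on TF (e.g. published by your V-REP/ROS bridge); the frame names and node name are placeholders:

    import rospy
    import tf2_ros
    import tf2_geometry_msgs  # registers PointStamped support for tf2
    from geometry_msgs.msg import PointStamped

    rospy.init_node('detections_to_world')
    tf_buffer = tf2_ros.Buffer()
    listener = tf2_ros.TransformListener(tf_buffer)  # keep a reference alive

    def to_world(xyz, camera_frame, stamp):
        """Express a 3D detection from one camera's frame in a common
        'world' frame, so both cameras' outputs can be compared/fused."""
        p = PointStamped()
        p.header.frame_id = camera_frame
        p.header.stamp = stamp
        p.point.x, p.point.y, p.point.z = xyz
        return tf_buffer.transform(p, 'world', rospy.Duration(1.0))

Once both detections live in the same frame, a simple confidence heuristic is to check whether the two estimates agree within some distance threshold and, if so, average them.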

Asked by Choco93 on 2019-08-12 08:45:21 UTC

ok. Thank you for the information :)

I will try to proceed further and see if I can get the position with high confidence.

Asked by lmand on 2019-08-12 09:16:01 UTC