
Get position of the objects detected by the Yolov3 in ROS

asked 2019-08-08 06:03:53 -0600

lmand

updated 2019-08-12 05:03:13 -0600

Dear all,

I have a simulated scene with some objects in V-REP, where the robot base is fixed and the robot only performs joint movements. I have two camera sources (vision cameras placed on the left and right sides to cover the scene).

On one hand, I have trained my YOLOv3 on images from the cameras to identify the objects; this is done.

On the other hand, I have published the raw image, depth image and camera info to ROS.

Now I want to determine the position of the detected objects. I have been researching this, but I haven't been able to formulate a clear idea or process to solve it.

I would appreciate it if you could provide some guidance or links to documentation that could help me solve this or get an idea of how to approach it.

Thank you in advance.


1 Answer


answered 2019-08-12 05:28:20 -0600

Choco93

updated 2019-08-12 07:19:56 -0600

This is mostly a question of whether your raw image and depth image are in the same frame. If they are, you just look up the corresponding pixel in the depth image and you get the distance. If they are not, you can do something like the following.

  1. You need to fuse the depth information with the camera image. Here is a good starting point; it works with melodic and is easy enough to use. Note that you have a depth image, and this package works with point clouds, not depth images. You can either modify it to work with depth images (I am not sure how easy that is, or whether it is possible), or you can use this to convert your depth image into a point cloud. This step can be done in two ways: fuse the whole image with the depth image, or fuse only part of it, just the bounding box.
  2. Look at a specific pixel on the object and find the corresponding value in the depth image. Note that the bounding box also includes points that are not part of the object, so it may be a good idea to apply a clustering algorithm first and then take the mean of the remaining points, or take the closest point (which might not always give the correct result), or shrink the bounding box so that it contains fewer points that are not part of the object. Get creative with it.
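
For the case where the raw and depth images are already in the same frame, the lookup in step 2 could be sketched roughly as below. This is only an illustration, not a tested ROS node: the function names `bbox_depth` and `pixel_to_point` are hypothetical, the depth image is assumed to already be a NumPy array in metres along the optical axis (a simulator's depth buffer may need rescaling first), and `K` is assumed to be the 9-element row-major intrinsic matrix from `sensor_msgs/CameraInfo`. The median over a shrunken box stands in for the clustering idea; it is a simpler substitute, not the same thing.

```python
import numpy as np

def bbox_depth(depth, box, shrink=0.5):
    """Median of the valid depths in the central part of a bounding box.

    depth:  HxW array in metres (0, NaN and inf are treated as "no reading").
    box:    (xmin, ymin, xmax, ymax) in pixels, as reported by the detector.
    shrink: fraction of the box kept around its centre, to drop background.
    """
    h, w = depth.shape
    xmin, ymin, xmax, ymax = box
    cx, cy = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    hw, hh = (xmax - xmin) * shrink / 2.0, (ymax - ymin) * shrink / 2.0
    x0, x1 = max(int(cx - hw), 0), min(int(cx + hw) + 1, w)
    y0, y1 = max(int(cy - hh), 0), min(int(cy + hh) + 1, h)
    patch = depth[y0:y1, x0:x1]
    valid = patch[np.isfinite(patch) & (patch > 0)]
    if valid.size == 0:
        return None  # no usable depth inside the box
    return float(np.median(valid))

def pixel_to_point(u, v, z, K):
    """Back-project pixel (u, v) at depth z using the pinhole model.

    K is the row-major 3x3 intrinsic matrix [fx 0 cx, 0 fy cy, 0 0 1].
    The result is expressed in the camera's optical frame.
    """
    fx, fy, cx, cy = K[0], K[4], K[2], K[5]
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

if __name__ == "__main__":
    # Synthetic scene: background at 5 m, one object at 2 m.
    depth = np.full((480, 640), 5.0)
    depth[100:200, 200:300] = 2.0
    K = [500.0, 0.0, 320.0, 0.0, 500.0, 240.0, 0.0, 0.0, 1.0]
    box = (200, 100, 300, 200)
    z = bbox_depth(depth, box)
    u, v = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    print(pixel_to_point(u, v, z, K))
```

The median is used instead of the mean so that a few background pixels leaking into the box do not pull the estimate away from the object.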

Hope this helps :)



Thank you so much for your answer.

Yes, I have them in the same frame. Converting a depth image to a point cloud is easy thanks to ROS functions; I have done it previously. But I don't think it is necessary. The following questions are raised by your solution:

1) If I want to use the output from all the cameras, I need to have them in a common reference frame. Since the mean of the points after the lidar-camera fusion will be a point in the world expressed in the camera coordinate system, how can I fuse the outputs of the different cameras to build a confidence factor for the position of the object in the world?

2) Is there any way I can use the pinhole model, triangulation, or some other method to identify the objects that have been ...(more)

lmand (2019-08-12 08:05:11 -0600)

You are welcome. Just getting the depth is quite easy; it is only a lookup. Now for the questions you asked:

  1. I don't think there is a way to get your images into the same frame, because normal transforms don't apply to images, which only carry 2D information. It is possible if you fuse both images and transform the xyzrgb info to the same frame. From there you can use the bounding box info from the detection output of both cameras and try to raise your confidence.
  2. I don't think it can be done in general. It is doable if you know the exact dimensions of the objects (as with a chessboard or tags). It will involve a bunch of maths and some OpenCV magic, and is only possible for very few objects. This is my opinion; maybe there is a way to do it, but you will have to search the web for it.
Choco93 (2019-08-12 08:45:21 -0600)
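
One way to read point 1 above is: turn each camera's detection into a 3D point, transform every point into a shared frame, and compare them. The sketch below illustrates that idea only. The helper names `to_world` and `fuse`, the example transforms, and the per-camera noise values are all made up for illustration; in a ROS node the 4x4 matrices would typically come from tf2 rather than being written by hand, and inverse-variance weighting is just one simple choice for combining estimates.

```python
import numpy as np

def to_world(p_cam, T_world_cam):
    """Apply a 4x4 homogeneous transform to a 3D point in the camera frame."""
    p = np.append(np.asarray(p_cam, dtype=float), 1.0)
    return (T_world_cam @ p)[:3]

def fuse(points, sigmas):
    """Inverse-variance weighted mean of several estimates of one position.

    Returns the fused point and the largest distance of any estimate from
    it, which can serve as a crude agreement check between cameras.
    """
    pts = np.asarray(points, dtype=float)
    w = 1.0 / np.square(np.asarray(sigmas, dtype=float))
    mean = (w[:, None] * pts).sum(axis=0) / w.sum()
    spread = float(np.linalg.norm(pts - mean, axis=1).max())
    return mean, spread

if __name__ == "__main__":
    # Hypothetical camera poses: pure translations, to keep the example short.
    T_left = np.eye(4)
    T_left[:3, 3] = [1.0, 0.0, 0.0]
    T_right = np.eye(4)
    T_right[:3, 3] = [-1.0, 0.0, 0.0]

    # The same object as estimated by each camera, in its own frame.
    p_left = to_world([0.0, 0.0, 2.0], T_left)
    p_right = to_world([2.0, 0.0, 2.2], T_right)

    fused, spread = fuse([p_left, p_right], sigmas=[0.1, 0.1])
    print(fused, spread)
```

A small spread suggests both cameras are looking at the same object; a large one suggests a mismatched detection or a calibration problem.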

ok. Thank you for the information :)

I will try to proceed further and see if I can get the position with high confidence.

lmand (2019-08-12 09:16:01 -0600)



Asked: 2019-08-08 05:56:14 -0600

Seen: 1,261 times

Last updated: Aug 12 '19