Get 3D positions of detected objects of a Deep Learning model using TensorFlow/Keras/YOLO/...

asked 2019-02-12 02:41:41 -0500

chbloca

updated 2019-02-12 03:43:41 -0500

Hello everyone.

My goal is to obtain the TFs of certain objects detected with a depth camera mounted on a mobile robot and a deep neural network (TensorFlow, Keras or YOLO). So far I have found that those APIs work with RGB images but not with stereo depth images. I also found a great package called find-object-2d that provides the positions of detected objects in 3D space as TFs, which you can visualize in RViz.

Since this package's detection algorithms are based on image matching (SIFT, SURF, BRIEF, ORB, ...), it is quite limited for my purposes compared with much more powerful alternatives, such as neural networks for object detection pretrained on ImageNet.

I am stuck at this point and not sure how to implement this. Is there any way to combine the find-object-2d package with TensorFlow detections? If not, is there any way to do this without the package?

Extra question: how can I filter the detections so that only specific objects are reported?
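
For the extra question, one common approach is to post-filter the detector output by class label and confidence. The following is only a minimal sketch: the detection record layout (`label`, `score`, `bbox`) and the function name `filter_detections` are assumptions for illustration, not the output format of any particular TensorFlow, Keras or YOLO wrapper.

```python
# Hypothetical detection format: a dict with "label", "score" and
# "bbox" = (xmin, ymin, xmax, ymax) in pixels. Adapt to your detector.

def filter_detections(detections, wanted_labels, min_score=0.5):
    """Keep only detections whose label is wanted and whose score is high enough."""
    wanted = set(wanted_labels)
    return [
        d for d in detections
        if d["label"] in wanted and d["score"] >= min_score
    ]


detections = [
    {"label": "person", "score": 0.91, "bbox": (10, 20, 110, 220)},
    {"label": "chair", "score": 0.40, "bbox": (200, 50, 300, 180)},
    {"label": "bottle", "score": 0.77, "bbox": (320, 90, 350, 170)},
]

# Only "bottle" survives: "person" is not wanted, "chair" scores too low.
kept = filter_detections(detections, ["bottle", "chair"], min_score=0.5)
print([d["label"] for d in kept])
```

Filtering after inference keeps the network untouched; retraining on a reduced class set is the alternative when the unwanted classes cause false positives.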



Yesterday I almost posted this exact same question. I had found ros_object_analytics but not find-object-2d. The former lets you use e.g. YOLO with RealSense or Kinect. I'm looking into retraining/transfer learning at the moment but will be back.

aPonza (2019-02-12 04:36:11 -0500)

To clarify, ros_object_analytics is geared towards RealSense users; I was thinking of clCaffe when talking about Kinect.

aPonza (2019-02-12 04:38:52 -0500)

Exactly, I found the package you mentioned after posting. The problem is that it apparently publishes 3D bounding boxes (cubes) but not TFs. However, I also found this useful package that does exactly that, called dodo_detector_ros.

chbloca (2019-02-12 05:33:03 -0500)

From the source, it seems to return a pose at the center of the point cloud. I'd like to fit an existing mesh onto the found object, or to detect a specific point on it, for the pose.

aPonza (2019-02-13 07:53:43 -0500)

Yeah, you can modify that piece of code so it returns the mean of the pixel distances within the bounding box. Have you managed to run ros_object_analytics successfully?

chbloca (2019-02-26 11:49:36 -0500)
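
The idea in the comment above (take the mean depth of the pixels inside the 2D bounding box) can be turned into a 3D point with the standard pinhole camera model. This is a rough sketch and not code from any of the packages mentioned: `bbox_to_point` is a hypothetical helper, the depth image is assumed to be a row-major 2D list in metres registered to the RGB image, and `fx`, `fy`, `cx`, `cy` are assumed to come from the camera's intrinsics.

```python
import math


def bbox_to_point(depth, bbox, fx, fy, cx, cy):
    """Back-project the centre of a 2D box to a 3D point in the camera frame.

    depth: row-major 2D list of depths in metres (0, NaN or inf = invalid).
    bbox:  (xmin, ymin, xmax, ymax) in pixels, max values exclusive.
    Returns (x, y, z) or None if the box contains no valid depth.
    """
    xmin, ymin, xmax, ymax = bbox
    valid = [
        depth[v][u]
        for v in range(ymin, ymax)
        for u in range(xmin, xmax)
        if depth[v][u] > 0 and math.isfinite(depth[v][u])
    ]
    if not valid:
        return None
    z = sum(valid) / len(valid)          # mean depth inside the box
    u_c = (xmin + xmax) / 2.0            # box centre in pixel coordinates
    v_c = (ymin + ymax) / 2.0
    x = (u_c - cx) * z / fx              # pinhole back-projection
    y = (v_c - cy) * z / fy
    return (x, y, z)


# Toy 4x4 depth image: everything at 2 m, with one invalid pixel in the box.
depth = [[2.0] * 4 for _ in range(4)]
depth[1][1] = 0.0
point = bbox_to_point(depth, (1, 1, 3, 3), fx=2.0, fy=2.0, cx=1.0, cy=1.0)
print(point)
```

The mean is sensitive to background pixels that fall inside the box, so a median or a smaller central window may be more robust. The resulting point is expressed in the camera's optical frame and could then be published as a TF relative to that frame.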

I launched the nodes and saw the camera correctly infer some of the 20 objects in yolov2 (the example CNN), and I'm working on training on my own dataset. I didn't do much testing beyond the basic steps that were suggested; it seemed to run fine. Are you having issues?

aPonza (2019-02-27 01:48:23 -0500)

How can I use this package to train on my own dataset with YOLOv3 or RetinaNet and get the 3D bounding boxes of the detected objects? I would like to use the 3D bounding boxes in MoveIt for collision checking. Any ideas or help?

Astronaut (2019-07-18 04:01:17 -0500)
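
On the 3D bounding box part of this question: if the 3D points belonging to a detection are available (for example, the depth pixels inside the 2D box back-projected into the camera frame), an axis-aligned box can be computed from their extents. The sketch below only illustrates that computation; `axis_aligned_box` is a hypothetical helper, not part of ros_object_analytics or MoveIt, and the points are assumed to be already segmented and expressed in one frame.

```python
def axis_aligned_box(points):
    """Return (center, size) of the axis-aligned box enclosing 3D points.

    points: non-empty iterable of (x, y, z) tuples in a single frame.
    center and size are (x, y, z) tuples in the same units as the input.
    """
    points = list(points)
    if not points:
        raise ValueError("need at least one point")
    xs, ys, zs = zip(*points)
    mins = (min(xs), min(ys), min(zs))
    maxs = (max(xs), max(ys), max(zs))
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(mins, maxs))
    size = tuple(hi - lo for lo, hi in zip(mins, maxs))
    return center, size


# Three points of a hypothetical detected object, in metres.
object_points = [(0.0, 0.0, 1.0), (0.5, 0.25, 1.5), (1.0, 0.5, 2.0)]
center, size = axis_aligned_box(object_points)
print(center, size)
```

The center and size map naturally onto a box primitive (a pose plus x, y, z dimensions) of the kind used for collision objects. An axis-aligned box in the camera frame can be much larger than the object when the object is rotated, so an oriented box may be preferable for tight collision checking.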