
Simulate Object Detection

asked 2020-03-24 11:34:35 -0500

matt_robo


We have the following use case: Control a drone based on object (people) detections coming from a video stream.

To simulate this within Gazebo, we would have to simulate the stream of incoming object detections since we don't have the real camera input that we can run inference on. I searched the internet to find out if something like this already exists but couldn't find anything. Am I missing something, is anyone familiar with a package like that?

The working principle would be as follows: given a camera with known intrinsics, positioned at some location and looking in some direction, and given an object placed somewhere in the world (it could be a cube), what would the resulting bounding box on the image plane be, based on the object's silhouette?

Cheers, Matt




Hi @matt_robo,

Maybe I did not understand you, but why not use a camera plugin and process its output with OpenCV, for instance?

Weasfas (2020-03-24 13:08:03 -0500)

Gazebo can provide the ground truth location of the object as well as your drone, so you could do a tf lookup to get the position.

nkhedekar (2020-03-24 17:42:57 -0500)

2 Answers


answered 2021-08-06 14:33:04 -0500

Fetullah Atas

Gazebo is a 3D simulator: every object within a world is modeled with certain physical/dynamical properties. For your case, you could use the camera plugin provided by Gazebo. It lets you capture a desired FOV of the world you are simulating as a 2D image; you can then place your human models inside that FOV and simulate a camera stream that includes the objects and perhaps some background.
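For reference, a minimal sketch of such a camera sensor in a model's SDF, assuming Gazebo classic with the gazebo_ros camera plugin (sensor, topic, and frame names here are illustrative, not from the question):

```xml
<!-- Sketch: camera sensor attached to a link of your drone model. -->
<sensor name="front_camera" type="camera">
  <update_rate>30</update_rate>
  <camera>
    <horizontal_fov>1.047</horizontal_fov>
    <image>
      <width>640</width>
      <height>480</height>
      <format>R8G8B8</format>
    </image>
    <clip>
      <near>0.1</near>
      <far>100</far>
    </clip>
  </camera>
  <!-- Publishes the image stream as a ROS topic. -->
  <plugin name="camera_controller" filename="libgazebo_ros_camera.so">
    <cameraName>front_camera</cameraName>
    <imageTopicName>image_raw</imageTopicName>
    <cameraInfoTopicName>camera_info</cameraInfoTopicName>
    <frameName>camera_link</frameName>
  </plugin>
</sensor>
```

The `camera_info` topic then gives you the intrinsics you need for the projection step described below.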

Gazebo’s API is quite capable: you can write a world plugin, and through that plugin you can access every model. As the comment above notes, each model also has a 3D collision box. Since you know the camera's pose in the world, you can determine which objects (models) are inside the camera's FOV.
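The FOV membership test can be sketched with plain NumPy. This is not Gazebo API; it approximates the FOV as a viewing cone, and the poses and FOV angle are assumed inputs you would read from the world plugin:

```python
import numpy as np

def in_camera_fov(cam_pos, cam_forward, obj_pos, horizontal_fov):
    """Rough check: is obj_pos inside the camera's viewing cone?

    cam_pos, obj_pos: 3D points in the world frame.
    cam_forward: unit vector along the camera's optical axis.
    horizontal_fov: full horizontal field of view in radians.
    """
    to_obj = obj_pos - cam_pos
    dist = np.linalg.norm(to_obj)
    if dist == 0.0:
        return False
    # Angle between the optical axis and the ray to the object.
    cos_angle = np.dot(cam_forward, to_obj) / dist
    return cos_angle >= np.cos(horizontal_fov / 2.0)

# Camera at the origin looking down +x with a 60-degree FOV.
cam = np.zeros(3)
fwd = np.array([1.0, 0.0, 0.0])
print(in_camera_fov(cam, fwd, np.array([5.0, 0.5, 0.0]), np.radians(60)))   # True
print(in_camera_fov(cam, fwd, np.array([-5.0, 0.0, 0.0]), np.radians(60)))  # False
```

A real camera frustum is rectangular, not conical, so for exact results you would test each corner of the model's collision box against the image bounds after projection instead.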

From there you will need two transforms to get that 3D box overlaid onto your image. First, since the pose of the 3D box is in the world frame, transform it into the camera frame. Once you have the pose in the camera frame, use the camera's intrinsic parameters to get the pixel coordinates of the 3D box in the image. Note that, since this is a 3D box, you will have 8 corners in the image, but it is easy to compute the minimum 2D bounding box that covers all 8 corners. This 2D bounding box can then be used as ground truth, and from there it is up to you whether you want to use this data for training a neural network.
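The two transforms can be sketched with plain NumPy (a minimal sketch: the intrinsics, extrinsics, and box here are illustrative, and in practice you would read them from Gazebo and the camera_info topic):

```python
import numpy as np

def project_box_to_bbox(corners_world, T_world_to_cam, K):
    """Project the 8 corners of a 3D box into the image and return the
    enclosing 2D bounding box (x_min, y_min, x_max, y_max).

    corners_world: (8, 3) box corners in the world frame.
    T_world_to_cam: (4, 4) homogeneous transform, world -> camera frame
                    (camera convention: z forward, x right, y down).
    K: (3, 3) camera intrinsic matrix.
    """
    # First transform: world frame -> camera frame.
    homo = np.hstack([corners_world, np.ones((8, 1))])
    cam = (T_world_to_cam @ homo.T).T[:, :3]
    # Second transform: camera frame -> pixels via the intrinsics.
    pix = (K @ cam.T).T
    pix = pix[:, :2] / pix[:, 2:3]          # perspective divide
    x_min, y_min = pix.min(axis=0)          # tight 2D box over 8 corners
    x_max, y_max = pix.max(axis=0)
    return x_min, y_min, x_max, y_max

# Unit cube centered 5 m in front of the camera, identity extrinsics.
c = np.array([[x, y, z] for x in (-0.5, 0.5)
                        for y in (-0.5, 0.5)
                        for z in (4.5, 5.5)])
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
print(project_box_to_bbox(c, np.eye(4), K))
```

Corners behind the camera (non-positive z in the camera frame) would need clipping before the perspective divide; the sketch assumes the whole box is in front of the camera.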

This answer was intended more as a high-level overview of the steps than as practical help. Hope it's useful somehow. Cheers!


answered 2021-08-06 00:09:50 -0500

Anukriti

Hey, were you able to find any solution?



This would have been better as a comment.

Fetullah Atas (2021-08-06 14:19:02 -0500)

