We have the following use case: Control a drone based on object (people) detections coming from a video stream.

To simulate this within Gazebo, we would have to simulate the stream of incoming object detections, since we don't have real camera input to run inference on. I searched online for an existing package that does this but couldn't find anything. Am I missing something? Is anyone familiar with a package like that?

The working principle would be as follows: given a camera with known intrinsics, positioned at some location and looking in some direction, and an object placed somewhere in the world (it could be a cube), what would the resulting bounding box on the image plane be, based on the object's silhouette?
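To make the question concrete, here is a minimal sketch of the projection I have in mind. It assumes an ideal pinhole camera without lens distortion, a cube whose corners are already expressed in the camera optical frame (x right, y down, z forward), and no occlusion. The function names and the intrinsics values (fx, fy, cx, cy) are made up for illustration and do not come from any existing package.

```python
from itertools import product


def project_point(point, fx, fy, cx, cy):
    """Pinhole projection of a 3D point in the camera optical frame to pixels."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return fx * x / z + cx, fy * y / z + cy


def cube_corners(center, size):
    """Eight corners of a cube that is axis-aligned with the camera frame."""
    h = size / 2.0
    px, py, pz = center
    return [(px + sx * h, py + sy * h, pz + sz * h)
            for sx, sy, sz in product((-1, 1), repeat=3)]


def bounding_box(points, fx, fy, cx, cy, width, height):
    """Axis-aligned box (u_min, v_min, u_max, v_max) around the projected
    points, clipped to the image. Returns None if nothing is visible."""
    pixels = [project_point(p, fx, fy, cx, cy) for p in points]
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    u_min, u_max = max(0.0, min(us)), min(float(width), max(us))
    v_min, v_max = max(0.0, min(vs)), min(float(height), max(vs))
    if u_min >= u_max or v_min >= v_max:
        return None
    return (u_min, v_min, u_max, v_max)


# Hypothetical 640x480 camera and a 1 m cube, 5 m in front of it.
box = bounding_box(cube_corners((0.0, 0.0, 5.0), 1.0),
                   fx=500.0, fy=500.0, cx=320.0, cy=240.0,
                   width=640, height=480)
print(box)
```

For a convex object such as a cube that lies fully in front of the camera, the box around the projected corners is the same as the box around the silhouette. For a person model one would need to project more points (mesh vertices, for example).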

Cheers, Matt

Hi @matt_robo,

Maybe I did not understand you, but why not use a camera plugin and process its output with OpenCV, for instance?

Weasfas

Gazebo can provide the ground-truth location of the object as well as of your drone, so you could do a tf lookup to get the object's position relative to the drone.
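In a ROS setup the tf lookup would do this for you, but the underlying math is short enough to sketch. The snippet below is only an illustration under some assumptions: the camera pose is given as a world position plus a unit quaternion in (x, y, z, w) order, the camera link has x pointing forward, and the optical frame follows the usual ROS convention (z forward, x right, y down). The function names are hypothetical.

```python
import math


def quat_to_matrix(q):
    """Rotation matrix (camera link -> world) from a unit quaternion (x, y, z, w)."""
    x, y, z, w = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def world_to_camera_optical(p_world, cam_position, cam_orientation):
    """Express a world point in the camera optical frame."""
    R = quat_to_matrix(cam_orientation)
    d = [p - c for p, c in zip(p_world, cam_position)]
    # Apply the inverse rotation (transpose of R) to get camera-link coordinates.
    bx, by, bz = [sum(R[j][i] * d[j] for j in range(3)) for i in range(3)]
    # Camera link (x forward, y left, z up) -> optical (x right, y down, z forward).
    return (-by, -bz, bx)


# Camera at 2 m height, yawed 90 degrees so it looks along world +y.
yaw = math.pi / 2
cam_q = (0.0, 0.0, math.sin(yaw / 2), math.cos(yaw / 2))
print(world_to_camera_optical((0.0, 5.0, 2.0), (0.0, 0.0, 2.0), cam_q))
```

The resulting point is in the frame where a pinhole projection with the camera intrinsics applies, so the corners of the object can be transformed this way and then projected to get the bounding box.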

nkhedekar