ROS Resources: Documentation | Support | Discussion Forum | Index | Service Status | Q&A
Ask Your Question

Simulate Object Detection

asked 2020-03-24 11:34:35 -0500

matt_robo gravatar image


We have the following use case: Control a drone based on object (people) detections coming from a video stream.

To simulate this within Gazebo, we would have to simulate the stream of incoming object detections since we don't have the real camera input that we can run inference on. I searched the internet to find out if something like this already exists but couldn't find anything. Am I missing something, is anyone familiar with a package like that?

The working principle would be as follow. Given we have a camera with given intrinsics positioned at some location looking in some direction and given an object which we place somewhere in the world (could be a cube), what would the resulting bounding box be on the image plane based on the object's silhouette?

Cheers, Matt

edit retag flag offensive close merge delete



Hi @matt_robo,

Maybe I did not understand you but, why not using a camera plugin to process the output with OpenCV for instance.

Weasfas gravatar image Weasfas  ( 2020-03-24 13:08:03 -0500 )edit

Gazebo can provide the ground truth location of the object as well as your drone, so you could do a tf lookup to get the position.

nkhedekar gravatar image nkhedekar  ( 2020-03-24 17:42:57 -0500 )edit

Hey, were you able to find any solution?

Anukriti gravatar image Anukriti  ( 2021-08-06 00:09:50 -0500 )edit

This better would have been a comment

Fetullah Atas gravatar image Fetullah Atas  ( 2021-08-06 14:19:02 -0500 )edit

2 Answers

Sort by » oldest newest most voted

answered 2021-08-06 14:33:04 -0500

Gazebo is a 3D simulator. Every object within a world is model with certain physical/dynamical properties. For your case, You could use a camera plugin provided by gazebo. This will let you to capture a desired FOV of World you are simulating as 2D image, you can then place your human models into FOV and simulate a camera stream including objects and maybe some background.

Gazebo’s API is brilliant, you can write a World Plugin, through that world plug-in you can access to every model, as other person answered above, each models also have a 3D Collision Box. Since you know the camera position in the world you can determine the object (models) that are on camera FOV.

From there you will need to do two transforms to get that 3D box overlayed onto your image. First, since the pose of that 3D box is in World frame, you have to transform that to camera frame, after you have the pose in camera frame, you then use camera instrinstic parameters to get pixel coordinates of 3D box in image. Note that, since this is a 3D box, you will have 8 corners in image, but it’s easy to just get minimum 2D bounding box that covers all this 8 corners. This 2D bounding box then can be used as a Grojnd truth, and from there it’s up to you if you want to use this data for a Neural Network training.

this answer was more intended to give an high level overview of steps rather than some practical help. Hope it’s useful somehow. Cheers !

edit flag offensive delete link more

answered 2022-02-02 16:47:41 -0500

KhalidOwlWalid gravatar image

updated 2022-02-04 10:10:59 -0500

I have done something similar in one of my projects but due to confidential reasons, I cant simply give you the code but I'll structure a general idea of how to implement it. I hope that this might be what you and other people are looking for since I was also struggling with this back then:

  1. You would have to write some ground truth object position which would find where the object's position is with respect to your robot base_link and publish those messages into a topic
  2. Now, your robot base_link is generally the "world coordinate" and you would have to find the transformation from base_link to the camera_frame. have a look at slide 7
  3. After transforming it to the camera coordinate, now you should project the 3d image into 2d image plane and this can be done easily using the image geometry pinholecameramodel
  4. You should now get the pixel coordinate of the object's position with respect to the camera
  5. If your robot has a camera plugin, you can get the live camera feed by subscribing to the any topics with an image type

Here is how you can get the live camera feed from your camera plugin:

#!/usr/bin/env python

import rclpy
import time
import message_filters
import cv2

from cv_bridge import CvBridge, CvBridgeError
from rclpy.qos import qos_profile_sensor_data, QoSProfile

from rclpy.node import Node
from sensor_msgs.msg import Image, CameraInfo
from image_geometry import PinholeCameraModel
from eufs_msgs.msg import BoundingBoxes

class BoundingBox():

    def __init__(self):
        self.cone_list = {'blue':[],'yellow':[],'orange':[], 'big orange':[], 'unknown':[]}

class DrawBoundingBox(Node):

    def __init__(self):


        self.declare_parameter('camera_image_topic', 'zed/left/image_rect_color')

        self.camera_image_topic = self.get_parameter('camera_image_topic').get_parameter_value().string_value

        qos_profile = qos_profile_sensor_data

        # For image topic, you need to set it to equal to qos_profile sensor data because we are using
        # the data from the zed camera
        self.image_sub = message_filters.Subscriber(

        self.bounding_box_sub = message_filters.Subscriber(

        sync_ = message_filters.ApproximateTimeSynchronizer([self.image_sub, self.bounding_box_sub], 10, 0.5)

    def image_callback(self, image_msg, bounding_box_msg):

        debug = False

            self.bridge = CvBridge()

            cv_image = self.bridge.imgmsg_to_cv2(image_msg, desired_encoding="bgr8")

            for cone in bounding_box_msg.bounding_boxes:

                if debug:
          , (int(cone.xmin), int(cone.ymin)), 3, (255,0,0))

                    cv2.rectangle(cv_image, (int(cone.xmin), int(cone.ymax)), (int(cone.xmax),int(cone.ymin)), (255,0,0))

            cv2.imshow(f'From {self.camera_image_topic} ', cv_image)


        except CvBridgeError as e:

def main(args=None):


    bounding_box_node = DrawBoundingBox()

        while rclpy.ok():

    except KeyboardInterrupt:


if __name__ == '__main__':

Hope it helps!

Note: Please be aware that the axis when trying to use the image geometry pinholecameramodel is a bit different than gazebo's axes.

see here

edit flag offensive delete link more

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

1 follower


Asked: 2020-03-24 11:34:35 -0500

Seen: 1,037 times

Last updated: Feb 04