Is people detection enough for them to be marked on a simulated map like in rviz?

asked 2018-09-13 01:10:30 -0500

Nelle

Hello all.

We are now in the later part of our project. We have already integrated ROS with YOLO for people detection and have also been able to make the robot avoid obstacles using sonars and IR sensors. My question this time: how can we mark a detected person on a simulated map, just to know where they are? We only have sonars, IR sensors, and the Raspberry Pi camera used by YOLO. We do not have a Kinect or any other depth sensor. Is this possible? And if we did have RGB-D cameras, could ROS-YOLO do this alone, or would we need a people tracking package too? Can a robot go to a detected person without some sort of person tracking package? I am really confused about people detection vs. people tracking. Can someone enlighten me, please?




  • detection: figuring out if there are certain objects in a scene, frame-by-frame
  • tracking: maintaining history of detections, predicting where they will be in the (near) future and making sure detections in the current frame are "the same" objects as in the previous frame
gvdhoorn ( 2018-09-13 01:23:39 -0500 )
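
To make the distinction above concrete, here is a minimal sketch of the "same object as in the previous frame" part of tracking. It assumes detections arrive as `(xmin, ymin, xmax, ymax)` pixel boxes; the class name `GreedyIouTracker` and the `min_iou` threshold are hypothetical and not taken from any ROS package. It only associates boxes between consecutive frames and does no motion prediction, which a real tracker would add.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Hypothetical tracker: keeps an ID per box across consecutive frames."""

    def __init__(self, min_iou=0.3):
        self.min_iou = min_iou   # assumed threshold for "same object"
        self.tracks = {}         # track id -> box from the previous frame
        self._ids = count(1)

    def update(self, detections):
        """Match this frame's detections to the previous frame's tracks."""
        assigned = {}
        unmatched = dict(self.tracks)
        for box in detections:
            best_id, best_score = None, self.min_iou
            for track_id, prev_box in unmatched.items():
                score = iou(box, prev_box)
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:
                best_id = next(self._ids)   # no match: start a new track
            else:
                del unmatched[best_id]      # each track is matched at most once
            assigned[best_id] = box
        # Simplification: tracks without a match in this frame are dropped.
        self.tracks = assigned
        return assigned
```

A detector alone would return the boxes of each frame with no IDs; the tracker is what lets you say that the box in frame 2 is the same person as the box in frame 1.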

If I want my robot to go to a detected person, do I need a tracking package?

Nelle ( 2018-09-13 01:41:13 -0500 )

1 Answer

answered 2018-09-13 01:51:25 -0500

Choco93

With just a camera, no, because you need depth information. I also don't think the sonar readings can easily be mapped onto the image. The most common approach would be to have a point cloud, and then you can use move_base to get obstacle info. But if you want to go the other way around, this is how I'd do it:

  1. Map the lidar points onto the image so that you have XYZRGB info as a new point cloud message.
  2. Use a clustering algorithm on the point cloud (XYZ) to get the clusters.
  3. YOLO already gives you bounding boxes, so find the overlap between those bounding boxes and the point cloud clusters.
  4. Set a threshold for the percentage overlap, e.g. if the overlap is greater than 50% then that cluster belongs to a human.
  5. Now you can write a custom obstacle layer and feed it the four extreme points of each cluster, and it will mark that as an obstacle on the costmap (optional; if you have done all of the above this should be easy).
  6. These are also the coordinates of the person, so you can send a goal relative to the map frame and the robot should go to that person.
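
Steps 3 and 4 could be sketched roughly as follows. This assumes step 1 has already produced, for every cluster point, both its pixel position `(u, v)` and its 3D position `(x, y, z)`; the tuple layout and the function names are hypothetical, and "overlap" is taken here to mean the fraction of a cluster's points that project inside the bounding box.

```python
def fraction_inside(box, pixels):
    """Fraction of (u, v) pixels that fall inside box = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    if not pixels:
        return 0.0
    inside = sum(1 for u, v in pixels if xmin <= u <= xmax and ymin <= v <= ymax)
    return inside / len(pixels)


def centroid(points):
    """Mean (x, y, z) of a non-empty list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def find_person_clusters(box, clusters, min_overlap=0.5):
    """Return the 3D centroid of every cluster that overlaps the box enough.

    clusters: list of clusters, each a list of (u, v, x, y, z) tuples
              (assumed layout: pixel coordinates followed by 3D coordinates).
    """
    people = []
    for cluster in clusters:
        pixels = [(u, v) for u, v, _, _, _ in cluster]
        if fraction_inside(box, pixels) > min_overlap:
            people.append(centroid([(x, y, z) for _, _, x, y, z in cluster]))
    return people
```

The returned centroid is expressed in whatever frame the point cloud uses, so it would still have to be transformed into the map frame before being sent as a goal (step 6).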

People detection vs. tracking is already explained in a comment. There are packages available, such as leg_detector, that can detect a person; you can get coordinate info from there as well and give that as a goal. For your case, detection should be enough, I guess.

Hope this explained things a bit.


Our ROS-YOLO only publishes a binary value: 1 if a person is detected or 2 if no person is detected. If we rebuild this, how can we also publish the coordinates of the detected person with only our RPi camera and other sensors?

Nelle ( 2018-09-13 02:05:56 -0500 )

We do not have wheel encoders too, btw. So I'm not really sure how to move the robot towards the person it detected.

Nelle ( 2018-09-13 02:07:51 -0500 )

You cannot publish coordinates using just the camera; you need a point cloud as well. Can you share a link to the YOLO implementation you are using, or is it custom? As I explained, YOLO only gives you the person detection as a bounding box; you have to compare it with the clusters yourself.

Choco93 ( 2018-09-13 02:29:54 -0500 )

To move the robot you need to know how much to move. It's better to have an IMU and wheel encoders to get an acceptable odom, but it can also be done hackily with a point cloud.

Choco93 ( 2018-09-13 02:31:22 -0500 )

@Choco93: technically a point cloud might not be needed: with visual servoing, one could use only a 2D image to drive towards a detected object. It would not be very robust, and would probably have to rely on some thresholding, but it could probably be done. Bounding boxes would be needed though.

gvdhoorn ( 2018-09-13 02:32:14 -0500 )
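
As a rough illustration of the visual servoing idea in the comment above, the sketch below turns a single bounding box into a velocity command: steer so the box centre moves to the image centre, and stop once the box is tall enough in the image (the "thresholding"). The function name, the gain `k_turn`, the speed and the stop ratio are all made-up values for illustration; the sign of the turn follows the usual ROS convention that positive angular velocity about z turns the robot left.

```python
def servo_command(box, image_width, image_height,
                  k_turn=1.0, forward_speed=0.15, stop_height_ratio=0.8):
    """Return (linear, angular) velocities from one (xmin, ymin, xmax, ymax) box.

    All gains and thresholds are assumed placeholder values that would
    need tuning on a real robot.
    """
    xmin, ymin, xmax, ymax = box
    center_x = (xmin + xmax) / 2.0
    half_width = image_width / 2.0
    # Normalised horizontal error in [-1, 1]; positive: person is right of centre.
    error = (center_x - half_width) / half_width
    # Person to the right -> negative angular velocity (turn right).
    angular = -k_turn * error
    # Use the box height as a crude proxy for distance: a tall box means "close".
    height_ratio = (ymax - ymin) / float(image_height)
    linear = 0.0 if height_ratio >= stop_height_ratio else forward_speed
    return linear, angular
```

In a ROS node the two returned values would typically be copied into the `linear.x` and `angular.z` fields of a `geometry_msgs/Twist` message. Note that this gives no position on a map; it only drives towards whatever is in the box.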

@Nelle wrote:

We do not have wheel encoders too, btw.

That is a serious limitation, although perhaps visual odometry could work around it.

gvdhoorn ( 2018-09-13 02:33:02 -0500 )

Yes, visual servoing can be used as well, but again it depends on how much robustness and accuracy is desired and what the end goal is.

Choco93 ( 2018-09-13 02:37:41 -0500 )

How can I make the robot move towards the person in the image using only 2D images? Do I need YOLO to publish certain msgs like this one?

Nelle ( 2018-09-13 02:49:52 -0500 )


Asked: 2018-09-13 01:10:30 -0500

Seen: 357 times

Last updated: Sep 13 '18