
Using ar_track_alvar as "object recognition black box"

asked 2012-12-17 09:08:20 -0600

DrLeopoldStrabismus

Hello all,

I'm trying to use this wonderful stack, ar_track_alvar, to serve as a type of object recognition black box. We have a few robots in my lab, and we're trying to develop essentially chase-and-run algorithms, where one robot will chase the others, except it will use some sort of intelligence to anticipate where the prey will run instead of blindly drawing a line between the two and following that.

Think of when you were a kid chasing your sibling around the kitchen table: at some point you anticipate that they're going to keep going in a circle, and you change your own trajectory to catch them.

Anyway, part of this is that the robots need to be able to recognize each other by sight, since this would simulate how a predator robot would find its "prey" in the "wild" - whether that prey is prey in the military or police sense, or something related such as obstacle avoidance, like a self-driving car anticipating where another car will be in one second and swerving to avoid it. Stuff like that. So, we need to use exclusively vision systems to track - we can't cheat using wireless or anything.

So we want to intercept the data used by ar_track_alvar to tell our robots where in its field of view the other robot is. Currently, ar_track_alvar returns a full pose of the marker, which is great if you're looking for that, but for our purposes that's TOO MUCH information. I know ar_track_alvar uses the ALVAR library, which in turn uses OpenCV to do the image processing. Can anybody help me figure out a way to intercept the data or modify the code so that we get only the information that an object recognition suite would be able to give?

Right now ar_track_alvar uses the known size of the marker to determine the x, y, and z position of the marker relative to the camera frame, but that's too much information. What we really need is, given one frame of the image, the x and y pixel coordinates of the marker centroid. Then we'd combine that with the known camera information to determine the angle theta our robot needs to turn in order to face the other robot. Given two frames of data, we could extrapolate the distance to the object, but we want to treat the robot as if it can recognize an object without having to know its size.
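To make the centroid-to-theta step concrete, here is a minimal sketch of what I have in mind, assuming a simple pinhole model and an already rectified (undistorted) image. The function names `intrinsics_from_K` and `bearing_from_pixel` are made up for illustration; the only ROS-specific assumption is the row-major 3x3 layout of the `K` array in `sensor_msgs/CameraInfo`.

```python
import math


def intrinsics_from_K(K):
    """Pull (fx, fy, cx, cy) out of a row-major 3x3 intrinsic matrix.

    K is laid out as [fx, 0, cx, 0, fy, cy, 0, 0, 1], which is how the
    K field of sensor_msgs/CameraInfo is stored.
    """
    return K[0], K[4], K[2], K[5]


def bearing_from_pixel(u, v, fx, fy, cx, cy):
    """Return (yaw, pitch) in radians for pixel (u, v).

    Assumes a pinhole camera with no lens distortion. Positive yaw means
    the target is to the right of the optical axis; positive pitch means
    it is below the optical axis (image rows grow downward).
    """
    yaw = math.atan2(u - cx, fx)
    pitch = math.atan2(v - cy, fy)
    return yaw, pitch


if __name__ == "__main__":
    # Hypothetical intrinsics for a 640x480 camera, for illustration only.
    fx, fy, cx, cy = intrinsics_from_K([500.0, 0.0, 320.0,
                                        0.0, 500.0, 240.0,
                                        0.0, 0.0, 1.0])
    yaw, pitch = bearing_from_pixel(400.0, 240.0, fx, fy, cx, cy)
    print("turn by %.3f rad to face the marker" % yaw)
```

Note that this needs no marker size or depth at all, which is the point: the bearing depends only on the pixel location and the camera intrinsics.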




I'm not sure if I understood correctly. Why don't you want to use the (x,y,z) position of the marker in relation to the camera frame? Getting a bearing angle shouldn't be too difficult using tf if you know the relative tf between the camera frame and your robot frame.

georgebrindeiro ( 2012-12-17 09:32:54 -0600 )


We want to assume that at some point in the future, we'll have an object recognition routine that uses raw camera data to find the bearing towards any object, even previously unknown objects. In that situation, x,y,z data will be unknown, so we want to mimic that here.

DrLeopoldStrabismus ( 2012-12-17 10:14:11 -0600 )

1 Answer


answered 2013-03-03 09:00:03 -0600

sniekum

If you are willing to modify the ar_track_alvar code, the pixel location is actually calculated before the 3D position, so you could publish that to a topic. However, it might be significantly easier just to project the x,y,z coordinates back onto the image plane using the camera intrinsics. That's essentially what rviz is doing anyway, so that it can visualize the AR tags.
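As a rough sketch of the back-projection option, assuming an ideal pinhole camera with no distortion and a marker position already expressed in the camera's optical frame (x right, y down, z forward): the function name `project_to_pixel` and the intrinsic values below are hypothetical, not part of ar_track_alvar.

```python
def project_to_pixel(x, y, z, fx, fy, cx, cy):
    """Project a 3D point in the camera optical frame onto the image plane.

    Assumes a pinhole model without lens distortion. Returns the pixel
    coordinates (u, v) as floats. Points at or behind the camera plane
    cannot be projected, so z must be positive.
    """
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v


if __name__ == "__main__":
    # Hypothetical intrinsics and marker position, for illustration only.
    u, v = project_to_pixel(0.2, -0.1, 1.5, 500.0, 500.0, 320.0, 240.0)
    print("marker centroid is near pixel (%.1f, %.1f)" % (u, v))
```

Because x and y are divided by z, the marker's absolute distance cancels out of any bearing computed from (u, v) afterwards, so the downstream code never has to see the depth estimate.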

