Using ar_track_alvar as "object recognition black box"

asked 2012-12-17 09:08:20 -0500

Hello all,

I'm trying to use this wonderful stack, ar_track_alvar, to serve as a type of object recognition black box. We have a few robots in my lab, and we're trying to develop essentially chase-and-run algorithms, where one robot will chase the others, except it will use some sort of intelligence to anticipate where the prey will run instead of blindly drawing a line between the two and following that.

Think of when you were a kid chasing your sibling around the kitchen table: at some point you anticipate that they're going to keep going in a circle, and you change your own trajectory to catch them.

Anyway, part of this is that the robots need to be able to recognize each other by sight, since this would simulate how a predator robot would find its "prey" in the "wild" - whether that prey is prey in the military or police sense, or something related such as obstacle avoidance, like a self-driving car anticipating where another car will be in one second and swerving to avoid it. Stuff like that. So, we need to use exclusively vision systems to track - we can't cheat using wireless or anything.

So we want to intercept the data used by ar_track_alvar to tell each robot where in its field of view the other robot is. Currently, ar_track_alvar returns a full pose of the marker, which is great if you're looking for that, but for our purposes that's TOO MUCH information. I know ar_track_alvar uses the ALVAR code, which in turn uses OpenCV to get stuff done. Can anybody help me figure out a way to intercept the data or modify the code so that we get only the information an object recognition suite would be able to give?

Right now ar_track_alvar uses the known size of the marker to determine the x, y, and z position of the marker relative to the camera frame, but that's too much information. Really what we need is, given one frame of the image, the x and y pixel coordinates of the alvar marker centroid. Then we'd couple that with the known camera information to determine the angle theta our robot needs to turn in order to face the other robot. Given two frames of data, we could extrapolate the distance to the object, but we want to treat the robot as if it can recognize an object without having to know its size.
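To make the pixel-to-theta step concrete, here is a minimal sketch of the pinhole-camera geometry, assuming a rectified (undistorted) image. The function name pixel_to_bearing and the example numbers are hypothetical; fx, fy, cx, cy stand for the focal lengths and principal point, which would typically be read from the K matrix of the camera's sensor_msgs/CameraInfo message.

```python
import math

def pixel_to_bearing(u, v, fx, fy, cx, cy):
    """Convert a pixel (u, v) to bearing angles relative to the optical axis.

    Assumes a rectified pinhole image. Returns (yaw, pitch) in radians,
    where positive yaw means the target is to the right of the image
    centre and positive pitch means it is below the centre.
    """
    yaw = math.atan2(u - cx, fx)
    pitch = math.atan2(v - cy, fy)
    return yaw, pitch

# Hypothetical intrinsics for a 640x480 camera.
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0

# Marker centroid detected 100 px to the right of the image centre.
yaw, pitch = pixel_to_bearing(420.0, 240.0, fx, fy, cx, cy)
print("turn by %.3f rad" % yaw)
```

Note the sign: with the usual ROS convention of counter-clockwise-positive rotation about an upward z axis, the robot would turn by -yaw to face a target on the right. No marker size or distance enters the calculation, which is the point of the exercise.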


Comments

I'm not sure if I understood correctly. Why don't you want to use the (x,y,z) position of the marker in relation to the camera frame? Getting a bearing angle shouldn't be too difficult using tf if you know the relative tf between the camera frame and your robot frame.

georgebrindeiro ( 2012-12-17 09:32:54 -0500 )
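For reference, the bearing suggested in the comment above can be sketched without tf at all, assuming the marker position is already expressed in a camera optical frame with z forward, x right, and y down (the ROS optical-frame convention). The function name bearing_from_position is hypothetical. The sketch also shows that the bearing is independent of range, so the distance can simply be discarded.

```python
import math

def bearing_from_position(x, y, z):
    """Bearing to a point given in the camera optical frame.

    Assumes z points forward, x right, y down. Returns (yaw, pitch)
    in radians; positive yaw means the point is to the right.
    """
    yaw = math.atan2(x, z)
    pitch = math.atan2(y, z)
    return yaw, pitch

# The same direction at two different ranges gives the same bearing,
# so the range estimate can be thrown away after this step.
near = bearing_from_position(0.5, 0.0, 1.0)
far = bearing_from_position(1.5, 0.0, 3.0)
print(near, far)
```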

@georgebrindeiro

We want to assume that at some point in the future, we'll have an object recognition routine that uses raw camera data to find the bearing towards any object, even previously unknown objects. In that situation, x,y,z data will be unknown, so we want to mimic that here.

DrLeopoldStrabismus ( 2012-12-17 10:14:11 -0500 )
