Ask Your Question

Inverted files not used in Vocabulary Tree find ?

asked 2011-09-05 21:49:28 -0500

daveudaimon gravatar image


I found the implementation of the vocabulary tree in ROS very interresting, but by looking carefully at the code I've noticed that the inverted files are not used to query the document database.

As far as I understand it, this kills all the interest for a visual vocabulary tree. Indeed, if i have to scan ALL THE DOCUMENTS in the database to find the matches (as it is actually done), why bother quantitizing features in words ? I could as well compare directly the query image with all the images in the database.

But the code is nice. If no further development is planed to make a proper use of the tf-idf files that allow a FAST COMPUTATION of the distance between the query document and the database, I'm willing to contribute.

Patrick, would you like us to work together on it ?

edit retag flag offensive close merge delete

1 Answer

Sort by ยป oldest newest most voted

answered 2011-09-07 10:17:25 -0500

Patrick Mihelich gravatar image

Hi Dave,

You're correct - the ROS vocabulary tree computes distances to every document in the database when you a query. Fortunately quantizing features into words makes each individual distance computation quite fast, and for the sort of place recognition tasks (only 100s of documents) we've done with vocabulary_tree the performance is fine. For scaling to much larger databases, using the inverted files to try only relevant documents probably is a big win.

Improving vocabulary_tree isn't on my critical path at the moment, but if you're interested in contributing I'm happy to answer questions about the code and apply patches. I think this optimization is not too complicated - Database::find() just needs to be updated to do something like:

  • Look at the InvertedFile for each word in the query document
  • Build a std::set containing the DocId's from all the inverted files.
  • Only compute distances to those documents.

I'd be very interested to see a performance comparison for a suitably large dataset.

edit flag offensive delete link more

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools


Asked: 2011-09-05 21:49:28 -0500

Seen: 366 times

Last updated: Sep 07 '11