Revision history [back]

Hi Tom!

As Ivan already suggested, I would separate the problem of continuous motion estimation from the loop closing problem. For vehicle control you need velocity estimates at high rates. For short path following, you need pose estimates that do not "jump" as might occur in a loop closing event.

For the first part of the problem, a fast visual odometry combined with an IMU is a common approach. Apart from the already mentioned libviso2, fovis is a well-coded visual odometry library by MIT, University of Washington and Pontiﬁcia Universidad Catolica de Chile which gives more possibilities on introspection than libviso2 and supports RGBD and stereo cameras. I believe that this software is part of the full SLAM system you already mentioned in your question. I highly recommend reading the corresponding paper. We already wrapped libviso2 in ROS and started wrapping fovis, too. See the ROS packages viso2_ros and fovis_ros for details. fovis_ros only supports stereo so far but writing a node for kinect should be fairly simple. We get full frame rate (10Hz) motion estimates using a bumblebee stereo camera (2*1024x768px) with both libraries on an i3 processor. In small environments, the drift is acceptable and you can do something without a full SLAM algorithm.

If you get good motion estimates from your VO/IMU implementation, the second part of the problem -- the loop closing -- does not need to work at high rates. Every now and then, whenever you might be able to close a loop, you can start a pose graph optimization. You need much stronger features (SIFT, SURF etc.) than for visual odometry, as the matching constraints are far less. For the SLAM part, I suggest you to have a look at g2o (also available through the libg2o package: apt-get install ros-fuerte-libg2o), which is the backend of RGBDSLAM (and supposedly others).

There also exist combined approaches such as PTAM (parallel tracking and mapping, ROS wrapper here) that tightly integrate incremental motion estimation and mapping. Note that this algorithm uses monocular images only and therefore needs to be combined with other sensors to provide pose estimates at a metric scale.

Within the last year, a lot of solutions to this problem popped out. I would rather try a bunch, look at the code and adapt it to your needs than starting from scratch.

Good luck!