
Solutions for visually aided 2D/3D SLAM for UAV with ROS

asked 2011-05-04 23:02:09 -0500 by tom

updated 2014-01-28 17:09:38 -0500 by ngrennan

I've been wondering what possibilities there are to do visually aided 2D/3D SLAM for a UAV in indoor settings with ROS. I've got an IMU talking to ROS, a Kinect sensor and two UVC cameras. What I originally thought would work is feeding robot_pose_ekf with vslam_system's visual odometry (/vo) and IMU data, but I'm having trouble (described here) obtaining /vo. So I'm wondering what other possibilities I have.

My general goal is very similar to Rainer Hessmer's, but with a cheap DIY quadrotor platform. In the beginning I'd use sonar-based altitude hold and assume no pitch/roll, for this to hopefully work. Similar projects I'm so far aware of are:

  1. As mentioned above, assuming I can make vslam_system work
  2. A great solution I can only dream of copying is MIT's SLAM with Kinect on a Quadrotor; their code has not been released yet and won't be anytime soon
  3. Patrick Bouffard's Quadrotor Altitude Control and Obstacle Avoidance, similar, could maybe be integrated with gmapping using pointcloud_to_laser to provide SLAM
  4. As suggested here I could maybe use rgbdslam. I'd need to extract 3D odometry estimates from rgbdslam and fuse them with IMU's readings to eliminate drift. Would this be possible?
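For option 3, the core of a pointcloud_to_laser-style conversion is just binning points from a thin horizontal slice of the cloud into per-angle ranges. A minimal sketch of that math in plain Python (no ROS message types; the field of view, band limits and bin count here are my own illustrative choices, not the package's defaults):

```python
import math

def cloud_to_scan(points, angle_min=-math.pi/2, angle_max=math.pi/2,
                  n_bins=180, z_min=-0.05, z_max=0.05):
    """Project 3D points in a thin horizontal band onto a fake laser scan.

    points: iterable of (x, y, z) in the sensor frame, x forward, y left.
    Returns a list of ranges (inf where no point fell into a bin).
    """
    ranges = [float('inf')] * n_bins
    bin_width = (angle_max - angle_min) / n_bins
    for x, y, z in points:
        if not (z_min <= z <= z_max):      # keep only the horizontal slice
            continue
        angle = math.atan2(y, x)
        if not (angle_min <= angle < angle_max):
            continue
        r = math.hypot(x, y)
        i = int((angle - angle_min) / bin_width)
        ranges[i] = min(ranges[i], r)      # nearest obstacle per bin wins
    return ranges

# A single point 2 m straight ahead lands in (or next to) the middle bin:
scan = cloud_to_scan([(2.0, 0.0, 0.0)])
```

The resulting ranges array is essentially what gmapping would consume as a LaserScan; on a tilting quadrotor you would first rotate the cloud by the IMU attitude so the slice stays horizontal.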

I'm open to all suggestions on the feasibility of the above and beyond. The main problem I'm having is obtaining visual odometry with the Kinect or stereo cameras. Any help is appreciated. I'd be more than happy to open up and describe my solution once I get it working.


6 Answers


answered 2011-05-05 03:57:21 -0500

When using rgbdslam, I'd assume the computational limitations on the UAV and the bandwidth limitation could be a problem. This could be addressed by sending only the (possibly downsampled) monochrome image (do you need color?) and the depth image over wireless, and doing registration and backprojection offboard (changing the PCL point type used by rgbdslam to XYZ only, plus some changes in the UI to adapt to monochrome).

Also, rgbdslam has no motion model yet, so you would need to code the integration of the IMU data yourself. You could add another subscriber callback that accumulates all IMU readings and, on insertion of a Kinect frame into the pose graph, adds an edge representing the summed-up transformation with an appropriate information matrix.
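Stripped of the ROS plumbing, the "summed-up transformation" between two Kinect keyframes is just the composition of small per-reading motion increments. A toy 2D version of that accumulator (my own simplification; real code would compose full SE(3) transforms and also accumulate the information matrix):

```python
import math

class ImuIntegrator:
    """Accumulate IMU readings into one relative transform between keyframes."""
    def __init__(self):
        self.reset()

    def reset(self):
        self.x = self.y = self.yaw = 0.0   # pose delta since last keyframe

    def add_reading(self, v, omega, dt):
        """Integrate one reading: forward velocity v, yaw rate omega, step dt."""
        self.yaw += omega * dt
        self.x += v * math.cos(self.yaw) * dt
        self.y += v * math.sin(self.yaw) * dt

    def pop_edge(self):
        """On insertion of a new Kinect frame: return the summed transform
        for the pose-graph edge and start accumulating afresh."""
        edge = (self.x, self.y, self.yaw)
        self.reset()
        return edge

# 100 readings at 100 Hz, pure rotation at 0.5 rad/s -> ~0.5 rad summed yaw
integ = ImuIntegrator()
for _ in range(100):
    integ.add_reading(v=0.0, omega=0.5, dt=0.01)
dx, dy, dyaw = integ.pop_edge()
```

In the real callback, `add_reading` would fire on every IMU message and `pop_edge` whenever rgbdslam accepts a new frame into the graph.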



Thanks Felix. I'm assuming computational power won't be a problem; I'll focus on that once the prototype works. I need a source of visual odometry for robot_pose_ekf to work with IMU data. The question is whether I can extract this info using RGBDSLAM and cut out anything unnecessary (from my POV).
tom  ( 2011-05-08 19:20:54 -0500 )
You could do so by comparing only to the last frame and removing the graph optimization. However, building that from scratch with opencv2 and pcl might be less painful than removing 90% of rgbdslam to dig out the desired code.
Felix Endres  ( 2011-05-08 21:43:58 -0500 )

answered 2013-02-12 21:44:22 -0500 by Stephan

Hi Tom!

As Ivan already suggested, I would separate the problem of continuous motion estimation from the loop closing problem. For vehicle control you need velocity estimates at high rates. For short path following, you need pose estimates that do not "jump" as might occur in a loop closing event.

For the first part of the problem, a fast visual odometry combined with an IMU is a common approach. Apart from the already mentioned libviso2, fovis is a well-coded visual odometry library by MIT, the University of Washington and Pontificia Universidad Catolica de Chile, which offers more introspection possibilities than libviso2 and supports RGB-D and stereo cameras. I believe this software is part of the full SLAM system you already mentioned in your question. I highly recommend reading the corresponding paper. We already wrapped libviso2 in ROS and have started wrapping fovis, too; see the ROS packages viso2_ros and fovis_ros for details. fovis_ros only supports stereo so far, but writing a node for the Kinect should be fairly simple. We get full frame rate (10 Hz) motion estimates using a Bumblebee stereo camera (2x1024x768 px) with both libraries on an i3 processor. In small environments the drift is acceptable, and you can do something without a full SLAM algorithm.
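The "velocity estimates at high rates" needed for vehicle control fall straight out of differencing successive VO poses. A minimal sketch (2D for brevity; frame conventions and the fixed timestep are my own assumptions, and a real controller would additionally low-pass filter this):

```python
import math

def body_velocity(pose_prev, pose_curr, dt):
    """Finite-difference velocity between two VO poses (x, y, yaw).
    Returns (vx, vy, yaw_rate) expressed in the previous body frame."""
    x0, y0, th0 = pose_prev
    x1, y1, th1 = pose_curr
    # rotate the world-frame displacement into the previous body frame
    dx, dy = x1 - x0, y1 - y0
    vx = ( math.cos(th0) * dx + math.sin(th0) * dy) / dt
    vy = (-math.sin(th0) * dx + math.cos(th0) * dy) / dt
    dth = math.atan2(math.sin(th1 - th0), math.cos(th1 - th0))  # wrap angle
    return vx, vy, dth / dt

# Facing +y (yaw = pi/2), moving +y at 1 m/s over one 10 Hz frame:
v = body_velocity((0.0, 0.0, math.pi / 2), (0.0, 0.1, math.pi / 2), dt=0.1)
```

At the 10 Hz frame rate quoted above this gives fresh body-frame velocities every 100 ms, which the IMU can then bridge between frames.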

If you get good motion estimates from your VO/IMU implementation, the second part of the problem -- the loop closing -- does not need to work at high rates. Every now and then, whenever you might be able to close a loop, you can start a pose graph optimization. You need much stronger features (SIFT, SURF etc.) than for visual odometry, as the matching constraints are far weaker. For the SLAM part, I suggest you have a look at g2o (also available through the libg2o package: apt-get install ros-fuerte-libg2o), which is the backend of RGBDSLAM (and supposedly others).
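To illustrate what the pose-graph backend does (this is the idea behind g2o, not its API; the gradient-descent solver, step size and weights below are my own toy choices): odometry edges chain poses together, a loop-closure edge pulls the last pose back toward the first, and optimization distributes the accumulated drift along the chain. A toy 1D version:

```python
def optimize_pose_graph(poses, edges, iters=2000, step=0.02):
    """Gradient descent on a 1D pose graph.
    poses: list of scalar poses; edges: (i, j, measured_delta, weight).
    Minimizes sum of weight * (poses[j] - poses[i] - delta)^2,
    with pose 0 held fixed as the anchor."""
    poses = list(poses)
    for _ in range(iters):
        grad = [0.0] * len(poses)
        for i, j, delta, w in edges:
            err = poses[j] - poses[i] - delta
            grad[j] += 2 * w * err
            grad[i] -= 2 * w * err
        for k in range(1, len(poses)):     # keep pose 0 fixed
            poses[k] -= step * grad[k]
    return poses

# Drifty odometry claims we moved 1.1 per step, but a loop closure
# says pose 4 coincides with pose 0. Optimization spreads the error.
poses = [0.0, 1.1, 2.2, 3.3, 4.4]
edges = [(0, 1, 1.1, 1.0), (1, 2, 1.1, 1.0), (2, 3, 1.1, 1.0),
         (3, 4, 1.1, 1.0), (4, 0, 0.0, 10.0)]   # strong loop closure
result = optimize_pose_graph(poses, edges)
```

g2o does the same thing over SE(3) poses with proper information matrices and second-order solvers, but the error model -- weighted squared residuals over relative-pose edges -- is the one sketched here.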

There also exist combined approaches such as PTAM (parallel tracking and mapping, ROS wrapper here) that tightly integrate incremental motion estimation and mapping. Note that this algorithm uses monocular images only and therefore needs to be combined with other sensors to provide pose estimates at a metric scale.

Within the last year, a lot of solutions to this problem have popped up. I would rather try a bunch, look at the code and adapt it to your needs than start from scratch.

Good luck!


answered 2011-05-12 09:21:53 -0500 by dimatura

As a comment on the relative computational requirements of visual odometry with stereo versus the Kinect (I don't have karma to comment yet): the Kinect odometry should be more lightweight, since the depth information comes 'for free' from the Kinect, whereas with stereo you have to extract the depth information from the stereo images yourself. The libviso2 page that was linked refers to monocular visual odometry, which would not be necessary with the Kinect.


answered 2011-05-04 23:53:37 -0500 by raphael favier

Hello Tom,

I am waiting for ScaViSLAM to be released. I think it will provide a nice, lightweight and precise visual SLAM system for monocular cameras.

I spoke with Hauke (its creator) some months ago. He said he was planning to release it in spring. So I assume (and hope) it will be released quite soon now.




Right, I had forgotten to mention that, thank you. Of course I'd like to try that too. Can this really be more lightweight than stereo camera SLAM? The libviso2 guys are saying different... Does ScaViSLAM provide 2D or 3D odometry?
tom  ( 2011-05-05 00:06:11 -0500 )
Well, I must say I am only assuming about the lightweight aspect. I think you remove the stereo processing, so that should be somewhat faster, no? I also understand there was a focus on performance. ScaViSLAM should perform 7-DOF SLAM; check the paper associated with it.
raphael favier  ( 2011-05-05 00:45:11 -0500 )
From the page cited above: "Due to the 8 correspondences needed for the 8-point [mono SLAM] algorithm, many more RANSAC samples need to be drawn, which makes the monocular algorithm slower than stereo, for which 3 are sufficient." I just suppose this holds for ScaViSLAM as well, but I don't know :).
tom  ( 2011-05-05 00:53:15 -0500 )
I really don't know; I can only make assumptions. But if you compare them at some point, I'd love to hear about it. One thing I noticed is that Imperial College has quite innovative solutions. Who knows, maybe they developed a new approach?
raphael favier  ( 2011-05-05 00:58:18 -0500 )
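The sample-count argument quoted in the comments above can be made concrete with the standard RANSAC iteration formula N = log(1 - p) / log(1 - w^s), where s is the minimal sample size and w the inlier ratio (the 50% inlier ratio below is an illustrative assumption, not a number from either paper):

```python
import math

def ransac_iterations(sample_size, inlier_ratio, confidence=0.99):
    """Iterations needed so that with probability `confidence` at least
    one drawn minimal sample consists entirely of inliers."""
    return math.ceil(math.log(1 - confidence) /
                     math.log(1 - inlier_ratio ** sample_size))

# With 50% inliers: 3-point (stereo/Kinect) vs 8-point (monocular)
n_stereo = ransac_iterations(3, 0.5)   # -> 35
n_mono = ransac_iterations(8, 0.5)     # -> 1177
```

So at the same inlier ratio, the 8-point monocular case needs roughly 30x more RANSAC samples than the 3-point stereo case, which matches the libviso2 quote.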

answered 2013-02-12 10:44:33 -0500

updated 2013-02-12 12:17:05 -0500

You can also check out ccny_rgbd, which provides tools for fast visual odometry with RGB-D cameras.

EDIT as per K_Yousif's question:

I'm not sure if this is also the case with RGBDSLAM, but in our implementation, we've separated the visual odometry from the loop-closing problem. We do provide a mapping interface, which operates on top of the visual odometry and can perform SLAM, but it is not required for the VO.

In terms of the VO, we use a Kalman-Filter based approach which does not require computation of feature descriptors or RANSAC matching. This allows us to use cheap features (such as Lucas-Kanade corners). On an i7 processor, I'm getting a processing time of ~10 to 15ms per frame. Also, our VO has constant time and space requirements (not sure if this is true with RGBDSLAM).
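To illustrate why a Kalman-filter VO can have constant time and space requirements (a generic 1D predict/update example of my own, not ccny_rgbd's actual filter or state layout): the state vector and covariance never grow, so every frame costs the same no matter how long the trajectory is.

```python
class ScalarKalman:
    """Constant-time 1D Kalman filter: the state never grows, so each
    frame costs the same regardless of trajectory length."""
    def __init__(self, x0=0.0, p0=1.0, q=0.01, r=0.1):
        self.x, self.p = x0, p0    # state estimate and its variance
        self.q, self.r = q, r      # process and measurement noise

    def predict(self, u):
        """Motion prediction, e.g. from the assumed per-frame motion."""
        self.x += u
        self.p += self.q

    def update(self, z):
        """Fuse one measurement, e.g. a frame-to-frame feature-track motion."""
        k = self.p / (self.p + self.r)     # Kalman gain
        self.x += k * (z - self.x)
        self.p *= (1 - k)

# 100 frames of motion 1.0 per frame, noiseless measurements for brevity
kf = ScalarKalman()
true_x = 0.0
for _ in range(100):
    true_x += 1.0
    kf.predict(1.0)
    kf.update(true_x)
```

Contrast this with graph-based SLAM, where the graph (and thus the per-optimization cost) grows with every keyframe added.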

The wiki pages have a bit more information about the pipeline.



What are the main differences between your method and RGB-D SLAM? What are the advantages of this method? I presume this is not only a visual odometry method but also builds a map (SLAM)?

K_Yousif  ( 2013-02-12 11:27:11 -0500 )

Thanks for the update. I would like to read your paper; I could not find it online. Has it been published yet? I would have thought an Extended Kalman Filter approach would be really slow for dense reconstruction (because of the huge covariance matrix),

K_Yousif  ( 2013-02-12 13:54:16 -0500 )

unless you have used separate EKFs for each feature that are independent of each other. I guess those questions will be answered after reading your paper.

K_Yousif  ( 2013-02-12 13:55:07 -0500 )

answered 2013-01-26 06:36:11 -0500

There is now also the Multi-Resolution Surfel Map library available, which performs dense RGB-D-SLAM on a CPU.

Details are in this paper.




Asked: 2011-05-04 23:02:09 -0500

Seen: 5,184 times

Last updated: Feb 12 '13