Nicolò Valigi

Open source Visual SLAM evaluation

Navigation is a critical component of just about any autonomous system, and cameras are a wonderfully cheap way of addressing this need. From among the dozens of open-source packages shared by researchers worldwide, I've picked a few promising ones and benchmarked them against an indoor drone dataset. The results will be useful to hackers and DIYers who want to add localization capabilities to their drone or autonomous vehicle. All the code I used for running the tests, benchmarking, and plotting the results is available on my GitHub.

The contenders

There are a few fundamentally different approaches to visual-based localization, developed over decades of research. I've tried to pick the most useful representative of each family to paint a broad picture and help you decide which route to take.

The evaluation

I've used the EuRoC dataset from ETH Zurich (available here) since it has high-quality stereo images, the IMU data needed by rovio, and accurate ground truth from a laser system. These are 10-20 m indoor trajectories recorded on a small drone.

While viso2 and rovio already come with good ROS support, ORB-SLAM2 is a rather poor citizen of the ecosystem, so I wrote a new wrapper. Some more work was required to make sure that the camera calibration was set correctly and that the reference frames were aligned among the different implementations. All the code you need to re-run the evaluation is on GitHub.
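Aligning reference frames before comparing trajectories is usually done with a rigid least-squares fit (the Horn/Umeyama method). Here's a sketch of that step, assuming the trajectories are already time-matched and metrically scaled (true for stereo and visual-inertial systems; a monocular run would also need a scale term):

```python
import numpy as np

def align_trajectories(est, gt):
    """Rigid alignment of estimated positions to ground truth.

    est, gt: (N, 3) arrays of time-matched positions. Returns R, t
    such that R @ est[i] + t best fits gt[i] in the least-squares sense.
    """
    mu_est, mu_gt = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_est, gt - mu_gt
    # SVD of the cross-covariance gives the optimal rotation (Kabsch)
    U, _, Vt = np.linalg.svd(G.T @ E)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # avoid reflections
    R = U @ S @ Vt
    t = mu_gt - R @ mu_est
    return R, t
```

With the frames aligned this way, any residual difference between the trajectories is genuine estimation error rather than a coordinate-frame mismatch.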

Accuracy results

The following plot compares the estimated trajectories against ground truth.

The results are not surprising: ORB-SLAM is the most accurate, and tracks the actual trajectory amazingly well. Rovio is a close second, whereas the purely odometric viso2 accumulates a substantial drift over time. I've also evaluated ORB-SLAM in its special localization mode that disables mapping of new features and thus works like odometry.
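The standard way to boil such a comparison down to a single number is the absolute trajectory error (ATE), the RMSE of the position differences after alignment. A minimal version, assuming the two position sequences are already time-matched and expressed in the same frame:

```python
import numpy as np

def ate_rmse(est, gt):
    """Absolute trajectory error (RMSE) between aligned position sequences.

    est, gt: (N, 3) arrays of time-matched positions in a common frame.
    """
    errors = np.linalg.norm(est - gt, axis=1)  # per-pose Euclidean error
    return float(np.sqrt(np.mean(errors ** 2)))
```

Because it's an RMSE, a few large excursions (like the drift viso2 accumulates) dominate the score, which is exactly what you want when ranking estimators by worst-case usefulness.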

Computational load

Of course, accuracy alone doesn't tell the whole story, since most autonomous robots suffer from hardware limitations of some sort. None of these algorithms run on the GPU, so we're only looking at the CPU here. A highly unscientific benchmark suggests the following running times for the four algorithms:
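For reference, the benchmark amounts to little more than wall-clock timing around the per-frame processing call. A sketch of that idea (the callback and frame source are placeholders, not the actual ROS plumbing):

```python
import time

def time_per_frame(process_frame, frames):
    """Mean wall-clock seconds per frame for a processing callback.

    `process_frame` stands in for whatever consumes one image
    (e.g. a call into the odometry pipeline); `frames` is any
    iterable of inputs.
    """
    start = time.perf_counter()
    n = 0
    for frame in frames:
        process_frame(frame)
        n += 1
    return (time.perf_counter() - start) / max(n, 1)
```

Averaging over a whole sequence hides per-frame variance (keyframe insertions in ORB-SLAM are much slower than ordinary frames), so treat the numbers as rough throughput indicators only.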

Rovio is much faster than viso2 here, probably because it only matches ~30 features per frame compared to ~200 for viso2 (which also runs an expensive RANSAC check to detect outlier correspondences). In any case, that's impressive for a monocular system that doesn't even benefit from multiple cameras.
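To see why that RANSAC step scales badly with feature count, here's a deliberately toy version: estimating a 2D translation between matched point sets. The real motion model in stereo odometry is of course richer, but the cost structure is the same: every hypothesis is scored against all correspondences, so runtime grows with iterations times matches.

```python
import numpy as np

def ransac_translation(src, dst, iters=100, thresh=0.05, rng=None):
    """Toy RANSAC: recover a 2D translation between matched point sets.

    src, dst: (N, 2) arrays of putative correspondences, some of
    which may be outliers. Keeps the hypothesis with most inliers.
    """
    rng = rng or np.random.default_rng()
    best_t, best_inliers = None, -1
    for _ in range(iters):
        i = rng.integers(len(src))      # minimal sample: one match
        t = dst[i] - src[i]             # hypothesised translation
        # scoring touches every correspondence -> cost grows with N
        residuals = np.linalg.norm(src + t - dst, axis=1)
        inliers = int((residuals < thresh).sum())
        if inliers > best_inliers:
            best_t, best_inliers = t, inliers
    return best_t, best_inliers
```

With ~200 features and hundreds of iterations per frame, this inner scoring loop adds up quickly, while rovio's tightly-coupled filter sidesteps it entirely.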

As expected, ORB-SLAM is much slower than the other two, and only runs in real time thanks to multithreading. I would have assumed local mapping to make a bigger difference, but that didn't seem to be the case.

Conclusions

If you don't need accurate mapping and loop closures, Rovio is an excellent performer. The slower (and less accurate) Viso2 could be useful on computationally constrained platforms where adding a stereo camera is more convenient than setting up the IMU needed for Rovio. ORB-SLAM confirmed its good reputation for accuracy but may be too expensive to run on today's mobile platforms.