SliceNet: Street-to-Satellite Image Metric Localization using Local Feature Matching

Abstract

This work addresses visual localization for intelligent vehicles. The task of cross-view matching-based localization is to estimate the geo-location of a vehicle-mounted camera by matching the captured street view image with an overhead-view satellite map containing the vehicle's local surroundings. This local satellite view image can be obtained using any rough localization prior, e.g., from a global navigation satellite system or temporal filtering. Existing cross-view matching methods rely on global image descriptors and achieve considerably lower localization performance than structure-based methods with 3D maps. Whereas structure-based methods used global image descriptors in the past, recent structure-based work has shown that significantly better localization performance can be achieved by using local image descriptors to find pixel-level correspondences between the query street view image and the 3D map. Hence, using local image descriptors may be the key to improving the localization performance of cross-view matching methods. However, the street and satellite views not only exhibit very different visual appearances but also have distinct geometric configurations. As a result, finding correspondences between the two views is not a trivial task. We observe that the geometric relationship between the street and satellite views implies that every vertical line in the street view image has a corresponding azimuth direction in the satellite view image. Based on this prior, we devise a novel neural network architecture called SliceNet that extracts local image descriptors from both images and matches them to compute a dense spatial distribution over the camera's location. Specifically, the geometric prior serves as a weak supervision signal that enables SliceNet to learn the correspondences between the two views. As an additional task, we show that the extracted local image descriptors can also be used to determine the heading of the camera. SliceNet outperforms global image descriptor-based cross-view matching methods and achieves state-of-the-art localization results on the VIGOR dataset. Notably, the proposed method reduces the median metric localization error by 21% and 4% compared to the state of the art when generalizing within the same area and across areas, respectively.
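To make the geometric prior concrete, the sketch below maps each vertical slice of a 360° street-view panorama to its azimuth in the satellite frame, then walks along that azimuth ray from a candidate camera location and accumulates slice-to-satellite descriptor similarities into a dense score map over candidate locations. This is a minimal illustration under stated assumptions, not SliceNet itself: the equirectangular column-to-azimuth convention, the `sat_descriptor` lookup, the random stand-in descriptors, and all grid and sampling parameters are hypothetical choices for demonstration.

```python
import numpy as np

def column_to_azimuth(u: int, width: int, heading_rad: float = 0.0) -> float:
    """Azimuth (radians) of vertical column u in a 360-degree
    equirectangular street-view panorama.

    Assumes column 0 looks along `heading_rad` and azimuth increases
    clockwise with u; real datasets may follow other conventions.
    """
    return (heading_rad + 2.0 * np.pi * u / width) % (2.0 * np.pi)

# Toy stand-ins: random local descriptors instead of learned ones.
rng = np.random.default_rng(0)
W, D = 512, 32                               # panorama width, descriptor dim
street_desc = rng.standard_normal((W, D))    # one local descriptor per slice
sat_grid = rng.standard_normal((64, 64, D))  # dense satellite feature map (toy)

def sat_descriptor(x: float, y: float) -> np.ndarray:
    """Nearest-neighbor lookup into the toy satellite feature map."""
    i = int(np.clip(round(y), 0, sat_grid.shape[0] - 1))
    j = int(np.clip(round(x), 0, sat_grid.shape[1] - 1))
    return sat_grid[i, j]

def location_score(cx: float, cy: float, step: int = 16,
                   n_samples: int = 8, max_range: float = 20.0) -> float:
    """Score a candidate camera location by summing slice/satellite
    descriptor similarities along each slice's azimuth ray."""
    score = 0.0
    for u in range(0, W, step):              # subsample slices for speed
        theta = column_to_azimuth(u, W)
        dx, dy = np.sin(theta), np.cos(theta)  # east, north unit components
        for r in np.linspace(1.0, max_range, n_samples):
            score += street_desc[u] @ sat_descriptor(cx + r * dx, cy + r * dy)
    return float(score)

# Dense spatial score map over a coarse grid of candidate locations;
# a softmax over such a map would give a spatial distribution.
score_map = np.array([[location_score(x, y)
                       for x in range(0, 64, 8)]
                      for y in range(0, 64, 8)])
print(score_map.shape)  # (8, 8)
```

Because every slice votes along one azimuth ray, a candidate location only scores well if the satellite content in each direction agrees with the corresponding street-view slice, which is why the same slice descriptors also constrain the camera's heading.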
