Learning to read maps: geolocation by embedding images and maps

Student thesis: Doctoral Thesis, Doctor of Philosophy (PhD)


This thesis presents an investigation into the use of visual information for geolocation, aimed at both terrestrial and aerial autonomous systems operating in urban and suburban environments. The core contribution is a novel learning-based method that links semantic elements visible in images with their depiction on a planimetric map. The work is partly inspired by the human ability to read such maps to localise oneself in the world. We show that the method enables successful geolocation from both street-level and aerial images.

The method is based on a learned embedding space, shared by encoded images and encoded localised areas of a map (tiles), in which images and tiles corresponding to similar places lie close together. The Euclidean distance between vectors provides a measure of similarity and enables an image to be geolocated by finding its closest map tile. Conversely, it also allows retrieval of places whose spatial layout is coherent with the semantics indicated in a rendered or sketched tile. Experiments demonstrate that the approach is effective, with, for example, top-1 recall rates of up to 78% in urban areas larger than 2.3 km² containing over 5,000 locations. Moreover, comparison with previous work that uses aerial images as the reference domain suggests that using maps yields more robust embeddings.
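
The retrieval step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the image and tile embeddings are assumed to have already been produced by the learned encoders (not shown), and the function names (`euclidean`, `rank_tiles`, `geolocate`, `recall_at_k`) and the toy 2-D vectors are hypothetical. Top-k recall is taken to mean the fraction of query images whose true tile appears among their k nearest tiles.

```python
import math

# Hypothetical sketch: embeddings are plain lists of floats assumed to come
# from the learned image and map-tile encoders, which are not shown here.


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_tiles(query, tile_embeddings):
    """Tile ids sorted from closest to farthest from the query embedding."""
    return sorted(tile_embeddings,
                  key=lambda tid: euclidean(query, tile_embeddings[tid]))


def geolocate(query, tile_embeddings):
    """Geolocate an image embedding as the id of its closest map tile."""
    return rank_tiles(query, tile_embeddings)[0]


def recall_at_k(queries, ground_truth, tile_embeddings, k=1):
    """Fraction of query images whose true tile is among the k closest tiles."""
    hits = sum(ground_truth[qid] in rank_tiles(emb, tile_embeddings)[:k]
               for qid, emb in queries.items())
    return hits / len(queries)


# Toy 2-D embeddings standing in for real encoder outputs (hypothetical data).
tiles = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
queries = {"img1": [0.1, 0.05], "img2": [0.6, 0.55]}
truth = {"img1": "A", "img2": "C"}

print(geolocate(queries["img1"], tiles))         # A
print(recall_at_k(queries, truth, tiles, k=1))   # 0.5
print(recall_at_k(queries, truth, tiles, k=2))   # 1.0
```

Because images and tiles share one space, the reverse query (retrieving places for a rendered or sketched tile) is the same ranking with the roles swapped: embed the tile and rank image embeddings instead. The linear scan here costs one distance per tile and query; for large maps an approximate nearest-neighbour index would be a natural substitute, although the abstract does not say what the thesis uses.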

We also investigate geolocation based on sequences of images along routes and show that concatenating the individual vectors quickly resolves local ambiguities, leveraging the distinctiveness of the semantic patterns observed along urban trajectories. Accuracy of over 96% is obtained with sequences of five locations (50 m) on a set of terrestrial test routes. The concept was also applied to geolocating aerial systems using a particle filter that integrates visual odometry with the similarity between aerial images and candidate map tiles. Moreover, for efficiency and scalability, we propose a method that approximates the embedded vectors for particles at test time using linear interpolation. Experiments in areas of up to 50 km² in three UK cities showed convergence rates above 70%, with typical localisation and orientation errors of around 100 m and 10 degrees respectively.
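
The route-matching and particle-filter ideas in this paragraph can be sketched as below, again as a minimal illustration rather than the thesis's method. It assumes that candidate routes are given as lists of tile ids, that visual odometry reports a (forward distance, heading change) pair, that map-tile embeddings are precomputed on a regular grid of positions and bilinearly interpolated (one form of linear interpolation; the abstract does not say whether orientation is also interpolated, and this sketch ignores heading when looking up embeddings), and that particles are weighted by a Gaussian likelihood of the embedding distance. All function names and parameters (`match_route`, `interp_embedding`, `pf_step`, `sigma`, `noise`, `spacing`) are hypothetical.

```python
import math
import random

# Hypothetical sketch of sequence matching and an embedding-based particle
# filter; names, parameters and the grid layout are illustrative assumptions.


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def concat(vectors):
    """Concatenate per-location embeddings into a single route descriptor."""
    return [x for v in vectors for x in v]


def match_route(image_embs, candidate_routes, tile_embeddings):
    """Index of the candidate route (a list of tile ids) whose concatenated
    tile embeddings are closest to the concatenated image embeddings."""
    query = concat(image_embs)
    dists = [euclidean(query, concat([tile_embeddings[t] for t in route]))
             for route in candidate_routes]
    return min(range(len(dists)), key=dists.__getitem__)


def interp_embedding(grid, spacing, x, y):
    """Bilinearly interpolate an embedding at (x, y) from embeddings
    precomputed on a regular grid, where grid[i][j] is the embedding at
    (i * spacing, j * spacing). Positions outside the grid are clamped."""
    gx, gy = x / spacing, y / spacing
    i0 = min(max(math.floor(gx), 0), len(grid) - 2)
    j0 = min(max(math.floor(gy), 0), len(grid[0]) - 2)
    tx = min(max(gx - i0, 0.0), 1.0)
    ty = min(max(gy - j0, 0.0), 1.0)
    e00, e10 = grid[i0][j0], grid[i0 + 1][j0]
    e01, e11 = grid[i0][j0 + 1], grid[i0 + 1][j0 + 1]
    return [(1 - tx) * (1 - ty) * a + tx * (1 - ty) * b
            + (1 - tx) * ty * c + tx * ty * d
            for a, b, c, d in zip(e00, e10, e01, e11)]


def pf_step(particles, odom, image_emb, grid, spacing,
            sigma=1.0, noise=(1.0, 0.05), rng=random):
    """One predict-weight-resample step. particles: list of (x, y, heading);
    odom: (forward distance, heading change) from visual odometry."""
    fwd, dturn = odom
    moved = []
    for x, y, th in particles:
        # Predict: apply the odometry increment with Gaussian noise.
        th_new = th + dturn + rng.gauss(0.0, noise[1])
        d = fwd + rng.gauss(0.0, noise[0])
        moved.append((x + d * math.cos(th_new), y + d * math.sin(th_new), th_new))
    # Weight: Gaussian likelihood of the image-to-tile embedding distance,
    # using interpolated embeddings instead of encoding a tile per particle.
    weights = [math.exp(-euclidean(image_emb, interp_embedding(grid, spacing, x, y)) ** 2
                        / (2 * sigma ** 2)) for x, y, _ in moved]
    if sum(weights) == 0.0:
        weights = [1.0] * len(moved)  # all likelihoods underflowed: resample uniformly
    # Resample particles in proportion to their weights.
    return rng.choices(moved, weights=weights, k=len(moved))
```

With concatenation, the squared distance between two route descriptors is the sum of the per-location squared distances, so a single ambiguous location can be outweighed by distinctive ones elsewhere on the route. With the grid interpolation, the map encoder only needs to run once per grid point rather than once per particle at every update.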

The above results demonstrate that the method has learned to link and encode diverse semantic clues between images and maps, generalising earlier approaches and allowing quicker and more robust geolocation.
Date of Award: 24 Jan 2023
Original language: English
Awarding Institution: The University of Bristol
Supervisors: Andrew Calway & Walterio W Mayol-Cuevas


Keywords: Geolocation
