We describe a novel approach to image-based localisation in urban environments which uses semantic matching between images and a 2-D cartographic map. This contrasts with the majority of existing approaches, which match images against an image database. We use highly compact binary descriptors to represent locations, indicating the presence or absence of semantic features, which significantly increases scalability and has the potential for greater invariance to variable imaging conditions. The approach is also more akin to human map reading, making it better suited to human-system interaction. In this initial study we use semantic features relating to buildings and road junctions in discrete viewing directions. CNN classifiers are used to detect the features in images, and we match the resulting descriptor estimates with location-tagged descriptors derived from the 2-D map to give localisation. The descriptors are not sufficiently discriminative on their own, but when concatenated sequentially along a route, their combination becomes highly distinctive and allows localisation even when using imperfect classifiers. Performance is further improved by taking into account left or right turns over a route. Experimental results obtained using Google StreetView and OpenStreetMap data show that the approach has considerable potential, achieving localisation accuracy of around 85% using routes corresponding to approximately 200 meters.
- place recognition
- computer vision
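
The route-matching idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 4-bit descriptor layout, the use of Hamming distance as the matching score, the turn encoding (`"S"`, `"L"`, `"R"`), and all names (`concatenate`, `hamming`, `localise`, `map_routes`) are assumptions made here for illustration. The sketch concatenates per-location binary descriptors along a route, discards map routes whose turn pattern disagrees with the observed one, and returns the remaining route with the fewest mismatched bits.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Descriptor = Tuple[int, ...]   # one binary descriptor per location (assumed 4 bits)
Turns = Tuple[str, ...]        # assumed encoding: "S" straight, "L" left, "R" right


def concatenate(descriptors: Sequence[Descriptor]) -> Tuple[int, ...]:
    """Join per-location descriptors into a single route descriptor."""
    return tuple(bit for d in descriptors for bit in d)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length bit sequences differ."""
    if len(a) != len(b):
        raise ValueError("descriptors must have equal length")
    return sum(x != y for x, y in zip(a, b))


def localise(
    observed: Sequence[Descriptor],
    observed_turns: Turns,
    map_routes: Dict[str, Tuple[List[Descriptor], Turns]],
) -> Optional[str]:
    """Return the id of the best-matching map route, or None if none qualifies."""
    query = concatenate(observed)
    best_id: Optional[str] = None
    best_dist: Optional[int] = None
    for route_id, (descriptors, turns) in map_routes.items():
        # Skip routes of a different length or with a different turn pattern.
        if len(descriptors) != len(observed) or turns != observed_turns:
            continue
        dist = hamming(query, concatenate(descriptors))
        if best_dist is None or dist < best_dist:
            best_id, best_dist = route_id, dist
    return best_id


# Hypothetical map-derived routes: three locations each, two turns between them.
map_routes = {
    "route_a": ([(1, 0, 0, 1), (0, 0, 1, 1), (1, 1, 0, 0)], ("S", "L")),
    "route_b": ([(1, 0, 0, 1), (0, 1, 1, 0), (0, 0, 0, 1)], ("S", "R")),
    "route_c": ([(0, 1, 1, 0), (0, 0, 1, 1), (1, 1, 0, 0)], ("S", "L")),
}

# Classifier output with one wrong bit at the second location of route_a.
observed = [(1, 0, 0, 1), (0, 0, 1, 0), (1, 1, 0, 0)]
observed_turns = ("S", "L")

best = localise(observed, observed_turns, map_routes)
print(best)
```

In this toy example a single misclassified bit does not change the outcome, because the concatenated route descriptor still differs from the correct route in fewer bits than from any other candidate. A practical system would likely need a softer treatment of turns and classifier confidence than the hard filter assumed here.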