Enhancing RGB-D SLAM Using Deep Learning

Student thesis: Doctoral Thesis, Doctor of Philosophy (PhD)


This thesis addresses the problem of Simultaneous Localisation and Mapping (SLAM) with colour and depth (RGB-D) data. The problem, also known as RGB-D SLAM, has been widely studied in both the robotics and computer vision communities for decades. Although the performance and robustness of tracking and mapping have improved significantly throughout its development, two bottlenecks impede the further evolution of RGB-D SLAM. The first is that many RGB-D SLAM systems rely only on low-level geometric features for tracking and mapping, which is likely to cause system failures in challenging scenarios and also limits their usefulness for higher-level tasks. The second is that RGB-D SLAM systems require a large memory footprint to store the reconstructed map, especially when the map is densely reconstructed, and lack the ability to hallucinate the unobserved parts of the scene, thus restraining the systems from mapping larger-scale environments and resulting in reconstructions with holes.
Focusing on these two bottlenecks, this thesis seeks to enhance RGB-D SLAM using recent deep learning techniques. In particular, three independent investigations are carried out. The first two address the first bottleneck and utilise high-level semantic features to improve relocalisation and indoor place recognition within the RGB-D SLAM system in challenging scenarios. For relocalisation in particular, this thesis takes advantage of the view-independent property of objects to solve wide-disparity relocalisation, achieving an average relocalisation success rate of 82.04% on 10 challenging desktop scenes, compared with 28.30% for the conventional feature-based Bag-of-Words (BoW) method and 43.17% for the appearance-based randomised Ferns (FERNS) method.
In the place recognition task, purely appearance-based or geometry-based methods often struggle to distinguish indoor places with highly similar structure and appearance. This thesis exploits semantic information and improves indoor place recognition by combining implicit semantic features with appearance and geometric features. The developed indoor place recognition network achieves a top-3 average recall rate of 75.06%, compared with 41.49% for the closest rival method.
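The top-3 average recall quoted above is the standard retrieval metric: a query counts as a success if any of its top-3 retrieved places matches the ground-truth place. A minimal sketch of that computation (function and variable names are illustrative, not from the thesis):

```python
def top_k_recall(rankings, ground_truth, k=3):
    """Fraction of queries whose top-k retrieved places contain a correct one.

    rankings: per-query list of place IDs, best match first.
    ground_truth: per-query set of correct place IDs.
    """
    hits = sum(1 for ranked, correct in zip(rankings, ground_truth)
               if any(place in correct for place in ranked[:k]))
    return hits / len(rankings)

# Toy example: 2 of 3 queries have a correct place within their top 3.
rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
truth = [{"b"}, {"x"}, {"i"}]
print(top_k_recall(rankings, truth))
```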
Finally, the third investigation addresses the second bottleneck and employs a single multi-layer perceptron (MLP) as the only map representation in the RGB-D SLAM. Accordingly, an end-to-end RGB-D SLAM with deep feature tracking and neural implicit mapping is presented that uses scene-specific features for camera pose estimation and learns the scene geometry online. In terms of tracking, the absolute trajectory error (ATE) is evaluated: the developed SLAM system achieves an average root mean square error (RMSE) of 0.025m over 8 sequences in the Replica dataset, compared to 0.183m for implicit mapping and positioning (iMAP) and 0.020m for neural implicit scalable encoding for SLAM (NICE-SLAM). In terms of reconstruction, an average completion ratio of 86.34% is achieved, while iMAP and NICE-SLAM reach 79.06% and 82.41% respectively.
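The ATE RMSE figures above follow the standard definition: after rigidly aligning the estimated trajectory to ground truth (typically via a Horn/Umeyama alignment, omitted in this sketch), the metric is the root mean square of the per-frame translational errors. A minimal sketch, assuming already-aligned trajectories:

```python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """ATE RMSE in metres for aligned (N, 3) arrays of camera positions."""
    errors = np.linalg.norm(est_xyz - gt_xyz, axis=1)  # per-frame error
    return float(np.sqrt(np.mean(errors ** 2)))        # root mean square

# Toy example: a constant 0.03m offset along x gives an ATE RMSE of 0.03m.
gt = np.zeros((5, 3))
est = gt + np.array([0.03, 0.0, 0.0])
print(ate_rmse(est, gt))
```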
Date of Award: 24 Jan 2023
Original language: English
Awarding Institution
  • The University of Bristol
Supervisor: Andrew Calway (Supervisor)