Guided deep learning applied to animal recognition in video

Student thesis: Doctoral Thesis, Doctor of Philosophy (PhD)


This thesis addresses guided deep learning, a new perspective for dealing with the real-world challenges of deep learning applications such as animal analysis. The proposed methods seek guidance from several sources in the data, such as spatiotemporal context, visual coherence, temporal correspondence, and dynamic learning policies, to facilitate learning. The results are demonstrated on a range of computer vision tasks, including animal detection, animal behaviour understanding, action recognition, video representation learning, and generic object detection.

First, this thesis explores a context-guided detection model. It leverages spatiotemporal context through an attention mechanism, building a spatio-local correlation and a long-range temporal dependency to boost animal detection in camera-trap footage. The proposed method outperforms state-of-the-art frame-based detection baselines on the PanAfrica Dataset, a dataset of Great Apes filmed by camera traps in their natural habitats. Extensive experimental results show the effectiveness of the proposed context guidance in scenarios involving motion blur, major occlusion, and camouflage effects, where state-of-the-art detectors fail.
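
As a rough illustration of how attention can pool context across frames, the sketch below applies scaled dot-product attention from one query feature to a set of per-frame features. It is a generic example and not the thesis's detection model; the function names `softmax` and `temporal_attention` and the plain-list feature format are assumptions made here for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(
    query: Sequence[float],
    frame_features: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: attend from one query feature to per-frame features.

    Returns the attention-weighted context vector and the weights.
    """
    dim = len(query)
    scale = math.sqrt(dim)
    # Similarity of the query to each frame, scaled as in dot-product attention.
    scores = [
        sum(q * k for q, k in zip(query, frame)) / scale
        for frame in frame_features
    ]
    weights = softmax(scores)
    # Context vector: weighted sum of the frame features.
    context = [
        sum(w * frame[i] for w, frame in zip(weights, frame_features))
        for i in range(dim)
    ]
    return context, weights
```

Frames whose features resemble the query receive larger weights, so a blurred or occluded frame can borrow evidence from clearer frames elsewhere in the clip.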

This thesis also offers insight into how to leverage visual coherence and temporal correspondence in a video to guide visual representation learning without labels. The proposed self-guided temporal learning model performs forward-backward cycle predictions in time and then guides the predicted cycles to be coherent (smoothly varying) and correspondent (circularly consistent). It shows competitive experimental results for action recognition on UCF101 and HMDB51. Self-guided learning also shows potential for animal analysis, where large amounts of unlabelled animal video can be exploited in a meaningful way.
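
The two properties named above can be made concrete with a toy example. The sketch below rolls a feature vector forward and then backward in time and measures a coherence term (how smoothly consecutive predictions vary) and a correspondence term (how closely the cycle returns to its start). The function names, the squared-distance measure, and the step functions are assumptions for illustration only, not the losses used in the thesis.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cycle_losses(
    start: Sequence[float],
    step_forward: Callable[[Vector], Vector],
    step_backward: Callable[[Vector], Vector],
    steps: int,
) -> Tuple[float, float]:
    """Hypothetical helper: return (coherence, correspondence) for one cycle."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    trajectory: List[Vector] = [list(start)]
    # Predict forward in time, then predict back again.
    for _ in range(steps):
        trajectory.append(step_forward(trajectory[-1]))
    for _ in range(steps):
        trajectory.append(step_backward(trajectory[-1]))
    # Coherence: mean squared change between consecutive predictions.
    pairs = list(zip(trajectory, trajectory[1:]))
    coherence = sum(sq_dist(a, b) for a, b in pairs) / len(pairs)
    # Correspondence: the cycle should end where it started.
    correspondence = sq_dist(trajectory[0], trajectory[-1])
    return coherence, correspondence
```

In a learned model the two step functions would be predictors trained so that both terms become small; here they are arbitrary callables so the sketch stays self-contained.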

Inspired by the success of semi-supervised learning using large quantities of unlabelled data in other research fields, this thesis finally argues that semi-supervised approaches can benefit animal analysis by addressing the lack of data labels in animal datasets. However, current pseudo-label-based semi-supervised detection models, which utilise labelled and unlabelled data simultaneously, suffer from potential learning collapse. To address this, a policy-guided semi-supervised model is proposed. The proposed curriculum learning policies can steer learning towards a self-reinforced virtuous learning cycle that benefits the learning process and improves model performance. Experiments on the PanAfrica and Bee datasets demonstrate the superior performance of policy guidance compared with state-of-the-art semi-supervised detection. The model also achieves state-of-the-art performance on MS-COCO and PASCAL VOC 2007 under a low-data regime.
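
One simple way to picture a curriculum policy for pseudo-labels is a confidence threshold that changes over training. The sketch below is an assumed, generic schedule (a linear move from a strict threshold to a looser one) and not the policies proposed in the thesis; the names `curriculum_threshold` and `select_pseudo_labels`, the default values, and the `(label, score)` detection format are all hypothetical.

```python
from typing import List, Sequence, Tuple

Detection = Tuple[str, float]  # (class label, confidence score), assumed format


def curriculum_threshold(
    step: int,
    total_steps: int,
    start: float = 0.9,
    end: float = 0.5,
) -> float:
    """Linearly interpolate the confidence threshold over training."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


def select_pseudo_labels(
    detections: Sequence[Detection],
    threshold: float,
) -> List[Detection]:
    """Keep only detections confident enough to serve as pseudo-labels."""
    return [d for d in detections if d[1] >= threshold]
```

Starting strict admits only the most reliable pseudo-labels while the model is weak, which is one way to reduce the risk of the model reinforcing its own early mistakes.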

This thesis concludes that the proposed guided deep learning approaches benefit animal analysis by tackling its real-world challenges.
Date of Award: 20 Jun 2023
Original language: English
Awarding Institution
  • University of Bristol
Supervisors: Majid Mirmehdi & Tilo Burghardt


Keywords
  • Animal Welfare
  • Animal Recognition
  • Video Understanding
  • Object Detection
  • Self-supervised Learning
  • Semi-supervised Learning
  • Action Recognition
