Abstract
We show that learning video feature spaces in which temporal cycles are maximally predictable benefits action classification. In particular, we propose a novel learning approach, Cycle Encoding Prediction (CEP), that is able to effectively represent high-level spatio-temporal structure of unlabelled video content. CEP builds a latent space wherein the concept of closed forward-backward, as well as backward-forward, temporal loops is approximately preserved. As a self-supervision signal, CEP leverages the bi-directional temporal coherence of entire video snippets and applies loss functions that encourage both temporal cycle closure and contrastive feature separation. Architecturally, the underpinning network architecture utilises a single feature encoder for all input videos, adding two predictive modules that learn temporal forward and backward transitions. We apply our framework for pretext training of networks for action recognition and report significantly improved results for the standard datasets UCF101 and HMDB51.
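The abstract names two loss terms (temporal cycle closure and contrastive feature separation) and two predictive modules, but this record does not give their formulas. The following is only a minimal NumPy sketch of how such terms could look, not the authors' implementation. The linear maps `forward` and `backward` are hypothetical stand-ins for the forward and backward predictive modules, the squared-error cycle term and the InfoNCE-style contrastive term are assumed forms, and random vectors take the place of encoder outputs.

```python
import numpy as np

rng = np.random.default_rng(0)


def l2_normalise(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def cycle_closure_loss(z, forward, backward):
    """Mean squared gap between each embedding and its temporal round trip.

    z        : (batch, dim) snippet embeddings (assumed shared-encoder output)
    forward  : (dim, dim) hypothetical stand-in for the forward predictor
    backward : (dim, dim) hypothetical stand-in for the backward predictor
    """
    fb = (z @ forward) @ backward   # forward-backward loop
    bf = (z @ backward) @ forward   # backward-forward loop
    return np.mean((fb - z) ** 2) + np.mean((bf - z) ** 2)


def contrastive_loss(pred, target, temperature=0.1):
    """InfoNCE-style term (assumed form): pred[i] should match target[i]
    and be separated from every other target[j] in the batch."""
    logits = l2_normalise(pred) @ l2_normalise(target).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


# Toy data: random vectors in place of real encoder outputs.
dim, batch = 16, 8
z = rng.normal(size=(batch, dim))
z_next = rng.normal(size=(batch, dim))

# A random orthogonal matrix and its transpose form an exactly closed cycle.
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
closed = cycle_closure_loss(z, Q, Q.T)   # backward inverts forward
broken = cycle_closure_loss(z, Q, Q)     # backward does not invert forward

# Perfect predictions versus predictions paired with the wrong snippet.
aligned = contrastive_loss(z_next, z_next)
shuffled = contrastive_loss(np.roll(z_next, 1, axis=0), z_next)

print("cycle loss (closed, broken):", closed, broken)
print("contrastive loss (aligned, shuffled):", aligned, shuffled)
```

In this toy setting the cycle term vanishes, up to floating-point error, when the backward map inverts the forward map, and the contrastive term is lower when each prediction is paired with its own target. A full training setup would combine terms of this kind on learned encoder features rather than on random vectors.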
Original language | English |
---|---|
Number of pages | 15 |
Publication status | Unpublished - 25 Nov 2021 |
Event | The 32nd British Machine Vision Conference, Online; Duration: 22 Nov 2021 → 25 Nov 2021; Conference number: 32; https://www.bmvc2021-virtualconference.com/ ; https://www.bmvc2021.com/ |
Conference
Conference | The 32nd British Machine Vision Conference |
---|---|
Abbreviated title | BMVC 2021 |
Period | 22/11/21 → 25/11/21 |
Fingerprint
Dive into the research topics of 'Back to the Future: Cycle Encoding Prediction for Self-supervised Video Representation Learning'. Together they form a unique fingerprint.
Student theses
- Guided deep learning applied to animal recognition in video
  Yang, X. (Author), Mirmehdi, M. (Supervisor) & Burghardt, T. (Supervisor), 20 Jun 2023
  Student thesis: Doctoral Thesis › Doctor of Philosophy (PhD)