Back to the Future: Cycle Encoding Prediction for Self-supervised Video Representation Learning

Xinyu Yang*, Majid Mirmehdi, Tilo Burghardt

*Corresponding author for this work

Research output: Contribution to conference › Conference Paper › peer-review



We show that learning video feature spaces in which temporal cycles are maximally predictable benefits action classification. In particular, we propose a novel learning approach, Cycle Encoding Prediction (CEP), that effectively represents the high-level spatio-temporal structure of unlabelled video content. CEP builds a latent space in which the concept of closed forward-backward, as well as backward-forward, temporal loops is approximately preserved. As a self-supervision signal, CEP leverages the bi-directional temporal coherence of entire video snippets and applies loss functions that encourage both temporal cycle closure and contrastive feature separation. Architecturally, CEP uses a single feature encoder for all input videos, adding two predictive modules that learn the temporal forward and backward transitions. We apply our framework for pretext training of networks for action recognition and report significantly improved results on the standard datasets UCF101 and HMDB51.
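To make the cycle-closure idea concrete, the following is a minimal toy sketch (not the authors' implementation): snippet features are plain vectors, the forward and backward predictive modules are assumed to be linear maps, and the backward map is constructed as the exact inverse of the forward one so that both the forward-backward and backward-forward loops close. The contrastive-separation term mentioned in the abstract is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy set-up: 8-d snippet features from a shared encoder,
# with two linear predictors modelling temporal transitions.
DIM = 8
W_fwd = np.eye(DIM) + rng.normal(scale=0.1, size=(DIM, DIM))  # forward predictor
W_bwd = np.linalg.inv(W_fwd)  # backward predictor (toy choice: exact inverse)

def predict_forward(z):
    """Predict the feature of the temporally next snippet."""
    return z @ W_fwd

def predict_backward(z):
    """Predict the feature of the temporally previous snippet."""
    return z @ W_bwd

def cycle_closure_loss(z):
    """Penalise deviation of the forward-backward and backward-forward
    loops from the identity, i.e. encourage closed temporal cycles."""
    fb = predict_backward(predict_forward(z))  # forward then backward
    bf = predict_forward(predict_backward(z))  # backward then forward
    return float(np.mean((fb - z) ** 2) + np.mean((bf - z) ** 2))

z = rng.normal(size=(4, DIM))  # features of 4 video snippets
loss = cycle_closure_loss(z)   # near zero, since the toy loops close exactly
```

In training, the encoder and both predictors would be learned jointly, so cycle closure is only approximately satisfied and the loss provides the self-supervision signal; the exact-inverse construction above merely illustrates the zero-loss target.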
Original language: English
Number of pages: 15
Publication status: Unpublished - 25 Nov 2021
Event: The 32nd British Machine Vision Conference - Online
Duration: 22 Nov 2021 - 25 Nov 2021
Conference number: 32


Conference: The 32nd British Machine Vision Conference
Abbreviated title: BMVC 2021


