Rescaling Egocentric Vision: Collection Pipeline and Challenges for EPIC-KITCHENS-100

Dima Damen*, Hazel R Doughty, Giovanni Maria Farinella, Antonino Furnari, Vangelis Kazakos, Jian Ma, Davide Moltisanti, Jonathan P N Munro, Toby J Perrett, Will Price, Michael Wray

*Corresponding author for this work

Research output: Contribution to journal › Article (Academic Journal) › peer-review

200 Citations (Scopus)
232 Downloads (Pure)

Abstract

This paper introduces the pipeline used to extend EPIC-KITCHENS, the largest dataset in egocentric vision. The effort culminates in EPIC-KITCHENS-100: a collection of 100 hours, 20M frames, and 90K actions across 700 variable-length videos, capturing long-term unscripted activities in 45 environments using head-mounted cameras.
Compared to its previous version, EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete annotations of fine-grained actions (+128% more action segments).
This collection enables new challenges such as action detection and evaluating the "test of time", i.e. whether models trained on data collected in 2018 can generalise to new footage collected two years later.

The dataset is aligned with 6 challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), as well as unsupervised domain adaptation for action recognition. For each challenge, we define the task and provide baselines and evaluation metrics.
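As a rough cross-check of the headline statistics quoted in the abstract (100 hours, 20M frames, 90K action segments), the implied annotation density works out to about 15 actions per minute. The sketch below is illustrative arithmetic on those rounded figures only, not code from the authors or the dataset release, and the derived values (frame rate, average segment length) are approximations that ignore gaps and overlapping segments.

```python
# Back-of-the-envelope check of the abstract's rounded statistics.
# These are NOT the exact released counts, just the figures quoted above.
HOURS = 100
FRAMES = 20_000_000
ACTIONS = 90_000

total_seconds = HOURS * 3600
actions_per_minute = ACTIONS / (HOURS * 60)     # average annotation density
approx_fps = FRAMES / total_seconds             # implied average frame rate
avg_segment_seconds = total_seconds / ACTIONS   # crude mean segment length,
                                                # ignoring gaps and overlaps

print(f"{actions_per_minute:.1f} actions/min, "
      f"~{approx_fps:.1f} fps, "
      f"~{avg_segment_seconds:.1f} s per action on average")
```

Running this gives roughly 15 actions per minute, an implied frame rate of about 55.6 fps, and an average of about 4 s per action segment, consistent with the "denser, fine-grained" annotation claim.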
Original language: English
Pages (from-to): 33–55
Number of pages: 23
Journal: International Journal of Computer Vision (IJCV)
Volume: 130
Issue number: 1
Early online date: 20 Oct 2021
DOIs
Publication status: Published - Jan 2022

Bibliographical note

Funding Information:
Research at Bristol is supported by Engineering and Physical Sciences Research Council (EPSRC) Doctoral Training Program (DTP), EPSRC Fellowship UMPIRE (EP/T004991/1). Research at Catania is sponsored by Piano della Ricerca 2016-2018 linea di Intervento 2 of DMI, by MISE - PON I&C 2014-2020, ENIGMA project (CUP: B61B19000520008) and by MIUR AIM - Attrazione e Mobilita Internazionale Linea 1 - AIM1893589 - CUP E64118002540007. We thank David Fouhey and Dandan Shan from University of Michigan for providing the ego-trained hand-object detection model prior to its public release. We also thank Sanja Fidler from University of Toronto for contributing to the first edition of EPIC-KITCHENS. We appreciate the efforts of all voluntary participants to collect and narrate this dataset.

Publisher Copyright:
© 2021, The Author(s).

Keywords

  • Video Dataset
  • Egocentric Vision
  • Action Understanding
  • Computer Vision
