TY - JOUR
T1 - Rescaling Egocentric Vision
T2 - Collection, Pipeline and Challenges for EPIC-KITCHENS-100
AU - Damen, Dima
AU - Doughty, Hazel R
AU - Farinella, Giovanni Maria
AU - Furnari, Antonino
AU - Kazakos, Vangelis
AU - Ma, Jian
AU - Moltisanti, Davide
AU - Munro, Jonathan P N
AU - Perrett, Toby J
AU - Price, Will
AU - Wray, Michael
N1 - Funding Information:
Research at Bristol is supported by Engineering and Physical Sciences Research Council (EPSRC) Doctoral Training Program (DTP), EPSRC Fellowship UMPIRE (EP/T004991/1). Research at Catania is sponsored by Piano della Ricerca 2016-2018 linea di Intervento 2 of DMI, by MISE - PON I&C 2014-2020, ENIGMA project (CUP: B61B19000520008) and by MIUR AIM - Attrazione e Mobilità Internazionale Linea 1 - AIM1893589 - CUP E64118002540007. We thank David Fouhey and Dandan Shan from University of Michigan for providing the ego-trained hand-object detection model prior to its public release. We also thank Sanja Fidler from University of Toronto for contributing to the first edition of EPIC-KITCHENS. We appreciate the efforts of all voluntary participants to collect and narrate this dataset.
Publisher Copyright:
© 2021, The Author(s).
PY - 2022/1
Y1 - 2022/1
N2 - This paper introduces the pipeline used to extend the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, and 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments, recorded with head-mounted cameras. Compared to its previous version, EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete (+128% more action segments) annotation of fine-grained actions. This collection enables new challenges such as action detection and evaluating the "test of time", i.e. whether models trained on data collected in 2018 can generalise to new footage collected two years later. The dataset is aligned with 6 challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), as well as unsupervised domain adaptation for action recognition. For each challenge, we define the task and provide baselines and evaluation metrics.
AB - This paper introduces the pipeline used to extend the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, and 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments, recorded with head-mounted cameras. Compared to its previous version, EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete (+128% more action segments) annotation of fine-grained actions. This collection enables new challenges such as action detection and evaluating the "test of time", i.e. whether models trained on data collected in 2018 can generalise to new footage collected two years later. The dataset is aligned with 6 challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), as well as unsupervised domain adaptation for action recognition. For each challenge, we define the task and provide baselines and evaluation metrics.
KW - Video Dataset
KW - Egocentric Vision
KW - Action Understanding
KW - Computer Vision
U2 - 10.1007/s11263-021-01531-2
DO - 10.1007/s11263-021-01531-2
M3 - Article (Academic Journal)
VL - 130
SP - 33
EP - 55
JO - International Journal of Computer Vision (IJCV)
JF - International Journal of Computer Vision (IJCV)
IS - 1
ER -