Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

Dima Damen, Hazel Doughty, Giovanni Farinella, Sanja Fidler, Antonio Furnari, Vangelis Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Michael Wray

Research output: Chapter in Book/Report/Conference proceeding, Conference Contribution (Conference Proceeding)

92 Citations (Scopus)
116 Downloads (Pure)

Abstract

First-person vision is gaining interest as it offers a unique viewpoint on people’s interaction with objects, their attention, and even intention. However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets. In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. Our videos depict non-scripted daily activities: we simply asked each participant to start recording every time they entered their kitchen. Recording took place in 4 cities (in North America and Europe) by participants belonging to 10 different nationalities, resulting in highly diverse cooking styles. Our dataset features 55h of video consisting of 11.5M frames, which we densely labelled for a total of 39.6K action segments and 454.3K object bounding boxes. Our annotation is unique in that we had the participants narrate their own videos (after recording), thus reflecting true intention, and we crowd-sourced ground-truths based on these. We describe our object, action and anticipation challenges, and evaluate several baselines over two test splits, seen and unseen kitchens.
Original language: English
Title of host publication: Computer Vision – ECCV 2018
Subtitle of host publication: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part IV
Publisher: Springer, Cham
Pages: 753-771
Number of pages: 19
ISBN (Electronic): 9783030012250
ISBN (Print): 9783030012243
DOIs: https://doi.org/10.1007/978-3-030-01225-0_44
Publication status: Published - 6 Oct 2018
Event: European Conference on Computer Vision - Munich, Germany
Duration: 7 Sep 2018 to 14 Sep 2018

Publication series

Name: Lecture Notes in Computer Science
Volume: 11208
ISSN (Print): 0302-9743

Conference

Conference: European Conference on Computer Vision
Abbreviated title: ECCV
Country: Germany
City: Munich
Period: 7/09/18 to 14/09/18

Keywords

  • Egocentric vision
  • Dataset
  • Benchmarks
  • First-person vision
  • Egocentric object detection
  • Action recognition and anticipation

    Student Theses

    Verbs and Me: An Investigation Into Verbs as Labels for Action Recognition in Video Understanding

    Author: Wray, M., 23 Jan 2020

    Supervisor: Damen, D. (Supervisor)

    Student thesis: Doctoral Thesis, Doctor of Philosophy (PhD)

    Cite this

    Damen, D., Doughty, H., Farinella, G., Fidler, S., Furnari, A., Kazakos, V., Moltisanti, D., Munro, J., Perrett, T., & Wray, M. (2018). Scaling Egocentric Vision: The EPIC-KITCHENS Dataset. In Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part IV (pp. 753-771). (Lecture Notes in Computer Science; Vol. 11208). Springer, Cham. https://doi.org/10.1007/978-3-030-01225-0_44