Ego-Exo4D: Understanding Skilled Human Activity from First and Third-Person Perspectives

Kristen Grauman*, Siddhant Bansal, Zhifan Zhu, Dima Damen, Michael Wray

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)

Abstract

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from 1 to 42 minutes each and 1,286 hours of video combined. The multimodal nature of the dataset is unprecedented: the video is accompanied by multichannel audio, eye gaze, 3D point clouds, camera poses, IMU, and multiple paired language descriptions, including a novel “expert commentary” done by coaches and teachers and tailored to the skilled-activity domain. To push the frontier of first-person video understanding of skilled human activity, we also present a suite of benchmark tasks and their annotations, including fine-grained activity understanding, proficiency estimation, cross-view translation, and 3D hand/body pose. All resources are open sourced to fuel new research in the community.
Original language: English
Title of host publication: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 19383-19400
Number of pages: 18
ISBN (Electronic): 9798350353006
ISBN (Print): 9798350353013
Publication status: Published - 16 Sept 2024
Event: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): CVPR - Seattle, United States
Duration: 17 Jun 2024 to 21 Jun 2024
https://cvpr.thecvf.com

Publication series

Name: Conference on Computer Vision and Pattern Recognition (CVPR)
Publisher: IEEE
ISSN (Print): 1063-6919
ISSN (Electronic): 2575-7075

Conference

Conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Country/Territory: United States
City: Seattle
Period: 17/06/24 to 21/06/24
