Epic-Sounds: A Large-Scale Dataset of Actions that Sound

Jaesung Huh*, Jacob Chalk*, Evangelos Kazakos, Dima Damen, Andrew Zisserman

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)


Abstract

We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos from EPIC-KITCHENS-100. We propose an annotation pipeline in which annotators temporally label distinguishable audio segments and describe the action that could have caused each sound. We identify actions that can be discriminated purely from audio by grouping these free-form descriptions into classes. For actions that involve objects colliding, we collect human annotations of the materials of these objects (e.g. a glass object being placed on a wooden surface), which we verify against visual labels, discarding ambiguous cases. Overall, EPIC-SOUNDS includes 78.4k categorised segments of audible events and actions, distributed across 44 classes, as well as 39.2k non-categorised segments, totalling 117.6k segments spanning 100 hours of audio and capturing diverse actions that sound in home kitchens. We train and evaluate two state-of-the-art audio recognition models on our dataset, highlighting the importance of audio-only labels and the limitations of current models in recognising actions that sound. EPIC-SOUNDS and the baseline source code are available from: https://epic-kitchens.github.io/epic-sounds.
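The abstract describes each annotation as a temporal extent in the audio stream, paired with a free-form description and, for most segments, one of 44 class labels. A minimal Python sketch of such a record follows, assuming a simple in-memory representation; the field names (video_id, start_sec, stop_sec, description, class_label) and the helper split_by_category are hypothetical illustrations, not the released annotation format or the baseline code. The sketch also checks that the reported segment counts add up to the stated total.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AudioSegment:
    """Hypothetical record for one annotated audio segment (not the official schema)."""
    video_id: str               # hypothetical identifier of the source video
    start_sec: float            # start of the temporal extent, in seconds
    stop_sec: float             # end of the temporal extent, in seconds
    description: str            # annotator's free-form description of the likely cause
    class_label: Optional[str]  # one of the 44 classes, or None if non-categorised

    @property
    def duration(self) -> float:
        return self.stop_sec - self.start_sec


def split_by_category(
    segments: List[AudioSegment],
) -> Tuple[List[AudioSegment], List[AudioSegment]]:
    """Hypothetical helper: separate categorised from non-categorised segments."""
    categorised = [s for s in segments if s.class_label is not None]
    non_categorised = [s for s in segments if s.class_label is None]
    return categorised, non_categorised


# Segment counts as reported in the abstract, in thousands; Decimal avoids float rounding.
CATEGORISED_K = Decimal("78.4")
NON_CATEGORISED_K = Decimal("39.2")
TOTAL_K = CATEGORISED_K + NON_CATEGORISED_K  # equals the reported 117.6k

# Toy usage with made-up segments and placeholder labels.
toy = [
    AudioSegment("video_a", 0.0, 1.5, "tap running into a sink", "placeholder_class"),
    AudioSegment("video_a", 2.0, 2.5, "indistinct rustling", None),
]
cat, non_cat = split_by_category(toy)
print(len(cat), len(non_cat), TOTAL_K)  # prints: 1 1 117.6
```

Making class_label optional mirrors the abstract's split between categorised and non-categorised segments; the released annotation files may organise this differently.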
Original language: English
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 5
ISBN (Electronic): 9781728163277
ISBN (Print): 9781728163284
Publication status: Published - 5 May 2023
Event: 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing - Rodos Palace Luxury Convention Resort, Rhodes, Greece
Duration: 4 Jun 2023 to 10 Jun 2023
https://2023.ieeeicassp.org/

Publication series

Name: Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing
ISSN (Print): 1520-6149
ISSN (Electronic): 2379-190X

Conference

Conference: 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP 2023
Country/Territory: Greece
City: Rhodes
Period: 4/06/23 to 10/06/23