SLOW-FAST AUDITORY STREAMS FOR AUDIO RECOGNITION

Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen

Research output: Chapter in Book/Report/Conference proceeding > Conference Contribution (Conference Proceeding)


Abstract

We propose a two-stream convolutional network for audio recognition that operates on time-frequency spectrogram inputs. Following similar success in visual recognition, we learn Slow-Fast auditory streams with separable convolutions and multi-level lateral connections. The Slow pathway has high channel capacity, while the Fast pathway operates at a fine-grained temporal resolution. We showcase the importance of our two-stream proposal on two diverse datasets, VGG-Sound and EPIC-KITCHENS-100, and achieve state-of-the-art results on both.
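
The trade-off between the two pathways can be illustrated with a shape-level sketch. It is not the authors' implementation: the names `StreamShape`, `slow_fast_shapes` and `lateral_fuse`, the stride `alpha`, the channel ratio `beta`, and all numeric defaults are assumptions chosen for illustration, since the abstract states none of them. The sketch also assumes that a lateral connection brings Fast features to the Slow temporal resolution and concatenates them along the channel axis, as is common in Slow-Fast designs for video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamShape:
    channels: int  # feature channels (capacity of the stream)
    frames: int    # positions along the spectrogram's time axis
    bins: int      # positions along the frequency axis


def slow_fast_shapes(frames, bins, slow_channels=64, alpha=4, beta=8):
    """Return (slow, fast) feature shapes for one spectrogram input.

    alpha: assumed temporal stride of the Slow stream relative to Fast.
    beta:  assumed channel reduction factor of the Fast stream.
    Both values are illustrative and do not come from the abstract.
    """
    if frames % alpha != 0:
        raise ValueError("frames must be divisible by alpha")
    if slow_channels % beta != 0:
        raise ValueError("slow_channels must be divisible by beta")
    # Slow: many channels, coarse temporal resolution.
    slow = StreamShape(slow_channels, frames // alpha, bins)
    # Fast: few channels, fine temporal resolution.
    fast = StreamShape(slow_channels // beta, frames, bins)
    return slow, fast


def lateral_fuse(slow, fast, alpha=4):
    """Shape of the Slow stream after one Fast-to-Slow lateral connection.

    Assumes the Fast features are reduced in time by stride alpha with
    their channel count unchanged, then concatenated onto the Slow
    features along the channel axis.
    """
    if fast.frames // alpha != slow.frames or fast.bins != slow.bins:
        raise ValueError("streams do not align after temporal striding")
    return StreamShape(slow.channels + fast.channels, slow.frames, slow.bins)


slow, fast = slow_fast_shapes(frames=400, bins=128)
fused = lateral_fuse(slow, fast)
print(slow, fast, fused, sep="\n")
```

With these assumed settings the Slow stream holds eight times as many channels as the Fast stream, while the Fast stream keeps four times as many time steps.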
Original language: English
Title of host publication: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 5
ISBN (Electronic): 978-1-7281-7605-5
ISBN (Print): 978-1-7281-7606-2
Publication status: E-pub ahead of print - 13 May 2021

Publication series

Name
ISSN (Print): 2379-190X

Keywords

  • training
  • visualization
  • time-frequency analysis
  • convolution
  • channel capacity
  • conferences
  • speech recognition
  • audio recognition
  • action recognition
  • fusion
  • multi-stream networks
