Successes and critical failures of neural networks in capturing human-like speech recognition

Federico Gonzalez Adolfi*, Jeffrey S Bowers, David Poeppel

*Corresponding author for this work

Research output: Contribution to journal › Article (Academic Journal) › peer-review

9 Citations (Scopus)

Abstract

Natural and artificial audition can in principle acquire different solutions to a given problem. The constraints of the task, however, can nudge the cognitive science and engineering of audition to qualitatively converge, suggesting that a closer mutual examination would potentially enrich artificial hearing systems and process models of the mind and brain. Speech recognition, an area ripe for such exploration, is inherently robust in humans to a number of transformations at various spectrotemporal granularities. To what extent are these robustness profiles accounted for by high-performing neural network systems? We bring together experiments in speech recognition under a single synthesis framework to evaluate state-of-the-art neural networks as stimulus-computable, optimized observers. In a series of experiments, we (1) clarify how influential speech manipulations in the literature relate to each other and to natural speech, (2) show the granularities at which machines exhibit out-of-distribution robustness, reproducing classical perceptual phenomena in humans, (3) identify the specific conditions where model predictions of human performance differ, and (4) demonstrate a crucial failure of all artificial systems to perceptually recover where humans do, suggesting alternative directions for theory and model building. These findings encourage a tighter synergy between the cognitive science and engineering of audition.
Original language: English
Pages (from-to): 199-211
Number of pages: 13
Journal: Neural Networks
Volume: 162
Early online date: 24 Feb 2023
Publication status: E-pub ahead of print - 24 Feb 2023

Bibliographical note

Funding Information:
We thank Oded Ghitza for clarifications on the original repackaging experiments and Franck Ramus for providing information on various replications and extensions. We thank 3 anonymous reviewers for constructive feedback that allowed us to improve a previous version of the manuscript. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 741134), and the Ernst Strüngmann Foundation, Germany.

Publisher Copyright:
© 2023 The Author(s)
