Developmental shifts in detection and attention for auditory, visual, and audiovisual speech

Susan Jerger*, Markus F. Damian, Cassandra Karl, Hervé Abdi

*Corresponding author for this work

Research output: Contribution to journal › Article (Academic Journal) › peer-review



Purpose: Successful speech processing depends on our ability to detect and integrate multisensory cues, yet there is minimal research on multisensory speech detection and integration by children. To address this need, we studied the development of speech detection for auditory (A), visual (V), and audiovisual (AV) input.

Method: Participants were 115 typically developing children clustered into age groups between 4 and 14 years. Speech detection (quantified by response times [RTs]) was determined for 1 stimulus, /buh/, presented in A, V, and AV modes (articulating vs. static facial conditions). Performance was analyzed not only in terms of traditional mean RTs but also in terms of the faster versus slower RTs (defined by the 1st vs. 3rd quartiles of RT distributions). These time regions were conceptualized respectively as reflecting optimal detection with efficient focused attention versus less optimal detection with inefficient focused attention due to attentional lapses.

Results: Mean RTs indicated better detection (a) of multisensory AV speech than A speech only in 4- to 5-year-olds and (b) of A and AV inputs than V input in all age groups. The faster RTs revealed that AV input did not improve detection in any group. The slower RTs indicated that (a) the processing of silent V input was significantly faster for the articulating than static face and (b) AV speech or facial input significantly minimized attentional lapses in all groups except 6- to 7-year-olds (a peaked U-shaped curve). Apparently, the AV benefit observed for mean performance in 4- to 5-year-olds arose from effects of attention.

Conclusions: The faster RTs indicated that AV input did not enhance detection in any group, but the slower RTs indicated that AV speech and dynamic V speech (mouthing) significantly minimized attentional lapses and thus did influence performance. Overall, A and AV inputs were detected consistently faster than V input; this result endorsed stimulus-bound auditory processing by these children.
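
The Method describes summarizing each RT distribution by its mean and by its 1st and 3rd quartiles. A minimal Python sketch of that kind of summary follows. It is an illustration only, not the authors' analysis code: the function name `summarize_rts`, the use of quartile cut points (rather than, say, the mean of the RTs falling within each quartile), the "inclusive" interpolation method, and the RT values are all assumptions made for the example.

```python
from statistics import mean, quantiles


def summarize_rts(rts_ms):
    """Return the mean and the 1st/3rd quartile cut points of a list of RTs (ms).

    Assumption: "faster" and "slower" RTs are represented here by the
    quartile cut points; the study may have operationalized them differently.
    """
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _median, q3 = quantiles(rts_ms, n=4, method="inclusive")
    return {"mean": mean(rts_ms), "q1_faster": q1, "q3_slower": q3}


# Made-up RTs in milliseconds, for illustration only (NOT data from the study).
hypothetical_rts = {
    "A": [410, 395, 450, 520, 430, 610, 405, 470],
    "V": [560, 540, 700, 820, 590, 910, 575, 640],
    "AV": [400, 390, 440, 480, 425, 530, 398, 455],
}

summaries = {mode: summarize_rts(rts) for mode, rts in hypothetical_rts.items()}

for mode, s in summaries.items():
    print(f"{mode}: mean={s['mean']:.1f} ms, "
          f"Q1={s['q1_faster']:.1f} ms, Q3={s['q3_slower']:.1f} ms")
```

Comparing modes at Q1 versus Q3, instead of only at the mean, is what lets an analysis of this kind separate effects on the fastest responses from effects on the slow tail, which the abstract attributes to attentional lapses.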

Original language: English
Pages (from-to): 3095-3112
Number of pages: 18
Journal: Journal of Speech, Language, and Hearing Research
Issue number: 12
Early online date: 10 Dec 2018
Publication status: Published - Dec 2018

Structured keywords

  • Language
  • Cognitive Science

