Vision perceptually restores auditory spectral dynamics in speech

18 citations · 56 Mendeley readers

Abstract

Visual speech facilitates auditory speech perception, but the visual cues responsible for these benefits and the information they provide remain unclear. Low-level models emphasize basic temporal cues provided by mouth movements, but these impoverished signals may not fully account for the richness of auditory information provided by visual speech. High-level models posit interactions among abstract categorical (i.e., phonemes/visemes) or amodal (e.g., articulatory) speech representations, but require lossy remapping of speech signals onto abstracted representations. Because visible articulators shape the spectral content of speech, we hypothesized that the perceptual system might exploit natural correlations between midlevel visual (oral deformations) and auditory speech features (frequency modulations) to extract detailed spectrotemporal information from visual speech without employing high-level abstractions. Consistent with this hypothesis, we found that the time-frequency dynamics of oral resonances (formants) could be predicted with unexpectedly high precision from the changing shape of the mouth during speech. When isolated from other speech cues, speech-based shape deformations improved perceptual sensitivity for corresponding frequency modulations, suggesting that listeners could exploit this cross-modal correspondence to facilitate perception. To test whether this type of correspondence could improve speech comprehension, we selectively degraded the spectral or temporal dimensions of auditory sentence spectrograms to assess how well visual speech facilitated comprehension under each degradation condition. Visual speech produced drastically larger enhancements during spectral degradation, suggesting a condition-specific facilitation effect driven by cross-modal recovery of auditory speech spectra. The perceptual system may therefore use audiovisual correlations rooted in oral acoustics to extract detailed spectrotemporal information from visual speech.
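To make the degradation manipulation described above more concrete, here is a minimal Python sketch (not the authors' code or stimuli) of selectively smearing a spectrogram along its spectral versus temporal dimension. The synthetic chirp signal, Gaussian smoothing, and all parameter values are assumptions chosen only for illustration:

# Illustrative sketch: selectively degrading the spectral vs. temporal
# dimension of a spectrogram. All signals and parameters are assumed,
# not taken from the paper.
import numpy as np
from scipy.signal import chirp, spectrogram
from scipy.ndimage import gaussian_filter1d

fs = 16000                                    # sample rate in Hz (assumed)
t = np.linspace(0, 1.0, fs, endpoint=False)
x = chirp(t, f0=300, f1=3000, t1=1.0)         # stand-in for a formant-like frequency sweep

# Time-frequency representation of the signal
freqs, times, S = spectrogram(x, fs=fs, nperseg=512)

# Spectral degradation: smear energy across frequency bins,
# blurring formant structure while leaving temporal envelopes largely intact.
S_spectrally_degraded = gaussian_filter1d(S, sigma=8, axis=0)

# Temporal degradation: smear energy across time frames,
# blurring temporal dynamics while leaving the spectral profile largely intact.
S_temporally_degraded = gaussian_filter1d(S, sigma=8, axis=1)

Under the paper's hypothesis, visual speech should help comprehension more in the first case, because mouth-shape information can help recover the blurred spectral (formant) detail but adds little when only temporal detail is removed.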




Citation (APA)

Plass, J., Brang, D., Suzuki, S., & Grabowecky, M. (2020). Vision perceptually restores auditory spectral dynamics in speech. Proceedings of the National Academy of Sciences of the United States of America, 117(29), 16920–16927. https://doi.org/10.1073/pnas.2002887117

Readers' Seniority

PhD / Postgrad / Masters / Doc: 20 (59%)
Researcher: 11 (32%)
Professor / Associate Prof.: 3 (9%)

Readers' Discipline

Psychology: 13 (41%)
Neuroscience: 12 (38%)
Linguistics: 4 (13%)
Computer Science: 3 (9%)
