Vision perceptually restores auditory spectral dynamics in speech

18 citations · 56 Mendeley readers

Abstract

Visual speech facilitates auditory speech perception, but the visual cues responsible for these benefits and the information they provide remain unclear. Low-level models emphasize basic temporal cues provided by mouth movements, but these impoverished signals may not fully account for the richness of auditory information provided by visual speech. High-level models posit interactions among abstract categorical (i.e., phonemes/visemes) or amodal (e.g., articulatory) speech representations, but require lossy remapping of speech signals onto abstracted representations. Because visible articulators shape the spectral content of speech, we hypothesized that the perceptual system might exploit natural correlations between midlevel visual (oral deformations) and auditory speech features (frequency modulations) to extract detailed spectrotemporal information from visual speech without employing high-level abstractions. Consistent with this hypothesis, we found that the time-frequency dynamics of oral resonances (formants) could be predicted with unexpectedly high precision from the changing shape of the mouth during speech. When isolated from other speech cues, speech-based shape deformations improved perceptual sensitivity for corresponding frequency modulations, suggesting that listeners could exploit this cross-modal correspondence to facilitate perception. To test whether this type of correspondence could improve speech comprehension, we selectively degraded the spectral or temporal dimensions of auditory sentence spectrograms to assess how well visual speech facilitated comprehension under each degradation condition. Visual speech produced drastically larger enhancements during spectral degradation, suggesting a condition-specific facilitation effect driven by cross-modal recovery of auditory speech spectra. The perceptual system may therefore use audiovisual correlations rooted in oral acoustics to extract detailed spectrotemporal information from visual speech.
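To make the degradation manipulation described above more concrete, here is a minimal Python sketch (not the authors' code or stimuli) of selectively smearing a spectrogram along its spectral versus temporal dimension. The synthetic chirp signal, Gaussian smoothing, and all parameter values are assumptions chosen only for illustration:

# Illustrative sketch: selectively degrading the spectral vs. temporal
# dimension of a spectrogram. All signals and parameters are assumed,
# not taken from the paper.
import numpy as np
from scipy.signal import chirp, spectrogram
from scipy.ndimage import gaussian_filter1d

fs = 16000                                    # sample rate in Hz (assumed)
t = np.linspace(0, 1.0, fs, endpoint=False)
x = chirp(t, f0=300, f1=3000, t1=1.0)         # stand-in for a formant-like frequency sweep

# Time-frequency representation of the signal
freqs, times, S = spectrogram(x, fs=fs, nperseg=512)

# Spectral degradation: smear energy across frequency bins,
# blurring formant structure while leaving temporal envelopes largely intact.
S_spectrally_degraded = gaussian_filter1d(S, sigma=8, axis=0)

# Temporal degradation: smear energy across time frames,
# blurring temporal dynamics while leaving the spectral profile largely intact.
S_temporally_degraded = gaussian_filter1d(S, sigma=8, axis=1)

Under the paper's hypothesis, visual speech should help comprehension more in the first case, because mouth-shape information can help recover the blurred spectral (formant) detail but adds little when only temporal detail is removed.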




Citation (APA)

Plass, J., Brang, D., Suzuki, S., & Grabowecky, M. (2020). Vision perceptually restores auditory spectral dynamics in speech. Proceedings of the National Academy of Sciences of the United States of America, 117(29), 16920–16927. https://doi.org/10.1073/pnas.2002887117

Readers' Seniority

PhD / Postgrad / Masters / Doc: 20 (59%)
Researcher: 11 (32%)
Professor / Associate Prof.: 3 (9%)

Readers' Discipline

Psychology: 13 (41%)
Neuroscience: 12 (38%)
Linguistics: 4 (13%)
Computer Science: 3 (9%)
