This paper analyses speech signals in the time and phase domains to extract breathing patterns. Two categories of speech are investigated: reading and spontaneous speaking. We introduce SBreathNet, a deep Long Short-Term Memory (LSTM) based regression model that extracts breathing patterns from speech signals. When trained on speech collected from 100 individuals reading a phonetically balanced text, SBreathNet extracts breathing patterns that achieve an average Pearson correlation coefficient (r-value) of 0.61 against the true breathing signal captured with a respiratory belt. The average breaths-per-minute error (BPME) across the 100 speakers is 2.50. The evaluation uses a leave-one-speaker-out approach. When SBreathNet is instead trained on spontaneous speech signals, it extracts breathing patterns with an r-value of 0.41 and an average BPME of 3.9. By comparing performance across speakers, speech categories, and speech-breathing categories, we aim to uncover the factors that influence SBreathNet’s effectiveness on these two types of speech signals.
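The evaluation described above rests on three ingredients: a Pearson correlation between predicted and true breathing signals, a breaths-per-minute error, and a leave-one-speaker-out split. The following is a minimal sketch of how these could be computed, not the authors' implementation. All function names are hypothetical, the peak-counting rule for estimating breaths (local maxima above the signal mean, at least `min_gap_s` seconds apart) is an assumption, and BPME is assumed here to be the absolute difference between the breathing rates of the true and predicted signals.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_x * var_y)


def count_breaths(signal, fs, min_gap_s=1.5):
    """Count breaths as local maxima above the mean (assumed heuristic).

    Peaks closer than min_gap_s seconds to the previously accepted peak
    are ignored so that small ripples are not counted as breaths.
    """
    mean = sum(signal) / len(signal)
    min_gap = int(min_gap_s * fs)
    last_peak = -min_gap
    breaths = 0
    for i in range(1, len(signal) - 1):
        is_peak = signal[i - 1] < signal[i] >= signal[i + 1]
        if is_peak and signal[i] > mean and i - last_peak >= min_gap:
            breaths += 1
            last_peak = i
    return breaths


def breaths_per_minute(signal, fs):
    """Breathing rate of a signal sampled at fs Hz."""
    duration_min = len(signal) / fs / 60
    return count_breaths(signal, fs) / duration_min


def bpm_error(true_signal, predicted_signal, fs):
    """Absolute breaths-per-minute difference (assumed BPME definition)."""
    return abs(breaths_per_minute(true_signal, fs)
               - breaths_per_minute(predicted_signal, fs))


def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices) per speaker."""
    for held_out in sorted(set(speaker_ids)):
        train = [i for i, s in enumerate(speaker_ids) if s != held_out]
        test = [i for i, s in enumerate(speaker_ids) if s == held_out]
        yield held_out, train, test
```

In a setup like the one reported, each fold would train on all speakers but one, compute `pearson_r` and `bpm_error` on the held-out speaker's recordings, and average both metrics over the folds.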
Deshpande, G., Schuller, B. W., Deshpande, P., Joshi, A. R., Oza, S. K., & Patel, S. (2023). Analysing Breathing Patterns in Reading and Spontaneous Speech. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 14339 LNAI, pp. 3–17). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-48312-7_1