Privacy-Preserving Deep Speaker Separation for Smartphone-Based Passive Speech Assessment

Abstract

Goal: Smartphones can be used to passively assess and monitor patients' speech impairments caused by ailments such as Parkinson's disease, Traumatic Brain Injury (TBI), Post-Traumatic Stress Disorder (PTSD), and neurodegenerative diseases such as Alzheimer's disease and dementia. However, passive audio recordings in natural settings often capture the speech of non-target speakers (cross-talk). Speaker separation, which identifies the target speaker's speech in recordings that contain two or more speakers' voices, is therefore a crucial pre-processing step in such scenarios. Prior speech separation methods analyzed raw audio. However, to preserve speaker privacy, passive smartphone audio recording and machine learning-based speech assessment are often performed on derived speech features such as Mel-Frequency Cepstral Coefficients (MFCCs). In this paper, we propose a novel Deep MFCC bAsed SpeaKer Separation (Deep-MASKS) method.

Methods: Deep-MASKS uses an autoencoder to reconstruct the MFCC components of an individual's speech from an i-vector, x-vector, or d-vector representation of their speech learned during an enrollment period. It uses a Deep Neural Network (DNN) to reconstruct the MFCC signal directly, which yields a more accurate, higher-order function than prior work that estimated a mask. Unlike prior work that operates on utterances, Deep-MASKS operates on continuous audio recordings.

Results: Deep-MASKS outperforms baselines, reducing the Mean Squared Error (MSE) of MFCC reconstruction by up to 44% and the number of additional bits required to represent the entropy of clean speech by 36%.
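To make the contrast between mask-based and direct DNN reconstruction concrete, here is a minimal NumPy sketch of a speaker-conditioned encoder-decoder forward pass. It is not the Deep-MASKS architecture: the sizes (13 MFCCs, a 256-dimensional speaker embedding, a 64-unit bottleneck), the conditioning by concatenating the embedding to each mixture frame, the random untrained weights, and all function names are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MFCC = 13       # hypothetical number of MFCC coefficients per frame
EMB_DIM = 256     # hypothetical speaker-embedding size (e.g. a d-vector)
BOTTLENECK = 64   # hypothetical bottleneck width

def dense(n_in, n_out):
    """Randomly initialised (untrained) weight matrix and bias."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed conditioning: the encoder sees each mixture MFCC frame
# concatenated with the enrolled target speaker's embedding.
W_enc, b_enc = dense(N_MFCC + EMB_DIM, BOTTLENECK)
W_dec, b_dec = dense(BOTTLENECK, N_MFCC)

def encode(mix_frames, speaker_emb):
    """mix_frames: (T, N_MFCC); speaker_emb: (EMB_DIM,). Returns (T, BOTTLENECK)."""
    cond = np.broadcast_to(speaker_emb, (mix_frames.shape[0], EMB_DIM))
    return relu(np.concatenate([mix_frames, cond], axis=1) @ W_enc + b_enc)

def mask_estimate(mix_frames, speaker_emb):
    """Mask-style output: the mixture rescaled elementwise by values in (0, 1)."""
    mask = sigmoid(encode(mix_frames, speaker_emb) @ W_dec + b_dec)
    return mask * mix_frames

def direct_estimate(mix_frames, speaker_emb):
    """Direct-regression output: an unconstrained MFCC vector per frame."""
    return encode(mix_frames, speaker_emb) @ W_dec + b_dec

# A continuous recording is treated as one long (T, N_MFCC) stream;
# no utterance segmentation is assumed. Inputs here are random stand-ins.
mix = rng.normal(size=(500, N_MFCC))
d_vector = rng.normal(size=EMB_DIM)
print(mask_estimate(mix, d_vector).shape, direct_estimate(mix, d_vector).shape)
```

Because every mask value lies in (0, 1), the mask estimate can never flip the sign of a coefficient or exceed the mixture's magnitude, while the direct estimate is unconstrained. MFCCs are signed values (a DCT of log mel energies), so this is one way to see why a multiplicative mask is a restrictive output form for MFCC reconstruction.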
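The two evaluation quantities named in the results can be sketched in NumPy. The MSE computation is standard. Reading "additional bits required to represent clean speech" as the Kullback-Leibler divergence D(P || Q) in bits, with P the clean-speech feature distribution and Q that of the reconstruction, is an assumption here, as are the histogram-based estimator, the bin range, and the function names; the abstract does not specify how the metric is computed.

```python
import numpy as np

def mse(reference, estimate):
    """Mean squared error between clean and reconstructed MFCC arrays."""
    return float(np.mean((np.asarray(reference) - np.asarray(estimate)) ** 2))

def relative_reduction(baseline_err, proposed_err):
    """Fraction by which proposed_err improves on baseline_err (0.25 means 25% lower)."""
    return (baseline_err - proposed_err) / baseline_err

def kl_bits(p_counts, q_counts, eps=1e-12):
    """KL divergence D(P || Q) in bits between two histograms over the same bins.

    This is the expected number of extra bits per symbol needed to encode
    samples drawn from P with a code optimised for Q. Treating it as the
    paper's "additional bits" metric is an assumption.
    """
    p = np.asarray(p_counts, dtype=float) + eps  # eps avoids log(0) and division by zero
    q = np.asarray(q_counts, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log2(p / q)))

# Synthetic stand-ins, not data from the paper.
rng = np.random.default_rng(1)
clean = rng.normal(size=(1000, 13))                       # "clean" target MFCCs
recon = clean + rng.normal(scale=0.3, size=clean.shape)   # noisy "reconstruction"
bins = np.linspace(-5.0, 5.0, 41)                         # shared histogram bins
p_hist, _ = np.histogram(clean, bins=bins)                # np.histogram flattens 2-D input
q_hist, _ = np.histogram(recon, bins=bins)
print(f"MSE = {mse(clean, recon):.3f}, extra bits = {kl_bits(p_hist, q_hist):.4f}")
print(f"reduction = {relative_reduction(2.0, 1.5):.0%}")
```

With a hypothetical baseline error of 2.0 and a proposed error of 1.5, `relative_reduction` gives 0.25, printed as `25%`. Histogram-based KL estimates depend on the bin choice, and the `eps` smoothing slightly biases the result, so both are illustrative settings rather than the paper's procedure.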

Ditthapron, A., Agu, E. O., & Lammert, A. C. (2021). Privacy-Preserving Deep Speaker Separation for Smartphone-Based Passive Speech Assessment. IEEE Open Journal of Engineering in Medicine and Biology, 2, 304–313. https://doi.org/10.1109/OJEMB.2021.3063994
