Multiresolution Decomposition Analysis via Wavelet Transforms for Audio Deepfake Detection

Abstract

Voice and face recognition are becoming omnipresent, and the need for secure biometric technologies grows as deepfake technology makes it increasingly hard to spot generated fake content. To improve current audio spoofing detection, we propose a curated selection of wavelet-transform-based models in which, instead of the widely employed acoustic features, Mel-spectrogram image features are decomposed through multiresolution analysis to better exploit spectral information. To this end, we adopt median-filtering harmonic-percussive source separation (HPSS) and conduct a large-scale study of several recent state-of-the-art computer vision models applied to audio anti-spoofing. These wavelet transforms prove experimentally useful, achieving a notable 4.8% EER on the ASVspoof 2019 logical access (LA) evaluation set. Finally, a more adversarially robust WaveletCNN-based model is proposed.
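The two preprocessing ingredients named in the abstract — median-filtering HPSS and multiresolution wavelet decomposition of a spectrogram — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the kernel size, the soft-mask formulation (Fitzgerald-style HPSS), the toy input spectrogram, and the single-level Haar transform are all assumptions made for the example.

```python
import numpy as np
from scipy.ndimage import median_filter

def hpss_masks(S, kernel=17):
    """Median-filtering HPSS: median-filter the magnitude spectrogram
    along time for the harmonic estimate and along frequency for the
    percussive estimate, then split S with soft masks."""
    H = median_filter(S, size=(1, kernel))  # smooth across time -> harmonic
    P = median_filter(S, size=(kernel, 1))  # smooth across frequency -> percussive
    eps = 1e-10
    mask_h = H / (H + P + eps)
    mask_p = P / (H + P + eps)
    return S * mask_h, S * mask_p

def haar_dwt2(x):
    """One level of a 2-D Haar multiresolution decomposition: returns the
    approximation sub-band LL and the detail sub-bands LH, HL, HH
    (input dimensions assumed even)."""
    lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)  # row-wise average
    hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)  # row-wise difference
    ll = (lo[0::2] + lo[1::2]) / np.sqrt(2)      # column-wise average/difference
    lh = (lo[0::2] - lo[1::2]) / np.sqrt(2)
    hl = (hi[0::2] + hi[1::2]) / np.sqrt(2)
    hh = (hi[0::2] - hi[1::2]) / np.sqrt(2)
    return ll, lh, hl, hh

# Toy stand-in for a Mel-spectrogram: 64 mel bands x 128 frames of
# non-negative energies (a real pipeline would compute this from audio).
rng = np.random.default_rng(0)
S = np.abs(rng.normal(size=(64, 128)))
harmonic, percussive = hpss_masks(S)
ll, lh, hl, hh = haar_dwt2(harmonic)
print(ll.shape)  # each sub-band is half the size per axis: (32, 64)
```

Deeper decomposition levels are obtained by recursing on `ll`; in practice a wavelet library such as PyWavelets (`pywt.wavedec2`) would replace the hand-rolled Haar step and support other wavelet families.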

Citation (APA)
Fathan, A., Alam, J., & Kang, W. (2022). Multiresolution Decomposition Analysis via Wavelet Transforms for Audio Deepfake Detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13721 LNAI, pp. 188–200). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-20980-2_17
