Efficient Attention Branch Network with Combined Loss Function for Automatic Speaker Verification Spoof Detection

Amir Mohammad Rostami; Mohammad Mehdi Homayounpour; Ahmad Nickabadi

Journal Article (Open Access)

Circuits, Systems, and Signal Processing (2023) 42(7) 4252-4270

DOI: 10.1007/s00034-023-02314-5

Abstract

Many efforts have sought to develop countermeasures that make Automatic Speaker Verification (ASV) systems more robust against spoofing attacks. As the latest ASVspoof 2019 countermeasure challenge showed, even the best models currently deployed for ASV generalize poorly to unseen attacks. Jointly improving the components of an ASV spoof detection system, including the classifier, the feature extraction phase, and the model loss function, may lead to better attack detection. Accordingly, this study proposes the Efficient Attention Branch Network (EABN) architecture with a combined loss function to improve generalization to unseen attacks. The EABN consists of an attention branch and a perception branch. The attention branch produces an attention mask that improves classification performance and is also interpretable from a human point of view. The perception branch performs the main task, spoof detection. A new EfficientNet-A0 architecture was optimized and employed for the perception branch; it has nearly ten times fewer parameters and approximately seven times fewer floating-point operations than SE-Res2Net50, the best existing network. On the ASVspoof 2019 dataset, the proposed method achieved EER = 0.86% and t-DCF = 0.0239 in the Physical Access (PA) scenario with logPowSpec as the input feature. Furthermore, with the LFCC feature and SE-Res2Net50 as the perception branch, the proposed model achieved EER = 1.89% and t-DCF = 0.507 in the Logical Access (LA) scenario, which, to the best of our knowledge, is the best result for a single-system ASV spoofing countermeasure.
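
The abstract reports results as EER (equal error rate), the operating point at which the rate of accepted spoof trials equals the rate of rejected bona fide trials. The following is a minimal sketch of how such a value can be computed from detection scores, assuming that a higher score means "more likely bona fide". The function name `compute_eer` and the toy scores are hypothetical illustrations; they are not taken from the paper or from the official ASVspoof evaluation toolkit.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate by scanning the observed scores.

    Assumption: higher scores indicate bona fide speech.
    Returns (eer, threshold).
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct score, in ascending order.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide trials scoring below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoof trials scoring at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER lies where the two error curves cross; take the closest point.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


if __name__ == "__main__":
    # Toy scores for illustration only (not data from the paper).
    eer, thr = compute_eer([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

Scanning only the observed scores keeps the sketch short; evaluation toolkits typically interpolate between thresholds, so values on real data may differ slightly from this approximation.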

Cite

Rostami, A. M., Homayounpour, M. M., & Nickabadi, A. (2023). Efficient Attention Branch Network with Combined Loss Function for Automatic Speaker Verification Spoof Detection. Circuits, Systems, and Signal Processing, 42(7), 4252–4270. https://doi.org/10.1007/s00034-023-02314-5
