Speaker identification consists of discovering the identity of an individual from a captured speech signal, and it is still an open problem. Recent advances in deep learning combined with spectrograms encouraged us to propose the use of entropygrams in combination with convolutional neural networks. We extract the entropygrams of specific words uttered by the individual whose identity must be found among those known to the system. An entropygram is an image that shows how the information contained in the speech signal is distributed across frequency and how that distribution evolves in time. By extracting the entropygram from the speech signal we effectively turn speaker identification into an image recognition problem, and Convolutional Neural Networks (CNN) are known to be very effective at image recognition. In our experiments we used a collection of 21 young Mexican speakers of both genders and confirmed our hypothesis that entropygrams can successfully be used instead of spectrograms for speaker identification with CNN. We also experimented with noisy speech and found that entropygrams outperform spectrograms as image representations of the speakers to be identified with CNN.
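The abstract does not specify how an entropygram is computed, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It splits the signal into Hann-windowed frames, computes each frame's power spectrum with a naive DFT, divides the spectrum into equal-width frequency bands, and takes the Shannon entropy of the normalized power inside each band. The linear equal-width band layout (rather than, say, a perceptual scale), the frame length, the hop size, and the function names `power_spectrum`, `band_entropy` and `entropygram` are all hypothetical choices.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of one real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(s) ** 2)
    return spectrum


def band_entropy(powers):
    """Shannon entropy (bits) of one band's power values, normalized to sum to 1."""
    total = sum(powers)
    if total == 0:
        return 0.0
    h = 0.0
    for p in powers:
        if p > 0:
            q = p / total
            h -= q * math.log2(q)
    return h


def entropygram(signal, frame_len=256, hop=128, n_bands=16):
    """Hypothetical entropygram: one row per frame, one entropy value per band.

    Assumption: bands are equal-width slices of the DFT bins; the paper
    may use a different band layout.
    """
    # Hann window reduces spectral leakage between neighbouring bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]
    n_bins = frame_len // 2 + 1
    edges = [round(b * n_bins / n_bands) for b in range(n_bands + 1)]
    gram = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        spec = power_spectrum(frame)
        gram.append([band_entropy(spec[edges[b]:edges[b + 1]])
                     for b in range(n_bands)])
    return gram
```

Each row of the result is one time frame and each column one frequency band, so the list can be treated as a single-channel image (time by frequency) and fed to a CNN in place of a spectrogram. A band dominated by a single tone yields low entropy, while a band filled with noise-like energy yields entropy close to the logarithm of its width.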
Camarena-Ibarrola, A., Figueroa, K., & García, J. (2020). Speaker Identification Using Entropygrams and Convolutional Neural Networks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12468 LNAI, pp. 23–34). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60884-2_2