Visual speech recognition (VSR) is a challenging task with many applications, such as facilitating speech recognition when acoustic data is noisy or missing and assisting hearing-impaired people. Modern VSR systems require a large amount of data to achieve good performance. Popular VSR datasets are mostly available for the English language, and none exists for Bengali. In this paper, we introduce a large-scale Bengali audio-visual dataset, named “BenAV”. To the best of our knowledge, BenAV is the first publicly available large-scale audio-visual dataset in the Bengali language. BenAV contains a lexicon of 50 words from 128 speakers, with a total of 26,300 utterances. We have also applied three existing deep learning based VSR models to provide baseline performance on the BenAV dataset. We ran extensive experiments on two different configurations of the dataset to study the robustness of those models, achieving 98.70% and 82.5% accuracy, respectively. We believe this research provides a basis for developing Bengali lip reading systems and opens the door to further research on this topic.
Citation:
Pondit, A., Rukon, M. E. A., Das, A., & Kabir, M. A. (2021). BenAV: a Bengali Audio-Visual Corpus for Visual Speech Recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13109 LNCS, pp. 526–535). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-92270-2_45