The speaker activity at the Canary Islands Parliament is recorded and later manually annotated. This task can be modelled as a diarization problem, that is, the automatic annotation of who is speaking and when. In this paper, we propose using visual cues to solve the diarization task. This approach requires detecting individuals, determining which one is speaking, and extracting features for matching. To test the performance of our proposal, we evaluate four different strategies based on visual shot features.
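The matching step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes that each shot has already been reduced to one feature vector for the detected speaker, and it uses cosine similarity with a hypothetical threshold of 0.8 to decide whether a shot belongs to an already seen speaker. The names `cosine_similarity` and `assign_speakers` are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector matches nothing
    return dot / (norm_a * norm_b)


def assign_speakers(shot_features, threshold=0.8):
    """Give each shot a speaker id by matching it against earlier speakers.

    Assumption: one feature vector per shot, describing the person who is
    speaking in that shot. The threshold value is hypothetical.
    """
    gallery = []  # one reference vector per speaker seen so far
    labels = []
    for feat in shot_features:
        best_id, best_sim = None, threshold
        for speaker_id, ref in enumerate(gallery):
            sim = cosine_similarity(feat, ref)
            if sim >= best_sim:
                best_id, best_sim = speaker_id, sim
        if best_id is None:
            # No earlier speaker is similar enough: open a new identity.
            gallery.append(feat)
            best_id = len(gallery) - 1
        labels.append(best_id)
    return labels


# Toy example with 2-D features for four consecutive shots.
shots = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.05]]
print(assign_speakers(shots))
```

In this toy run the first, second, and fourth shots share one identity and the third shot opens a second one. A real system would replace the toy vectors with features extracted from the detected individuals.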
CITATION STYLE
Marín-Reyes, P. A., Lorenzo-Navarro, J., Castrillón-Santana, M., & Sánchez-Nielsen, E. (2018). Who is Really Talking? A Visual-Based Speaker Diarization Strategy. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10672 LNCS, pp. 322–329). Springer Verlag. https://doi.org/10.1007/978-3-319-74727-9_38