Bidirectional LSTM networks employing stacked bottleneck features for expressive speech-driven head motion synthesis


Abstract

Previous work on speech-driven head motion synthesis has centred on Hidden Markov Model (HMM) based methods and on data that do not exhibit large variability of expressiveness in either speech or motion. When applied to expressive data, these systems often fail to produce satisfactory results. Recent studies have shown that deep neural networks (DNNs) yield better head motion synthesis, in particular when employing bidirectional long short-term memory (BLSTM) networks. We present a novel approach that combines DNNs using stacked bottleneck features with a BLSTM architecture to model context and expressive variability. Our proposed DNN architecture outperforms conventional feed-forward DNNs and simple BLSTM networks in an objective evaluation. Results from a subjective evaluation show a significant improvement of the bottleneck architecture over feed-forward DNNs.
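
The abstract describes the architecture only at a high level; the sketch below illustrates one plausible reading of it and is not the authors' implementation. A feed-forward DNN with a narrow bottleneck layer compresses each acoustic frame, bottleneck outputs from neighbouring frames are stacked into a context window, and a bidirectional LSTM regresses head-motion parameters from the stacked features. All dimensions (40-dimensional acoustic input, a 64-unit bottleneck, a +/-2-frame context window, 6 output motion parameters) and the PyTorch framing are assumptions made for illustration.

```python
# Hypothetical sketch of a stacked-bottleneck + BLSTM pipeline; all sizes
# below are illustrative assumptions, not values taken from the paper.
import torch
import torch.nn as nn


class BottleneckDNN(nn.Module):
    """Feed-forward network with a narrow bottleneck layer."""

    def __init__(self, in_dim=40, hidden=512, bottleneck=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, bottleneck), nn.ReLU(),  # bottleneck layer
        )

    def forward(self, x):            # x: (batch, frames, in_dim)
        return self.encoder(x)       # (batch, frames, bottleneck)


class StackedBottleneckBLSTM(nn.Module):
    """Stacks bottleneck features over a context window, then runs a BLSTM."""

    def __init__(self, in_dim=40, bottleneck=64, context=2,
                 lstm_hidden=128, out_dim=6):
        super().__init__()
        self.context = context                       # frames on each side
        self.bn = BottleneckDNN(in_dim, bottleneck=bottleneck)
        stacked_dim = bottleneck * (2 * context + 1)
        self.blstm = nn.LSTM(stacked_dim, lstm_hidden,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * lstm_hidden, out_dim)  # head-motion params

    def forward(self, x):
        feats = self.bn(x)                           # (B, T, bottleneck)
        # Stack each frame with its +/- context neighbours. torch.roll wraps
        # around at the edges; a real system would pad the boundary frames.
        shifted = [torch.roll(feats, shifts=-offset, dims=1)
                   for offset in range(-self.context, self.context + 1)]
        stacked = torch.cat(shifted, dim=-1)         # (B, T, stacked_dim)
        hidden, _ = self.blstm(stacked)              # (B, T, 2 * lstm_hidden)
        return self.out(hidden)                      # (B, T, out_dim)


if __name__ == "__main__":
    model = StackedBottleneckBLSTM()
    speech = torch.randn(2, 100, 40)  # e.g. 2 utterances, 100 acoustic frames
    motion = model(speech)
    print(motion.shape)               # torch.Size([2, 100, 6])
```

In this reading, the bottleneck network and the BLSTM play complementary roles: the bottleneck layer forces a compact per-frame representation, while stacking and the bidirectional recurrence supply the past and future context the abstract says the model needs for expressive variability.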

Citation (APA)

Haag, K., & Shimodaira, H. (2016). Bidirectional LSTM networks employing stacked bottleneck features for expressive speech-driven head motion synthesis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10011 LNAI, pp. 198–207). Springer Verlag. https://doi.org/10.1007/978-3-319-47665-0_18
