Emotional speech acoustic model for Malay: Iterative versus isolated unit training

Mumtaz Begum Mustafa; Raja Noor Ainon

Journal Article (Open Access)


The Journal of the Acoustical Society of America (2013) 134(4) 3057-3066

DOI: 10.1121/1.4818741


Abstract

The ability of a speech synthesis system to synthesize emotional speech enhances the user's experience with the system and its related applications. However, developing an emotional speech synthesis system is a daunting task given the complexity of human emotional speech. Recent state-of-the-art speech synthesis systems, such as those based on hidden Markov models, can synthesize emotional speech with acceptable naturalness when a good emotional speech acoustic model is available. However, building an emotional speech acoustic model requires adequate resources, including segment-phonetic labels of emotional speech, a requirement that is difficult to meet for many under-resourced languages, including Malay. This research shows that it is possible to build an emotional speech acoustic model for Malay with minimal resources. To achieve this objective, two initialization methods were considered: iterative training using the deterministic annealing expectation maximization algorithm, and isolated unit training. The seed model for the automatic segmentation is a neutral speech acoustic model, which was transformed to the target emotion using two transformation techniques: model adaptation and context-dependent boundary refinement. Two forms of evaluation were performed: an objective evaluation measuring the prosody error and a listening evaluation measuring the naturalness of the synthesized emotional speech.
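To make the deterministic annealing expectation maximization (DAEM) idea mentioned in the abstract more concrete, the following is a minimal sketch on a toy problem, not the authors' HMM training procedure. It fits only the means of a one-dimensional Gaussian mixture, and everything about it is an assumption made for illustration: the function name `daem_means`, the equal mixture weights, the fixed shared variance, the temperature schedule `betas`, and the toy data. The one DAEM-specific step is that the E-step posterior is raised to a power `beta` that is increased toward 1, at which point the update is ordinary EM.

```python
import math


def tempered_responsibilities(x, means, var, beta):
    """Posterior over components, raised to the power beta and renormalized.

    Assumes equal weights and a shared variance, so the Gaussian
    normalization constants cancel and only the exponents matter.
    """
    log_scores = [-beta * (x - m) ** 2 / (2.0 * var) for m in means]
    top = max(log_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def daem_means(data, n_components=2, var=1.0,
               betas=(0.2, 0.4, 0.6, 0.8, 1.0), em_iters=30):
    """Estimate mixture means with a DAEM-style schedule (illustrative only)."""
    global_mean = sum(data) / len(data)
    # Start all components almost on top of the global mean; the tiny
    # deterministic offsets only break the symmetry between components.
    means = [global_mean + 0.01 * (k - (n_components - 1) / 2.0)
             for k in range(n_components)]

    for beta in betas:  # beta = 1 / temperature, annealed up to 1
        for _ in range(em_iters):
            # E-step with tempered posteriors.
            resp = [tempered_responsibilities(x, means, var, beta)
                    for x in data]
            # M-step: responsibility-weighted mean for each component.
            for k in range(n_components):
                nk = sum(r[k] for r in resp)
                if nk > 0.0:
                    means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
    return means


if __name__ == "__main__":
    toy = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0]
    print(sorted(daem_means(toy)))
```

In this sketch the low initial `beta` flattens the posteriors so that early updates depend less on the starting point, which is the usual motivation for DAEM over plain EM. In HMM-based training the same tempering would apply to state-occupancy posteriors rather than to mixture responsibilities.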

Citation (APA)

Mustafa, M. B., & Ainon, R. N. (2013). Emotional speech acoustic model for Malay: Iterative versus isolated unit training. The Journal of the Acoustical Society of America, 134(4), 3057–3066. https://doi.org/10.1121/1.4818741
