Emotional speech acoustic model for Malay: Iterative versus isolated unit training

  • Mustafa M
  • Ainon R

Abstract

The ability of a speech synthesis system to synthesize emotional speech enhances the user's experience of the system and its related applications. However, developing an emotional speech synthesis system is a daunting task given the complexity of human emotional speech. Recent state-of-the-art speech synthesis systems, such as those based on hidden Markov models, can synthesize emotional speech with acceptable naturalness when a good emotional speech acoustic model is available. Building such a model, however, requires adequate resources, including segment-phonetic labels of emotional speech, which is a problem for many under-resourced languages, including Malay. This research shows how an emotional speech acoustic model for Malay can be built with minimal resources. To achieve this objective, two initialization methods were considered: iterative training using the deterministic annealing expectation maximization algorithm, and isolated unit training. The seed model for the automatic segmentation is a neutral speech acoustic model, which was transformed to the target emotion using two transformation techniques: model adaptation and context-dependent boundary refinement. Two forms of evaluation were performed: an objective evaluation measuring the prosody error, and a listening evaluation measuring the naturalness of the synthesized emotional speech.
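
The deterministic annealing idea named in the abstract can be illustrated on a toy problem. The sketch below is not the paper's HMM training procedure: it assumes a one-dimensional mixture with equal weights and a fixed, shared variance, and re-estimates only the component means. The function name `daem_means`, the temperature schedule `betas`, and the sample data are all hypothetical choices made for illustration. The key step is that the E-step posteriors are raised to an inverse temperature `beta` that grows from a small value to 1, so early iterations use soft assignments and the final stage reduces to ordinary EM.

```python
import math

def daem_means(data, means, var=1.0,
               betas=(0.2, 0.4, 0.6, 0.8, 1.0), iters_per_beta=20):
    """Toy deterministic-annealing EM for the means of a 1-D mixture.

    Assumptions (for illustration only): equal mixture weights and a
    fixed, shared variance `var`; only the means are re-estimated.
    """
    means = list(means)
    for beta in betas:                      # anneal: beta rises towards 1
        for _ in range(iters_per_beta):
            sums = [0.0] * len(means)
            counts = [0.0] * len(means)
            for x in data:
                # Tempered E-step: log-likelihoods scaled by beta.
                scores = [-beta * (x - m) ** 2 / (2.0 * var) for m in means]
                top = max(scores)           # subtract max for stability
                expo = [math.exp(s - top) for s in scores]
                total = sum(expo)
                for j, e in enumerate(expo):
                    r = e / total           # tempered posterior
                    counts[j] += r
                    sums[j] += r * x
            # M-step: posterior-weighted mean of each component.
            means = [s / c if c > 0.0 else m
                     for s, c, m in zip(sums, counts, means)]
    return means

# Hypothetical data: two tight clusters, with a poor initial guess.
samples = [-0.2, -0.1, 0.0, 0.1, 0.2, 4.8, 4.9, 5.0, 5.1, 5.2]
print(daem_means(samples, [2.0, 3.0]))
```

With this data the two estimated means should settle close to the two cluster centres. In acoustic model training the same tempering would be applied to state occupancy posteriors rather than to a simple mixture, which this sketch does not attempt to reproduce.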

Cite

Mustafa, M. B., & Ainon, R. N. (2013). Emotional speech acoustic model for Malay: Iterative versus isolated unit training. The Journal of the Acoustical Society of America, 134(4), 3057–3066. https://doi.org/10.1121/1.4818741
