Demo page for "Multi-speaker Emotional Text-to-speech Synthesizer"

Brief information

Brief introduction to the demo

This page presents audio samples synthesized by our multi-speaker emotional text-to-speech synthesizer. The synthesizer can generate speech in 7 emotions (neutral, anger, disgust, fear, happiness, sadness, and surprise) for 10 speakers (5 female, 5 male).

Humans perceive the emotion in speech partly from its textual content. Therefore, for each emotion, we present 3 sentences whose content conveys that emotion. For the neutral sentences, we synthesized all 3 sentences for every emotion-speaker pair. For each of the other emotions, we synthesized its 3 sentences in both neutral and that emotion for all speakers.

The first demonstration presents synthesized utterances of the neutral sentences. Please pay attention to how the audio varies across emotions and speakers. We recommend listening to a neutral sample first, then to an emotional sample of the same sentence, and finally comparing what changes across speakers. Note that the samples in blue boxes were synthesized without training supervision for that speaker-emotion pair; their expressions were transferred from other speakers' emotional expressions.

Use the following links to move to the pages with samples for the sentences of the other emotions.

[Neutral sentences (home)] [Angry sentences] [Disgusted sentences] [Fearful sentences] [Happy sentences] [Sad sentences] [Surprised sentences]

[One page view]

Neutral sentences

Neutral sentence 1

Sentence: 이 음성합성기는 열 명의 화자와 일곱 개의 감정을 합성할 수 있습니다.
Romanization: i eumseonghabseong-gineun yeol myeong-ui hwajawa ilgob gaeui gamjeong-eul habseonghal su issseubnida.
Meaning: This speech synthesizer can synthesize speech for ten speakers and seven emotions.
Speaker | Neutral | Anger | Disgust | Fear | Happiness | Sadness | Surprise
ketts-30f
ketts-30m
ketts2-20m
ketts2-30f
ketts2-40m
ketts2-50f
ketts2-50m
ketts2-60f
ketts3-f
ketts3-m

Neutral sentence 2

Sentence: 카이스트는 대한민국의 이공계 연구중심대학이다.
Romanization: kaiseuteuneun daehanmingug-ui igong-gye yeongujungsimdaehag-ida.
Meaning: KAIST is a research-oriented science and engineering university in South Korea.
Speaker | Neutral | Anger | Disgust | Fear | Happiness | Sadness | Surprise
ketts-30f
ketts-30m
ketts2-20m
ketts2-30f
ketts2-40m
ketts2-50f
ketts2-50m
ketts2-60f
ketts3-f
ketts3-m

Neutral sentence 3

Sentence: 이 학회는 음성처리 분야에서 저명하다.
Romanization: i haghoeneun eumseongcheoli bun-ya-eseo jeomyeonghada.
Meaning: This conference is prominent in the field of speech processing.
Speaker | Neutral | Anger | Disgust | Fear | Happiness | Sadness | Surprise
ketts-30f
ketts-30m
ketts2-20m
ketts2-30f
ketts2-40m
ketts2-50f
ketts2-50m
ketts2-60f
ketts3-f
ketts3-m

Acknowledgement

This work was supported by the Ministry of Culture, Sports and Tourism and the Korea Creative Content Agency [R2019020013, R2020040298].