Building Open Javanese and Sundanese Corpora for Multilingual Text-to-Speech

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Abstract

We present multi-speaker text-to-speech corpora for Javanese and Sundanese, the second and third largest languages of Indonesia spoken by well over a hundred million people. The key objectives were to collect high-quality data in an affordable way and to share the data publicly with the speech community. To achieve this, we collaborated with two local universities in Java and streamlined our recording and crowdsourcing processes to produce corpora consisting of 5,800 (Javanese) and 4,200 (Sundanese) mixed-gender recordings. We used these corpora to build several configurations of multi-speaker neural network-based text-to-speech systems for Javanese and Sundanese. Subjective evaluations performed on these configurations demonstrate that multilingual configurations for which Javanese and Sundanese are trained jointly with a larger corpus of Standard Indonesian significantly outperform the systems constructed from a single language. We hope that sharing these corpora publicly and presenting our multilingual approach to text-to-speech will help the community to scale up text-to-speech technologies to other lesser resourced languages of Indonesia.

Resources

Details

Paper ID

lrec2018-main-255

Pages

N/A

DOI

10.63317/4iw9raqzmnyn

BibKey

wibawa-etal-2018-building

Editors

Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

79-10-95546-00-9

Conference

Eleventh International Conference on Language Resources and Evaluation

Location

Miyazaki, Japan

Date

7 - 12 May 2018

Authors

JW
Jaka Aris Eko Wibawa
SS
Supheakmungkol Sarin
CL
Chenfang Li
KP
Knot Pipatsrisawat
KS
Keshan Sodimana
OK
Oddur Kjartansson
AG
Alexander Gutkin
MJ
Martin Jansche
LH
Linne Ha

Links

URL

DOI