Parallel Speech Corpora of Japanese Dialects

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

Abstract

Clean speech data is necessary for spoken language processing; however, there is no public Japanese dialect corpus collected for speech processing. Parallel speech corpora of dialects are also important because real dialects influence each other; however, the existing data includes only noisy dialect speech and its translation into the common language. In this paper, we collected parallel speech corpora of Japanese dialects: 100 read-speech utterances by 25 speakers, together with their phoneme transcriptions. We recorded the speech of 5 common-language speakers and 20 dialect speakers from 4 areas, 5 speakers per area. Each dialect speaker converted the same common-language texts into their own dialect and read them aloud. The speech was recorded with a close-talking microphone so that it can be used for spoken language processing (recognition, synthesis, and pronunciation estimation). In the experiments, the accuracies of automatic speech recognition (ASR) and Kana-Kanji conversion (KKC) systems improved when the systems were adapted with the data.

Details

Paper ID

lrec2016-main-737

Pages

pp. 4652-4657

DOI

10.63317/5gtnededkf3o

BibKey

yoshino-etal-2016-parallel

Editors

Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

978-2-9517408-9-1

Conference

Tenth International Conference on Language Resources and Evaluation

Location

Portorož, Slovenia

Date

23 - 28 May 2016

Authors

Koichiro Yoshino
Naoki Hirayama
Shinsuke Mori
Fumihiko Takahashi
Katsutoshi Itoyama
Hiroshi G. Okuno
