BULBasaa: A Bilingual Basaa-French Speech Corpus for the Evaluation of Language Documentation Tools

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/29r9yggbjv5c

Abstract

Basaa is one of the three Bantu languages of BULB (Breaking the Unwritten Language Barrier), a project whose aim is to provide NLP-based tools to support linguists in documenting under-resourced and unwritten languages. To develop technologies such as automatic phone transcription or machine translation, a massive amount of speech data is needed. Approximately 50 hours of Basaa speech were thus collected and then carefully re-spoken and orally translated into French in a controlled environment by a few bilingual speakers. For a subset of approx. 10 hours of the corpus, each utterance was additionally phonetically transcribed to establish a gold standard for the output of our NLP tools. The experiments described in this paper are meant to provide an automatic phonetic transcription using a set of derived phone-like units. As every language features a specific set of idiosyncrasies, automating the process of phonetic unit discovery in its entirety is a challenging task. Within BULB, we envision a workflow where linguists are able to refine the set of automatically discovered units and the system is then able to re-iterate on the data, providing a better approximation of the actual phone set.

Details

Paper ID
lrec2018-main-533
Pages
N/A
BibKey
hamlaoui-etal-2018-bulbasaa
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7-12 May 2018

Authors

  • Fatima Hamlaoui
  • Emmanuel-Moselly Makasso
  • Markus Müller
  • Jonas Engelmann
  • Gilles Adda
  • Alex Waibel
  • Sebastian Stüker

Links