Bootstrapping Language Description: the case of Mpiemo (Bantu A, Central African Republic)

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/4asfnkisyow4

Abstract

Linguists have long been producing grammatical descriptions of as yet undescribed languages. This is a time-consuming process, which has already adapted to improved technology for recording and storage. We present here a novel application of NLP techniques to bootstrap the analysis of collected data and speed up manual selection work. To be more precise, we argue that unsupervised induction of morphology and part-of-speech analysis from raw text data is mature enough to produce useful results. Experiments with Latent Semantic Analysis were less fruitful. We exemplify this on Mpiemo, a so far essentially undescribed Bantu language of the Central African Republic, for which raw text data was available.

Details

Paper ID
lrec2008-main-229
Pages
N/A
BibKey
hammarstrom-etal-2008-bootstrapping
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28-30 May 2008

Authors

  • Harald Hammarström
  • Christina Thornell
  • Malin Petzell
  • Torbjörn Westerlund
