Issues from Corpus Analysis that have influenced the On-going Development of Various Haitian Creole Text- and Speech-based NLP Systems and Applications

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/4vg9jipcme89

Abstract

This paper describes issues that are relevant to using small- to large-sized corpora for the training and testing of various text- and speech-based natural language processing (NLP) systems for minority and vernacular languages. These R&D and commercial systems and applications, which have already been produced for the Haitian Creole (HC) language, include machine translation, orthography conversion, optical character recognition, speech recognition, and speech synthesis. Few corpora for minority and vernacular languages have been created specifically for language resource distribution and for NLP system training. As a result, some of the only available corpora are those produced within real end-user environments. It is therefore of utmost importance that written language standards be created and then observed so that research on various text- and speech-based systems can be fruitful. Observing such standards also gives vernacular and minority languages the opportunity to have an impact within the globalization and advanced communication efforts of the modern-day world. Such technologies can significantly influence the status of these languages, yet the lack of standardization is a severe impediment to technological development. A number of relevant issues are discussed in this paper.

Details

Paper ID
lrec2000-main-255
Pages
N/A
BibKey
mason-2000-issues
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 - 2 June 2000

Authors

  • Marilyn Mason
