Back to Main Conference 2016
LREC 2016main

LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/3xtxb975en3d

Abstract

In this paper, we describe the textual linguistic resources in nearly 3 dozen languages being produced by Linguistic Data Consortium for DARPA’s LORELEI (Low Resource Languages for Emergent Incidents) Program. The goal of LORELEI is to improve the performance of human language technologies for low-resource languages and enable rapid re-training of such technologies for new languages, with a focus on the use case of deployment of resources in sudden emergencies such as natural disasters. Representative languages have been selected to provide broad typological coverage for training, and surprise incident languages for testing will be selected over the course of the program. Our approach treats the full set of language packs as a coherent whole, maintaining LORELEI-wide specifications, tagsets, and guidelines, while allowing for adaptation to the specific needs created by each language. Each representative language corpus, therefore, both stands on its own as a resource for the specific language and forms part of a large multilingual resource for broader cross-language technology development.

Details

Paper ID
lrec2016-main-521
Pages
pp. 3273-3280
BibKey
strassel-tracey-2016-lorelei
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 May 2016 28 May 2016

Authors

  • SS

    Stephanie Strassel

  • JT

    Jennifer Tracey

Links