
Basic Language Resources for 31 Languages (Plus English): The LORELEI Representative and Incident Language Packs

Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

DOI:10.63317/3i548ozbtio9

Abstract

This paper documents and describes the thirty-one basic language resource packs created for the DARPA LORELEI program for use in development and testing of systems capable of providing language-independent situational awareness in emerging scenarios in a low-resource language context. Twenty-four Representative Language Packs cover a broad range of language families and typologies, providing large volumes of monolingual and parallel text, smaller volumes of entity and semantic annotations, and a variety of grammatical resources and tools designed to support research into language universals and cross-language transfer. Seven Incident Language Packs provide test data to evaluate system capabilities on a previously unseen low-resource language. We discuss the makeup of Representative and Incident Language Packs, the methods used to produce them, and the evolution of their design and implementation over the course of the multi-year LORELEI program. We conclude with a summary of the final language packs, including their low-cost publication in the LDC catalog.

Details

Paper ID
lrec2020-ws-sltu-39
Pages
pp. 277-284
BibKey
tracey-strassel-2020-basic
Publisher
European Language Resources Association (ELRA)
Workshop
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)
Date
11-16 May 2020

Authors

  • Jennifer Tracey

  • Stephanie Strassel
