Language Resource Creation and Distribution at the Linguistic Data Consortium: A Progress Report

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/2mvcga28z5zq

Abstract

Changes in the supply of and demand for language resources continue to affect the role of large data centers such as the Linguistic Data Consortium (LDC) and the European Language Resources Association (ELRA) within the research communities they serve. The past few years have seen increased demand for intensively multi-modal resources; larger data sets in high-density languages and new data in low-density languages; and standards and tools for corpus development and reusable resources. The next few years will bring demand for extensive batteries of coordinated language resources with sophisticated annotation in several major languages. The DARPA program in Translingual Information Detection, Extraction and Summarization (TIDES) has already undertaken such resource development; programs with similarly broad scope addressing other technologies will surely follow. Data centers will be well placed to address these needs if they integrate new resource development with the distribution of existing resources, filling known gaps by creating new data or assisting in its creation. LDC has ongoing projects that address all of these issues. This paper provides an overview of LDC activity in corpus creation, annotation and distribution, and describes new efforts to bring together communities of researchers, identify best practices and develop tools of general use.

Details

Paper ID
lrec2002-main-245
Pages
N/A
BibKey
cieri-liberman-2002-language
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Third International Conference on Language Resources and Evaluation
Location
Las Palmas, Spain
Date
29-31 May 2002

Authors

  • Christopher Cieri

  • Mark Liberman

Links