
The Lácio-Web: Corpora and Tools to Advance Brazilian Portuguese Language Investigations and Computational Linguistic Tools

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/2xnvjy7pqcww

Abstract

In this paper we discuss five requirements for building large, publicly available corpora that guided the construction of the Lácio-Web corpora and their environments: 1) a comprehensive text typology; 2) text copyright clearance, compilation and an annotation scheme; 3) a friendly and didactic interface; 4) the need to support several types of research; 5) the need to offer an array of associated tools. We also present the features that make the Lácio-Web corpora interesting and novel, as well as the limitations of this project, such as corpora size and balance, and the non-inclusion of spoken texts in the project’s reference corpus.

Details

Paper ID: lrec2004-main-238
Pages: N/A
BibKey: aluisio-etal-2004-lacio
Editor: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: 2-9517408-1-6
Conference: Fourth International Conference on Language Resources and Evaluation
Location: Lisbon, Portugal
Date: 26 May 2004 to 28 May 2004

Authors

  • Sandra Aluisio
  • Gisele Montilha Pinheiro
  • Aline M. P. Manfrin
  • Leandro H. M. de Oliveira
  • Luiz C. Genoves, Jr.
  • Stella E. O. Tagnin
