Back to Main Conference 2000
LREC 2000main

Portuguese Corpora at CLUL

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/25nk6adub5a4

Abstract

The Corpus de Referência do Português Contemporâneo (CRPC) is being developed in the Centro de Linguística da Universidade de Lisboa (CLUL) since 1988 under a perspective of research data enlargement, in the sense of concepts and hypothesis verification by rejecting the sole use of intuitive data. The intention of creating this open corpus is to establish an on-line representative sample collection of general usage contemporary Portuguese: a main corpus of great dimension as well as several specialized corpora. The CRPC has nowadays around 92 million words. Following the use in this area, the CRPC project intends to establish a linguistic database accessible to everyone interested in making theoretical and practical studies or applications. The Dialectal oral corpus of the Atlas Linguístico-Etnográfico de Portugal e da Galiza (ALEPG) is constituted by approximately 3500 hours of speech collected by the CLUL Dialectal Studies Research Group and recorded in analogic audio tape. This corpus contains mainly directed speech: answers to a linguistic questionnaire essentially lexical, but also focusing on some phonetic and morpho-phonological phenomena. An important part of spontaneous speech enables other kind of studies such as syntactic, morphological or phonetic ones.

Details

Paper ID
lrec2000-main-053
Pages
N/A
BibKey
do-nascimento-etal-2000-portuguese
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 2 June 2000

Authors

  • Md

    Maria Fernanda Bacelar do Nascimento

  • LP

    Luisa Pereira

  • JS

    João Saramago

Links