LREC 2000 2nd International Conference on Language Resources & Evaluation
 

Title Providing Internet Access to Portuguese Corpora: the AC/DC Project
Authors Santos Diana (SINTEF Telecom and Informatics, Postboks 1024 Blindern, N-0314 Oslo, Norway, Diana.Santos@informatics.sintef.no)
Bick Eckhard (SINTEF Telecom and Informatics, Postboks 1024 Blindern, N-0314 Oslo, Norway, lineb@hum.au.dk)
Keywords Constraint Grammar, Corpora, Language Resource Creation, Parsing, Web Interfaces
Session Session WO5 - Corpus Tools
Full Paper 85.ps, 85.pdf
Abstract In this paper we report on the work of the project Computational Processing of Portuguese (Processamento computacional do português) on providing access to Portuguese corpora through the Internet. One of its activities, the AC/DC project (Acesso a corpora/Disponibilização de Corpora, roughly "Access and Availability of Corpora"), allows a user to query around 40 million words of Portuguese text. After describing the aims of the service, which is still subject to regular improvements, we focus on the process of tagging and parsing the underlying corpora, using a Constraint Grammar parser for Portuguese.