
Open Resources and Tools for the Shallow Processing of Portuguese: The TagShare Project

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/4rszhsnbuk4n

Abstract

This paper presents the TagShare project and the linguistic resources and tools for the shallow processing of Portuguese developed within its scope. These resources include a 1-million-token corpus that has been accurately hand-annotated with a variety of linguistic information, as well as several state-of-the-art shallow processing tools capable of automatically producing that type of annotation. At present, the linguistic annotations in the corpus are sentence and paragraph boundaries, token boundaries, morphosyntactic POS categories, values of inflection features, lemmas and named entities. Hence, the set of tools comprises a sentence chunker, a tokenizer, a POS tagger, nominal and verbal analyzers and lemmatizers, a verbal conjugator, a nominal “inflector”, and a named-entity recognizer, some of which underlie several online services.

Details

Paper ID
lrec2006-main-177
Pages
N/A
BibKey
barreto-etal-2006-open
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-2-4
Conference
Fifth International Conference on Language Resources and Evaluation
Location
Genoa, Italy
Date
24-26 May 2006

Authors

  • Florbela Barreto
  • António Branco
  • Eduardo Ferreira
  • Amália Mendes
  • Maria Fernanda Bacelar do Nascimento
  • Filipe Nunes
  • João Ricardo Silva
