Back to Main Conference 2010
LREC 2010main

When CORDIAL Becomes Friendly: Endowing the CORDIAL Corpus with a Syntactic Annotation Layer

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/2bevd4y5n25k

Abstract

This paper reports on the syntactic annotation of a previously compiled and tagged corpus of European Portuguese (EP) dialects ― The Syntax-oriented Corpus of Portuguese Dialects (CORDIAL-SIN). The parsed version of CORDIAL-SIN is intended to be a more efficient resource for the purpose of studying dialect syntax by allowing automated searches for various syntactic constructions of interest. To achieve this goal we adopted a rich annotation system (the UPenn corpora annotation system) which codifies syntactic information of high relevance. The annotation produces tree representations, in form of labelled parenthesis, that are integrally searchable with CorpusSearch, a search engine for parsed corpora (Randall, 2005-2007). The present paper focuses on CORDIAL-SIN annotation issues, namely it presents the general principles and guidelines of the adopted annotation system and describes the methodology for constructing the parsed version of the corpus and for searching it (tools and procedures). Last section addresses the question of how an annotation system originally designed for Middle English can be adapted to meet the particular needs of a Portuguese corpus of dialectal speech.

Details

Paper ID
lrec2010-main-511
Pages
N/A
BibKey
magro-2010-cordial
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 23 May 2010

Authors

  • CM

    Catarina Magro

Links