Back to Main Conference 2000
LREC 2000main

Coreference in Annotating a Large Corpus

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/45ipk2dqz2o9

Abstract

The Prague Dependency Treebank (PDT) is a part of the Czech National Corpus, annotated with disambiguated structural descriptions representing the meaning of every sentence in its environment. To achieve that aim, it is necessary i.a. to make explicit (at least some basic) coreferential relations within the sentence boundaries and also beyond them. The PDT scenario includes both automatic and 'manual' procedures; among the former type, there is one that concerns coreference, indicating the lemma of the subject in a specific attribute of the label belonging to a node for a reflexive pronoun, and assigning the deleted nodes in coordinated constructions the lemmas of their counterparts in the given construction. 'Manual' operations restore nodes for the deleted items mostly as pronouns. The distinction between grammatical and textual coreference is reflected. In order to get a possibility of handling textual coreference, specific attributes reflect the linking of sentences to each other and to the context of situation, and the development of the degrees of activation of the 'stock of shared knowledge' will be registered in so far as they are derivable from the use of nouns in subsequent utterances in a discourse.

Details

Paper ID
lrec2000-main-015
Pages
N/A
BibKey
hajicova-etal-2000-coreference
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 2 June 2000

Authors

  • EH

    Eva Hajičová

  • JP

    Jarmila Panevová

  • PS

    Petr Sgall

Links