C-3: Coherence and Coreference Corpus
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)
Abstract
Coreference, which concerns entities, their mentions, and their properties, is closely linked to coherence, which concerns the structure of rhetorical relations in a discourse. A text corpus annotated for both phenomena can be used to test hypotheses about their interrelation or to detect other phenomena. We describe how C-3, a new corpus, was obtained by annotating the Discourse GraphBank coherence corpus with entity and mention information. The annotation followed a set of ACE guidelines adapted to favor coreference and to include entities of unknown type. Together with the corpus, we offer a new annotation tool designed specifically for annotating entity and mention information through a simple, functional graphical interface that combines useful features of existing annotation tools. We discuss the potential usefulness of C-3 and describe an application in which the corpus proved to be a valuable resource.