Back to Main Conference 2008
LREC 2008main

The Kalshnikov 691 Dependency Bank

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/24hzh9trt9qm

Abstract

The PARC 700 dependency bank has a number of features that would seem to make it less than optimally suited for its intended purpose, parser evaluation. However, it is difficult to know precisely what impact these problems have on the evaluation results, and as a first step towards making comparison possible, a subset of the same sentences is presented here, marked up using a different format that avoids them. In this new representation, the tokens contain exactly the same sequence of characters as the original text, word order is encoded explicitly, and there is no artificial distinction between full tokens and attribute tokens. There is also a clear division between word tokens and empty nodes, and the token attributes are stored together with the word, instead of being spread out individually in the file. A standard programming language syntax is used for the data, so there is little room for markup errors. Finally, the dependency links are closer to standard grammatical terms, which presumably makes it easier to understand what they mean and to convert any particular parser output format to the Kalashnikov 691 representation. The data is provided both in machine-readable format and as graphical dependency trees.

Details

Paper ID
lrec2008-main-245
Pages
N/A
BibKey
by-2008-kalshnikov
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 30 May 2008

Authors

  • TB

    Tomas By

Links