An API for Discourse-level Access to XML-encoded Corpora

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/2mt885sqpixa

Abstract

We describe a simple and efficient Java object model and application programming interface (API) for annotated natural language corpora, including multi-modal ones. Corpora are represented in terms of elements such as Sentences, Turns, Utterances, Words, Gestures and Markables. The API allows linguists to access corpora in terms of these discourse-level elements, i.e., at a conceptual level they are familiar with, while retaining the flexibility of a general-purpose programming language. It also contributes to corpus standardization efforts, because it is based on a straightforward and easily extensible data model that can serve as a target for converting different corpus formats.
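
To make the idea of discourse-level access concrete, the following is a minimal sketch, not the API described in the paper. The class names (DiscourseSketch, Turn, Word), the XML element and attribute names (corpus, turn, word, speaker, id), and the sample data are all assumptions made for illustration. It only shows how XML-encoded content might be lifted into objects that can be queried as turns and words rather than as raw XML nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: none of these class or element names come from the paper's API.
class DiscourseSketch {

    // A single token, identified by an id taken from the XML.
    static final class Word {
        final String id;
        final String text;
        Word(String id, String text) { this.id = id; this.text = text; }
    }

    // One speaker turn, holding its words in document order.
    static final class Turn {
        final String speaker;
        final List<Word> words = new ArrayList<>();
        Turn(String speaker) { this.speaker = speaker; }

        // Surface string of the turn: word texts joined by single spaces.
        String text() {
            StringBuilder sb = new StringBuilder();
            for (Word w : words) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(w.text);
            }
            return sb.toString();
        }
    }

    // Assumed input format: <turn speaker="..."><word id="...">text</word>...</turn>
    static List<Turn> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Turn> turns = new ArrayList<>();
            NodeList turnNodes = doc.getElementsByTagName("turn");
            for (int i = 0; i < turnNodes.getLength(); i++) {
                Element t = (Element) turnNodes.item(i);
                Turn turn = new Turn(t.getAttribute("speaker"));
                NodeList wordNodes = t.getElementsByTagName("word");
                for (int j = 0; j < wordNodes.getLength(); j++) {
                    Element w = (Element) wordNodes.item(j);
                    turn.words.add(new Word(w.getAttribute("id"), w.getTextContent()));
                }
                turns.add(turn);
            }
            return turns;
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("could not parse corpus XML", e);
        }
    }

    // Invented sample data, used only for this illustration.
    static final String SAMPLE =
            "<corpus>"
            + "<turn speaker=\"A\"><word id=\"w1\">where</word>"
            + "<word id=\"w2\">is</word><word id=\"w3\">it</word></turn>"
            + "<turn speaker=\"B\"><word id=\"w4\">there</word></turn>"
            + "</corpus>";

    public static void main(String[] args) {
        // Iterate over turns instead of walking XML nodes directly.
        for (Turn t : parse(SAMPLE)) {
            System.out.println(t.speaker + ": " + t.text());
        }
    }
}
```

Client code in this sketch works with Turn and Word objects only, so the XML encoding can change without affecting it. Elements such as Markables or Gestures could be added in the same way, for example as objects that refer to spans of word ids.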

Details

Paper ID
lrec2002-main-296
Pages
N/A
BibKey
muller-strube-2002-api
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Third International Conference on Language Resources and Evaluation
Location
Las Palmas, Spain
Date
29 May 2002 to 31 May 2002

Authors

  • Christoph Müller

  • Michael Strube

Links