The CLaRK System: XML-based Corpora Development System for Rapid Prototyping

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/34ct4i5896mw

Abstract

The paper presents the CLaRK System as a tool for the creation of XML-based corpora and a platform for rapid prototyping. The system provides a set of basic tools for processing XML documents. These tools include tokenizers, regular grammars, and constraints, as well as remove, insert, extract, sort, and transformation operations. Additionally, the system is equipped with a macro language that allows the creation of tool sequences. The macro language includes a set of control operators for guiding the application of the tools in a macro. Usually, a tool or a macro works over a single document, either changing it or producing a new document. In some cases, processing more than one document is necessary, for example in iterative statistics for treebank transformation, stand-off annotation, etc. For such processing, the macro language allows the processed documents to be changed dynamically.

Details

Paper ID
lrec2004-main-134
Pages
N/A
BibKey
simov-etal-2004-clark
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26-28 May 2004

Authors

  • Kiril Simov

  • Alexander Simov

  • Hristo Ganev

  • Krasimira Ivanova

  • Ilko Grigorov
