LREC-COLING 2024 (main)

Corpus Services: A Framework to Curate XML Corpus Data

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/5mwcvd6emzsb

Abstract

This paper provides a comprehensive description of the Corpus Services framework, a collection of Java validation tools for language corpora compiled in XML-based data formats, in particular those using the EXMARaLDA corpus software. Having successfully found application in several research projects, the core functionality of the framework is currently integrated into the automated curation and publication workflows for EXMARaLDA-driven corpora of Northern Eurasian languages, as developed by the long-term project INEL. Preliminary stages of development and examples of practical use cases are covered, and a structured explanation of the framework’s current functionality and operational mechanisms is provided. Furthermore, the use of Corpus Services is illustrated in detail within the context of INEL workflows.

Details

Paper ID
lrec2024-main-0358
Pages
pp. 4030-4035
BibKey
riaposov-lazarenko-2024-corpus
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20-25 May 2024

Authors

  • Aleksandr Riaposov

  • Elena Lazarenko

Links