
KorAP Architecture ― Diving in the Deep Sea of Corpus Data

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/4rg89pzcmu75

Abstract

KorAP is a corpus search and analysis platform developed at the Institute for the German Language (IDS). It supports very large corpora with multiple annotation layers, multiple query languages, and complex licensing scenarios. KorAP's design aims to be scalable, flexible, and sustainable so that it can serve the German Reference Corpus DeReKo for at least the next decade. To meet these requirements, we have adopted a highly modular, microservice-based architecture. This paper outlines our approach: an architecture consisting of small components that are easy to extend, replace, and maintain. The components include a search backend, a user and corpus license management system, and a web-based user frontend. We also describe a general corpus query protocol that all microservices use for internal communication. KorAP is open source, licensed under BSD-2, and available on GitHub.

Details

Paper ID
lrec2016-main-569
Pages
pp. 3586-3591
BibKey
diewald-etal-2016-korap
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23–28 May 2016

Authors

  • Nils Diewald
  • Michael Hanl
  • Eliza Margaretha
  • Joachim Bingel
  • Marc Kupietz
  • Piotr Bański
  • Andreas Witt
