
Off-Road LAF: Encoding and Processing Annotations in NLP Workflows

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/4ncw74qf2yps

Abstract

The Linguistic Annotation Framework (LAF) provides an abstract data model for specifying interchange representations to ensure interoperability among different annotation formats. This paper describes an ongoing effort to adapt the LAF data model as the interchange representation in complex workflows as used in the Language Analysis Portal (LAP), an on-line and large-scale processing service that is developed as part of the Norwegian branch of the Common Language Resources and Technology Infrastructure (CLARIN) initiative. Unlike several related on-line processing environments, which predominantly instantiate a distributed architecture of web services, LAP achieves scalability to potentially very large data volumes through integration with the Norwegian national e-Infrastructure, and in particular job submission to a capacity compute cluster. This setup leads to tighter integration requirements and also calls for efficient, low-overhead communication of (intermediate) processing results within workflows. We meet these demands by coupling the LAF data model with a lean, non-redundant JSON-based interchange format and integration of an agile and performant NoSQL database, allowing parallel access from cluster nodes, as the central repository of linguistic annotation.
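
As a rough illustration of what a lean, non-redundant JSON encoding of LAF-style standoff annotation might look like, the sketch below stores only character offsets (regions), nodes, and annotations, and recovers token strings from the primary text on demand. The field names (`regions`, `anchors`, `nodes`, `annotations`, `features`) and the helper functions are hypothetical choices made for this example; they are not the schema used by LAP, which the abstract does not specify.

```python
import json


def make_record(text, tokens):
    """Build a standoff record for a list of (form, pos) tokens.

    Regions anchor into the primary text by character offsets, nodes
    reference regions, and annotations attach features to nodes.
    Token strings are not copied into the record (hypothetical layout).
    """
    regions, nodes, annotations = [], [], []
    cursor = 0
    for i, (form, pos) in enumerate(tokens):
        start = text.index(form, cursor)  # locate token after previous one
        end = start + len(form)
        cursor = end
        regions.append({"id": f"r{i}", "anchors": [start, end]})
        nodes.append({"id": f"n{i}", "regions": [f"r{i}"]})
        annotations.append(
            {"id": f"a{i}", "node": f"n{i}", "label": "token",
             "features": {"pos": pos}}
        )
    return {"regions": regions, "nodes": nodes, "edges": [],
            "annotations": annotations}


def surface(text, record, node_id):
    """Recover the surface string of a node from its region offsets."""
    regions = {r["id"]: r for r in record["regions"]}
    node = next(n for n in record["nodes"] if n["id"] == node_id)
    spans = (regions[r]["anchors"] for r in node["regions"])
    return "".join(text[start:end] for start, end in spans)


text = "LAP runs on a cluster."
record = make_record(text, [("LAP", "NNP"), ("runs", "VBZ")])

# Compact serialization, as one might store it in a document database.
payload = json.dumps(record, separators=(",", ":"))
restored = json.loads(payload)
```

Because the record carries offsets rather than copies of the text, intermediate results passed between workflow steps stay small, and any step can call `surface(text, restored, "n1")` to look up a token's string when it needs it.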

Details

Paper ID
lrec2014-main-734
Pages
N/A
BibKey
lapponi-etal-2014-road
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26-31 May 2014

Authors

  • Emanuele Lapponi
  • Erik Velldal
  • Stephan Oepen
  • Rune Lain Knudsen

Links