Back to Main Conference 2014
LREC 2014main

Extending standoff annotation

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/4fqqhei4w657

Abstract

Textual information is sometimes accompanied by additional encodings (such as visuals). These multimodal documents may be interesting objects of investigation for linguistics. Another class of complex documents are pre-annotated documents. Classic XML inline annotation often fails for both document classes because of overlapping markup. However, standoff annotation, that is the separation of primary data and markup, is a valuable and common mechanism to annotate multiple hierarchies and/or read-only primary data. We demonstrate an extended version of the XStandoff meta markup language, that allows the definition of segments in spatial and pre-annotated primary data. Together with the ability to import already established (linguistic) serialization formats as annotation levels and layers in an XStandoff instance, we are able to annotate a variety of primary data files, including text, audio, still and moving images. Application scenarios that may benefit from using XStandoff are the analyzation of multimodal documents such as instruction manuals, or sports match analysis, or the less destructive cleaning of web pages.

Details

Paper ID
lrec2014-main-274
Pages
pp. 169-174
BibKey
stuhrenberg-2014-extending
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26 May 2014 31 May 2014

Authors

  • MS

    Maik Stührenberg

Links