LREC 2016, main conference

TEITOK: Text-Faithful Annotated Corpora

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/4mj8h46vt9mb

Abstract

TEITOK is a web-based framework for corpus creation, annotation, and distribution that combines textual and linguistic annotation within a single TEI-based XML document. TEITOK provides several built-in NLP tools to automatically (pre)process texts, and is highly customizable. It features multiple orthographic transcription layers and a wide range of user-defined token-based annotations. For searching, TEITOK interfaces with a local CQP server. TEITOK can handle various types of additional resources, including facsimile images and linked audio files, making it possible to have a combined written/spoken corpus. It also has additional modules for PSDX syntactic annotation and several types of stand-off annotation.
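The abstract describes two ideas that fit together: token-level annotation (including several orthographic layers) lives inside one TEI-based XML file, and search runs through CQP. As a rough illustration only, the sketch below flattens a TEI-style fragment into the one-token-per-line "vertical" format that CWB/CQP indexes. The `<tok>` element, its attribute names (`nform`, `lemma`, `pos`), the sample text, and the function `tei_to_vertical` are all hypothetical assumptions for this example; this is not TEITOK's own schema or export code.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment. The <tok> element and the attribute
# names (nform, lemma, pos) are illustrative assumptions, not TEITOK's
# documented schema.
SAMPLE = """<TEI><text><body><p>
<s><tok nform="Thou" lemma="thou" pos="PRON">Thou</tok>
<tok nform="art" lemma="be" pos="VERB">art</tok>
<tok nform="there" lemma="there" pos="ADV">ther</tok></s>
</p></body></text></TEI>"""

# Token attributes to emit as CQP positional attributes, in column order.
LAYERS = ("nform", "lemma", "pos")


def tei_to_vertical(xml_text: str, layers=LAYERS) -> str:
    """Flatten <tok> elements into CWB/CQP vertical format.

    Column 1 is the original (diplomatic) form taken from the element text;
    the remaining columns are the requested attributes, with "_" if absent.
    Each <s> element becomes an <s>...</s> structural region.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for sent in root.iter("s"):
        lines.append("<s>")
        for tok in sent.iter("tok"):
            # itertext() collects the token's text but not its tail.
            form = "".join(tok.itertext()).strip()
            values = [tok.get(name, "_") for name in layers]
            lines.append("\t".join([form, *values]))
        lines.append("</s>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(tei_to_vertical(SAMPLE))
```

Keeping the diplomatic spelling in the first column and the normalized form as a separate column means that, once both are encoded as positional attributes, a CQP query such as `[nform="there"]` could retrieve the token spelled `ther`, while a query on the word column would still find the original spelling.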

Details

Paper ID
lrec2016-main-637
Pages
pp. 4037-4043
BibKey
janssen-2016-teitok
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23–28 May 2016

Authors

  • Maarten Janssen
