Enhancing Speech Corpus Resources with Multiple Lexical Tag Layers

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/4fztr6cra7mn

Abstract

We describe a general two-stage procedure for re-using a custom corpus for spoken language system development, involving a transformation from character-based markup to XML, and DSSSL stylesheet-driven XML markup enhancement with multiple lexical tag trees. The procedure was used to generate a fully tagged corpus; alternatively, with greater economy of computing resources, it can be employed as a parametrised ‘tagging on demand’ filter. The implementation will shortly be released as a public resource together with the corpus (German spoken dialogue, about 500k word form tokens) and lexicon (about 75k word form types).

Details

Paper ID
lrec2000-main-137
Pages
N/A
BibKey
witt-etal-2000-enhancing
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 to 2 June 2000

Authors

  • Andreas Witt

  • Harald Lüngen

  • Dafydd Gibbon