Bootstrapping a Tagged Corpus through Combination of Existing Heterogeneous Taggers

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/55n6sk874462

Abstract

This paper describes a new method, COMBI-BOOTSTRAP, that exploits existing taggers and lexical resources to annotate corpora with new tagsets. COMBI-BOOTSTRAP uses the output of the existing resources as features for a second-level machine learning module, which is trained on a very small sample of annotated corpus material to perform the mapping to the new tagset. Experiments show that COMBI-BOOTSTRAP (i) can integrate a wide variety of existing resources and (ii) achieves much higher accuracy (up to 44.7% error reduction) than both the best single tagger and an ensemble tagger constructed from the same small training sample.
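
The second-level idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `train_combiner` and `predict`, the toy tags, and above all the learner itself, which here is a plain frequency table with backoff standing in for whatever machine learning module the paper actually uses. Each training item pairs the outputs of the existing taggers and lexicons for one token with the hand-annotated tag from the new tagset.

```python
from collections import Counter, defaultdict


def train_combiner(samples):
    """Train a toy second-level combiner (illustrative stand-in learner).

    samples: list of (component_tags, new_tag) pairs, where component_tags
    is a tuple holding the output of each existing resource for one token.
    """
    full = defaultdict(Counter)    # whole feature tuple -> new-tag counts
    single = defaultdict(Counter)  # (resource index, its tag) -> new-tag counts
    overall = Counter()            # new-tag counts over the whole sample
    for feats, tag in samples:
        full[feats][tag] += 1
        for i, feat in enumerate(feats):
            single[(i, feat)][tag] += 1
        overall[tag] += 1
    return full, single, overall


def predict(model, feats):
    """Map one token's component tags to a tag in the new tagset."""
    full, single, overall = model
    # 1. Exact match on the complete combination of component outputs.
    if feats in full:
        return full[feats].most_common(1)[0][0]
    # 2. Back off: pool the evidence of each component output separately.
    votes = Counter()
    for i, feat in enumerate(feats):
        votes.update(single.get((i, feat), {}))
    if votes:
        return votes.most_common(1)[0][0]
    # 3. Last resort: the most frequent new tag in the training sample.
    return overall.most_common(1)[0][0]
```

In use, a small annotated sample is passed to `train_combiner`, and `predict` is then called once per token with that token's tuple of component outputs. The component resources do not need to share a tagset, since their outputs are treated as opaque feature values.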

Details

Paper ID: lrec2000-main-113
Pages: N/A
BibKey: zavrel-daelemans-2000-bootstrapping
Editor: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: N/A
Conference: Second International Conference on Language Resources and Evaluation
Location: Athens, Greece
Date: 31 May 2000 to 2 June 2000

Authors

  • Jakub Zavrel

  • Walter Daelemans
