
Building Part-of-Speech Corpora Through Histogram Hopping

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/2x54cw7qpvpz

Abstract

This paper is concerned with lowering the cost of producing training resources for part-of-speech taggers. We focus primarily on the resource needs of unsupervised taggers, as these can be trained with simpler resources than their supervised counterparts. We introduce histogram hopping, a new approach for developing the central training resources of unsupervised taggers, and describe a simple annotation prototype that implements the approach. We then discuss the applicability of histogram hopping to the development of resources for supervised taggers. Finally, we report on a preliminary pilot study for French that validates this work.

Details

Paper ID
lrec2004-main-497
Pages
N/A
BibKey
vilain-2004-building
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26-28 May 2004

Authors

  • Marc Vilain
