Building part-of-speech Corpora Through Histogram Hopping
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)
Abstract
This paper is concerned with lowering the cost of producing training resources for part-of-speech taggers. We focus primarily on the resource needs of unsupervised taggers, as these can be trained with simpler resources than their supervised counterparts. We introduce histogram hopping, a new approach for developing the central training resources of unsupervised taggers, and describe a simple annotation prototype that implements the approach. We then discuss the applicability of histogram hopping to the development of resources for supervised taggers. Finally, we report on a preliminary pilot study for French that validates this work.