A Simple Method for Tagset Comparison
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)
Abstract
Based on the idea that local contexts predict the same basic category across a language, we develop a simple method for comparing tagsets across corpora. The principal differences between tagsets are evidenced by variation in categories in one corpus in the same contexts where another corpus exhibits only a single tag. Such mismatches highlight differences in the definitions of tags, which are crucial when porting technology from one annotation scheme to another.
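
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea under stated assumptions: a "local context" is taken to be the pair of words immediately before and after a token, and a mismatch is any context shared by both corpora in which the first corpus uses exactly one tag and the second uses several. The function names `context_tags` and `tag_mismatches`, the sentence boundary markers, and the toy corpora are all hypothetical and are not drawn from the paper.

```python
from collections import defaultdict


def context_tags(tagged_sentences):
    """Map each (previous word, next word) context to the set of tags
    observed on the token between them. Assumed context definition."""
    contexts = defaultdict(set)
    for sentence in tagged_sentences:
        # Pad with boundary markers so the first and last tokens have a context.
        words = ["<s>"] + [word.lower() for word, _ in sentence] + ["</s>"]
        for i, (_, tag) in enumerate(sentence):
            # Token i sits at words[i + 1]; its neighbours are words[i] and words[i + 2].
            contexts[(words[i], words[i + 2])].add(tag)
    return contexts


def tag_mismatches(corpus_a, corpus_b):
    """Return shared contexts where corpus_a has a single tag but corpus_b varies."""
    a = context_tags(corpus_a)
    b = context_tags(corpus_b)
    return {
        ctx: (next(iter(a[ctx])), b[ctx])
        for ctx in a.keys() & b.keys()
        if len(a[ctx]) == 1 and len(b[ctx]) > 1
    }


# Toy corpora with made-up tags, for illustration only.
corpus_a = [
    [("the", "DT"), ("red", "JJ"), ("car", "NN")],
    [("the", "DT"), ("old", "JJ"), ("car", "NN")],
]
corpus_b = [
    [("the", "AT"), ("red", "JJ"), ("car", "NN")],
    [("the", "AT"), ("race", "NN"), ("car", "NN")],
]

mismatches = tag_mismatches(corpus_a, corpus_b)
for context, (single_tag, varied_tags) in mismatches.items():
    print(context, single_tag, sorted(varied_tags))
```

In this toy data the context ("the", "car") carries only one tag in the first corpus but two in the second, which is the kind of mismatch the abstract describes. Real corpora would need a frequency threshold and a tokenization shared by both corpora before such contexts could be compared meaningfully.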