Learning of word sense disambiguation rules by Co-training, checking co-occurrence of features

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

Abstract

In this paper, we propose a method to improve Co-training and apply it to word sense disambiguation problems. Co-training is an unsupervised learning method that addresses the high cost of obtaining labeled training data. It is theoretically promising, but it requires two feature sets that satisfy a conditional independence assumption. This assumption is too strict: in practice there is no choice but to use imperfect feature sets that do not fully satisfy it, and the accuracy of the learned rules then reaches a limit. To avoid this undesirable situation, we check the co-occurrence between the two feature sets when adding unlabeled instances to the training data. In experiments, we applied our method to word sense disambiguation problems for the three Japanese words ‘koe’, ‘toppu’ and ‘kabe’ and demonstrated that it improved on Co-training.
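
As a rough illustration of the idea summarized above, the following Python sketch runs a toy Co-training loop over two feature views and optionally filters candidate instances by co-occurrence. It is a minimal sketch under stated assumptions, not the procedure used in this paper: the count-based rule learner, the names `train_view`, `predict` and `co_train`, the `threshold` parameter, and in particular the filtering criterion (skip an unlabeled instance whose pair of view features already co-occurs in the training data) are all hypothetical, chosen as one plausible reading of "checking co-occurrence". The data are invented toy examples.

```python
from collections import Counter, defaultdict


def train_view(labeled, view):
    """Count how often each feature value of one view occurs with each sense."""
    counts = defaultdict(Counter)
    for features, sense in labeled:
        counts[features[view]][sense] += 1
    return counts


def predict(counts, feature):
    """Return (sense, confidence), or (None, 0.0) for an unseen feature."""
    if feature not in counts:
        return None, 0.0
    sense, n = counts[feature].most_common(1)[0]
    return sense, n / sum(counts[feature].values())


def co_train(labeled, unlabeled, rounds=5, threshold=0.8,
             check_cooccurrence=True):
    """Toy Co-training: each view labels instances for the shared pool.

    Each instance is a pair (view-0 feature, view-1 feature).
    The co-occurrence filter is an ASSUMED criterion for illustration only.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        # Feature pairs that already co-occur in the training data.
        seen_pairs = {features for features, _ in labeled}
        chosen = {}  # pool index -> sense assigned in this round
        for view in (0, 1):
            counts = train_view(labeled, view)
            for i, features in enumerate(pool):
                if i in chosen:
                    continue
                sense, conf = predict(counts, features[view])
                if sense is None or conf < threshold:
                    continue
                # Assumed check: an already seen pair is taken to add
                # little new evidence for the other view, so skip it.
                if check_cooccurrence and features in seen_pairs:
                    continue
                chosen[i] = sense
        if not chosen:
            break  # nothing confident enough was added; stop early
        labeled.extend((pool[i], s) for i, s in chosen.items())
        pool = [f for i, f in enumerate(pool) if i not in chosen]
    return labeled


# Invented seed data: two labeled instances and three unlabeled ones.
seed = [(("speak", "loud"), "voice"), (("hear", "public"), "opinion")]
unlabeled = [("speak", "hoarse"), ("sing", "hoarse"), ("speak", "loud")]
result = co_train(seed, unlabeled)
```

In this toy run, the first view labels `("speak", "hoarse")` in the first round, which lets the second view label `("sing", "hoarse")` in the next round; the instance `("speak", "loud")` is skipped by the assumed filter because that feature pair is already present in the seed data.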