Pattern Discovery in Named Organization Corpus

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

Abstract

This paper presents how to mine formulation rules from a named organization corpus. The TEIRESIAS algorithm, which is widely used in bioinformatics domain, is adopted. The experimental results based on MET2 test bed show that the approach of regarding the morpheme of a keyword as a cluster is the best, the approach of regarding all the keywords as the same cluster is the next, and the approach of regarding each keyword as a cluster is the worse. The performance using morpheme-based approach is a little better than that of hand-crafted approach. The methodology can be easily extended to other types of named entities.