Cluster Analysis and Classification of Named Entities
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)
Abstract
This paper presents a statistics-based, language-independent, unsupervised approach to clustering possible named entities. We describe and motivate the features and statistical filters used by our clustering process. Using the Model-Based Clustering Analysis software, we obtained different clusters of named entities. The method was applied to Bulgarian and English. For some clusters, precision is close to 100%, which eases human validation and saves time; other clusters still need further refinement. The obtained clusters also make it possible to classify new named entities.