Cluster Analysis and Classification of Named Entities

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/3xbgqf5airh3

Abstract

This paper presents a statistics-based, language-independent, unsupervised approach to clustering possible named entities. We describe and motivate the features and statistical filters used by our clustering process. Using the Model-Based Clustering Analysis software, we obtained different clusters of named entities. The method was applied to Bulgarian and English. For some clusters, precision is close to 100%, which eases human validation and saves time. Other clusters still need further refinement. Based on the obtained clusters, it is possible to classify new named entities.

Details

Paper ID
lrec2004-main-520
Pages
N/A
BibKey
da-silva-etal-2004-cluster
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26-28 May 2004

Authors

  • Joaquim F. Ferreira da Silva
  • Zornitsa Kozareva
  • José Gabriel Pereira Lopes
