Back to Main Conference 2008
LREC 2008main

Spectral Clustering for a Large Data Set by Reducing the Similarity Matrix Size

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/4z3rqowss9z7

Abstract

Spectral clustering is a powerful clustering method for document data set. However, spectral clustering needs to solve an eigenvalue problem of the matrix converted from the similarity matrix corresponding to the data set. Therefore, it is not practical to use spectral clustering for a large data set. To overcome this problem, we propose the method to reduce the similarity matrix size. First, using k-means, we obtain a clustering result for the given data set. From each cluster, we pick up some data, which are near to the central of the cluster. We take these data as one data. We call this data set as “committee”. Data except for committees remain one data. For these data, we construct the similarity matrix. Definitely, the size of this similarity matrix is reduced so much that we can perform spectral clustering using the reduced similarity matrix.

Details

Paper ID
lrec2008-main-288
Pages
N/A
BibKey
shinnou-sasaki-2008-spectral
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 30 May 2008

Authors

  • HS

    Hiroyuki Shinnou

  • MS

    Minoru Sasaki

Links