
Journal of System Simulation

Abstract

When clustering with the traditional K-means algorithm, the number of clusters K is difficult to determine, and the clustering results depend on the choice of initial centers. As a result, the algorithm is sensitive to noise and unstable. To address these problems, together with the high dimensionality, sparsity, and latent semantic structure of text data, an algorithm for Chinese text clustering is proposed. The algorithm first uses the physical significance of Singular Value Decomposition (SVD) to roughly classify the data, and then applies K-means to cluster the texts. SVD decomposes the text data, retaining its semantic features while removing noise and smoothing the data. The rough classification derived from the physical meaning of SVD then supplies the initial centers for K-means. Experimental results show that the F-Measure of clustering quality is improved compared with other K-means algorithms.
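The pipeline summarized in the abstract (SVD for dimensionality reduction and denoising, an SVD-based rough classification, and K-means seeded from that classification) can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation. Several details are assumptions: the input `X` is a document-term matrix (for example, TF-IDF weighted), the number of clusters `k` is supplied by the caller and equals the number of retained singular values, and each document is roughly assigned to the latent dimension on which its left singular vector loading has the largest absolute value. The function names `svd_initial_centers`, `kmeans` and `f_measure` are hypothetical. The F-Measure shown is the common class-weighted clustering F-Measure; the exact variant used in the paper is not stated here.

```python
import numpy as np

def svd_initial_centers(X, k):
    """Hypothetical helper: reduce X (docs x terms) to k latent dimensions
    and derive initial K-means centers from an SVD-based rough classification."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the top-k singular triplets; discarding the small
    # singular values acts as noise removal / smoothing.
    Z = U[:, :k] * s[:k]  # document coordinates in the latent space
    # Assumed rough classification: each document goes to the latent
    # dimension with the largest |loading| (abs() handles SVD sign ambiguity).
    rough = np.abs(U[:, :k]).argmax(axis=1)
    centers = np.empty((k, k))
    for j in range(k):
        members = rough == j
        if members.any():
            centers[j] = Z[members].mean(axis=0)
        else:
            # Empty group: seed with the document loading most on dimension j.
            centers[j] = Z[np.abs(U[:, j]).argmax()]
    return Z, centers

def kmeans(Z, centers, max_iter=100, tol=1e-6):
    """Standard Lloyd iterations starting from the given centers."""
    for _ in range(max_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(len(centers))])
        if np.allclose(new, centers, atol=tol):
            break
        centers = new
    return labels, centers

def f_measure(true, pred):
    """Class-weighted clustering F-Measure: sum_i (n_i/n) * max_j F(i, j)."""
    true, pred = np.asarray(true), np.asarray(pred)
    n, total = len(true), 0.0
    for c in np.unique(true):
        in_class = true == c
        best = 0.0
        for g in np.unique(pred):
            in_cluster = pred == g
            nij = np.sum(in_class & in_cluster)
            if nij == 0:
                continue
            p, r = nij / in_cluster.sum(), nij / in_class.sum()
            best = max(best, 2 * p * r / (p + r))
        total += in_class.sum() / n * best
    return total

# Toy document-term matrix: documents 0-2 use terms 0-1, documents 3-5 use terms 2-3.
X = np.array([[3, 2, 0, 0], [2, 3, 0, 0], [4, 1, 0, 0],
              [0, 0, 3, 2], [0, 0, 2, 4], [0, 0, 1, 3]], dtype=float)
Z, init = svd_initial_centers(X, k=2)
labels, _ = kmeans(Z, init)
score = f_measure([0, 0, 0, 1, 1, 1], labels)
```

Seeding K-means from the SVD groups, rather than from randomly chosen documents, makes the starting point deterministic for a given matrix, which is one way to reduce the sensitivity to initial centers that the abstract describes.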

First Page

3835

Revised Date

2017-01-11

Last Page

3842

CLC

TP317

Recommended Citation

Dai Yueming, Wang Minghui, Zhang Ming, Wang Yan. Optimizing Initial Cluster Centroids by SVD in K-means Algorithm for Chinese Text Clustering[J]. Journal of System Simulation, 2018, 30(10): 3835-3842.

Corresponding Author

Yueming Dai

DOI

10.16182/j.issn1004731x.joss.201810029
