Journal of System Simulation
Abstract
The traditional K-means algorithm has several weaknesses: the number of clusters K is difficult to determine, the clustering results depend on the choice of initial centers, and the algorithm is sensitive to noise and unstable. Text data adds further difficulties: high dimensionality, a sparse feature space, and latent semantic structure. To address these problems, an algorithm for Chinese text clustering is proposed. The algorithm first applies Singular Value Decomposition (SVD) to the text data, which preserves semantic features, removes noise, and smooths the data. It then uses the physical significance of the SVD to produce a rough (coarse) classification of the data, and the results of that classification serve as the initial centers for K-means. Experimental results show that the F-measure of cluster quality improves compared with other K-means algorithms.
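The pipeline described in the abstract (SVD, rough classification, K-means seeded with the rough groups) can be illustrated with a minimal NumPy sketch. The abstract does not say how the rough classification is derived from the SVD, so the rule used here is an assumption: each document is assigned to the latent dimension on which it has the largest absolute coordinate. The function names, the empty-group fallback, and the toy document-term matrix are all hypothetical and are not taken from the paper.

```python
import numpy as np

def svd_rough_partition(X, k):
    """Project documents (rows of X) onto the top-k singular directions.

    Assumed rough-classification rule (not confirmed by the paper): each
    document joins the latent dimension with its largest absolute coordinate.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :k] * s[:k]                 # documents in the k-dim latent space
    rough = np.argmax(np.abs(Z), axis=1)
    return Z, rough

def initial_centers(Z, rough, k):
    """Use the mean of each rough group as an initial K-means center."""
    centers = np.empty((k, Z.shape[1]))
    for j in range(k):
        members = Z[rough == j]
        if len(members) > 0:
            centers[j] = members.mean(axis=0)
        else:
            # Assumed fallback: seed an empty group with the document
            # that lies farthest along latent dimension j.
            centers[j] = Z[np.argmax(np.abs(Z[:, j]))]
    return centers

def assign(Z, centers):
    """Label each point with the index of its nearest center."""
    d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(Z, centers, n_iter=100):
    """Plain Lloyd iterations starting from the supplied centers."""
    k = len(centers)
    for _ in range(n_iter):
        labels = assign(Z, centers)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(Z, centers), centers

# Toy document-term matrix (hypothetical): 6 documents, 3 disjoint topics.
X = np.array([
    [4, 3, 0, 0, 0, 0],
    [3, 4, 0, 0, 0, 0],
    [0, 0, 3, 2, 0, 0],
    [0, 0, 2, 3, 0, 0],
    [0, 0, 0, 0, 2, 1],
    [0, 0, 0, 0, 1, 2],
], dtype=float)

k = 3
Z, rough = svd_rough_partition(X, k)
labels, centers = kmeans(Z, initial_centers(Z, rough, k))
print(labels)
```

Because the initial centers come from the decomposition and not from random sampling, repeated runs on the same matrix start from the same seeds, which is the stability property the abstract attributes to the method. On real text the matrix would typically be a TF-IDF representation of segmented Chinese documents, and the assumed argmax rule may need adjusting.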
Recommended Citation
Dai, Yueming; Wang, Minghui; Zhang, Ming; and Wang, Yan (2019) "Optimizing Initial Cluster Centroids by SVD in K-means Algorithm for Chinese Text Clustering," Journal of System Simulation: Vol. 30: Iss. 10, Article 29.
DOI: 10.16182/j.issn1004731x.joss.201810029
Available at: https://dc-china-simulation.researchcommons.org/journal/vol30/iss10/29
First Page
3835
Revised Date
2017-01-11
DOI Link
https://doi.org/10.16182/j.issn1004731x.joss.201810029
Last Page
3842
CLC
TP317
Recommended Citation
Dai Yueming, Wang Minghui, Zhang Ming, Wang Yan. Optimizing Initial Cluster Centroids by SVD in K-means Algorithm for Chinese Text Clustering[J]. Journal of System Simulation, 2018, 30(10): 3835-3842.
DOI
10.16182/j.issn1004731x.joss.201810029
Included in
Artificial Intelligence and Robotics Commons, Computer Engineering Commons, Numerical Analysis and Scientific Computing Commons, Operations Research, Systems Engineering and Industrial Engineering Commons, Systems Science Commons