Journal of System Simulation

Optimizing Initial Cluster Centroids by SVD in K-means Algorithm for Chinese Text Clustering

Yueming Dai, Engineering Research Center of Internet of Things Technology Applications Ministry of Education, Jiangnan University, Wuxi 214122, China;
Minghui Wang, Engineering Research Center of Internet of Things Technology Applications Ministry of Education, Jiangnan University, Wuxi 214122, China;
Ming Zhang, Engineering Research Center of Internet of Things Technology Applications Ministry of Education, Jiangnan University, Wuxi 214122, China;
Yan Wang, Engineering Research Center of Internet of Things Technology Applications Ministry of Education, Jiangnan University, Wuxi 214122, China;

Abstract

The traditional K-means algorithm has several weaknesses: the number of clusters K is difficult to determine, the clustering results depend on the choice of initial centers, and the algorithm is sensitive to noise and unstable. Text data add further difficulties, because they are high-dimensional and sparse and carry a latent semantic structure. To address these problems, an algorithm for Chinese text clustering was proposed. It first applies Singular Value Decomposition (SVD) to decompose the text data, keeping the semantic features while removing noise and smoothing the data. It then exploits the physical significance of SVD to perform a rough classification of the texts, and takes the results of this rough classification as the initial centers of K-means, which carries out the final text clustering. Experimental results show that the proposed algorithm improves clustering quality, measured by the F-Measure, compared with other K-means algorithms.
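The pipeline described in the abstract (truncated SVD, rough classification, K-means seeded with the rough groups, evaluation by F-Measure) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the rough-classification rule used here (assign each document to the latent dimension on which its coordinate has the largest magnitude), the toy term-document matrix, and the helper names `svd_initial_centroids`, `kmeans` and `f_measure` are hypothetical assumptions. The F-measure follows the common class/cluster definition (size-weighted best-match harmonic mean of precision and recall), which may differ in detail from the one used in the paper.

```python
import numpy as np


def svd_initial_centroids(X, k):
    """Project documents into a k-dimensional latent space with truncated SVD
    and derive rough groups and initial centroids from it.

    X is a term-document matrix (rows = terms, columns = documents), e.g. TF-IDF.
    """
    # X ≈ U_k Σ_k V_k^T: keeping only the k largest singular values retains the
    # dominant latent semantic directions and drops weaker components as noise.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    docs = (np.diag(s[:k]) @ vt[:k]).T  # row j = document j in latent space

    # ASSUMPTION (hypothetical rule, not taken from the paper): a document's
    # rough class is the latent dimension with the largest absolute coordinate.
    rough = np.argmax(np.abs(docs), axis=1)

    centroids = np.empty((k, k))
    for c in range(k):
        members = docs[rough == c]
        if len(members):
            centroids[c] = members.mean(axis=0)
        else:
            # Empty group: fall back to the document most aligned with dimension c.
            centroids[c] = docs[np.argmax(np.abs(docs[:, c]))]
    return docs, centroids, rough


def kmeans(points, centroids, max_iter=100):
    """Standard Lloyd-style K-means started from the given centroids."""
    centroids = centroids.copy()
    for _ in range(max_iter):
        # Euclidean distance from every point to every centroid: shape (n, k).
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(len(centroids))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def f_measure(true_labels, pred_labels):
    """Clustering F-measure: for each true class take the best-matching
    cluster's F score, then weight it by the class size."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    total = 0.0
    for c in np.unique(true_labels):
        in_class = true_labels == c
        best = 0.0
        for j in np.unique(pred_labels):
            in_cluster = pred_labels == j
            n_cj = np.sum(in_class & in_cluster)
            if n_cj == 0:
                continue
            precision = n_cj / in_cluster.sum()
            recall = n_cj / in_class.sum()
            best = max(best, 2 * precision * recall / (precision + recall))
        total += in_class.sum() / len(true_labels) * best
    return total


# Toy data (hypothetical): 6 terms x 6 documents forming three disjoint "topics".
X = np.array([
    [3, 2, 0, 0, 0, 0],
    [1, 2, 0, 0, 0, 0],
    [0, 0, 4, 1, 0, 0],
    [0, 0, 2, 3, 0, 0],
    [0, 0, 0, 0, 1, 2],
    [0, 0, 0, 0, 2, 2],
], dtype=float)
docs, init, rough = svd_initial_centroids(X, k=3)
labels, _ = kmeans(docs, init)
print(f"F-measure: {f_measure([0, 0, 1, 1, 2, 2], labels):.3f}")  # → F-measure: 1.000
```

Seeding K-means from the SVD-based rough groups rather than from random points makes each run deterministic for a given matrix. On real Chinese text, X would first require word segmentation and term weighting, which this sketch omits.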

Recommended Citation

Dai, Yueming; Wang, Minghui; Zhang, Ming; and Wang, Yan (2019) "Optimizing Initial Cluster Centroids by SVD in K-means Algorithm for Chinese Text Clustering," Journal of System Simulation: Vol. 30: Iss. 10, Article 29.
DOI: 10.16182/j.issn1004731x.joss.201810029
Available at: https://dc-china-simulation.researchcommons.org/journal/vol30/iss10/29

First Page

3835

Revised Date

2017-01-11

DOI Link

https://doi.org/10.16182/j.issn1004731x.joss.201810029

Last Page

3842

CLC

TP317

Recommended Citation

Dai Yueming, Wang Minghui, Zhang Ming, Wang Yan. Optimizing Initial Cluster Centroids by SVD in K-means Algorithm for Chinese Text Clustering[J]. Journal of System Simulation, 2018, 30(10): 3835-3842.

Corresponding Author

Yueming Dai

DOI

10.16182/j.issn1004731x.joss.201810029

Included in

Artificial Intelligence and Robotics Commons, Computer Engineering Commons, Numerical Analysis and Scientific Computing Commons, Operations Research, Systems Engineering and Industrial Engineering Commons, Systems Science Commons
