 

Journal of System Simulation

Abstract

To address the inconsistency between the probability distributions of synthetic samples produced by imputation and those of real samples, a data generation model-based synthetic sample imputation (DGM-SSI) method is proposed. A data generation model of the real samples is built on a Gaussian mixture model, and the number of Gaussian components is determined by a multi-model fusion strategy. The synthetic samples required for imputation are then generated from the data generation model learned from the real samples; specifically, the components of the model and their weights control how the synthetic samples are generated. The feasibility and effectiveness of DGM-SSI are verified on 20 multimodal, multidimensional mixture distributions. Experimental results show that, compared with random sample imputation, the synthetic minority over-sampling technique (SMOTE), and two of its latest variants, the proposed method produces synthetic samples whose probability distribution is more consistent with that of the real samples, indicating that it is a reasonable synthetic sample imputation method.
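The abstract does not spell out the generation procedure. As a minimal sketch, assuming the Gaussian mixture has already been fitted to the real samples, component-weighted generation can be illustrated in two steps. First, split the number of samples to impute across components with a multinomial draw on the mixing weights. Then sample each component's Gaussian. The multi-model fusion step that fixes the number of components is not reproduced here. The function name `generate_synthetic_samples` and its parameters are hypothetical, not taken from the paper.

```python
import numpy as np


def generate_synthetic_samples(weights, means, covariances, n_samples, seed=None):
    """Hypothetical sketch: draw n_samples synthetic points from a fitted Gaussian mixture.

    weights:     (K,) mixing weights, assumed to sum to 1
    means:       (K, d) component means
    covariances: (K, d, d) component covariance matrices
    Returns (samples of shape (n_samples, d), component label of each sample).
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covariances = np.asarray(covariances, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("mixing weights must sum to 1")

    # How many synthetic samples each component contributes; the expected
    # share of component k equals its mixing weight weights[k].
    counts = rng.multinomial(n_samples, weights)

    samples, labels = [], []
    for k, n_k in enumerate(counts):
        if n_k == 0:
            continue
        # Draw this component's share from its own Gaussian.
        samples.append(rng.multivariate_normal(means[k], covariances[k], size=n_k))
        labels.append(np.full(n_k, k))

    if not samples:  # n_samples == 0
        return np.empty((0, means.shape[1])), np.empty(0, dtype=int)
    return np.vstack(samples), np.concatenate(labels)
```

For example, `generate_synthetic_samples([0.7, 0.3], [[0, 0], [5, 5]], [np.eye(2), 0.5 * np.eye(2)], 1000, seed=0)` returns about 700 points around the origin and about 300 around (5, 5). The multinomial draw is one choice among several; rounding `n_samples * weights` deterministically would also respect the weights. In either case the synthetic set follows the fitted mixture rather than interpolating between nearest neighbours, as SMOTE does.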

First Page

1948

Last Page

1964

CLC

TP391.9

Recommended Citation

He Yulin, Chen Jiaqi, Xu Hepeng, et al. Data Generation Model-based Synthetic Sample Imputation Method[J]. Journal of System Simulation, 2023, 35(9): 1948-1964.

DOI

10.16182/j.issn1004731x.joss.22-0554
