Journal of System Simulation

Abstract

To address the high information redundancy and slow convergence of traditional reinforcement learning in autonomous air-combat decision-making, a proximal policy optimization method based on a dual observation space and a composite reward is proposed. The dual observation space, which takes interaction information as the primary input and individual feature information as a supplement, was designed to reduce the influence of redundant battlefield information on the training efficiency of the decision model. A composite reward function combining result rewards and process rewards was designed to improve convergence speed. The generalized advantage estimator was applied in the proximal policy optimization algorithm to improve the accuracy of advantage-function estimation. Simulation results show that the decision-making model makes accurate autonomous decisions and completes air-combat tasks according to the battlefield situation in two experimental scenarios: against fixed-program opponents and against matrix-game opponents.
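The generalized advantage estimator (GAE) mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard GAE recursion (not the authors' implementation); the function name, hyperparameter defaults, and the trajectory layout are assumptions for the example.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    rewards: r_0 .. r_{T-1} collected along the trajectory.
    values:  V(s_0) .. V(s_T) from the critic (length T + 1).
    Computes A_t = sum_{l>=0} (gamma*lam)^l * delta_{t+l}, where
    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) is the TD residual.
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    # Sweep backwards so each step reuses the discounted tail sum.
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv
```

The parameter `lam` trades bias against variance: `lam=0` reduces the estimate to the one-step TD residual, while `lam=1` recovers the full Monte Carlo return minus the baseline.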

First Page

2208

Last Page

2218

CLC

TP391.9

Recommended Citation

Qian Dianwei, Qi Hongmin, Liu Zhen, et al. Research on Autonomous Decision-making in Air-combat Based on Improved Proximal Policy Optimization[J]. Journal of System Simulation, 2024, 36(9): 2208-2218.

Corresponding Author

Zhou Zhiming

DOI

10.16182/j.issn1004731x.joss.23-0584
