Journal of System Simulation

Abstract

Abstract: To address the limited exploration capability and sparse rewards of conventional reinforcement learning methods in air combat environments, a curriculum learning distributed proximal policy optimization (CLDPPO) reinforcement learning algorithm is proposed. The method integrates a reward function informed by professional empirical knowledge and designs a discrete action space. It also builds value and decision networks that take separate global and local observations. Through this method, unmanned aerial vehicles (UAVs) acquire air combat expertise from a sequence of basic courses of progressively increasing difficulty, covering offensive, defensive, and comprehensive engagements. Experimental results show that the method outperforms the expert system and other mainstream reinforcement learning algorithms, autonomously acquires air combat tactics, and mitigates the sparse-reward problem.
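The abstract describes the training scheme only at a high level. As a rough illustration, and not the authors' CLDPPO implementation, the sketch below makes two assumptions. First, the agent is promoted from an offensive course to a defensive course and then to a comprehensive course once its rolling win rate over recent episodes reaches a threshold. Second, a sparse terminal win/loss reward is supplemented with small dense shaping terms, which is one common way to ease sparse rewards. The course names, `CurriculumScheduler`, `shaped_reward`, the threshold, the window size, and the shaping weights are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical course order mirroring the abstract: offense -> defense -> comprehensive.
COURSES = ["offense", "defense", "comprehensive"]


@dataclass
class CurriculumScheduler:
    """Promote to the next course once the rolling win rate reaches a threshold (assumed rule)."""
    threshold: float = 0.7   # hypothetical promotion threshold
    window: int = 100        # hypothetical number of recent episodes to average over
    stage: int = 0           # index into COURSES
    results: deque = field(default_factory=deque)

    @property
    def course(self) -> str:
        return COURSES[self.stage]

    def record(self, won: bool) -> str:
        """Log one episode outcome and return the course to train on next."""
        self.results.append(1.0 if won else 0.0)
        if len(self.results) > self.window:
            self.results.popleft()  # keep only the most recent `window` episodes
        full = len(self.results) == self.window
        win_rate = sum(self.results) / len(self.results)
        if full and win_rate >= self.threshold and self.stage < len(COURSES) - 1:
            self.stage += 1
            self.results.clear()  # measure the new course from scratch
        return self.course


def shaped_reward(done: bool, won: bool, angle_adv: float, dist_adv: float,
                  w_angle: float = 0.01, w_dist: float = 0.01) -> float:
    """Sparse terminal reward (+1 win, -1 loss) plus small dense shaping terms.

    angle_adv and dist_adv stand in for hypothetical expert-knowledge cues
    (e.g. angular and range advantage); the weights are illustrative only.
    """
    terminal = (1.0 if won else -1.0) if done else 0.0
    return terminal + w_angle * angle_adv + w_dist * dist_adv


if __name__ == "__main__":
    sched = CurriculumScheduler(threshold=0.7, window=10)
    for _ in range(10):
        course = sched.record(won=True)
    print(course)  # promoted after 10 straight wins
```

With a window of 10 and a threshold of 0.7, ten consecutive wins move the scheduler from `"offense"` to `"defense"`. Clearing the history on promotion keeps early successes on an easier course from counting toward a harder one. Gating on a full window avoids promoting on a lucky first few episodes.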

First Page

1452

Last Page

1467

CLC

TP391.9

Recommended Citation

Zhu Jingyu, Zhang Hongli, Kuang Minchi, et al. Curriculum Learning-based Simulation of UAV Air Combat Under Sparse Rewards[J]. Journal of System Simulation, 2024, 36(6): 1452-1467.

Corresponding Author

Zhang Hongli

DOI

10.16182/j.issn1004731x.joss.23-0349
