Journal of System Simulation
Abstract
Abstract: To address the limited exploration capability and sparse rewards of conventional reinforcement learning methods in air combat environments, a curriculum learning distributed proximal policy optimization (CLDPPO) reinforcement learning algorithm is proposed. A reward function informed by professional empirical knowledge is integrated, a discrete action space is developed, and a value and decision network architecture that separates global and local observations is established. A methodology is presented for unmanned aerial vehicles (UAVs) to acquire combat expertise through a sequence of fundamental courses whose offensive, defensive, and comprehensive content progressively intensifies. Experimental results show that the methodology surpasses the expert system and other mainstream reinforcement learning algorithms, demonstrating autonomous acquisition of air-combat tactics and alleviating the sparse-reward problem.
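The staged-course idea described in the abstract, combining a sparse terminal reward with an expert-informed dense shaping term and promoting the agent through progressively harder courses, can be sketched as follows. This is a minimal illustrative sketch only: the function names, shaping coefficients, and promotion threshold are assumptions for exposition, not the paper's actual CLDPPO implementation.

```python
# Minimal sketch of curriculum training with reward shaping (illustrative names).

SPARSE_WIN_REWARD = 10.0  # assumed sparse terminal reward for winning an engagement

def shaped_reward(angle_advantage, range_advantage, won, done):
    """Expert-informed dense shaping plus a sparse terminal reward (assumed form).

    angle_advantage / range_advantage stand in for geometric advantage terms
    that domain experts would encode; coefficients are placeholders.
    """
    dense = 0.1 * angle_advantage + 0.05 * range_advantage
    sparse = SPARSE_WIN_REWARD if (done and won) else 0.0
    return dense + sparse

def curriculum(stages, train_stage, promote_threshold=0.8):
    """Run courses in order of difficulty, promoting once the win rate
    against the current course exceeds promote_threshold."""
    history = []
    for name, difficulty in stages:
        win_rate = 0.0
        while win_rate < promote_threshold:
            win_rate = train_stage(name, difficulty)
        history.append((name, win_rate))
    return history

def make_fake_trainer():
    """Stand-in for a PPO training call: win rate rises 0.5 per invocation."""
    progress = {}
    def train_stage(name, difficulty):
        progress[name] = progress.get(name, 0.0) + 0.5
        return progress[name]
    return train_stage

# Usage: three courses of increasing difficulty, as in the abstract.
stages = [("offensive", 1), ("defensive", 2), ("comprehensive", 3)]
history = curriculum(stages, make_fake_trainer())
```

In a real system `train_stage` would run distributed PPO rollouts against a scenario of the given difficulty; here it is mocked so the control flow of the curriculum is visible.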
Recommended Citation
Zhu, Jingyu; Zhang, Hongli; Kuang, Minchi; Shi, Heng; Zhu, Jihong; Qiao, Zhi; and Zhou, Wenqing (2024) "Curriculum Learning-based Simulation of UAV Air Combat Under Sparse Rewards," Journal of System Simulation: Vol. 36: Iss. 6, Article 18.
DOI: 10.16182/j.issn1004731x.joss.23-0349
Available at:
https://dc-china-simulation.researchcommons.org/journal/vol36/iss6/18
First Page
1452
Last Page
1467
CLC
TP391.9
Recommended Citation
Zhu Jingyu, Zhang Hongli, Kuang Minchi, et al. Curriculum Learning-based Simulation of UAV Air Combat Under Sparse Rewards[J]. Journal of System Simulation, 2024, 36(6): 1452-1467.
Included in
Artificial Intelligence and Robotics Commons, Computer Engineering Commons, Numerical Analysis and Scientific Computing Commons, Operations Research, Systems Engineering and Industrial Engineering Commons, Systems Science Commons