Journal of System Simulation
Abstract
Traditional optimization methods are computationally inefficient, while reinforcement learning approaches often yield low-quality solutions at high training cost. To address both problems, this paper proposes an attention-based reinforcement learning method. A dynamic attention policy network with multi-information fusion is designed to improve solution quality. A visibility-graph approach simplifies the threat-zone constraints and speeds up convergence, and a decoding-sequence reordering mechanism further improves the solution. Simulation results show that the method generates high-quality solutions within milliseconds, with total rewards that approach or even surpass those of traditional solvers such as OR-Tools and PyVRP, which need from several seconds to hundreds of seconds. Training efficiency also improves significantly: the training time per epoch falls from several hours to about 30 minutes.
Recommended Citation
Yang, Can; Chen, Kai; and Zhu, Feng (2026) "Reinforcement Learning Based Method for UAV Team Orienteering Optimization under Multi-constraint Condition," Journal of System Simulation: Vol. 38: Iss. 2, Article 9.
DOI: 10.16182/j.issn1004731x.joss.25-0595
Available at: https://dc-china-simulation.researchcommons.org/journal/vol38/iss2/9
First Page
360
Last Page
371
CLC
TP391.9; TP181
Recommended Citation
Yang Can, Chen Kai, Zhu Feng. Reinforcement Learning Based Method for UAV Team Orienteering Optimization under Multi-constraint Condition[J]. Journal of System Simulation, 2026, 38(2): 360-371.
DOI
10.16182/j.issn1004731x.joss.25-0595
Included in
Artificial Intelligence and Robotics Commons, Computer Engineering Commons, Numerical Analysis and Scientific Computing Commons, Operations Research, Systems Engineering and Industrial Engineering Commons, Systems Science Commons