Journal of System Simulation

Abstract

To address the problems of overestimation and underestimation bias, low sample utilization, and the difficulty of balancing exploration and exploitation in reinforcement-learning-based path planning, an improved SAC method was proposed. The balance between exploration and exploitation was maintained by adaptively adjusting the temperature coefficient that scales the entropy term. On the basis of the SAC framework, a triple-Critic architecture was introduced, in which the minimum and mean Q-values are dynamically weighted and fused according to Q-value uncertainty, balancing overestimation against underestimation bias. A mixed dynamic-sampling experience replay buffer was designed: experience data is partitioned by a reward threshold, and sampling ratios are adjusted dynamically to achieve progressive learning from core strategies to comprehensive generalization. A hierarchical heuristic reward function was designed to guide the robot in balancing the multi-objective demands of approaching the goal and avoiding obstacles. Simulation results demonstrate that the improved algorithm outperforms comparison methods in path length, planning time, and success rate, improving both the efficiency and robustness of path planning.
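The two core mechanisms described in the abstract can be illustrated with minimal sketches. First, the uncertainty-weighted triple-Critic fusion: the abstract states only that the minimum and mean Q-values are dynamically fused with weights driven by Q-value uncertainty, so the sketch below uses the standard deviation across three critics as an uncertainty proxy and a sigmoid as the weighting map. The function name `fused_target_q` and the exact weighting form are assumptions, not the paper's formulation.

```python
import torch

def fused_target_q(q1, q2, q3):
    """Blend the minimum and mean of three target-critic estimates,
    weighted by their disagreement (std as an assumed uncertainty proxy)."""
    qs = torch.stack([q1, q2, q3], dim=0)   # shape: (3, batch)
    q_min = qs.min(dim=0).values            # pessimistic estimate (counters overestimation)
    q_mean = qs.mean(dim=0)                 # neutral estimate (counters underestimation)
    uncertainty = qs.std(dim=0)             # per-sample critic disagreement
    w = torch.sigmoid(uncertainty)          # assumed mapping: more disagreement -> more pessimism
    return w * q_min + (1.0 - w) * q_mean
```

Second, the mixed dynamic-sampling replay buffer: the abstract says experience is partitioned by a reward threshold and that sampling ratios are annealed so learning shifts from core strategies to comprehensive generalization. The class name, threshold default, and annealing schedule below are illustrative assumptions under that description.

```python
import random
from collections import deque

class MixedDynamicReplayBuffer:
    """Two pools split by a reward threshold; the core-pool sampling
    ratio decays with training progress (schedule is an assumption)."""

    def __init__(self, capacity=100_000, reward_threshold=0.0):
        self.core = deque(maxlen=capacity // 2)     # high-reward ("core strategy") transitions
        self.regular = deque(maxlen=capacity // 2)  # all remaining transitions
        self.reward_threshold = reward_threshold

    def push(self, transition, reward):
        pool = self.core if reward >= self.reward_threshold else self.regular
        pool.append(transition)

    def sample(self, batch_size, progress):
        """progress in [0, 1]: early training favors the core pool,
        late training favors the full experience distribution."""
        core_ratio = max(0.2, 0.8 * (1.0 - progress))        # assumed annealing schedule
        n_core = min(int(batch_size * core_ratio), len(self.core))
        n_regular = min(batch_size - n_core, len(self.regular))
        return (random.sample(list(self.core), n_core)
                + random.sample(list(self.regular), n_regular))
```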

First Page

714

Last Page

724

CLC

TP242

Recommended Citation

Li Dequan, Xiong Wan. Robot Path Planning by Reinforcement Learning Based on SAC3Q-HDM[J]. Journal of System Simulation, 2026, 38(3): 714-724.

DOI

10.16182/j.issn1004731x.joss.25-0399
