Journal of System Simulation

Abstract

To address the theoretical challenges of the exploration-exploitation trade-off and uncertainty modeling in multi-objective reinforcement learning (MORL), this study developed a learning framework, MO-PAC, based on PAC-Bayes theory. By introducing a multi-objective stochastic Critic network and a dynamic preference mechanism, the framework extends the conventional A2C architecture, enabling adaptive and efficient approximation of complex Pareto fronts. Experimental results demonstrate that in multi-objective MuJoCo environments, MO-PAC outperforms baseline algorithms, achieving an approximately 20% improvement in hypervolume and a 60% increase in expected utility, while exhibiting superior convergence efficiency and robustness. These results verify both the theoretical value and the practical performance advantages of the framework in multi-objective decision-making. By overcoming the limitations of existing methods in dynamic trade-off and uncertainty modeling, MO-PAC provides a novel methodological foundation for advancing the MORL theoretical framework.
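The hypervolume indicator cited in the abstract measures the objective-space volume dominated by an approximate Pareto front relative to a fixed reference point; larger values indicate a front that is both closer to the true front and better spread. The paper itself does not give an implementation, so the following is a minimal sketch for the two-objective maximization case (the function name `hypervolume_2d` and the sweep-line formulation are illustrative assumptions, not the authors' code).

```python
def hypervolume_2d(points, ref):
    """Hypervolume of a set of 2-D points (both objectives maximized)
    with respect to a dominated reference point `ref`."""
    # Discard points that do not strictly dominate the reference point.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep in descending order of the first objective; each point that
    # improves the second objective contributes a new rectangular slice.
    pts.sort(key=lambda p: p[0], reverse=True)
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 > prev_f2:
            hv += (f1 - ref[0]) * (f2 - prev_f2)
            prev_f2 = f2
    return hv

# Example: a front {(3,1), (2,2), (1,3)} with reference point (0,0)
# covers three unit-height slices of widths 3, 2, and 1.
print(hypervolume_2d([(3, 1), (2, 2), (1, 3)], (0.0, 0.0)))
```

Dominated points are skipped automatically during the sweep, since they never raise the running second-objective bound; higher-dimensional fronts require more elaborate algorithms (e.g., the WFG method).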

First Page

3212

Last Page

3223

CLC

TP391.9

Recommended Citation

Liu Xiang, Jin Qiankun. Research on PAC-Bayes-Based A2C Algorithm for Multi-objective Reinforcement Learning[J]. Journal of System Simulation, 2025, 37(12): 3212-3223.

Corresponding Author

Jin Qiankun

DOI

10.16182/j.issn1004731x.joss.25-FZ0691
