Abstract
Deep reinforcement learning outperforms traditional methods in some domains. In this paper, we propose a novel on-policy reinforcement learning (RL) algorithm, the Smoothing Clip Advantage Proximal Policy Optimization algorithm (SCAPPO), which extends the classical PPO algorithm by exploiting the smoothing properties of the sigmoid function to make full use of useful gradients. In addition, we provide more effective gradients for the policy network, aiming to mitigate the overfitting caused by the coupling of the policy and value functions. SCAPPO outperforms currently popular reinforcement learning algorithms on benchmark tasks in the OpenAI Gym.
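The contrast between PPO's hard clipping and a sigmoid-smoothed surrogate can be sketched as follows. This is a minimal illustration only: the blending form, the steepness parameter `k`, and the function names are assumptions for exposition, not the paper's exact SCAPPO objective.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    # Standard PPO surrogate: the hard clip zeroes the gradient once the
    # probability ratio leaves [1 - eps, 1 + eps] in the penalized direction.
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1 - eps, 1 + eps) * advantage)

def sigmoid_soft_clip_objective(ratio, advantage, eps=0.2, k=10.0):
    # Illustrative smooth variant (an assumption, not SCAPPO's definition):
    # a sigmoid weight blends between the unclipped and clipped surrogates,
    # so the gradient fades gradually near the clip boundary instead of
    # vanishing abruptly.
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    # weight -> 0 well inside the trust region, -> 1 far outside it
    w = sigmoid(k * (np.abs(ratio - 1.0) - eps))
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    return (1.0 - w) * ratio * advantage + w * clipped * advantage
```

With `ratio = 2.0` and a positive advantage, the hard clip pins the objective at `1 + eps` times the advantage and passes no gradient through the ratio, while the smoothed version approaches the same value but retains a small, nonzero gradient.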
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.