QFAE: Q-Function guided Action Exploration for offline deep reinforcement learning.

Bibliographic Details
Title: QFAE: Q-Function guided Action Exploration for offline deep reinforcement learning.
Authors: Pang, Teng (AUTHOR) silencept7@gmail.com; Wu, Guoqiang (AUTHOR) guoqiangwu90@gmail.com; Zhang, Yan (AUTHOR) yannzhang9@gmail.com; Wang, Bingzheng (AUTHOR) binzhwang@gmail.com; Yin, Yilong (AUTHOR) ylyin@sdu.edu.cn
Source: Pattern Recognition. Feb. 2025, Vol. 158.
Subjects: Deep reinforcement learning, Reinforcement learning, Learning ability, Algorithms
Abstract: Offline reinforcement learning (RL) aims to learn an optimal policy from a fixed set of offline data. A typical approach constrains the target policy to stay close to the offline data in order to reduce extrapolation errors. However, this constraint can impede the learning ability of the target policy when the provided data is suboptimal. To address this issue, we analyze the impact of action exploration on policy learning; the analysis implies that a suitable action perturbation can improve policy learning. Inspired by this theoretical analysis, we propose a simple yet effective method named Q-Function guided Action Exploration (QFAE), which addresses offline RL by strengthening the exploration of the behavior policy with constrained action perturbations. Moreover, QFAE can be viewed as a plug-and-play framework that can be embedded into existing policy constraint methods to improve their performance. Experimental results on the D4RL benchmark illustrate the effectiveness of our method when embedded into existing approaches.
• This paper theoretically analyzes the impact of action exploration on policy learning; the analysis implies that action exploration can improve policy learning.
• Inspired by the theoretical analysis, this paper proposes a simple yet effective method, QFAE, which can be embedded into existing offline RL algorithms based on policy constraints.
• The experimental results show the effectiveness and compatibility of QFAE.
[ABSTRACT FROM AUTHOR]
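The abstract does not give the perturbation rule itself, so the following is only an illustrative sketch of what "Q-function guided" exploration with a constrained perturbation could look like, not the authors' algorithm. It assumes the perturbation is a per-dimension step of size at most `epsilon` in the direction that increases Q, estimated by finite differences. The names `perturb_action`, `toy_q`, `epsilon`, `fd_step`, and the action bounds are all hypothetical.

```python
from typing import Callable, List, Sequence

QFunction = Callable[[Sequence[float], Sequence[float]], float]


def perturb_action(
    q_fn: QFunction,
    state: Sequence[float],
    action: Sequence[float],
    epsilon: float = 0.1,   # assumed bound on the perturbation per dimension
    fd_step: float = 1e-4,  # finite-difference step for the Q gradient
    low: float = -1.0,      # assumed action-space bounds
    high: float = 1.0,
) -> List[float]:
    """Shift a dataset action by at most epsilon per dimension toward higher Q."""
    perturbed = []
    for i, a_i in enumerate(action):
        up = list(action)
        up[i] = a_i + fd_step
        down = list(action)
        down[i] = a_i - fd_step
        # Central-difference estimate of dQ/da_i at the dataset action.
        grad_i = (q_fn(state, up) - q_fn(state, down)) / (2 * fd_step)
        # Sign step keeps the perturbation inside an epsilon-box around the action.
        step = epsilon if grad_i > 0 else -epsilon if grad_i < 0 else 0.0
        perturbed.append(min(high, max(low, a_i + step)))
    return perturbed


def toy_q(state: Sequence[float], action: Sequence[float]) -> float:
    """Hypothetical critic for illustration: Q peaks where action equals state."""
    return -sum((a - s) ** 2 for a, s in zip(action, state))


state = [0.5, -0.2]
dataset_action = [0.0, 0.0]
explored = perturb_action(toy_q, state, dataset_action, epsilon=0.1)
```

In a policy constraint method, a perturbed action of this kind would stand in for the raw dataset action as the target the learned policy is regularized toward. How QFAE actually chooses and bounds the perturbation is specified in the published paper, not in this record.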
Copyright of Pattern Recognition is the property of Pergamon Press - An Imprint of Elsevier Science and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Description
ISSN: 0031-3203
DOI: 10.1016/j.patcog.2024.111032