QFAE: Q-Function guided Action Exploration for offline deep reinforcement learning.
| Title: | QFAE: Q-Function guided Action Exploration for offline deep reinforcement learning. |
|---|---|
| Authors: | Pang, Teng (silencept7@gmail.com); Wu, Guoqiang (guoqiangwu90@gmail.com); Zhang, Yan (yannzhang9@gmail.com); Wang, Bingzheng (binzhwang@gmail.com); Yin, Yilong (ylyin@sdu.edu.cn) |
| Source: | Pattern Recognition. Feb2025, Vol. 158, pN.PAG-N.PAG. 1p. |
| Subjects: | Deep reinforcement learning, Reinforcement learning, Learning ability, Algorithms |
| Abstract: | Offline reinforcement learning (RL) aims to learn an optimal policy from offline data alone. During policy learning, a typical approach constrains the target policy to the offline data to reduce extrapolation errors. However, this constraint can impede the learning ability of the target policy when the provided data is suboptimal. To address this issue, we analyze the impact of action exploration on policy learning; the analysis implies that a suitable action perturbation can improve policy learning. Inspired by this theoretical analysis, we propose a simple yet effective method named Q-Function guided Action Exploration (QFAE), which tackles offline RL by strengthening the exploration of the behavior policy with constrained action perturbations. Moreover, it can be viewed as a plug-and-play framework that can be embedded into existing policy constraint methods to improve performance. Experimental results on the D4RL benchmark illustrate the effectiveness of our method when embedded into existing approaches. Highlights: • This paper theoretically analyzes the impact of action exploration on policy learning, which implies that action exploration can improve policy learning. • Inspired by the theoretical analysis, this paper proposes a simple yet effective method, QFAE, which can be embedded into existing offline RL algorithms based on policy constraints. • The experimental results show the effectiveness and compatibility of QFAE. [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Pattern Recognition is the property of Pergamon Press - An Imprint of Elsevier Science and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Engineering Source |
| FullText | Text: Availability: 0 |
|---|---|
| Header | DbId: egs; DbLabel: Engineering Source; An: 180766583; AccessLevel: 6; PubType: Academic Journal; PubTypeId: academicJournal; PreciseRelevancyScore: 0 |
| PLink | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=180766583 |
| RecordInfo | DOI: 10.1016/j.patcog.2024.111032; Language: English (eng); Page count: 1; Start page: N.PAG; Subjects (general): Deep reinforcement learning, Reinforcement learning, Learning ability, Algorithms; Title (main): QFAE: Q-Function guided Action Exploration for offline deep reinforcement learning.; Contributors: Pang, Teng / Wu, Guoqiang / Zhang, Yan / Wang, Bingzheng / Yin, Yilong; Part of: Pattern Recognition (main title); Volume: 158; Published: 01 Feb 2025; ISSN (print): 00313203 |