Eligibility traces and forgetting factor in recursive least-squares-based temporal difference


Abstract

We propose a new reinforcement learning method in the framework of Recursive Least Squares-Temporal Difference (RLS-TD). Instead of using the standard mechanism of eligibility traces (resulting in RLS-TD(λ)), we propose to use the forgetting factor commonly used in gradient-based or least-squares estimation, and we show that it plays a role similar to that of eligibility traces. An instrumental variable perspective is adopted to formulate the new algorithm, referred to as RLS-TD with forgetting factor (RLS-TD-f). An interesting aspect of the proposed algorithm is that it admits an interpretation as the minimizer of an appropriate cost function. We test the effectiveness of the algorithm in a Policy Iteration setting, meaning that we aim to improve the performance of an initially stabilizing control policy (over a large portion of the state space). We take a cart-pole benchmark and an adaptive cruise control benchmark as experimental platforms.
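To make the idea concrete, the sketch below illustrates how a forgetting factor can replace eligibility traces in the instrumental-variable form of RLS-TD. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the instrument z = φ(s), the regressor x = φ(s) − γφ(s'), the forgetting factor `beta`, and the class and parameter names are all illustrative choices following the standard recursive least-squares update with exponential forgetting.

```python
import numpy as np

class RLSTDForgetting:
    """Sketch of recursive least-squares TD with a forgetting factor.

    Assumes the standard instrumental-variable view of RLS-TD:
    instrument z = phi(s), regressor x = phi(s) - gamma * phi(s'),
    with the eligibility trace replaced by a scalar forgetting
    factor `beta` that discounts old samples.
    """

    def __init__(self, n_features, gamma=0.99, beta=0.995, delta=1.0):
        self.gamma = gamma                    # discount factor
        self.beta = beta                      # forgetting factor in (0, 1]
        self.theta = np.zeros(n_features)     # value-function weights
        self.P = np.eye(n_features) / delta   # inverse-covariance estimate

    def update(self, phi_s, phi_s_next, reward):
        z = phi_s                               # instrumental variable
        x = phi_s - self.gamma * phi_s_next     # TD regressor
        Pz = self.P @ z
        k = Pz / (self.beta + x @ Pz)           # RLS gain vector
        residual = reward - x @ self.theta      # least-squares residual
        self.theta += k * residual
        # Dividing by beta inflates P, i.e. down-weights old data;
        # this forgetting plays the role of eligibility traces.
        self.P = (self.P - np.outer(k, x @ self.P)) / self.beta
        return self.theta
```

Setting `beta = 1` recovers plain RLS-TD with no forgetting; values slightly below 1 let the estimator track the changing value function as the policy is improved between Policy Iteration steps.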
