Abstract
This paper proposes two novel adaptive optimal control algorithms for continuous-time nonlinear affine systems based on reinforcement learning: i) generalized policy iteration (GPI) and ii) Q-learning. With GPI, a priori knowledge of the system drift f(x) is not needed, which gives a partially model-free, online solution. We then, for the first time, extend the idea of Q-learning to the nonlinear continuous-time optimal control problem in a noniterative manner. This leads to a completely model-free method in which neither the system drift f(x) nor the input gain g(x) is needed. For both methods, the adaptive critic and actor continuously and simultaneously update each other without iterative steps, which avoids the hybrid structure and the need for an initial stabilizing control policy. Moreover, finite-time convergence is guaranteed by using a sliding mode technique in the new adaptive approach, where the persistent excitation (PE) condition can be directly verified online. We also prove the overall Lyapunov stability and demonstrate the effectiveness of the proposed algorithms using numerical examples.
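
For orientation, the problem class named in the abstract is usually written as below. This is a sketch of the standard continuous-time affine optimal control formulation, not the paper's own notation: the cost terms Q(x) and R, the value function V, and the infinite-horizon form are assumptions introduced here for illustration.

```latex
% Assumed standard setting (symbols Q, R, V are illustrative, not taken from the paper)
\begin{align}
  \dot{x} &= f(x) + g(x)\,u
    && \text{affine dynamics: drift } f(x),\ \text{input gain } g(x) \\
  V(x(t)) &= \int_{t}^{\infty} \bigl( Q(x(\tau)) + u(\tau)^{\top} R\, u(\tau) \bigr)\, d\tau
    && \text{infinite-horizon cost, } Q(x) \ge 0,\ R \succ 0 \\
  0 &= \min_{u} \Bigl[ Q(x) + u^{\top} R\, u
        + \nabla V^{*}(x)^{\top} \bigl( f(x) + g(x)\,u \bigr) \Bigr]
    && \text{Hamilton--Jacobi--Bellman equation} \\
  u^{*}(x) &= -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V^{*}(x)
    && \text{minimizer of the bracket above}
\end{align}
```

Under these assumptions, the expression for the optimal control involves g(x) but not f(x), while the Hamilton-Jacobi-Bellman equation involves both. This is consistent with the abstract's distinction between a partially model-free method that avoids f(x) and a completely model-free method that avoids both f(x) and g(x).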
Original language | English |
---|---|
Title of host publication | 2019 IEEE 58th Conference on Decision and Control (CDC) |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Number of pages | 6 |
ISBN (Electronic) | 978-1-7281-1398-2 |
Publication status | Published - 12 Mar 2020 |
Event | IEEE Conference on Decision and Control - Nice, France. Duration: 11 Dec 2019 → 13 Dec 2019. Conference number: 58. https://cdc2019.ieeecss.org/ |
Publication series
Name | IEEE Conference on Decision and Control |
---|---|
Publisher | IEEE |
ISSN (Electronic) | 2576-2370 |
Conference
Conference | IEEE Conference on Decision and Control |
---|---|
Abbreviated title | CDC2019 |
Country/Territory | France |
City | Nice |
Period | 11/12/19 → 13/12/19 |
Keywords
- adaptive critic
- approximate dynamic programming
- reinforcement learning
- nonlinear systems
- Q-learning
- adaptive optimal control