Greedy adaptive critics for LPQ [i.e. LQR] problems: Convergence Proofs
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, The Institute of Technology.
ORCID iD: 0000-0002-9091-4724
1996 (English). Report (Other academic)
Abstract [en]

A number of success stories have been told in which reinforcement learning has been applied to problems in continuous state spaces, using neural nets or other function approximators in the adaptive critics. However, the theoretical understanding of why and when these algorithms work is inadequate. This is clearly exemplified by the lack of convergence results for a number of important situations. To our knowledge, only two such results have been presented for systems in the continuous state space domain. The first is due to Werbos and is concerned with linear function approximation and heuristic dynamic programming. Here no optimal strategy can be found, which is why the result is of limited importance. The second result is due to Bradtke and deals with linear quadratic systems and quadratic function approximators. Bradtke's proof is limited to ADHDP and policy iteration techniques, where the optimal solution is found by a number of successive approximations. This paper deals with greedy techniques, where the optimal solution is aimed for directly. Convergence proofs for a number of adaptive critics, HDP, DHP, ADHDP and ADDHP, are presented. Optimal controllers for linear quadratic regulation (LQR) systems can be found by standard techniques from control theory, but the assumptions made in control theory can be weakened if adaptive critic techniques are employed. The main point of this paper is, however, not to emphasize the differences but to highlight the similarities and, by so doing, contribute to a theoretical understanding of adaptive critics.
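As a rough illustration of what "greedy" means in the LQR setting, the sketch below iterates a quadratic critic V_k(x) = p_k x^2 for a scalar system x' = a x + b u with stage cost q x^2 + r u^2, choosing the minimising action at every step and backing up the critic immediately. This is plain model-based Riccati value iteration, not the adaptive critic algorithms analysed in the report, which learn from observed data; the function names, the scalar system and the numerical values are all assumptions made for the example.

```python
import math


def greedy_value_iteration(a, b, q, r, p0=0.0, iterations=200):
    """Iterate a quadratic critic V_k(x) = p_k * x**2 for a scalar LQR problem.

    Hypothetical helper for illustration: assumes the model (a, b) is known,
    with r > 0 and q >= 0.
    """
    p = p0
    for _ in range(iterations):
        # Greedy step: minimising q*x^2 + r*u^2 + p*(a*x + b*u)^2 over u
        # gives the linear feedback u = -k*x with this gain.
        k = a * b * p / (r + b * b * p)
        # Critic update: cost of taking the greedy action once, then
        # following the current critic.
        p = q + r * k * k + p * (a - b * k) ** 2
    gain = a * b * p / (r + b * b * p)
    return p, gain


def riccati_fixed_point(a, b, q, r):
    """Positive root of the scalar discrete-time algebraic Riccati equation.

    Solves b^2 p^2 + (r - q b^2 - a^2 r) p - q r = 0 and assumes b != 0.
    """
    c = r - q * b * b - a * a * r
    return (-c + math.sqrt(c * c + 4.0 * b * b * q * r)) / (2.0 * b * b)


if __name__ == "__main__":
    # Assumed example: an open-loop unstable system (|a| > 1).
    p, k = greedy_value_iteration(a=1.2, b=1.0, q=1.0, r=1.0)
    print("critic weight p:", p)
    print("feedback gain k:", k)
    print("closed-form p:  ", riccati_fixed_point(1.2, 1.0, 1.0, 1.0))
```

In this example the iterated critic weight should agree with the closed-form Riccati solution, and the resulting closed-loop factor a - b k has magnitude below one. A policy iteration scheme would instead evaluate each fixed feedback gain to convergence before improving it.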

Place, publisher, year, edition, pages
Linköping, Sweden: Linköping University, Department of Electrical Engineering, 1996. 20 p.
LiTH-ISY-R, ISSN 1400-3902; 1896
Keyword [en]
Linear quadratic regulation, Reinforcement learning
National Category
Engineering and Technology
URN: urn:nbn:se:liu:diva-53354
ISRN: LiTH-ISY-R-1896
OAI: diva2:288542
Available from: 2010-01-21 Created: 2010-01-20 Last updated: 2014-09-15
Bibliographically approved

Open Access in DiVA

fulltext (245 kB), 299 downloads
File information
File name: FULLTEXT01.pdf
File size: 245 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf

Author/editor
Knutsson, Hans