Reinforcement Learning and Distributed Local Model Synthesis
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, The Institute of Technology.
1997 (English). Doctoral thesis, monograph (Other academic).
Abstract [en]

Reinforcement learning is a general and powerful way to formulate complex learning problems and acquire good system behaviour. The goal of a reinforcement learning system is to maximize a long term sum of instantaneous rewards provided by a teacher. In its extremum form, reinforcement learning only requires that the teacher can provide a measure of success. This formulation does not require a training set with correct responses, and allows the system to become better than its teacher.
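The long-term objective described above can be sketched in a few lines. The discount factor and the reward sequence below are illustrative assumptions, not details from the thesis:

```python
# Minimal sketch of the reinforcement learning objective: maximize a
# long-term sum of instantaneous rewards provided by a teacher.
# gamma and the reward values are illustrative assumptions.

def discounted_return(rewards, gamma=0.9):
    """Discounted sum of instantaneous rewards; gamma < 1 keeps the sum finite."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

print(discounted_return([1.0, 1.0, 1.0]))  # 1 + 0.9 + 0.81, approximately 2.71
```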

In reinforcement learning much of the burden is moved from the teacher to the training algorithm. The exact and general algorithms that exist for these problems are based on dynamic programming (DP), and have a computational complexity that grows exponentially with the dimensionality of the state space. These algorithms can only be applied to real world problems if an efficient encoding of the state space can be found.
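A toy calculation illustrates the exponential growth mentioned above: a tabular value function over a grid with k discretization levels per state dimension needs k^d entries. The grid resolution of 10 is an arbitrary assumption for illustration:

```python
# Why exact DP scales exponentially with state dimensionality:
# an exhaustive value table over a discretized state space has
# levels_per_dim ** state_dims entries.

def dp_table_size(levels_per_dim, state_dims):
    """Number of entries in an exhaustive tabular value function."""
    return levels_per_dim ** state_dims

for d in (2, 4, 8):
    print(d, dp_table_size(10, d))  # 100, 10000, 100000000
```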

To cope with these problems, heuristic algorithms and function approximation need to be incorporated. In this thesis it is argued that local models have the potential to help solve problems in high-dimensional spaces, whereas global models do not. This is motivated by the bias-variance dilemma, which is resolved under the assumption that the system is constrained to live on a low-dimensional manifold in the space of inputs and outputs. This observation leads to the introduction of bias in terms of continuity and locality.
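The locality bias can be illustrated with a simple kernel-weighted local estimator, in which nearby samples have more influence than distant ones, in contrast to a single global fit. The Gaussian kernel and its width are assumptions chosen for illustration, not the construction used in the thesis:

```python
import math

def local_estimate(x, samples, width=0.5):
    """Weighted average of (x_i, y_i) samples, with weights decaying with
    distance from x. This encodes the continuity-and-locality bias: only
    nearby data shapes the local prediction."""
    weights = [math.exp(-((x - xi) / width) ** 2) for xi, _ in samples]
    return sum(w * yi for w, (_, yi) in zip(weights, samples)) / sum(weights)
```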

A linear approximation of the system dynamics and a quadratic function describing the long term reward are suggested to constitute a suitable local model. For problems involving one such model, i.e. linear quadratic regulation problems, novel convergence proofs for heuristic DP algorithms are presented. This is one of the few available convergence proofs for reinforcement learning in continuous state spaces.
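For the scalar case, the setting these proofs address can be sketched with the classical value-iteration (Riccati) recursion: dynamics x' = a·x + b·u, instantaneous cost q·x² + r·u², and a quadratic long-term cost V(x) = p·x². This is a textbook sketch of the linear quadratic regulation problem, not the thesis's heuristic algorithm, and all parameter values are illustrative:

```python
# Value iteration for scalar linear-quadratic regulation: repeatedly apply
# the Bellman update to the coefficient p of the quadratic cost V(x) = p*x**2.

def riccati_iterate(a, b, q, r, iters=200):
    """Iterate p <- q + a^2 p - (a b p)^2 / (r + b^2 p) from p = 0."""
    p = 0.0
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p

p = riccati_iterate(1.0, 1.0, 1.0, 1.0)
print(p)  # converges to (1 + 5 ** 0.5) / 2, about 1.618
```

With a = b = q = r = 1 the fixed point satisfies p² = 1 + p, i.e. p = (1 + √5)/2, and the corresponding optimal gain is K = a·b·p / (r + b²·p).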

Reinforcement learning is closely related to optimal control, where local models are commonly used. Relations to existing methods are investigated, e.g. adaptive control, gain scheduling, fuzzy control, and jump linear systems. Ideas from these areas are compiled in a synergistic way to produce a new algorithm for heuristic dynamic programming where function parameters and locality, expressed as model applicability, are learned on-line. Both top-down and bottom-up versions are presented.
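Combining local models according to their applicability can be sketched as a normalized weighted blend of their predictions. The pairing of a weight function and a predictor per model is a hypothetical interface chosen for illustration; in the thesis both the function parameters and the applicability are learned on-line:

```python
# Blend the predictions of several local models, each paired with an
# applicability function that says how much it applies at input x.
# The (applicability, predict) interface is an illustrative assumption.

def blend_predictions(x, models):
    """models: list of (applicability, predict) function pairs."""
    pairs = [(applicability(x), predict(x)) for applicability, predict in models]
    total = sum(w for w, _ in pairs)
    return sum(w * y for w, y in pairs) / total

# Example: two hypothetical local models with constant applicability weights.
models = [(lambda x: 1.0, lambda x: 2 * x), (lambda x: 3.0, lambda x: 6.0)]
print(blend_predictions(2.0, models))  # (1*4 + 3*6) / 4 = 5.5
```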

The emerging local models and their applicability need to be memorized by the learning system. The binary tree is put forward as a suitable data structure for on-line storage and retrieval of these functions.
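Such on-line storage can be sketched as a binary tree over the input space where each leaf holds locally relevant data (standing in here for a local model and its applicability region) and splits when it fills up. The one-dimensional interval, the capacity threshold, and all names are illustrative assumptions, not the thesis's data structure:

```python
# Binary tree over a 1-D input interval [lo, hi): leaves store samples
# and split in half on-line when they exceed a fixed capacity.

class TreeNode:
    def __init__(self, lo, hi, capacity=4):
        self.lo, self.hi, self.capacity = lo, hi, capacity
        self.left = self.right = None
        self.samples = []  # (x, y) pairs held at a leaf

    def _mid(self):
        return (self.lo + self.hi) / 2

    def insert(self, x, y):
        if self.left is not None:  # internal node: descend to the half containing x
            (self.left if x < self._mid() else self.right).insert(x, y)
            return
        self.samples.append((x, y))
        if len(self.samples) > self.capacity:  # leaf overflow: split the interval
            mid = self._mid()
            self.left = TreeNode(self.lo, mid, self.capacity)
            self.right = TreeNode(mid, self.hi, self.capacity)
            for xs, ys in self.samples:
                (self.left if xs < mid else self.right).insert(xs, ys)
            self.samples = []

    def lookup(self, x):
        """Retrieve the samples stored in the leaf whose region contains x."""
        if self.left is not None:
            return (self.left if x < self._mid() else self.right).lookup(x)
        return self.samples
```

Retrieval then costs a walk from the root to one leaf, i.e. time logarithmic in the number of regions when the tree stays balanced.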

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 1997. 193 p.
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 469
National Category
Engineering and Technology
URN: urn:nbn:se:liu:diva-54348. ISBN: 91-7871-892-9. OAI: diva2:302961.
Public defence
1997-03-07, C2, Hus C, Campus Valla, Linköpings universitet, Linköping, 10:15 (English)
Available from: 2010-03-10. Created: 2010-03-10. Last updated: 2013-08-28. Bibliographically approved.

Open Access in DiVA

Reinforcement Learning and Distributed Local Model Synthesis (1945 kB), 722 downloads
File information
File name: FULLTEXT01.pdf. File size: 1945 kB. Checksum: SHA-512.
Type: fulltext. Mimetype: application/pdf.

