From Pixels to Torques: Policy Learning with Deep Dynamical Models
Linköping University, Department of Electrical Engineering, Automatic Control. Linköping University, Faculty of Science & Engineering.
Department of Information Technology, Uppsala University, Sweden.
Department of Computing, Imperial College London, United Kingdom.
2015 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Data-efficient learning in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. In this paper, we consider one instance of this challenge, the pixels to torques problem, where an agent must learn a closed-loop control policy from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model that uses deep auto-encoders to learn a low-dimensional embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning ensures that not only static but also dynamic properties of the data are accounted for. This is crucial for long-term predictions, which lie at the core of the adaptive model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art reinforcement learning methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces and is an important step toward fully autonomous learning from pixels to torques.
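The closed-loop structure described above (encode pixels into a low-dimensional feature, predict in that feature space, plan with model predictive control) can be sketched in a few lines under strong simplifying assumptions. A hand-written centroid `encode` stands in for the learned deep auto-encoder, a scalar linear `predict` stands in for the learned predictive model, and the controller is a brute-force receding-horizon search over a small hypothetical action set. None of these names or parameters come from the paper; the sketch only illustrates planning in a low-dimensional feature space.

```python
import itertools
from typing import Sequence, Tuple


def encode(image: Sequence[float]) -> float:
    """Hypothetical stand-in for a learned encoder: maps a 1-D 'image'
    (list of pixel intensities) to a 1-D feature, its intensity-weighted centroid."""
    total = sum(image)
    return sum(i * p for i, p in enumerate(image)) / total


def predict(z: float, u: float, gain: float = 0.5) -> float:
    """Hypothetical latent transition model: z_{t+1} = z_t + gain * u_t."""
    return z + gain * u


def plan_cost(z: float, actions: Sequence[float], target: float,
              effort_weight: float = 0.01) -> float:
    """Predicted cost of an action sequence: squared distance to the target
    feature plus a small control-effort penalty, summed over the horizon."""
    cost = 0.0
    for u in actions:
        z = predict(z, u)
        cost += (z - target) ** 2 + effort_weight * u ** 2
    return cost


def mpc_action(z: float, target: float, horizon: int = 4,
               candidates: Tuple[float, ...] = (-1.0, 0.0, 1.0)) -> float:
    """Evaluate every candidate sequence over the horizon and return only
    the first action of the cheapest one (receding-horizon principle)."""
    best = min(itertools.product(candidates, repeat=horizon),
               key=lambda seq: plan_cost(z, seq, target))
    return best[0]


def run_closed_loop(image: Sequence[float], target: float, steps: int) -> float:
    """Encode the first frame, then alternate planning and prediction.
    (A real system would encode a new camera frame after every action.)"""
    z = encode(image)
    for _ in range(steps):
        z = predict(z, mpc_action(z, target))
    return z


if __name__ == "__main__":
    frame = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # bright pixel at index 6
    print(encode(frame))                    # 6.0
    print(run_closed_loop(frame, 2.0, 12))  # 2.0
```

Re-planning after every step and applying only the first action is what makes this a receding-horizon controller. With continuous actions and a learned nonlinear model, the exhaustive search would typically be replaced by a numerical optimizer.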

Place, publisher, year, edition, pages
2015.
National Category
Signal Processing
Identifiers
URN: urn:nbn:se:liu:diva-122394
OAI: oai:DiVA.org:liu-122394
DiVA: diva2:866120
Conference
Deep Learning Workshop at the 32nd International Conference on Machine Learning (ICML 2015), July 10-11, Lille, France
Projects
COOPLOC
Funder
Swedish Foundation for Strategic Research, COOPLOC
Swedish Research Council, 621-2013-5524
Available from: 2015-10-31 Created: 2015-10-31 Last updated: 2015-11-04
In thesis
1. Modeling of Magnetic Fields and Extended Objects for Localization Applications
2015 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The level of automation in our society is ever increasing. Technologies like self-driving cars, virtual reality, and fully autonomous robots, all of which were unimaginable a few decades ago, are realizable today and will become standard consumer products in the future. These technologies depend on autonomous localization and situation awareness, which require careful processing of sensory data. To increase efficiency, robustness, and reliability, appropriate models for these data are needed. In this thesis, such models are analyzed within three application areas, namely (1) magnetic localization, (2) extended target tracking, and (3) autonomous learning from raw pixel information.

Magnetic localization is based on one or more magnetometers measuring the induced magnetic field from magnetic objects. In this thesis we present a model for determining the position and the orientation of small magnets with an accuracy of a few millimeters. This enables three-dimensional interaction with computer programs that cannot be handled with other localization techniques. Further, an additional model is proposed for detecting wrong-way drivers on highways based on sensor data from magnetometers deployed in the vicinity of traffic lanes. Models for mapping complex magnetic environments are also analyzed. Such magnetic maps can be used for indoor localization where other systems, such as GPS, do not work.
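The measurement model behind this kind of magnetic localization can be sketched with the standard point-dipole field equation. The sketch below assumes a known magnetic moment, noise-free measurements from four hypothetical sensors, and a brute-force grid search in place of a proper nonlinear least-squares estimator; the thesis model also estimates the magnet's orientation, which is omitted here. All function names, sensor positions, and numbers are illustrative.

```python
import itertools
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]
MU0_OVER_4PI = 1e-7  # vacuum permeability divided by 4*pi, in T*m/A


def dipole_field(magnet_pos: Vec, moment: Vec, sensor_pos: Vec) -> Vec:
    """Point-dipole model: B = mu0/(4 pi) * (3 r (m.r)/|r|^5 - m/|r|^3),
    where r points from the magnet to the sensor."""
    r = tuple(s - p for s, p in zip(sensor_pos, magnet_pos))
    dist = math.sqrt(sum(c * c for c in r))
    m_dot_r = sum(mc * rc for mc, rc in zip(moment, r))
    return tuple(MU0_OVER_4PI * (3.0 * rc * m_dot_r / dist ** 5 - mc / dist ** 3)
                 for rc, mc in zip(r, moment))


def localize(measurements: Sequence[Vec], sensors: Sequence[Vec], moment: Vec,
             xs: Sequence[float], ys: Sequence[float], zs: Sequence[float]) -> Vec:
    """Return the grid position whose predicted fields best match the
    measurements in the least-squares sense (known moment assumed)."""
    def residual(pos: Vec) -> float:
        total = 0.0
        for meas, sensor in zip(measurements, sensors):
            pred = dipole_field(pos, moment, sensor)
            total += sum((a - b) ** 2 for a, b in zip(meas, pred))
        return total

    return min(itertools.product(xs, ys, zs), key=residual)


if __name__ == "__main__":
    # Hypothetical setup: four sensors in the plane z = 0, magnet above them.
    sensors = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.1, 0.1, 0.0)]
    moment = (0.0, 0.0, 0.5)  # A*m^2, assumed known
    true_pos = (0.03, 0.06, 0.05)
    meas = [dipole_field(true_pos, moment, s) for s in sensors]
    grid_xy = [i * 0.01 for i in range(11)]
    grid_z = [i * 0.01 for i in range(2, 9)]  # stays above the sensor plane
    print(localize(meas, sensors, moment, grid_xy, grid_xy, grid_z))
```

A grid search is used only because it is short and dependency-free; a gradient-based solver refining a coarse initial guess would be the more usual choice, with orientation parameters added to the unknowns. Because the point-dipole field is symmetric about the moment axis, only two orientation angles can be recovered from such measurements.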

In the second application area, models for tracking objects from laser range sensor data are analyzed. The target shape is modeled with a Gaussian process and is estimated jointly with target position and orientation. The resulting algorithm is capable of tracking various objects with different shapes within the same surveillance region.
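A Gaussian-process shape model of this kind can be illustrated by treating the target's radius as a function of the polar angle, r(θ), and using a periodic kernel so that r(0) = r(2π). The kernel choice, the zero prior mean, the hyperparameters, and all function names below are assumptions for illustration; the thesis estimates the shape jointly with position and orientation, whereas this sketch takes noisy radial measurements at known angles as given and only computes the posterior mean radius.

```python
import math
from typing import List, Sequence


def periodic_kernel(a: float, b: float, sigma_f: float = 1.0, length: float = 1.0) -> float:
    """Periodic squared-exponential kernel on the angle (assumed choice)."""
    return sigma_f ** 2 * math.exp(-2.0 * math.sin((a - b) / 2.0) ** 2 / length ** 2)


def cholesky(m: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = m (m symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def solve_cholesky(low: List[List[float]], y: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = y by forward and backward substitution."""
    n = len(low)
    z = [0.0] * n
    for i in range(n):  # forward: L z = y
        z[i] = (y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward: L^T x = z
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_radius_mean(angles: Sequence[float], radii: Sequence[float],
                   query: Sequence[float], noise: float = 0.01) -> List[float]:
    """Posterior mean of r(theta) at the query angles, zero prior mean."""
    k_mat = [[periodic_kernel(a, b) + (noise ** 2 if i == j else 0.0)
              for j, b in enumerate(angles)] for i, a in enumerate(angles)]
    alpha = solve_cholesky(cholesky(k_mat), radii)
    return [sum(periodic_kernel(q, a) * w for a, w in zip(angles, alpha)) for q in query]


if __name__ == "__main__":
    n = 8
    angles = [2.0 * math.pi * k / n for k in range(n)]
    radii = [1.0] * n  # synthetic measurements of a unit circle
    print(gp_radius_mean(angles, radii, [0.0, math.pi / 8]))
```

With a zero prior mean the estimate shrinks toward zero far from any measurement, so a practical shape model would normally include a nonzero mean radius.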

In the third application area, autonomous learning based on high-dimensional sensor data is considered. In this thesis, we consider one instance of this challenge, the so-called pixels to torques problem, where an agent must learn a closed-loop control policy from pixel information only. To solve this problem, high-dimensional time series are described using a low-dimensional dynamical model. Techniques from machine learning together with standard tools from control theory are used to autonomously design a controller for the system without any prior knowledge.
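The step of describing a time series with a low-dimensional dynamical model can be illustrated, in heavily simplified form, by fitting a linear transition model to a sequence of features and control inputs with ordinary least squares. In the thesis the features are learned from images and the model need not be linear or scalar; the model z[t+1] = a·z[t] + b·u[t], the function name, and the synthetic data below are assumptions for illustration.

```python
from typing import Sequence, Tuple


def fit_latent_dynamics(z: Sequence[float], u: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of the hypothetical model z[t+1] = a*z[t] + b*u[t].
    Solves the 2x2 normal equations (Phi^T Phi) theta = Phi^T y, where each
    row of Phi is (z[t], u[t]) and y[t] = z[t+1]."""
    n = len(z) - 1
    if len(u) < n:
        raise ValueError("need one input per transition")
    pairs = [(z[t], u[t], z[t + 1]) for t in range(n)]
    szz = sum(zt * zt for zt, _, _ in pairs)
    szu = sum(zt * ut for zt, ut, _ in pairs)
    suu = sum(ut * ut for _, ut, _ in pairs)
    szy = sum(zt * yt for zt, _, yt in pairs)
    suy = sum(ut * yt for _, ut, yt in pairs)
    det = szz * suu - szu * szu
    if abs(det) < 1e-12:
        raise ValueError("data not informative enough; vary the inputs")
    a = (suu * szy - szu * suy) / det
    b = (szz * suy - szu * szy) / det
    return a, b


if __name__ == "__main__":
    a_true, b_true = 0.9, 0.5  # synthetic "true" model
    inputs = [1.0, -0.5, 0.25, 0.0, 0.8, -1.0]
    states = [1.0]
    for u_t in inputs:
        states.append(a_true * states[-1] + b_true * u_t)
    a_hat, b_hat = fit_latent_dynamics(states, inputs)
    print((round(a_hat, 6), round(b_hat, 6)))  # (0.9, 0.5)
```

The input sequence has to vary enough for the parameters to be identifiable; with all-zero inputs the normal equations are singular and the function raises an error.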

System models used in the applications above are often provided in continuous time. However, a major part of the applied theory is developed for discrete-time systems, so discretization of continuous-time models is fundamental. The thesis therefore ends with a method for performing such discretization using Lyapunov equations together with analytical solutions, enabling efficient implementation in software.
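One standard way a Lyapunov equation enters the discretization of a linear stochastic model can be written out as follows. Assume (this model form is an assumption, not stated in the abstract) the continuous-time model dx/dt = A x + w, where w is white noise with intensity Q, sampled with period T. The discrete-time model is x_{k+1} = A_d x_k + w_k with noise covariance Q_d:

```latex
\begin{aligned}
A_d &= e^{AT}, \qquad
Q_d = \int_0^T e^{A\tau} Q \, e^{A^\top \tau} \, d\tau, \\
\frac{d}{d\tau}\left( e^{A\tau} Q \, e^{A^\top \tau} \right)
  &= A \, e^{A\tau} Q \, e^{A^\top \tau} + e^{A\tau} Q \, e^{A^\top \tau} A^\top, \\
\Longrightarrow \quad A Q_d + Q_d A^\top &= A_d Q A_d^\top - Q .
\end{aligned}
```

Integrating the middle line from 0 to T gives the last line, a Lyapunov equation in Q_d. It has a unique solution when λ_i + λ_j ≠ 0 for every pair of eigenvalues of A, so Q_d can then be obtained from A_d with a standard Lyapunov solver instead of numerical integration. When the condition fails (for example, when A has an eigenvalue at zero, as for an integrator), the equation is singular and Q_d must be obtained another way.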

Abstract [sv]

How can you get a computer to follow the puck in table hockey to compile match statistics, a brush to paint virtual watercolors, a scalpel to digitize pathology, or a multi-tool to sculpt in 3D? These are four applications built on the patent-pending algorithm developed in this thesis. The method is based on hiding a small magnet in the tool and placing a number of three-axis magnetometers, of the same kind as in our smartphones, in a network around the workspace. The magnet's field gives rise to a unique signature in the sensors, which makes it possible to compute the magnet's position in three degrees of freedom as well as two of its angles. The thesis develops a complete framework for these computations and the associated analysis.

Another application studied on the basis of this principle is the detection and classification of vehicles. In a collaboration with Luleå University of Technology and project partners, an algorithm has been developed that classifies the direction in which vehicles pass using only measurements from a two-axis magnetometer. Tests outside Luleå show essentially 100% correct classification.

Viewing a vehicle as a structure of magnetic dipoles instead of a single large one is an example of a so-called extended target. In classical theory for tracking aircraft, boats, and so on, targets are described as points, but many of today's increasingly accurate sensors generate several measurements from the same target. By giving targets a geometric extent or other attributes (such as dipole structures), one can not only improve the tracking algorithms and use sensor data more efficiently, but also classify the targets more effectively. The thesis proposes a model that describes the geometric shape more flexibly and in greater detail than previous models in the literature.

A completely different application studied is the use of machine learning to teach a computer to steer a planar pendulum to a desired position solely by analyzing the pixels in video images. The methodology consists of letting the computer study large numbers of images of a pendulum, in this case thousands, to learn the dynamics of how a known control signal affects the pendulum, so that it can act autonomously once the learning phase is complete. In the long run, the technique could be used to develop autonomous robots.

Place, publisher, year, edition, pages
Linköping University Electronic Press, 2015. 236 p.
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1723
Keywords
Localization, magnetic tracking, extended target tracking, signal processing, machine learning, Gaussian processes, deep dynamical model, discretization
National Category
Signal Processing
Identifiers
urn:nbn:se:liu:diva-122396 (URN)
10.3384/diss.diva-122396 (DOI)
978-91-7685-903-2 (ISBN)
Public defence
2015-12-04, Visionen, House B, Campus Valla, Linköping, 10:15 (English)
Projects
COOPLOC
Funder
Swedish Foundation for Strategic Research, COOP-LOC
Note

In the electronic version figure 2.2a is corrected.

Available from: 2015-11-03 Created: 2015-10-31 Last updated: 2015-11-30 (Bibliographically approved)

Open Access in DiVA

Full text: FULLTEXT01.pdf (application/pdf, 1145 kB)
Checksum (SHA-512): d3a84fb77a63238c610e3468d814138c872d3009a7edee9be9fe58fd33d8177adfc60158484a2c214b5a2a1f3964f1aeed7b242c0734aecd815f5cfcfa858e13

Authority records

Wahlström, Niklas
