Humanoids learning to walk: a natural CPG-actor-critic architecture
All authors: Högskolan i Skövde, Institutionen för kommunikation och information (University of Skövde, School of Communication and Information).
2013 (English). In: Frontiers in Neurorobotics, ISSN 1662-5218, Vol. 7, no. 5. Article in journal (Refereed), Published.
Abstract [en]

The identification of learning mechanisms for locomotion has been the subject of much research for some time, but many challenges remain. Dynamic systems theory (DST) offers a novel approach to humanoid learning through environmental interaction. Reinforcement learning (RL) has offered a promising method for adaptively linking the dynamic system to the environment it interacts with via a reward-based value system. In this paper, we propose a model that integrates these perspectives and apply it to the case of a humanoid (NAO) robot learning to walk, an ability that emerges from its value-based interaction with the environment. In the model, a simplified central pattern generator (CPG) architecture inspired by neuroscientific research and DST is integrated with an actor-critic approach to RL (cpg-actor-critic). In the cpg-actor-critic architecture, least-squares temporal-difference (LSTD) learning converges quickly to the optimal solution by using natural gradient learning and by balancing exploration and exploitation. Furthermore, rather than using a traditional (designer-specified) reward, it uses a dynamic value function as a stability indicator that adapts to the environment. The results obtained are analyzed using a novel DST-based embodied cognition approach. From this perspective, learning to walk is a process of integrating levels of sensorimotor activity and value.
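The abstract's critic relies on least-squares temporal-difference (LSTD) learning. The sketch below shows the textbook LSTD(0) estimator with linear value features; it is an illustrative assumption, not the authors' implementation, which additionally couples the critic to a natural-gradient actor and to CPG parameters. The function name `lstd0`, the ridge term `reg`, and the toy two-state chain are hypothetical.

```python
import numpy as np

def lstd0(transitions, n_features, gamma=0.95, reg=1e-6):
    """Textbook LSTD(0) estimate of critic weights w with V(s) ~= phi(s) @ w.

    transitions: iterable of (phi_s, reward, phi_next) tuples; phi_next is
    an all-zero vector when the episode terminates after that step.
    reg: small ridge term (an assumption, not from the paper) so that A
    stays invertible when few transitions have been seen.
    """
    A = reg * np.eye(n_features)
    b = np.zeros(n_features)
    for phi_s, reward, phi_next in transitions:
        phi_s = np.asarray(phi_s, dtype=float)
        phi_next = np.asarray(phi_next, dtype=float)
        # A accumulates phi (phi - gamma * phi')^T, b accumulates phi * r
        A += np.outer(phi_s, phi_s - gamma * phi_next)
        b += reward * phi_s
    # The TD(0) fixed point in closed form: solve A w = b
    return np.linalg.solve(A, b)


# Hypothetical two-state chain: s0 -> s1 (reward 0), s1 -> end (reward 1)
e0, e1, end = [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]
w = lstd0([(e0, 0.0, e1), (e1, 1.0, end)], n_features=2, gamma=0.9)
print(w)  # approximately [0.9, 1.0]
```

In an actor-critic, the critic's TD error r + gamma * V(s') - V(s) would then drive the actor's policy update; the natural-gradient actor described in the abstract is not reproduced here.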

Place, publisher, year, edition, pages
Frontiers Media S.A., 2013. Vol. 7, no. 5
Keyword [en]
reinforcement learning, humanoid walking, central pattern generators, actor-critic, dynamical systems theory, embodied cognition, value system
National Category
Computer and Information Science
URN: urn:nbn:se:liu:diva-106770
DOI: 10.3389/fnbot.2013.00005
PubMedID: 23675345
OAI: diva2:718756
Available from: 2013-08-08. Created: 2014-05-22. Last updated: 2014-11-12. Bibliographically approved.
In thesis
1. Reinforcement Learning of Locomotion based on Central Pattern Generators
2014 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Locomotion learning in robotics is an interesting and challenging area: the movement capabilities of animals have been investigated in depth, and the resulting knowledge has been transferred to modelling locomotion on robots. Modellers need to understand which structures can represent the locomotor systems of different animals and how those animals develop varied and dexterous locomotion capabilities. Notwithstanding the depth of research in the area, modelling locomotion calls for a deep rethinking.

In this thesis, under the umbrella of embodied cognition, neural-body-environment interaction is emphasised and regarded as the solution to locomotion learning/development. Central pattern generators (CPGs) are introduced in the first part (Chapter 2) to give a general account of the mechanisms of locomotor systems in animals. Drawing on an in-depth investigation of CPG structure and on inspiration from human infant development, a layered CPG architecture with interfaces for baseline motion generation and dynamics adaptation is proposed. In the second part, reinforcement learning (RL) is presented as a suitable method for locomotion learning from the perspectives of psychology, neuroscience and robotics (Chapter 4). Several continuous-space RL techniques (e.g. episodic natural actor-critic, policy learning by weighting exploration with the returns, and the continuous action space learning automaton) are introduced for practical use (Chapter 3). Combining CPGs and RL, the CPG-Actor-Critic architecture and concept are constructed. Finally, experimental work from the published papers is presented as the path of my PhD research (Chapter 5). This includes implementing CPGs on the NAO robot and learning crawling and walking with them. The implementation is also extended to test its generalizability to a different morphology (the ghostdog robot). The contribution of this thesis is discussed from two angles: the investigation of the CPG architecture and the implementation (Chapter 6).
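As a generic illustration of the CPG idea only (the thesis's layered architecture is not specified in this record), a CPG can be sketched as two phase oscillators with Kuramoto-style coupling that lock into anti-phase, as a left/right leg pair would. The function `simulate_cpg` and the values of the frequency `omega`, coupling gain `k`, and amplitude `amp` are hypothetical assumptions.

```python
import numpy as np

def simulate_cpg(omega=2 * np.pi, k=2.0, amp=0.3, dt=0.001, steps=5000,
                 phases=(0.0, 0.5)):
    """Two coupled phase oscillators pulled toward a phase offset of pi.

    d(theta_i)/dt = omega + k * sin(theta_j - theta_i - offset_ij)
    with offset_01 = pi and offset_10 = -pi (anti-phase locking).
    Returns the phase history and joint-angle targets amp * sin(theta).
    """
    theta = np.array(phases, dtype=float)
    history = np.empty((steps, 2))
    for t in range(steps):
        d0 = omega + k * np.sin(theta[1] - theta[0] - np.pi)
        d1 = omega + k * np.sin(theta[0] - theta[1] + np.pi)
        theta = theta + dt * np.array([d0, d1])  # explicit Euler step
        history[t] = theta
    return history, amp * np.sin(history)


history, targets = simulate_cpg()
phase_gap = (history[-1, 1] - history[-1, 0]) % (2 * np.pi)
print(round(phase_gap, 3))  # 3.142, i.e. the two outputs are in anti-phase
```

The phase difference obeys d(phi)/dt = 2k sin(phi), so phi = pi is the stable fixed point. A layered design would add inputs (for example sensory feedback modulating `omega` or `amp`) on top of such a baseline rhythm; that layering is the thesis's contribution and is not reproduced here.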

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2014. 71 p.
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1602
National Category
Computer Science
urn:nbn:se:liu:diva-105884 (URN)
978-91-7519-313-7 (print) (ISBN)
Public defence
2014-06-04, G110, hus G, Högskolan i Skövde, Skövde, 12:30 (English)
Available from: 2014-05-22. Created: 2014-04-11. Last updated: 2014-05-22. Bibliographically approved.

Open Access in DiVA
Full text: FULLTEXT01.pdf (application/pdf, 1368 kB), 99 downloads


Authors: Li, Cai; Lowe, Robert; Ziemke, Tom
