Intelligent Formation Control using Deep Reinforcement Learning
Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. (AIICS)
2018 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

In this thesis, deep reinforcement learning is applied to the problem of formation control to enhance performance. Current state-of-the-art formation control algorithms are often not adaptive and require a high degree of expertise to tune. By combining reinforcement learning with a behavior-based formation control algorithm, the entire dynamics of a group can be changed simply by tuning a reward function. In the experiments, a group of three agents moved toward a goal whose direct path was blocked by obstacles. The degree of randomness in the environment varied: in some experiments, the obstacle positions and agent start positions were fixed between episodes, whereas in others they were completely random. The greatest improvements were seen in environments that did not change between episodes; in these experiments, agents could more than double their performance with regard to the reward. These results could be applicable to both simulated agents and physical agents operating in static areas, such as farms or warehouses. By adjusting the reward function, agents could improve the speed with which they approach a goal, their obstacle avoidance, or a combination of the two. Two different and popular reinforcement learning algorithms were used in this work: Deep Double Q-Networks (DDQN) and Proximal Policy Optimization (PPO). Both algorithms showed similar success.
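The abstract describes reward shaping that trades off goal-seeking speed against obstacle avoidance; the record itself does not include the reward function, but a minimal per-agent sketch of that idea might look like the following (the weights w_goal and w_obstacle and the safe_radius parameter are hypothetical, not taken from the thesis):

    import numpy as np

    def formation_reward(agent_pos, goal_pos, obstacle_positions,
                         prev_goal_dist, w_goal=1.0, w_obstacle=0.5,
                         safe_radius=1.0):
        """Reward one agent for progress toward the goal, minus a penalty
        for getting too close to the nearest obstacle. Tuning the weights
        shifts the group's behavior between fast goal approach and
        cautious obstacle avoidance."""
        goal_dist = np.linalg.norm(goal_pos - agent_pos)
        # Positive when the agent moved closer to the goal this step
        progress = prev_goal_dist - goal_dist

        # Penalize intruding into the safety radius of the nearest obstacle
        nearest = min(np.linalg.norm(obs - agent_pos) for obs in obstacle_positions)
        penalty = max(0.0, safe_radius - nearest)

        reward = w_goal * progress - w_obstacle * penalty
        return reward, goal_dist

In such a setup, either DDQN or PPO would optimize a policy against this scalar signal, which is consistent with the abstract's claim that both algorithms achieved similar results.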

Place, publisher, year, edition, pages
2018, p. 53
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:liu:diva-152687
ISRN: LIU-IDA/LITH-EX-A--2017/001--SE
OAI: oai:DiVA.org:liu-152687
DiVA, id: diva2:1263015
Subject / course
Computer Engineering
Available from: 2018-11-14. Created: 2018-11-14. Last updated: 2019-12-06. Bibliographically approved.

Open Access in DiVA

fulltext (2703 kB), 190 downloads
File information
File name: FULLTEXT01.pdf. File size: 2703 kB. Checksum: SHA-512
200495431e67ab57b8c12a75cb9f9579c98aab00e6af4493e9f4c02ce100a0c96bdb4dece43b03bbabec8d3749fb3ccf0b7f999b597e5bf22efdae00f17627dc
Type: fulltext. Mimetype: application/pdf

By organisation
Artificial Intelligence and Integrated Computer Systems
Engineering and Technology

Total: 190 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 875 hits