Reinforcement Learning for Improved Utility of Simulation-Based Training
Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, Faculty of Science & Engineering.
ORCID iD: 0000-0002-4144-4893
2023 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

Team training in complex domains often requires substantial resources, e.g. vehicles, machines, and role-players. For this reason, it may be difficult to realise efficient and effective training scenarios in a real-world setting. Instead, part of the training can be conducted in synthetic, computer-generated environments. In these environments, trainees can operate simulators instead of real vehicles, while synthetic actors can replace human role-players to increase the complexity of the simulated scenario at low operating cost. However, constructing behaviour models for synthetic actors is challenging, especially for the end users, who typically do not have expertise in artificial intelligence. In this dissertation, we study how machine learning can be used to simplify the construction of intelligent agents for simulation-based training. A simulation-based air combat training system is used as a case study. 

The contributions of the dissertation are divided into two parts. The first part aims at improving the understanding of reinforcement learning in the domain of simulation-based training. First, a user study is conducted to identify important capabilities and characteristics of learning agents that are intended to support training of fighter pilots. The study finds that one of the most important capabilities of learning agents in simulation-based training is that their behaviour can be adapted to different phases of training, as well as to the training needs of individual human trainees. Second, methods for learning how to coordinate with other agents are studied in simplified training scenarios, to investigate how the design of the agent’s observation space, action space, and reward signal affects learning performance. The experiments indicate that temporal abstractions and hierarchical reinforcement learning can improve the efficiency of learning, while also providing support for modelling of doctrinal behaviour. In more complex settings, curriculum learning and related methods are expected to help find novel tactics even when sparse, abstract reward signals are used. Third, based on the results from the user study and the practical experiments, a system concept for a user-adaptive training system is developed to support further research. 

The second part of the contributions focuses on methods for utility-based multi-objective reinforcement learning, which incorporates knowledge of the user’s utility function in the search for policies that balance multiple conflicting objectives. Two new agents for multi-objective reinforcement learning are proposed: the Tunable Actor (T-Actor) and the Multi-Objective Dreamer (MO-Dreamer). T-Actor provides decision support to instructors by learning a set of Pareto optimal policies, represented by a single neural network conditioned on objective preferences. This enables tuning of the agent’s behaviour to fit trainees’ current training needs. Experimental evaluations in gridworlds and in the target system show that T-Actor reduces the number of training steps required for learning. MO-Dreamer adapts online to changes in users’ utility, e.g. changes in training needs. It does so by learning a model of the environment, which it can use for anticipatory rollouts with a diverse set of utility functions to explore which policy to follow to optimise the return for a given set of objective preferences. An experimental evaluation shows that MO-Dreamer outperforms prior model-free approaches in terms of experienced regret, for frequent as well as sparse changes in utility. 
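
The idea of selecting among Pareto optimal policies according to a user's objective preferences, and of measuring regret when those preferences change, can be illustrated with a minimal sketch. This is not the T-Actor or MO-Dreamer implementation: the linear utility function, the two objectives, the policy names, and the return values below are all hypothetical assumptions chosen for illustration, and a lookup table stands in for the preference-conditioned neural network described in the abstract.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical expected returns per policy over two objectives,
# e.g. (mission effectiveness, safety). Values are made up for illustration.
POLICY_RETURNS: Dict[str, Tuple[float, float]] = {
    "aggressive": (10.0, 2.0),
    "balanced": (7.0, 7.0),
    "cautious": (2.0, 10.0),
    "dominated": (5.0, 5.0),
}


def linear_utility(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Scalarise a vector return with preference weights.

    A linear utility is an assumption made here; other utility
    functions are possible.
    """
    return sum(r * w for r, w in zip(returns, weights))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates (maximisation)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def best_policy(policy_returns: Dict[str, Tuple[float, ...]],
                weights: Sequence[float]) -> str:
    """Pick the policy with the highest utility for the given preferences."""
    return max(policy_returns,
               key=lambda name: linear_utility(policy_returns[name], weights))


def regret(policy_returns: Dict[str, Tuple[float, ...]],
           chosen: str, weights: Sequence[float]) -> float:
    """Utility lost by following `chosen` instead of the best policy."""
    best = best_policy(policy_returns, weights)
    return (linear_utility(policy_returns[best], weights)
            - linear_utility(policy_returns[chosen], weights))


if __name__ == "__main__":
    print("Pareto front:", pareto_front(list(POLICY_RETURNS.values())))
    for w in [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8)]:
        print(w, "->", best_policy(POLICY_RETURNS, w))
    # Regret if the agent keeps the old policy after preferences shift.
    print("regret:", regret(POLICY_RETURNS, "aggressive", (0.2, 0.8)))
```

In this toy setting, changing the preference weights switches which Pareto optimal policy is selected, and the regret function quantifies the cost of continuing with a policy that was chosen for outdated preferences.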

Overall, the research conducted in this dissertation contributes to improved knowledge about how to apply machine learning methods to the construction of simulation-based training environments. While our focus was on air combat training, the results are general enough to be applicable in other domains. 

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2023. p. 168
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2351
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-198923
DOI: 10.3384/9789180753678
ISBN: 9789180753661 (print)
ISBN: 9789180753678 (electronic)
OAI: oai:DiVA.org:liu-198923
DiVA, id: diva2:1809077
Public defence
2023-12-08, Ada Lovelace, B-building, Campus Valla, Linköping, 13:15 (English)
Opponent
Supervisors
Note

2023-11-02: The thesis was first published online. The online published version reflects the printed version.

2023-11-15: The PDF-file has been replaced by a new file from LiU-Print to enable speech-to-text functionality and to allow text copying. Before this date the PDF has been downloaded 40 times.

Funding: This work was partially supported by the Swedish Governmental Agency for Innovation Systems (grant NFFP7/2017-04885), and the Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. The computations were enabled by the resources provided by the Swedish National Infrastructure for Computing (SNIC) at Tetralith/NSC partially funded by the Swedish Research Council through grant agreement no. 2020/5-230, as well as the supercomputing resource Berzelius provided by the National Supercomputer Centre at Linköping University and the Knut and Alice Wallenberg foundation.

Available from: 2023-11-02. Created: 2023-11-02. Last updated: 2023-11-15. Bibliographically approved.

Open Access in DiVA

Full text (33438 kB), 4226 downloads
File information
File name: FULLTEXT02.pdf
File size: 33438 kB
Checksum (SHA-512): edd5fc77aebff1bfb5dcdec5daf606d585483375c37dd16b5f3719b094845bd38285c9436c756e36f82637b71dd0235423008716c71b8c1533e233fcd28708ba
Type: fulltext
Mimetype: application/pdf
Authority records

Källström, Johan
