Learning General Policies with Policy Gradient Methods
Ståhlberg, Simon: Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems; Linköping University, Faculty of Science & Engineering (Representation, Learning and Planning Group). ORCID iD: 0000-0002-4092-8175
Universitat Pompeu Fabra, Spain.
Geffner, Hector: Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems; Linköping University, Faculty of Science & Engineering; RWTH Aachen University, Germany. ORCID iD: 0000-0001-9851-8219
2023 (English). In: Proceedings of the 20th International Conference on Principles of Knowledge Representation and Reasoning / [ed] Pierre Marquis, Tran Cao Son, Gabriele Kern-Isberner, 2023, p. 647-657. Conference paper, Published paper (Refereed).
Abstract [en]

While reinforcement learning methods have delivered remarkable results in a number of settings, generalization, i.e., the ability to produce policies that generalize in a reliable and systematic way, has remained a challenge. The problem of generalization has been addressed formally in classical planning where provably correct policies that generalize over all instances of a given domain have been learned using combinatorial methods. The aim of this work is to bring these two research threads together to illuminate the conditions under which (deep) reinforcement learning approaches, and in particular, policy optimization methods, can be used to learn policies that generalize like combinatorial methods do. We draw on lessons learned from previous combinatorial and deep learning approaches, and extend them in a convenient way. From the former, we model policies as state transition classifiers, as (ground) actions are not general and change from instance to instance. From the latter, we use graph neural networks (GNNs) adapted to deal with relational structures for representing value functions over planning states, and in our case, policies. With these ingredients in place, we find that actor-critic methods can be used to learn policies that generalize almost as well as those obtained using combinatorial approaches while avoiding the scalability bottleneck and the use of feature pools. Moreover, the limitations of the DRL methods on the benchmarks considered have little to do with deep learning or reinforcement learning algorithms, and result from the well-understood expressive limitations of GNNs, and the tradeoff between optimality and generalization (general policies cannot be optimal in some domains). Both of these limitations are addressed without changing the basic DRL methods by adding derived predicates and an alternative cost structure to optimize.
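The idea of modeling a policy as a state transition classifier, rather than a classifier over ground actions, can be sketched as follows. This is an illustrative sketch, not the authors' code: `features` is a hypothetical stand-in for the relational GNN representation the paper actually uses, and states are represented here simply as sets of ground atoms.

```python
import math
import random

def features(state):
    # Hypothetical hand-crafted features for illustration only; the paper
    # learns a representation with a GNN over the relational structure.
    return [len(state), sum(1 for atom in state if atom.startswith("on"))]

def score(state, weights):
    # Linear score of a candidate state under the current parameters.
    return sum(w * f for w, f in zip(weights, features(state)))

def transition_policy(state, successors, weights):
    """Softmax distribution over candidate successor states s' of s.

    Scoring successor states rather than ground actions keeps the policy
    general: ground actions change from instance to instance, while the
    (relational) structure of states does not.
    """
    scores = [score(s, weights) for s in successors]
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]  # subtract max for stability
    z = sum(exps)
    return [e / z for e in exps]

def act(state, successors, weights, rng=random):
    # Sample the next state from the transition distribution.
    probs = transition_policy(state, successors, weights)
    return rng.choices(successors, weights=probs, k=1)[0]
```

In the paper these scores come from a GNN and the parameters are trained with actor-critic policy gradient methods; the sketch only shows the structural point that the policy is a distribution over state transitions.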

Place, publisher, year, edition, pages
2023. p. 647-657
Keywords [en]
Reasoning about actions and change, action languages, Learning action theories, Symbolic reinforcement learning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-198747
DOI: 10.24963/kr.2023/63
ISBN: 9781956792027 (print)
OAI: oai:DiVA.org:liu-198747
DiVA, id: diva2:1807281
Conference
20th International Conference on Principles of Knowledge Representation and Reasoning (KR 2023), Rhodes, Greece, September 2-8, 2023
Available from: 2023-10-25. Created: 2023-10-25. Last updated: 2023-11-09. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text

Authority records

Ståhlberg, Simon; Geffner, Hector

