Publications (10 of 11)
Drexler, D., Ståhlberg, S., Bonet, B. & Geffner, H. (2024). Equivalence-Based Abstractions for Learning General Policies. In: Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning: . Paper presented at 34th International Conference on Automated Planning and Scheduling.
Equivalence-Based Abstractions for Learning General Policies
2024 (English). In: Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Identifying state symmetries plays a crucial role in minimizing the number of states explored during search, yet identifying precisely all symmetries is computationally hard. In the context of learning general policies that solve instances of arbitrary size from small instances, however, this computational bottleneck is not a problem. In this paper, we address the task of identifying all state symmetries through the lens of the graph isomorphism problem. To accomplish this, we represent states as undirected, labeled graphs that reflect the relational structure of states and the goal. We then use off-the-shelf graph isomorphism algorithms to determine whether two states are isomorphic with respect to the goal. The isomorphism relation forms equivalence classes that result in an abstract state space that can be used instead of the original one to learn general policies more efficiently. While this abstract state space can be used for many different learning tasks, we focus on learning symbolic general policies, where we show that the proposed approach can lead to significant speedups.
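As a concrete illustration of the equivalence-class construction this abstract describes, the sketch below groups relational states that differ only by an object renaming. It is a hypothetical, stdlib-only stand-in: the paper encodes states as labeled graphs and runs off-the-shelf graph-isomorphism algorithms, whereas this sketch brute-forces object permutations, which is feasible only for the small training instances the approach targets.

```python
from itertools import permutations

def isomorphic(atoms1, atoms2):
    """Is there an object renaming that maps atoms1 onto atoms2?

    States are sets of (predicate, object, ...) tuples.  Brute force
    over permutations -- feasible only for small instances, which is
    exactly the setting in which general policies are learned.
    """
    objs1 = sorted({o for a in atoms1 for o in a[1:]})
    objs2 = sorted({o for a in atoms2 for o in a[1:]})
    if len(atoms1) != len(atoms2) or len(objs1) != len(objs2):
        return False
    for perm in permutations(objs2):
        renaming = dict(zip(objs1, perm))
        if {(a[0], *(renaming[o] for o in a[1:])) for a in atoms1} == set(atoms2):
            return True
    return False

def equivalence_classes(states):
    """Group state indices into classes of pairwise-symmetric states."""
    classes = []  # (representative state, indices of its members)
    for i, state in enumerate(states):
        for rep, members in classes:
            if isomorphic(state, rep):
                members.append(i)
                break
        else:
            classes.append((state, [i]))
    return [members for _, members in classes]

states = [
    {("on", "a", "b")},                    # a on b
    {("on", "c", "d")},                    # same structure, renamed
    {("on", "a", "b"), ("on", "b", "c")},  # a different structure
]
print(equivalence_classes(states))  # -> [[0, 1], [2]]
```

Learning then proceeds over one representative per class rather than over every state.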

National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-215814 (URN)
Conference
34th International Conference on Automated Planning and Scheduling
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); EU, Horizon 2020, 952215; European Commission, 885107
Available from: 2025-06-29 Created: 2025-06-29 Last updated: 2025-08-13
Funkquist, M., Ståhlberg, S. & Geffner, H. (2024). Learning to Ground Existentially Quantified Goals. In: Proceedings of the Twenty-First International Conference on Principles of Knowledge Representation and Reasoning (KR 2024): . Paper presented at Conference on Principles of Knowledge Representation and Reasoning (KR 2024).
Learning to Ground Existentially Quantified Goals
2024 (English). In: Proceedings of the Twenty-First International Conference on Principles of Knowledge Representation and Reasoning (KR 2024), 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Goal instructions for autonomous AI agents cannot assume that objects have unique names. Instead, objects in goals must be referred to by providing suitable descriptions. However, this raises problems in both classical planning and generalized planning. The standard approach to handling existentially quantified goals in classical planning involves compiling them into a DNF formula that encodes all possible variable bindings and adding dummy actions to map each DNF term into the new, dummy goal. This preprocessing is exponential in the number of variables. In generalized planning, the problem is different: even if general policies can deal with any initial situation and goal, executing a general policy requires the goal to be grounded to define a value for the policy features. The problem of grounding goals, namely finding the objects to bind the goal variables, is subtle: it is a generalization of classical planning, which is a special case when there are no goal variables to bind, and constraint reasoning, which is a special case when there are no actions. In this work, we address the goal grounding problem with a novel supervised learning approach. A GNN architecture, trained to predict the cost of partially quantified goals over small domain instances, is tested on larger instances involving more objects and different quantified goals. The proposed architecture is evaluated experimentally over several planning domains where generalization is tested along several dimensions, including the number of goal variables and objects that can bind such variables. The scope of the approach is also discussed in light of the known relationship between GNNs and C₂ logics.

Series
arXiv.org ; 2409.20259
Keywords
classical planning, gnn, deep learning, artificial intelligence
National Category
Artificial Intelligence
Identifiers
urn:nbn:se:liu:diva-214384 (URN); 10.24963/kr.2024/80 (DOI)
Conference
Conference on Principles of Knowledge Representation and Reasoning (KR 2024)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2025-06-06 Created: 2025-06-06 Last updated: 2025-06-06
Drexler, D., Ståhlberg, S., Bonet, B. & Geffner, H. (2024). Symmetries and Expressive Requirements for Learning General Policies. In: Proceedings of the 21st International Conference on Principles of Knowledge Representation and Reasoning (KR2024): . Paper presented at 21st International Conference on Principles of Knowledge Representation and Reasoning, Hanoi, Vietnam, November 2-8, 2024.
Symmetries and Expressive Requirements for Learning General Policies
2024 (English). In: Proceedings of the 21st International Conference on Principles of Knowledge Representation and Reasoning (KR2024), 2024. Conference paper, Published paper (Refereed)
Abstract [en]

State symmetries play an important role in planning and generalized planning. In the first case, state symmetries can be used to reduce the size of the search; in the second, to reduce the size of the training set. In the case of general planning, however, it is also critical to distinguish non-symmetric states, i.e., states that represent non-isomorphic relational structures. However, while the language of first-order logic distinguishes non-symmetric states, the languages and architectures used to represent and learn general policies do not. In particular, recent approaches for learning general policies use state features derived from description logics or learned via graph neural networks (GNNs) that are known to be limited by the expressive power of C2, first-order logic with two variables and counting. In this work, we address the problem of detecting symmetries in planning and generalized planning and use the results to assess the expressive requirements for learning general policies over various planning domains. For this, we map planning states to plain graphs, run off-the-shelf algorithms to determine whether two states are isomorphic with respect to the goal, and run coloring algorithms to determine if C2 features computed logically or via GNNs distinguish non-isomorphic states. Symmetry detection results in more effective learning, while the failure to detect non-symmetries prevents general policies from being learned at all in certain domains.

National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-207393 (URN)
Conference
21st International Conference on Principles of Knowledge Representation and Reasoning, Hanoi, Vietnam, November 2-8, 2024
Funder
Knut and Alice Wallenberg Foundation; EU, European Research Council, 885107
Available from: 2024-09-07 Created: 2024-09-07 Last updated: 2025-02-27
Ståhlberg, S., Bonet, B. & Geffner, H. (2023). Learning General Policies with Policy Gradient Methods. In: Pierre Marquis, Tran Cao Son, Gabriele Kern-Isberner (Ed.), Proceedings of the 20th International Conference on Principles of Knowledge Representation and Reasoning: . Paper presented at 20th International Conference on Principles of Knowledge Representation and Reasoning (KR 2023), Rhodes, Greece, September 2-8, 2023 (pp. 647-657). International Joint Conferences on Artificial Intelligence
Learning General Policies with Policy Gradient Methods
2023 (English). In: Proceedings of the 20th International Conference on Principles of Knowledge Representation and Reasoning / [ed] Pierre Marquis, Tran Cao Son, Gabriele Kern-Isberner, International Joint Conferences on Artificial Intelligence, 2023, p. 647-657. Conference paper, Published paper (Refereed)
Abstract [en]

While reinforcement learning methods have delivered remarkable results in a number of settings, generalization, i.e., the ability to produce policies that generalize in a reliable and systematic way, has remained a challenge. The problem of generalization has been addressed formally in classical planning where provable correct policies that generalize over all instances of a given domain have been learned using combinatorial methods. The aim of this work is to bring these two research threads together to illuminate the conditions under which (deep) reinforcement learning approaches, and in particular, policy optimization methods, can be used to learn policies that generalize like combinatorial methods do. We draw on lessons learned from previous combinatorial and deep learning approaches, and extend them in a convenient way. From the former, we model policies as state transition classifiers, as (ground) actions are not general and change from instance to instance. From the latter, we use graph neural networks (GNNs) adapted to deal with relational structures for representing value functions over planning states, and in our case, policies. With these ingredients in place, we find that actor-critic methods can be used to learn policies that generalize almost as well as those obtained using combinatorial approaches while avoiding the scalability bottleneck and the use of feature pools. Moreover, the limitations of the DRL methods on the benchmarks considered have little to do with deep learning or reinforcement learning algorithms, and result from the well-understood expressive limitations of GNNs, and the tradeoff between optimality and generalization (general policies cannot be optimal in some domains). Both of these limitations are addressed without changing the basic DRL methods by adding derived predicates and an alternative cost structure to optimize.

Place, publisher, year, edition, pages
International Joint Conferences on Artificial Intelligence, 2023
Series
Proceedings of the Conference on Principles of Knowledge Representation and Reasoning, ISSN 2334-1033
Keywords
Reasoning about actions and change, action languages, Learning action theories, Symbolic reinforcement learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-198747 (URN); 10.24963/kr.2023/63 (DOI); 9781956792027 (ISBN)
Conference
20th International Conference on Principles of Knowledge Representation and Reasoning (KR 2023), Rhodes, Greece, September 2-8, 2023
Available from: 2023-10-25 Created: 2023-10-25 Last updated: 2025-11-13. Bibliographically approved
Ståhlberg, S. (2023). Lifted Successor Generation by Maximum Clique Enumeration. In: Kobi Gal, Ann Nowé, Grzegorz J. Nalepa, Roy Fairstein, Roxana Rădulescu (Ed.), ECAI 2023: . Paper presented at 26th European Conference on Artificial Intelligence ECAI 2023, Kraków, Poland, 30 september - 4 october, 2023 (pp. 2194-2201). IOS Press, 372
Lifted Successor Generation by Maximum Clique Enumeration
2023 (English). In: ECAI 2023 / [ed] Kobi Gal, Ann Nowé, Grzegorz J. Nalepa, Roy Fairstein, Roxana Rădulescu, IOS Press, 2023, Vol. 372, p. 2194-2201. Conference paper, Published paper (Refereed)
Abstract [en]

Classical planning instances are often represented using first-order logic; however, the initial step for most classical planners is to transform the given instance into a propositional representation. For example, action schemas are converted into ground actions, aiming to generate as few ground actions as possible without eliminating any viable solutions to the problem. This step can become a bottleneck in some domains due to the exponential blowup caused by the grounding process. A recent approach to alleviate this issue involves using the lifted (first-order) representation of the instance and generating all applicable ground actions on the fly during the search for each expanded state. In this paper, we propose a method that addresses this problem by enumerating all maximum cliques of a graph encoding the state and the action schema's preconditions. We compare our method with the state of the art across 47 domains, showcasing improved performance in 23 domains. In some cases, simply changing the maximum clique enumeration algorithm results in a significant speedup compared to the state of the art.
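The clique-based view of grounding can be sketched as follows. This is a hypothetical simplification, not the paper's implementation: vertices pair each precondition with a matching state atom, edges connect consistent variable bindings, and a clique that covers every precondition yields an applicable ground action. Here the cliques are enumerated by plain backtracking rather than a dedicated maximum-clique algorithm.

```python
def match(precond, atom):
    """Bind a precondition's variables to an atom's objects, or None."""
    if atom[0] != precond[0] or len(atom) != len(precond):
        return None
    binding = {}
    for var, obj in zip(precond[1:], atom[1:]):
        if binding.get(var, obj) != obj:
            return None
        binding[var] = obj
    return binding

def consistent(b1, b2):
    """Two partial bindings agree on every shared variable."""
    return all(b2.get(var, val) == val for var, val in b1.items())

def applicable_groundings(preconds, state):
    """Enumerate the variable bindings that satisfy all preconditions.

    Vertex (i, b): precondition i matched against some state atom with
    binding b.  A 'clique' of pairwise-consistent vertices covering
    every precondition is an applicable grounding of the schema.
    """
    verts = []
    for i, p in enumerate(preconds):
        for atom in sorted(state):
            b = match(p, atom)
            if b is not None:
                verts.append((i, b))
    results = []

    def extend(binding, want, start):
        if want == len(preconds):        # every precondition covered
            results.append(dict(binding))
            return
        for j in range(start, len(verts)):
            i, b = verts[j]
            if i == want and consistent(binding, b):
                extend({**binding, **b}, want + 1, j + 1)

    extend({}, 0, 0)
    return results

# An unstack-style schema: on(x, y) and clear(x) must both hold.
pre = [("on", "x", "y"), ("clear", "x")]
state = {("on", "a", "b"), ("on", "b", "c"), ("clear", "a")}
print(applicable_groundings(pre, state))  # -> [{'x': 'a', 'y': 'b'}]
```

The binding for on(b, c) is pruned because no consistent clear(b) vertex exists, which mirrors how the compatibility graph rules out inapplicable groundings.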

Place, publisher, year, edition, pages
IOS Press, 2023
Series
Frontiers in Artificial Intelligence and Applications, ISSN 0922-6389, E-ISSN 1879-8314 ; 372
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-198746 (URN); 10.3233/FAIA230516 (DOI); 001599323300270 (); 2-s2.0-85175840167 (Scopus ID); 9781643684369 (ISBN); 9781643684376 (ISBN)
Conference
26th European Conference on Artificial Intelligence ECAI 2023, Kraków, Poland, 30 September - 4 October 2023
Note

Funding agencies: Wallenberg AI, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation; Swedish Research Council [2022-06725, 2018-05973]

Available from: 2023-10-25 Created: 2023-10-25 Last updated: 2026-02-05. Bibliographically approved
Ståhlberg, S., Bonet, B. & Geffner, H. (2022). Learning General Optimal Policies with Graph Neural Networks: Expressive Power, Transparency, and Limits. In: Akshat Kumar, Sylvie Thiébaux, Pradeep Varakantham, William Yeoh (Ed.), Proceedings of the Thirty-Second International Conference on Automated Planning and Scheduling (ICAPS 2022): . Paper presented at 32nd International Conference on Automated Planning and Scheduling, June 13 - 24, 2022 (pp. 629-637). Palo Alto: AAAI Press, 32
Learning General Optimal Policies with Graph Neural Networks: Expressive Power, Transparency, and Limits
2022 (English). In: Proceedings of the Thirty-Second International Conference on Automated Planning and Scheduling (ICAPS 2022) / [ed] Akshat Kumar, Sylvie Thiébaux, Pradeep Varakantham, William Yeoh, Palo Alto: AAAI Press, 2022, Vol. 32, p. 629-637. Conference paper, Published paper (Refereed)
Abstract [en]

It has been recently shown that general policies for many classical planning domains can be expressed and learned in terms of a pool of features defined from the domain predicates using a description logic grammar. At the same time, most description logics correspond to a fragment of k-variable counting logic (Ck) for k = 2, which has been shown to provide a tight characterization of the expressive power of graph neural networks. In this work, we make use of these results to understand the power and limits of using graph neural networks (GNNs) for learning optimal general policies over a number of tractable planning domains where such policies are known to exist. For this, we train a simple GNN in a supervised manner to approximate the optimal value function V*(s) of a number of sample states s. As predicted by the theory, it is observed that general optimal policies are obtained in domains where general optimal value functions can be defined with C2 features but not in those requiring more expressive C3 features. In addition, it is observed that the features learned are in close correspondence with the features needed to express V* in closed form. The theory and the analysis of the domains let us understand the features that are actually learned as well as those that cannot be learned in this way, and let us move in a principled manner from a combinatorial optimization approach to learning general policies to a potentially more robust and scalable approach based on deep learning.

Place, publisher, year, edition, pages
Palo Alto: AAAI Press, 2022
Series
Proceedings of the International Conference on Automated Planning and Scheduling, ISSN 2334-0835, E-ISSN 2334-0843
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-188842 (URN); 10.1609/icaps.v32i1.19851 (DOI); 2-s2.0-85130563464 (Scopus ID); 9781577358749 (ISBN)
Conference
32nd International Conference on Automated Planning and Scheduling, June 13 - 24, 2022
Available from: 2022-09-27 Created: 2022-09-27 Last updated: 2025-09-18. Bibliographically approved
Ståhlberg, S., Bonet, B. & Geffner, H. (2022). Learning Generalized Policies without Supervision Using GNNs. In: Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning: Special Session on KR and Machine Learning. Paper presented at 19th International Conference on Principles of Knowledge Representation and Reasoning, Haifa, Israel, July 31 - August 5, 2022 (pp. 474-483). IJCAI Organization
Learning Generalized Policies without Supervision Using GNNs
2022 (English). In: Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning: Special Session on KR and Machine Learning, IJCAI Organization, 2022, p. 474-483. Conference paper, Published paper (Refereed)
Abstract [en]

We consider the problem of learning generalized policies for classical planning domains using graph neural networks from small instances represented in lifted STRIPS. The problem has been considered before, but the proposed neural architectures are complex and the results are often mixed. In this work, we use a simple and general GNN architecture and aim at obtaining crisp experimental results and a deeper understanding: either the policy greedy in the learned value function achieves close to 100% generalization over instances larger than those used in training, or the failure must be understood, and possibly fixed, logically. For this, we exploit the relation established between the expressive power of GNNs and the C2 fragment of first-order logic (namely, FOL with 2 variables and counting quantifiers). We find, for example, that domains with general policies that require more expressive features can be solved with GNNs once the states are extended with suitable "derived atoms" encoding role compositions and transitive closures that do not fit into C2. The work follows an existing approach based on GNNs for learning optimal general policies in a supervised fashion, but the learned policies are no longer required to be optimal (which expands the scope, as many planning domains do not have general optimal policies) and are learned without supervision. Interestingly, value-based reinforcement learning methods that aim to produce optimal policies do not always yield policies that generalize, as the goals of optimality and generality are in conflict in domains where optimal planning is NP-hard.
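The "policy greedy in the learned value function" mentioned above reduces, at execution time, to stepping to the successor state that the value function ranks best. A minimal sketch, with V standing in for the learned GNN value function (all names here are illustrative):

```python
def greedy_policy(state, successors, V):
    """One step of the policy greedy in a learned value function V.

    Moves to the successor state that V ranks closest to the goal.
    V is a stand-in for the GNN value function; any callable works.
    """
    return min(successors(state), key=V)

def rollout(state, successors, V, is_goal, max_steps=100):
    """Execute the greedy policy until a goal state or the step bound."""
    path = [state]
    while not is_goal(state) and len(path) <= max_steps:
        state = greedy_policy(state, successors, V)
        path.append(state)
    return path

# Toy domain: states are integers, the goal is 0, and the (perfect)
# value function is the distance to the goal.
print(rollout(3, lambda n: [n - 1, n + 1], abs, lambda n: n == 0))  # -> [3, 2, 1, 0]
```

Generalization then hinges entirely on whether V ranks successors correctly on instances larger than those seen in training, which is exactly what the C2 expressiveness analysis probes.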

Place, publisher, year, edition, pages
IJCAI Organization, 2022
Series
Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning (KR), ISSN 2334-1033
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-188843 (URN); 10.24963/kr.2022/49 (DOI); 2-s2.0-85141885575 (Scopus ID); 9781956792010 (ISBN)
Conference
19th International Conference on Principles of Knowledge Representation and Reasoning, Haifa, Israel, July 31 - August 5, 2022
Available from: 2022-09-27 Created: 2022-09-27 Last updated: 2024-08-22. Bibliographically approved
Ståhlberg, S., Francès, G. & Seipp, J. (2021). Learning Generalized Unsolvability Heuristics for Classical Planning. In: Zhi-Hua Zhou (Ed.), Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21): . Paper presented at The Thirtieth International Joint Conference on Artificial Intelligence, Montreal, 19-27 August 2021 (pp. 4175-4181). International Joint Conferences on Artificial Intelligence (IJCAI)
Learning Generalized Unsolvability Heuristics for Classical Planning
2021 (English). In: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21) / [ed] Zhi-Hua Zhou, International Joint Conferences on Artificial Intelligence (IJCAI), 2021, p. 4175-4181. Conference paper, Published paper (Refereed)
Abstract [en]

Recent work in classical planning has introduced dedicated techniques for detecting unsolvable states, i.e., states from which no goal state can be reached. We approach the problem from a generalized planning perspective and learn first-order-like formulas that characterize unsolvability for entire planning domains. We show how to cast the problem as a self-supervised classification task. Our training data is automatically generated and labeled by exhaustive exploration of small instances of each domain, and candidate features are automatically computed from the predicates used to define the domain. We investigate three learning algorithms with different properties and compare them to heuristics from the literature. Our empirical results show that our approach often captures important classes of unsolvable states with high classification accuracy. Additionally, the logical form of our heuristics makes them easy to interpret and reason about, and can be used to show that the characterizations learned in some domains capture exactly all unsolvable states of the domain. 
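The training-data generation step described above, labeling every state of a small instance as solvable or unsolvable, can be sketched by backward reachability from the goal states over the exhaustively explored transition graph. A hypothetical stand-in, not the authors' code:

```python
from collections import deque

def label_unsolvable(states, succ, goals):
    """Map each state to True iff no goal is reachable from it.

    Intended for the exhaustively explored state space of a small
    instance: invert the transition relation, then BFS backward from
    the goal states; everything the BFS never reaches is unsolvable.
    """
    preds = {s: set() for s in states}      # inverted transitions
    for s in states:
        for t in succ(s):
            preds[t].add(s)
    solvable = set(goals)
    queue = deque(solvable)
    while queue:
        t = queue.popleft()
        for s in preds[t]:
            if s not in solvable:
                solvable.add(s)
                queue.append(s)
    return {s: s not in solvable for s in states}

# A three-state chain to the goal 2, plus an isolated dead end 3.
trans = {0: [1], 1: [2], 2: [], 3: []}
print(label_unsolvable([0, 1, 2, 3], trans.__getitem__, {2}))
# -> {0: False, 1: False, 2: False, 3: True}
```

The resulting labels, paired with automatically derived candidate features, give the self-supervised classification dataset the abstract refers to.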

Place, publisher, year, edition, pages
International Joint Conferences on Artificial Intelligence (IJCAI), 2021
Series
Proceedings of the International Joint Conference on Artificial Intelligence, ISSN 1045-0823
Keywords
Automated planning, Learning, Artificial Intelligence, WASP
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-179326 (URN); 10.24963/ijcai.2021/574 (DOI); 001202335504034 (); 2-s2.0-85123009540 (Scopus ID); 9780999241196 (ISBN)
Conference
The Thirtieth International Joint Conference on Artificial Intelligence, Montreal, 19-27 August 2021
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish National Infrastructure for Computing (SNIC), 2018-05973
Available from: 2021-09-17 Created: 2021-09-17 Last updated: 2024-09-23. Bibliographically approved
Ståhlberg, S. (2017). Methods for Detecting Unsolvable Planning Instances using Variable Projection. (Doctoral dissertation). Linköping: Linköping University Electronic Press
Methods for Detecting Unsolvable Planning Instances using Variable Projection
2017 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

In this thesis we study automated planning, a branch of artificial intelligence, which deals with the construction of plans. A plan is typically an action sequence that achieves some specific goal. In particular, we study unsolvable planning instances, i.e., instances for which no plan exists. Historically, this topic has been neglected by the planning community, and up to recently the International Planning Competition has only evaluated planners on solvable planning instances. For many applications we can know, e.g. by design, that there is a solution, but this cannot be a general assumption. One example is penetration testing in computer security, where a system is considered safe if there is no plan for intrusion. Other examples are resource-bounded planning instances that have insufficient resources to achieve the goal.

The main theme of this thesis is to use variable projection to prove unsolvability of planning instances. We implement and evaluate two planners: the first checks variable projections with the goal of finding an unsolvable projection, and the second builds a pattern collection to provide dead-end detection. In addition to comparing the planners to existing planners, we also utilise a large computer cluster to statistically assess whether they can be optimised further. On the benchmarks of planning instances that we used, it turns out that further improvement is likely to come from supplementary techniques rather than optimisation. We pursued this and enhanced variable projections with mutexes, which yielded a very competitive planner. We also inspect whether unsolvable variable projections tend to be composed of variables that play different roles, i.e. they are not 'similar'. We devise a variable similarity measure to rate how similar two variables are on a scale, and statistically analyse it. The measure can differentiate between unsolvable and solvable planning instances quite well, and is integrated into our planners. We also define a binary version of the measure, namely, that two variables are isomorphic if they behave exactly the same in some optimal solution (extremely similar). With the help of isomorphic variables we identified a computationally tractable class of planning instances that meet certain restrictions. There are several special cases of this class that are of practical interest, and this result encompasses them.
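The soundness argument behind variable projection can be sketched in a few lines: a projection only drops constraints, so if the goal is unreachable even in the projected state space, the original instance has no plan, while reachability in the projection proves nothing. The SAS+-style encoding below is an illustrative assumption, not the thesis implementation:

```python
from collections import deque

def project(operators, varset):
    """Drop every condition on a variable outside `varset`.

    Operators are (precondition, effect) pairs of {variable: value}
    dicts in a SAS+-style encoding (an assumption of this sketch).
    """
    return [({v: d for v, d in pre.items() if v in varset},
             {v: d for v, d in eff.items() if v in varset})
            for pre, eff in operators]

def projected_goal_reachable(init, operators, goal, varset):
    """BFS over the projected state space.

    Returns False only when the projected goal is unreachable, which
    proves the original instance unsolvable; True proves nothing,
    since projection can only add plans, never remove them.
    """
    ops = project(operators, varset)
    start = frozenset((v, d) for v, d in init.items() if v in varset)
    goal_items = {(v, d) for v, d in goal.items() if v in varset}
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if goal_items <= state:
            return True
        current = dict(state)
        for pre, eff in ops:
            if all(current.get(v) == d for v, d in pre.items()):
                nxt = frozenset({**current, **eff}.items())
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False

# One operator sets x to 1, but only when y is 1, and nothing touches y.
ops = [({"y": 1}, {"x": 1})]
print(projected_goal_reachable({"x": 0, "y": 0}, ops, {"x": 1}, {"x", "y"}))  # -> False
print(projected_goal_reachable({"x": 0, "y": 0}, ops, {"x": 1}, {"x"}))      # -> True
```

Projecting onto both variables proves unsolvability, while projecting away y loses the proof; finding a small unsolvable projection is exactly what makes the technique cheap.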

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2017. p. 147
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1863
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-139802 (URN); 10.3384/diss.diva-139802 (DOI); 9789176854983 (ISBN)
Public defence
2017-10-03, Ada Lovelace, hus B, Campus Valla, Linköping, 13:15 (English)
Funder
CUGS (National Graduate School in Computer Science)
Available from: 2017-09-11 Created: 2017-09-08 Last updated: 2022-02-18. Bibliographically approved
Aghighi, M., Jonsson, P. & Ståhlberg, S. (2015). Tractable Cost-Optimal Planning over Restricted Polytree Causal Graphs. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence: . Paper presented at 29th AAAI Conference on Artificial Intelligence (AAAI-15), January 25–30, Austin, TX, USA. AAAI Press
Tractable Cost-Optimal Planning over Restricted Polytree Causal Graphs
2015 (English). In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI Press, 2015. Conference paper, Published paper (Refereed)
Abstract [en]

Causal graphs are widely used to analyze the complexity of planning problems. Many tractable classes have been identified with their aid and state-of-the-art heuristics have been derived by exploiting such classes. In particular, Katz and Keyder have studied causal graphs that are hourglasses (which is a generalization of forks and inverted-forks) and shown that the corresponding cost-optimal planning problem is tractable under certain restrictions. We continue this work by studying polytrees (which is a generalization of hourglasses) under similar restrictions. We prove tractability of cost-optimal planning by providing an algorithm based on a novel notion of variable isomorphism. Our algorithm also sheds light on the k-consistency procedure for identifying unsolvable planning instances. We speculate that this may, at least partially, explain why merge-and-shrink heuristics have been successful for recognizing unsolvable instances.

Place, publisher, year, edition, pages
AAAI Press, 2015
Series
Proceedings of the AAAI Conference on Artificial Intelligence, ISSN 2159-5399, E-ISSN 2374-3468
Keywords
automated planning, causal graph, polynomial-time algorithm, cost-optimal planning, polytree
National Category
Computer Systems
Identifiers
urn:nbn:se:liu:diva-118729 (URN); 000485625503038 (); 978-1-57735-703-2 (ISBN)
Conference
29th AAAI Conference on Artificial Intelligence (AAAI-15), January 25–30, Austin, TX, USA
Funder
CUGS (National Graduate School in Computer Science)
Available from: 2015-06-03 Created: 2015-06-03 Last updated: 2022-02-18
Identifiers
ORCID iD: orcid.org/0000-0002-4092-8175