liu.se — Search for publications in DiVA
Adib Yaghmaie, Farnaz (ORCID iD: orcid.org/0000-0002-6665-5881)
Publications (2 of 2)
Modares, A., Sadati, N., Esmaeili, B., Adib Yaghmaie, F. & Modares, H. (2024). Safe Reinforcement Learning via a Model-Free Safety Certifier. IEEE Transactions on Neural Networks and Learning Systems, 35(3), 3302-3311
2024 (English) In: IEEE Transactions on Neural Networks and Learning Systems, ISSN 2162-237X, E-ISSN 2162-2388, Vol. 35, no 3, p. 3302-3311. Article in journal (Refereed) Published
Abstract [en]

This article presents a data-driven safe reinforcement learning (RL) algorithm for discrete-time nonlinear systems. A data-driven safety certifier is designed to intervene with the actions of the RL agent to ensure both safety and stability of its actions. This is in sharp contrast to existing model-based safety certifiers, which can result in convergence to an undesired equilibrium point or in conservative interventions that jeopardize the performance of the RL agent. To this end, the proposed method directly learns a robust safety certifier while completely bypassing the identification of the system model. The nonlinear system is modeled using linear parameter-varying (LPV) systems with polytopic disturbances. To avoid having to learn an explicit model of the LPV system, data-based $\lambda$-contractivity conditions are first provided for the closed-loop system to enforce robust invariance of a prespecified polyhedral safe set and the system's asymptotic stability. These conditions are then leveraged to directly learn a robust data-based gain-scheduling controller by solving a convex program. A significant advantage of the proposed direct safe learning over model-based certifiers is that it completely resolves conflicts between safety and stability requirements while assuring convergence to the desired equilibrium point. Data-based safety certification conditions are then provided using Minkowski functions. They are then used to seamlessly integrate the learned backup safe gain-scheduling controller with the RL controller. Finally, we provide a simulation example to verify the effectiveness of the proposed approach.
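As a rough illustration of the certification step the abstract describes, the sketch below checks a proposed RL action against a Minkowski-function ($\lambda$-contractivity) test on a polyhedral safe set, falling back to a backup controller when the test fails. Everything here is an assumption for illustration: a fixed linear model stands in for the LPV system, and `A`, `B`, `F`, `K_backup`, and `lam` are hypothetical numbers. In the actual method the backup gain-scheduling controller is learned directly from data via a convex program, not hand-picked.

```python
import numpy as np

# Hypothetical linear model x+ = A x + B u standing in for the LPV system.
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])

# Polyhedral safe set S = {x : F x <= 1}; here the box |x1| <= 2, |x2| <= 2.
F = np.array([[ 0.5,  0.0],
              [-0.5,  0.0],
              [ 0.0,  0.5],
              [ 0.0, -0.5]])

def gauge(x):
    """Minkowski (gauge) function of S: psi(x) = max_i F_i x, clipped at 0."""
    return max(float(np.max(F @ x)), 0.0)

K_backup = np.array([[-0.5, -2.0]])  # hypothetical backup safe gain
lam = 0.95                           # contractivity factor in (0, 1)

def certify(x, u_rl):
    """Pass the RL action through if the predicted next state stays
    lambda-contractive w.r.t. S; otherwise substitute the backup action."""
    x_next = A @ x + B @ u_rl
    if gauge(x_next) <= lam * gauge(x):
        return u_rl
    return K_backup @ x
```

For example, at `x = [0, 1]` the zero action keeps the gauge shrinking and is passed through, while at `x = [1.9, 1.9]` a large positive action would push the state out of the box and the backup action is substituted instead.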

Place, publisher, year, edition, pages
IEEE (Institute of Electrical and Electronics Engineers), 2024
Keywords
Data-driven control; gain-scheduling control; reinforcement learning (RL); safe control
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-193589 (URN) · 10.1109/TNNLS.2023.3264815 (DOI) · 000973264800001 (ISI) · 37053065 (PubMedID)
Note

Funding agencies: Excellence Center at Linköping-Lund in Information Technology (ELLIIT); ZENITH

Available from: 2023-05-09. Created: 2023-05-09. Last updated: 2024-10-10. Bibliographically approved.
Adib Yaghmaie, F. & Modares, H. (2023). Online Optimal Tracking of Linear Systems with Adversarial Disturbances. Transactions on Machine Learning Research (04)
2023 (English) In: Transactions on Machine Learning Research, E-ISSN 2835-8856, no 04. Article in journal (Refereed) Published
Abstract [en]

This paper presents a memory-augmented control solution to the optimal reference tracking problem for linear systems subject to adversarial disturbances. We assume that the dynamics of the linear system are known and that the reference signal is generated by a linear system with unknown dynamics. Under these assumptions, finding the optimal tracking controller is formalized as an online convex optimization problem that leverages memory of past disturbance and reference values to capture their temporal effects on the performance. That is, a (disturbance, reference)-action control policy is formalized, which selects the control actions as a linear map of the past disturbance and reference values. The online convex optimization is then formulated over the parameters of the policy to optimize general convex costs. It is shown that our approach outperforms robust control methods and achieves a tight $O(\sqrt{T})$ regret bound, where our regret analysis benchmarks against the best linear policy.
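A rough sketch of the memory-augmented policy idea, simplified to regulation (zero reference) of a scalar system: the control is a base feedback term plus a linear map of the last `H` disturbances, and the map's weights are updated by online gradient descent on the instantaneous cost. The system, gain `K`, memory length `H`, step size `eta`, and the i.i.d. noise standing in for the adversarial disturbance are all illustrative assumptions; the paper's formulation additionally carries reference memory and general convex costs.

```python
import numpy as np

A, B = 0.9, 1.0   # known scalar dynamics x_{t+1} = A x_t + B u_t + w_t
K = 0.5           # assumed stabilizing base gain
H = 3             # memory length: how many past disturbances the policy sees
M = np.zeros(H)   # policy parameters over the disturbance memory
eta = 0.05        # online gradient step size

rng = np.random.default_rng(0)
x, w_hist, costs = 0.0, np.zeros(H), []
for t in range(200):
    u = -K * x + M @ w_hist           # disturbance-action control policy
    w = 0.1 * rng.standard_normal()   # stand-in for the adversarial disturbance
    x_next = A * x + B * u + w
    costs.append(x_next ** 2)
    # gradient of the instantaneous cost x_next^2 w.r.t. the memory weights M
    M -= eta * 2.0 * x_next * B * w_hist
    w_hist = np.roll(w_hist, 1)
    w_hist[0] = w                     # with known dynamics, w is recoverable from x_next
    x = x_next
```

The key design choice mirrored here is that the policy is linear in the disturbance memory, which makes the cost convex in `M` and so amenable to online convex optimization with sublinear regret.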

National Category
Control Engineering
Identifiers
urn:nbn:se:liu:diva-197389 (URN)
Available from: 2023-09-04. Created: 2023-09-04. Last updated: 2024-01-08.