Reinforcement Learning for Partially Observable Linear Gaussian Systems Using Batch Dynamics of Noisy Observations
Linköping University, Department of Electrical Engineering, Automatic Control. Linköping University, Faculty of Science and Engineering. ORCID iD: 0000-0002-6665-5881
Michigan State Univ, MI 48824 USA.
Linköping University, Department of Electrical Engineering, Automatic Control. Linköping University, Faculty of Science and Engineering. ORCID iD: 0000-0003-3270-171X
2024 (English). In: IEEE Transactions on Automatic Control, ISSN 0018-9286, E-ISSN 1558-2523, Vol. 69, no. 9, pp. 6397-6404. Journal article (peer-reviewed). Published
Abstract [en]

Reinforcement learning algorithms are commonly used to control dynamical systems with measurable state variables. If the dynamical system is partially observable, reinforcement learning algorithms are modified to compensate for the effect of partial observability. One common approach is to feed a finite history of input-output data instead of the state variable. In this article, we study and quantify the effect of this approach in linear Gaussian systems with quadratic costs. We coin the concept of L-Extra-Sampled-dynamics to formalize the idea of using a finite history of input-output data instead of state and show that this approach increases the average cost.
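
To make the idea of "feeding a finite history of input-output data instead of the state variable" concrete, here is a minimal sketch in Python. It is not the article's method or its L-Extra-Sampled-dynamics construction; it only shows how a length-L window of noisy outputs and past inputs can be assembled as a stand-in for the unmeasured state. The system matrices, noise levels, and the function name `collect_histories` are all hypothetical values chosen for illustration.

```python
import random
from collections import deque


def collect_histories(L, steps, seed=0):
    """Simulate a small linear Gaussian system and return the finite
    input-output histories a controller could use in place of the state.

    All numerical values below are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Assumed model: x_{k+1} = A x_k + B u_k + w_k,  y_k = C x_k + v_k
    A = [[0.9, 0.1], [0.0, 0.8]]
    B = [0.0, 1.0]
    C = [1.0, 0.0]
    x = [0.0, 0.0]

    ys = deque(maxlen=L)  # last L noisy outputs, newest last
    us = deque(maxlen=L)  # last L applied inputs, newest last
    histories = []

    for _ in range(steps):
        # Noisy measurement: the state x itself is never exposed.
        y = C[0] * x[0] + C[1] * x[1] + rng.gauss(0.0, 0.1)
        ys.append(y)

        # Once both windows are full, the history vector is available
        # before the next input is chosen.
        if len(ys) == L and len(us) == L:
            histories.append(list(ys) + list(us))

        # Random exploratory input; a learned policy would map the
        # history vector to u here instead.
        u = rng.gauss(0.0, 1.0)
        us.append(u)

        # State update with Gaussian process noise.
        w0, w1 = rng.gauss(0.0, 0.05), rng.gauss(0.0, 0.05)
        x = [
            A[0][0] * x[0] + A[0][1] * x[1] + B[0] * u + w0,
            A[1][0] * x[0] + A[1][1] * x[1] + B[1] * u + w1,
        ]
    return histories
```

Each returned vector has length 2L (L outputs followed by L inputs), and consecutive vectors overlap in all but their newest entries. The abstract's claim concerns what happens to the average cost when such a vector replaces the state in a linear quadratic setting; this sketch does not attempt to reproduce that analysis.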

Place, publisher, year, edition, pages
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2024. Vol. 69, no. 9, pp. 6397-6404
Keywords [en]
Costs; History; Noise; Dynamical systems; Noise measurement; Heuristic algorithms; Data models; Linear quadratic Gaussian; partially observable dynamical systems; reinforcement learning
National subject category
Control Engineering
Identifiers
URN: urn:nbn:se:liu:diva-207993
DOI: 10.1109/TAC.2024.3385680
ISI: 001302507600064
OAI: oai:DiVA.org:liu-207993
DiVA, id: diva2:1903271
Note

Funding agencies: Wallenberg AI, Autonomous Systems and Software Program (WASP); Alice Wallenberg Foundation; ZENITH, Excellence Center at Linköping-Lund in Information Technology (ELLIIT); Sensor informatics and Decision-making for the Digital Transformation (SEDDIT); National Science Foundation [ECCS-2227311]; Vinnova Competence Center LINK-SIC; Scalable Kalman Filters project through the Swedish Research Council

Available from: 2024-10-03. Created: 2024-10-03. Last updated: 2024-10-03

Open Access in DiVA

No full text in DiVA

By author/editor
Adib Yaghmaie, Farnaz; Gustafsson, Fredrik