Reinforcement Learning for Partially Observable Linear Gaussian Systems Using Batch Dynamics of Noisy Observations
Linköpings universitet, Institutionen för systemteknik, Reglerteknik. Linköpings universitet, Tekniska fakulteten. ORCID iD: 0000-0002-6665-5881
Michigan State Univ, MI 48824 USA.
Linköpings universitet, Institutionen för systemteknik, Reglerteknik. Linköpings universitet, Tekniska fakulteten. ORCID iD: 0000-0003-3270-171X
2024 (English). In: IEEE Transactions on Automatic Control, ISSN 0018-9286, E-ISSN 1558-2523, Vol. 69, no. 9, p. 6397-6404. Article in journal (Refereed). Published
Abstract [en]

Reinforcement learning algorithms are commonly used to control dynamical systems with measurable state variables. If the dynamical system is partially observable, reinforcement learning algorithms are modified to compensate for the effect of partial observability. One common approach is to feed a finite history of input-output data instead of the state variable. In this article, we study and quantify the effect of this approach in linear Gaussian systems with quadratic costs. We coin the concept of L-Extra-Sampled-dynamics to formalize the idea of using a finite history of input-output data instead of state and show that this approach increases the average cost.
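
The "finite history of input-output data" idea mentioned in the abstract can be illustrated with a minimal Python sketch. It simulates a partially observed linear Gaussian system and stacks the last L outputs and inputs into one vector that a learner could use in place of the unmeasured state. The abstract does not define L-Extra-Sampled-dynamics, so this shows only generic history stacking, not the article's construction; the matrices `A`, `B`, `C`, the noise level, and the function names `simulate` and `history_vector` are hypothetical choices made for illustration.

```python
import numpy as np

def simulate(A, B, C, steps, noise_std=0.1, seed=0):
    """Roll out x_{k+1} = A x_k + B u_k + w_k, y_k = C x_k + v_k with random inputs."""
    rng = np.random.default_rng(seed)
    n, m, p = A.shape[0], B.shape[1], C.shape[0]
    x = np.zeros(n)
    ys, us = [], []
    for _ in range(steps):
        u = rng.standard_normal(m)                      # exploratory input
        y = C @ x + noise_std * rng.standard_normal(p)  # noisy observation
        ys.append(y)
        us.append(u)
        x = A @ x + B @ u + noise_std * rng.standard_normal(n)  # process noise
    return np.array(ys), np.array(us)

def history_vector(ys, us, k, L):
    """Stack the L most recent outputs y_{k-L+1..k} and inputs u_{k-L..k-1}."""
    if k < L:
        raise ValueError("need at least L past samples")
    past_y = ys[k - L + 1 : k + 1].ravel()
    past_u = us[k - L : k].ravel()
    return np.concatenate([past_y, past_u])

# Hypothetical 2-state system with one input and one measured output.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

ys, us = simulate(A, B, C, steps=50)
z = history_vector(ys, us, k=10, L=3)  # stands in for the state x_10
```

With one output and one input, `z` has 2L entries; a larger L carries more information about the hidden state at the price of a higher-dimensional learning problem.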

Place, publisher, year, edition, pages
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2024. Vol. 69, no. 9, p. 6397-6404
Keywords [en]
Costs; History; Noise; Dynamical systems; Noise measurement; Heuristic algorithms; Data models; Linear quadratic Gaussian; partially observable dynamical systems; reinforcement learning
Identifiers
URN: urn:nbn:se:liu:diva-207993
DOI: 10.1109/TAC.2024.3385680
ISI: 001302507600064
OAI: oai:DiVA.org:liu-207993
DiVA, id: diva2:1903271
Note

Funding agencies: Wallenberg AI, Autonomous Systems and Software Program (WASP); Alice Wallenberg Foundation; ZENITH, Excellence Center at Linkoeping-Lund in Information Technology (EL-LIIT); Sensor informatics and Decision-making for the Digital Transformation (SEDDIT); National Science Foundation [ECCS-2227311]; Vinnova Competence Center LINK-SIC; Scalable Kalman Filters project through the Swedish Research Council

Available from: 2024-10-03 Created: 2024-10-03 Last updated: 2024-10-03

Open Access in DiVA

No full text in DiVA

By author/editor
Adib Yaghmaie, Farnaz; Gustafsson, Fredrik