Multistate Temporal Difference Target for Model-Free Reinforcement Learning
Univ Newcastle, Australia.
Univ Newcastle, Australia.
Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Science and Engineering.
2025 (English). In: IEEE Transactions on Neural Networks and Learning Systems, ISSN 2162-237X, E-ISSN 2162-2388, Vol. 36, no. 9, pp. 16854-16863. Article in journal (Refereed). Published
Abstract [en]

Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value function estimates for states or state-action pairs using a TD target. This target represents an improved estimate of the true value by incorporating both immediate rewards and the estimated value of subsequent states. We propose an enhanced multistate TD (MSTD) target that utilizes multiple subsequent states for a more accurate value function estimation compared to traditional TD learning, which relies on a single subsequent state. Building on this new MSTD concept, we develop actor-critic algorithms that include the management of replay buffers in two modes and integrate with deep deterministic policy gradient (DDPG) and soft actor-critic (SAC). Numerical experiments demonstrate that algorithms employing the MSTD target improve learning performance compared to traditional methods. In addition, we analyze the convergence of Q-learning with MSTD.
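
As a rough illustration of the distinction the abstract draws, the sketch below contrasts the standard one-step TD target with a target built from several subsequent states. The abstract does not state the MSTD formula, so `averaged_multistate_target` is a hypothetical stand-in (a plain average of k-step targets) and should not be read as the authors' definition. All function names and the discount parameter `gamma` are assumptions made for this example.

```python
from typing import Sequence


def td_target(reward: float, next_value: float, gamma: float = 0.99) -> float:
    """Standard one-step TD target: r_t + gamma * V(s_{t+1})."""
    return reward + gamma * next_value


def n_step_target(rewards: Sequence[float], bootstrap_value: float,
                  gamma: float = 0.99) -> float:
    """n-step target: discounted rewards plus gamma^n * V(s_{t+n})."""
    target = bootstrap_value
    # Fold from the last reward backwards so each step applies one discount.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


def averaged_multistate_target(rewards: Sequence[float],
                               next_values: Sequence[float],
                               gamma: float = 0.99) -> float:
    """Hypothetical multistate target (an assumption, not the paper's MSTD).

    Averages the k-step targets for k = 1..n, where next_values[k - 1]
    is the value estimate of the state reached after k steps.
    """
    if not rewards or len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must be non-empty and equal in length")
    targets = [
        n_step_target(rewards[:k], next_values[k - 1], gamma)
        for k in range(1, len(rewards) + 1)
    ]
    return sum(targets) / len(targets)
```

With `gamma=0.5`, rewards `[1.0, 2.0]` and value estimates `[4.0, 8.0]`, the one-step target is 3.0, the two-step target is 4.0, and the averaged target is 3.5. The one-step target depends only on the first value estimate, whereas the averaged variant also uses the estimate for the second subsequent state.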

Place, publisher, year, edition, pages
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2025. Vol. 36, no. 9, pp. 16854-16863
Keywords [en]
Reinforcement learning; Estimation; Training; Convergence; Trajectory; Accuracy; Temporal difference learning; Optimization; Monte Carlo methods; Indexes; Actor-critic learning; Q value; reinforcement learning; state-action value; temporal difference (TD)
National subject category
Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-213694
DOI: 10.1109/TNNLS.2025.3564078
ISI: 001484784300001
PubMedID: 40343824
Scopus ID: 2-s2.0-105004943838
OAI: oai:DiVA.org:liu-213694
DiVA, id: diva2:1959573
Available from: 2025-05-21. Created: 2025-05-21. Last updated: 2026-04-07. Bibliographically approved

Open Access in DiVA

No full text in DiVA

By author/editor
Zhang, Lepeng