An online algorithm for the risk-aware restless bandit
Xi'an Jiaotong-Liverpool University, People's Republic of China.
Xi'an Jiaotong-Liverpool University, People's Republic of China.
Linköping University, Department of Management and Engineering, Production Economics. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0003-3058-7431
2021 (English) In: European Journal of Operational Research, ISSN 0377-2217, E-ISSN 1872-6860, Vol. 290, no 2, p. 622-639. Article in journal (Refereed). Published.
Abstract [en]

The multi-armed bandit (MAB) is a classical model for the exploration vs. exploitation trade-off. Among existing MAB models, the restless bandit model is of increasing interest because of its dynamic nature, which makes it highly applicable in practice. Like other MAB models, the traditional (risk-neutral) restless bandit model searches for the arm with the lowest mean cost and does not consider risk-aversion, which is critical in cases such as clinical trials and financial investment. This limitation thus hinders the application of the traditional restless bandit. Motivated by these concerns, we introduce a general risk measure that satisfies a mild restriction to formulate a risk-aware restless model; in particular, we set a risk measure as the criterion for the performance of each arm, instead of the expectation as in the traditional case. Compared with classical MAB models, we conclude that our model settings accommodate risk-aware researchers and decision makers. We present an index-based online algorithm for the problem, and derive an upper bound on the regret of this algorithm. Then, we conclude that our algorithm retains an instance-based regret of order O(log T/T), which is consistent with the classical MAB model. Further, some specific risk measures, namely, mean-deviation, shortfall and the discrete Kusuoka risk measure, are used to demonstrate the details of our framework. (C) 2020 Elsevier B.V. All rights reserved.
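The record carries only the abstract, but the general idea it describes (an index policy that ranks arms by an empirical risk measure plus an exploration term, rather than by the mean alone) can be illustrated with a minimal sketch. Everything below is a hypothetical construction for illustration: the mean-deviation estimator, the bonus form, and the i.i.d. toy environment are assumptions, not the paper's algorithm (the paper's arms are restless Markov processes and its index and regret analysis differ).

```python
import math
import random

def mean_deviation(samples, c=0.5):
    # Empirical mean-deviation risk: mean cost plus c times the mean
    # absolute deviation. Mean-deviation is one of the risk measures the
    # paper names, but this estimator and the weight c are illustrative.
    m = sum(samples) / len(samples)
    dev = sum(abs(x - m) for x in samples) / len(samples)
    return m + c * dev

def risk_aware_index_policy(pull, n_arms, horizon):
    # Index-policy sketch: play each arm once, then repeatedly play the
    # arm minimizing (empirical risk) - (exploration bonus). Costs are
    # minimized here, so optimism subtracts the bonus.
    samples = [[pull(a)] for a in range(n_arms)]
    for t in range(n_arms, horizon):
        def index(a):
            n = len(samples[a])
            bonus = math.sqrt(2 * math.log(t + 1) / n)  # assumed UCB-style bonus
            return mean_deviation(samples[a]) - bonus
        best = min(range(n_arms), key=index)
        samples[best].append(pull(best))
    return samples

# Toy stationary cost environment. The paper's arms evolve as Markov
# processes (restless); this i.i.d. simplification is only for demonstration.
random.seed(0)
true_costs = [0.3, 0.5, 0.7]
pull = lambda a: random.gauss(true_costs[a], 0.1)
history = risk_aware_index_policy(pull, n_arms=3, horizon=2000)
print([len(h) for h in history])  # pull counts concentrate on the low-risk arm
```

In this sketch, replacing mean_deviation with the plain empirical mean recovers a standard risk-neutral index policy, which is the contrast the abstract draws.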

Place, publisher, year, edition, pages
Elsevier, 2021. Vol. 290, no 2, p. 622-639.
Keywords [en]
Markov process; Online optimization; Multi-armed bandit; Risk-aware; Risk measure
National Category
Computational Mathematics
Identifiers
URN: urn:nbn:se:liu:diva-172920
DOI: 10.1016/j.ejor.2020.08.028
ISI: 000602863200015
OAI: oai:DiVA.org:liu-172920
DiVA, id: diva2:1522783
Note

Funding agencies: China Europe International Business School (CEIBS) [18ASCS]; Research Development Funding of Xi'an Jiaotong-Liverpool University [RDF-19-01-34]; National Natural Science Foundation of China Young Scholar project [71902159]

Available from: 2021-01-26. Created: 2021-01-26. Last updated: 2021-01-26.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text

Search in DiVA

By author/editor
Tang, Ou
By organisation
Production Economics; Faculty of Science & Engineering
In the same journal
European Journal of Operational Research
On the subject
Computational Mathematics
