Journal article

Finite-time Analysis of the Multiarmed Bandit Problem

In Machine Learning — 2002, Volume 47, Issue 3, pp. 235-256

By Auer, P.; Fischer, Paul¹; Cesa-Bianchi, N.

From

Department of Informatics and Mathematical Modeling, Technical University of Denmark¹

Abstract

Reinforcement learning policies face the exploration versus exploitation dilemma, i.e., the search for a balance between exploring the environment to find profitable actions and taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is, the loss due to the fact that the globally optimal policy is not followed at all times.

One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first to show that the regret for this problem has to grow at least logarithmically in the number of plays. Since then, policies that asymptotically achieve this regret have been devised by Lai and Robbins and many others.

In this work we show that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
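
The abstract does not spell out the policies themselves. As an illustration only, the sketch below shows one upper-confidence-bound index policy in the style of UCB1: play each arm once, then always play the arm whose empirical mean plus a confidence bonus is largest. The Bernoulli reward model, the function name `ucb1`, and the parameters `means`, `horizon`, and `seed` are assumptions made for this example and do not come from the record above.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run a UCB1-style index policy on simulated Bernoulli arms.

    `means` holds the (hypothetical) true success probability of each arm.
    Returns (play counts per arm, accumulated expected regret).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of times each arm has been played
    sums = [0.0] * k   # total reward collected from each arm
    best = max(means)
    regret = 0.0       # expected loss relative to always playing the best arm

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: play every arm once.
            arm = t - 1
        else:
            # Index = empirical mean + confidence bonus sqrt(2 ln t / n_i).
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Simulated Bernoulli reward (an assumption of this sketch).
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.2, 0.8], horizon=2000)
    print("plays per arm:", counts)
    print("expected regret:", regret)
```

The confidence bonus shrinks as an arm is played more often, so a clearly worse arm is sampled less and less frequently; this is the mechanism that keeps the regret growing slowly with the number of plays.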

Language:	English
Publisher:	Kluwer Academic Publishers
Year:	2002
Pages:	235-256
ISSN:	0885-6125 and 1573-0565
Types:	Journal article
DOI:	10.1023/A:1013689704352
ORCIDs:	Fischer, Paul

Keywords

Artificial Intelligence (incl. Robotics); Automation and Robotics; Computer Science; Computer Science, general; adaptive allocation rules; bandit problems; finite horizon regret
