Dueling Bandits With Delayed Feedback

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Principal Investigator

Abstract

Dueling Bandits is a well-studied extension of the Multi-Armed Bandits problem in which the learner must select two arms in each time step and receives binary feedback as the outcome of the chosen duel. However, all existing best-arm identification algorithms for the Dueling Bandits setting assume that the feedback can be observed immediately after selecting the two arms. If this is not the case, the algorithms simply do nothing and wait until the feedback of the most recent duel can be observed, which wastes runtime. We propose an algorithm that can start a new duel even if the previous one is not yet finished, and is therefore much more time-efficient. Our arm selection strategy balances the expected information gain of the chosen duel against the expected delay until its feedback is observed. Using theoretically grounded confidence bounds, we ensure that, with high probability, none of the arms we discard is the best arm.
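
To make the setting in the abstract concrete, the following is a minimal toy sketch in Python and not the algorithm proposed in the paper. It assumes a known preference matrix `pref` and known mean delays `mean_delay` (both hypothetical inputs), uses a plain Hoeffding radius in place of the paper's confidence bounds, and scores each candidate duel with a simple stand-in heuristic (remaining uncertainty divided by expected delay). All function and parameter names are illustrative.

```python
import heapq
import math
import random
from itertools import combinations


def hoeffding_radius(n, delta):
    """Half-width of a two-sided Hoeffding interval after n binary outcomes."""
    return math.inf if n == 0 else math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def run_delayed_duels(pref, mean_delay, horizon, delta=0.05, seed=0):
    """Toy elimination loop that starts one duel per step without waiting.

    pref[i][j]       : probability that arm i beats arm j (assumed known here
                       only to simulate outcomes)
    mean_delay[i][j] : approximate expected feedback delay of duel (i, j)
    Returns the set of arms still active and the number of duels started.
    """
    rng = random.Random(seed)
    k = len(pref)
    active = set(range(k))
    wins = [[0] * k for _ in range(k)]       # wins[i][j]: observed wins of i over j
    in_flight = [[0] * k for _ in range(k)]  # started but unobserved duels, i < j
    pending = []                             # min-heap of (arrival, i, j, i_won)
    started = 0

    for t in range(horizon):
        # 1) Collect every piece of feedback that has arrived by time t.
        while pending and pending[0][0] <= t:
            _, i, j, i_won = heapq.heappop(pending)
            in_flight[i][j] -= 1
            if i_won:
                wins[i][j] += 1
            else:
                wins[j][i] += 1

        # 2) Discard arm j once an active arm i beats it with high confidence.
        #    Simplification: no union bound over pairs or sample sizes.
        for i in range(k):
            for j in range(k):
                if i == j or i not in active or j not in active:
                    continue
                n = wins[i][j] + wins[j][i]
                if n and wins[i][j] / n - hoeffding_radius(n, delta) > 0.5:
                    active.discard(j)

        if len(active) == 1:
            break

        # 3) Pick the duel with the best uncertainty-per-delay score.
        #    Counting in-flight duels avoids flooding a single pair.
        def score(pair):
            i, j = pair
            n = wins[i][j] + wins[j][i] + in_flight[i][j]
            return (1.0 / math.sqrt(n + 1)) / mean_delay[i][j]

        i, j = max(combinations(sorted(active), 2), key=score)

        # 4) Start the duel now; its outcome becomes visible only later.
        delay = 1 + int(rng.expovariate(1.0 / mean_delay[i][j]))
        i_won = rng.random() < pref[i][j]
        heapq.heappush(pending, (t + delay, i, j, i_won))
        in_flight[i][j] += 1
        started += 1

    return active, started
```

A possible call, with three arms where arm 0 is clearly preferred and the duel between arms 0 and 2 is slow to report: `run_delayed_duels([[0.5, 0.9, 0.9], [0.1, 0.5, 0.6], [0.1, 0.4, 0.5]], [[1, 2, 10], [2, 1, 2], [10, 2, 1]], horizon=2000)`. The key point of the sketch is step 4: a new duel is launched in every step while earlier outcomes are still queued in `pending`, instead of idling until they arrive.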

inproceedings


DataNinja sAIOnARA 2024

DataNinja sAIOnARA Conference: Shaping Trustworthy AI: Opportunities, Innovation and Achievements for Reliable Approaches. Bielefeld, Germany, Jun 25-27, 2024.

Authors

J. Brandt • B. Haddenhorst • V. Bengs • E. Hüllermeier

Research Area

 A3 | Computational Models

BibTeXKey: BHB+24
