Identifying Copeland Winners in Dueling Bandits With Indifferences

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Principal Investigator

Abstract

We consider the task of identifying the Copeland winner(s) in a dueling bandits problem with ternary feedback. This is an underexplored but practically relevant variant of the conventional dueling bandits problem, in which, in addition to strict preference between two arms, one may observe feedback in the form of an indifference. We provide a lower bound on the sample complexity for any learning algorithm finding the Copeland winner(s) with a fixed error probability. Moreover, we propose POCOWISTA, an algorithm with a sample complexity that almost matches this lower bound, and which shows excellent empirical performance, even for the conventional dueling bandits problem. For the case where the preference probabilities satisfy a specific type of stochastic transitivity, we provide a refined version with an improved worst-case sample complexity.
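
To make the target of the identification task concrete, the following is a minimal sketch of how Copeland winners could be read off when the ternary feedback probabilities are already known. It is not POCOWISTA, which has to infer this from sampled duels. The scoring convention used here (one point when a strict win is the most likely outcome, `tie_weight` = 0.5 points when indifference is the most likely outcome) is an assumption for illustration, as are the names `copeland_scores`, `copeland_winners`, and `EXAMPLE`, and the numbers in the example.

```python
from typing import List, Tuple

# Hypothetical representation: probs[i][j] = (win, tie, loss), the
# probabilities that arm i is preferred to arm j, that the pair is judged
# indifferent, and that arm j is preferred to arm i. Diagonal entries are
# placeholders and are ignored.
Triple = Tuple[float, float, float]


def copeland_scores(probs: List[List[Triple]], tie_weight: float = 0.5) -> List[float]:
    """Score each arm by its most likely outcome against every other arm.

    Assumed convention: 1 point if a strict win is the unique most likely
    outcome, `tie_weight` points if indifference is, and 0 otherwise.
    """
    n = len(probs)
    scores = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            win, tie, loss = probs[i][j]
            if win > max(tie, loss):
                scores[i] += 1.0
            elif tie > max(win, loss):
                scores[i] += tie_weight
    return scores


def copeland_winners(probs: List[List[Triple]], tie_weight: float = 0.5) -> List[int]:
    """Return the indices of all arms attaining the maximal score."""
    scores = copeland_scores(probs, tie_weight)
    best = max(scores)
    return [i for i, s in enumerate(scores) if s == best]


# Illustrative three-arm instance (made-up numbers).
_DIAG = (0.0, 0.0, 0.0)
EXAMPLE = [
    [_DIAG, (0.6, 0.1, 0.3), (0.2, 0.6, 0.2)],
    [(0.3, 0.1, 0.6), _DIAG, (0.5, 0.2, 0.3)],
    [(0.2, 0.6, 0.2), (0.3, 0.2, 0.5), _DIAG],
]

if __name__ == "__main__":
    print(copeland_scores(EXAMPLE))
    print(copeland_winners(EXAMPLE))
```

In this instance arm 0 most likely beats arm 1 and is most likely judged indifferent to arm 2, so under the assumed convention it has the highest score. The function returns a list because several arms can share the maximal score, which is why the abstract speaks of Copeland winner(s).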

inproceedings


AISTATS 2024

27th International Conference on Artificial Intelligence and Statistics. Valencia, Spain, May 02-04, 2024.
A Conference

Authors

V. Bengs • B. Haddenhorst • E. Hüllermeier

Research Area

 A3 | Computational Models

BibTeXKey: BHH24