
On the Challenges and Practices of Reinforcement Learning From Real Human Feedback


Abstract

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that does not require an engineered reward function but instead learns from human feedback. Due to its increasing popularity, various authors have studied how to learn an accurate reward model from only a few samples, making optimal use of this feedback. Because of the cost and complexity of user studies, however, this research is often conducted with synthetic human feedback. Such feedback can be generated by evaluating behavior based on ground-truth rewards, which are available for some benchmark tasks. While this setting can help evaluate some aspects of RLHF, it differs from practical settings in which synthetic feedback is not available. Working with real human feedback brings additional challenges that cannot be observed with synthetic feedback, including fatigue, inter-rater inconsistencies, delay, misunderstandings, and modality-dependent difficulties. In this paper, we describe and discuss some of these challenges, together with current practices and opportunities for further research.
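As a rough illustration of the synthetic-feedback setting the abstract contrasts with real human feedback, the sketch below simulates a rater who stochastically prefers the trajectory segment with the higher ground-truth return (a Bradley-Terry choice model) and then fits a reward model to those simulated preferences. Every detail here, including the linear ground-truth reward, the feature representation of segments, and the optimization settings, is an illustrative assumption, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: trajectory segments are summarized by feature vectors,
# and the ground-truth reward is assumed linear in those features (benchmark
# tasks expose ground-truth rewards in other forms; this is for illustration).
n_pairs, dim = 500, 8
true_w = rng.normal(size=dim)

seg_a = rng.normal(size=(n_pairs, dim))  # features of segment A in each query
seg_b = rng.normal(size=(n_pairs, dim))  # features of segment B in each query

# Synthetic feedback: a simulated rater prefers the segment with the higher
# ground-truth return, choosing stochastically via a Bradley-Terry model.
logits = seg_a @ true_w - seg_b @ true_w
prefers_a = rng.random(n_pairs) < 1.0 / (1.0 + np.exp(-logits))

# Fit a linear reward model to the preferences by minimizing the standard
# Bradley-Terry cross-entropy loss with plain gradient descent.
w = np.zeros(dim)
lr = 0.1
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(seg_a - seg_b) @ w))      # P(A preferred)
    grad = (seg_a - seg_b).T @ (p - prefers_a) / n_pairs  # d(loss)/dw
    w -= lr * grad

# The learned reward should align with the ground-truth reward direction.
cos = w @ true_w / (np.linalg.norm(w) * np.linalg.norm(true_w))
print("cosine(w, true_w) =", cos)
```

Replacing the simulated rater with real humans is precisely where the challenges discussed in the paper arise: responses arrive with delay, raters fatigue and disagree with one another, and the noiseless Bradley-Terry assumption no longer holds.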



HLDM @ECML-PKDD 2023

1st Workshop on Hybrid Human-Machine Learning and Decision Making at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023.

Authors

T. Kaufmann, S. Ball, J. Beck, E. Hüllermeier, F. Kreuter

Links

DOI

Research Areas

A3 | Computational Models

C4 | Computational Social Sciences

BibTeX Key: KBB+23
