
RtRank: Stratified Response Time Ranking for Data-Efficient Reward Modeling

Abstract

Standard reinforcement learning from human feedback (RLHF) uses binary preferences (which option is better), ignoring the strength of that preference (how much better). This preference strength, crucial for decision-making under uncertainty and generalization, is hard to measure reliably. While human response times (RTs) during preference elicitation offer valuable implicit signals of strength, raw RTs are noisy and obscured by individual and contextual factors. To address this challenge of learning preference strength from RTs, we propose RtRank, a novel framework that robustly extracts this strength. RtRank leverages relative RT differences within carefully constructed strata (e.g., per-annotator) to rank pairwise comparisons by their inferred preference strength. By controlling for systemic variation, these strata enable robust learning of utility differences consistent with the RT-derived rankings, all while making minimal assumptions. Our contributions are threefold: (1) RtRank, a novel method that robustly learns preference strength by leveraging intra-stratum relative RT rankings; (2) empirical evidence of improved sample efficiency and robustness in synthetic preference learning tasks; and (3) the Pearson Distance Correlation (PDC), a novel metric that isolates cardinal utility learning from ordinal accuracy.
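
The core mechanism sketched in the abstract — rank pairwise comparisons by response time within strata such as per-annotator groups, then learn utility differences consistent with those rankings — can be illustrated with a rough PyTorch sketch. Everything below is an assumption made for illustration only: the function names, the linear utility model, the margin-based loss, and the rule that faster responses imply stronger preferences are placeholders, not the paper's actual objective or code.

```python
# Minimal, self-contained sketch (hypothetical names, not the paper's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

def stratified_rt_ranks(annotator_ids: torch.Tensor, response_times: torch.Tensor) -> torch.Tensor:
    """Rank comparisons by response time within each annotator stratum.

    Rank 0 = fastest response in its stratum, read (by assumption) as the
    strongest implied preference.
    """
    ranks = torch.empty_like(response_times)
    for a in annotator_ids.unique():
        mask = annotator_ids == a
        order = response_times[mask].argsort()   # indices sorted by RT (ascending)
        ranks[mask] = order.argsort().float()    # rank of each RT within the stratum
    return ranks

def rt_consistent_loss(utility: nn.Module,
                       chosen: torch.Tensor,
                       rejected: torch.Tensor,
                       ranks: torch.Tensor,
                       margin_scale: float = 0.1) -> torch.Tensor:
    """Bradley-Terry preference loss plus a margin term that asks for larger
    utility gaps on comparisons whose RT rank suggests a stronger preference."""
    gap = utility(chosen).squeeze(-1) - utility(rejected).squeeze(-1)
    bt = F.softplus(-gap).mean()                    # standard pairwise preference loss
    target = margin_scale * (ranks.max() - ranks)   # faster RT -> larger target margin
    return bt + F.relu(target - gap).mean()

# Toy usage: 6 comparisons from 2 annotators over 4-dimensional option features.
torch.manual_seed(0)
utility = nn.Linear(4, 1)
annotators = torch.tensor([0, 0, 0, 1, 1, 1])
rts = torch.tensor([1.2, 3.5, 0.8, 2.1, 4.0, 1.0])
chosen, rejected = torch.randn(6, 4), torch.randn(6, 4)
ranks = stratified_rt_ranks(annotators, rts)
loss = rt_consistent_loss(utility, chosen, rejected, ranks)
loss.backward()  # gradients for one reward-model update step
```

The margin term is just one plausible way to make utility gaps consistent with an RT-derived ranking; the paper's actual learning objective may differ.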

NeurIPS 2025

39th Conference on Neural Information Processing Systems. San Diego, CA, USA, Nov 30-Dec 07, 2025. To be published.
A* Conference

Authors

T. Kaufmann • Y. Metz • D. Keim • E. Hüllermeier

Research Area

 A3 | Computational Models

BibTeX key: KMK+25
