
RtRank: Stratified Response Time Ranking for Data-Efficient Reward Modeling

Abstract

Standard reinforcement learning from human feedback (RLHF) uses binary preferences (which option is better), ignoring the strength of that preference (how much better). This preference strength, crucial for decision-making under uncertainty and generalization, is hard to measure reliably. While human response times (RTs) during preference elicitation offer valuable implicit signals of strength, raw RTs are noisy and obscured by individual and contextual factors. To address this challenge of learning preference strength from RTs, we propose RtRank, a novel framework that robustly extracts this strength. RtRank leverages relative RT differences within carefully constructed strata (e.g., per-annotator) to rank pairwise comparisons by their inferred preference strength. By controlling for systemic variation, these strata enable robust learning of utility differences consistent with the RT-derived rankings, all while making minimal assumptions. Our contributions are threefold: (1) RtRank, a novel method that robustly learns preference strength by leveraging intra-stratum relative RT rankings; (2) empirical evidence of improved sample efficiency and robustness in synthetic preference learning tasks; and (3) the Pearson Distance Correlation (PDC), a novel metric that isolates cardinal utility learning from ordinal accuracy.
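
The core mechanism sketched in the abstract — rank pairwise comparisons by response time within strata such as per-annotator groups, then learn utility differences consistent with those rankings — can be illustrated with a rough PyTorch sketch. Everything below is an assumption made for illustration only: the function names, the linear utility model, the margin-based loss, and the rule that faster responses imply stronger preferences are placeholders, not the paper's actual objective or code.

```python
# Minimal, self-contained sketch (hypothetical names, not the paper's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

def stratified_rt_ranks(annotator_ids: torch.Tensor, response_times: torch.Tensor) -> torch.Tensor:
    """Rank comparisons by response time within each annotator stratum.

    Rank 0 = fastest response in its stratum, read (by assumption) as the
    strongest implied preference.
    """
    ranks = torch.empty_like(response_times)
    for a in annotator_ids.unique():
        mask = annotator_ids == a
        order = response_times[mask].argsort()   # indices sorted by RT (ascending)
        ranks[mask] = order.argsort().float()    # rank of each RT within the stratum
    return ranks

def rt_consistent_loss(utility: nn.Module,
                       chosen: torch.Tensor,
                       rejected: torch.Tensor,
                       ranks: torch.Tensor,
                       margin_scale: float = 0.1) -> torch.Tensor:
    """Bradley-Terry preference loss plus a margin term that asks for larger
    utility gaps on comparisons whose RT rank suggests a stronger preference."""
    gap = utility(chosen).squeeze(-1) - utility(rejected).squeeze(-1)
    bt = F.softplus(-gap).mean()                    # standard pairwise preference loss
    target = margin_scale * (ranks.max() - ranks)   # faster RT -> larger target margin
    return bt + F.relu(target - gap).mean()

# Toy usage: 6 comparisons from 2 annotators over 4-dimensional option features.
torch.manual_seed(0)
utility = nn.Linear(4, 1)
annotators = torch.tensor([0, 0, 0, 1, 1, 1])
rts = torch.tensor([1.2, 3.5, 0.8, 2.1, 4.0, 1.0])
chosen, rejected = torch.randn(6, 4), torch.randn(6, 4)
ranks = stratified_rt_ranks(annotators, rts)
loss = rt_consistent_loss(utility, chosen, rejected, ranks)
loss.backward()  # gradients for one reward-model update step
```

The margin term is just one plausible way to make utility gaps consistent with an RT-derived ranking; the paper's actual learning objective may differ.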

NeurIPS 2025

39th Conference on Neural Information Processing Systems. San Diego, CA, USA, Nov 30-Dec 07, 2025. To be published.
A* Conference

Authors

T. Kaufmann • Y. Metz • D. Keim • E. Hüllermeier

Research Area

 A3 | Computational Models

BibTeX key: KMK+25
