
Diamond Maps: Efficient Reward Alignment via Stochastic Flow Maps


Abstract

Flow and diffusion models produce high-quality samples, but adapting them to user preferences or constraints post-training remains costly and brittle, a challenge commonly called reward alignment. We argue that efficient reward alignment should be a property of the generative model itself, not an afterthought, and redesign the model for adaptability. We propose Diamond Maps, a stochastic flow-map model that enables efficient and accurate alignment to arbitrary rewards at inference time. Diamond Maps amortize many simulation steps into a single-step sampler, like flow maps, while preserving the stochasticity required for optimal reward adaptation. This design makes search, Sequential Monte Carlo, and guidance scalable by enabling efficient and consistent estimation of the value function. Our experiments show that Diamond Maps can be learned efficiently via distillation from GLASS Flows, achieve stronger reward-alignment performance, and scale better than existing alignment methods. Overall, our results point toward a practical route to generative models that can be rapidly adapted to arbitrary preferences and constraints at inference time.

inproceedings HCE+26


ICML 2026

43rd International Conference on Machine Learning. Seoul, South Korea, Jul 06-11, 2026. To be published. Preprint available.
A* Conference

Authors

P. Holderrieth • D. Chen • L. Eyring • I. Shah • G. Anantharaman • Y. He • Z. Akata • T. Jaakkola • N. M. Boffi • M. Simchowitz


Research Area

 B1 | Computer Vision

BibTeXKey: HCE+26
