
DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions


Abstract

Diffusion-based world models have demonstrated strong capabilities in synthesizing realistic long-horizon trajectories for offline reinforcement learning (RL). However, many existing methods do not directly generate actions alongside states and rewards, limiting their compatibility with standard value-based offline RL algorithms that rely on one-step temporal difference (TD) learning. While prior work has explored joint modeling of states, rewards, and actions to address this issue, such formulations often lead to increased training complexity and reduced performance in practice. Therefore, in this paper, we propose a diffusion-based world model that generates state-reward trajectories conditioned on the current state, action, and return-to-go value, and efficiently infers missing actions via an inverse dynamics model (IDM). This modular design produces complete synthetic transitions suitable for one-step TD-based offline RL, enabling effective and computationally efficient training. Empirically, we show that conservative offline RL algorithms such as TD3BC and IQL benefit significantly from training on these augmented trajectories, consistently outperforming prior diffusion-based baselines across multiple tasks in the D4RL benchmark.
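The following is a minimal, hypothetical sketch (not the authors' released code) of the modular idea described in the abstract: an inverse dynamics model (IDM) infers the missing actions for a state-reward trajectory sampled from the diffusion world model, so that complete (state, action, reward, next state) transitions can be fed to one-step TD-based offline RL algorithms such as TD3BC or IQL. Names like InverseDynamicsModel and complete_transitions, the network sizes, and the tanh action squashing are illustrative assumptions.

```python
# Hypothetical sketch: completing diffusion-generated state-reward trajectories
# with an inverse dynamics model (IDM) to obtain full (s, a, r, s') transitions.
import torch
import torch.nn as nn


class InverseDynamicsModel(nn.Module):
    """Predicts the action a_t from a pair of consecutive states (s_t, s_{t+1})."""

    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
            nn.Tanh(),  # assumes actions normalized to [-1, 1], as in D4RL locomotion
        )

    def forward(self, state: torch.Tensor, next_state: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, next_state], dim=-1))


def complete_transitions(states: torch.Tensor, rewards: torch.Tensor,
                         idm: InverseDynamicsModel):
    """Turn a synthetic state-reward trajectory into one-step TD transitions.

    states:  (T, state_dim) trajectory sampled from the diffusion world model
    rewards: (T - 1,)       rewards aligned with each transition
    Returns (s, a, r, s') tensors where the actions are inferred by the IDM.
    """
    s, s_next = states[:-1], states[1:]
    with torch.no_grad():
        a = idm(s, s_next)
    return s, a, rewards, s_next


if __name__ == "__main__":
    state_dim, action_dim, horizon = 17, 6, 32  # illustrative dimensions
    idm = InverseDynamicsModel(state_dim, action_dim)

    # Stand-ins for a trajectory conditioned on (current state, action,
    # return-to-go); here just random tensors for demonstration.
    fake_states = torch.randn(horizon, state_dim)
    fake_rewards = torch.randn(horizon - 1)

    s, a, r, s_next = complete_transitions(fake_states, fake_rewards, idm)
    print(s.shape, a.shape, r.shape, s_next.shape)  # ready for TD3BC / IQL replay
```

In this sketch the IDM would be trained separately on the offline dataset with a regression loss on observed actions; only the resulting completed transitions are passed to the value-based learner.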

Type: inproceedings


WM @ICML 2025

Workshop on Building Physically Plausible World Models at the 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025.

Authors

Z. Li • X. Han • Y. Li • N. Strauß • M. Schubert

Research Area

A3 | Computational Models

BibTeX Key: LHL+25
