We explore the use of pure quantum policies, implemented via variational quantum circuits (VQCs), for offline reinforcement learning (RL). In contrast to hybrid models, our policy architecture contains no classical neural layers. Building on the MOOSE framework, we replace the classical policy with a VQC enhanced by trainable input and output weights. The policy is trained entirely offline using synthetic rollouts from a learned surrogate model of a physical cart-pole system. Evaluation in this simulated environment shows that the quantum policy performs on par with the classical baseline in terms of stability, smoothness, and reward accumulation. These results demonstrate that purely quantum models can learn effective control strategies in model-based offline RL, offering a promising step toward real-world quantum-enhanced decision-making.
Type: inproceedings
BibTeX key: SHW+25a
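A minimal sketch of what such a pure quantum policy could look like, assuming a PennyLane/PyTorch stack, four qubits (one per cart-pole state dimension), and a strongly entangling ansatz; the qubit count, layer count, encoding, and readout are illustrative assumptions, not the authors' exact architecture:

```python
# Sketch (assumptions labeled): a VQC policy with no classical neural layers,
# only circuit parameters plus trainable input/output scaling weights.
import pennylane as qml
import torch
import torch.nn as nn

n_qubits = 4   # assumption: one qubit per cart-pole state dimension
n_layers = 3   # assumption: VQC depth

dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def vqc(inputs, weights):
    # Angle-encode the (re-scaled) observation onto the qubits
    qml.AngleEmbedding(inputs, wires=range(n_qubits), rotation="Y")
    # Trainable entangling layers form the variational part of the policy
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    # Single expectation value read out as the (continuous) action signal
    return qml.expval(qml.PauliZ(0))

class QuantumPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        shape = qml.StronglyEntanglingLayers.shape(n_layers=n_layers, n_wires=n_qubits)
        self.circuit_weights = nn.Parameter(0.1 * torch.randn(shape))
        # Trainable classical input scaling (per state dimension)
        # and output scaling/bias, as mentioned in the abstract
        self.input_weights = nn.Parameter(torch.ones(n_qubits))
        self.output_weight = nn.Parameter(torch.ones(1))
        self.output_bias = nn.Parameter(torch.zeros(1))

    def forward(self, obs):
        # obs: tensor of shape (n_qubits,) holding one cart-pole observation
        z = vqc(self.input_weights * obs, self.circuit_weights)
        return self.output_weight * z + self.output_bias

# Usage: policy = QuantumPolicy(); action = policy(torch.tensor([0.1, 0.0, -0.05, 0.0]))
```

In a model-based offline setting such as MOOSE, a policy of this form would be optimized against synthetic rollouts from the learned surrogate model rather than the physical cart-pole system.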