Learning Control With Simulated Variational Quantum Policies in a Surrogate Cart-Pole Environment

Abstract

We explore the use of pure quantum policies, implemented via variational quantum circuits (VQCs), for offline reinforcement learning (RL). In contrast to hybrid models, our policy architecture contains no classical neural layers. Built on the MOOSE framework, we replace the classical policy with a VQC enhanced by trainable input and output weights. The policy is trained entirely offline using synthetic rollouts from a learned surrogate model of a physical cart-pole system. Evaluation in this simulated environment shows that the quantum policy performs on par with the classical baseline in terms of stability, smoothness, and reward accumulation. These results demonstrate that purely quantum models can effectively learn control strategies in model-based offline RL, offering a promising step toward real-world quantum-enhanced decision-making.
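The paper's exact ansatz is not given on this page, but the idea of a pure quantum policy with trainable input and output weights can be sketched as follows: observations are encoded as rotation angles scaled by trainable input weights, a variational layer with entanglement is applied, and a Pauli-Z expectation value rescaled by a trainable output weight serves as the control action. The circuit below is a hypothetical 2-qubit illustration simulated directly with NumPy statevectors; the qubit count, gate choice, and function names are assumptions, not the authors' architecture.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation gate."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

I2 = np.eye(2)
# CNOT with qubit 0 as control, basis order |00>, |01>, |10>, |11>
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
Z = np.diag([1.0, -1.0])

def vqc_policy(obs, w_in, theta, w_out):
    """Pure quantum policy: encode the observation via input-weighted RY
    rotations, apply a variational RY layer plus CNOT entanglement, and
    return the output-weighted <Z> expectation on qubit 0 as the action."""
    state = np.zeros(4)
    state[0] = 1.0                                    # start in |00>
    # data encoding with trainable input weights w_in
    u_enc = np.kron(ry(w_in[0] * obs[0]), ry(w_in[1] * obs[1]))
    # trainable variational layer
    u_var = np.kron(ry(theta[0]), ry(theta[1]))
    state = CNOT @ u_var @ u_enc @ state
    # <Z x I> expectation, rescaled by the trainable output weight
    exp_z = state @ np.kron(Z, I2) @ state
    return w_out * exp_z
```

With all angles zero the circuit acts as the identity on |00>, so the policy outputs `w_out * 1`; the output weight thus sets the action range, which is one motivation for making it trainable rather than fixed.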

inproceedings SHW+25a

QCE 2025

IEEE International Conference on Quantum Computing and Engineering. Albuquerque, NM, USA, Aug 31-Sep 05, 2025.

Authors

Y. Sun • M. Hagog • M. Weber • D. Hein • S. Udluft • Y. Ma • V. Tresp

Links

DOI

Research Area

 A3 | Computational Models

BibTeX Key: SHW+25a
