
SSL4RL: Revisiting Self-Supervised Learning as Intrinsic Reward for Visual-Language Reasoning

MCML Authors

Stefanie Jegelka

Prof. Dr.

Core PI

Foundations of Deep Neural Networks

Abstract

Vision-language models (VLMs) have shown remarkable abilities by integrating large language models with visual inputs. However, they often rely on textual shortcuts rather than adequately using visual evidence during reasoning. Although reinforcement learning (RL) can align models with desired behaviors, its application to VLMs has been hindered by the lack of scalable and reliable rewards. To overcome this challenge, we propose SSL4RL, a novel framework that leverages self-supervised learning (SSL) tasks as a source of verifiable rewards for RL. Our approach reformulates SSL objectives such as rotation prediction and patch reconstruction into dense automatic rewards, removing the need for human preferences or AI evaluators. Experiments show that SSL4RL substantially improves performance on both vision-centric and vision-language reasoning benchmarks, with encouraging potential in open-ended scenarios and stronger resilience to visual corruptions. Through systematic ablations, we identify key factors influencing SSL4RL, including data volume, model scale, model choice, task combination, and task difficulty, thereby offering new design principles for future work. Our implementation is open-sourced at https://github.com/PKU-ML/SSL4RL, with models hosted in the Hugging Face collection https://huggingface.co/collections/PKU-ML/ssl4rl.
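The abstract does not spell out how an SSL task becomes a reward, so the following is only a minimal sketch of the general idea, not the SSL4RL implementation. It assumes a toy 2D grid of pixel values stands in for an image and that the model states its answer as an angle in free text. The function names (`make_rotation_task`, `parse_angle`, `rotation_reward`) are hypothetical and are not taken from the SSL4RL repository.

```python
import random
import re
from typing import List, Optional, Tuple

# Hypothetical toy "image": a 2D grid of pixel values (assumption for illustration).
Image = List[List[int]]

ANGLES = (0, 90, 180, 270)


def rotate_90_cw(img: Image) -> Image:
    """Rotate a 2D grid 90 degrees clockwise (reverse the rows, then transpose)."""
    return [list(row) for row in zip(*img[::-1])]


def make_rotation_task(img: Image, rng: random.Random) -> Tuple[Image, int]:
    """Rotate img by a random multiple of 90 degrees.

    The label comes for free from the transformation itself, so no human
    or AI judge is needed to produce ground truth.
    """
    angle = rng.choice(ANGLES)
    out = img
    for _ in range(angle // 90):
        out = rotate_90_cw(out)
    return out, angle


def parse_angle(response: str) -> Optional[int]:
    """Return the last valid angle mentioned in a free-text answer (hypothetical format)."""
    numbers = [int(m) for m in re.findall(r"\d+", response)]
    valid = [n for n in numbers if n in ANGLES]
    return valid[-1] if valid else None


def rotation_reward(response: str, true_angle: int) -> float:
    """Verifiable reward: 1.0 if the predicted angle matches the label, else 0.0."""
    return 1.0 if parse_angle(response) == true_angle else 0.0
```

In an RL loop, each sampled response to the rotated image would be scored with `rotation_reward`, and those scores would feed whatever policy-gradient update is used. This binary correct/incorrect version is a simplification; the paper's actual reward design, including how tasks like patch reconstruction yield denser signals, may differ.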

inproceedings GZW+26

ICML 2026

43rd International Conference on Machine Learning. Seoul, South Korea, Jul 06-11, 2026. To be published. Preprint available.

Authors

X. Guo • R. Zhou • Y. Wang • Q. Zhang • C. Zhang • S. Jegelka • X. Wang • J. Chai • G. Yin • W. Lin • Y. Wang

Links

URL GitHub

In Collaboration

Meituan

Research Area

A3 | Computational Models

BibTeXKey: GZW+26
