Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

Abstract

Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking a learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management with minimal supervision. With only 152 training QA pairs, Memory-R1 outperforms strong baselines and generalizes across diverse question types, three benchmarks (LoCoMo, MSC, LongMemEval), and multiple model scales (3B-14B).
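
The abstract names the Memory Manager's operation space but the page carries no implementation. The minimal Python sketch below illustrates only that operation space (ADD, UPDATE, DELETE, NOOP) applied to a toy memory bank; the names MemoryOp, MemoryBank, and apply are invented for illustration, and in Memory-R1 the operation is chosen by an RL-fine-tuned LLM, not by hand-written rules.

    # Hypothetical sketch: class and function names are illustrative,
    # not from the paper. Only the four operation names come from the abstract.
    from dataclasses import dataclass, field
    from enum import Enum

    class MemoryOp(Enum):
        ADD = "ADD"        # insert a new entry into the memory bank
        UPDATE = "UPDATE"  # revise an existing entry with new information
        DELETE = "DELETE"  # drop an entry contradicted by new information
        NOOP = "NOOP"      # leave the memory bank unchanged

    @dataclass
    class MemoryBank:
        entries: dict = field(default_factory=dict)

        def apply(self, op, key, text=None):
            # Execute one structured operation chosen by the Memory Manager.
            if op is MemoryOp.ADD or op is MemoryOp.UPDATE:
                self.entries[key] = text
            elif op is MemoryOp.DELETE:
                self.entries.pop(key, None)
            # MemoryOp.NOOP: intentionally no change

    bank = MemoryBank()
    bank.apply(MemoryOp.ADD, "m1", "User adopted a dog named Juno.")
    bank.apply(MemoryOp.UPDATE, "m1", "User now has two dogs, Juno and Rex.")
    bank.apply(MemoryOp.NOOP, "m1")
    print(bank.entries)  # {'m1': 'User now has two dogs, Juno and Rex.'}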

Preprint

Jan. 2026

Authors

S. Yan • X. Yang • Z. Huang • E. Nie • Z. Ding • Z. Li • X. Ma • J. Bi • K. Kersting • J. Z. Pan • H. Schütze • V. Tresp • Y. Ma

Links

arXiv

Research Areas

 A3 | Computational Models

 B2 | Natural Language Processing

BibTeX Key: YYH+26
