
The Geometry of Reasoning: Self-Evaluation via Layerwise Trajectory Evolution

MCML Authors

Abstract

Large Reasoning Models (LRMs) improve performance by generating explicit Chain-of-Thought (CoT) trajectories, yet enabling them to self-evaluate correctness without external supervision remains a critical challenge. Existing methods often rely on ground-truth labels or shallow output probabilities, neglecting the layerwise evolution of the reasoning trajectory. In this work, we introduce Geometry of Reasoning, a white-box self-evaluation framework based on layerwise trajectory evolution. Geometry of Reasoning decomposes reasoning fidelity into two complementary dimensions: (1) Geometric Evolution, which synthesizes the first- and second-order evolution of layerwise hidden-state trajectories to quantify geometric progress in reasoning; and (2) Difficulty-Aware Calibration, which uses the cross-entropy of reasoning progress to normalize the Geometric Evolution score against intrinsic query uncertainty. By jointly modeling these factors, Geometry of Reasoning effectively distinguishes the coherent evolution of correct reasoning from the chaotic trajectories of errors. Extensive experiments across eight LRMs and seven benchmarks demonstrate that it consistently outperforms state-of-the-art baselines in AUROC, AUPR, and FPR@95.
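The two components described in the abstract can be caricatured in a few lines of code. This is a minimal sketch, not the paper's method: the function names, the (num_layers, hidden_dim) trajectory shape, the way first- and second-order terms are combined, and the use of next-token entropy as the "intrinsic query uncertainty" proxy are all illustrative assumptions.

```python
import numpy as np

def geometric_evolution_score(hidden_states):
    """Hypothetical Geometric Evolution score for one reasoning step.

    hidden_states: array of shape (num_layers, hidden_dim), the layerwise
    hidden-state trajectory. We take first-order differences (how far the
    state moves per layer) and second-order differences (how erratically
    the direction changes), then combine their magnitudes. The exact
    combination in the paper is not given in the abstract; here we simply
    reward steady first-order progress and penalize second-order jitter.
    """
    v = np.diff(hidden_states, axis=0)          # first-order evolution
    a = np.diff(hidden_states, n=2, axis=0)     # second-order evolution
    progress = np.linalg.norm(v, axis=1).mean()
    jitter = np.linalg.norm(a, axis=1).mean() if len(a) else 0.0
    return progress - jitter

def difficulty_calibrated_score(geo_score, token_probs):
    """Hypothetical Difficulty-Aware Calibration.

    Normalizes the geometric score by the entropy of an output
    distribution (a stand-in for the abstract's cross-entropy of
    reasoning progress), so that hard, high-uncertainty queries are
    not automatically scored as unfaithful reasoning.
    """
    eps = 1e-12
    entropy = -np.sum(token_probs * np.log(token_probs + eps))
    return geo_score / (1.0 + entropy)
```

On a perfectly linear trajectory the second-order term vanishes and the score reduces to the mean per-layer step size; chaotic trajectories with large direction changes are penalized, matching the abstract's intuition that correct reasoning evolves coherently.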



ICML 2026

43rd International Conference on Machine Learning. Seoul, South Korea, Jul 06-11, 2026. To be published.
A* Conference

Authors

J. Bi • D. Yan • Y. Wang • W. Huang • H. Chen • G. Wan • M. Ye • X. Xiao • H. Schütze • V. Tresp • Y. Ma

Links

URL

Research Areas

 A3 | Computational Models

 B2 | Natural Language Processing

BibTeX Key: BYW+26
