
The Geometry of Reasoning: Self-Evaluation via Layerwise Trajectory Evolution

MCML Authors

Abstract

Large Reasoning Models (LRMs) improve performance by generating explicit Chain-of-Thought (CoT) trajectories, yet enabling them to self-evaluate correctness without external supervision remains a critical challenge. Existing methods often rely on ground-truth labels or shallow output probabilities, neglecting the layerwise evolution of the reasoning trajectory. In this work, we introduce Geometry of Reasoning, a white-box self-evaluation framework based on layerwise trajectory evolution. Geometry of Reasoning decomposes reasoning fidelity into two complementary dimensions: (1) Geometric Evolution, which synthesizes the first- and second-order evolution of layerwise hidden-state trajectories to quantify geometric progress in reasoning; and (2) Difficulty-Aware Calibration, which uses the cross-entropy of reasoning progress to normalize the Geometric Evolution score against intrinsic query uncertainty. By jointly modeling these factors, Geometry of Reasoning effectively distinguishes the coherent evolution of correct reasoning from the chaotic trajectories of errors. Extensive experiments across eight LRMs and seven benchmarks demonstrate that it consistently outperforms state-of-the-art baselines in AUROC, AUPR, and FPR@95.
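The two components described in the abstract can be caricatured in a few lines of code. This is a minimal sketch, not the paper's method: the function names, the (num_layers, hidden_dim) trajectory shape, the way first- and second-order terms are combined, and the use of next-token entropy as the "intrinsic query uncertainty" proxy are all illustrative assumptions.

```python
import numpy as np

def geometric_evolution_score(hidden_states):
    """Hypothetical Geometric Evolution score for one reasoning step.

    hidden_states: array of shape (num_layers, hidden_dim), the layerwise
    hidden-state trajectory. We take first-order differences (how far the
    state moves per layer) and second-order differences (how erratically
    the direction changes), then combine their magnitudes. The exact
    combination in the paper is not given in the abstract; here we simply
    reward steady first-order progress and penalize second-order jitter.
    """
    v = np.diff(hidden_states, axis=0)          # first-order evolution
    a = np.diff(hidden_states, n=2, axis=0)     # second-order evolution
    progress = np.linalg.norm(v, axis=1).mean()
    jitter = np.linalg.norm(a, axis=1).mean() if len(a) else 0.0
    return progress - jitter

def difficulty_calibrated_score(geo_score, token_probs):
    """Hypothetical Difficulty-Aware Calibration.

    Normalizes the geometric score by the entropy of an output
    distribution (a stand-in for the abstract's cross-entropy of
    reasoning progress), so that hard, high-uncertainty queries are
    not automatically scored as unfaithful reasoning.
    """
    eps = 1e-12
    entropy = -np.sum(token_probs * np.log(token_probs + eps))
    return geo_score / (1.0 + entropy)
```

On a perfectly linear trajectory the second-order term vanishes and the score reduces to the mean per-layer step size; chaotic trajectories with large direction changes are penalized, matching the abstract's intuition that correct reasoning evolves coherently.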



ICML 2026

43rd International Conference on Machine Learning. Seoul, South Korea, Jul 06-11, 2026. To be published.
A* Conference

Authors

J. Bi • D. Yan • Y. Wang • W. Huang • H. Chen • G. Wan • M. Ye • X. Xiao • H. Schütze • V. Tresp • Y. Ma

Links

URL

Research Areas

 A3 | Computational Models

 B2 | Natural Language Processing

BibTeX Key: BYW+26
