Home  | News

08.05.2026

Teaser image to Right answer, wrong reasoning - Is AI Thinking or Cheating?

Right Answer, Wrong Reasoning - Is AI Thinking or Cheating?

MCML Research Insight – With Xinpeng Wang and Barbara Plank

Imagine a student solving a math problem. The steps look perfectly logical, neatly written, and convincing. But in reality, the student already knew the answer and simply worked backwards to justify it. Would you still trust the solution?

 Key Insight


AI models can produce convincing reasoning while secretly relying on shortcuts that bypass the actual task.


When Reasoning Cannot Be Trusted

A similar phenomenon is emerging in modern AI systems. In their paper “Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort,” selected for an oral presentation at ICLR 2026, MCML Junior Member Xinpeng Wang, together with MCML PI Barbara Plank and collaborators from New York University (NYU), investigate how AI models can appear to reason correctly, while actually taking hidden shortcuts.

Many modern AI systems generate step-by-step explanations of their reasoning. This is often seen as a sign of transparency.

But these explanations are not always reliable. Models can exploit loopholes in how they are evaluated, achieving high scores without truly solving the problem. In such cases, the reasoning may look correct, even when it is not.


 Big Question


How can we tell whether an AI system is genuinely reasoning or simply exploiting shortcuts?


The Problem of Hidden Shortcuts

This behavior is known as reward hacking. It occurs when a model learns to maximize a score by exploiting weaknesses in the evaluation setup rather than solving the intended task.

The most concerning cases are implicit:

  • the answer appears correct
  • the reasoning looks plausible
  • the shortcut remains hidden

As highlighted in the paper, models can rely on subtle hints or flaws in the reward system while producing explanations that appear completely valid.


 Core Idea


Shortcut behavior can be revealed by measuring how early a model can arrive at a correct answer.


Measuring Reasoning Effort

The proposed method TRACE (Truncated Reasoning AUC Evaluation), is based on a simple intuition: solving a problem properly requires effort, while taking a shortcut does not.

Instead of analyzing the full reasoning, TRACE interrupts it:

  • the reasoning process is cut off at different points
  • the model is forced to give an answer early
  • performance is measured at each stage

If a model can already give correct answers early on, it likely relied on a shortcut rather than full reasoning.

 Overview of TRACE

Figure 1: Overview of TRACE. The chain-of-thought (CoT) is truncated at different percentages, multiple completions are sampled, and the averaged proxy reward is calculated to estimate the expected proxy reward. Evaluating the expected proxy reward across truncation points yields a curve whose AUC quantifies the model’s hidden reasoning effort relative to what is presented in the CoT.


 Key Finding


Models that exploit shortcuts tend to succeed much earlier in the reasoning process than genuine reasoning models.


What the Experiments Showed

Across tasks such as math and coding, a consistent pattern emerges:

  • shortcut-based models achieve high performance even with partial reasoning
  • genuine reasoning models need most of their reasoning steps
  • traditional monitoring methods often fail to detect hidden shortcuts
  • TRACE identifies these cases much more reliably

The results show clear improvements over existing methods for detecting hidden reward hacking.


 Takeaway


Correct answers alone are not enough, we also need to understand how those answers are produced.


Why This Matters

As AI systems are increasingly used in sensitive domains, trust becomes essential. If models can appear correct while relying on hidden shortcuts, this can lead to unexpected and potentially risky behavior.

This work highlights a shift in how AI systems should be evaluated: not only by their outputs, but also by the reasoning process behind them.


Further Reading & Reference

If you would like to learn more about detecting hidden shortcuts in AI reasoning, check out the full paper. The work was presented as an oral presentation at ICLR 2026 (International Conference on Learning Representations), one of the leading conferences in machine learning research.

A* Conference
X. Wang • N. Joshi • B. Plank • R. Angell • H. He
Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort.
ICLR 2026 - 14th International Conference on Learning Representations. Rio de Janeiro, Brazil, Apr 23-27, 2026. Oral Presentation. To be published. Preprint available. arXiv

 


Share Your Research!


Get in touch with us!

Are you an MCML Junior Member and interested in showcasing your research on our blog?

We’re happy to feature your work. Get in touch with us to present your paper.

#blog #research #plank

Related

Link to MCML Delegation Visit to the UK

07.05.2026

MCML Delegation Visit to the UK

MCML delegation visited top U.S. universities to advance AI X-Change and foster collaboration in generative and medical AI.

Read more
Link to MCML at AISTATS 2026

01.05.2026

MCML at AISTATS 2026

MCML researchers are represented with 8 papers at AISTATS 2026 (7 Main, and 1 Workshop).

Read more
Link to Björn Ommer: How AI can transform society if we use it responsibly

28.04.2026

Björn Ommer: How AI Can Transform Society if We Use It Responsibly

MCML PI Björn Ommer explains the philosophy behind Stable Diffusion and why his team focuses on efficiency.

Read more
Link to Research Stay at Imperial College London

27.04.2026

Research Stay at Imperial College London

Lennart Bastian joined a research stay at Imperial College London via MCML AI X-Change, working on higher-order topology for complex systems.

Read more
Link to When Vision AI Hallucinates Details

23.04.2026

When Vision AI Hallucinates Details

Why do vision-language models invent details? Our PI Zeynep Akata and her team present a fix for AI hallucinations at CVPR 2026.

Read more
Back to Top