
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models

MCML Authors


Hinrich Schütze

Prof. Dr.

Principal Investigator

Abstract

Language confusion—where large language models (LLMs) generate text in a language the user did not intend—remains a critical challenge, especially for English-centric models. We present the first mechanistic interpretability (MI) study of language confusion, combining behavioral benchmarking with neuron-level analysis. Using the Language Confusion Benchmark (LCB), we show that confusion points (CPs)—specific positions where language switches occur—are central to this phenomenon. Through layer-wise analysis with TunedLens and targeted neuron attribution, we reveal that transition failures in the final layers drive confusion. We further demonstrate that editing a small set of critical neurons, identified via comparative analysis with a multilingual-tuned counterpart, substantially mitigates confusion while largely preserving general competence and fluency. Our approach matches multilingual alignment in confusion reduction for many languages and yields cleaner, higher-quality outputs. These findings provide new insights into the internal dynamics of LLMs and highlight neuron-level interventions as a promising direction for robust, interpretable multilingual language modeling.
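The neuron-editing idea sketched in the abstract can be illustrated with a toy example: rank neurons by how much their activations diverge between a base model and its multilingual-tuned counterpart on the same prompts, then patch the top-ranked neurons at inference time. This is a minimal, self-contained sketch on synthetic activations, not the authors' implementation; the array shapes, the divergence metric (mean absolute difference), and the patching rule are all illustrative assumptions.

```python
import numpy as np

# Toy sketch (not the paper's implementation): find "critical neurons" by
# comparing final-layer activations of a base model vs. a multilingual-tuned
# counterpart on the same prompts, then patch those neurons' activations.

rng = np.random.default_rng(0)
n_prompts, n_neurons = 64, 512

# Hypothetical final-layer activations for the same batch of prompts.
base_acts = rng.normal(0.0, 1.0, (n_prompts, n_neurons))
tuned_acts = base_acts.copy()
critical = [3, 17, 42, 100, 256]   # synthetic ground-truth divergent neurons
tuned_acts[:, critical] += 2.5     # tuned model activates these differently

def rank_divergent_neurons(a, b, top_k=5):
    """Rank neurons by mean absolute activation difference across prompts."""
    diff = np.abs(a - b).mean(axis=0)
    return np.argsort(diff)[::-1][:top_k]

def patch_neurons(acts, neuron_ids, source_acts):
    """Overwrite the selected neurons' activations with the source model's."""
    patched = acts.copy()
    patched[:, neuron_ids] = source_acts[:, neuron_ids]
    return patched

found = rank_divergent_neurons(base_acts, tuned_acts, top_k=5)
edited = patch_neurons(base_acts, found, tuned_acts)
print(sorted(found.tolist()))
```

In a real setting the activations would come from forward hooks on the two models at the confusion points identified on LCB, and the edit would be applied only to the handful of neurons whose divergence is largest, which is what lets the intervention mitigate confusion while leaving general competence largely intact.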

inproceedings NSS25


Findings @EMNLP 2025

Findings of the Conference on Empirical Methods in Natural Language Processing. Suzhou, China, Nov 04-09, 2025.

Authors

E. Nie • H. Schmid • H. Schütze

Links

DOI

Research Area

 B2 | Natural Language Processing

BibTeXKey: NSS25
