25.09.2025


Compress Then Explain: Faster, Steadier AI Explanations - With One Tiny Step

MCML Research Insight - With Giuseppe Casalicchio and Bernd Bischl

Imagine re-running a feature importance plot and getting slightly different “top features.” Annoying, right? That uncertainty often comes from a quiet assumption: explanation algorithms typically sample background points from the data at random.

A new ICLR 2025 Spotlight paper by MCML Junior Member Giuseppe Casalicchio, MCML Director Bernd Bischl, first author Hubert Baniecki and co-author Przemyslaw Biecek flips the script with a simple idea: compress your data distribution first, then explain the model. The authors call it Compress Then Explain (CTE), and it consistently makes explanations more accurate, more stable, and much faster.


Compress then explain

Figure 1: Garbage sample in, garbage explanation out. “Sample then explain” is the conventional approach to decreasing the computational cost of explanation estimation. Although fast, random sampling is inefficient and prone to error, which may even lead to changes in feature importance rankings. The authors propose compress then explain (CTE), a new paradigm for accurate yet efficient estimation of explanations, based on a marginal distribution that is compressed, e.g., with kernel thinning.


«CTE often achieves on-par error using 2–3× fewer samples, i.e., requiring 2–3× fewer model inferences. CTE is a simple, yet powerful, plug-in for a broad class of methods that sample from a dataset, e.g. removal-based and global explanations.»


Giuseppe Casalicchio

MCML Junior Member

Why this matters

Modern explainers (SHAP, SAGE, expected gradients, feature effects) often need a background/reference set. When those points are drawn independently and identically distributed (i.i.d.) at random, you get estimation noise that can distort attributions and even reshuffle importance rankings. CTE replaces that random pick with a small, representative coreset built by kernel thinning, which is a principled way to keep points that best preserve the original data distribution. Result: less error from fewer samples.
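The noise introduced by i.i.d. background sampling is easy to see in a toy experiment. The sketch below (illustrative only; the linear model, weights, and sample sizes are made up for this post) re-runs the "sample then explain" step many times and measures how much the background-dependent baseline wobbles from run to run:

```python
import numpy as np

# Illustrative only: post-hoc explainers estimate expectations such as
# E[f(X)] over a background set. With a small i.i.d. background sample,
# that estimate (and every attribution built on it) is noisy.

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))                     # full dataset
w = np.array([3.0, -2.0, 1.0, 0.5, 0.0])
f = lambda x: x @ w                                  # toy linear model

true_mean = f(X).mean()                              # baseline on all data

# Re-run the "sample then explain" step 200 times, 100 background points each.
estimates = [f(X[rng.choice(len(X), size=100, replace=False)]).mean()
             for _ in range(200)]

print(f"baseline on full data : {true_mean:.3f}")
print(f"std across re-runs    : {np.std(estimates):.3f}")
```

That run-to-run standard deviation is exactly the kind of estimation noise CTE targets: a compressed background set that better matches the full distribution gives a steadier baseline with the same number of model inferences.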


How CTE works

  1. Compress the dataset using kernel thinning (COMPRESS++), which minimizes the distribution gap, measured by the maximum mean discrepancy (MMD), between the compressed sample and the full data.
  2. Explain using your favorite method, but with the compressed set as background (and even foreground, when relevant). The theory shows that the better the compressed set matches the original distribution, the tighter the provable bound on the explanation error.
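The two steps above can be sketched in a few lines. The paper's compression step uses kernel thinning (COMPRESS++); the simplified stand-in below uses greedy kernel herding, which likewise selects points whose kernel mean stays close to the full data's (and hence keeps the MMD small). All names and parameters here are illustrative, not the authors' implementation:

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def compress(X, m):
    """Step 1 (simplified): greedily pick m rows of X whose kernel mean
    tracks the kernel mean of the full dataset (kernel herding)."""
    K = rbf(X, X)
    target = K.mean(axis=1)              # kernel mean embedding of full data
    chosen, running = [], np.zeros(len(X))
    for t in range(m):
        # favor points close to the data's mean embedding, penalize
        # similarity to the points already selected
        scores = target - running / (t + 1)
        scores[chosen] = -np.inf
        i = int(np.argmax(scores))
        chosen.append(i)
        running += K[:, i]
    return X[chosen]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
model = lambda x: (x ** 2).sum(axis=1)   # toy model

background = compress(X, 25)             # step 1: compress
# Step 2: hand `background` to your explainer wherever it expects a
# reference/background sample, e.g. shap.Explainer(model, background).
print(background.shape)
```

The point of the plug-in design is that step 2 is unchanged: any explainer that samples a background set from the dataset can consume the compressed set instead.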

Highlights from the Paper

  • Accuracy: On five benchmark datasets, CTE made explanations 20–45% more accurate and about 50% more consistent, compared to the baseline random sampling approach.
  • Efficiency: You reach the same error with 2–3× fewer samples → 2–3× fewer model inferences.
  • Broad applicability: Works with SHAP, SAGE, Expected Gradients, and Feature Effects; across tabular and vision setups.

Results

Across five benchmark datasets spanning tabular and image data, CTE consistently outperforms standard i.i.d. sampling: explanations become markedly more accurate and more stable, while the compression step itself adds negligible computational overhead. In practice, the same approximation error is often reached 2–3× faster, i.e. with 2–3× fewer model evaluations.


Key Technical Insight

CTE’s success lies in minimizing the maximum mean discrepancy (MMD) between the original and sampled distributions. The paper ties the explanation approximation error directly to this distance: smaller MMD means tighter error bounds for both local and global explanations.
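For intuition, the MMD is straightforward to compute. The sketch below uses the biased (V-statistic) estimator of the squared MMD under an RBF kernel; the kernel bandwidth and sample sizes are illustrative choices, not from the paper:

```python
import numpy as np

def mmd2(X, S, gamma=0.5):
    """Biased estimate of squared MMD between samples X and S (RBF kernel)."""
    k = lambda A, B: np.exp(-gamma * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
    return k(X, X).mean() - 2 * k(X, S).mean() + k(S, S).mean()

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))

S_small = X[rng.choice(400, 20, replace=False)]   # small i.i.d. subsample
S_large = X[:200]                                 # larger subsample

print(mmd2(X, S_small))   # larger gap to the full data
print(mmd2(X, S_large))   # smaller gap
```

A compression method like kernel thinning aims to drive this quantity down much faster than i.i.d. sampling does at the same sample size, which is what translates into the tighter explanation-error bounds above.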


Further Reading & Reference

Want steadier attributions without rewriting your stack? The paper was published as a spotlight presentation at the A* conference ICLR 2025; you can explore the full paper and find the open-source code on GitHub.

H. Baniecki, G. Casalicchio, B. Bischl and P. Biecek.
Efficient and Accurate Explanation Estimation with Distribution Compression.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. Spotlight Presentation. URL GitHub
Abstract

We discover a theoretical connection between explanation estimation and distribution compression that significantly improves the approximation of feature attributions, importance, and effects. While the exact computation of various machine learning explanations requires numerous model inferences and becomes impractical, the computational cost of approximation increases with an ever-increasing size of data and model parameters. We show that the standard i.i.d. sampling used in a broad spectrum of algorithms for post-hoc explanation leads to an approximation error worthy of improvement. To this end, we introduce Compress Then Explain (CTE), a new paradigm of sample-efficient explainability. It relies on distribution compression through kernel thinning to obtain a data sample that best approximates its marginal distribution. CTE significantly improves the accuracy and stability of explanation estimation with negligible computational overhead. It often achieves an on-par explanation approximation error 2-3x faster by using fewer samples, i.e. requiring 2-3x fewer model evaluations. CTE is a simple, yet powerful, plug-in for any explanation method that now relies on i.i.d. sampling.

