Feature attribution methods are widely used to explain machine learning models, yet their evaluation is challenging due to competing quality criteria such as faithfulness, robustness, and sparsity. These criteria often conflict, and even alternative formulations of the same metric can yield inconsistent conclusions. We address this by introducing a unifying framework that analyzes systematic incompatibilities between measures of explanation quality. Within this framework, we develop two novel mathematical tools: a samplewise incompatibility index that quantifies systematic conflicts between criteria, and a generalized eigen-analysis that localizes where tradeoffs are concentrated within attribution results. Experiments on image classifiers show that this analysis provides insights beyond isolated metrics and complements current evaluation practices for feature attributions.
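The exact formulations of the two tools are not given here; the sketch below is a rough, hypothetical illustration only. It assumes the samplewise incompatibility index can be read as a (negated) rank correlation between per-sample scores of two quality criteria, and that the generalized eigen-analysis contrasts two criterion-weighted second-moment matrices over feature dimensions. All names and data (`faithfulness`, `sparsity`, `attributions`) are placeholders, not the paper's definitions.

```python
# Hypothetical sketch; the actual index and eigen-analysis in DT26 may differ.
import numpy as np
from scipy.linalg import eigh
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Placeholder per-sample quality scores for two criteria (e.g., faithfulness
# and sparsity), evaluated on the same set of attribution results.
faithfulness = rng.normal(size=200)
sparsity = -0.6 * faithfulness + rng.normal(scale=0.8, size=200)

# Samplewise incompatibility index (assumed form): negated rank correlation,
# so systematic conflicts between the criteria push the index toward +1.
rho, _ = spearmanr(faithfulness, sparsity)
incompatibility_index = -rho
print(f"incompatibility index: {incompatibility_index:.3f}")

# Generalized eigen-analysis (assumed form): attributions are d-dimensional,
# and each criterion induces a weighted second-moment matrix over features.
d = 10
attributions = rng.normal(size=(200, d))
w_faith = np.exp(faithfulness)           # weights favouring high faithfulness
w_sparse = np.exp(sparsity)              # weights favouring high sparsity
A = (attributions * w_faith[:, None]).T @ attributions / w_faith.sum()
B = (attributions * w_sparse[:, None]).T @ attributions / w_sparse.sum()
B += 1e-6 * np.eye(d)                    # regularize so B is positive definite

# Solve A v = lambda B v: large eigenvalues mark feature directions where the
# two criteria pull apart most strongly, i.e. where the tradeoff concentrates.
eigvals, eigvecs = eigh(A, B)
print("tradeoff concentration (top eigenvalue):", eigvals[-1])
print("dominant tradeoff direction:", eigvecs[:, -1])
```

Under these assumptions, a strongly positive index flags a systematic conflict between the two criteria across samples, and the leading generalized eigenvector points to the feature directions where that conflict is most pronounced.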