Accurate pan-sharpening of multispectral images is essential for high-resolution remote sensing, yet supervised methods are limited by their need for paired training data and generalize poorly. Existing unsupervised approaches often neglect the physical consistency between degradation and fusion and lack sufficient constraints, yielding suboptimal results in complex scenarios. We propose RevFus, a novel two-stage pan-sharpening framework. In the first stage, an invertible neural network models the degradation process and reverses it for fusion with cycle-consistency self-learning, ensuring a physically grounded mapping. In the second stage, structural detail compensation and spatial–spectral contrastive learning alleviate detail loss and enhance spectral–spatial fidelity. To further understand the network's decision-making, we design a quantitative and systematic measure of model interpretability, the Interpretability Efficacy Coefficient (IEC). IEC integrates multiple statistics derived from SHapley Additive exPlanations (SHAP) values into a single unified score that evaluates how effectively a model balances spatial detail enhancement with spectral preservation. Experiments on three datasets demonstrate that RevFus outperforms state-of-the-art unsupervised and traditional methods, delivering superior spectral fidelity, enhanced spatial detail, and high model interpretability, thereby validating the effectiveness of our interpretable deep learning framework for robust, high-quality pan-sharpening.
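The abstract does not spell out how the SHAP-derived statistics are combined into the IEC. As a minimal, hypothetical sketch of the idea, the Python snippet below aggregates two illustrative statistics computed from a precomputed matrix of SHAP values (a spatial/spectral balance ratio and a normalized attribution entropy) into a single score. The function name `iec_score`, the choice of statistics, and the geometric-mean combination are assumptions made for illustration, not the paper's actual IEC formula.

```python
import numpy as np


def iec_score(shap_values: np.ndarray,
              spatial_idx: np.ndarray,
              spectral_idx: np.ndarray) -> float:
    """Aggregate per-feature SHAP statistics into one score in [0, 1].

    shap_values: (n_samples, n_features) SHAP attributions for one model
    output. spatial_idx / spectral_idx partition the feature columns into
    spatial-detail and spectral-preservation groups. The statistics and
    their combination are illustrative assumptions, not the published
    IEC definition.
    """
    abs_vals = np.abs(shap_values)

    # Mean absolute attribution per group: how strongly each group
    # drives the model's output overall.
    spatial_imp = abs_vals[:, spatial_idx].mean()
    spectral_imp = abs_vals[:, spectral_idx].mean()

    # Balance statistic in [0, 1]: 1 when the two groups contribute
    # equally, approaching 0 when one group dominates.
    balance = (min(spatial_imp, spectral_imp)
               / (max(spatial_imp, spectral_imp) + 1e-12))

    # Spread statistic in [0, 1]: normalized entropy of per-feature
    # importance; high values mean attributions are not concentrated
    # on a handful of features.
    p = abs_vals.mean(axis=0)
    p = p / (p.sum() + 1e-12)
    spread = float(-(p * np.log(p + 1e-12)).sum() / np.log(len(p)))

    # Unified score: geometric mean of balance and spread.
    return float(np.sqrt(balance * spread))


# Toy usage: random "SHAP values" over 8 spatial and 8 spectral features.
rng = np.random.default_rng(0)
shap_vals = rng.normal(size=(100, 16))
print(iec_score(shap_vals, np.arange(8), np.arange(8, 16)))
```

A score near 1 under these assumptions would indicate that spatial and spectral features contribute comparably and that no single feature monopolizes the attributions, matching the abstract's stated goal of balancing spatial detail enhancement with spectral preservation.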