I2RF-TFCKD: Intra-Inter Representation Fusion With Time-Frequency Calibration Knowledge Distillation for Speech Enhancement

MCML Authors

Björn Schuller

Prof. Dr.

Principal Investigator

Abstract

In this paper, we propose an intra-inter representation fusion knowledge distillation (KD) framework with time-frequency calibration (I2RF-TFCKD) for speech enhancement (SE), which achieves distillation through the fusion of multi-layer teacher-student feature flows. Unlike previous distillation strategies for SE, the proposed framework fully utilizes the time-frequency differential information of speech while promoting global knowledge flow. First, we construct a collaborative distillation paradigm for intra-set and inter-set correlations. Within a correlated set, multi-layer teacher-student features are pairwise matched for calibrated distillation. We then generate representative features from each correlated set through residual fusion, forming a fused feature set that enables inter-set knowledge interaction. Second, we propose a multi-layer interactive distillation based on dual-stream time-frequency cross-calibration, which calculates teacher-student similarity calibration weights in the time and frequency domains separately and performs cross-weighting, enabling refined allocation of distillation contributions across layers according to speech characteristics. The proposed distillation strategy is applied to the dual-path dilated convolutional recurrent network (DPDCRN), which ranked first in the SE track of the L3DAS23 challenge. To evaluate the effectiveness of I2RF-TFCKD, we conduct experiments on both single-channel and multi-channel SE datasets. Objective evaluations demonstrate that the proposed KD strategy consistently improves the performance of the low-complexity student model and outperforms other distillation schemes.
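The abstract does not give formulas for the time-frequency cross-calibration, so the following is only a minimal sketch of one way such a scheme could look, not the authors' implementation. It assumes each layer's feature map is a plain `[T][F]` list (time frames by frequency bins), uses cosine similarity as the teacher-student similarity measure, normalizes the similarities across layers with a softmax, and forms the cross-weight as the product of the time weight and the frequency weight. The function names (`cosine`, `softmax`, `layer_weights`, `tf_calibrated_loss`) and all of these choices are hypothetical.

```python
import math


def cosine(a, b, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def layer_weights(sim):
    """Turn similarities sim[layer][index] into weights that sum to 1
    across layers for every index (assumed normalization)."""
    n_layers, n = len(sim), len(sim[0])
    cols = [softmax([sim[l][i] for l in range(n_layers)]) for i in range(n)]
    return [[cols[i][l] for i in range(n)] for l in range(n_layers)]


def tf_calibrated_loss(teacher_layers, student_layers):
    """Weighted squared error over matched teacher-student layer pairs.

    Both arguments are lists of L feature maps, each shaped [T][F].
    """
    n_frames = len(teacher_layers[0])
    n_bins = len(teacher_layers[0][0])
    pairs = list(zip(teacher_layers, student_layers))

    # Time stream: one similarity per frame, comparing frequency vectors.
    sim_t = [[cosine(t_map[t], s_map[t]) for t in range(n_frames)]
             for t_map, s_map in pairs]
    # Frequency stream: one similarity per bin, comparing time trajectories.
    sim_f = [[cosine([row[f] for row in t_map], [row[f] for row in s_map])
              for f in range(n_bins)]
             for t_map, s_map in pairs]

    w_t = layer_weights(sim_t)  # w_t[layer][frame]
    w_f = layer_weights(sim_f)  # w_f[layer][bin]

    loss = 0.0
    for l, (t_map, s_map) in enumerate(pairs):
        for t in range(n_frames):
            for f in range(n_bins):
                # Assumed cross-weighting: time weight times frequency weight.
                weight = w_t[l][t] * w_f[l][f]
                diff = t_map[t][f] - s_map[t][f]
                loss += weight * diff * diff
    return loss / (n_frames * n_bins)
```

In this sketch, layers whose student features already resemble the teacher's at a given frame or bin receive a larger share of the distillation loss there; whether the actual method rewards similarity or dissimilarity is not stated in the abstract, so the direction is also an assumption.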

Preprint

Oct. 2025

Authors

J. Cheng • R. Liang • Y. Ni • C. Xu • J. Li • W. Zhou • R. Liu • B. W. Schuller • X. Hao

Links

arXiv

Research Area

 B3 | Multimodal Perception

BibTeXKey: CLN+25
