
Differentiable Sparsity via $D$-Gating: Simple and Versatile Structured Penalization


Abstract

Structured sparsity regularization offers a principled way to compact neural networks, but its non-differentiability breaks compatibility with conventional stochastic gradient descent and requires either specialized optimizers or additional post-hoc pruning without formal guarantees. In this work, we propose $D$-Gating, a fully differentiable structured overparameterization that splits each group of weights into a primary weight vector and multiple scalar gating factors. We prove that any local minimum under $D$-Gating is also a local minimum of the loss with non-smooth structured $L_{2,2/D}$ penalization, and further show that the $D$-Gating objective converges at least exponentially fast to the $L_{2,2/D}$-regularized loss in the gradient flow limit. Together, our results show that $D$-Gating is theoretically equivalent to solving the original group sparsity problem, yet induces distinct learning dynamics that evolve from a non-sparse regime into sparse optimization. We validate our theory across vision, language, and tabular tasks, where $D$-Gating consistently delivers strong performance-sparsity tradeoffs and outperforms both direct optimization of structured penalties and conventional pruning baselines.
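
To make the overparameterization concrete, below is a minimal PyTorch sketch of a $D$-Gated linear layer based only on the description in the abstract: each output neuron's weight group is factored into a primary weight vector times $D-1$ scalar gates, and a smooth squared-$L_2$ penalty on all factors stands in for the non-smooth structured $L_{2,2/D}$ penalty. The class name `DGatedLinear`, the per-output-neuron grouping, and the exact penalty form are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DGatedLinear(nn.Module):
    """Illustrative sketch of a D-Gated linear layer (not the authors' code).

    Each output neuron's weight group w_g is overparameterized as
        w_g = v_g * s_{g,1} * ... * s_{g,D-1},
    i.e. a primary weight vector times D-1 scalar gates. A smooth squared-L2
    penalty on all factors is assumed as the differentiable surrogate for the
    structured L_{2,2/D} group penalty described in the abstract.
    """

    def __init__(self, in_features, out_features, D=2):
        super().__init__()
        self.v = nn.Parameter(torch.randn(out_features, in_features) * 0.01)  # primary weight vectors
        self.gates = nn.Parameter(torch.ones(out_features, D - 1))            # D-1 scalar gates per group
        self.bias = nn.Parameter(torch.zeros(out_features))

    def effective_weight(self):
        # Collapse the factorization: one scalar gate product per output-neuron group.
        return self.v * self.gates.prod(dim=1, keepdim=True)

    def forward(self, x):
        return nn.functional.linear(x, self.effective_weight(), self.bias)

    def penalty(self):
        # Fully differentiable penalty: squared L2 norm of every factor.
        return self.v.pow(2).sum() + self.gates.pow(2).sum()

# Usage sketch: add lam * layer.penalty() to the task loss and train with plain SGD.
layer = DGatedLinear(16, 4, D=3)
x = torch.randn(8, 16)
loss = layer(x).pow(2).mean() + 1e-3 * layer.penalty()
loss.backward()
```

Because every factor is smooth, the penalized objective can be minimized with ordinary stochastic gradient descent; groups whose gate products are driven toward zero correspond to structurally pruned units.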



NeurIPS 2025

39th Conference on Neural Information Processing Systems. San Diego, CA, USA, Nov 30-Dec 07, 2025. Spotlight Presentation. To be published. Preprint available.
A* Conference

Authors

C. Kolb • L. Frost • B. Bischl • D. Rügamer

Research Area

A1 | Statistical Foundations & Explainability

BibTeX Key: KFB+25
