In recent years, overparametrization has received considerable attention in various fields and has been shown to accelerate training and promote simplicity. However, few works study the sparse regularization of the original parameters that is induced by combining overparametrization with explicit smooth regularization. Here, we present a unifying framework for smooth optimization of explicitly regularized objectives for (structured) sparsity. These non-smooth and possibly non-convex problems typically rely on solvers tailored to specific models or regularizers and have not been widely adopted in deep learning. In contrast, our method enables fully differentiable and approximation-free optimization of sparse regularizers and is thus compatible with the ubiquitous gradient-descent paradigm. The proposed optimization transfer comprises an overparametrization of selected parameters and a change of penalties. We prove that the surrogate objective is equivalent to the original one in the sense of identical global and local minima, thereby avoiding the introduction of spurious solutions. We comprehensively review sparsity-inducing parametrizations across different fields and combine them in our explicit surrogate regularization framework. We further extend their scope, point out improvements, and present novel parametrizations. Numerical experiments demonstrate the correctness and effectiveness of our approach on several sparse learning problems, from high-dimensional regression to sparse neural network training.
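To make the optimization transfer concrete, a classical instance from the literature (an illustrative assumption here, not necessarily the paper's exact construction) is the Hadamard-product parametrization of the lasso: overparametrizing $\beta = u \odot v$ and replacing the non-smooth $\ell_1$ penalty by a smooth ridge penalty on $(u, v)$ yields a fully differentiable surrogate with the same global minima in $\beta$:
\[
\min_{\beta}\; L(\beta) + \lambda \lVert \beta \rVert_1
\;\equiv\;
\min_{u,v}\; L(u \odot v) + \frac{\lambda}{2}\bigl(\lVert u \rVert_2^2 + \lVert v \rVert_2^2\bigr),
\qquad \beta = u \odot v,
\]
since, coordinatewise, $\min_{uv = b}\, \tfrac{1}{2}(u^2 + v^2) = |b|$ by the AM-GM inequality, so minimizing the right-hand side over $(u, v)$ recovers the left-hand side exactly.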