is Assistant Professor of Optimization & Data Analysis at TU Munich.
His research focuses on the mathematical foundations of signal and image processing. In particular, his research agenda covers randomized sensing methods, especially in compressed sensing, dimension reduction, and analog to digital conversion. His research involves not only the theoretical analysis of such methods but also their application, such as in a project on the non-destructive testing of steel pipes.
Uncertainty quantification (UQ) is a crucial but challenging task in many high-dimensional regression or learning problems to increase the confidence of a given predictor. We develop a new data-driven approach for UQ in regression that applies both to classical regression approaches such as the LASSO as well as to neural networks. One of the most notable UQ techniques is the debiased LASSO, which modifies the LASSO to allow for the construction of asymptotic confidence intervals by decomposing the estimation error into a Gaussian and an asymptotically vanishing bias component. However, in real-world problems with finite-dimensional data, the bias term is often too significant to be neglected, resulting in overly narrow confidence intervals. Our work rigorously addresses this issue and derives a data-driven adjustment that corrects the confidence intervals for a large class of predictors by estimating the means and variances of the bias terms from training data, exploiting high-dimensional concentration phenomena. This gives rise to non-asymptotic confidence intervals, which can help avoid overestimating uncertainty in critical applications such as MRI diagnosis. Importantly, our analysis extends beyond sparse regression to data-driven predictors like neural networks, enhancing the reliability of model-based deep learning. Our findings bridge the gap between established theory and the practical applicability of such debiased methods.
We provide a rigorous analysis of implicit regularization in an overparametrized tensor factorization problem beyond the lazy training regime. For matrix factorization problems, this phenomenon has been studied in a number of works. A particular challenge has been to design universal initialization strategies which provably lead to implicit regularization in gradient-descent methods. At the same time, it has been argued by Cohen et. al. 2016 that more general classes of neural networks can be captured by considering tensor factorizations. However, in the tensor case, implicit regularization has only been rigorously established for gradient flow or in the lazy training regime. In this paper, we prove the first tensor result of its kind for gradient descent rather than gradient flow. We focus on the tubal tensor product and the associated notion of low tubal rank, encouraged by the relevance of this model for image data. We establish that gradient descent in an overparametrized tensor factorization model with a small random initialization exhibits an implicit bias towards solutions of low tubal rank. Our theoretical findings are illustrated in an extensive set of numerical simulations show-casing the dynamics predicted by our theory as well as the crucial role of using a small random initialization.
Establishing certified uncertainty quantification (UQ) in imaging processing applications continues to pose a significant challenge. In particular, such a goal is crucial for accurate and reliable medical imaging if one aims for precise diagnostics and appropriate intervention. In the case of magnetic resonance imaging, one of the essential tools of modern medicine, enormous advancements in fast image acquisition were possible after the introduction of compressive sensing and, more recently, deep learning methods. Still, as of now, there is no UQ method that is both fully rigorous and scalable. This work takes a step towards closing this gap by proposing a total variation minimization-based method for pixel-wise sharp confidence intervals for undersampled MRI. We demonstrate that our method empirically achieves the predicted confidence levels. We expect that our approach will also have implications for other imaging modalities as well as deep learning applications in computer vision.
Over the last few years, debiased estimators have been proposed in order to establish rigorous confidence intervals for high-dimensional problems in machine learning and data science. The core argument is that the error of these estimators with respect to the ground truth can be expressed as a Gaussian variable plus a remainder term that vanishes as long as the dimension of the problem is sufficiently high. Thus, uncertainty quantification (UQ) can be performed exploiting the Gaussian model. Empirically, however, the remainder term cannot be neglected in many realistic situations of moderately-sized dimensions, in particular in certain structured measurement scenarios such as Magnetic Resonance Imaging (MRI). This, in turn, can downgrade the advantage of the UQ methods as compared to non-UQ approaches such as the standard LASSO. In this paper, we present a method to improve the debiased estimator by sampling without replacement. Our approach leverages recent results of ours on the structure of the random nature of certain sampling schemes showing how a transition between sampling with and without replacement can lead to a weighted reconstruction scheme with improved performance for the standard LASSO. In this paper, we illustrate how this reweighted sampling idea can also improve the debiased estimator and, consequently, provide a better method for UQ in Fourier imaging.
Uncertainty quantification (UQ) is a crucial but challenging task in many high-dimensional regression or learning problems to increase the confidence of a given predictor. We develop a new data-driven approach for UQ in regression that applies both to classical regression approaches such as the LASSO as well as to neural networks. One of the most notable UQ techniques is the debiased LASSO, which modifies the LASSO to allow for the construction of asymptotic confidence intervals by decomposing the estimation error into a Gaussian and an asymptotically vanishing bias component. However, in real-world problems with finite-dimensional data, the bias term is often too significant to be neglected, resulting in overly narrow confidence intervals. Our work rigorously addresses this issue and derives a data-driven adjustment that corrects the confidence intervals for a large class of predictors by estimating the means and variances of the bias terms from training data, exploiting high-dimensional concentration phenomena. This gives rise to non-asymptotic confidence intervals, which can help avoid overestimating uncertainty in critical applications such as MRI diagnosis. Importantly, our analysis extends beyond sparse regression to data-driven predictors like neural networks, enhancing the reliability of model-based deep learning. Our findings bridge the gap between established theory and the practical applicability of such debiased methods.
Many algorithms for high-dimensional regression problems require the calibration of regularization hyperparameters. This, in turn, often requires the knowledge of the unknown noise variance in order to produce meaningful solutions. Recent works show, however, that there exist certain estimators that are pivotal, i.e., the regularization parameter does not depend on the noise level; the most remarkable example being the square-root lasso. Such estimators have also been shown to exhibit strong connections to distributionally robust optimization. Despite the progress in the design of pivotal estimators, the resulting minimization problem is challenging as both the loss function and the regularization term are non-smooth. To date, the design of fast, robust, and scalable algorithms with strong convergence rate guarantees is still an open problem. This work addresses this problem by showing that an iteratively reweighted least squares (IRLS) algorithm exhibits global linear convergence under the weakest assumption available in the literature. We expect our findings will also have implications for multi-task learning and distributionally robust optimization.
The evolution of image halftoning, from its analog roots to contemporary digital methodologies, encapsulates a fascinating journey marked by technological advancements and creative innovations. Yet the theoretical understanding of halftoning is much more recent. In this article, we explore various approaches towards shedding light on the design of halftoning approaches and why they work. We discuss both halftoning in a continuous domain and on a pixel grid. We start by reviewing the mathematical foundation of the so-called electrostatic halftoning method, which departed from the heuristic of considering the back dots of the halftoned image as charged particles attracted by the grey values of the image in combination with mutual repulsion. Such an attraction-repulsion model can be mathematically represented via an energy functional in a reproducing kernel Hilbert space allowing for a rigorous analysis of the resulting optimization problem as well as a convergence analysis in a suitable topology. A second class of methods that we discuss in detail is the class of error diffusion schemes, arguably among the most popular halftoning techniques due to their ability to work directly on a pixel grid and their ease of application. The main idea of these schemes is to choose the locations of the black pixels via a recurrence relation designed to agree with the image in terms of the local averages. We discuss some recent mathematical understanding of these methods that is based on a connection to Sigma-Delta quantizers, a popular class of algorithms for analog-to-digital conversion.
Model-based deep learning solutions to inverse problems have attracted increasing attention in recent years as they bridge state-of-the-art numerical performance with interpretability. In addition, the incorporated prior domain knowledge can make the training more efficient as the smaller number of parameters allows the training step to be executed with smaller datasets. Algorithm unrolling schemes stand out among these model-based learning techniques. Despite their rapid advancement and their close connection to traditional high-dimensional statistical methods, they lack certainty estimates and a theory for uncertainty quantification is still elusive. This work provides a step towards closing this gap proposing a rigorous way to obtain confidence intervals for the LISTA estimator.
One of the most prominent methods for uncertainty quantification in high-dimen-sional statistics is the desparsified LASSO that relies on unconstrained ℓ1-minimization. The majority of initial works focused on real (sub-)Gaussian designs. However, in many applications, such as magnetic resonance imaging (MRI), the measurement process possesses a certain structure due to the nature of the problem. The measurement operator in MRI can be described by a subsampled Fourier matrix. The purpose of this work is to extend the uncertainty quantification process using the desparsified LASSO to design matrices originating from a bounded orthonormal system, which naturally generalizes the subsampled Fourier case and also allows for the treatment of the case where the sparsity basis is not the standard basis. In particular we construct honest confidence intervals for every pixel of an MR image that is sparse in the standard basis provided the number of measurements satisfies n≳max{slog2slogp,slog2p} or that is sparse with respect to the Haar Wavelet basis provided a slightly larger number of measurements.
Multidimensional Magnetic Resonance Imaging (MRI) is a versatile tool for microstructure mapping. We use a diffusion weighted inversion recovery spin echo (DW-IR-SE) sequence with spiral readouts at ultra-strong gradients to acquire a rich diffusion–relaxation data set with sensitivity to myelin water. We reconstruct 1D and 2D spectra with a two-step convex optimization approach and investigate a variety of multidimensional MRI methods, including 1D multi-component relaxometry, 1D multi-component diffusometry, 2D relaxation correlation imaging, and 2D diffusion-relaxation correlation spectroscopic imaging (DR-CSI), in terms of their potential to quantify tissue microstructure, including the myelin water fraction (MWF). We observe a distinct spectral peak that we attribute to myelin water in multi-component T1 relaxometry, T1-T2 correlation, T1-D correlation, and T2-D correlation imaging. Due to lower achievable echo times compared to diffusometry, MWF maps from relaxometry have higher quality. Whilst 1D multi-component T1 data allows much faster myelin mapping, 2D approaches could offer unique insights into tissue microstructure and especially myelin diffusion.
We investigate to what extent it is possible to solve linear inverse problems with ReLu networks. Due to the scaling invariance arising from the linearity, an optimal reconstruction function f for such a problem is positive homogeneous, i.e., satisfies f(λx)=λf(x) for all non-negative λ. In a ReLu network, this condition translates to considering networks without bias terms. We first consider recovery of sparse vectors from few linear measurements. We prove that ReLu-networks with only one hidden layer cannot even recover 1-sparse vectors, not even approximately, and regardless of the width of the network. However, with two hidden layers, approximate recovery with arbitrary precision and arbitrary sparsity level s is possible in a stable way. We then extend our results to a wider class of recovery problems including low-rank matrix recovery and phase retrieval. Furthermore, we also consider the approximation of general positive homogeneous functions with neural networks. Extending previous work, we establish new results explaining under which conditions such functions can be approximated with neural networks. Our results also shed some light on the seeming contradiction between previous works showing that neural networks for inverse problems typically have very large Lipschitz constants, but still perform very well also for adversarial noise. Namely, the error bounds in our expressivity results include a combination of a small constant term and a term that is linear in the noise level, indicating that robustness issues may occur only for very small noise levels.
Recovering a s-sparse signal vector x∈Cn from a comparably small number of measurements y:=(Ax)∈Cm is the underlying challenge of compressed sensing. By now, a variety of efficient greedy algorithms has been established and strong recovery guarantees have been proven for random measurement matrices A∈Cm×n.However, they require a strong concentration of A ∗ Ax around its mean x (in particular, the Restricted Isometry Property), which is generally not fulfilled for heavy-tailed matrices. In order to overcome this issue and even cover applications where only limited knowledge about the distribution of the measurements matrix is known, we suggest substituting A ∗ Ax by a median-of-means estimator.In the following, we present an adapted greedy algorithm, based on median-of-means, and prove that it can recover any s-sparse unit vector x∈Cn up to a l 2 -error ∥x−x^∥2<∈ with high probability, while only requiring a bound on the fourth moment of the entries of A. The sample complexity is of the order O(slog(nlog(1∈))log(1∈)).
Most of the compressive sensing literature in signal processing assumes that the noise present in the measurement has an adversarial nature, i.e., it is bounded in a certain norm. At the same time, the randomization introduced in the sampling scheme usually assumes an i.i.d. model where rows are sampled with replacement. In this case, if a sample is measured a second time, it does not add additional information. For many applications, where the statistical noise model is a more accurate one, this is not true anymore since a second noisy sample comes with an independent realization of the noise, so there is a fundamental difference between sampling with and without replacement. Therefore, a more careful analysis must be performed. In this short note, we illustrate how one can mathematically transition between these two noise models. This transition gives rise to a weighted LASSO reconstruction method for sampling without replacement, which numerically improves the solution of high-dimensional compressive imaging problems.
We investigate the compatibility of distributed noise-shaping quantization with random samples of bandlimited functions. Let f be a real-valued π-bandlimited function. Suppose R > 1 is a real number, and assume that {xi}mi=1 is a sequence of i.i.d random variables uniformly distributed on [−R~,R~], where R~>R is appropriately chosen. We show that on using a distributed noise-shaping quantizer to quantize the values of f at {xi}mi=1, a function f ♯ can be reconstructed from these quantized values such that ∥∥f−f♯∥∥L2[−R,R] decays with high probability as m and R~ increase.
Graph models and graph-based signals are becoming increasingly important in machine learning, natural sciences, and modern signal processing. In this paper, we address the problem of quantizing bandlimited graph signals. We introduce two classes of noise-shaping algorithms for graph signals that differ in their sampling methodologies. We demonstrate that these algorithms can be efficiently used to construct quantized representatives of bandlimited graph-based signals with bounded amplitude. Moreover, for one of the algorithms, we provide theoretical guarantees on the relative error between the quantized representative and the true signal.
In this paper, we propose 1-bit weighted Σ∆ quantization schemes of mixed order as a technique for digital halftoning. These schemes combine weighted Σ∆ schemes of different orders for two-dimensional signals so one can profit both from the better stability properties of low order schemes and the better accuracy properties of higher order schemes. We demonstrate that the resulting mixed-order Σ∆ schemes in combination with a padding strategy yield improved representation quality in digital halftoning as measured in the Feature Similarity Index.These empirical results are complemented by mathematical error bounds for the model of two-dimensional bandlimited signals as motivated by a mathematical model of human visual perception.
In this paper, we study error diffusion techniques for digital halftoning from the perspective of 1-bit quantization. We introduce a method to generate schemes for two-dimensional signals as a weighted combination of their one-dimensional counterparts and show that various error diffusion schemes proposed in the literature can be represented in this framework via schemes of first order. Under the model of two-dimensional bandlimited signals, which is motivated by a mathematical model of human visual perception, we derive quantitative error bounds for such weighted schemes. We see these bounds as a step towards a mathematical understanding of the good empirical performance of error diffusion, even though they are formulated in the supremum norm, which is known to not fully capture the visual similarity of images. Motivated by the correspondence between existing error diffusion algorithms and first-order schemes, we study the performance of the analogous weighted combinations of second-order schemes and show that they exhibit a superior performance in terms of guaranteed error decay for two-dimensional bandlimited signals. In extensive numerical simulations for real-world images, we demonstrate that with some modifications to enhance stability this superior performance also translates to the problem of digital halftoning. More concretely, we find that certain second-order weighted schemes exhibit competitive performance for digital halftoning of real-world images in terms of the Feature Similarity Index (FSIM), a state-of-the-art measure for image quality assessment.
In this paper, we study the problem of recovering two unknown signals from their convolution, which is commonly referred to as blind deconvolution. Reformulation of blind deconvolution as a low-rank recovery problem has led to multiple theoretical recovery guarantees in the past decade due to the success of the nuclear norm minimization heuristic. In particular, in the absence of noise, exact recovery has been established for sufficiently incoherent signals contained in lower-dimensional subspaces. However, if the convolution is corrupted by additive bounded noise, the stability of the recovery problem remains much less understood. In particular, existing reconstruction bounds involve large dimension factors and therefore fail to explain the empirical evidence for dimension-independent robustness of nuclear norm minimization. Recently, theoretical evidence has emerged for ill-posed behavior of low-rank matrix recovery for sufficiently small noise levels. In this work, we develop improved recovery guarantees for blind deconvolution with adversarial noise which exhibit square-root scaling in the noise level. Hence, our results are consistent with existing counterexamples which speak against linear scaling in the noise level as demonstrated for related low-rank matrix recovery problems.
We introduce novel algorithms to address some challenges in machine learning, including ill-conditioned low-rank matrix retrieval, constrained least squares, and high-dimensional regression with unknown noise. By bridging least squares with modern (non-)convex optimization, our methods achieve scalability, data efficiency, and robustness. We provide theoretical guarantees with minimal assumptions and numerically validate their performance.
Efficient surrogate modelling is a key requirement for uncertainty quantification in data-driven scenarios. In this work, a novel approach of using Sparse Random Features for surrogate modelling in combination with self-supervised dimensionality reduction is described. The method is compared to other methods on synthetic and real data obtained from crashworthiness analyses. The results show a superiority of the here described approach over state of the art surrogate modelling techniques, Polynomial Chaos Expansions and Neural Networks.
©all images: LMU | TUM