Home | Research | Groups | Johannes Maly

Research Group Johannes Maly

Johannes Maly

Prof. Dr.

Associate

A2 | Mathematical Foundations

Mathematical Data Science and Artificial Intelligence

Johannes Maly

is Junior Professor at the Working Group Mathematical Data Science and Artificial Intelligence at LMU Munich.

Publications @MCML

2025

[10]

S. Dirksen, W. Li and J. Maly.
Subspace estimation under coarse quantization.
SampTA 2025 - 15th International Conference on Sampling Theory and Applications. Vienna, Austria, Jul 28-Aug 01, 2025. To be published. Preprint available. URL

Abstract

We study subspace estimation from coarsly quantized data. In particular, we analyze two stochastic quantization schemes which use dithering: a one-bit quantizer combined with rectangular dither and a multi-bit quantizer with triangular dither. For each quantizer, we derive rigorous high probability bounds for the distances between the true and estimated signal subspaces. Using our analysis, we identify scenarios in which subspace estimation via triangular dithering qualitatively outperforms rectangular dithering. We verify in numerical simulations that our estimates are optimal in their dependence on the smallest non-zero eigenvalue of the target matrix.

MCML Authors

Johannes Maly

Prof. Dr.

A2 | Mathematical Foundations

Mathematical Data Science and Artificial Intelligence

[9]

H.-H. Chou, J. Maly, C. M. Verdun, B. Freitas Paulo da Costa and H. Mirandola.
Get rid of your constraints and reparametrize: A study in NNLS and implicit bias.
AISTATS 2025 - 28th International Conference on Artificial Intelligence and Statistics. Mai Khao, Thailand, May 03-05, 2025. To be published. URL

Abstract

Over the past years, there has been significant interest in understanding the implicit bias of gradient descent optimization and its connection to the generalization properties of overparametrized neural networks. Several works observed that when training linear diagonal networks on the square loss for regression tasks (which corresponds to overparametrized linear regression) gradient descent converges to special solutions, e.g., non-negative ones. We connect this observation to Riemannian optimization and view overparametrized GD with identical initialization as a Riemannian GD. We use this fact for solving non-negative least squares (NNLS), an important problem behind many techniques, e.g., non-negative matrix factorization. We show that gradient flow on the reparametrized objective converges globally to NNLS solutions, providing convergence rates also for its discretized counterpart. Unlike previous methods, we do not rely on the calculation of exponential maps or geodesics. We further show accelerated convergence using a second-order ODE, lending itself to accelerated descent methods. Finally, we establish the stability against negative perturbations and discuss generalization to other constrained optimization problems.

MCML Authors

Johannes Maly

Prof. Dr.

A2 | Mathematical Foundations

Mathematical Data Science and Artificial Intelligence

[8]

V. Fojtik, M. Matveev, H.-H. Chou, G. Kutyniok and J. Maly.
Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization.
Preprint (May. 2025). arXiv

Abstract

A widely believed explanation for the remarkable generalization capacities of overparameterized neural networks is that the optimization algorithms used for training induce an implicit bias towards benign solutions. To grasp this theoretically, recent works examine gradient descent and its variants in simplified training settings, often assuming vanishing learning rates. These studies reveal various forms of implicit regularization, such as ℓ1-norm minimizing parameters in regression and max-margin solutions in classification. Concurrently, empirical findings show that moderate to large learning rates exceeding standard stability thresholds lead to faster, albeit oscillatory, convergence in the so-called Edge-of-Stability regime, and induce an implicit bias towards minima of low sharpness (norm of training loss Hessian). In this work, we argue that a comprehensive understanding of the generalization performance of gradient descent requires analyzing the interaction between these various forms of implicit regularization. We empirically demonstrate that the learning rate balances between low parameter norm and low sharpness of the trained model. We furthermore prove for diagonal linear networks trained on a simple regression task that neither implicit bias alone minimizes the generalization error. These findings demonstrate that focusing on a single implicit bias is insufficient to explain good generalization, and they motivate a broader view of implicit regularization that captures the dynamic trade-off between norm and sharpness induced by non-negligible learning rates.

MCML Authors

Vit Fojtik

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Maria Matveev

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

A2 | Mathematical Foundations

Mathematical Foundations of Artificial Intelligence

Johannes Maly

Prof. Dr.

A2 | Mathematical Foundations

Mathematical Data Science and Artificial Intelligence

[7]

S. Dirksen, W. Li and J. Maly.
Subspace and DOA estimation under coarse quantization.
Preprint (Feb. 2025). arXiv

Abstract

We study direction-of-arrival (DOA) estimation from coarsely quantized data. We focus on a two-step approach which first estimates the signal subspace via covariance estimation and then extracts DOA angles by the ESPRIT algorithm. In particular, we analyze two stochastic quantization schemes which use dithering: a one-bit quantizer combined with rectangular dither and a multi-bit quantizer with triangular dither. For each quantizer, we derive rigorous high probability bounds for the distances between the true and estimated signal subspaces and DOA angles. Using our analysis, we identify scenarios in which subspace and DOA estimation via triangular dithering qualitatively outperforms rectangular dithering. We verify in numerical simulations that our estimates are optimal in their dependence on the smallest non-zero eigenvalue of the target matrix. The resulting subspace estimation guarantees are equally applicable in the analysis of other spectral estimation algorithms and related problems.

MCML Authors

Johannes Maly

Prof. Dr.

A2 | Mathematical Foundations

Mathematical Data Science and Artificial Intelligence

2024

[6]

T. Yang, J. Maly, S. Dirksen and G. Caire.
Plug-In Channel Estimation With Dithered Quantized Signals in Spatially Non-Stationary Massive MIMO Systems.
IEEE Transactions on Communications 72.1 (Jan. 2024). DOI

Abstract

As the array dimension of massive MIMO systems increases to unprecedented levels, two problems occur. First, the spatial stationarity assumption along the antenna elements is no longer valid. Second, the large array size results in an unacceptably high power consumption if high-resolution analog-to-digital converters are used. To address these two challenges, we consider a Bussgang linear minimum mean square error (BLMMSE)-based channel estimator for large scale massive MIMO systems with one-bit quantizers and a spatially non-stationary channel. Whereas other works usually assume that the channel covariance is known at the base station, we consider a plug-in BLMMSE estimator that uses an estimate of the channel covariance and rigorously analyze the distortion produced by using an estimated, rather than the true, covariance. To cope with the spatial non-stationarity, we introduce dithering into the quantized signals and provide a theoretical error analysis. In addition, we propose an angular domain fitting procedure which is based on solving an instance of non-negative least squares. For the multi-user data transmission phase, we further propose a BLMMSE-based receiver to handle one-bit quantized data signals. Our numerical results show that the performance of the proposed BLMMSE channel estimator is very close to the oracle-aided scheme with ideal knowledge of the channel covariance matrix. The BLMMSE receiver outperforms the conventional maximum-ratio-combining and zero-forcing receivers in terms of the resulting ergodic sum rate.

MCML Authors

Johannes Maly

Prof. Dr.

A2 | Mathematical Foundations

Mathematical Data Science and Artificial Intelligence

[5]

S. Dirksen and J. Maly.
Tuning-free one-bit covariance estimation using data-driven dithering.
Preprint (Jan. 2024). arXiv

Abstract

We consider covariance estimation of any subgaussian distribution from finitely many i.i.d. samples that are quantized to one bit of information per entry. Recent work has shown that a reliable estimator can be constructed if uniformly distributed dithers on [−λ,λ] are used in the one-bit quantizer. This estimator enjoys near-minimax optimal, non-asymptotic error estimates in the operator and Frobenius norms if λ is chosen proportional to the largest variance of the distribution. However, this quantity is not known a-priori, and in practice λ needs to be carefully tuned to achieve good performance. In this work we resolve this problem by introducing a tuning-free variant of this estimator, which replaces λ by a data-driven quantity. We prove that this estimator satisfies the same non-asymptotic error estimates - up to small (logarithmic) losses and a slightly worse probability estimate. We also show that by using refined data-driven dithers that vary per entry of each sample, one can construct an estimator satisfying the same estimation error bound as the sample covariance of the samples before quantization – again up logarithmic losses. Our proofs rely on a new version of the Burkholder-Rosenthal inequalities for matrix martingales, which is expected to be of independent interest.

MCML Authors

Johannes Maly

Prof. Dr.

A2 | Mathematical Foundations

Mathematical Data Science and Artificial Intelligence

2023

[4]

C. Kümmerle and J. Maly.
Recovering Simultaneously Structured Data via Non-Convex Iteratively Reweighted Least Squares.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL

Abstract

We propose a new algorithm for the problem of recovering data that adheres to multiple, heterogenous low-dimensional structures from linear observations. Focussing on data matrices that are simultaneously row-sparse and low-rank, we propose and analyze an iteratively reweighted least squares (IRLS) algorithm that is able to leverage both structures. In particular, it optimizes a combination of non-convex surrogates for row-sparsity and rank, a balancing of which is built into the algorithm. We prove locally quadratic convergence of the iterates to a simultaneously structured data matrix in a regime of minimal sample complexity (up to constants and a logarithmic factor), which is known to be impossible for a combination of convex surrogates. In experiments, we show that the IRLS method exhibits favorable empirical convergence, identifying simultaneously row-sparse and low-rank matrices from fewer measurements than state-of-the-art methods.

MCML Authors

Johannes Maly

Prof. Dr.

A2 | Mathematical Foundations

Mathematical Data Science and Artificial Intelligence

[3]

J. Maly.
Robust sensing of low-rank matrices with non-orthogonal sparse decomposition.
Applied and Computational Harmonic Analysis 67 (Nov. 2023). 2024 ACHA Charles Chui Young Researcher Best Paper Award. DOI

Abstract

We consider the problem of recovering an unknown low-rank matrix with (possibly) non-orthogonal, effectively sparse rank-1 decomposition from measurements y gathered in a linear measurement process . We propose a variational formulation that lends itself to alternating minimization and whose global minimizers provably approximate up to noise level. Working with a variant of robust injectivity, we derive reconstruction guarantees for various choices of including sub-gaussian, Gaussian rank-1, and heavy-tailed measurements. Numerical experiments support the validity of our theoretical considerations.

MCML Authors

Johannes Maly

Prof. Dr.

A2 | Mathematical Foundations

Mathematical Data Science and Artificial Intelligence

[2]

H.-H. Chou, J. Maly and D. Stöger.
How to induce regularization in linear models: A guide to reparametrizing gradient flow.
Preprint (Aug. 2023). arXiv

Abstract

In this work, we analyze the relation between reparametrizations of gradient flow and the induced implicit bias in linear models, which encompass various basic regression tasks. In particular, we aim at understanding the influence of the model parameters - reparametrization, loss, and link function - on the convergence behavior of gradient flow. Our results provide conditions under which the implicit bias can be well-described and convergence of the flow is guaranteed. We furthermore show how to use these insights for designing reparametrization functions that lead to specific implicit biases which are closely connected to ℓp- or trigonometric regularizers.

MCML Authors

Johannes Maly

Prof. Dr.

A2 | Mathematical Foundations

Mathematical Data Science and Artificial Intelligence

[1]

J. Maly and R. Saab.
A simple approach for quantizing neural networks.
Preprint (Apr. 2023). arXiv

Abstract

In this short note, we propose a new method for quantizing the weights of a fully trained neural network. A simple deterministic pre-processing step allows us to quantize network layers via memoryless scalar quantization while preserving the network performance on given training data. On one hand, the computational complexity of this pre-processing slightly exceeds that of state-of-the-art algorithms in the literature. On the other hand, our approach does not require any hyper-parameter tuning and, in contrast to previous methods, allows a plain analysis. We provide rigorous theoretical guarantees in the case of quantizing single network layers and show that the relative error decays with the number of parameters in the network if the training data behaves well, e.g., if it is sampled from suitable random distributions. The developed method also readily allows the quantization of deep networks by consecutive application to single layers.

MCML Authors

Johannes Maly

Prof. Dr.

A2 | Mathematical Foundations

Mathematical Data Science and Artificial Intelligence

Research Group Johannes Maly

Johannes Maly

Recent News @MCML

MCML Researchers With Five Papers at AISTATS 2025

2024 ACHA Charles Chui Young Researcher Best Paper Award for Johannes Maly

Publications @MCML

2025

2024

2023