
Convergence of Gradient Descent for Learning Linear Neural Networks

MCML Authors

Gabin Maxime Nguegnang

* Former Member

→ Group Holger Rauhut
Mathematical Data Science and Artificial Intelligence

Holger Rauhut

Prof. Dr.

Core PI

Mathematical Data Science and Artificial Intelligence

Ulrich Terstiege

Dr.

→ Group Holger Rauhut
Mathematical Data Science and Artificial Intelligence

Abstract

We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis for the related gradient flow. We show that, under suitable conditions on the stepsizes, gradient descent converges to a critical point of the loss function, i.e., the square loss in this article. Furthermore, we demonstrate that for almost all initializations gradient descent converges to a global minimum in the case of two layers. In the case of three or more layers, we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank, where the rank cannot be determined a priori.
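To make the setting concrete, here is a minimal sketch of gradient descent on a deep linear network with the square loss, written as L(W_1, ..., W_N) = 1/2 ||W_N ... W_1 X - Y||_F^2. This is an illustration only, not the authors' code: the function names, the layer dimensions, the constant stepsize, the initialization scale, and the synthetic data are all assumptions chosen for the example, and the stepsize is not derived from the conditions established in the article.

```python
import numpy as np


def product(weights):
    """Return W_N ... W_1 for weights = [W_1, ..., W_N]."""
    out = np.eye(weights[0].shape[1])
    for w in weights:
        out = w @ out
    return out


def loss(weights, x, y):
    """Square loss 1/2 * ||W_N ... W_1 X - Y||_F^2."""
    return 0.5 * np.linalg.norm(product(weights) @ x - y) ** 2


def gradients(weights, x, y):
    """Gradient of the square loss with respect to each factor W_j."""
    residual = (product(weights) @ x - y) @ x.T
    grads = []
    for j in range(len(weights)):
        # Product of the factors applied before W_j (identity if none).
        before = product(weights[:j]) if j > 0 else np.eye(weights[0].shape[1])
        # Product of the factors applied after W_j (identity if none).
        if j + 1 < len(weights):
            after = product(weights[j + 1:])
        else:
            after = np.eye(weights[-1].shape[0])
        grads.append(after.T @ residual @ before.T)
    return grads


def gradient_descent(weights, x, y, stepsize=0.01, steps=2000):
    """Plain gradient descent with a constant stepsize (an assumed choice)."""
    weights = [w.copy() for w in weights]
    history = [loss(weights, x, y)]
    for _ in range(steps):
        grads = gradients(weights, x, y)
        weights = [w - stepsize * g for w, g in zip(weights, grads)]
        history.append(loss(weights, x, y))
    return weights, history


# Hypothetical example: three 3x3 layers, identity inputs, random targets.
rng = np.random.default_rng(0)
init = [0.5 * rng.standard_normal((3, 3)) for _ in range(3)]
x = np.eye(3)
y = rng.standard_normal((3, 3))
final, history = gradient_descent(init, x, y)
print("initial loss:", history[0])
print("final loss:  ", history[-1])
print("rank of end-to-end matrix:", np.linalg.matrix_rank(product(final)))
```

Running the sketch with different seeds or more layers is one way to observe informally how the end-to-end product W_N ... W_1 behaves; any conclusions about convergence should be taken from the article itself rather than from this toy experiment.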

article NRT+24

Advances in Continuous and Discrete Models

2024.23. Jul. 2024.

Authors

G. M. Nguegnang • H. Rauhut • U. Terstiege

Links

DOI

Research Area

A2 | Mathematical Foundations

BibTeXKey: NRT+24
