This chapter focuses on the approximation theory of deep ReLU neural networks, analyzing their ability to approximate various target functions under different network architectures. We begin with the universal approximation theory of deep neural networks, which states that, given enough neurons, neural networks can approximate broad classes of functions. We then delve into the fundamental properties of ReLU neural networks and explore the roles of network width and depth, highlighting that increasing depth can be more effective than increasing width for improving approximation accuracy. Next, we discuss approximation rates for Sobolev functions achieved by fully connected and convolutional neural networks. To alleviate the curse of dimensionality, we further consider Korobov functions. Finally, we turn to the approximation properties of self-attention and transformers, which have become increasingly important in modern deep learning. These results shed light on the expressivity and reliability of deep learning models, providing valuable insights into the behavior and performance of these networks.
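For reference, a minimal sketch of the objects the chapter studies, written in standard (not necessarily the chapter's own) notation: the ReLU activation acts componentwise, and a deep ReLU network is a composition of affine maps with ReLU applied in between,

\[
\sigma(x) = \max(0, x), \qquad
f(x) = W_L\,\sigma\bigl(W_{L-1}\,\sigma\bigl(\cdots\,\sigma(W_1 x + b_1)\,\cdots\bigr) + b_{L-1}\bigr) + b_L,
\]

where the $W_\ell$ and $b_\ell$ are the weight matrices and bias vectors of an $L$-layer network; its depth is $L$ and its width is the maximal number of neurons in a hidden layer.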
BibTeXKey: LK25