We study approximation and statistical learning properties of deep ReLU networks under structural assumptions that mitigate the curse of dimensionality. We prove minimax-optimal uniform approximation rates for s-Hölder smooth functions defined on sets of low Minkowski dimension, using fully connected networks with flexible width and depth, and improve existing results by logarithmic factors even in the classical full-dimensional setting. A key technical ingredient is a new memorization result for deep ReLU networks that enables efficient point fitting with dense architectures. We further introduce a class of compositional models in which each component function is smooth and acts on a domain of low intrinsic dimension. This framework unifies, within a single model, two common assumptions from the statistical learning literature: structural constraints on the target function and low intrinsic dimensionality of the covariates. We show that deep networks approximate such functions at a rate determined by the most difficult component in the composition. As an application, we derive improved convergence rates for empirical risk minimization in nonparametric regression that adapt to smoothness, compositional structure, and intrinsic dimensionality.
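To make the shape of these results concrete, the display below is a schematic sketch only, written in standard notation from the literature on Hölder approximation and composition-structured regression; the symbols W, s, d, q, g_i, beta_i, and d_i are illustrative conventions and are not taken verbatim from the paper's statements.

% Schematic forms only (standard notation; the paper's exact assumptions and constants may differ).

% Uniform approximation of an s-Hölder function f on a set of Minkowski dimension d by a
% ReLU network \hat f with W nonzero parameters, with the logarithmic factors removed:
\[
  \| f - \hat f \|_{\infty} \;\lesssim\; W^{-s/d}.
\]

% Compositional model: each component g_i is beta_i-Hölder smooth and acts on a domain of
% intrinsic dimension d_i; beta_i^* is the effective smoothness after composing with the
% later components.
\[
  f \;=\; g_q \circ g_{q-1} \circ \cdots \circ g_0,
  \qquad
  \beta_i^{*} \;=\; \beta_i \prod_{k=i+1}^{q} \min(\beta_k, 1).
\]

% The regression rate for empirical risk minimization over ReLU networks is then driven by
% the most difficult component:
\[
  \max_{0 \le i \le q} \; n^{-2\beta_i^{*}/(2\beta_i^{*} + d_i)}
  \quad \text{(up to constants and possible logarithmic factors).}
\]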