Neural Networks Approximation Theory
Open questions:
- Can every [?]-Lipschitz function on a bounded domain be uniformly approximated by a shallow network? (This must hold for dimension-independent rates.)
- Is a heavy-tailed measure necessary for exponential depth separations?
- Is a [?] approximation rate achievable when either:
	- oscillations grow at a [?] rate with a fast-decaying measure (e.g. Gaussian), or
	- oscillations grow at a [?] rate with a heavy-tailed measure?
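The oscillation questions above refer to the standard depth-separation mechanism (cf. the Telgarsky papers below): each composition of a tent map doubles the number of linear pieces, while a one-hidden-layer ReLU net with n units has at most n+1 pieces, so matching depth with width costs exponentially. A minimal numerical sketch (grid size and function names are illustrative choices, not from any of the papers):

```python
import numpy as np

def tent(x):
    # Tent map on [0, 1] written as a width-2 ReLU layer:
    # tent(x) = 2*relu(x) - 4*relu(x - 1/2)
    relu = lambda z: np.maximum(z, 0.0)
    return 2 * relu(x) - 4 * relu(x - 0.5)

def compose(k, x):
    # k-fold composition = a depth-k, width-2 ReLU network
    for _ in range(k):
        x = tent(x)
    return x

def count_pieces(y):
    # Number of linear pieces = 1 + number of slope sign changes
    s = np.sign(np.diff(y))
    return 1 + int(np.sum(s[1:] != s[:-1]))

# Grid aligned with the dyadic breakpoints, so every sampled slope is exact
x = np.linspace(0.0, 1.0, (1 << 12) + 1)
for k in range(1, 6):
    print(f"depth {k}: {count_pieces(compose(k, x))} linear pieces")
```

The printed piece count doubles with each extra layer, which is exactly the oscillation growth that shallow networks cannot reproduce without exponential width.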
Current reads:
- Eldan & Shamir - The Power of Depth for Feedforward Neural Networks.pdf
- Bubeck & Sellke - A Universal Law of Robustness via Isoperimetry.pdf
- Hsu & Sanford - On the Approximation Power of Two-Layer Networks of Random ReLUs.pdf
- Safran & Eldan - Depth Separations in Neural Networks- What is Actually Being Separated?.pdf
- Safran & Reichman - Depth Separations in Neural Networks- Separating the Dimension from the Accuracy.pdf
- Venturi & Bruna - Depth separation beyond radial functions.pdf
Textbooks & Theses:
- Bach - Learning Theory from First Principles.pdf
- Foucart - Expressiveness of Shallow Networks.pdf
- Guhring - Approximation with Neural Networks from a Theoretical and Practical Perspective.pdf
- Motamed - Approximation Power of Deep Neural Networks An explanatory mathematical survey.pdf
- Petersen - Neural Network Theory.pdf
- Telgarsky - Deep learning theory lecture notes.pdf
- Telgarsky - Deep learning theory.pdf
- Venturi - Architectural properties of neural networks for function approximation.pdf
- Weinan & Ma - Towards a Mathematical Understanding of Neural Network-Based Machine Learning- what we know and what we don’t.pdf
Approximation Theory:
- Christensen - An Introduction to Frames and Riesz Bases.pdf
- Cohen, DeVore, Petrova - Optimal Stable Nonlinear Approximation.pdf
- DeVore - Optimal nonlinear approximation.pdf
- DeVore, Hanin & Petrova - Neural Network Approximation.pdf
- DeVore - Nonlinear Approximation.pdf
- Handbook on Neural Information Processing.pdf
Harmonic & Functional Analysis:
- Carl - Entropy Numbers, s-Numbers, and Eigenvalue Problems.pdf
- Dahlke & Kutyniok - THE UNCERTAINTY PRINCIPLE ASSOCIATED WITH THE CONTINUOUS SHEARLET TRANSFORM.pdf
- Folland - Real Analysis- Modern Techniques and Their Applications.pdf
- Katznelson - An Introduction to Harmonic Analysis.pdf
- Stein & Shakarchi - FUNCTIONAL ANALYSIS INTRODUCTION TO FURTHER Topics IN ANALYSIS.pdf
- Talbut - A SHORT TOUR OF HARMONIC ANALYSIS.pdf
- Xiao & He - Uncertainty Inequality for Radon Transform on the Heisenberg Group.pdf
Shallow networks:
Classical results
- Breiman - Hinging Hyperplanes for Regression, Classification, and Function Approximation.pdf
- Funahashi - On the Approximate Realization of Continuous Mappings by Neural Networks.pdf
- Girosi & Anzellotti - Convergence Rates of Approximation by Translates.pdf
- Jones - A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training.pdf
- Kůrková - Kolmogorov’s Theorem Is Relevant.pdf
- Barron - Universal Approximation Bounds for Superpositions of a Sigmoidal Function.pdf
- Cybenko - Approximation by Superpositions of a Sigmoidal Function.pdf
- Hornik - Multilayer Feedforward Networks are Universal Approximators.pdf
- Kůrková - Dimension-Independent Rates of Approximation by Neural Networks.pdf
Approximation of Lipschitz Functions
Infinite-width Shallow Networks/Neural Tangent Kernel (NTK)
- Bach - On the relationship between multivariate splines and infinitely-wide neural networks.pdf
- Ji & Telgarsky - Neural tangent kernels, transportation mappings, and universal approximation.pdf
- Ongie & Soudry - A FUNCTION SPACE VIEW OF BOUNDED NORM INFINITE WIDTH RELU NETS- THE MULTIVARIATE CASE.pdf
- Weinan - KOLMOGOROV WIDTH DECAY AND POOR APPROXIMATORS IN MACHINE LEARNING- SHALLOW NEURAL NETWORKS, RANDOM FEATURE MODELS AND NEURAL TANGENT KERNELS.pdf
- Yehudai & Shamir - On the Power and Limitations of Random Features for Understanding Neural Networks.pdf
Shallow Networks as Integral Transform/Ridge Functions and Radon Transform methods
- Candès - Harmonic Analysis of Neural Networks.pdf
- Carroll - CONSTRUCTION OF NEURAL NETS USING THE RADON TRANSFORM.pdf
- Ito - Representation of functions by superpositions of a step or sigmoid function and their applications to neural network theory.pdf
- Klusowski & Barron - Approximation by combinations of relu and squared relu ridge functions with ℓ1 and ℓ0 controls.pdf
- Kůrková - Integral Transforms Induced by Heaviside Perceptrons.pdf
- Maiorov - On Best Approximation by Ridge Functions.pdf
- Maiorov - Best approximation by ridge functions in Lp-spaces.pdf
- Siegel - APPROXIMATION RATES FOR SHALLOW RELUk NEURAL NETWORKS ON SOBOLEV SPACES VIA THE RADON TRANSFORM.pdf
- Sonoda - A unified Fourier slice method to derive ridgelet transform for a variety of depth-2 neural networks.pdf
- Unser - Ridges, Neural Networks, and the Radon Transform.pdf
Banach spaces of functions expressible by shallow networks
- Parhi & Nowak - Banach Space Representer Theorems for Neural Networks and Ridge Splines.pdf
- Spek - Duality for Neural Networks through Reproducing Kernel Banach Spaces.pdf
- Weinan & Ma - The Barron Space and the Flow-induced Function Spaces for Neural Network Models (Springer).pdf
Misc
- Bach - Breaking the Curse of Dimensionality with Convex Neural Networks.pdf
- Divol & Niles-Weed - Optimal transport map estimation in general function spaces.pdf
- Kainen, Kůrková & Vogt - Approximation by neural networks is not continuous.pdf
- Kůrková - Kolmogorov’s Theorem and Multilayer Neural Networks.pdf
- Kůrková - Limitations of Shallow Networks.pdf
- Kůrková - Representations of Highly-Varying Functions by One-Hidden-Layer Networks.pdf
- Maiorov & Meir - On the near optimality of the stochastic approximation of smooth functions by neural networks.pdf
- Siegel - High-order approximation rates for shallow neural networks with cosine and ReLUk activation functions.pdf
- Siegel - Sharp Bounds on the Approximation Rates, Metric Entropy, and n-Widths of Shallow Neural Networks.pdf
Depth separations:
- Amsel & Bruna - On the Benefits of Rank in Attention
- Bu - DEPTH-WIDTH TRADE-OFFS FOR NEURAL NETWORKS VIA TOPOLOGICAL ENTROPY.pdf
- Chatziafratis - Better Depth-Width Trade-offs for Neural Networks through the lens of Dynamical Systems.pdf
- Chatziafratis - Depth-Width Trade-offs for ReLU Networks via Sharkovsky’s Theorem.pdf
- Daniely - Depth Separation for Neural Networks.pdf
- Malach & Shalev-Shwartz - Is Deeper Better only when Shallow is Good?.pdf
- Parkinson & Shamir - Depth Separation in Norm-Bounded Infinite-Width Neural Networks.pdf
- Poggio - Why and When Can Deep – but Not Shallow – Networks Avoid the Curse of Dimensionality- a Review.pdf
- Rim, Venturi, Bruna, and Peherstorfer - DEPTH SEPARATION FOR REDUCED DEEP NETWORKS IN NONLINEAR MODEL REDUCTION- DISTILLING SHOCK WAVES IN NONLINEAR HYPERBOLIC PROBLEMS.pdf
- Safran & Shamir - Depth-Width Tradeoffs in Approximating Natural Functions with Neural Networks.pdf
- Sanford, Hsu, Telgarsky - Representational Strengths and Limitations of Transformers.pdf
- Telgarsky - Benefits of depth in neural networks.pdf
- Vardi & Shamir - Width is Less Important than Depth in ReLU Neural Networks.pdf
- Venturi & Bruna - Depth separation beyond radial functions.pdf
- Yarotsky - Error bounds for approximations with deep ReLU networks.pdf
- Yehudai, Shalev-Shwartz & Shamir - The Connection Between Approximation, Depth Separation and Learnability in Neural Networks.pdf
- Zweig & Bruna - Exponential Separations in Symmetric Neural Networks.pdf
- Eldan & Shamir - The Power of Depth for Feedforward Neural Networks.pdf
- Telgarsky - Representation Benefits of Deep Feedforward Networks.pdf
Deep networks:
Misc
- Daubechies, DeVore, Foucart, Hanin & Petrova - Nonlinear Approximation and Deep ReLU Networks.pdf
- Gonon - The necessity of depth for artificial neural networks to approximate certain classes of smooth and bounded functions without the curse of dimensionality.pdf
- Hanin - Complexity of Linear Regions in Deep Networks.pdf
- Hanin - Deep ReLU Networks Have Surprisingly Few Activation Patterns.pdf
- Hanin - UNIVERSAL FUNCTION APPROXIMATION BY DEEP NEURAL NETS WITH BOUNDED WIDTH AND RELU ACTIVATIONS.pdf
- Yarotsky - Optimal approximation of continuous functions by very deep ReLU networks.pdf
- Yarotsky - The phase diagram of approximation rates for deep neural networks.pdf
Banach spaces of functions expressible by deep networks
- Parhi & Nowak - What Kinds of Functions Do Deep Neural Networks Learn? Insights from Variational Spline Theory.pdf
- Weinan - ON THE BANACH SPACES ASSOCIATED WITH MULTI-LAYER RELU NETWORKS.pdf
Expressivity and learning:
- Bengio - Representation Learning- A Review and New Perspectives.pdf
- Chen, Rotskoff, Bruna & Vanden-Eijnden - A Dynamical Central Limit Theorem for Shallow Neural Networks.pdf
- Chizat & Bach - On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport.pdf
- Malach & Shalev-Shwartz - Is Deeper Better only when Shallow is Good?.pdf
- Rotskoff & Vanden-Eijnden - TRAINABILITY AND ACCURACY OF NEURAL NETWORKS- AN INTERACTING PARTICLE SYSTEM APPROACH.pdf
- Welper - Approximation Results for Gradient Descent trained Neural Networks.pdf
- Wojtowytsch & Weinan - Can Shallow Neural Networks Beat the Curse of Dimensionality? A mean field training perspective.pdf
Miscellaneous Topics & Views:
PDEs
Quantized/bounded networks:
Random approximations:
- Rahimi & Recht - Uniform Approximation of Functions with Random Bases.pdf
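A quick sketch of what the Rahimi & Recht line of work is about: draw random features, then train only the outer linear layer by least squares. Everything here (the target function, feature count, ReLU features instead of the paper's random bases, and the sampling laws) is an illustrative choice, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a smooth function on [-1, 1]
f = lambda x: np.sin(3 * x)

# Random ReLU features phi_j(x) = relu(w_j * x + b_j); the inner
# weights are sampled once and frozen, only the outer layer is fit.
n_features = 200
w = rng.standard_normal(n_features)
b = rng.uniform(-1, 1, n_features)

def features(x):
    # Feature matrix of shape (len(x), n_features)
    return np.maximum(np.outer(x, w) + b, 0.0)

# Fit the outer linear layer by least squares
x_train = np.linspace(-1, 1, 500)
coef, *_ = np.linalg.lstsq(features(x_train), f(x_train), rcond=None)

# Uniform error on a held-out grid
x_test = np.linspace(-1, 1, 101)
err = np.max(np.abs(features(x_test) @ coef - f(x_test)))
print(f"uniform error with {n_features} random ReLU features: {err:.4f}")
```

The point of the uniform-approximation results is that, with high probability over the random draw, enough such features approximate a suitable target class uniformly, not just in mean square.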