ApproximationTheoryNeuralNetworks

Barron’s paper:

  • Demonstrates a gap between approximation by fixed basis functions (a lower bound) and by adaptive basis functions (an upper bound); the two bounds are written out after the question list below. The curse of dimensionality touches both sides: the fixed-basis lower bound decays only like n^{-1/d}, which approaches an O(1) (essentially non-decaying) rate as the dimension grows, while on the adaptive side the Barron norm itself can depend intrinsically on the dimension (through the choice of bounded sets in frequency space).
  1. What is the gap between these approximation-power results and practical neural networks? Are there currently differences between the shallow and the deep case? And what are the biggest hurdles facing the important/needed results for shallow networks? For deep networks?
  2. Is there a way of quantifying inductive bias based on approximation power?
  3. For approximation of a univariate Lipschitz function by a step activation, on the order of 1/ε steps are needed for approximation error ε, but how do we know where to place the steps? For the proof via the fundamental theorem of calculus (FTC), why does this yield an average-case bound, and what is the estimate? (Both parts are sketched after this list.)
  4. In proofs of NN approximation power, where does the curse of dimensionality enter the error bounds? Is there a difference between the shallow and the deep case?
  5. Does there exist a “transition” from poor approximation to good approximation in the context of the tradeoff between the regularity of the function and the depth? Can the rate of approximation be increased as a function of regularity/smoothness?
  6. Does Barron space contain subspaces where the ? Is it unintuitive that Barron space does not contain linear (or smooth) functions? For instance, this could never explain extrapolation, since the result only considers functions that are linear on a bounded set and then extended to the rest of the domain.
  7. People talk about the infinite-width limit, but there seems to be no distinction made between uncountably and countably infinite width. In the case of the FTC proof we have a result for an uncountably wide network (an integral over thresholds); how does this compare or relate to the NTK or NN-GP limit? (See the note after this list.)
  8. Is there an analogous result on the benefits of depth for Transformers?
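
Sketch of the two bounds behind the bullet above, with the constants as I recall them from Barron (1993) (worth checking against the paper). Writing the Barron constant as

  \[
    C_f \;=\; \int_{\mathbb{R}^d} |\omega|\,\bigl|\hat f(\omega)\bigr|\,d\omega ,
  \]

the upper bound for $n$ adaptive sigmoidal units on a ball $B_r$ of radius $r$ is

  \[
    \inf_{f_n \in \Sigma_n} \int_{B_r} \bigl(f(x) - f_n(x)\bigr)^2 \,\mu(dx)
      \;\le\; \frac{(2 r\, C_f)^2}{n},
  \]

while for linear combinations of any $n$ fixed basis functions $h_1,\dots,h_n$ there is a lower bound, uniformly over $\{f : C_f \le C\}$,

  \[
    \sup_{f:\,C_f \le C}\;\inf_{g \in \operatorname{span}\{h_1,\dots,h_n\}} \|f - g\|_{L^2}
      \;\ge\; \kappa\,\frac{C}{d}\, n^{-1/d}
  \]

for a universal constant $\kappa$. Since $n^{-1/d}$ decays extremely slowly when $d$ is large (and tends to $1$ as $d \to \infty$ for fixed $n$), the fixed-basis rate is effectively the “O(1) bound” of the bullet, while on the adaptive side the dimension dependence hides inside $C_f$.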
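
On the “where to place the steps” part of question 3, a minimal numerical sketch in Python (my own toy construction and helper names, not anything from the paper): for an L-Lipschitz f on [0, 1], thresholds on a uniform grid with values matched at cell midpoints already give uniform error at most L/(2k) with k pieces, so roughly L/(2ε) step units suffice for error ε.

    import numpy as np

    def step_net(x, thresholds, jumps, bias):
        # Sum of Heaviside units: bias + sum_i jumps[i] * 1[x >= thresholds[i]].
        return bias + np.sum(jumps[None, :] * (x[:, None] >= thresholds[None, :]), axis=1)

    def fit_uniform_steps(f, k, a=0.0, b=1.0):
        # k cells on a uniform grid over [a, b]; assign each cell the value of f
        # at its midpoint. For L-Lipschitz f the uniform error is <= L*(b-a)/(2k).
        edges = np.linspace(a, b, k + 1)
        mids = 0.5 * (edges[:-1] + edges[1:])
        vals = f(mids)
        return edges[1:-1], np.diff(vals), vals[0]  # thresholds, jump sizes, bias

    # Toy check with f(x) = |x - 0.3|, Lipschitz constant L = 1.
    f = lambda x: np.abs(x - 0.3)
    x = np.linspace(0.0, 1.0, 10_001)
    for k in (10, 100, 1000):
        thr, jmp, b0 = fit_uniform_steps(f, k)
        err = np.max(np.abs(f(x) - step_net(x, thr, jmp, b0)))
        print(f"k = {k:4d}   sup error = {err:.5f}   bound L/(2k) = {1 / (2 * k):.5f}")

This is the deterministic, worst-case placement; it says nothing yet about the average-case FTC route, sketched next.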
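
On the FTC part of question 3, one standard reading (a Maurey/Monte-Carlo sampling argument; I am assuming this is the intended proof) starts from the exact representation

  \[
    f(x) \;=\; f(0) + \int_0^1 f'(t)\,\mathbf{1}[t \le x]\,dt
         \;=\; f(0) + \|f'\|_{1}\,\mathbb{E}_{t \sim p}\!\bigl[\operatorname{sign}(f'(t))\,\mathbf{1}[t \le x]\bigr],
    \qquad p(t) = \frac{|f'(t)|}{\|f'\|_{1}} ,
  \]

i.e. an “uncountably wide” network of threshold units indexed by $t$. Drawing $t_1,\dots,t_n$ i.i.d. from $p$ and averaging the corresponding step units gives an unbiased estimator $f_n$ of $f$, and bounding the variance at each $x$ by $\|f'\|_1^2/n$ yields

  \[
    \mathbb{E}\,\|f - f_n\|_{L^2[0,1]}^2 \;\le\; \frac{\|f'\|_{1}^{2}}{n},
  \]

an in-expectation ($L^2$) rate of $O(n^{-1/2})$ rather than a uniform guarantee, which is why this route naturally gives an average-case bound.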
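
On question 7, one way to make the countable-versus-uncountable distinction precise (my framing, not something taken from the paper) is the integral representation

  \[
    f(x) \;=\; \int a\,\sigma(w \cdot x + b)\,d\mu(a, w, b),
  \]

where a finite or countably infinite network corresponds to an atomic measure $\mu$, and the FTC construction above corresponds to a continuous $\mu$ supported on threshold units with density proportional to $|f'|$. The NTK and NN-GP limits are instead obtained by letting the width grow with i.i.d. random parameters under a particular scaling, so relating them to this measure-theoretic picture is exactly the open part of the question.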