Machine Learning Statistics
Given i.i.d. data $\{(x_i, y_i)\}_{i=1}^n$ of $d$-dimensional features and a hypothesis set $\mathcal{H}$, we seek a function $h \in \mathcal{H}$ that minimizes the risk functional $R(h) = \mathbb{E}\,[\ell(h(x), y)]$. The empirical risk is the sample average of the loss function,

$$\hat{R}_n(h) = \frac{1}{n} \sum_{i=1}^n \ell(h(x_i), y_i),$$

where for classification $\ell(h(x), y) = \mathbf{1}[h(x) \neq y]$. Clearly $\mathbb{E}[\hat{R}_n(h)] = R(h)$. Therefore, for any fixed $h$, the empirical risk converges to the true risk, $\hat{R}_n(h) \to R(h)$, by the Law of Large Numbers, with a rate of convergence given by Hoeffding's inequality:

$$\mathbb{P}\big(|\hat{R}_n(h) - R(h)| \geq \epsilon\big) \leq 2\, e^{-2n\epsilon^2}.$$

So only with probability exponentially small in the number of samples does the empirical risk deviate from the true risk by more than $\epsilon$. Therefore we can use Empirical Risk Minimization (ERM),

$$\hat{h}_n = \operatorname*{arg\,min}_{h \in \mathcal{H}} \hat{R}_n(h),$$

to get a consistent estimator of the minimizer $h^* = \operatorname*{arg\,min}_{h \in \mathcal{H}} R(h)$ of the true risk. For finite hypothesis spaces, a union bound gives the generalization bound

$$\mathbb{P}\Big(\sup_{h \in \mathcal{H}} |\hat{R}_n(h) - R(h)| \geq \epsilon\Big) \leq 2\, |\mathcal{H}|\, e^{-2n\epsilon^2}.$$

We see that for ERM to succeed for general, infinite hypothesis spaces, one needs a sort of uniform LLN over $\sigma(\mathcal{H})$, the sigma-algebra generated by the hypothesis set. This was established using various measures of complexity for $\mathcal{H}$. The common theme is the reduction of the analysis of an infinite class to an analysis of finite classes:
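Hoeffding's bound for a fixed hypothesis can be checked by Monte Carlo. The sketch below (with assumed, arbitrary values for the true risk, sample size, and deviation $\epsilon$) treats each sample's 0-1 loss as a Bernoulli variable, estimates the probability that the empirical risk deviates from the true risk by at least $\epsilon$, and compares it to $2 e^{-2n\epsilon^2}$:

```python
import math
import random

random.seed(1)

# Fixed hypothesis with true 0-1 risk R = 0.3 (an assumed value): each
# sample's loss is Bernoulli(0.3), and the empirical risk is their average.
R, n, eps, trials = 0.3, 200, 0.1, 20000

deviations = 0
for _ in range(trials):
    emp = sum(random.random() < R for _ in range(n)) / n
    if abs(emp - R) >= eps:
        deviations += 1

observed = deviations / trials
hoeffding = 2 * math.exp(-2 * n * eps**2)  # 2 e^{-2 n eps^2}
print(f"observed P(|R_hat - R| >= {eps}) = {observed:.4f}, bound = {hoeffding:.4f}")
```

The observed deviation frequency sits well below the bound, as expected: Hoeffding's inequality is distribution-free, so it is loose for any particular Bernoulli risk.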
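The finite-class guarantee can also be exercised end to end. The following toy sketch (the threshold class, the ground-truth concept at $0.3$, and the choices of $\epsilon$ and $\delta$ are all assumptions for illustration) sizes $n$ from the union bound, $n \geq \log(2|\mathcal{H}|/\delta) / (2\epsilon^2)$, then runs ERM over the finite class:

```python
import math
import random

random.seed(0)

# Finite hypothesis class: threshold classifiers h_t(x) = 1[x >= t]
# on a grid of 11 thresholds, so |H| = 11.
thresholds = [i / 10 for i in range(11)]

def true_label(x):
    # Assumed ground-truth concept: label 1 iff x >= 0.3.
    return 1 if x >= 0.3 else 0

def h(t, x):
    return 1 if x >= t else 0

def empirical_risk(t, sample):
    # Average 0-1 loss of h_t on the sample.
    return sum(h(t, x) != y for x, y in sample) / len(sample)

def true_risk(t):
    # For x ~ Uniform[0,1], h_t disagrees with the concept on an
    # interval of length |t - 0.3|, so that is its 0-1 risk.
    return abs(t - 0.3)

# Union bound: n >= log(2|H|/delta) / (2 eps^2) guarantees
# sup_h |R_hat(h) - R(h)| <= eps with probability >= 1 - delta.
eps, delta, H = 0.05, 0.01, len(thresholds)
n = math.ceil(math.log(2 * H / delta) / (2 * eps**2))

sample = [(x, true_label(x)) for x in (random.random() for _ in range(n))]

# ERM: pick the hypothesis minimizing empirical risk.
t_hat = min(thresholds, key=lambda t: empirical_risk(t, sample))
print(f"n = {n}, ERM threshold = {t_hat}, true risk = {true_risk(t_hat):.3f}")
```

Because the guarantee is uniform over the class, the ERM output's true risk is within $2\epsilon$ of the best in the class; here the best hypothesis ($t = 0.3$, risk $0$) is recovered.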