Tags: Bayesian Machine Learning, Machine Learning, Statistics

Given a data distribution $\mathcal{D}$ over $\mathcal{X} \times \mathcal{Y}$, the Bayes error is the smallest generalization error over all measurable classifiers $h \colon \mathcal{X} \to \mathcal{Y}$:

$$R^* = \inf_{h \text{ measurable}} \Pr_{(x, y) \sim \mathcal{D}}\big[h(x) \neq y\big].$$

Any hypothesis $h^*$ with $R(h^*) = R^*$ is called a Bayes classifier, so that $h^*$ maximizes the class posterior:

$$h^*(x) = \arg\max_{y \in \mathcal{Y}} \Pr[y \mid x],$$

and any $h$ satisfying this has $R(h) = R^*$. Conversely, the noise is the minimum of the class posterior (stated here for binary labels $\mathcal{Y} = \{0, 1\}$),

$$\mathrm{noise}(x) = \min\big\{\Pr[1 \mid x],\; 1 - \Pr[1 \mid x]\big\},$$

which has expectation equal to the Bayes error:

$$R^* = \mathbb{E}_x\big[\mathrm{noise}(x)\big].$$
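
To make the definitions concrete, here is a minimal sketch (the Gaussian mixture, function names, and constants below are illustrative assumptions, not from the text) in which the posterior $\Pr[1 \mid x]$ is known in closed form, so the plug-in Bayes classifier and the noise formulation of the Bayes error can both be checked by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary problem: y ~ Bernoulli(1/2), x | y ~ N(2y - 1, 1),
# i.e. class-conditional means at -1 and +1 with unit variance.
def posterior(x):
    # Pr[y = 1 | x] works out to a logistic function of x for this mixture.
    return 1.0 / (1.0 + np.exp(-2.0 * x))

def bayes_classifier(x):
    # h*(x) = argmax_y Pr[y | x], i.e. predict 1 iff Pr[1 | x] >= 1/2.
    return (posterior(x) >= 0.5).astype(int)

# Sample from the joint distribution.
n = 1_000_000
y = rng.integers(0, 2, size=n)
x = rng.normal(loc=2 * y - 1, scale=1.0, size=n)

# Risk of the plug-in Bayes classifier ...
bayes_risk_estimate = np.mean(bayes_classifier(x) != y)

# ... versus the noise formulation R* = E_x[min(Pr[1|x], 1 - Pr[1|x])].
eta = posterior(x)
noise_estimate = np.mean(np.minimum(eta, 1.0 - eta))

print(f"risk of Bayes classifier: {bayes_risk_estimate:.4f}")
print(f"E[noise(x)]:              {noise_estimate:.4f}")
# Both approximate the exact Bayes error Phi(-1) ~= 0.1587 for this mixture.
```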

The excess risk $R(h) - R^*$ of a given hypothesis $h \in \mathcal{H}$ relative to the Bayes risk $R^*$ has the following decomposition into the

  1. Estimation error: How close a particular hypothesis can get to the best-in-class error $\inf_{h' \in \mathcal{H}} R(h')$
  2. Approximation error: How close the hypothesis class can get to the true minimum achievable error (considering noise in measurements)

respectively:

$$R(h) - R^* = \underbrace{\Big(R(h) - \inf_{h' \in \mathcal{H}} R(h')\Big)}_{\text{estimation error}} + \underbrace{\Big(\inf_{h' \in \mathcal{H}} R(h') - R^*\Big)}_{\text{approximation error}}.$$

In regression, where $\mathcal{Y} = \mathbb{R}$ and the loss is the squared error, the error rate (noise) can be taken to be the expected conditional variance:

$$R^* = \mathbb{E}_x\big[\operatorname{Var}[y \mid x]\big] = \mathbb{E}_x\Big[\mathbb{E}\big[(y - \mathbb{E}[y \mid x])^2 \,\big|\, x\big]\Big] = \mathbb{E}_{x, y}\big[(y - \mathbb{E}[y \mid x])^2\big],$$

where the second equality follows by the definition of conditional variance and the last by the tower property of conditional expectation. When $y$ given $x$ is not random, this is clearly zero. This is the mean-squared error of the conditional expectation $\mathbb{E}[y \mid x]$.
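
As a quick sanity check of this identity (a sketch under an assumed heteroscedastic model; the choices of $\mu(x)$ and $\sigma(x)$ are mine, for illustration only), the mean-squared error of the conditional mean and the expected conditional variance agree by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed heteroscedastic model: x ~ Uniform(0, 1),
# y | x ~ N(mu(x), sigma(x)^2) with mu(x) = sin(2*pi*x) and sigma(x) = 0.1 + x.
n = 1_000_000
x = rng.uniform(0.0, 1.0, size=n)
mu = np.sin(2 * np.pi * x)    # the conditional mean E[y | x]
sigma = 0.1 + x               # the conditional standard deviation
y = rng.normal(mu, sigma)

# Mean-squared error of the conditional expectation, E[(y - E[y|x])^2] ...
mse_conditional_mean = np.mean((y - mu) ** 2)

# ... versus the expected conditional variance, E_x[Var[y | x]].
expected_cond_var = np.mean(sigma ** 2)

print(f"E[(y - E[y|x])^2]: {mse_conditional_mean:.4f}")
print(f"E_x[Var[y|x]]:     {expected_cond_var:.4f}")
# Both approximate the closed form E[(0.1 + x)^2] = 0.01 + 0.1 + 1/3 ~= 0.4433.
```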