ConvexAnalysisMachineLearningStatistics

As the indicator risk functional $R(f) = \mathbb{E}\,\mathbf{1}[Y f(X) \le 0]$ is non-convex, one defines a convex surrogate loss in order to make the corresponding optimization problem convex. A convex loss $\phi$ is a convex function of the margin $z = y f(x)$ that upper bounds the indicator loss, $\phi(z) \ge \mathbf{1}[z \le 0]$ (this property is in fact a consequence of classification calibration). The generalization error is bounded by the surrogate risk, $R(f) \le R_\phi(f) = \mathbb{E}\,\phi(Y f(X))$, and in fact the population minimizer of $R_\phi$ induces a classifier attaining the Bayes error $R^* = \inf_f R(f)$, while the excess risk is upper bounded by a function of the excess surrogate risk, $R(f) - R^* \le \psi^{-1}\big(R_\phi(f) - R_\phi^*\big)$, given sufficient regularity of $\phi$. This implies consistency of minimizing the convex surrogate risk, with the aim of achieving the Bayes error.
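
As a concrete instance (the hinge loss is not singled out in the section itself; it is used here only to illustrate the upper-bound property), take $\phi(z) = \max(0, 1 - z)$. For $z \le 0$ one has $\max(0, 1 - z) \ge 1 = \mathbf{1}[z \le 0]$, and for $z > 0$ one has $\max(0, 1 - z) \ge 0 = \mathbf{1}[z \le 0]$, so
$$\max(0,\, 1 - z) \;\ge\; \mathbf{1}[z \le 0] \quad \text{for all } z, \qquad \text{hence} \qquad R(f) \le R_\phi(f) \text{ for every } f.$$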

Indeed, for a convex surrogate margin-based loss $\phi$, the optimal hypothesis is
$$f_\phi^*(x) = \arg\min_{\alpha \in \mathbb{R}} \mathbb{E}\big[\phi(Y\alpha) \mid X = x\big] = \arg\min_{\alpha \in \mathbb{R}} \big[\eta(x)\,\phi(\alpha) + (1 - \eta(x))\,\phi(-\alpha)\big],$$
where the second equality holds for $Y \in \{-1, +1\}$ and $\eta(x) = \Pr(Y = 1 \mid X = x)$. While the minimizer may not be unique, one can further assume that $\phi$ is non-increasing. This is reasonable as one typically does not want to penalize the classifier for highly confident, correct predictions.
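
For instance (the logistic loss is an assumption of this example, not a choice made in the section), taking $\phi(z) = \log(1 + e^{-z})$ and setting the derivative of the conditional $\phi$-risk to zero gives
$$\frac{d}{d\alpha}\Big[\eta\,\log(1 + e^{-\alpha}) + (1 - \eta)\,\log(1 + e^{\alpha})\Big] = -\frac{\eta}{1 + e^{\alpha}} + \frac{1 - \eta}{1 + e^{-\alpha}} = 0 \;\Longleftrightarrow\; \alpha = \log\frac{\eta}{1 - \eta},$$
so $f_\phi^*(x) = \log\frac{\eta(x)}{1 - \eta(x)}$, which is positive exactly when $\eta(x) > 1/2$; the surrogate minimizer recovers the Bayes classifier.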

One desires Fisher consistency of $f_\phi^*$ if the surrogate loss is to be a sensible replacement for the 0-1 loss in ERM. It can be shown that consistency of a convex loss $\phi$ is equivalent to the two regularity conditions

  1. $\phi$ is differentiable at $0$
  2. $\phi'(0) < 0$

To show that these conditions imply consistency, one wants $\operatorname{sign}\big(f_\phi^*(x)\big) = \operatorname{sign}\big(2\eta(x) - 1\big)$ for any choice of $\eta(x) \in [0, 1]$. Writing $C_\eta(\alpha) = \eta\,\phi(\alpha) + (1 - \eta)\,\phi(-\alpha)$ for the conditional $\phi$-risk, the cases are as follows (a worked instance for a concrete loss is given after the list).
  • $\eta(x) = 0$: $C_0(\alpha) = \phi(-\alpha)$ and $\phi$ is non-increasing, so the conditional risk is minimized by taking $\alpha$ as small as possible; set $f_\phi^*(x) = -\infty$ (any negative value suffices), which has the sign of $2\eta(x) - 1 = -1$.
  • $\eta(x) = 1$: $C_1(\alpha) = \phi(\alpha)$ and $\phi$ is non-increasing, so the conditional risk is minimized by taking $\alpha$ as large as possible; set $f_\phi^*(x) = +\infty$ (any positive value suffices), which has the sign of $2\eta(x) - 1 = +1$.
  • $\eta(x) = 1/2$: $C_{1/2}(\alpha) = \tfrac{1}{2}\big[\phi(\alpha) + \phi(-\alpha)\big]$ is convex, so it is minimized at any point where its derivative vanishes; $\phi$ is differentiable at $0$, so $C_{1/2}'(0) = \tfrac{1}{2}\big[\phi'(0) - \phi'(0)\big] = 0$ and one may set $f_\phi^*(x) = 0$ (at $\eta = 1/2$ either label attains the Bayes risk, so the sign is immaterial).
  • $\eta(x) \in (0, 1)$ with $\eta(x) \ne 1/2$: $C_\eta$ is convex, so $\alpha^* = f_\phi^*(x)$ minimizes it iff zero is in the subdifferential of the expected $\phi$-loss,
    $$0 \in \partial C_\eta(\alpha^*) = \eta\,\partial\phi(\alpha^*) - (1 - \eta)\,\partial\phi(-\alpha^*).$$
    Now consider the subgradient inequalities of $\phi$ at $0$ (by condition 1, $\partial\phi(0) = \{\phi'(0)\}$), i.e.
    $$\phi(\alpha^*) \ge \phi(0) + \phi'(0)\,\alpha^*, \qquad \phi(-\alpha^*) \ge \phi(0) - \phi'(0)\,\alpha^*,$$
    so that weighting these by $\eta$ and $1 - \eta$ and summing yields
    $$C_\eta(\alpha^*) \;\ge\; \phi(0) + (2\eta - 1)\,\phi'(0)\,\alpha^* \;=\; C_\eta(0) + (2\eta - 1)\,\phi'(0)\,\alpha^*.$$
    As $C_\eta(\alpha^*) \le C_\eta(0)$ and $\phi'(0) < 0$, it must hold that $(2\eta - 1)\,\alpha^* \ge 0$, i.e. $\alpha^* \ge 0$ when $\eta(x) > 1/2$ and $\alpha^* \le 0$ when $\eta(x) < 1/2$. The subdifferential condition rules out $\alpha^* = 0$: by condition 1 it would force $0 = \eta\,\phi'(0) - (1 - \eta)\,\phi'(0) = (2\eta - 1)\,\phi'(0)$, which is impossible since $\eta \ne 1/2$ and $\phi'(0) < 0$. Hence when $\eta(x) > 1/2$ it holds that $\alpha^* > 0$ and $\operatorname{sign}(f_\phi^*(x)) = +1$; likewise, when $\eta(x) < 1/2$ it holds that $\alpha^* < 0$ and $\operatorname{sign}(f_\phi^*(x)) = -1$.
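
As a worked instance of this case analysis (the exponential loss is chosen here purely for illustration), take $\phi(z) = e^{-z}$, which is convex, non-increasing, and differentiable at $0$ with $\phi'(0) = -1 < 0$. For $\eta \in (0, 1)$ the stationarity condition gives
$$\frac{d}{d\alpha}\Big[\eta\,e^{-\alpha} + (1 - \eta)\,e^{\alpha}\Big] = -\eta\,e^{-\alpha} + (1 - \eta)\,e^{\alpha} = 0 \;\Longleftrightarrow\; \alpha^* = \tfrac{1}{2}\log\frac{\eta}{1 - \eta},$$
which is strictly positive when $\eta > 1/2$, zero when $\eta = 1/2$, and strictly negative when $\eta < 1/2$, exactly as the argument above predicts.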

Therefore, for convex $\phi$ satisfying the above assumptions, $\phi$ is Fisher consistent.
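
A minimal numerical sanity check of the sign claim, sketched in Python (the hinge loss, the grid, and the values of $\eta$ are arbitrary choices made for this sketch, not taken from the text): it minimizes the conditional $\phi$-risk over a grid and confirms that the sign of the minimizer matches $\operatorname{sign}(2\eta - 1)$.

    import numpy as np

    # Sketch (assumed example): minimize the conditional phi-risk
    #     C_eta(alpha) = eta * phi(alpha) + (1 - eta) * phi(-alpha)
    # over a grid of alpha for the hinge loss, and check that the sign of the
    # minimizer agrees with sign(2*eta - 1), as the calibration argument predicts.

    def hinge(z):
        return np.maximum(0.0, 1.0 - z)

    def conditional_risk(alpha, eta, phi):
        return eta * phi(alpha) + (1.0 - eta) * phi(-alpha)

    alphas = np.linspace(-5.0, 5.0, 100001)
    for eta in [0.1, 0.3, 0.49, 0.51, 0.7, 0.9]:
        risks = conditional_risk(alphas, eta, hinge)
        alpha_star = alphas[np.argmin(risks)]
        assert np.sign(alpha_star) == np.sign(2.0 * eta - 1.0)
        print(f"eta={eta:.2f}  alpha*={alpha_star:+.3f}  sign(2*eta-1)={np.sign(2.0 * eta - 1.0):+.0f}")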