In decision theory and estimation theory, a Bayes estimator is an estimator or decision rule that maximizes the posterior expected value of a utility function or minimizes the posterior expected value of a loss function. (See also prior probability.) Specifically, suppose an unknown parameter θ is known to have a prior distribution <math>\Pi</math>. Let <math>\delta</math> be an estimator of θ (based on some measurements), and let <math>R(\theta,\delta)</math> be a risk function, such as the mean squared error. The Bayes risk of <math>\delta</math> is defined as <math>E_\Pi \{ R(\theta, \delta) \}</math>, where the expectation is taken over the prior distribution <math>\Pi</math> of <math>\theta</math>. An estimator <math>\delta</math> is said to be a Bayes estimator if it minimizes the Bayes risk among all estimators.
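The definition of the Bayes risk can be illustrated with a small sketch. It assumes a normal model that is not part of the text above: θ has prior <math>N(0, \tau^2)</math> and a single measurement <math>X</math> given θ is <math>N(\theta, \sigma^2)</math>. For the linear estimators <math>\delta_c(x) = cx</math> under squared error, the risk is <math>R(\theta,\delta_c) = c^2\sigma^2 + (c-1)^2\theta^2</math>, so the Bayes risk is <math>c^2\sigma^2 + (c-1)^2\tau^2</math>. The function names and the values <math>\sigma^2 = 1</math>, <math>\tau^2 = 4</math> are hypothetical choices made for illustration.

```python
def bayes_risk(c, sigma2=1.0, tau2=4.0):
    """Bayes risk of the linear estimator delta_c(x) = c * x under squared error.

    Assumed (illustrative) model: theta ~ N(0, tau2), X | theta ~ N(theta, sigma2).
    The risk is R(theta, delta_c) = c^2 * sigma2 + (c - 1)^2 * theta^2; taking the
    expectation over the prior replaces theta^2 by tau2.
    """
    return c * c * sigma2 + (c - 1.0) ** 2 * tau2


def best_shrinkage(sigma2=1.0, tau2=4.0):
    """Closed-form minimizer of bayes_risk over the factor c."""
    return tau2 / (sigma2 + tau2)


# Compare the closed-form minimizer with a brute-force search over c in [0, 1].
grid = [i / 1000 for i in range(1001)]
c_grid = min(grid, key=bayes_risk)
c_star = best_shrinkage()

print("grid minimizer:", c_grid)
print("closed-form minimizer:", c_star)
print("Bayes risk at the minimizer:", bayes_risk(c_star))
print("Bayes risk of the unshrunk estimate x:", bayes_risk(1.0))
```

In this assumed model the minimizing factor <math>\tau^2/(\sigma^2+\tau^2)</math> is also the coefficient of the posterior mean, so the best linear rule coincides with the Bayes estimator under squared error.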
Examples
Risk functions are chosen depending on how one measures the distance between the estimate and the unknown parameter. The following are several examples of risk functions and the corresponding Bayes estimators. We denote the posterior generalized distribution function by <math>F</math>.
- If we take the mean squared error as a risk function, then the Bayes estimate of the unknown parameter is simply the posterior mean,
- <math>\widehat{\theta }(x) = E[\theta |X=x]=\int \theta f(\theta |x)\,d\theta.</math>
- A "linear" (absolute-error) loss function, with <math> a>0 </math>, which yields the posterior median as the Bayes estimate:
- <math> L(\theta,\widehat{\theta}) = a|\theta-\widehat{\theta}| </math>
- <math> F(\widehat{\theta }(x)|X) = \tfrac{1}{2} </math>
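- Another "linear" loss function, which assigns different "weights" <math> a,b>0 </math> to underestimation and overestimation, respectively. It yields a quantile of the posterior distribution, and is a generalization of the previous loss function:
- <math> L(\theta,\widehat{\theta}) = \left\{\begin{matrix} a|\theta-\widehat{\theta}|, & \mbox{for }\theta-\widehat{\theta} \ge 0 \\ b|\theta-\widehat{\theta}|, & \mbox{for }\theta-\widehat{\theta} < 0 \end{matrix}\right. </math>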
- <math> F(\widehat{\theta }(x)|X) = \frac{a}{a+b} </math>
- The following loss function is trickier: it yields either the posterior mode or a point close to it, depending on the curvature and other properties of the posterior distribution. Small values of the parameter <math> K>0 </math> are recommended, so that the estimate approximates the mode (here <math> L>0 </math> is a constant):
- <math> L(\theta,\widehat{\theta}) = \left\{\begin{matrix} 0, & \mbox{for }|\theta-\widehat{\theta}| < K \\ L, & \mbox{for }|\theta-\widehat{\theta}| \ge K \end{matrix}\right. </math>
Other loss functions can be conceived, although the mean squared error is the most widely used and validated.
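The correspondence between loss functions and posterior summaries listed above can be checked numerically. The following sketch discretizes a posterior on a grid and minimizes the posterior expected loss by brute force. Everything specific in it is an assumption made for illustration: the Beta(3, 5)-shaped posterior, the grid size, the weights <math>a=1</math>, <math>b=3</math> (with <math>a</math> applied when the estimate is too low), the threshold <math>K</math>, and a loss for the mode case that is zero when <math>|\theta-\widehat{\theta}| < K</math> and a constant <math>L</math> otherwise.

```python
def grid_posterior(n=401):
    """Discretized posterior on [0, 1]; the Beta(3, 5) shape is an illustrative assumption."""
    thetas = [i / (n - 1) for i in range(n)]
    weights = [t ** 2 * (1 - t) ** 4 for t in thetas]
    total = sum(weights)
    return thetas, [w / total for w in weights]


def posterior_quantile(thetas, probs, q):
    """Smallest grid point whose cumulative posterior probability reaches q."""
    cum = 0.0
    for t, p in zip(thetas, probs):
        cum += p
        if cum >= q:
            return t
    return thetas[-1]


def bayes_estimate(loss, thetas, probs):
    """Grid point that minimizes the posterior expected loss (brute force)."""
    def expected_loss(e):
        return sum(loss(t, e) * p for t, p in zip(thetas, probs))
    return min(thetas, key=expected_loss)


def squared_loss(t, e):
    return (t - e) ** 2


def asymmetric_loss(a, b):
    """Weight a when theta >= estimate (underestimation), b otherwise."""
    return lambda t, e: a * (t - e) if t >= e else b * (e - t)


def zero_one_loss(K, L=1.0):
    """Zero inside a window of half-width K around the estimate, L outside."""
    return lambda t, e: 0.0 if abs(t - e) < K else L


thetas, probs = grid_posterior()
step = thetas[1] - thetas[0]

# Posterior summaries computed directly from the discretized posterior.
mean = sum(t * p for t, p in zip(thetas, probs))
median = posterior_quantile(thetas, probs, 0.5)
quantile = posterior_quantile(thetas, probs, 1.0 / (1.0 + 3.0))  # a / (a + b)
mode = max(zip(probs, thetas))[1]

# Brute-force minimizers of the corresponding posterior expected losses.
est_squared = bayes_estimate(squared_loss, thetas, probs)
est_absolute = bayes_estimate(asymmetric_loss(1.0, 1.0), thetas, probs)
est_asymmetric = bayes_estimate(asymmetric_loss(1.0, 3.0), thetas, probs)
# K is set to 4.5 grid steps so that no grid point lies exactly on the window edge.
est_zero_one = bayes_estimate(zero_one_loss(K=4.5 * step), thetas, probs)

print("squared error:", est_squared, "posterior mean:", mean)
print("absolute error:", est_absolute, "posterior median:", median)
print("asymmetric linear:", est_asymmetric, "a/(a+b) quantile:", quantile)
print("zero-L loss:", est_zero_one, "posterior mode:", mode)
```

Each brute-force minimizer should agree with the matching summary up to the grid resolution. The last pair agrees only approximately, and only because <math>K</math> is small.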


