
Bayesian linear regression


In statistics, Bayesian linear regression is a Bayesian alternative to the better-known ordinary least-squares linear regression. Consider a standard linear regression problem, in which we specify the conditional density of y given the predictor variables x:

<math>y_{i} = \beta x_{i} + \epsilon_{i},\,</math>

where the noise <math>\epsilon</math> is i.i.d. and normally distributed

<math>\epsilon_{i} \sim N(0, \sigma^2).\,</math>

A common solution, linear least squares, is to estimate the slope <math>\hat{\beta}</math> using the Moore-Penrose pseudoinverse:

<math> \hat{\beta} = (X^{T}X)^{-1}X^{T}y,</math>

where <math>X</math> is the vector of <math>x_{i}</math> (of length <math>n</math>). This is the frequentist view, and it assumes we have enough measurements of <math>x_i</math> to say something meaningful about y. In the empirical Bayes approach, we assume we have only a small sample of <math>x_{i}</math> for our individual measurement, and we seek to correct our estimate by "borrowing" information from a larger set of similar observations. Let us write our conditional likelihood as

<math>\rho(y|X,\beta,\sigma^{2}) \propto (\sigma^{2})^{-n/2} \exp\left(-\frac{1}{2{\sigma}^{2}}(y-\beta X)^{T}(y-\beta X)\right).\,</math>
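
The least-squares estimate and the likelihood above can be illustrated numerically. The following is a minimal sketch, not part of the original derivation: the data are synthetic, the names `beta_true`, `sigma_true`, and `log_likelihood` are hypothetical, and the code stores the predictors as an n-by-k matrix `X`, so that the article's product <math>\beta X</math> is written `X @ beta`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative assumption): n observations, k predictors.
n, k = 50, 3
X = rng.normal(size=(n, k))
beta_true = np.array([1.5, -2.0, 0.5])
sigma_true = 0.7
y = X @ beta_true + rng.normal(scale=sigma_true, size=n)

# Least-squares estimate from the normal equations, (X^T X)^{-1} X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The same estimate obtained through the Moore-Penrose pseudoinverse of X.
beta_pinv = np.linalg.pinv(X) @ y


def log_likelihood(beta, sigma2):
    """Gaussian log-likelihood of (beta, sigma^2), up to an additive constant."""
    resid = y - X @ beta
    return -0.5 * n * np.log(sigma2) - (resid @ resid) / (2.0 * sigma2)


print("beta_hat:", beta_hat)
print("log-likelihood at beta_hat:", log_likelihood(beta_hat, sigma_true**2))
```

For any fixed <math>\sigma^{2}</math>, the likelihood is largest at `beta_hat`, which is why the least-squares estimate reappears throughout the rest of the derivation.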

We seek a natural conjugate prior, that is, a joint density <math>\rho(\beta,\sigma^{2})</math> of the same functional form as the likelihood. Since the exponent of the likelihood is quadratic in <math>\beta</math>, we re-write the likelihood so that it is normal in <math>(\beta-\hat{\beta})</math>. Write

<math>(y-\beta X)^{T}(y-\beta X) = (y-\hat{\beta} X)^{T}(y-\hat{\beta} X) + (\beta - \hat{\beta})^{T}(X^{T}X)(\beta - \hat{\beta})</math>
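
The decomposition holds because the least-squares residual is orthogonal to the columns of <math>X</math>, so the cross term vanishes. A quick numerical check is sketched below under illustrative assumptions: the data are synthetic, `sum_sq` is a hypothetical helper, and `X` is stored as an n-by-k matrix so that <math>\beta X</math> becomes `X @ beta`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (illustrative assumption).
n, k = 40, 2
X = rng.normal(size=(n, k))
y = X @ np.array([2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)


def sum_sq(beta):
    """Residual sum of squares (y - X beta)^T (y - X beta)."""
    r = y - X @ beta
    return r @ r


# Pick an arbitrary beta and compare both sides of the identity.
beta = rng.normal(size=k)
d = beta - beta_hat
lhs = sum_sq(beta)
rhs = sum_sq(beta_hat) + d @ (X.T @ X) @ d

# The cross term vanishes because X^T (y - X beta_hat) = 0.
cross = X.T @ (y - X @ beta_hat)

# Degrees of freedom and residual variance estimate, v s^2 = RSS at beta_hat.
v = n - k
s2 = sum_sq(beta_hat) / v

print(lhs, rhs, cross, s2)
```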

Now re-write the likelihood as

<math>\rho(y|X,\beta,\sigma^{2}) \propto (\sigma^2)^{-v/2} \exp\left(-\frac{vs^{2}}{2{\sigma}^{2}}\right)(\sigma^2)^{-(n-v)/2} \exp\left(-\frac{1}{2{\sigma}^{2}}(\beta - \hat{\beta})^{T}(X^{T}X)(\beta - \hat{\beta})\right),\,</math>

where

<math>vs^{2} =(y-\hat{\beta} X)^{T}(y-\hat{\beta} X) \ , v = n-k</math>

with <math>k</math> as the number of parameters to estimate. This suggests a form for the priors:

<math>\rho(\beta,\sigma^{2}) = \rho(\sigma^{2})\rho(\beta|\sigma^{2}),\,</math>

where <math>\rho(\sigma^{2})</math> is an inverse-gamma distribution

<math> \rho(\sigma^{2}) \propto (\sigma^2)^{-(v_{0}/2+1)} \exp\left(-\frac{v_{0}s_{0}^{2}}{2{\sigma}^{2}}\right),\,</math>

and <math>\rho(\beta|\sigma^{2})</math> is a normal distribution

<math> \rho(\beta|\sigma^{2}) \propto (\sigma^2)^{-k/2} \exp\left(-\frac{1}{2{\sigma}^{2}}(\beta - \bar{\beta})^{T}(A)(\beta - \bar{\beta})\right),\,</math>

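with <math>v_{0}</math> and <math>s_{0}^{2}</math> as the prior values of <math>v</math> and <math>s^{2}</math>, respectively, <math>\bar{\beta}</math> as the prior mean of <math>\beta</math>, and <math>A</math> as the prior precision matrix (so that the prior covariance of <math>\beta</math> is <math>\sigma^{2}A^{-1}</math>). With the prior now specified, we can express the posterior distribution as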

<math> \rho(\beta,\sigma^{2}|y,X) \propto \rho(y|X,\beta,\sigma^{2})\rho(\beta|\sigma^{2})\rho(\sigma^{2}) </math>
<math> \propto (\sigma^{2})^{-n/2} \exp\left(-\frac{1}{2{\sigma}^{2}}(y-\beta X)^{T}(y-\beta X)\right)</math>
<math> \times (\sigma^{2})^{-k/2} \exp\left(-\frac{1}{2{\sigma}^{2}}(\beta - \bar{\beta})^{T}(A)(\beta - \bar{\beta})\right)</math>
<math> \times (\sigma^2)^{-(v_{0}/2+1)} \exp\left(-\frac{v_{0}s_{0}^{2}}{2{\sigma}^{2}}\right).</math>

With some re-arrangement, we can re-write the posterior so that the posterior mean <math>\tilde{\beta}</math> is a weighted average of the least squares estimator and the prior mean:

<math>\tilde{\beta} = (X^{T}X+A)^{-1}(X^{T}X\hat{\beta}+A\bar{\beta})</math>

Here <math>U</math>, used in the derivation that follows, comes from the Cholesky decomposition of <math>A</math> (which is a positive-definite matrix by design):

<math> A = U^{T}U. \,</math>
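
The weighted-average form of the posterior mean can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than a reference implementation: the data are synthetic, the prior mean `beta_bar` and matrix `A` are hypothetical values, and `X` is an n-by-k matrix so that <math>\beta X</math> is written `X @ beta`.

```python
import numpy as np

rng = np.random.default_rng(2)

# A deliberately small sample (illustrative assumption).
n, k = 8, 2
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, 3.0]) + rng.normal(scale=2.0, size=n)

# Least-squares estimate from this sample alone.
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)

# Hypothetical prior: mean beta_bar and positive-definite matrix A.
beta_bar = np.array([0.8, 2.5])
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# np.linalg.cholesky returns lower-triangular L with A = L L^T,
# so U = L^T satisfies A = U^T U.
U = np.linalg.cholesky(A).T

# Posterior mean: (X^T X + A)^{-1} (X^T X beta_hat + A beta_bar).
beta_tilde = np.linalg.solve(XtX + A, XtX @ beta_hat + A @ beta_bar)

# Matrix weights on the data estimate and on the prior mean; they sum to I.
W_data = np.linalg.solve(XtX + A, XtX)
W_prior = np.linalg.solve(XtX + A, A)

print("least squares:", beta_hat)
print("prior mean:   ", beta_bar)
print("posterior:    ", beta_tilde)
```

Scaling `A` up pulls `beta_tilde` toward `beta_bar`, while adding rows to `X` pulls it toward `beta_hat`.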

This is the key result of the empirical Bayes approach: it allows us to estimate the slope <math>\beta</math> for our original linear regression problem by combining the least squares estimate <math>\hat{\beta}</math> from a single set of measurements with the empirical prior estimate <math>\bar{\beta}</math> from a large collection of similar measurements. (Notice that the weighted average also depends on the empirical estimate of the prior precision matrix <math>A</math>.) To justify this, collect the quadratic terms in the exponential and express them as a quadratic form in <math>\beta-\tilde{\beta}</math>:

<math> (y-\beta X)^{T}(y-\beta X) + (\beta - \bar{\beta})^{T}(A)(\beta - \bar{\beta}) = (v-W\beta)^{T}(v-W\beta) </math>
<math> = ns^{2} + (\beta - \tilde{\beta})^{T}W^{T}W(\beta - \tilde{\beta})</math>

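where

<math> ns^{2} = (v - W \tilde{\beta})^{T}(v - W \tilde{\beta}), \quad v = \begin{bmatrix} y \\ U\bar{\beta} \end{bmatrix}, \quad W = \begin{bmatrix} X \\ U \end{bmatrix}. </math>

Here <math>v</math> denotes the stacked vector of the data and the prior pseudo-observations, not the degrees of freedom <math>v = n-k</math> used earlier, and <math>W^{T}W = X^{T}X + A</math>.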
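
The stacked form says that the prior acts like extra pseudo-observations appended to the data, so the posterior mean is an ordinary least-squares fit on the augmented system. The sketch below checks this numerically under illustrative assumptions: the data and prior values are synthetic, the variable `v_stack` is a hypothetical name for the stacked vector, and `X` is an n-by-k matrix.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data and a hypothetical prior (illustrative assumptions).
n, k = 10, 2
X = rng.normal(size=(n, k))
y = X @ np.array([0.5, -1.0]) + rng.normal(size=n)
beta_bar = np.zeros(k)
A = 2.0 * np.eye(k)
U = np.linalg.cholesky(A).T  # A = U^T U

# Stack the data on top of the prior pseudo-observations.
v_stack = np.concatenate([y, U @ beta_bar])  # length n + k
W = np.vstack([X, U])                        # shape (n + k, k)

# Least squares on the augmented system gives the posterior mean.
beta_tilde, *_ = np.linalg.lstsq(W, v_stack, rcond=None)

# Closed form for comparison, using X^T X beta_hat = X^T y.
closed_form = np.linalg.solve(X.T @ X + A, X.T @ y + A @ beta_bar)

# Residual sum of squares of the augmented fit, n s^2 in the text.
ns2 = np.sum((v_stack - W @ beta_tilde) ** 2)

# Check the quadratic identity at an arbitrary beta.
beta = rng.normal(size=k)
lhs = np.sum((y - X @ beta) ** 2) + (beta - beta_bar) @ A @ (beta - beta_bar)
d = beta - beta_tilde
rhs = ns2 + d @ (W.T @ W) @ d

print(beta_tilde, closed_form, lhs, rhs)
```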

The posterior can now be expressed as a normal distribution <math>N(\tilde{\beta},\sigma^{2}(X^{T}X+A)^{-1})</math> times an inverse-gamma distribution:

<math>\rho(\beta,\sigma^{2}|y,X) \propto (\sigma^{2})^{-k/2} \exp\left(-\frac{1}{2{\sigma}^{2}}(\beta - \tilde{\beta})^{T}(X^{T}X+A)(\beta - \tilde{\beta})\right)\times (\sigma^2)^{-((n+v_{0})/2+1)} \exp\left(-\frac{(v_{0}s_{0}^{2}+ns^{2})}{2{\sigma}^{2}}\right).</math>
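
Because the posterior factors into an inverse-gamma density for <math>\sigma^{2}</math> and a normal density for <math>\beta</math> given <math>\sigma^{2}</math>, it can be sampled directly in two steps. The following is a minimal sketch under stated assumptions: the data, the prior values (`beta_bar`, `A`, `v0`, `s0_sq`), and the number of draws are all hypothetical, and the inverse-gamma draw is produced as a scale divided by a gamma variate.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data (illustrative assumption).
n, k = 30, 2
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, -0.5]) + rng.normal(scale=0.5, size=n)

# Hypothetical prior hyperparameters.
beta_bar = np.zeros(k)
A = np.eye(k)
v0, s0_sq = 3.0, 1.0

# Posterior mean and the matrix X^T X + A.
P = X.T @ X + A
beta_tilde = np.linalg.solve(P, X.T @ y + A @ beta_bar)

# n s^2: residual sum of squares of the stacked (data + prior) system.
resid = y - X @ beta_tilde
d0 = beta_tilde - beta_bar
ns2 = resid @ resid + d0 @ A @ d0

# sigma^2 | y ~ inverse-gamma(shape, scale), read off the last factor above.
shape = 0.5 * (n + v0)
scale = 0.5 * (v0 * s0_sq + ns2)

draws = 5000
# If G ~ Gamma(shape, 1), then scale / G is inverse-gamma(shape, scale).
sigma2 = scale / rng.gamma(shape, 1.0, size=draws)

# beta | sigma^2, y ~ N(beta_tilde, sigma^2 P^{-1}).
L = np.linalg.cholesky(np.linalg.inv(P))  # L L^T = P^{-1}
z = rng.normal(size=(draws, k))
beta_draws = beta_tilde + np.sqrt(sigma2)[:, None] * (z @ L.T)

print("posterior mean of beta (Monte Carlo):", beta_draws.mean(axis=0))
print("posterior mean of sigma^2 (Monte Carlo):", sigma2.mean())
```

Averages, quantiles, or intervals for <math>\beta</math> and <math>\sigma^{2}</math> can then be read off the arrays `beta_draws` and `sigma2`.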

A similar analysis can be performed for the general case of multivariate regression, for a Bayesian estimation of covariance matrices.

Example: Suppose the weights of a large population of 35-year-old men are normally distributed with expected value μ and standard deviation σ. A crude measuring instrument measures a man's weight with a measurement error that is normally distributed with expected value 0 and standard deviation τ. The man's true weight is not observable; his weight measured with error is observed. The conditional probability distribution of a randomly chosen man's true weight, given his weight-measured-with-error, can be found by using Bayes' theorem, and then the conditional expected value can be used as an estimate of his true weight, provided that the values of μ, σ, and τ are known. But they are not. One may use the data to estimate the standard deviation of the measurement errors by measuring each man multiple times. One may similarly estimate the population average weight and the population standard deviation of weights by weighing multiple men. These estimates of parameters based on the data are the occasion for the use of the word empirical. Finally, one may then estimate the aforementioned conditional expected true weight by using Bayes' theorem.
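
The weighing example can be simulated directly. The sketch below is one possible reading of it, under clearly labeled assumptions: the population values (`mu_true`, `sigma_true`, `tau_true`), the number of men, and the number of repeat measurements are invented for illustration, and the parameter estimates use a simple method-of-moments approach that the text does not prescribe. For normal weights and normal errors, Bayes' theorem gives the conditional expected true weight as μ + σ²/(σ² + τ²) × (observed − μ).

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated population and instrument (hypothetical values, in kilograms).
mu_true, sigma_true, tau_true = 80.0, 10.0, 5.0
num_men, reps = 2000, 4
true_weights = rng.normal(mu_true, sigma_true, size=num_men)
measured = true_weights[:, None] + rng.normal(0.0, tau_true, size=(num_men, reps))

# Measurement-error variance: average within-man sample variance.
tau2_hat = measured.var(axis=1, ddof=1).mean()

# Population mean: grand mean of all measurements.
mu_hat = measured.mean()

# Variance of per-man means is sigma^2 + tau^2 / reps, so subtract the noise part.
man_means = measured.mean(axis=1)
sigma2_hat = max(man_means.var(ddof=1) - tau2_hat / reps, 0.0)


def estimate_true_weight(observed):
    """Empirical Bayes estimate of true weight from one noisy measurement."""
    shrink = sigma2_hat / (sigma2_hat + tau2_hat)
    return mu_hat + shrink * (observed - mu_hat)


print("estimated mu, sigma, tau:", mu_hat, np.sqrt(sigma2_hat), np.sqrt(tau2_hat))
print("estimate for a reading of 100:", estimate_true_weight(100.0))
```

The estimate for a single reading is pulled from the reading toward the estimated population mean, and the pull is stronger when the instrument is noisier relative to the spread of true weights.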

References

  • Bradley P. Carlin and Thomas A. Louis, Bayes and Empirical Bayes Methods for Data Analysis, Chapman & Hall/CRC, Second edition, 2000.
  • Peter E. Rossi, Greg M. Allenby, and Robert McCulloch, Bayesian Statistics and Marketing, John Wiley & Sons, Ltd, 2006.

Copyrights
Bayesian linear regression from Wikipedia. ©2006 by Wikipedia. Licensed under the GNU Free Documentation License.
