Sample Selection Bias
In a linear regression model, sample selection bias occurs when data on the dependent variable are missing nonrandomly, conditional on the independent variables. For example, if a researcher uses ordinary least squares (OLS) to estimate a regression model in which large values of the dependent variable are underrepresented in a sample, estimates of slope coefficients typically will be biased.
Hausman and Wise (1977) studied the problem of estimating the effect of education on income in a sample of persons with incomes below $15,000. Such a sample is known as a truncated sample and is an example of explicit selection on the dependent variable. Figure 1 illustrates the problem for individuals sampled at three education levels: low (L), middle (M), and high (H). In the figure, the $15,000 ceiling on the dependent variable truncates the sample, so the estimated effect of schooling is biased downward: the fitted line is flatter than the true regression line. Under a variety of special conditions (Winship and Mare 1992), selection biases coefficients downward. In general, however, selection may bias estimated effects in either direction.
A sample that is restricted on the dependent variable is effectively selected on the error of the regression equation; at any value of X, observations with sufficiently large positive errors are eliminated from the sample.
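The mechanism can be illustrated with a small simulation. The following is a minimal sketch, not a reproduction of Hausman and Wise's data or method: the intercept, slope, error standard deviation, the three schooling levels (10, 13, and 16 years), and the function name simulate_truncation are all hypothetical values chosen for illustration. Only the $15,000 ceiling comes from the text. The sketch fits OLS to the full and the truncated sample and reports the average regression error among retained observations at each schooling level.

```python
import numpy as np


def simulate_truncation(n_per_level=10_000, ceiling=15_000.0, seed=0):
    """Compare OLS slopes before and after truncating on the dependent variable.

    All population parameters below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Assumed "true" regression: income = 2000 + 800 * schooling + error
    true_intercept, true_slope, error_sd = 2_000.0, 800.0, 3_000.0

    # Three education levels (low, middle, high), as in the figure described above
    levels = (10.0, 13.0, 16.0)
    schooling = np.repeat(levels, n_per_level)
    error = rng.normal(0.0, error_sd, size=schooling.size)
    income = true_intercept + true_slope * schooling + error

    # Truncation: only persons with income below the ceiling are observed
    keep = income < ceiling

    # np.polyfit with degree 1 returns (slope, intercept)
    full_slope, _ = np.polyfit(schooling, income, 1)
    truncated_slope, _ = np.polyfit(schooling[keep], income[keep], 1)

    # Mean of the true error among retained observations, by schooling level
    mean_error_kept = {
        level: error[keep & (schooling == level)].mean() for level in levels
    }
    return full_slope, truncated_slope, mean_error_kept


if __name__ == "__main__":
    full_slope, truncated_slope, mean_error_kept = simulate_truncation()
    print("slope, full sample:     ", round(full_slope, 1))
    print("slope, truncated sample:", round(truncated_slope, 1))
    for level, mean_error in mean_error_kept.items():
        print(f"mean retained error at schooling={level}: {mean_error:.1f}")
```

Under these assumptions, the retained errors average below zero at every schooling level and are most negative at the highest level, where the ceiling removes the largest share of observations. The errors in the truncated sample are therefore correlated with schooling, which is why the truncated-sample slope falls below the full-sample slope.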