
Bootstrap aggregating


Bootstrap aggregating (bagging) is a meta-algorithm that improves the stability and accuracy of classification and regression models. It also reduces variance and helps to avoid overfitting. Although the method is usually applied to decision tree models, it can be used with any type of model. Bagging is a special case of the model averaging approach.

Given a standard training set D of size N, bagging generates L new training sets D_i, each of size N' (N' ≤ N), by sampling examples from D uniformly and with replacement. Because the sampling is with replacement, some examples are likely to be repeated in each D_i. If N' = N, then for large N each D_i is expected to contain about 63.2% of the distinct examples of D, the rest being duplicates: a given example is missed by all N draws with probability (1 - 1/N)^N, which approaches 1/e ≈ 0.368. This kind of sample is known as a bootstrap sample. The L models are fitted to the L bootstrap samples and combined by averaging their outputs (for regression) or by voting (for classification).

One particularly interesting point about bagging is that, since the method averages several predictors, it is not useful for improving linear models.
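
The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (bootstrap_sample, bag, predict_regression, predict_classification, unique_fraction) are hypothetical, and a fitted model is assumed to be any callable that maps an input to a prediction.

```python
import random
from collections import Counter

def bootstrap_sample(data, size=None, rng=random):
    """Draw `size` items from `data` uniformly and with replacement."""
    size = len(data) if size is None else size
    return [rng.choice(data) for _ in range(size)]

def bag(data, fit, n_models, rng=random):
    """Fit one model per bootstrap sample.

    `fit` is a user-supplied function (an assumption of this sketch) that
    takes a training set and returns a callable predictor.
    """
    return [fit(bootstrap_sample(data, rng=rng)) for _ in range(n_models)]

def predict_regression(models, x):
    """Combine the models by averaging their outputs."""
    return sum(m(x) for m in models) / len(models)

def predict_classification(models, x):
    """Combine the models by majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

def unique_fraction(n, rng=random):
    """Fraction of the n original examples present in one bootstrap sample."""
    return len(set(bootstrap_sample(range(n), rng=rng))) / n

if __name__ == "__main__":
    # For large N this should land close to 1 - 1/e, i.e. about 0.632.
    print(unique_fraction(10_000, random.Random(0)))
```

Passing the base learner in as `fit` reflects the point that bagging is a meta-algorithm: the resampling and combination steps do not depend on the type of model being bagged.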

Example: Ozone data

This example is rather artificial, but it illustrates the basic principles of bagging. Rousseeuw and Leroy (1986) describe a data set concerning ozone levels. The data are available via the classic data sets page. All computations were performed in R. A scatter plot reveals an apparently non-linear relationship between temperature and ozone. One way to model the relationship is to use a LOESS smoother. Such a smoother requires that a span parameter be chosen. In this example, a span of 0.5 was used. One hundred bootstrap samples of the data were taken, and the LOESS smoother was fitted to each sample. Predictions from these 100 smoothers were then made across the range of the data. The first 10 predicted smooth fits appear as grey lines in the figure below. The lines are clearly very wiggly and they overfit the data, a result of the span being too small. The red line on the plot below represents the mean of the 100 smoothers. Clearly, the mean is more stable and there is less overfitting. This is the bagged predictor.

[Figure ozone.png: the first 10 bootstrap LOESS fits in grey and the mean of the 100 smoothers in red.]
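
The same experiment can be approximated in Python to show the mechanics. The sketch below makes several assumptions that are not in the original example: it uses synthetic data rather than the ozone data, and it replaces LOESS with a crude k-nearest-neighbour averaging smoother (the hypothetical knn_smoother), where a small k plays the role of a small span. The roughness measure is likewise only an illustrative way to quantify "wiggliness".

```python
import math
import random

def knn_smoother(sample, k):
    """Return a predictor that averages y over the k nearest x values.

    A stand-in for LOESS, not LOESS itself.
    """
    def predict(x0):
        nearest = sorted(sample, key=lambda p: abs(p[0] - x0))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def roughness(curve):
    """Sum of squared second differences: larger means wigglier."""
    return sum((curve[i - 1] - 2 * curve[i] + curve[i + 1]) ** 2
               for i in range(1, len(curve) - 1))

rng = random.Random(0)

# Synthetic data (NOT the ozone data): a noisy non-linear trend.
xs = [rng.uniform(0, 1) for _ in range(100)]
data = [(x, math.sin(2 * math.pi * x) + rng.gauss(0, 0.3)) for x in xs]

grid = [i / 50 for i in range(51)]
curves = []
for _ in range(100):  # one hundred bootstrap samples, as in the text
    sample = [rng.choice(data) for _ in data]
    fit = knn_smoother(sample, k=5)
    curves.append([fit(x) for x in grid])

# The bagged predictor is the pointwise mean of the individual fits.
bagged = [sum(c[j] for c in curves) / len(curves) for j in range(len(grid))]

mean_individual = sum(roughness(c) for c in curves) / len(curves)
print("roughness of bagged curve:  ", roughness(bagged))
print("mean roughness of the fits: ", mean_individual)
```

Because this roughness measure is a convex function of the curve, the bagged curve can never be rougher than the individual fits are on average, which is one way to see why the mean looks more stable than the grey lines.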

History

Bagging (bootstrap aggregating) was proposed by Leo Breiman in 1994 to improve classification by combining the classifications obtained from randomly generated training sets. See Breiman (1994), Technical Report No. 421.

Copyrights
Bootstrap aggregating from Wikipedia. ©2006 by Wikipedia. Licensed under the GNU Free Documentation License.
