Bagging and Bootstrap in Data Mining and Machine Learning

Bagging

Bootstrap Aggregation, better known as bagging, is a simple yet powerful ensemble method.

An ensemble method is a technique that combines the predictions of many machine learning models to produce predictions that are more reliable and accurate than those of any individual model. In other words, the aggregated prediction produced by bagging tends to be very strong.

Why do we use bagging?

The main purpose of using the bagging technique is to improve classification accuracy.

How does bagging work?
For example, suppose we have 1,000 observations and 200 variables. In bagging, we create several models, each trained on a random subset of the observations (and possibly a random subset of the variables). For instance, we might build 300 trees, each fitted to its own random sample of the data. We then average the results of all 300 trees (models) to obtain the final prediction.
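A minimal sketch of this procedure is shown below. It assumes NumPy and scikit-learn's DecisionTreeClassifier as the base learner; the helper names (fit_bagged_trees, bagging_predict) are illustrative, not part of any library.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_bagged_trees(X, y, n_trees=300, seed=0):
    """Fit n_trees decision trees, each on its own bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        # Draw n observation indices with replacement (a bootstrap sample).
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeClassifier().fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def bagging_predict(trees, X):
    """Combine the trees by majority vote (assumes integer class labels 0..K-1)."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape: (n_trees, n_samples)
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes
    )
```

For a regression problem, the majority vote would simply be replaced by the mean of the individual predictions.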

Model’s Derivation/Estimation (In General)
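In general terms (a standard textbook formulation, not reproduced from the original figures), bagging fits one base model per bootstrap sample and then averages them: with B bootstrap samples and a base model fitted to each, the bagged estimate at a point x is

```latex
% Bagged estimator: average the B models fitted on the bootstrap samples.
% For classification, the average is replaced by a majority vote over the \hat{f}_b(x).
\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B}\sum_{b=1}^{B} \hat{f}_b(x)
```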


Figure: Bagging in data mining

Bootstrap

  • Used for accuracy estimation
  • Sampling with replacement
  • Some observations may not be used at all, while others may be used more than once (see the sketch below)
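A small sketch of sampling with replacement, assuming NumPy; it draws one bootstrap sample of 1,000 observations and counts how many of the originals never appear (on average roughly a third are left out):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                                   # number of observations in the data set

# Bootstrap sample: draw n indices uniformly, with replacement.
sample = rng.integers(0, n, size=n)

counts = np.bincount(sample, minlength=n)  # how often each observation was drawn
print("observations never used:", int((counts == 0).sum()))   # typically ~368
print("used exactly once      :", int((counts == 1).sum()))
print("used more than once    :", int((counts > 1).sum()))
```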

Bootstrap – At an Abstract Level

Figure: Bootstrap in data mining

Bootstrapping – in Detail

Figure: Bootstrapping in detail

Figure: Advantages of bagging

Benefits of Bagging

  • Can also improve the accuracy of models that predict continuous labels (i.e., regression)
  • Ideal for parallel processing environments, since each model can be trained independently
  • Significantly greater accuracy than a single classifier
  • Reduces the variance of the individual model (illustrated in the sketch below)
  • Works best when the individual classifiers are diverse
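As a rough illustration of the accuracy and variance claims above, the sketch below compares a single decision tree with a bagged ensemble of trees. It assumes scikit-learn (BaggingClassifier's default base learner is a decision tree) and an arbitrary synthetic data set:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Arbitrary synthetic classification data, used only for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = BaggingClassifier(n_estimators=100, random_state=0)  # bags decision trees

# The bagged ensemble is usually both more accurate on average and more
# stable (smaller spread of scores across folds) than the single tree.
for name, model in [("single tree ", single_tree), ("bagged trees", bagged_trees)]:
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name}: mean accuracy = {scores.mean():.3f}, std = {scores.std():.3f}")
```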

Comparison of Boosting and Bagging: Differences and Similarities

Differences between boosting and bagging

  • Only boosting assigns weights to the data to tip the scales in favor of the most difficult cases.
  • Only boosting tries to reduce bias.
  • Boosting can increase the over-fitting problem, whereas bagging may help to solve it.
  • Boosting combines the learners with a weighted average, whereas bagging uses an equally weighted average (compared in the sketch after this list).
  • Boosting's focus is to add new models that do well where previous models fail.

Similarities between boosting and bagging

  • Both generate several training data sets by random sampling.
  • Both are good at reducing variance.
  • Both are good at providing higher stability.
  • Both make the final decision by taking a majority vote over (or averaging) the N learners.
  • Both are ensemble methods that obtain N learners from a single learner.
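As a hedged, minimal illustration of this comparison, scikit-learn ships both styles of ensemble (BaggingClassifier and AdaBoostClassifier); the sketch below trains them on the same arbitrary synthetic data so they can be compared side by side:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Arbitrary synthetic data, used only to make the comparison concrete.
X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Bagging: independent learners on bootstrap samples, equally weighted vote.
bagging = BaggingClassifier(n_estimators=100, random_state=1)

# Boosting: learners built sequentially, re-weighting the hardest cases,
# then combined with a weighted vote.
boosting = AdaBoostClassifier(n_estimators=100, random_state=1)

for name, model in [("bagging ", bagging), ("boosting", boosting)]:
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {accuracy_score(y_test, model.predict(X_test)):.3f}")
```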