Boosting in data mining

What is Boosting?

Boosting is an efficient ensemble algorithm that converts a collection of weak learners into a strong learner.

Example:

Suppose we want to check whether an email is a "spam email" or a "safe email".

In this case, we can come up with several simple rules, such as:

  • Rule 1: The email contains only links to some websites.
    • Decision: It is spam.
  • Rule 2: The email comes from an official, verified email address.
    • Decision: It is not spam.
  • Rule 3: The email requests private bank details, e.g., a bank account number or a father's/mother's name.
    • Decision: It is spam.

Now the question is: are the 3 rules discussed above enough to classify an email as "spam" or not?

  • Answer: These 3 rules are not enough. Each of these 3 rules is a weak learner on its own, so we need to boost them into a stronger learner.
  • Boosting does this by combining the weak learners and assigning a weight to each one, as the sketch below illustrates.
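
A minimal sketch in plain Python of such a weighted vote over the three rules above. The rule functions, features, and weights here are invented for illustration; in real boosting, the weights would be learned from each rule's training error (e.g., AdaBoost's alpha values).

```python
def rule_links_only(email):
    # Weak rule 1: flag emails whose body is mostly links.
    return 1 if email["link_ratio"] > 0.8 else -1  # 1 = spam, -1 = not spam

def rule_official_sender(email):
    # Weak rule 2: trust mail from a verified official address.
    return -1 if email["official_sender"] else 1

def rule_asks_bank_details(email):
    # Weak rule 3: flag requests for private bank details.
    return 1 if email["asks_bank_details"] else -1

# Hypothetical weights; boosting would learn these from training error.
weighted_rules = [(0.4, rule_links_only),
                  (0.9, rule_official_sender),
                  (1.2, rule_asks_bank_details)]

def strong_classifier(email):
    # Weighted vote: the sign of the weighted sum of weak decisions.
    score = sum(w * rule(email) for w, rule in weighted_rules)
    return "spam" if score > 0 else "not spam"

email = {"link_ratio": 0.9, "official_sender": False, "asks_bank_details": True}
print(strong_classifier(email))  # -> spam
```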

Boosting generally achieves greater accuracy than bagging.

How does the boosting algorithm work?

The boosting algorithm generates multiple weak learners and combines their predictions to form one strong learner. These weak learners are obtained by applying a base learning algorithm to different distributions (re-weightings or re-samplings) of the given sample of the data set.
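
A runnable sketch of this loop using scikit-learn's AdaBoostClassifier (assuming scikit-learn 1.2+, where the weak-learner argument is named estimator; older releases call it base_estimator). Each weak learner is a one-level decision tree ("stump"), and each round re-weights the samples that earlier rounds misclassified:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boosted = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # the weak learner
    n_estimators=50,                                # 50 boosting rounds
    random_state=0,
)
boosted.fit(X_train, y_train)  # misclassified samples gain weight each round
print("test accuracy:", boosted.score(X_test, y_test))
```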

Is Random Forest a boosting algorithm?

No, Random Forest is not a boosting algorithm. A random forest is an averaging (bagging) ensemble method: it reduces the variance of individual trees by growing many trees on random samples of the given data set and then averaging their predictions.
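
For contrast, a minimal random-forest sketch with scikit-learn (assumed installed): many trees are grown independently on bootstrap samples of the data, and the forest averages their votes to reduce variance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,  # trees are grown independently, not sequentially
    bootstrap=True,    # each tree sees a bootstrap sample of the data
    random_state=0,
)
forest.fit(X, y)
print("training accuracy:", forest.score(X, y))
```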

Is Boosting better than bagging?

Boosting tries to reduce bias, while bagging mainly reduces variance and provides higher stability. When over-fitting is the problem, bagging is the better remedy; boosting, on the other hand, can make over-fitting worse.
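
A quick side-by-side sketch under the same scikit-learn assumption (1.2+ for the estimator argument): bagging and boosting combine the same weak tree on the same data, so their held-out accuracies can be compared directly; the exact numbers depend on the data set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

weak = DecisionTreeClassifier(max_depth=1)
bagged = BaggingClassifier(estimator=weak, n_estimators=50, random_state=0)
boosted = AdaBoostClassifier(estimator=weak, n_estimators=50, random_state=0)

for name, model in (("bagging", bagged), ("boosting", boosted)):
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```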

Why does boosting not overfit?

Boosting often resists over-fitting, and some of the reasons are mentioned below:

  • As iterations move forward, the impact of each change is localized.
  • Parameters are not jointly optimized; estimation proceeds one stage at a time.
  • This stage-wise estimation slows down the learning process of the algorithm, which acts as a form of regularization (illustrated in the sketch below).
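
One way to see this stage-wise behavior, assuming scikit-learn: staged_predict replays the fitted model after every boosting round, so we can watch held-out accuracy as rounds are added, with a small learning_rate keeping each step deliberately slow.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.05,  # small steps: slow, stage-wise learning
    random_state=0,
).fit(X_train, y_train)

# Accuracy of the same fitted model after 1, 50, 100, 150, 200 rounds.
staged = list(model.staged_predict(X_test))
for i in (0, 49, 99, 149, 199):
    print(f"round {i + 1:3d}: test accuracy = "
          f"{accuracy_score(y_test, staged[i]):.3f}")
```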

Is gradient boosting supervised or unsupervised?

Gradient boosting is a supervised machine learning technique, and it is used for both classification and regression.
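
A minimal supervised sketch with scikit-learn's GradientBoostingRegressor (assumed installed): the model needs labeled pairs (X, y) at training time, which is exactly what makes it supervised.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = GradientBoostingRegressor(random_state=0)
reg.fit(X_train, y_train)  # the labels y_train are required for training
print("R^2 on held-out data:", reg.score(X_test, y_test))
```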

Does boosting use bootstrap?

Yes, boosting can be combined with bootstrap-style sampling; for example, gradient boosting can fit each round on a random subsample of the training data.
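
For example, in scikit-learn's GradientBoostingClassifier, setting subsample below 1.0 fits each round on a random fraction of the training data (sampling without replacement rather than a true bootstrap, but in the same spirit); this variant is known as stochastic gradient boosting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, random_state=0)

sgb = GradientBoostingClassifier(
    n_estimators=100,
    subsample=0.8,  # each round's tree sees a random 80% of the data
    random_state=0,
).fit(X, y)
print("training accuracy:", sgb.score(X, y))
```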

Why does boosting reduce bias?

Each new weak learner in boosting is fitted to the mistakes of the current ensemble, so the training error keeps being pushed down; this is why boosting helps to reduce bias.
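
A small sketch of this, assuming scikit-learn: AdaBoost's staged_score reports training accuracy after each round, which typically climbs as later learners correct earlier mistakes, i.e., the bias on the training data shrinks.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)
scores = list(ada.staged_score(X, y))  # training accuracy after each round
for rounds in (1, 10, 50, 100):
    print(f"after {rounds:3d} rounds: training accuracy = {scores[rounds - 1]:.3f}")
```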

Types of boosting algorithms:

Three main types of boosting algorithms, sketched side by side below the list, are as follows:

  1. XGBoost algorithm
  2. AdaBoost algorithm
  3. Gradient tree boosting algorithm
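
A minimal sketch of the three, assuming both scikit-learn and the separate xgboost package are installed:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, random_state=0)

models = {
    "XGBoost": XGBClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Gradient tree boosting": GradientBoostingClassifier(n_estimators=50,
                                                         random_state=0),
}
for name, model in models.items():
    model.fit(X, y)
    print(f"{name}: training accuracy = {model.score(X, y):.3f}")
```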
