Overfitting of Decision Trees and Tree Pruning: How to Avoid Overfitting in Data Mining

Prof. Fazal Rehman Shamil
Last modified on November 10th, 2019

Overfitting of tree

Before discussing overfitting of the tree, let's revise training data and test data:

Training Data:

Training data is the data used to build (train) the model; the model learns its patterns from this data.

Test Data:

Test data is held back from training and is used to assess how well the trained model predicts on unseen examples.


Overfitting means the tree has grown too many unnecessary branches: instead of capturing the underlying pattern, it also models the outliers and noise in the training data, so it predicts training data very well but test data poorly.
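The effect can be seen in a short sketch (assuming scikit-learn is installed; the dataset is synthetic and made up for illustration): a tree grown without any size limit memorizes the noisy training labels, so its training accuracy is perfect while its test accuracy is clearly lower.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: the label depends only on the first feature,
# but 15% of the labels are flipped to simulate noise/outliers.
rng = np.random.RandomState(0)
X = rng.rand(300, 5)
y = (X[:, 0] > 0.5).astype(int)
flip = rng.rand(300) < 0.15
y[flip] = 1 - y[flip]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# No size limit: the tree keeps splitting until every training record fits,
# including the noisy ones.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("training accuracy:", full_tree.score(X_train, y_train))  # memorized
print("test accuracy:", full_tree.score(X_test, y_test))        # clearly lower
```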


How to avoid overfitting?

There are two techniques to avoid overfitting;

  1. Pre-pruning
  2. Post-pruning


1. Pre-Pruning:

Pre-Pruning means stopping the growth of the tree early, before it is fully grown, for example by limiting its depth or the minimum number of records per leaf.
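A minimal pre-pruning sketch (assuming scikit-learn; the limits 3 and 10 below are illustrative values, not recommendations): the size limits are set before training, so the tree simply stops growing once it reaches them.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy data: label depends on the first feature, 15% labels flipped.
rng = np.random.RandomState(0)
X = rng.rand(300, 5)
y = (X[:, 0] > 0.5).astype(int)
flip = rng.rand(300) < 0.15
y[flip] = 1 - y[flip]

# Pre-pruning: stop growing early via max_depth and min_samples_leaf,
# so the tree cannot add branches for individual noisy records.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=10,
                                    random_state=0).fit(X, y)

print("depth:", pre_pruned.get_depth())      # never exceeds 3
print("leaves:", pre_pruned.get_n_leaves())
```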

2. Post-Pruning:

Post-Pruning means allowing the tree to grow with no size limit, and then, after the tree is complete, pruning its unnecessary branches back.
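One concrete post-pruning method is cost-complexity pruning; a sketch (assuming scikit-learn, where it is exposed through the `ccp_alpha` parameter): grow the full tree first, then cut branches back, ending up with far fewer nodes.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy data: label depends on the first feature, 15% labels flipped.
rng = np.random.RandomState(0)
X = rng.rand(300, 5)
y = (X[:, 0] > 0.5).astype(int)
flip = rng.rand(300) < 0.15
y[flip] = 1 - y[flip]

# 1. Grow the tree with no size limit.
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# 2. Compute the cost-complexity pruning path, then refit with a
#    moderate alpha: a larger alpha prunes more branches away.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
post_pruned = DecisionTreeClassifier(ccp_alpha=alpha,
                                     random_state=0).fit(X, y)

print("nodes before pruning:", full_tree.tree_.node_count)
print("nodes after pruning:", post_pruned.tree_.node_count)
```

In practice the value of alpha is chosen by cross-validation rather than picked from the middle of the path as done here.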

Advantages of pre-pruning and post-pruning:

  • Pruning prevents the tree from growing unnecessarily large.
  • Pruning reduces the complexity of the tree.

