Overfitting of a Decision Tree and Tree Pruning: How to Avoid Overfitting in Data Mining

Overfitting of the tree

Before discussing overfitting of the tree, let's revise training data and test data:

Training Data:

Training data is the data used to build (train) the model that will later make predictions.

Test Data:

Test data is data held back from training and used to assess the predictive power of the trained model on unseen records.
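
To make the split concrete, here is a minimal sketch in Python using scikit-learn (the library choice and the sample dataset are assumptions for illustration; the tutorial itself names no tools):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load a labelled sample dataset: features X, class labels y.
X, y = load_breast_cancer(return_X_y=True)

# Hold back 30% of the records as test data; the rest is training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(len(X_train), "training records,", len(X_test), "test records")
```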

Overfitting:

Overfitting means the tree has grown too many unnecessary branches. These branches often model anomalies caused by noise and outliers in the training data, so the tree fits the training data very well but predicts poorly on unseen data.
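
A minimal sketch of what this looks like in practice, again assuming Python with scikit-learn: a tree grown with no limits scores almost perfectly on its own training data but noticeably worse on the held-out test data, which is the signature of overfitting:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Grow the tree with no size limit, so it can memorize noise and outliers.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("Training accuracy:", full_tree.score(X_train, y_train))  # typically 1.0
print("Test accuracy:    ", full_tree.score(X_test, y_test))    # noticeably lower
```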


How to avoid overfitting?

There are two techniques to avoid overfitting:

  1. Pre-pruning
  2. Post-pruning

1. Pre-Pruning:

Pre-pruning means stopping the growth of the tree early, before the tree is fully grown, for example by refusing to split a node once a depth or sample-size limit is reached.
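
A minimal sketch of pre-pruning, assuming scikit-learn, where the stopping rules are passed as parameters (the specific limits below are arbitrary choices for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Pre-pruning: stop growing before the tree is fully grown.
pre_pruned = DecisionTreeClassifier(
    max_depth=4,          # never split deeper than 4 levels
    min_samples_leaf=10,  # every leaf must cover at least 10 training records
    random_state=0,
).fit(X_train, y_train)

print("Depth:", pre_pruned.get_depth())
print("Test accuracy:", pre_pruned.score(X_test, y_test))
```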

2. Post-Pruning:

Post-pruning means allowing the tree to grow with no size limit; after the tree is complete, we start pruning it back, removing branches that contribute little to predictive accuracy.
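
A minimal sketch of post-pruning, assuming scikit-learn, which offers it as minimal cost-complexity pruning through the ccp_alpha parameter (picking a mid-range alpha below is only for illustration; in practice it is tuned on validation data):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# First grow the tree with no size limit ...
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# ... then compute candidate pruning strengths and refit with a
# non-zero alpha, which cuts back branches that add little accuracy.
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
pruned_tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)

print("Leaves before pruning:", full_tree.get_n_leaves())
print("Leaves after pruning: ", pruned_tree.get_n_leaves())
print("Test accuracy (pruned):", pruned_tree.score(X_test, y_test))
```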

Advantages of pre-pruning and post-pruning:

  • Pruning prevents the tree from growing unnecessarily large.
  • Pruning reduces the complexity of the tree, which helps it generalize better to unseen data.

Next Similar Tutorials

  1. Decision tree induction on categorical attributes
  2. Decision Tree Induction and Entropy in data mining
  3. Overfitting of decision tree and tree pruning
  4. Attribute selection Measures
  5. Computing Information-Gain for Continuous-Valued Attributes in data mining
  6. Gini index for binary variables
  7. Bagging and Bootstrap in Data Mining, Machine Learning
  8. Evaluation of a classifier by confusion matrix in data mining
  9. Holdout method for evaluating a classifier in data mining
  10. RainForest Algorithm / Framework
  11. Boosting in data mining
  12. Naive Bayes Classifier