Overfitting of a Decision Tree
Before discussing overfitting of a decision tree, let's revise the concepts of training data and test data.
Training Data:
Training data is the data used to train (build) the model; the decision tree learns its splits from this data.
Test Data:
Test data is held out from training and is used to assess the predictive power of the trained model on unseen examples.
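For instance, here is a minimal sketch of splitting a dataset into training and test sets, assuming scikit-learn in Python; the synthetic dataset and the 70/30 split ratio are illustrative choices, not part of this tutorial.

```python
# A minimal train/test split sketch (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data purely for illustration: 1000 rows, 20 features.
X, y = make_classification(n_samples=1000, random_state=0)

# 70% of the rows train the model; the held-out 30% test it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
print(X_train.shape, X_test.shape)  # (700, 20) (300, 20)
```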
Overfitting:
Overfitting means the tree has grown too many unnecessary branches. These extra branches model anomalies in the training data, such as outliers and noise, so the tree fits the training data very closely but predicts poorly on new data.
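The sketch below illustrates this, again assuming scikit-learn (flip_y injects label noise and all parameter values are illustrative): an unrestricted tree typically scores near 100% on the training data it memorized but noticeably lower on the held-out test data, which is the signature of overfitting.

```python
# Sketch: an unrestricted tree overfits noisy training data
# (assumes scikit-learn; parameter values are illustrative).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 randomly flips 20% of the labels, simulating noise.
X, y = make_classification(n_samples=1000, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With no size limit, the tree keeps splitting until it classifies
# every training row, memorizing the injected noise along the way.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("training accuracy:", tree.score(X_train, y_train))  # close to 1.0
print("test accuracy:    ", tree.score(X_test, y_test))    # noticeably lower
```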
How to avoid overfitting?
There are two techniques to avoid overfitting:
- Pre-pruning
- Post-pruning
1. Pre-Pruning:
Pre-pruning means stopping the growth of the tree before it is fully grown, for example by limiting the tree's depth or the minimum number of records required to split a node.
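A minimal pre-pruning sketch, assuming scikit-learn: stopping criteria such as max_depth and min_samples_leaf are fixed before training, so the tree is never allowed to grow past them (the specific values here are illustrative, not tuned).

```python
# Pre-pruning sketch: stopping criteria are set before training
# (assumes scikit-learn; values are illustrative, not tuned).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pre_pruned = DecisionTreeClassifier(
    max_depth=4,          # stop growing below depth 4
    min_samples_leaf=20,  # refuse splits that create tiny leaves
    random_state=0,
).fit(X_train, y_train)

print("test accuracy:", pre_pruned.score(X_test, y_test))
```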
2. Post-Pruning:
Post-pruning means allowing the tree to grow with no size limit and then, after the tree is complete, pruning its weakest branches back.
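A minimal post-pruning sketch, again assuming scikit-learn: the tree is first grown with no size limit, then scikit-learn's cost-complexity pruning path supplies candidate ccp_alpha values, each of which collapses the weakest branches. The alpha picked below is an arbitrary midpoint; in practice it would be chosen by cross-validation.

```python
# Post-pruning sketch: grow the tree fully, then prune it back
# (assumes scikit-learn; the chosen alpha is illustrative).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow the tree with no size limit first.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Compute the cost-complexity pruning path; larger ccp_alpha values
# prune away more of the tree's weakest branches.
path = full.cost_complexity_pruning_path(X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # arbitrary midpoint

pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
pruned.fit(X_train, y_train)

print("leaves before pruning:", full.get_n_leaves())
print("leaves after pruning: ", pruned.get_n_leaves())
print("pruned test accuracy: ", pruned.score(X_test, y_test))
```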
Advantages of pre-pruning and post-pruning:
- Pruning prevents the tree from growing unnecessarily large.
- Pruning reduces the complexity of the tree.
Next Similar Tutorials
- Decision tree induction on categorical attributes
- Decision Tree Induction and Entropy in data mining
- Overfitting of decision tree and tree pruning
- Attribute selection Measures
- Computing Information-Gain for Continuous-Valued Attributes in data mining
- Gini index for binary variables
- Bagging and Bootstrap in Data Mining, Machine Learning
- Evaluation of a classifier by confusion matrix in data mining
- Holdout method for evaluating a classifier in data mining
- RainForest Algorithm / Framework
- Boosting in data mining
- Naive Bayes Classifier