All data is randomly divided into equal-sized data sets, e.g.:
- Training set
- Test set
- Validation set
- The training set is the data the model learns from so that it can make predictions.
- The test set is a subset of the data that the model has not seen, used to assess the performance of the model.
- The validation set is also used to assess performance, but it is applied to the model while it is being built during training.
There are three data sets in total.
The total training set is used for model construction.
The total test set is used for accuracy estimation.
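
The random division described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name three_way_split, the seed parameter, and the toy data of 30 integers are hypothetical and not part of the original notes, and the split into equal thirds follows the "equal size" wording above (in practice the training set is often made larger than the other two).

```python
import random

def three_way_split(records, seed=0):
    """Shuffle the records and cut them into three equal-sized parts.

    Hypothetical helper for illustration; any remainder from the
    integer division ends up in the test set.
    """
    shuffled = list(records)               # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for repeatability
    third = len(shuffled) // 3
    train = shuffled[:third]               # used to build the model
    validation = shuffled[third:2 * third] # used to check the model during training
    test = shuffled[2 * third:]            # unseen data for the final assessment
    return train, validation, test

data = list(range(30))  # toy data set (assumption)
train, validation, test = three_way_split(data, seed=42)
print(len(train), len(validation), len(test))  # prints: 10 10 10
```

Because the shuffle is seeded, the same seed always produces the same three sets, which makes an accuracy estimate on the test set reproducible.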