Major tasks of data pre-processing
Data Cleaning
Data cleaning removes or corrects problems in the data, such as missing values, duplicate records, and inconsistent entries, so that the data can be integrated easily.
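As a minimal sketch of what a cleaning step can look like, the example below drops exact duplicate rows and fills missing numeric values with the mean of the values that are present. The record layout, the field names, and the `clean_records` helper are assumptions made for illustration only; real cleaning rules depend on the dataset.

```python
from statistics import mean

def clean_records(records, numeric_field):
    """Drop exact duplicate rows and fill missing numeric values with the mean."""
    # Remove exact duplicates while keeping the original row order.
    seen = set()
    unique = []
    for row in records:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the input is not modified

    # Mean of the values that are present (None marks a missing value here).
    present = [r[numeric_field] for r in unique if r[numeric_field] is not None]
    fill = mean(present) if present else None

    for r in unique:
        if r[numeric_field] is None:
            r[numeric_field] = fill
    return unique

# Hypothetical sample data.
rows = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},   # missing value
    {"id": 1, "age": 30},     # duplicate of the first row
    {"id": 3, "age": 40},
]
cleaned = clean_records(rows, "age")
```

Mean imputation is only one possible choice; dropping incomplete rows or using the median are equally common.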
Data Integration
Data integration combines data from multiple sources into a single, consistent dataset.
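One simple form of integration is joining two sources on a shared key. The sketch below assumes two hypothetical lists of records, `customers` and `orders`, that share a `customer_id` field; the `integrate` helper is illustrative and performs an inner join, so rows without a match are left out.

```python
def integrate(customers, orders, key="customer_id"):
    """Inner-join two lists of dicts on a shared key."""
    # Index the first source by key for fast lookup.
    by_key = {c[key]: c for c in customers}
    merged = []
    for o in orders:
        match = by_key.get(o[key])
        if match is not None:
            # Combine the fields of both records into one row.
            merged.append({**match, **o})
    return merged

# Hypothetical sample data from two sources.
customers = [{"customer_id": 1, "name": "Ann"}, {"customer_id": 2, "name": "Bob"}]
orders = [{"customer_id": 1, "total": 25.0}, {"customer_id": 3, "total": 10.0}]
combined = integrate(customers, orders)
```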
Data Reduction
Data reduction shrinks a large dataset into a smaller one in such a way that the data can be transformed more easily.
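Two common ways to reduce data are keeping fewer attributes and keeping fewer rows. The sketch below does both, assuming a hypothetical list of records; the `reduce_data` helper, the field names, and the sample size are illustrative choices, not a prescribed method.

```python
import random

def reduce_data(records, keep_fields, sample_size, seed=0):
    """Keep only the selected attributes and a random sample of rows."""
    rng = random.Random(seed)            # fixed seed for a repeatable sample
    n = min(sample_size, len(records))   # never ask for more rows than exist
    sampled = rng.sample(records, n)     # sampling without replacement
    return [{f: r[f] for f in keep_fields} for r in sampled]

# Hypothetical dataset with 1000 rows and three attributes.
data = [{"id": i, "value": i * 2, "note": "x"} for i in range(1000)]
small = reduce_data(data, keep_fields=["id", "value"], sample_size=50)
```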
Data Transformation
Data transformation converts the data into a consistent, reliable form that is suitable for mining.
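As one example of a transformation, a categorical attribute can be turned into numeric 0/1 columns (one-hot encoding). This is only one of many possible transformations; the `one_hot` helper and the `color` field below are assumptions made for illustration.

```python
def one_hot(records, field):
    """Replace a categorical field with one 0/1 column per category."""
    categories = sorted({r[field] for r in records})
    encoded = []
    for r in records:
        # Copy every attribute except the one being encoded.
        row = {k: v for k, v in r.items() if k != field}
        for c in categories:
            row[f"{field}_{c}"] = 1 if r[field] == c else 0
        encoded.append(row)
    return encoded

# Hypothetical sample data.
rows = [{"id": 1, "color": "red"}, {"id": 2, "color": "blue"}]
encoded = one_hot(rows, "color")
```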
Data Discretization
Data discretization converts a large number of data values into a smaller number of intervals or categories, so that data evaluation and data management become much easier.
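A simple way to discretize a numeric attribute is equal-width binning, where the value range is split into intervals of the same size. The sketch below assumes a hypothetical list of ages and an illustrative `equal_width_bins` helper; other schemes, such as equal-frequency binning, are also common.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so they share a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical sample data.
ages = [18, 22, 35, 47, 51, 64, 80]
bins = equal_width_bins(ages, 3)
```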
After the completion of these tasks, the data is ready for mining.
Optimized data pre-processing for discrimination prevention
Non-discrimination is a recognized objective in algorithmic decision making.
Discrimination can be reduced at the pre-processing stage. One approach uses convex optimization to learn a data transformation. Some of the important goals of this optimization are:
Controlling discrimination
Limiting distortion in individual data samples
Preserving utility
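These goals can be read together as one constrained optimization problem. The formulation below is only a schematic sketch: the symbols ($T$ for the learned, possibly randomized, transformation, $U$ for a utility loss, $J$ for a discrimination measure, $\delta$ for a per-sample distortion measure, and the thresholds $\epsilon$ and $c$) are notation assumed here for illustration and do not come from the text above.

```latex
\begin{aligned}
\min_{T} \quad & U(T)
  && \text{(preserve utility: keep the transformed data close to the original)} \\
\text{s.t.} \quad & J(T) \le \epsilon
  && \text{(control discrimination)} \\
& \mathbb{E}\big[\delta\big(x, T(x)\big)\big] \le c \quad \text{for each sample } x
  && \text{(limit distortion in individual samples)}
\end{aligned}
```

If the objective and the constraints are convex in $T$, the problem can be handled with standard convex solvers.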
FAQ
Is raw data data that has not had any type of processing or pre-processing done on it?
Answer: Yes. Raw data is data that has not yet been processed. Data pre-processing is the technique used to transform raw data into a more useful and efficient format before mining it.
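Why do you rescale data in pre-processing?
Answer: The data may contain attributes with a mixture of scales for various quantities, such as pageviews, CPC, and RPM. Rescaling puts these attributes on a comparable scale, so that an attribute with large values does not dominate the others simply because of its units.
Normalization and standardization are the two most widely used data scaling methods.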
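The two scaling methods can be sketched as follows. Normalization is shown here as min-max scaling to the range [0, 1], and standardization as scaling to zero mean and unit standard deviation; the helper names and the sample `pageviews` values are assumptions made for illustration.

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no scale information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical sample data.
pageviews = [100, 200, 300, 400, 500]
normalized = min_max_normalize(pageviews)
standardized = standardize(pageviews)
```

Min-max scaling is sensitive to outliers because it depends on the smallest and largest values, while standardization does not bound the result to a fixed range.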