Data discretization converts a large number of data values into a smaller number of intervals or labels, so that data evaluation and data management become easier.
Data discretization example
Suppose we have an attribute, age, with the following values.
|Age||10, 11, 13, 14, 17, 19, 30, 31, 32, 38, 40, 42, 70, 72, 73, 75|
Table: Before discretization
|10, 11, 13, 14, 17, 19||30, 31, 32, 38, 40, 42||70, 72, 73, 75|
Table: After discretization
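The grouping in the tables above can be sketched in a few lines of Python. This is a minimal sketch, not the author's method: the cut points (20 and 60) and the group labels are assumptions chosen only so that the output matches the three groups shown.

```python
from bisect import bisect_right

ages = [10, 11, 13, 14, 17, 19, 30, 31, 32, 38, 40, 42, 70, 72, 73, 75]

# Assumed cut points (not from the original text): values below 20 fall in
# the first group, values from 20 to 59 in the second, 60 and above in the third.
cut_points = [20, 60]
labels = ["group 1", "group 2", "group 3"]  # hypothetical labels


def discretize(value, cut_points, labels):
    """Return the label of the interval that contains value."""
    return labels[bisect_right(cut_points, value)]


groups = {label: [] for label in labels}
for age in ages:
    groups[discretize(age, cut_points, labels)].append(age)

for label, members in groups.items():
    print(label, members)
```

Sixteen distinct values are reduced to three labels, which is the whole point of discretization.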
Another example is website visitor data. As seen in the figure below, the data is discretized by country.
The IP addresses of all visitors from a given country are grouped under that country.
For example, all visitors who reach the website from IP addresses in the United States are shown under the United States label.
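A rough sketch of this idea in Python follows. The prefix-to-country table is entirely hypothetical: the prefixes are reserved documentation ranges, not real allocations, and a real site would rely on a geolocation database instead. The names `PREFIX_TO_COUNTRY` and `country_of` are illustrative only.

```python
import ipaddress
from collections import Counter

# Hypothetical lookup table (assumption): documentation-only prefixes
# mapped to made-up country assignments for illustration.
PREFIX_TO_COUNTRY = {
    ipaddress.ip_network("192.0.2.0/24"): "United States",
    ipaddress.ip_network("198.51.100.0/24"): "Pakistan",
    ipaddress.ip_network("203.0.113.0/24"): "Japan",
}


def country_of(ip_text):
    """Map one IP address to a country label, or 'Unknown' if no prefix matches."""
    address = ipaddress.ip_address(ip_text)
    for network, country in PREFIX_TO_COUNTRY.items():
        if address in network:
            return country
    return "Unknown"


visitor_ips = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "203.0.113.9", "10.0.0.1"]

# Many distinct IP addresses collapse into a handful of country labels.
visits_per_country = Counter(country_of(ip) for ip in visitor_ips)
print(visits_per_country)
```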
What are some famous techniques of data discretization?
- Histogram analysis: A histogram is a plot that presents the underlying frequency distribution of a set of continuous data. It helps in inspecting how the data is distributed, for example whether it looks normally distributed, is skewed, or contains outliers.
- Binning: Binning is a data smoothing technique that groups a large number of continuous values into a smaller number of bins. For example, if we have data about a group of students, we can arrange their marks into a smaller number of intervals by making bins of grades: one bin for grade A, one for grade B, one for C, one for D, and one for F.
- Correlation analysis
- Clustering analysis: Cluster analysis is commonly known as clustering. Clustering is the task of grouping similar objects into one group, commonly called a cluster. Dissimilar objects are placed in different clusters.
- Decision tree analysis
- Equal width partitioning
- Equal depth partitioning
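The last two techniques in the list can be illustrated on the age values from the earlier example. This is a minimal sketch under stated assumptions: the function names and the choice of three and four bins are illustrative, and the largest value is placed in the last equal-width bin by convention.

```python
def equal_width_bins(values, k):
    """Split the value range into k intervals of equal width."""
    low, high = min(values), max(values)
    width = (high - low) / k
    bins = [[] for _ in range(k)]
    for v in values:
        if width == 0:
            index = 0  # all values identical: everything goes in the first bin
        else:
            # The maximum value would land at index k, so clamp it to the last bin.
            index = min(int((v - low) / width), k - 1)
        bins[index].append(v)
    return bins


def equal_depth_bins(values, k):
    """Split the sorted values into k bins holding (nearly) the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]


ages = [10, 11, 13, 14, 17, 19, 30, 31, 32, 38, 40, 42, 70, 72, 73, 75]

print(equal_width_bins(ages, 3))
print(equal_depth_bins(ages, 4))
```

Equal width partitioning fixes the size of each interval, so bins can hold very different numbers of values. Equal depth partitioning fixes the number of values per bin, so the intervals can have very different widths.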
Data discretization and concept hierarchy generation
A concept hierarchy represents a sequence of mappings from a set of more general concepts to more specialized concepts, or, in the other direction, from low-level concepts to higher-level concepts. In other words, the mapping can be read top down or bottom up.
Let’s see an example of a concept hierarchy for the dimension location.
Each city can be mapped to the country to which it belongs, and each country to its continent. For example, Mianwali can be mapped to Pakistan, and Pakistan can be mapped to Asia.
Top down mapping
Top down mapping starts at the top with general concepts and moves down to the specialized concepts.
Bottom up mapping
Bottom up mapping starts at the bottom with specialized concepts and moves up to the generalized concepts.
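Both directions can be sketched with a simple child-to-parent table for the location dimension. This is an illustrative sketch: only the Mianwali, Pakistan, Asia chain comes from the example above. The other entries and the function names `roll_up` and `drill_down` are assumptions added for demonstration.

```python
# Child -> parent links of a small location hierarchy (city -> country -> continent).
# Entries other than Mianwali/Pakistan/Asia are added only for illustration.
PARENT = {
    "Mianwali": "Pakistan",
    "Lahore": "Pakistan",
    "Tokyo": "Japan",
    "Pakistan": "Asia",
    "Japan": "Asia",
}


def roll_up(concept):
    """Bottom up: follow parent links from a specialized concept to the most general one."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def drill_down(concept):
    """Top down: list the direct, more specialized children of a concept."""
    return sorted(child for child, parent in PARENT.items() if parent == concept)


print(roll_up("Mianwali"))     # ['Mianwali', 'Pakistan', 'Asia']
print(drill_down("Asia"))      # ['Japan', 'Pakistan']
print(drill_down("Pakistan"))  # ['Lahore', 'Mianwali']
```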
Data discretization and binarization in data mining
What is the difference between discretization and binarization in data science?
Data discretization in data mining is the process used to transform continuous attributes into discrete attributes, such as intervals or categories.
Data binarization in data mining is used to transform both discrete and continuous attributes into binary attributes.
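The contrast can be shown on small inputs. The sketch below is hedged: the cut points, the threshold of 40, and the function names are assumptions for illustration. Thresholding and one-hot encoding are two common ways to binarize, not the only ones.

```python
def discretize(value, cut_points, labels):
    """Discretization: continuous value -> one of several interval labels."""
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]


def binarize_threshold(value, threshold):
    """Binarization of a continuous attribute: 1 if value >= threshold, else 0."""
    return 1 if value >= threshold else 0


def one_hot(value, categories):
    """Binarization of a discrete attribute: one binary attribute per category."""
    return [1 if value == category else 0 for category in categories]


# Assumed cut points and labels, for illustration only.
print(discretize(38, [20, 60], ["low", "middle", "high"]))  # middle
print(binarize_threshold(38, 40))                           # 0
print(one_hot("B", ["A", "B", "C", "D", "F"]))              # [0, 1, 0, 0, 0]
```

Discretization can produce any number of categories, while binarization always ends with attributes that take only the values 0 and 1.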