Data discretization in data mining

By: Prof. Fazal Rehman Shamil
Last modified on March 30th, 2021

Data discretization converts a large number of data values into a smaller number of values, so that data evaluation and data management become much easier.

Data discretization example

Suppose we have an attribute, age, with the following values.

Age: 10, 11, 13, 14, 17, 19, 30, 31, 32, 38, 40, 42, 70, 72, 73, 75

Table: Age values before and after discretization

Attribute               Age                      Age                      Age
Before discretization   10, 11, 13, 14, 17, 19   30, 31, 32, 38, 40, 42   70, 72, 73, 75
After discretization    Young                    Mature                   Old
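
The age example above can be sketched in a few lines of Python. This is only an illustration: the cut points 25 and 60 are assumptions, since the table shows which ages fall into each group but not where the boundaries lie.

```python
from bisect import bisect_right

ages = [10, 11, 13, 14, 17, 19, 30, 31, 32, 38, 40, 42, 70, 72, 73, 75]

# Hypothetical cut points (not given in the article): ages below 25 are
# "Young", ages from 25 up to 59 are "Mature", ages 60 and above are "Old".
CUT_POINTS = [25, 60]
LABELS = ["Young", "Mature", "Old"]

def discretize_age(age):
    """Map a numeric age to one of the three labels."""
    return LABELS[bisect_right(CUT_POINTS, age)]

# Group the raw values under their discretized label.
groups = {}
for age in ages:
    groups.setdefault(discretize_age(age), []).append(age)

for label in LABELS:
    print(label, groups[label])
```

With these assumed cut points, the sixteen raw values collapse into the three labels shown in the table.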

Another example is website visitor data. As seen in the figure below, the visitor data is discretized by country.

[Figure: data discretization techniques]

All IP addresses that belong to a specific country are grouped under that country.

For example, all visitors who reach the website from IP addresses in the United States are shown under the United States country label.
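
A rough sketch of this grouping is shown below. The lookup table is entirely hypothetical: it uses address ranges reserved for documentation and assigns them to countries only for illustration, whereas a real site would rely on a geolocation database.

```python
import ipaddress
from collections import Counter

# Hypothetical lookup table. These ranges are reserved for documentation
# and do not really belong to any country.
COUNTRY_RANGES = {
    "United States": ipaddress.ip_network("192.0.2.0/24"),
    "Pakistan": ipaddress.ip_network("198.51.100.0/24"),
}

def country_of(ip):
    """Return the country label for an IP address, or "Unknown"."""
    address = ipaddress.ip_address(ip)
    for country, network in COUNTRY_RANGES.items():
        if address in network:
            return country
    return "Unknown"

# Made-up visitor log used only to exercise the function.
visitor_ips = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "203.0.113.9"]

# Many distinct IP addresses are reduced to a handful of country labels.
visits_by_country = Counter(country_of(ip) for ip in visitor_ips)
print(visits_by_country)
```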

What are some well-known techniques of data discretization?

  1. Histogram analysis: A histogram is a plot that presents the underlying frequency distribution of a set of continuous data. It helps in inspecting how the data is distributed, for example whether it looks normally distributed, whether it is skewed, and whether it contains outliers.
  2. Binning: Binning is a data smoothing technique that groups a large number of continuous values into a smaller number of bins. For example, given the marks of a group of students, we can arrange the marks into a small number of intervals by making one bin per grade: one bin for grade A, one for B, one for C, one for D, and one for F.
  3. Correlation analysis
  4. Clustering analysis: Cluster analysis is commonly known as clustering. Clustering is the task of grouping similar objects into one group, commonly called a cluster. Dissimilar objects are placed in different clusters.
  5. Decision tree analysis
  6. Equal width partitioning
  7. Equal depth partitioning
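
To make the last two techniques concrete, here is a minimal sketch of equal width and equal depth partitioning applied to the age values from the earlier example. The function names and the choice of 3 and 4 bins are assumptions made for illustration, and the equal width function assumes the values are not all identical.

```python
ages = [10, 11, 13, 14, 17, 19, 30, 31, 32, 38, 40, 42, 70, 72, 73, 75]

def equal_width_bins(values, k):
    """Split the value range into k intervals of the same width."""
    low, high = min(values), max(values)
    width = (high - low) / k  # assumes high > low
    bins = [[] for _ in range(k)]
    for value in sorted(values):
        # The maximum value would land in bin k, so clamp it to the last bin.
        index = min(int((value - low) / width), k - 1)
        bins[index].append(value)
    return bins

def equal_depth_bins(values, k):
    """Split the sorted values into k bins holding (nearly) the same count."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]

width_bins = equal_width_bins(ages, 3)
depth_bins = equal_depth_bins(ages, 4)

print(width_bins)
print(depth_bins)
# Counting the values per equal width bin gives the bar heights of a histogram.
print([len(b) for b in width_bins])
```

Equal width bins can end up with very different counts when the data is unevenly spread, while equal depth bins keep the counts balanced at the cost of intervals of different widths.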

Data discretization and concept hierarchy generation

A concept hierarchy defines a sequence of mappings between a set of general, higher-level concepts and a set of specialized, lower-level concepts. The mappings can be followed in either direction: from general concepts down to specialized ones (top-down mapping), or from low-level concepts up to higher-level ones (bottom-up mapping).

Let’s see an example of a concept hierarchy for the dimension location.

Each city can be mapped to the country to which it belongs, and each country to its continent. For example, Mianwali can be mapped to Pakistan, and Pakistan can be mapped to Asia.

Top-down mapping

Top-down mapping starts at the top with general concepts and moves down to the specialized concepts.

Bottom-up mapping

Bottom-up mapping starts at the bottom with specialized concepts and moves up to the generalized concepts.
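
Both directions can be sketched with a small lookup table for the location dimension. The Mianwali, Pakistan, and Asia entries come from the example above; the Lahore entry and the function names are additions made only for illustration.

```python
# Each concept points to its more general parent concept.
# "Lahore" is an extra entry added so that the tree has more than one branch.
PARENT = {
    "Mianwali": "Pakistan",
    "Lahore": "Pakistan",
    "Pakistan": "Asia",
}

def bottom_up(concept):
    """Walk from a specialized concept up to the most general one."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def top_down(concept):
    """Expand a general concept into its more specialized concepts."""
    children = [child for child, parent in PARENT.items() if parent == concept]
    return {child: top_down(child) for child in children}

print(bottom_up("Mianwali"))  # ['Mianwali', 'Pakistan', 'Asia']
print(top_down("Asia"))       # {'Pakistan': {'Mianwali': {}, 'Lahore': {}}}
```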

[Figure: data discretization and concept hierarchy generation]

Data discretization and binarization in data mining

What is the difference between discretization and binarization in data science?

Data discretization in data mining is the process of transforming continuous attributes into discrete (categorical) attributes.

Data binarization in data mining is the process of transforming both discrete and continuous attributes into binary attributes.
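
The difference can be illustrated with a short sketch. Here a discrete attribute (the age labels from the earlier example) is binarized with one binary attribute per label, and a continuous attribute is binarized with a single threshold. The function names and the threshold of 30 are assumptions chosen for illustration.

```python
LABELS = ["Young", "Mature", "Old"]

def one_hot(label, labels=LABELS):
    """Binarize a discrete attribute: one 0/1 attribute per possible label."""
    return [1 if label == candidate else 0 for candidate in labels]

def binarize(value, threshold):
    """Binarize a continuous attribute: 1 if value >= threshold, else 0."""
    return 1 if value >= threshold else 0

print(one_hot("Mature"))                             # [0, 1, 0]
print([binarize(age, 30) for age in [10, 19, 30, 75]])  # [0, 0, 1, 1]
```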
