Data Generalization In Data Mining – Summarization Based Characterization


From a data analysis point of view, data mining can be classified into the following two categories:

  1. Predictive data mining
  2. Descriptive data mining

Descriptive data mining

Descriptive data mining describes a data set concisely and presents the interesting general properties of the given data.
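As a minimal sketch of what "describing a data set concisely" can mean, the function below reduces a list of numbers to a handful of summary statistics. The function name `describe` and the purchase amounts are hypothetical, chosen only for illustration.

```python
from statistics import mean, median

def describe(values):
    """Return a concise summary of a numeric data set (hypothetical helper)."""
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": mean(ordered),
        "median": median(ordered),
    }

# Hypothetical purchase amounts, for illustration only.
amounts = [12.5, 40.0, 7.25, 40.0, 99.0, 15.75]
print(describe(amounts))
```

Six raw values become five numbers that convey the overall shape of the data, which is the spirit of descriptive mining.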

Predictive data mining

Predictive data mining analyzes the data to construct one model or a set of models, which are then used to predict the behavior of new data sets.
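To contrast this with the descriptive case, here is a minimal sketch of the predictive workflow: build a model from existing data, then apply it to a new, unseen value. The model is an ordinary least-squares line, chosen only as a simple example. The names `fit_line` and `predict` and the spend/units data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def predict(model, x):
    """Apply a fitted (a, b) line to a new input value."""
    a, b = model
    return a + b * x

# Hypothetical training data: advertising spend vs. units sold.
spend = [1, 2, 3, 4]
units = [3, 5, 7, 9]
model = fit_line(spend, units)
print(predict(model, 5))  # 11.0
```

Any other model family (decision tree, classifier, and so on) fits the same two-step pattern of constructing a model and then predicting.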

Databases usually store large amounts of data. Users, however, are often less interested in the raw data than in summarized, concise data and information.

Descriptive data mining can present an overall picture of a class of data and can also distinguish that class from a set of comparative classes.

Concept Description

Concept description is the simplest kind of descriptive data mining. A concept usually refers to a collection of data, for example:

new_students, graduate_students, alumni, and so on.

Concept description is not a simple enumeration of the data. Instead, it generates descriptions for the characterization and comparison of the data.

Concept description is also called class description, especially when the concept to be described refers to a class of objects.

Comparison of data

Comparison provides descriptions that compare two or more collections of data.

Characterization of data

Characterization provides a concise summary of the given collection of data.
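The difference between the two can be sketched in a few lines: characterization summarizes one target class, while comparison places that summary next to the summary of a contrasting class. The student records, the `(class_label, major, gpa)` layout, and the helper names are hypothetical assumptions for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical student records: (class_label, major, gpa).
students = [
    ("graduate", "CS", 3.75),
    ("graduate", "Physics", 3.5),
    ("graduate", "CS", 3.25),
    ("undergraduate", "CS", 3.0),
    ("undergraduate", "Biology", 2.5),
    ("undergraduate", "Biology", 3.5),
]

def characterize(records, label):
    """Characterization: a concise summary of one (target) class."""
    rows = [r for r in records if r[0] == label]
    return {
        "count": len(rows),
        "avg_gpa": mean(gpa for _, _, gpa in rows),
        "majors": Counter(major for _, major, _ in rows),
    }

def compare(records, target, contrast):
    """Comparison: summaries of the target and a contrasting class side by side."""
    return {
        target: characterize(records, target),
        contrast: characterize(records, contrast),
    }

print(compare(students, "graduate", "undergraduate"))
```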

Data Generalization & Summarization

At the primitive concept level, data and objects in databases contain detailed information.
For example, suppose a sales database contains attributes that describe low-level item information, such as item_ID, item_name, item_supplier, item_place_made, item_brand, item_category, and item_price.

Data summarization condenses a large set of data and presents it at a higher conceptual level.

For example, suppose we want to summarize a large amount of sales data from the summer holidays and provide a general description of it. Such a description can be very helpful to the sales department and its managers.
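A minimal sketch of this kind of summary: low-level sales records are filtered to a date window and rolled up to one line per item_category. The record layout, the sample data, and the `summarize_sales` helper are hypothetical. Treating "summer holidays" as June 1 to August 31 is also an assumption.

```python
from collections import defaultdict
from datetime import date

# Hypothetical low-level sales records: (sale_date, item_name, item_category, price).
sales = [
    (date(2023, 6, 20), "Sun hat", "clothing", 15.0),
    (date(2023, 7, 4), "Sandals", "clothing", 30.0),
    (date(2023, 7, 15), "Cooler", "outdoor", 45.0),
    (date(2023, 8, 2), "Tent", "outdoor", 120.0),
    (date(2023, 11, 1), "Scarf", "clothing", 20.0),
]

def summarize_sales(records, start, end):
    """Count sales and total revenue per category for dates in [start, end]."""
    summary = defaultdict(lambda: {"sales": 0, "revenue": 0.0})
    for sold_on, _name, category, price in records:
        if start <= sold_on <= end:
            summary[category]["sales"] += 1
            summary[category]["revenue"] += price
    return dict(summary)

# Assumed summer window; adjust to the actual holiday period.
summer = summarize_sales(sales, date(2023, 6, 1), date(2023, 8, 31))
print(summer)
```

Item names are dropped and prices are aggregated, so a manager sees two category lines instead of every individual sale.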

This kind of summarization relies on data generalization.

Data Generalization

Data generalization abstracts a large set of task-relevant data in a database from a low conceptual level to higher ones.

Data generalization is helpful for creating characteristic rules, which summarize the general features of objects in a target class.
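A simplified sketch in the spirit of attribute-oriented induction shows how generalization works on the item attributes mentioned earlier. Identifying attributes (item_ID, item_name) are dropped. item_place_made is climbed one level up a concept hierarchy (city to country), and item_price is replaced by a coarse range. Identical generalized tuples are then merged with a count, and each merged tuple reads like a characteristic rule with its share of the class. The hierarchy, the price boundaries, and the sample items are all hypothetical.

```python
from collections import Counter

# Hypothetical concept hierarchy for item_place_made: city -> country.
CITY_TO_COUNTRY = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Seattle": "USA",
    "Chicago": "USA",
}

def price_range(price):
    """Generalize a raw item_price into a coarse range (assumed boundaries)."""
    if price < 50:
        return "under 50"
    if price < 200:
        return "50-199"
    return "200+"

def generalize(items):
    """Drop ID/name, climb the hierarchies, and merge identical tuples with counts."""
    generalized = Counter()
    for _item_id, _name, place_made, price in items:
        generalized[(CITY_TO_COUNTRY[place_made], price_range(price))] += 1
    return generalized

# Hypothetical low-level items: (item_ID, item_name, item_place_made, item_price).
items = [
    (1, "Desk lamp", "Vancouver", 35.0),
    (2, "Table lamp", "Toronto", 42.0),
    (3, "Monitor", "Seattle", 180.0),
    (4, "Keyboard", "Chicago", 60.0),
    (5, "Floor lamp", "Toronto", 25.0),
]

for (country, prange), count in generalize(items).items():
    share = count / len(items)
    print(f"made_in={country}, price={prange}: {count} items ({share:.0%})")
```

Five detailed rows collapse into two general descriptions of the target class. Real systems also apply thresholds that decide when to stop climbing a hierarchy. This sketch fixes the level in advance.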

How to retrieve data relevant to a user-specified class?

The relevant data can be retrieved with a database query and then passed to a summarization module, which extracts the essence of the data at different levels of abstraction.
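The two steps can be sketched with Python's built-in `sqlite3` module. A query first retrieves the data for one user-specified class (here, graduate students). A small summarization function then reports it at either the city or the country level. The `student` table, its columns, the sample rows, and the city-to-country mapping are hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical mapping used by the summarization step: city -> country.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Seattle": "USA"}

def load_demo_db():
    """Build a small in-memory table of hypothetical student records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (name TEXT, status TEXT, birth_city TEXT)")
    conn.executemany(
        "INSERT INTO student VALUES (?, ?, ?)",
        [
            ("Ann", "graduate", "Vancouver"),
            ("Bo", "graduate", "Toronto"),
            ("Cy", "graduate", "Seattle"),
            ("Di", "undergraduate", "Toronto"),
        ],
    )
    return conn

def retrieve_class(conn, status):
    """Step 1: a database query collects the task-relevant data for one class."""
    rows = conn.execute(
        "SELECT birth_city FROM student WHERE status = ?", (status,)
    ).fetchall()
    return [city for (city,) in rows]

def summarize(cities, level):
    """Step 2: summarize the retrieved data at the requested abstraction level."""
    if level == "city":
        return Counter(cities)
    if level == "country":
        return Counter(CITY_TO_COUNTRY[c] for c in cities)
    raise ValueError(f"unknown level: {level}")

conn = load_demo_db()
grads = retrieve_class(conn, "graduate")
print(summarize(grads, "city"))
print(summarize(grads, "country"))
```

Keeping retrieval and summarization separate lets the same query result be viewed at several levels without querying the database again.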