Data Generalization In Data Mining – Summarization Based Characterization

From a data analysis point of view, data mining can be classified into the following two categories:

Predictive data mining
Descriptive data mining

Descriptive data mining

Descriptive data mining describes the data set in a concise way and presents the interesting general properties of the given data.

Predictive data mining

Predictive data mining analyzes the data to construct one or a set of models, which are then used to predict the behavior of new data sets.

Databases usually hold a large amount of detailed data. Users, however, are often interested in summarized, concise information rather than the raw detail. Providing such descriptions is the job of descriptive data mining: they give an overall picture of a class of data, or distinguish it from a set of comparative classes.

Concept Description

Concept description is the simplest kind of descriptive data mining. A concept is a term that refers to a collection of data, for example new_students, graduate_students, alumni, and so on. As a data mining task, concept description is not a simple enumeration of the data. Instead, it generates descriptions for the characterization and comparison of the data. It is also referred to as class description, especially when the concept to be described is a class of objects.

Comparison of data

Comparison provides descriptions that compare two or more collections of data.

Characterization of data

Characterization provides a concise summary of the given collection of data.
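To make the difference between characterization and comparison more concrete, here is a minimal Python sketch. The student records, the attributes age and credits, and the helper functions characterize and compare are all hypothetical and serve only as an illustration; real concept description methods are more elaborate.

```python
from statistics import mean

# Hypothetical records; "concept" names the collection each student belongs to.
students = [
    {"concept": "graduate_students", "age": 25, "credits": 9},
    {"concept": "graduate_students", "age": 27, "credits": 6},
    {"concept": "new_students", "age": 18, "credits": 15},
    {"concept": "new_students", "age": 20, "credits": 12},
]

def characterize(records, concept):
    """Characterization: a concise summary of one collection of data."""
    members = [r for r in records if r["concept"] == concept]
    return {
        "count": len(members),
        "mean_age": mean(r["age"] for r in members),
        "mean_credits": mean(r["credits"] for r in members),
    }

def compare(records, concept_a, concept_b):
    """Comparison: place the summaries of two collections side by side."""
    summary_a = characterize(records, concept_a)
    summary_b = characterize(records, concept_b)
    return {key: (summary_a[key], summary_b[key]) for key in summary_a}

graduate_summary = characterize(students, "graduate_students")
class_comparison = compare(students, "graduate_students", "new_students")
print(graduate_summary)
print(class_comparison)
```

Here characterize describes a single concept, while compare contrasts two concepts on the same summary attributes.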

Data Generalization & Summarization

At the primitive concept level, the data and objects in a database contain detailed information. Take a sales database as an example. The database may contain attributes that describe low-level item information, such as item_ID, item_name, item_supplier, item_place_made, item_brand, item_category, and item_price. Data summarization condenses a large set of data and presents it at a high conceptual level. For example, suppose we want to summarize the large amount of data on sales during the summer holidays and provide a general description of it, which can be very helpful for the sales department and managers. This requires data generalization.
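The idea of moving from low-level item data to a higher conceptual level can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records and their values are invented, and the helper summarize_by is a hypothetical name. It generalizes each record to a single higher-level attribute (here item_category), drops the low-level identifiers, and aggregates the prices per group.

```python
from collections import defaultdict

# Hypothetical low-level sales records (illustrative values only).
sales = [
    {"item_ID": 1, "item_name": "Sun hat", "item_brand": "A",
     "item_category": "clothing", "item_price": 12.0},
    {"item_ID": 2, "item_name": "Swim shorts", "item_brand": "B",
     "item_category": "clothing", "item_price": 18.0},
    {"item_ID": 3, "item_name": "Ice tea", "item_brand": "C",
     "item_category": "beverages", "item_price": 3.5},
    {"item_ID": 4, "item_name": "Lemonade", "item_brand": "C",
     "item_category": "beverages", "item_price": 4.5},
]

def summarize_by(records, attribute, measure="item_price"):
    """Generalize records to one attribute and aggregate the measure per group."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for record in records:
        group = summary[record[attribute]]
        group["count"] += 1
        group["total"] += record[measure]
    return dict(summary)

category_summary = summarize_by(sales, "item_category")
for category, stats in sorted(category_summary.items()):
    print(category, stats["count"], stats["total"])
```

Calling summarize_by(sales, "item_brand") instead would summarize the same data at the brand level, which shows how the choice of attribute decides the level of generalization.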