Data Quality in Data Preprocessing for Data Mining

1. Accuracy:

Accuracy refers to the correctness of the data, that is, whether each recorded value is right or wrong.
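As a minimal sketch, accuracy can be estimated by comparing recorded values against a trusted reference, when one exists. The `recorded` and `reference` dictionaries and the `accuracy` function below are hypothetical and only illustrate the idea; in practice a reference source is often unavailable.

```python
# Hypothetical data: product names keyed by order ID.
# `reference` stands in for a trusted source; record 3 contains a typo.
recorded = {1: "Laptop", 2: "Smartphone", 3: "Tablte", 4: "Television"}
reference = {1: "Laptop", 2: "Smartphone", 3: "Tablet", 4: "Television"}


def accuracy(recorded, reference):
    """Return the fraction of shared keys whose recorded value matches the reference."""
    shared = recorded.keys() & reference.keys()
    if not shared:
        raise ValueError("no overlapping keys to compare")
    correct = sum(1 for key in shared if recorded[key] == reference[key])
    return correct / len(shared)


print(accuracy(recorded, reference))
```

Here three of the four recorded values match the reference, so the score is 0.75.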

2. Completeness:

Completeness evaluates the extent to which data are recorded without missing values. Entries in a database may be filled in completely or only partially.
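One simple way to quantify completeness is the share of rows that have a value in a given column. The sketch below assumes missing values are stored as `None` or an empty string; the `records` list and the `completeness` function are hypothetical.

```python
# Hypothetical records; None marks a value that was never entered.
records = [
    {"Order_ID": 1, "Order_Date": "15-Jan-2024", "Product": "Laptop"},
    {"Order_ID": 2, "Order_Date": None, "Product": "Smartphone"},
    {"Order_ID": 3, "Order_Date": "17-Jan-2024", "Product": None},
    {"Order_ID": 4, "Order_Date": "07-Apr-2024", "Product": "Television"},
]


def completeness(rows, column):
    """Return the fraction of rows whose value in `column` is present."""
    if not rows:
        raise ValueError("cannot measure completeness of an empty dataset")
    present = sum(1 for row in rows if row.get(column) not in (None, ""))
    return present / len(rows)


for column in ("Order_ID", "Order_Date", "Product"):
    print(column, completeness(records, column))
```

A column with no gaps scores 1.0, and each missing entry lowers the score proportionally.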

3. Consistency:

Consistency means that the same kind of value is represented in the same way throughout the dataset.

Example #1 of data consistency:

Order_ID  Order_Date   Product
1         15-Jan-2024  Laptop
2         16-Jan-2024  Smartphone
3         17-Jan-2024  Tablet
4         07-Apr-2024  Television
5         03-May-2024  Headphones

Table 1: Dataset with Consistent Date Formatting

Order_ID  Order_Date   Product
1         15-Jan-2024  Laptop
2         16-Jan-2024  Smartphone
3         17-Jan-2024  Tablet
4         07-04-2024   Television
5         03-05-2024   Headphones

Table 2: Dataset with Inconsistent Date Formatting
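The mixed formats in Table 2 could be brought back to a single format with a small normalization step. The sketch below uses Python's standard `datetime` module and assumes that the numeric dates are written day-month-year (which agrees with Table 1) and that month abbreviations are parsed under an English locale. The `normalize_date` helper and the format list are hypothetical.

```python
from datetime import datetime

# Values from the Order_Date column of Table 2.
order_dates = ["15-Jan-2024", "16-Jan-2024", "17-Jan-2024", "07-04-2024", "03-05-2024"]

TARGET_FORMAT = "%d-%b-%Y"  # e.g. 15-Jan-2024
# Assumption: purely numeric dates are day-month-year.
KNOWN_FORMATS = [TARGET_FORMAT, "%d-%m-%Y"]


def normalize_date(text):
    """Parse `text` with the first matching known format and re-emit it in TARGET_FORMAT."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime(TARGET_FORMAT)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


normalized = [normalize_date(d) for d in order_dates]
print(normalized)
```

Raising an error on an unrecognized format, rather than guessing, keeps ambiguous entries visible so they can be reviewed by hand.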

Example #2 of data consistency:

Student_ID  Name         Gender
1           F.R.Shamil   Male
2           Jane Smith   Female
3           Talha        Male
4           Emily Davis  Female
5           Sam Brown    Male

Table 3: Original Dataset with Consistent Gender Representation

Student_ID  Name         Gender
1           F.R.Shamil   Male
2           Jane Smith   Female
3           Talha        Male
4           Emily Davis  F
5           Sam Brown    M

Table 4: Dataset with Inconsistent Gender Representation
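Inconsistent category labels such as those in the Gender column above are often repaired by mapping every variant to one canonical label. The following is a minimal sketch; `GENDER_MAP` and `normalize_gender` are hypothetical names, and the mapping covers only the variants that appear in the table.

```python
# Hypothetical mapping from known variants (lower-cased) to a canonical label.
GENDER_MAP = {"male": "Male", "m": "Male", "female": "Female", "f": "Female"}

# Values from the Gender column of the inconsistent table above.
genders = ["Male", "Female", "Male", "F", "M"]


def normalize_gender(value):
    """Map a known variant to its canonical label; reject anything unknown."""
    key = value.strip().lower()
    if key not in GENDER_MAP:
        raise ValueError(f"unknown gender label: {value!r}")
    return GENDER_MAP[key]


normalized = [normalize_gender(g) for g in genders]
print(normalized)
```

After normalization, the column again contains only the labels Male and Female, as in the consistent version of the table.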

4. Timeliness:

Timeliness assesses whether the data are up to date. This matters most for data that change over time, because the stored records must be updated to reflect those changes.
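A basic timeliness check compares when a record was last updated against a maximum acceptable age. The sketch below assumes each record carries a last-updated timestamp; the `is_stale` function, the 30-day threshold, and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta


def is_stale(last_updated, now, max_age):
    """Return True if the record is older than `max_age` at time `now`."""
    return now - last_updated > max_age


# Hypothetical reference time and freshness threshold.
now = datetime(2024, 6, 1)
max_age = timedelta(days=30)

recent = is_stale(datetime(2024, 5, 20), now, max_age)  # updated 12 days earlier
old = is_stale(datetime(2024, 1, 15), now, max_age)     # updated months earlier
print(recent, old)
```

The right threshold depends on how quickly the underlying facts change, so it is best set per dataset rather than globally.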

5. Believability:

Believability reflects how much users trust the data to be correct.

6. Interpretability:

Interpretability reflects how easily the data can be understood.