Data Cleaning MCQs

By: Prof. Dr. Fazal Rehman Shamil | Last updated: August 7, 2024

Question: What is data cleaning?

A) Deleting all data points that do not fit the model
B) Preparing data for analysis by removing errors and inconsistencies
C) Extracting useful information from the dataset
D) Aggregating data from various sources
Answer: B) Preparing data for analysis by removing errors and inconsistencies

Question: Which of the following is not a step in data cleaning?

A) Removing duplicates
B) Handling missing values
C) Transforming data into a different format
D) Adding noise to the dataset
Answer: D) Adding noise to the dataset

Question: What is the primary goal of removing duplicate records in data cleaning?

A) To reduce computational complexity
B) To increase the size of the dataset
C) To improve the accuracy of analysis
D) To add variability to the dataset
Answer: C) To improve the accuracy of analysis
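
A minimal sketch of exact-duplicate removal, assuming records are dictionaries with hashable values. The function name drop_duplicates and the sample rows are illustrative, not taken from any particular library.

```python
def drop_duplicates(records):
    """Return records with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for record in records:
        # Build a hashable fingerprint of the row (assumes hashable values).
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical sample data: the third row repeats the first.
rows = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 300},
    {"id": 1, "amount": 100},
]
cleaned = drop_duplicates(rows)

# The duplicate pulls the average toward the repeated value.
mean_before = sum(r["amount"] for r in rows) / len(rows)
mean_after = sum(r["amount"] for r in cleaned) / len(cleaned)
print(len(cleaned), mean_after)
```

In this sample the duplicated row lowers the average amount, which is one way duplicates can distort an analysis.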

Question: Which technique is used to handle missing values in a dataset by filling them in with estimated values?

A) Data normalization
B) Data discretization
C) Data imputation
D) Data transformation
Answer: C) Data imputation

Question: Why is handling missing data crucial in data mining?

A) Missing data does not affect the analysis results
B) Unhandled missing data can bias results and weaken model performance
C) It reduces the need for data cleaning
D) Missing data can be safely ignored
Answer: B) Unhandled missing data can bias results and weaken model performance

Question: Which approach involves replacing missing values with the mean or median of the non-missing values in the same column?

A) Mean normalization
B) Median normalization
C) Central tendency imputation
D) Standardization
Answer: C) Central tendency imputation
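
A short sketch of central tendency imputation using only the Python standard library. It assumes missing values are represented as None; the function name impute and the sample column are hypothetical.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries and one large value.
ages = [20, None, 30, 100, None]
mean_filled = impute(ages, "mean")
median_filled = impute(ages, "median")
print(mean_filled)
print(median_filled)
```

The median is less affected by the large value (100) than the mean, which is why it is often preferred for skewed columns.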

Question: What does outlier detection aim to identify in a dataset?

A) Anomalies or data points that significantly deviate from other observations
B) Missing values in the dataset
C) Unstructured data points
D) Data points that fit the model perfectly
Answer: A) Anomalies or data points that significantly deviate from other observations
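
One common way to flag such points is the interquartile range (IQR) rule. The sketch below is only one of several possible approaches; the multiplier k=1.5 is a conventional default, and the function name and sample data are assumptions for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical measurements with one obviously deviating point.
data = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(data))  # [95]
```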

Question: Which method checks data values against predefined rules to identify and correct inconsistencies?

A) Data validation
B) Data imputation
C) Data integration
D) Data transformation
Answer: A) Data validation
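
A minimal sketch of rule-based validation. The field names and the rules themselves (an age range and a crude email check) are hypothetical examples, not a complete validation scheme.

```python
# Each rule maps a field name to a predicate that returns True for valid values.
RULES = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "email": lambda v: isinstance(v, str) and v.count("@") == 1,
}


def validate(record, rules=RULES):
    """Return the names of fields that violate their rule."""
    return [field for field, is_valid in rules.items()
            if not is_valid(record.get(field))]


good = {"age": 34, "email": "a@example.com"}
bad = {"age": -5, "email": "not-an-email"}
print(validate(good))  # []
print(validate(bad))   # ['age', 'email']
```

Records that fail a rule can then be corrected, flagged for review, or excluded, depending on the project.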

Question: What is the purpose of data normalization in data cleaning?

A) To remove outliers from the dataset
B) To transform data into a consistent scale or format for analysis
C) To introduce variability into the dataset
D) To handle missing values
Answer: B) To transform data into a consistent scale or format for analysis

Question: Which technique involves converting categorical data into numerical values for analysis?

A) Data discretization
B) Data standardization
C) Data encoding
D) Data imputation
Answer: C) Data encoding
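
Two common encodings are label encoding and one-hot encoding. The sketch below implements both in plain Python; the function names and the sample column are illustrative assumptions.

```python
def label_encode(values):
    """Map each category to an integer code (categories sorted alphabetically)."""
    categories = sorted(set(values))
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Return one 0/1 indicator per category for every value."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


colors = ["red", "green", "blue", "green"]
codes, mapping = label_encode(colors)
one_hot = one_hot_encode(colors)
print(codes)    # [2, 1, 0, 1]
print(one_hot)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Label encoding implies an order among the categories, so one-hot encoding is usually the safer choice for unordered categories.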

Question: What is the primary purpose of data integration in data cleaning?

A) To aggregate data from multiple sources into a unified format
B) To remove outliers from the dataset
C) To transform data into a consistent format
D) To add noise to the dataset
Answer: A) To aggregate data from multiple sources into a unified format
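
A small sketch of integrating two sources into one unified record format. The source names (crm_rows, billing_rows), their field names, and the output schema are all hypothetical.

```python
def integrate(crm_rows, billing_rows):
    """Join two sources on customer id and emit records in one unified schema."""
    billing_by_id = {row["customer_id"]: row for row in billing_rows}
    merged = []
    for row in crm_rows:
        bill = billing_by_id.get(row["id"], {})
        merged.append({
            "customer_id": row["id"],
            "name": row["name"].strip().title(),  # unify name formatting
            "balance": bill.get("balance"),       # None if no billing record
        })
    return merged


# Hypothetical inputs: the two systems use different key names and formats.
crm_rows = [{"id": 1, "name": " alice khan "}, {"id": 2, "name": "BILAL AHMED"}]
billing_rows = [{"customer_id": 1, "balance": 250.0}]
unified = integrate(crm_rows, billing_rows)
print(unified)
```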

Question: Which of the following is a common approach to handling noisy data in data cleaning?

A) Ignoring noisy data during analysis
B) Adding more noise to balance it out
C) Applying outlier detection techniques
D) None of the above
Answer: C) Applying outlier detection techniques

Question: What does data standardization aim to achieve?

A) To convert data into a standard format for consistency
B) To add variability to the dataset
C) To delete duplicate records
D) To replace missing values
Answer: A) To convert data into a standard format for consistency

Question: Which technique rescales data to a common scale with zero mean and unit standard deviation?

A) Min-max scaling
B) Z-score normalization
C) Mean normalization
D) Central tendency imputation
Answer: B) Z-score normalization
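
For comparison, the sketch below implements both min-max scaling, x' = (x - min) / (max - min), and z-score normalization, z = (x - mean) / standard deviation. It assumes the values are not all identical (otherwise the denominators are zero) and uses the population standard deviation; the sample data is hypothetical.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]  # assumes hi != lo


def z_score(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]  # assumes sigma != 0


heights = [150, 160, 170, 180, 190]
scaled = min_max_scale(heights)
standardized = z_score(heights)
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardized)
```

Min-max scaling fixes the range, while z-score normalization fixes the mean and spread, so z-scores are not confined to a fixed interval.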

Question: Why is data cleaning considered a critical step in the data mining process?

A) It improves the efficiency of data storage
B) It reduces the amount of data needed for analysis
C) It enhances the quality and reliability of analysis results
D) It automates the process of data collection
Answer: C) It enhances the quality and reliability of analysis results
