Quantitative Data Analytics Qualification: Related Questions

By: Prof. Dr. Fazal Rehman Shamil | Last updated: June 15, 2024

Which of the following is a statistical measure of the dispersion or variability in a dataset?
A) Mean
B) Median
C) Variance
D) Mode
Answer: C) Variance

What is the formula for calculating the coefficient of variation (CV)?
A) (Standard Deviation / Mean) x 100
B) (Mean / Standard Deviation) x 100
C) Standard Deviation x Mean
D) Mean + Standard Deviation
Answer: A) (Standard Deviation / Mean) x 100
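
The formula above can be checked with a few lines of Python. This is a minimal sketch, assuming the sample standard deviation (n - 1 denominator) is intended; the function name and the sample values are illustrative only.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV as a percentage: (standard deviation / mean) x 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # Assumption: sample standard deviation (n - 1 denominator).
    return statistics.stdev(values) / mean * 100


sample = [10, 12, 14, 16, 18]  # hypothetical data
print("variance:", statistics.variance(sample))
print("CV (%):", round(coefficient_of_variation(sample), 2))
```

Because the CV is a ratio, it is unit-free, which is what makes it useful for comparing the spread of datasets measured on different scales.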

In data analytics, what does the term “correlation” measure?
A) Causation between variables
B) Strength and direction of the relationship between variables
C) Mean of variables
D) Median of variables
Answer: B) Strength and direction of the relationship between variables
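
As an illustration, the Pearson correlation coefficient, one common correlation measure, can be computed directly from its definition. This is a sketch with hypothetical data and a hypothetical function name; it measures linear association only and says nothing about causation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly increasing line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly decreasing line
```

The sign gives the direction of the relationship and the magnitude (between 0 and 1) gives its strength.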

Which statistical test is used to determine if there is a significant difference between the means of two groups?
A) Chi-square test
B) T-test
C) ANOVA
D) Regression analysis
Answer: B) T-test
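
A rough sketch of the statistic behind a two-sample t-test is shown below. It assumes Welch's unequal-variance form and computes only the t statistic; obtaining a p-value would additionally require the t distribution, which is not covered here. The function name and data are hypothetical.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """t statistic for the difference between two group means (Welch's form)."""
    var_a = statistics.variance(group_a)  # sample variances (n - 1)
    var_b = statistics.variance(group_b)
    # Standard error of the difference between the two means.
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se


print(welch_t([2, 4, 6], [1, 2, 3]))
```

A value far from zero suggests the two means differ by more than the sampling noise would explain.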

In regression analysis, what does the coefficient of determination (R-squared) measure?
A) Strength of the relationship between variables
B) Direction of the relationship between variables
C) Significance of the relationship between variables
D) Proportion of variance in the dependent variable explained by the regression model
Answer: D) Proportion of variance in the dependent variable explained by the regression model
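
To make this concrete, R-squared can be computed as 1 minus the ratio of residual to total sum of squares. The sketch below assumes a simple one-predictor least-squares fit; the function name and data are illustrative.

```python
def r_squared(xs, ys):
    """R^2 of a simple least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept for one predictor.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


print(r_squared([1, 2, 3, 4], [1, 3, 2, 4]))  # about 0.64
```

Here roughly 64% of the variation in the y values is accounted for by the fitted line.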

Which data visualization technique is used to show the distribution of a continuous variable?
A) Histogram
B) Bar chart
C) Pie chart
D) Line graph
Answer: A) Histogram

What does the term “Outlier” refer to in data analytics?
A) An observation that is significantly different from other observations
B) An observation that is similar to other observations
C) An observation with a high correlation coefficient
D) An observation with a low variance
Answer: A) An observation that is significantly different from other observations
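
One common way to flag such observations is the 1.5 x IQR rule. The sketch below is one convention among several, not the only definition of an outlier; the multiplier `k`, the function name, and the data are assumptions for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))  # [95]
```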

Which of the following is NOT a probability sampling technique?
A) Random sampling
B) Stratified sampling
C) Cluster sampling
D) Convenience sampling
Answer: D) Convenience sampling

What is the purpose of hypothesis testing in data analytics?
A) To prove a hypothesis is true
B) To validate data
C) To assess the strength of a relationship
D) To make decisions about population parameters based on sample data
Answer: D) To make decisions about population parameters based on sample data

Which of the following is a measure of central tendency?
A) Range
B) Standard Deviation
C) Mode
D) Variance
Answer: C) Mode

What does the term “Normal distribution” refer to in statistics?
A) A distribution with a bell-shaped curve
B) A distribution with a linear relationship
C) A distribution with no outliers
D) A distribution with a high variance
Answer: A) A distribution with a bell-shaped curve

Which statistical test is used to determine if there is a significant relationship between two categorical variables?
A) T-test
B) Chi-square test
C) ANOVA
D) Regression analysis
Answer: B) Chi-square test
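
The chi-square statistic compares observed counts in a contingency table with the counts expected if the two variables were independent. A minimal sketch follows, computing only the statistic (no p-value); the table values and function name are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


print(chi_square_statistic([[20, 10], [10, 20]]))
```

The sketch assumes every expected count is non-zero; a table with an empty row or column would need to be handled separately.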

What is the primary goal of data preprocessing in analytics?
A) To increase the size of the dataset
B) To reduce noise and improve data quality
C) To add outliers to the dataset
D) To remove all missing values
Answer: B) To reduce noise and improve data quality

What does the term “Cross-validation” refer to in machine learning?
A) Training a model on one dataset and testing it on another
B) Splitting data into training and testing sets
C) Evaluating a model’s performance using multiple subsets of data
D) Using multiple algorithms to build a model
Answer: C) Evaluating a model’s performance using multiple subsets of data
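
The "multiple subsets" idea is easiest to see in k-fold cross-validation, where each fold serves as the test set exactly once. The following is a sketch of the index splitting only, assuming unshuffled, contiguous folds; the function name is illustrative, and model training and scoring are left out.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in k_fold_indices(10, 3):
    print("train:", train_idx, "test:", test_idx)
```

A model would be fitted on each `train_idx` subset and scored on the matching `test_idx` subset, and the k scores averaged.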

Which of the following is a measure of association used for ordinal (ranked) data?
A) Pearson correlation coefficient
B) Spearman’s rank correlation coefficient
C) Coefficient of determination
D) ANOVA
Answer: B) Spearman’s rank correlation coefficient

What is the purpose of data visualization in analytics?
A) To make data look more complex
B) To communicate insights effectively
C) To hide outliers in the data
D) To replace statistical analysis
Answer: B) To communicate insights effectively

Which data structure is used to store data in a hierarchical format?
A) List
B) Array
C) Tree
D) Queue
Answer: C) Tree

Which of the following is NOT a dimensionality reduction technique?
A) Principal Component Analysis (PCA)
B) Linear Regression
C) t-SNE (t-distributed Stochastic Neighbor Embedding)
D) Singular Value Decomposition (SVD)
Answer: B) Linear Regression

What is the purpose of A/B testing in data analytics?
A) To compare two different datasets
B) To compare two versions of a website or application and determine which performs better
C) To conduct hypothesis testing
D) To calculate variance
Answer: B) To compare two versions of a website or application and determine which performs better
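
In practice, an A/B test usually compares a metric such as conversion rate between two variants. The sketch below assumes a pooled two-proportion z statistic, which is one common choice rather than the only one; the function name and the counts are hypothetical.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference in conversion rate (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


# Hypothetical counts: variant A converts 100 of 1000 visitors, B 120 of 1000.
print(two_proportion_z(100, 1000, 120, 1000))
```

A positive value means variant B converted at a higher rate in the sample; whether that difference is significant depends on the chosen threshold.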

Which of the following is a supervised learning algorithm?
A) K-means clustering
B) Decision tree
C) Apriori algorithm
D) DBSCAN
Answer: B) Decision tree

What does the term “Precision” refer to in classification models?
A) The number of true positive predictions divided by the total number of positive predictions
B) The number of true positive predictions divided by the total number of actual positives
C) The number of true negative predictions divided by the total number of negative predictions
D) The number of true negative predictions divided by the total number of actual negatives
Answer: A) The number of true positive predictions divided by the total number of positive predictions
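
The difference between options A and B (precision versus recall) is easiest to see in code. This is a small sketch with hypothetical labels and a hypothetical function name; by assumption it returns 0.0 when a denominator is zero.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: of everything predicted positive, how much was right.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0]
print(precision_recall(y_true, y_pred))  # (0.75, 1.0)
```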

Which of the following is a type of non-probability sampling technique?
A) Random sampling
B) Stratified sampling
C) Snowball sampling
D) Systematic sampling
Answer: C) Snowball sampling

What does the term “Data Mining” refer to in analytics?
A) Extracting valuable information from data
B) Storing data in a secure location
C) Adding noise to data
D) Removing outliers from data
Answer: A) Extracting valuable information from data

Which statistical test is used to determine if there is a significant difference between the means of more than two groups?
A) T-test
B) Chi-square test
C) ANOVA
D) Regression analysis
Answer: C) ANOVA
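
One-way ANOVA compares the variation between group means with the variation within groups. The sketch below computes only the F statistic (no p-value), assuming each group has at least one value and there are more observations than groups; names and data are illustrative.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(group) / len(group) for group in groups]
    k, n = len(groups), len(all_values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


print(anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))  # 3.0
```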

What is the primary goal of feature engineering in machine learning?
A) To create new features from existing data
B) To remove features from the dataset
C) To increase model complexity
D) To decrease model accuracy
Answer: A) To create new features from existing data

What does the term “Overfitting” refer to in machine learning?
A) When a model performs well on training data but poorly on new data
B) When a model performs poorly on training data
C) When a model is too simple
D) When a model is not trained properly
Answer: A) When a model performs well on training data but poorly on new data

Which of the following is NOT a classification algorithm?
A) Logistic Regression
B) K-nearest neighbors (KNN)
C) Linear Regression
D) Support Vector Machine (SVM)
Answer: C) Linear Regression

What is the purpose of regularization in machine learning?
A) To penalize complex models
B) To increase model bias
C) To increase model variance
D) To simplify feature selection
Answer: A) To penalize complex models
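
The penalty can be illustrated with ridge (L2) regression on a single feature. The sketch assumes centered data and an unpenalized intercept, in which case the slope has the closed form Sxy / (Sxx + penalty); the function and parameter names are hypothetical.

```python
def ridge_slope(xs, ys, penalty):
    """Slope of a one-feature ridge fit; penalty=0 gives ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # The L2 penalty enlarges the denominator, shrinking the slope toward zero.
    return sxy / (sxx + penalty)


xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]
for penalty in (0, 5, 50):
    print(penalty, ridge_slope(xs, ys, penalty))
```

Larger penalties pull the coefficient toward zero, trading a little bias for a less flexible model.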

Which of the following is a method for handling missing data in a dataset?
A) Removing rows with missing data
B) Replacing missing data with the mean of the column
C) Ignoring missing data
D) All of the above
Answer: D) All of the above
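
Two of these options, dropping and mean imputation, can be sketched on a single column. The example assumes missing values are represented as `None` and that at least one value is observed; the function names are illustrative.

```python
def drop_missing(values):
    """Option A: remove entries with missing data."""
    return [v for v in values if v is not None]


def impute_mean(values):
    """Option B: replace missing entries with the mean of the observed ones."""
    observed = drop_missing(values)
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


column = [1.0, None, 3.0]  # hypothetical column with one missing value
print(drop_missing(column))  # [1.0, 3.0]
print(impute_mean(column))   # [1.0, 2.0, 3.0]
```

Dropping shrinks the dataset, while mean imputation keeps its size but understates the column's variability, so the right choice depends on how much data is missing and why.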

What does the term “Confusion Matrix” represent in classification models?
A) A matrix that shows the relationship between variables
B) A matrix that shows the performance of a classification model
C) A matrix that shows the correlation between variables
D) A matrix that shows the mean of variables
Answer: B) A matrix that shows the performance of a classification model
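
For a binary classifier, the matrix reduces to four counts. A minimal sketch, with hypothetical labels and a hypothetical function name, tallies them from actual and predicted values:

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return TP, FP, FN and TN counts for a binary classification."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "FN", "TN")}


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
print(confusion_counts(y_true, y_pred))
```

Metrics such as accuracy, precision, and recall are all derived from these four counts.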

Which of the following is NOT a step in the CRISP-DM (Cross-Industry Standard Process for Data Mining) framework?
A) Data Understanding
B) Data Visualization
C) Data Preparation
D) Model Evaluation
Answer: B) Data Visualization

What does the term “Logistic Regression” refer to in machine learning?
A) A regression algorithm used for predicting continuous outcomes
B) A regression algorithm used for predicting binary outcomes
C) A classification algorithm used for predicting continuous outcomes
D) A classification algorithm