Which of the following is a statistical measure of the dispersion or variability in a dataset?
A) Mean
B) Median
C) Variance
D) Mode
Answer: C) Variance
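Worked example: a minimal sketch of how variance could be computed by hand and checked against Python's `statistics` module. The data values are illustrative only.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not from any real dataset

mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

# Population variance divides by n; sample variance divides by n - 1.
population_variance = sum(squared_deviations) / len(data)
sample_variance = sum(squared_deviations) / (len(data) - 1)

print(population_variance, statistics.pvariance(data))
print(sample_variance, statistics.variance(data))
```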
What is the formula for calculating the coefficient of variation (CV)?
A) (Standard Deviation / Mean) x 100
B) (Mean / Standard Deviation) x 100
C) Standard Deviation x Mean
D) Mean + Standard Deviation
Answer: A) (Standard Deviation / Mean) x 100
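Worked example: the formula above can be sketched as a small function. The name `coefficient_of_variation` and the sample values are assumptions made for illustration; the sample standard deviation is used here, though a population version would work the same way.

```python
import statistics

def coefficient_of_variation(values):
    """Return the CV as a percentage: (standard deviation / mean) x 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100

heights_cm = [160, 165, 170, 175, 180]  # illustrative values
cv = coefficient_of_variation(heights_cm)
print(f"CV = {cv:.2f}%")
```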
In data analytics, what does the term “correlation” measure?
A) Causation between variables
B) Strength and direction of the relationship between variables
C) Mean of variables
D) Median of variables
Answer: B) Strength and direction of the relationship between variables
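Worked example: one common correlation measure is the Pearson coefficient, which ranges from -1 to 1. The sketch below computes it from scratch; `pearson_r` is a hypothetical helper name and the inputs are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly increasing pair
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly decreasing pair
print(rising, falling)
```

A value near 1 or -1 indicates a strong linear relationship, but it does not by itself show that one variable causes the other.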
Which statistical test is used to determine if there is a significant difference between the means of two groups?
A) Chi-square test
B) T-test
C) ANOVA
D) Regression analysis
Answer: B) T-test
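Worked example: a minimal sketch of the test statistic behind a two-sample t-test, using Welch's form (which does not assume equal variances). Only the statistic is computed; turning it into a p-value needs the t distribution, which is left out here. The function name and group values are illustrative assumptions.

```python
import math
import statistics

def welch_t_statistic(group_a, group_b):
    """Difference in means divided by its estimated standard error."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

t_stat = welch_t_statistic([5, 6, 7], [1, 2, 3])  # illustrative groups
print(t_stat)
```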
In regression analysis, what does the coefficient of determination (R-squared) measure?
A) Strength of the relationship between variables
B) Direction of the relationship between variables
C) Significance of the relationship between variables
D) Proportion of variance in the dependent variable explained by the regression model
Answer: D) Proportion of variance in the dependent variable explained by the regression model
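Worked example: R-squared can be sketched as one minus the ratio of residual variation to total variation. The helper name `r_squared` and the inputs are assumptions for illustration.

```python
def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_residual = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_total = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_residual / ss_total

perfect_fit = r_squared([1, 2, 3], [1, 2, 3])    # predictions match exactly
mean_only_fit = r_squared([1, 2, 3], [2, 2, 2])  # model always predicts the mean
print(perfect_fit, mean_only_fit)
```

A model that only ever predicts the mean explains none of the variance, which is why the second call yields zero.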
Which data visualization technique is used to show the distribution of a continuous variable?
A) Histogram
B) Bar chart
C) Pie chart
D) Line graph
Answer: A) Histogram
What does the term “Outlier” refer to in data analytics?
A) An observation that is significantly different from other observations
B) An observation that is similar to other observations
C) An observation with a high correlation coefficient
D) An observation with a low variance
Answer: A) An observation that is significantly different from other observations
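Worked example: one widely used rule of thumb flags values lying more than 1.5 interquartile ranges outside the quartiles. This is a sketch of that rule, not the only definition of an outlier; the function name, the multiplier default, and the data are assumptions.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

readings = [10, 12, 11, 13, 12, 11, 10, 95]  # illustrative values
flagged = iqr_outliers(readings)
print(flagged)
```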
Which of the following is NOT a probability sampling technique?
A) Random sampling
B) Stratified sampling
C) Cluster sampling
D) Convenience sampling
Answer: D) Convenience sampling
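Worked example: a rough sketch contrasting simple random sampling with stratified sampling, using only the standard library. The helper names, the `fraction` parameter, and the toy population are hypothetical.

```python
import random

def simple_random_sample(population, k, seed=0):
    """Every member has the same chance of selection."""
    return random.Random(seed).sample(population, k)

def stratified_sample(population, key, fraction, seed=0):
    """Sample the same fraction from each stratum defined by key(item)."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Toy population: 80 members of group "a" and 20 of group "b".
population = [("a", i) for i in range(80)] + [("b", i) for i in range(20)]
random_pick = simple_random_sample(population, 10)
stratified_pick = stratified_sample(population, key=lambda m: m[0], fraction=0.1)
```

The stratified version keeps the 80/20 group proportions in the sample, whereas a simple random draw only matches them on average.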
What is the purpose of hypothesis testing in data analytics?
A) To prove a hypothesis is true
B) To validate data
C) To assess the strength of a relationship
D) To make decisions about population parameters based on sample data
Answer: D) To make decisions about population parameters based on sample data
Which of the following is a measure of central tendency?
A) Range
B) Standard Deviation
C) Mode
D) Variance
Answer: C) Mode
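Worked example: the three usual measures of central tendency, computed with Python's `statistics` module on made-up scores.

```python
import statistics

scores = [70, 85, 85, 90, 100]  # illustrative values

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value when sorted
mode_score = statistics.mode(scores)      # most frequent value

print(mean_score, median_score, mode_score)
```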
What does the term “Normal distribution” refer to in statistics?
A) A distribution with a bell-shaped curve
B) A distribution with a linear relationship
C) A distribution with no outliers
D) A distribution with a high variance
Answer: A) A distribution with a bell-shaped curve
Which statistical test is used to determine if there is a significant relationship between two categorical variables?
A) T-test
B) Chi-square test
C) ANOVA
D) Regression analysis
Answer: B) Chi-square test
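Worked example: a minimal sketch of the chi-square statistic for a contingency table, comparing observed counts with the counts expected under independence. Only the statistic is computed, not the p-value; the function name and table are illustrative assumptions.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all table cells."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic

# Rows: two groups; columns: two categories (made-up counts).
chi2 = chi_square_statistic([[20, 10], [10, 20]])
print(chi2)
```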
What is the primary goal of data preprocessing in analytics?
A) To increase the size of the dataset
B) To reduce noise and improve data quality
C) To add outliers to the dataset
D) To remove all missing values
Answer: B) To reduce noise and improve data quality
What does the term “Cross-validation” refer to in machine learning?
A) Training a model on one dataset and testing it on another
B) Splitting data into training and testing sets
C) Evaluating a model’s performance using multiple subsets of data
D) Using multiple algorithms to build a model
Answer: C) Evaluating a model’s performance using multiple subsets of data
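Worked example: k-fold cross-validation can be sketched as splitting the row indices into k folds, where each fold serves once as the test set. `k_fold_indices` is a hypothetical helper; real workflows usually shuffle first and often rely on a library.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds

folds = k_fold_indices(10, 3)
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```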
Which of the following is a measure of association used for ordinal (ranked) data?
A) Pearson correlation coefficient
B) Spearman’s rank correlation coefficient
C) Coefficient of determination
D) ANOVA
Answer: B) Spearman’s rank correlation coefficient
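Worked example: Spearman's coefficient works on ranks rather than raw values, so it can be sketched as a Pearson correlation applied to the ranks (ties receive their average rank). The helper names and data are assumptions for illustration.

```python
import math

def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation computed on the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])  # monotonic, not linear
print(rho)
```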
What is the purpose of data visualization in analytics?
A) To make data look more complex
B) To communicate insights effectively
C) To hide outliers in the data
D) To replace statistical analysis
Answer: B) To communicate insights effectively
Which data structure is used to store data in a hierarchical format?
A) List
B) Array
C) Tree
D) Queue
Answer: C) Tree
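Worked example: a minimal sketch of a tree, where each node holds a value and a list of child nodes. The class and function names, and the folder-like labels, are hypothetical.

```python
class TreeNode:
    """A node with a value and zero or more children."""

    def __init__(self, value):
        self.value = value
        self.children = []

    def add_child(self, value):
        child = TreeNode(value)
        self.children.append(child)
        return child

def depth(node):
    """Number of levels from this node down to its deepest leaf."""
    return 1 + max((depth(child) for child in node.children), default=0)

root = TreeNode("root")
reports = root.add_child("reports")
reports.add_child("2023")
root.add_child("images")
print(depth(root))
```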
Which of the following is NOT a dimensionality reduction technique?
A) Principal Component Analysis (PCA)
B) Linear Regression
C) t-SNE (t-distributed Stochastic Neighbor Embedding)
D) Singular Value Decomposition (SVD)
Answer: B) Linear Regression
What is the purpose of A/B testing in data analytics?
A) To compare two different datasets
B) To compare two versions of a website or application and determine which performs better
C) To conduct hypothesis testing
D) To calculate variance
Answer: B) To compare two versions of a website or application and determine which performs better
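Worked example: an A/B test on conversion rates is often analyzed with a two-proportion z-test. The sketch below computes only the z statistic under a pooled-proportion assumption; the function name and the counts are made up for illustration.

```python
import math

def two_proportion_z(conversions_a, visitors_a, conversions_b, visitors_b):
    """z statistic for the difference in conversion rate (B minus A)."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    return (rate_b - rate_a) / standard_error

# Variant A: 100 of 1000 visitors convert; variant B: 150 of 1000 (hypothetical).
z = two_proportion_z(100, 1000, 150, 1000)
print(z)
```

Under the usual two-sided 5% threshold, an absolute z value above roughly 1.96 would be treated as a significant difference.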
Which of the following is a supervised learning algorithm?
A) K-means clustering
B) Decision tree
C) Apriori algorithm
D) DBSCAN
Answer: B) Decision tree
What does the term “Precision” refer to in classification models?
A) The number of true positive predictions divided by the total number of positive predictions
B) The number of true positive predictions divided by the total number of actual positives
C) The number of true negative predictions divided by the total number of negative predictions
D) The number of true negative predictions divided by the total number of actual negatives
Answer: A) The number of true positive predictions divided by the total number of positive predictions
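Worked example: precision, alongside recall (the quantity described in option B), sketched from raw counts. The function names and counts are illustrative assumptions.

```python
def precision(true_positives, false_positives):
    """Share of positive predictions that were actually positive."""
    predicted_positive = true_positives + false_positives
    return true_positives / predicted_positive if predicted_positive else 0.0

def recall(true_positives, false_negatives):
    """Share of actual positives that the model found."""
    actual_positive = true_positives + false_negatives
    return true_positives / actual_positive if actual_positive else 0.0

# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
p = precision(8, 2)
r = recall(8, 8)
print(p, r)
```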
Which of the following is a type of non-probability sampling technique?
A) Random sampling
B) Stratified sampling
C) Snowball sampling
D) Systematic sampling
Answer: C) Snowball sampling
What does the term “Data Mining” refer to in analytics?
A) Extracting valuable information from data
B) Storing data in a secure location
C) Adding noise to data
D) Removing outliers from data
Answer: A) Extracting valuable information from data
Which statistical test is used to determine if there is a significant difference between the means of more than two groups?
A) T-test
B) Chi-square test
C) ANOVA
D) Regression analysis
Answer: C) ANOVA
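Worked example: one-way ANOVA compares variation between group means with variation inside the groups. This sketch computes only the F statistic (no p-value); the function name and group values are assumptions.

```python
def one_way_anova_f(*groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

f_stat = one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5])  # illustrative groups
print(f_stat)
```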
What is the primary goal of feature engineering in machine learning?
A) To create new features from existing data
B) To remove features from the dataset
C) To increase model complexity
D) To decrease model accuracy
Answer: A) To create new features from existing data
What does the term “Overfitting” refer to in machine learning?
A) When a model performs well on training data but poorly on new data
B) When a model performs poorly on training data
C) When a model is too simple
D) When a model is not trained properly
Answer: A) When a model performs well on training data but poorly on new data
Which of the following is NOT a classification algorithm?
A) Logistic Regression
B) K-nearest neighbors (KNN)
C) Linear Regression
D) Support Vector Machine (SVM)
Answer: C) Linear Regression
What is the purpose of regularization in machine learning?
A) To reduce overfitting by penalizing complex models
B) To increase model bias
C) To decrease model variance
D) To simplify feature selection
Answer: A) To reduce overfitting by penalizing complex models
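Worked example: a minimal sketch of L2 (ridge) regularization for a one-feature linear model with no intercept. Minimizing sum((y - w*x)^2) + lam * w^2 gives the closed form below; the function name, the penalty symbol `lam`, and the data are assumptions.

```python
def ridge_slope(xs, ys, lam):
    """Slope w minimizing sum((y - w*x)^2) + lam * w^2 for a no-intercept model."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs, ys = [1, 2, 3], [2, 4, 6]  # illustrative data where y = 2x exactly

unpenalized = ridge_slope(xs, ys, lam=0)   # ordinary least squares
penalized = ridge_slope(xs, ys, lam=14)    # the penalty shrinks the slope
print(unpenalized, penalized)
```

Larger penalty values pull the coefficient toward zero, trading a little bias for a simpler, less variable model.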
Which of the following is a method for handling missing data in a dataset?
A) Removing rows with missing data
B) Replacing missing data with the mean of the column
C) Ignoring missing data
D) All of the above
Answer: D) All of the above
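Worked example: two of the approaches above, sketched with plain Python lists where `None` marks a missing value. The helper names and data are hypothetical.

```python
import statistics

def drop_missing(rows):
    """Keep only rows with no missing (None) entries."""
    return [row for row in rows if None not in row]

def impute_mean(column):
    """Replace missing entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in column]

rows = [(1, 2), (None, 3), (4, 5)]  # illustrative records
complete_rows = drop_missing(rows)
filled_column = impute_mean([1, None, 3])
print(complete_rows, filled_column)
```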
What does the term “Confusion Matrix” represent in classification models?
A) A matrix that shows the relationship between variables
B) A matrix that shows the performance of a classification model
C) A matrix that shows the correlation between variables
D) A matrix that shows the mean of variables
Answer: B) A matrix that shows the performance of a classification model
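Worked example: a confusion matrix can be sketched by counting (actual, predicted) label pairs. Rows are actual classes and columns are predicted classes in this layout, which is a convention chosen here; the function name and labels are illustrative.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """matrix[i][j] = number of samples with actual labels[i], predicted labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]

y_true = [1, 0, 1, 1, 0]  # made-up actual labels
y_pred = [1, 0, 0, 1, 1]  # made-up model predictions
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(matrix)
```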
Which of the following is NOT a phase in the CRISP-DM (Cross-Industry Standard Process for Data Mining) framework?
A) Data Understanding
B) Data Visualization
C) Data Preparation
D) Evaluation
Answer: B) Data Visualization
What does the term “Logistic Regression” refer to in machine learning?
A) A regression algorithm used for predicting continuous outcomes
B) A regression algorithm used for predicting binary outcomes
C) A classification algorithm used for predicting continuous outcomes
D) A classification algorithm