1. Which function generates balanced cross-validation groupings from a set of data?
(A) createResample
(B) createSample
(C) createFolds
(D) none of these
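As a rough illustration of what "balanced" groupings means here, the sketch below assigns indices to folds so that each class is spread evenly across them. It is written in plain Python, and the name `stratified_folds` and its parameters are illustrative assumptions rather than any package's API.

```python
from collections import defaultdict
import random


def stratified_folds(labels, k, seed=0):
    """Assign each index to one of k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the shuffled members round-robin, continuing from where
        # the previous class stopped so fold sizes stay even.
        for i, idx in enumerate(members):
            folds[(offset + i) % k].append(idx)
        offset += len(members)
    return [sorted(fold) for fold in folds]


labels = ["a"] * 6 + ["b"] * 3
folds = stratified_folds(labels, k=3)
```

With six "a" labels and three "b" labels split into three folds, every fold receives two "a" indices and one "b" index.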
2. Which of the following statements is wrong?
(A) Three parameters are used for time series splitting
(B) Simple random sampling is probably the best way to resample time series data
(C) The horizon parameter is the number of consecutive values in the test set sample
(D) All of these
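The idea of a horizon in time series splitting can be sketched as a rolling-origin split: each training window is followed by a block of consecutive test values. The function `time_slices` and the parameter names `initial_window`, `horizon`, and `fixed_window` below are assumptions chosen for illustration, and indices are 0-based.

```python
def time_slices(n, initial_window, horizon, fixed_window=True):
    """Return (train, test) index lists for a rolling-origin split of n points."""
    slices = []
    # 'end' is the first index of the test block for this slice.
    for end in range(initial_window, n - horizon + 1):
        # A fixed window slides forward; otherwise training always starts at 0.
        start = end - initial_window if fixed_window else 0
        train = list(range(start, end))
        test = list(range(end, end + horizon))
        slices.append((train, test))
    return slices


slices = time_slices(n=6, initial_window=3, horizon=2)
growing = time_slices(n=6, initial_window=3, horizon=2, fixed_window=False)
```

The training indices always precede the test indices in time, which is the property that simple random sampling would break.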
3. Which of the following functions can be used to maximize the minimum dissimilarities?
(A) sumDiss
(B) avgDiss
(C) minDiss
(D) All of these
4. Which function creates the indices for time series splitting?
(A) createTimeSlices
(B) newTimeSlices
(C) binTimeSlices
(D) none of these
5. Which of the following statements is correct?
(A) caret includes several functions to pre-process the predictor data
(B) Asymptotics are typically used for inference
(C) The function dummyVars can be used to generate a complete set of dummy variables from one or more factors
(D) All of these
6. Which function creates sub-samples using a maximum dissimilarity approach?
(A) minDissim
(B) inmaxDissim
(C) maxDissim
(D) All of these
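A maximum dissimilarity approach can be sketched as a greedy loop: starting from an initial set, repeatedly add the candidate whose smallest distance to the already selected points is largest. The sketch below is plain Python with Euclidean distance; the name `max_dissim_sample` and its arguments are hypothetical, and the min-distance objective is just one possible choice.

```python
import math


def max_dissim_sample(pool, start, n_pick):
    """Greedily pick n_pick points from pool that are most unlike the selected set."""
    selected = list(start)
    remaining = list(pool)
    picked = []
    for _ in range(min(n_pick, len(remaining))):
        # Score each candidate by its minimum distance to the selected set,
        # then take the candidate with the largest such score.
        best = max(
            remaining,
            key=lambda p: min(math.dist(p, s) for s in selected),
        )
        picked.append(best)
        selected.append(best)
        remaining.remove(best)
    return picked


pool = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 0.0)]
picked = max_dissim_sample(pool, start=[(0.0, 1.0)], n_pick=2)
```

Here the two points far from the starting point are chosen first, while the two points close to it are left in the pool.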
7. Which function creates balanced splits of the data?
(A) newDataPartition
(B) createDataPartition
(C) renameDataPartition
(D) none of these
8. Which of the following tools does the caret package provide?
(A) model tuning
(B) feature selection
(C) pre-processing
(D) All of these
9. Which of the following functions is a wrapper for different lattice plots to visualize the data?
(A) featurePlot
(B) levelplot
(C) plotsample
(D) None of these
10. Which of the following statements is wrong?
(A) In every situation, the data generating mechanism can create predictors that only have a single unique value
(B) Predictors might have only a handful of unique values that occur with very low frequencies
(C) The function findLinearCombos uses the QR decomposition of a matrix to enumerate sets of linear combinations
(D) All of these
11. Which function identifies near zero-variance variables?
(A) nearZeroVar
(B) nearVar
(C) zeroVar
(D) All of these
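One common way to decide that a predictor has near zero variance combines two checks: the ratio of the most frequent value's count to the second most frequent value's count, and the percentage of distinct values. The sketch below assumes a non-empty list of values; the function name `near_zero_var` and the cutoffs 95/19 and 10 are assumptions used for illustration.

```python
from collections import Counter


def near_zero_var(values, freq_cut=95 / 19, unique_cut=10.0):
    """Flag a predictor whose values are dominated by one level (assumes non-empty input)."""
    counts = sorted(Counter(values).values(), reverse=True)
    if len(counts) == 1:
        return True  # zero variance: a single unique value

    # Ratio of the most common value's count to the second most common's.
    freq_ratio = counts[0] / counts[1]
    # Distinct values as a percentage of all observations.
    percent_unique = 100.0 * len(counts) / len(values)
    return freq_ratio > freq_cut and percent_unique < unique_cut


flagged = near_zero_var([0] * 98 + [1] * 2)
balanced = near_zero_var([0] * 50 + [1] * 50)
```

The first predictor is flagged because 98 of its 100 values are identical, while the evenly split predictor is not.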
12. Which function flags predictors for removal?
(A) searchCorrelation
(B) findCorrelation
(C) findCausation
(D) none of these
13. Which of the following statements is correct?
(A) findLinearColumns will also return a vector of column positions that can be removed to eliminate the linear dependencies
(B) The function findLinearRows can be used to generate a complete set of row variables from one factor
(C) findLinearCombos will return a list that enumerates dependencies
(D) None of these
14. Which function can impute data sets based only on information in the training set?
(A) preProcess
(B) postProcess
(C) process
(D) All of these
15. Which of the following can also be used to find new variables that are linear combinations of the original set such that the components are independent?
(A) PCA
(B) SCA
(C) ICA
(D) None of these
16. Which function generates the class distances?
(A) predict.classDist
(B) preprocess.classDist
(C) predict.classDistance
(D) All of these
17. varImp is a wrapper around the evimp function in which of the following packages?
(A) numpy
(B) plot
(C) earth
(D) none of these
18. Which of the following statements is wrong?
(A) An argument, para, is used to choose the model fitting technique
(B) For regression, the relationship between each predictor and the outcome is evaluated
(C) The trapezoidal rule is used to compute the area under the ROC curve
(D) All of these
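The trapezoidal rule for the area under a ROC curve can be illustrated with a short, dependency-free sketch. It treats a single predictor's values as scores, sweeps thresholds from high to low, and sums the trapezoids between consecutive ROC points. The function name `roc_auc_trapezoid` is hypothetical, labels are assumed to be 1 (positive) or 0 (negative), and both classes are assumed to be present.

```python
def roc_auc_trapezoid(scores, labels):
    """Area under the ROC curve via the trapezoidal rule (labels are 1 or 0)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos

    # Sweep thresholds from the highest score down; tied scores share one point.
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))

    # Sum the trapezoid areas between consecutive (FPR, TPR) points.
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2.0
    return auc


auc = roc_auc_trapezoid([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
perfect = roc_auc_trapezoid([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
```

In the first call, three of the four positive-negative pairs are ranked correctly, so the area is 0.75; a predictor that separates the classes perfectly gives 1.0.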
19. Which of the following curve analyses is conducted on each predictor for classification?
(A) ROC
(B) COC
(C) NOC
(D) All of these
20. Which of the following functions tracks the changes in model statistics?
(A) findTrack
(B) varImpTrack
(C) none of these
(D) varImp
21. Which of the following statements is correct?
(A) The difference between the class centroids and the overall centroid is used to measure the variable influence
(B) The Bagged Trees output holds variable usage statistics
(C) Boosted Trees uses a different approach from a single tree
(D) None of these
22. Which model includes a backward elimination feature selection routine?
(A) MCV
(B) MARS
(C) MCRS
(D) All of these
23. Which model sums the importance over each boosting iteration?
(A) Partial least squares
(B) Bagged trees
(C) Boosted trees
(D) None of these
24. Which argument is used to set importance values?
(A) set
(B) scale
(C) value
(D) All of these