Which of the following is not a benefit of cross validation?

This set of Data Science Multiple Choice Questions & Answers (MCQs) focuses on “Cross Validation”.

1. Which of the following is a correct use of cross validation?
a) Selecting variables to include in a model
b) Comparing predictors
c) Selecting parameters in prediction function
d) All of the mentioned

Answer: d
Explanation: Cross-validation is also used to pick the type of prediction function to use.

2. Point out the wrong combination.
a) True negative=correctly rejected
b) False negative=incorrectly rejected
c) False positive=correctly identified
d) All of the mentioned

Answer: c
Explanation: False positive means incorrectly identified.

3. Which of the following is a common error measure?
a) Sensitivity
b) Median absolute deviation
c) Specificity
d) All of the mentioned

Answer: d
Explanation: Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as a classification function. Median absolute deviation is an error measure for continuous outcomes.
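
The terms from questions 2 and 3 can be tied together with a small sketch. The labels below are made up for illustration, and the function names are hypothetical helpers, not part of any library. Sensitivity is the fraction of actual positives that are correctly identified, and specificity is the fraction of actual negatives that are correctly rejected.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # correctly identified
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # correctly rejected
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # incorrectly identified
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # incorrectly rejected
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    # Fraction of actual positives that were correctly identified.
    return tp / (tp + fn)


def specificity(tn, fp):
    # Fraction of actual negatives that were correctly rejected.
    return tn / (tn + fp)


# Made-up labels for illustration only.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)        # 3 4 2 1
print(sensitivity(tp, fn))   # 0.75
print(specificity(tn, fp))   # 4/6, about 0.667
```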

4. Which of the following is not a machine learning algorithm?
a) SVG
b) SVM
c) Random forest
d) None of the mentioned

Answer: a
Explanation: SVG stands for Scalable Vector Graphics, an image format. SVM stands for support vector machine.

5. Point out the wrong statement.
a) ROC curve stands for receiver operating characteristic
b) For time series, data must be in chunks
c) Random sampling must be done with replacement
d) None of the mentioned

Answer: c
Explanation: In cross-validation, random sampling must be done without replacement. Random sampling with replacement is the bootstrap.

6. Which of the following is a performance measure for a categorical outcome?
a) RMSE
b) RSquared
c) Accuracy
d) All of the mentioned

Answer: c
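Explanation: Accuracy is the fraction of correct predictions and applies to categorical outcomes. RMSE (Root Mean Squared Error) and R-squared apply to continuous outcomes.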
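
To make the distinction concrete, here is a minimal sketch of both measures in plain Python. The data values and function names are invented for illustration. Accuracy compares class labels, while RMSE compares numeric values.

```python
import math


def accuracy(actual, predicted):
    # Fraction of predictions that match the true class label.
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    # Square root of the mean squared difference between true and predicted values.
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Categorical outcome: three of the four labels match.
labels_true = ["spam", "ham", "ham", "spam"]
labels_pred = ["spam", "ham", "spam", "spam"]
acc = accuracy(labels_true, labels_pred)
print(acc)  # 0.75

# Continuous outcome: only the last prediction is off, by 2.
values_true = [1.0, 2.0, 3.0, 4.0]
values_pred = [1.0, 2.0, 3.0, 6.0]
err = rmse(values_true, values_pred)
print(err)  # sqrt(4 / 4) = 1.0
```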

7. For k-fold cross-validation, a larger k value implies more bias.
a) True
b) False

Answer: b
Explanation: For k-fold cross-validation, a larger k value implies less bias, because each training set is closer in size to the full data set.

8. Which of the following methods is used for trainControl resampling?
a) repeatedcv
b) svm
c) bag32
d) none of the mentioned

Answer: a
Explanation: repeatedcv stands for repeated cross-validation.

9. Which of the following can be used to create the most common graph types?
a) qplot
b) quickplot
c) plot
d) all of the mentioned

Answer: a
Explanation: qplot() is short for a quick plot.

10. For k-fold cross-validation, a smaller k value implies less variance.
a) True
b) False

Answer: a
Explanation: A larger k value implies more variance.

Sanfoundry Global Education & Learning Series – Data Science.

Cross validation is a model evaluation method that is better than residuals. The problem with residual evaluations is that they do not give an indication of how well the learner will do when it is asked to make new predictions for data it has not already seen. One way to overcome this problem is to not use the entire data set when training a learner. Some of the data is removed before training begins. Then when training is done, the data that was removed can be used to test the performance of the learned model on "new" data. This is the basic idea for a whole class of model evaluation methods called cross validation.

The holdout method is the simplest kind of cross validation. The data set is separated into two sets, called the training set and the testing set. The function approximator fits a function using the training set only. Then the function approximator is asked to predict the output values for the data in the testing set (it has never seen these output values before). The errors it makes are accumulated as before to give the mean absolute test set error, which is used to evaluate the model. The advantage of this method is that it is usually preferable to the residual method and takes no longer to compute. However, its evaluation can have a high variance. The evaluation may depend heavily on which data points end up in the training set and which end up in the test set, and thus the evaluation may be significantly different depending on how the division is made.
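
The holdout procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "function approximator" is swapped for an ordinary least-squares line fit, the data is synthetic, and the helper names (`fit_line`, `holdout_error`) are hypothetical. Running it with several seeds shows how the estimate moves with the split.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, points):
    """Mean absolute difference between predicted and true outputs."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)


def holdout_error(points, test_fraction=0.3, seed=0):
    """Split once into training and testing sets, fit on one, score on the other."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    model = fit_line(train)             # the model sees the training set only
    return mean_abs_error(model, test)  # mean absolute test set error


# Synthetic data for illustration: a noisy line y = 2x + 1.
rng = random.Random(42)
data = [(float(x), 2.0 * x + 1.0 + rng.gauss(0, 1.0)) for x in range(30)]

# Different splits of the same data give different error estimates.
for seed in range(3):
    print(seed, holdout_error(data, seed=seed))
```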

K-fold cross validation is one way to improve over the holdout method. The data set is divided into k subsets, and the holdout method is repeated k times. Each time, one of the k subsets is used as the test set and the other k-1 subsets are put together to form a training set. Then the average error across all k trials is computed. The advantage of this method is that it matters less how the data gets divided. Every data point gets to be in a test set exactly once, and gets to be in a training set k-1 times. The variance of the resulting estimate is reduced as k is increased. The disadvantage of this method is that the training algorithm has to be rerun from scratch k times, which means it takes k times as much computation to make an evaluation. A variant of this method is to randomly divide the data into a test and training set k different times. The advantage of doing this is that you can independently choose how large each test set is and how many trials you average over.
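
The bookkeeping described above can be checked with a short sketch. The fold assignment here (shuffle, then deal indices round-robin into k folds) is one simple choice among several, and the "model" is a stand-in that predicts the mean of the training outputs. Both are assumptions made for illustration.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def k_fold_error(ys, k, seed=0):
    """Average test error over k folds for a predictor that outputs the training mean."""
    fold_errors = []
    for train, test in k_fold_splits(len(ys), k, seed):
        prediction = sum(ys[i] for i in train) / len(train)  # "training" step
        fold_errors.append(sum(abs(ys[i] - prediction) for i in test) / len(test))
    return sum(fold_errors) / k


# Check the claim: each point is tested exactly once and trained on k-1 times.
n, k = 10, 5
test_counts = [0] * n
train_counts = [0] * n
for train, test in k_fold_splits(n, k):
    for i in test:
        test_counts[i] += 1
    for i in train:
        train_counts[i] += 1
print(test_counts)   # every entry is 1
print(train_counts)  # every entry is k - 1 = 4

# Made-up outputs for illustration only.
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(k_fold_error(ys, k=5))
```

The training step runs once per fold, which is where the k-fold cost mentioned above comes from.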

Leave-one-out cross validation is K-fold cross validation taken to its logical extreme, with K equal to N, the number of data points in the set. That means that N separate times, the function approximator is trained on all the data except for one point and a prediction is made for that point. As before, the average error is computed and used to evaluate the model. The evaluation given by leave-one-out cross validation error (LOO-XVE) is good, but at first pass it seems very expensive to compute. Fortunately, locally weighted learners can make LOO predictions just as easily as they make regular predictions. That means computing the LOO-XVE takes no more time than computing the residual error, and it is a much better way to evaluate models. We will see shortly that Vizier relies heavily on LOO-XVE to choose its metacodes.
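
A toy example shows why LOO need not be expensive for some learners. The sketch below does not use a locally weighted learner. It swaps in a much simpler predictor, the mean of the training outputs, purely as an illustration. For that predictor, each leave-one-out prediction can be read off from the full-data sum instead of retraining N times, and both routes give the same error.

```python
def loo_error_naive(ys):
    """Mean absolute LOO error, retraining the mean predictor N separate times."""
    n = len(ys)
    total = 0.0
    for i in range(n):
        train = ys[:i] + ys[i + 1:]           # all points except the i-th
        prediction = sum(train) / len(train)  # "retrain" on the remaining N-1 points
        total += abs(ys[i] - prediction)
    return total / n


def loo_error_fast(ys):
    """Same quantity, but every LOO prediction reuses the full-data sum."""
    n = len(ys)
    full_sum = sum(ys)
    return sum(abs(y - (full_sum - y) / (n - 1)) for y in ys) / n


# Made-up outputs for illustration only.
ys = [2.0, 4.0, 6.0, 8.0]
print(loo_error_naive(ys))
print(loo_error_fast(ys))
```

For these four values the absolute LOO errors are 4, 4/3, 4/3 and 4, so both functions return their mean, 8/3.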

Figure 26: Cross validation checks how well a model generalizes to new data

Fig. 26 shows an example of cross validation performing better than residual error. The data set in the top two graphs is a simple underlying function with significant noise. Cross validation tells us that broad smoothing is best. The data set in the bottom two graphs is a complex underlying function with no noise. Cross validation tells us that very little smoothing is best for this data set.

Now we return to the question of choosing a good metacode for data set a1.mbl:

File -> Open -> a1.mbl
Edit -> Metacode -> A90:9
Model -> LOOPredict
Edit -> Metacode -> L90:9
Model -> LOOPredict
Edit -> Metacode -> L10:9
Model -> LOOPredict

LOOPredict goes through the entire data set and makes LOO predictions for each point. At the bottom of the page it shows the summary statistics, including Mean LOO error, RMS LOO error, and information about the data point with the largest error. The mean absolute LOO-XVEs for the three metacodes given above (the same three used to generate the graphs in fig. 25) are 2.98, 1.23, and 1.80. Those values show that global linear regression is the best metacode of those three, which agrees with our intuitive feeling from looking at the plots in fig. 25. If you repeat the above operation on data set b1.mbl you'll get the values 4.83, 4.45, and 0.39, which also agrees with our observations.

Jeff Schneider
Fri Feb 7 18:00:08 EST 1997

Which of the following best describes a goal of cross

The goal of cross-validation is to estimate the expected level of fit of a model to a data set that is independent of the data that were used to train the model. It can be used to estimate any quantitative measure of fit that is appropriate for the data and model.

Why is cross

Cross-validation is a very powerful tool. It helps us make better use of our data, and it gives us much more information about our algorithm's performance. In complex machine learning models, it is sometimes easy to not pay enough attention and to use the same data in different steps of the pipeline.

Which of the following is a common error measure?

Sensitivity and specificity are common error measures. They are statistical measures of the performance of a binary classification test, also known in statistics as a classification function.

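In which situations would you recommend the leave-one-out method for validation of data mining results?

The leave-one-out cross-validation procedure is appropriate when you have a small dataset or when an accurate estimate of model performance is more important than the computational cost of the method.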