The What and How of Cross-Validation for Supervised Learning
The Dichotomy Between Test and Validation
1. Metaphorical explainer for visceral mental imagery
Given the powerful effect of association and mental imagery, we start this note with an analogy: if we think of a supervised learning task as a stage play or a high-stakes exam, validation is the dress rehearsal or the mock exam we carry out to (1) test our strategy/knowledge in a safe environment and (2) form a feedback loop for improving that strategy/knowledge based on the validation outcome. The test, in turn, is the “moment of truth”: the real contest or the actual stage performance.
2. More rigorous explainer
In more formal language, the validation set is a portion of the training data that is partitioned out as a proxy for the test set, giving us some insight into the predictive capability of the model. Put more concretely, validation serves two main purposes under the umbrella of model selection:
- Select among different models based on validation scores
- Select optimal combinations of hyperparameters for each given model
In practice, these two activities are often carried out together via the scikit-learn classes GridSearchCV and RandomizedSearchCV.
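As a minimal sketch of what this looks like in code (the dataset, model, and parameter grid below are arbitrary choices for illustration, not recommendations):

```python
# Minimal sketch: hyperparameter selection via cross-validated grid search.
# Dataset, estimator, and grid are placeholder choices for illustration.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Every (C, gamma) combination is scored by 5-fold cross-validation;
# the best-scoring combination is then refit on the full training data.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # hyperparameters with the highest mean validation score
print(search.best_score_)   # the corresponding mean cross-validated score
```

RandomizedSearchCV follows the same pattern but samples a fixed number of candidate combinations instead of exhaustively trying them all.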
What Is the “Cross” for in Cross-Validation?
“Cross” has two layers of meaning:
- Signifies the use of sampling without replacement, so that each training example appears in the validation role exactly once across the folds (and in the training role in every other fold). The opposite approach, sampling with replacement, can be compared to the ineffective learning strategy of rereading the same material repeatedly and being fooled by the illusion of knowing.
- Circumvents the risk of fitting the model to one particular way of partitioning the data into training and validation sets. For example, in 5-fold cross-validation, if we did the partitioning once and arbitrarily decided to validate only against the first chunk of the dataset, we could get unlucky: some systematic bias might happen to be present in that particular partition, and the model would end up being distracted by that noise. A short sketch of the fold mechanics follows this list.
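To make the first point concrete, here is a small sketch (with a toy array and an arbitrary choice of five folds) showing that every example lands in the validation role exactly once:

```python
# Small sketch: how 5-fold cross-validation partitions a toy dataset.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 toy examples, 2 features each

kf = KFold(n_splits=5, shuffle=True, random_state=0)
seen_in_validation = []
for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    print(f"fold {fold}: train={train_idx}, validation={val_idx}")
    seen_in_validation.extend(int(i) for i in val_idx)

# Sampling without replacement: each example is validated exactly once.
assert sorted(seen_in_validation) == list(range(10))
```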
Why Are Validation and Test Both Necessary?
Across the cross-validation iterations, the model is eventually exposed to every example in the training data. The risk is that this cultivates a sense of familiarity in which it becomes easy to confuse rote memorization with actual learning. And just as in real life, when we may believe we have mastered the material yet do poorly on the real exam, a held-out test set is necessary to, well, really put the model’s learning to the test.
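A minimal sketch of this workflow, with a stock dataset and classifier chosen purely for illustration: hold out a test set first, run cross-validation only on the training portion, and touch the test set exactly once at the end.

```python
# Minimal sketch: keep the test set outside of the cross-validation loop.
# Dataset and model are illustrative placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set that never participates in training or validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# The "mock exams": 10-fold cross-validation, run only on the training portion.
cv_scores = cross_val_score(model, X_train, y_train, cv=10)
print("mean validation accuracy:", cv_scores.mean())

# The "real exam": fit on the full training set, then score once on the untouched test set.
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```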
Best Practices
The following guidelines can be found in Raschka and Mirjalili (2019):
- For datasets that are neither too large nor too small, a good standard value for k in k-fold cross-validation is 10; 10-fold CV is suggested to achieve the best tradeoff between bias and variance.
- A general rule of thumb: increase k in k-fold CV when the dataset is small. One core reason behind this rule is to maximize the size of the training data in each CV iteration; otherwise, because the dataset is already very small to start with, we risk a “pessimistic bias” when estimating model performance. The extreme case for very small training sets is leave-one-out cross-validation (LOOCV), in which the number of folds equals the number of training examples and each CV iteration holds out only a single example as validation data (a short sketch contrasting LOOCV with 10-fold CV follows this list).
- By contrast, when we work with a large dataset, the concern shifts toward computational cost and agility in parameter adjustment. This is when we want to decrease k (e.g., k = 5 can be a good start).
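To make the small-data end of this spectrum concrete, the sketch below runs LOOCV and 10-fold CV side by side on a deliberately tiny slice of a toy dataset (the dataset, classifier, and slice size are arbitrary choices for illustration):

```python
# Small sketch: LOOCV vs. 10-fold CV on a deliberately tiny dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_small, y_small = X[::5], y[::5]  # keep only 30 examples (10 per class)

model = KNeighborsClassifier(n_neighbors=3)

# LOOCV: as many folds as examples; each iteration validates on a single example.
loo_scores = cross_val_score(model, X_small, y_small, cv=LeaveOneOut())
print("LOOCV mean accuracy:", loo_scores.mean(), "over", len(loo_scores), "folds")

# 10-fold CV on the same data, for comparison.
kfold_scores = cross_val_score(model, X_small, y_small, cv=10)
print("10-fold mean accuracy:", kfold_scores.mean())
```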
Closing Words
Things that seem trivial are not always easy to articulate. In the same spirit as the “putting our learning to the test” argument above, articulating and forming associations and intuitions around core (and seemingly trivial) concepts is a crucial part of mastery.
References & Useful Resources
- Brownlee, J., 2017, “What is the Difference Between Test and Validation Datasets?”, blog post.
- Raschka, S. and Mirjalili, V., 2019, “Learning Best Practices for Model Evaluation and Hyperparameter Tuning”, Python Machine Learning — Third Edition.
- Brown, P.C., Roediger, H.L., and McDaniel, M.A., 2014, “Avoid Illusions of Knowing”, Make It Stick: The Science of Successful Learning.