Experimenting with parameters that support cross-validation
When training a model on a dataset, we usually begin with a train-test split. Let's assume we split it in the ratio of 70% to 30%, where 70% of the data forms the training dataset and the remaining 30% forms the test dataset. We then pass the training dataset to the ML system for training and use the test dataset to measure the performance of the model. A train-test split is usually performed randomly: the 70% of the data that makes up the training dataset is sampled at random from the original dataset without replacement, and the remaining 30% forms the test dataset. The exceptions are time-series data, where the order of the events needs to be maintained, and cases where we need to keep the class proportions stratified across both splits.
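As a minimal sketch of the splits described above, the following example uses scikit-learn's `train_test_split` (the text does not name a specific library, so this choice is an assumption). It shows a random 70/30 split, a stratified split that preserves class proportions, and an ordered split for time-series-style data.

```python
# Sketch of the 70/30 train-test splits described above,
# using scikit-learn (assumed library, not named in the text).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)   # toy feature matrix: 20 samples, 2 features
y = np.array([0, 1] * 10)          # toy binary labels, balanced classes

# 1) Random 70/30 split: rows are sampled without replacement.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 2) Stratified split: both sets keep the original class proportions.
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# 3) Time-series-style split: disable shuffling to preserve event order,
#    so the test set is simply the last 30% of the rows.
X_tr_t, X_te_t, y_tr_t, y_te_t = train_test_split(
    X, y, test_size=0.3, shuffle=False
)

print(len(X_train), len(X_test))   # 14 train rows, 6 test rows
```

With `shuffle=False`, the test set is always the chronological tail of the data, which is what a time-series evaluation requires; with `stratify=y`, each class contributes proportionally to both splits.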
The following diagram shows how data from the dataset is...