Cross-validation
Cross-validation is a very useful technique to evaluate the performance of a supervised method. We randomly split the dataset into k sub-datasets called folds (usually, 5 to 10). In each round, we keep one fold aside for testing, train the model on the remaining k-1 folds, and then test it on the held-out fold. We repeat this process of training and testing k times, each time holding out a different fold for testing.
In each iteration, we create a model and obtain a performance measure such as accuracy. Once we have finished, we have k performance measures, and we can estimate the performance of the modeling technique by averaging them.
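The procedure above can be sketched in base R. This is a minimal illustration, assuming the built-in `mtcars` data frame, a simple linear model, and RMSE as the performance measure; your own dataset, model, and metric will differ.

```r
set.seed(42)
k <- 5
data <- mtcars

# Randomly assign each row to one of k folds
folds <- sample(rep(1:k, length.out = nrow(data)))

rmse <- numeric(k)
for (i in 1:k) {
  train <- data[folds != i, ]  # k-1 folds for training
  test  <- data[folds == i, ]  # the held-out fold for testing
  model <- lm(mpg ~ wt + hp, data = train)
  pred  <- predict(model, newdata = test)
  rmse[i] <- sqrt(mean((test$mpg - pred)^2))
}

# Average the k performance measures
mean(rmse)
```

Each row is used for testing exactly once, so the averaged score reflects the whole dataset rather than a single lucky (or unlucky) split.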
Using Rattle, we can split the original dataset into training, validation, and testing partitions. Some R packages implement cross-validation when creating the model. If the model we are creating uses cross-validation, we can skip the creation of the validation dataset and only create the training and testing datasets.
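As an example of a package that performs cross-validation while building the model, the following sketch uses the `caret` package (an assumption; the chapter may use a different package), which resamples internally via `trainControl`:

```r
library(caret)

# Ask caret to run 5-fold cross-validation while fitting the model
ctrl <- trainControl(method = "cv", number = 5)

# Fit a linear model to mtcars; performance is averaged over the folds
fit <- train(mpg ~ wt + hp, data = mtcars,
             method = "lm", trControl = ctrl)

# Cross-validated performance estimates (RMSE, R-squared, MAE)
fit$results
```

Because `caret` already evaluates the model on held-out folds, a separate validation dataset is unnecessary; only a final test set is needed.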
When...