Partitioning datasets and model optimization
As we've explained, in supervised learning we split the dataset into three subsets: training, validation, and testing.
Rattle uses the training dataset to create the model, or learner. After creating the model, we use the validation dataset to evaluate its performance. To improve that performance, depending on the algorithm we're using, we can adjust different tuning options. After tuning, we rebuild the model and evaluate its performance again. This is an iterative process: we create and evaluate the model until we're satisfied with its performance.
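To make the three-way split concrete, here is a minimal sketch in plain R, outside Rattle's point-and-click interface. The data frame name `dataset` is hypothetical, and the 70/15/15 proportions simply mirror Rattle's default partition settings:

```r
# A hand-rolled 70/15/15 split into training, validation, and testing sets.
# `dataset` is an assumed data frame; replace it with your own data.
set.seed(42)                                # make the split reproducible
n   <- nrow(dataset)
idx <- sample(n)                            # shuffle the row indices

train      <- dataset[idx[1:round(0.70 * n)], ]
validation <- dataset[idx[(round(0.70 * n) + 1):round(0.85 * n)], ]
testing    <- dataset[idx[(round(0.85 * n) + 1):n], ]
```

Rattle performs an equivalent partition for you when you tick the Partition option on its Data tab; the sketch only shows what that split amounts to in code.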
For simplicity, in this chapter we'll cover only model creation; in the following chapter, we'll cover model optimization. Keep in mind, though, that in real life this is an iterative process.
The examples in this chapter will not include any optimization.
Finally, when you're happy with the model, you can use the testing dataset to confirm its performance. You need a separate testing dataset because the model has already seen the training data, and your tuning decisions were guided by the validation data; only data the model has never touched gives an unbiased estimate of how it will perform on new cases.
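As a rough illustration of that workflow, the sketch below continues the hypothetical `dataset` split from earlier and assumes a factor target column named `target`. It builds a decision tree with rpart (one of the model builders Rattle wraps), checks it against the validation set, and touches the testing set only once, at the very end:

```r
library(rpart)   # decision trees, one of the algorithms available in Rattle

# Build the model on the training partition only
model <- rpart(target ~ ., data = train)

# Use the validation partition to judge (and later tune) the model
val_pred <- predict(model, validation, type = "class")
mean(val_pred == validation$target)      # validation accuracy

# Only when we're happy with the model: one final check on unseen data
test_pred <- predict(model, testing, type = "class")
mean(test_pred == testing$target)        # estimate of real-world performance
```

The point of the last two lines is that the testing accuracy is computed once, after all tuning is finished, so it isn't biased by the choices made during the iterative build-and-evaluate loop.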