Model Validation and Testing
Nowadays, it is easy for almost anybody to start working in a machine-learning project with all the information available online. However, choosing the right algorithm for your data is a challenge when there are many alternatives available. Due to this, the right algorithm is chosen by a process of trial and error, where the different alternatives are tested.
Moreover, the decision process to arrive at a good model covers not only the selection of the algorithm but also the tuning of its hyperparameters. To do this, a conventional approach is to divide the data into three parts, training, validation, and testing sets, which will be explained further now.
Data Partition
Data partition is a process involving the division of the dataset into three subsets so that each set can be used for a different purpose. This way, the development of a model is not affected by the introduction of bias. The following is an explanation of each subset:
Training set: As the name suggests...