To build sustainable, responsible machine learning workflows and develop applications that produce real value, we need to be able to measure how well our models perform. We also need to ensure that our models generalize to the data they will see in production. If we don't do these things, we are essentially shooting in the dark: we will have no understanding of our models' expected behavior, and we won't be able to improve them over time.
The process of measuring how well a model performs (with respect to certain data) is called evaluation. The process of ensuring that our model generalizes to the data we expect it to encounter is called validation. Both processes need to be present in every machine learning workflow and application, and we will cover both in this chapter.
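To make these two ideas concrete, here is a minimal sketch. It uses Python and scikit-learn purely for illustration (neither is assumed anywhere else in this chapter): validation is represented by holding out a portion of the data that the model never trains on, and evaluation is represented by computing accuracy on that held-out portion.

```python
# A minimal sketch of evaluation and validation, using Python and
# scikit-learn purely for illustration.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load an example dataset.
X, y = load_iris(return_X_y=True)

# Validation: hold out a portion of the data so we can check whether the
# model generalizes to examples it has never seen during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train a model on the training portion only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluation: measure how well the model performs on the held-out data.
predictions = model.predict(X_test)
print("held-out accuracy:", accuracy_score(y_test, predictions))
```

The specific dataset, model, and metric here are placeholders; the point is only the pattern of setting data aside for validation and then quantifying performance on it during evaluation.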
...