Understanding Data Processing
A Machine Learning (ML) model is the output we get once data is fitted into an ML algorithm. It represents the underlying relationship between various features and how that relationship impacts the target variable. This relationship depends entirely on the contents of the dataset. What makes every ML model unique, despite using the same ML algorithm, is the dataset that is used to train said model. Data can be collected from various sources and can have different schemas and structures, which need not be structurally compatible among themselves but may in fact be related to each other. This relationship can be very valuable and can also potentially be the differentiator between a good and a bad model. Thus, it is important to transform this data to meet the requirements of the ML algorithm to eventually train a good model.
Data processing, data preparation, and data preprocessing are all steps in the ML pipeline that focus on best exposing the underlying...