Handling missing values in the dataframe
Missing values in datasets are the most common issue in the real world. It is often expected to have at least a few instances of missing data in huge chunks of datasets collected from various sources. Data can be missing for several reasons, which can range from anything from data not being generated at the source all the way to downtimes in data collectors. Handling missing data is very important for model training, as many ML algorithms don’t support missing data. Those that do may end up giving more importance to looking for patterns in the missing data, rather than the actual data that is present, which distracts the machine from learning.
Missing data is often referred to as Not Available (NA) or nan. Before we can send a dataframe for model training, we need to handle these types of values first. You can either drop the entire row that contains any missing values or you can fill them with any default value either default or common...