Summary
In this chapter, we first explored various techniques and common functions used to preprocess a dataframe before it is sent for model training. We looked at how to reframe a raw dataframe into a consistent format that meets the requirements of model training. We learned how to manipulate a dataframe's columns by combining them with columns from other dataframes. We also learned how to combine rows from partitioned dataframes and how to merge dataframes directly into a single dataframe.
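As a quick refresher, the three reframing operations can be sketched with pandas. This is an assumption: the summary does not name the library the chapter uses, so the function names there may differ. The frames and column names (`customer_id`, `age`, `total_spend`) are hypothetical.

```python
import pandas as pd

# Hypothetical frames describing the same customers (illustrative data only)
ids = pd.DataFrame({"customer_id": [1, 2, 3]})
ages = pd.DataFrame({"age": [34, 51, 27]})

# Combine columns: place the columns of `ages` next to those of `ids`.
# pd.concat with axis=1 aligns on the row index, so both frames must share one.
wide = pd.concat([ids, ages], axis=1)

# Combine rows: stack two partitions that share the same columns.
part_a = wide.iloc[:2]
part_b = wide.iloc[2:]
stacked = pd.concat([part_a, part_b], axis=0, ignore_index=True)

# Merge: join a second frame on a shared key column.
# how="left" keeps every customer, leaving NaN where no spend is recorded.
spend = pd.DataFrame({"customer_id": [1, 3], "total_spend": [120.0, 75.5]})
merged = stacked.merge(spend, on="customer_id", how="left")

print(merged)
```

Note that the left merge introduces a missing `total_spend` for customer 2, which is exactly the kind of gap the missing-value techniques then have to handle.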
Once we knew how to reframe our dataframes, we learned how to handle the missing values that are often present in freshly collected data. We learned how to fill NA values, how to replace incorrect values, and how to use different imputation strategies to avoid adding noise and bias when filling in missing values.
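The sketch below recaps the same ideas in pandas, again assuming pandas rather than whatever library the chapter uses. The data is hypothetical, and so is the convention that `-1` marks a failed reading. It replaces an incorrect value, fills a categorical gap with a constant, and imputes numeric gaps with a per-group median. This is one strategy among several, chosen here because the median is less sensitive to outliers than the mean.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings: -1 is an assumed sentinel for a failed reading,
# and NaN/None mark values that were never recorded.
df = pd.DataFrame({
    "site": ["a", "a", "b", "b", "b"],
    "temp": [20.0, np.nan, 30.0, -1.0, 34.0],
    "status": ["ok", None, "ok", "ok", None],
})

# Replace incorrect values: treat the sentinel as missing.
df["temp"] = df["temp"].replace(-1.0, np.nan)

# Fill NA values with a constant for a categorical column.
df["status"] = df["status"].fillna("unknown")

# Impute numeric gaps with the per-site median rather than one global value,
# so sites with different typical readings are not pulled toward each other.
df["temp"] = df["temp"].fillna(df.groupby("site")["temp"].transform("median"))

print(df)
```

A single global mean would fill both gaps with the same value and blur the difference between sites. Imputing within groups avoids that bias.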
We then investigated how to manipulate feature columns by sorting dataframes by column, as well as changing...