Summary
In this chapter, you learned how to clean different types of data: continuous numeric values, text categories, and dates. You learned what to look for when exploring data and how to handle common data-cleaning problems. You created Python functions to handle common data-cleaning tasks. You also learned how to manage imbalanced data for classification problems and how to work with transformed data. Together, these skills help ensure that data problems do not hamper your model’s accuracy.
In the next chapter, you will learn more about feature engineering, which creates new input columns from existing data, and feature selection, which reduces the input columns to those with the most effect on the model. In that chapter, we will use a housing price dataset from Kaggle.com, available at https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/. In preparation...