Data Cleaning, Imbalanced Data, and Other Data Problems
This chapter covers how to address common problems with real-life datasets. You will learn about data exploration and cleaning in more depth than in Chapters 2 and 4. By the end of this chapter, you will know how to clean different types of data, whether continuous numeric values, text categories, or dates. You will see what to look for when exploring data and how to handle common data-cleaning problems. You will also learn how to manage imbalanced data in classification problems and how to work with transformed data.
In this chapter, you will learn about the following main topics:
- Real-life data is never clean
- What to look for when exploring data
- Data-cleaning methods
- Handling imbalanced data