Feature engineering is a large undertaking for data scientists and machine learning engineers, and it is essential to building successful, production-ready machine learning pipelines. In the coming seven chapters, we are going to explore six major aspects of feature engineering:
- Feature understanding: learning how to identify data based on its qualities and quantitative state
- Feature improvement: cleaning and imputing missing data values in order to maximize the dataset's value
- Feature selection: statistically selecting and subsetting feature sets in order to reduce the noise in our data
- Feature construction: building new features with the intention of exploiting feature interactions
- Feature transformation: extracting latent (hidden) structure within datasets in order to mathematically transform our datasets into something new (and usually better)
- Feature learning: harnessing the power of deep learning to view data in a whole new light that will open up new problems to be solved
In this book, we will be exploring feature engineering as it relates to our machine learning endeavors. By breaking this large topic down into subtopics and diving deep into each one in a separate chapter, we will gain a much broader and more useful understanding of how these procedures work and how to apply each one in Python.
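To give a small taste of what applying these ideas in Python can look like, here is a minimal sketch that touches three of the aspects listed above: improvement (mean imputation), construction (an interaction feature), and selection (dropping a feature with zero variance). The toy dataset, the column names, and the helper `impute_mean` are all hypothetical and chosen only for illustration. The sketch uses nothing beyond the standard library, and the variance filter is just one simple way to select features, not the only one.

```python
from statistics import mean, pvariance

# Hypothetical toy dataset: (age, income, constant_flag); None marks a missing value.
rows = [
    (25, 40000.0, 1),
    (32, None, 1),
    (47, 82000.0, 1),
    (51, 61000.0, 1),
]
names = ["age", "income", "constant_flag"]


def impute_mean(column):
    """Feature improvement: replace missing values with the mean of the observed ones."""
    observed = [value for value in column if value is not None]
    fill = mean(observed)
    return [fill if value is None else value for value in column]


# Reshape the rows into named columns, imputing missing values along the way.
columns = {
    name: impute_mean([row[i] for row in rows])
    for i, name in enumerate(names)
}

# Feature construction: build an interaction feature from two existing columns.
columns["age_x_income"] = [
    age * income for age, income in zip(columns["age"], columns["income"])
]

# Feature selection: keep only columns that actually vary across rows.
selected = {
    name: column for name, column in columns.items() if pvariance(column) > 0
}

print(sorted(selected))
```

The constant column carries no information that could separate one row from another, so the variance filter removes it, while the imputed and constructed columns are kept.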
In our next chapter, we will dive straight into our first subsection, Feature understanding. We will finally be getting our hands on some real data, so let's begin!