Summary
In this chapter, we learned how to use pandas operations to wrangle datasets. We walked through the stages of the data-wrangling life cycle (discovery, structuring, cleansing, enriching, data quality validation, and visualization) and saw how pandas operations support each of them. We also saw that AWS SDK for pandas (awswrangler) integrates with pandas DataFrames, so users can perform the same data-wrangling activities on AWS cloud services.
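As a compact recap, the life-cycle stages can be sketched in a few lines of pandas. This is a minimal illustration rather than code from the chapter: the sample data, the column names, and the `is_large` threshold are all assumptions made up for the example, and the visualization stage is left out to keep the sketch dependency-free.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "customer": ["ann", "bob", "bob", None],
    "amount": ["10.5", "20", "20", "7.25"],
})

# Discovery: count missing values per column.
missing_per_column = raw.isna().sum()

# Structuring: convert the amount column from strings to numbers.
structured = raw.assign(amount=pd.to_numeric(raw["amount"]))

# Cleansing: drop exact duplicate rows and fill missing customer names.
cleaned = structured.drop_duplicates().fillna({"customer": "unknown"})

# Enriching: add a derived flag (the threshold of 10 is an arbitrary assumption).
enriched = cleaned.assign(is_large=cleaned["amount"] > 10)

# Data quality validation: simple checks on the final DataFrame.
assert enriched["order_id"].is_unique
assert enriched["amount"].ge(0).all()

print(enriched)
```

When working against AWS, the DataFrame could be loaded with AWS SDK for pandas instead of being built in memory, for example with `wr.s3.read_csv`, after which the same pandas steps apply unchanged.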
In the next chapter, we will learn about SageMaker Data Wrangler, which helps perform data-wrangling activities as part of ML pipelines.