What this book covers
Chapter 1, Getting Started with Data Wrangling: In the opening chapter, you will embark on a journey into the world of data wrangling and discover the power of leveraging AWS for efficient and effective data manipulation and preparation. This chapter serves as a solid foundation, providing you with an overview of the key concepts and tools you’ll encounter throughout the book.
Chapter 2, Introduction to AWS Glue DataBrew: In this chapter, you will discover the powerful capabilities of AWS Glue DataBrew for data wrangling and preparation tasks. This chapter will guide you through the process of leveraging AWS Glue DataBrew to cleanse, transform, and enrich your data, ensuring its quality and usability for further analysis.
Chapter 3, Introducing AWS SDK for pandas: In this chapter, you will be introduced to the versatile capabilities of AWS SDK for pandas (formerly AWS Data Wrangler) for data wrangling tasks on the AWS platform. This chapter will provide you with a comprehensive understanding of AWS SDK for pandas and how it can empower you to efficiently manipulate and prepare your data for analysis.
Chapter 4, Introduction to SageMaker Data Wrangler: In this chapter, you will discover the capabilities of Amazon SageMaker Data Wrangler for data wrangling tasks within the Amazon SageMaker ecosystem. This chapter will equip you with the knowledge and skills to leverage Amazon SageMaker Data Wrangler’s powerful features to efficiently preprocess and prepare your data for machine learning projects.
Chapter 5, Working with Amazon S3: In this chapter, you will delve into the world of Amazon Simple Storage Service (S3) and explore its vast potential for storing, organizing, and accessing your data. This chapter will provide you with a comprehensive understanding of Amazon S3 and how it can be leveraged for effective data management and manipulation.
Chapter 6, Working with AWS Glue: In this chapter, you will dive into the powerful capabilities of AWS Glue, a fully managed extract, transform, and load (ETL) service provided by AWS. This chapter will guide you through the process of leveraging AWS Glue to automate and streamline your data preparation and transformation workflows.
Chapter 7, Working with Athena: In this chapter, you will explore the powerful capabilities of Amazon Athena, a serverless query service that enables you to analyze data directly in Amazon S3 using standard SQL queries. This chapter will guide you through the process of leveraging Amazon Athena to unlock valuable insights from your data, without the need for complex data processing infrastructure.
Chapter 8, Working with QuickSight: In this chapter, you will discover the power of Amazon QuickSight, a fast, cloud-powered business intelligence (BI) service provided by AWS. This chapter will guide you through the process of leveraging QuickSight to create interactive dashboards and visualizations, enabling you to gain valuable insights from your data.
Chapter 9, Building an End-to-End Data-Wrangling Pipeline with AWS SDK for Pandas: In this chapter, you will explore how AWS SDK for pandas builds on pandas, the popular Python library for data manipulation and analysis. This chapter will guide you through the process of leveraging pandas operations within AWS SDK for pandas to perform advanced data transformations and analysis on your datasets.
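As a small taste of the pandas operations that chapter builds on, here is a minimal, self-contained sketch of a group-and-aggregate transformation. The dataset and column names are illustrative only; the AWS SDK for pandas calls that read from and write to AWS services require an AWS account and are left to the chapter itself.

```python
import pandas as pd

# Illustrative sample data; in the book's pipelines, a DataFrame like this
# would typically be loaded from Amazon S3 via AWS SDK for pandas.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "region": ["us-east-1", "us-east-1", "eu-west-1", "eu-west-1"],
    "amount": [120.0, 80.0, 200.0, 50.0],
})

# Group by region and compute total and average order amounts.
summary = (
    orders.groupby("region", as_index=False)
          .agg(total=("amount", "sum"), avg=("amount", "mean"))
)
print(summary)
```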
Chapter 10, Data Processing for Machine Learning with SageMaker Data Wrangler: In this chapter, you will delve into the world of machine learning (ML) data optimization using the powerful capabilities of Amazon SageMaker Data Wrangler. This chapter will guide you through the process of leveraging SageMaker Data Wrangler to preprocess and prepare your data for ML projects, maximizing the performance and accuracy of your ML models.
Chapter 11, Data Lake Security and Monitoring: In this chapter, you will be introduced to AWS Identity and Access Management (IAM) and how closely AWS SDK for pandas integrates with AWS's security features. We will show how you can interact directly with Amazon CloudWatch Logs, run queries against your logs, and return the results as a pandas DataFrame.