Summary
In this chapter, we covered Apache Spark, its relationship with AWS Glue, and the main features of AWS Glue: the AWS Glue Data Catalog for data discovery, AWS Glue crawlers for metadata extraction, and AWS Glue Studio for building ETL pipelines through a visual interface. We also explored how to subscribe to connectors through AWS Marketplace so that we can extract data from SaaS applications.
In the next chapter, we will discuss another essential service that plays a significant role in the data wrangling and discovery process: Amazon Athena.