Serverless Analytics with Amazon Athena

Query structured, unstructured, or semi-structured data in seconds without setting up any infrastructure

Product type: Paperback
Published: Nov 2021
Publisher: Packt
ISBN-13: 9781800562349
Length: 438 pages
Edition: 1st Edition
Authors: Aaron Wishnick, Mert Turkay Hocanin, Anthony Virtuoso

Table of Contents

Preface
Section 1: Fundamentals Of Amazon Athena
  Chapter 1: Your First Query
  Chapter 2: Introduction to Amazon Athena
  Chapter 3: Key Features, Query Types, and Functions
Section 2: Building and Connecting to Your Data Lake
  Chapter 4: Metastores, Data Sources, and Data Lakes
  Chapter 5: Securing Your Data
  Chapter 6: AWS Glue and AWS Lake Formation
Section 3: Using Amazon Athena
  Chapter 7: Ad Hoc Analytics
  Chapter 8: Querying Unstructured and Semi-Structured Data
  Chapter 9: Serverless ETL Pipelines
  Chapter 10: Building Applications with Amazon Athena
  Chapter 11: Operational Excellence – Monitoring, Optimization, and Troubleshooting
Section 4: Advanced Topics
  Chapter 12: Athena Query Federation
  Chapter 13: Athena UDFs and ML
  Chapter 14: Lake Formation – Advanced Topics
Other Books You May Enjoy

What this book covers

Chapter 1, Your First Query, is all about orienting you to the serverless analytics experience offered by Amazon Athena. For now, we will simplify things in order to run your first queries and demonstrate why so many people choose Amazon Athena for their workloads. This will help establish your mental model for the deeper discussions, features, and examples of later sections.

Chapter 2, Introduction to Amazon Athena, continues your introduction to Athena by discussing the service's capabilities, scalability, and pricing. You'll learn when to use Amazon Athena and how to estimate the performance and costs of your workloads before building them on Athena. We'll also take a look behind the scenes to see how Athena uses PrestoDB, an open source SQL engine from Facebook, to process your queries.

Chapter 3, Key Features, Query Types, and Functions, concludes our introduction to Amazon Athena by exploring built-in features you can use to make your reports or applications more powerful. This includes approximate query techniques to speed up analysis of large datasets and Create Table As Select (CTAS) statements for running queries that generate significant amounts of result data.

Chapter 4, Metastores, Data Sources, and Data Lakes, teaches you what a metastore is and what it contains. We will introduce the Apache Hive and AWS Glue Data Catalog implementations of a metastore. We'll then learn how to create tables through Athena or discover datasets in S3 using AWS Glue crawlers. Finally, we focus on a typical data lake architecture, which contains three different stages for data.

Chapter 5, Securing Your Data, covers the various methods that can be employed to secure your data and ensure it can only be viewed by those that have permission to do so.

Chapter 6, AWS Glue and AWS Lake Formation, demonstrates step by step how to build a secure data lake in Lake Formation and how Athena interacts with Lake Formation to keep data safe.

Chapter 7, Ad Hoc Analytics, focuses on how you can use Athena to quickly get to know your data, look for patterns, find outliers, and generally surface insights that will help you get the most from your data.

Chapter 8, Querying Unstructured and Semi-Structured Data, shows how Amazon Athena combines a traditional query engine, and its requirement for an upfront schema, with extensions that allow it to handle data that contains varying or no schema.

Chapter 9, Serverless ETL Pipelines, continues with the theme of controlling chaos by using automation to normalize newly arrived data through a process known as extract, transform, load (ETL).

Chapter 10, Building Applications with Amazon Athena, tells you what to do when integrating Amazon Athena into your applications. How will the application make Athena calls? How should credentials be stored? Should you use JDBC, ODBC, or Athena's SDK? What are the best practices and security considerations for setting up connectivity between your application and Athena? Lastly, what is the best way to store your data on S3 to optimize speed and cost? This chapter will answer all these questions and give examples, including working code, to get you started integrating with Athena quickly, easily, and securely.

Chapter 11, Operational Excellence – Monitoring, Optimization, and Troubleshooting, focuses on operational excellence by looking at what could go wrong when using Athena in a production environment. We'll learn how to monitor and alert on KPI breaches (such as queue dwell times) using CloudWatch metrics so you can avoid surprises. You'll also see how to optimize your data and queries to avoid problems before they happen. We'll then look at how the layout of data stored in S3 can have a significant impact on both cost and performance. Lastly, we will look at the most common reasons for query failure and review tips to help diagnose and correct failing queries.

Chapter 12, Athena Query Federation, is all about getting the most out of Amazon Athena by using Athena's Query Federation capabilities to expand beyond queries over data in S3. We will illustrate how Query Federation allows you to combine data from multiple sources (for example, S3 and Elasticsearch) to provide a single source of truth for your queries. Then we will look under the hood and explain how Amazon Athena uses AWS Lambda to run customizable connectors. We will even write our own connector to show you how easy it is to customize Athena with your own code.

Chapter 13, Athena UDFs and ML, continues the theme of enhancing Amazon Athena with our own functionality by adding our own user-defined functions and machine learning models. These capabilities allow us to do everything from applying ML inference to identify suspicious records in our dataset to converting port numbers in a VPC flow log to the common name for that port (for example, HTTP). In all of these examples, we add our own logic to Athena's row-level processing without the need to run any servers of our own.

Chapter 14, Lake Formation – Advanced Topics, covers some of the advanced features that Lake Formation brings to the table, and explores various use cases that are enabled by these features.
