Using data lake formats to store your data
Historically, big data technologies in the Hadoop ecosystem have accepted trade-offs in order to scale to volumes that traditional databases cannot handle. In the case of Apache Hive, which became the de facto SQL layer on Hadoop, external tables simply point to files on a storage layer such as HDFS or Amazon S3, and jobs read and write those files without any central system coordinating access or transactions. This is still how standard tables work in the Glue catalog.
As a result, the atomicity, consistency, isolation, and durability (ACID) guarantees of traditional RDBMSs were relaxed in exchange for scalability, which is acceptable in use cases that don't require concurrent writes or transactions, such as historical append-only tables.
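To make the pattern concrete, here is a minimal sketch of a classic Hive-style external table created through PySpark with Hive support enabled; the bucket, table, and column names are hypothetical and assume S3 access is already configured. The catalog entry only records the schema and the S3 location, and the table's content is simply whatever files sit under that prefix:

from pyspark.sql import SparkSession

# Hive support is needed for the EXTERNAL TABLE syntax used below.
spark = (SparkSession.builder
         .appName("external-table-sketch")
         .enableHiveSupport()
         .getOrCreate())

# The catalog only stores the schema and the location; no central service
# tracks or coordinates the files that live under that prefix.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
        order_id  STRING,
        amount    DOUBLE,
        sale_date DATE
    )
    STORED AS PARQUET
    LOCATION 's3://my-example-bucket/sales_raw/'
""")

# Any job can drop more Parquet files under the same prefix; readers simply
# list and scan whatever files exist at query time, with no transactions or
# isolation between concurrent writers and readers.

Because nothing defines which set of files forms a consistent snapshot, a query that runs while another job is writing can observe a partially written table, which is precisely the gap the data lake formats covered in this section aim to close.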
In recent years, the goal has been to bring back those ACID properties while keeping the data on an object store that offers cheap and virtually unlimited storage, with many clients and engines accessing the data in a distributed way.