Data Engineering with AWS Cookbook: A recipe-based approach to help you tackle data engineering problems with AWS services

Product type Paperback

Published in Nov 2024

Publisher Packt

ISBN-13 9781805127284

Length 528 pages

Edition 1st Edition

Languages

Python

Tools

AWS Glue

Concepts

Data Engineering, Cloud Computing

Authors (4):

Viquar Khan

Gonzalo Herreros González

Huda Nofal

Trâm Ngọc Phạm


Table of Contents (16) Chapters

Preface

1. Chapter 1: Managing Data Lake Storage

2. Chapter 2: Sharing Your Data Across Environments and Accounts

3. Chapter 3: Ingesting and Transforming Your Data with AWS Glue

4. Chapter 4: A Deep Dive into AWS Orchestration Frameworks

5. Chapter 5: Running Big Data Workloads with Amazon EMR

6. Chapter 6: Governing Your Platform

7. Chapter 7: Data Quality Management

8. Chapter 8: DevOps – Defining IaC and Building CI/CD Pipelines

9. Chapter 9: Monitoring Data Lake Cloud Infrastructure

10. Chapter 10: Building a Serving Layer with AWS Analytics Services

11. Chapter 11: Migrating to AWS – Steps, Strategies, and Best Practices for Modernizing Your Analytics and Big Data Workloads

12. Chapter 12: Harnessing the Power of AWS for Seamless Data Warehouse Migration

13. Chapter 13: Strategizing Hadoop Migrations – Cost, Data, and Workflow Modernization with AWS

14. Index

Why subscribe?

15. Other Books You May Enjoy

Code development on EMR using Workspaces

Developing data processing code on complex distributed frameworks is much more productive when it is done interactively, using representative data and checking the result of each transformation step. This has made languages that can be interpreted interactively, such as Python or Scala, more popular.

While you can do some interactive development in a shell, this stops being practical as the code grows. A more productive approach is a notebook made of cells: each cell holds and executes a block of code, but variables are shared across the notebook, so the work you do in one cell is visible to the others. That way, you can develop and test a small piece of code at a time and see the results.
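The shared-variable behavior of notebook cells can be sketched in plain Python. This is only an illustration of the idea, not how any particular notebook kernel is implemented: the `cells` list and the `run_cells` helper are hypothetical names, and each string stands in for the contents of one cell.

```python
# Each string stands in for the code typed into one notebook cell
# (hypothetical contents, chosen only for illustration).
cells = [
    # Cell 1: load a small, representative sample of data.
    "rows = [('a', 1), ('b', 2), ('a', 3)]",
    # Cell 2: transform it, reusing `rows` defined in the previous cell.
    "totals = {}\n"
    "for key, value in rows:\n"
    "    totals[key] = totals.get(key, 0) + value",
    # Cell 3: inspect the result, reusing `totals` from cell 2.
    "result = sorted(totals.items())",
]


def run_cells(cells):
    """Run each cell in order against one shared namespace."""
    namespace = {}
    for source in cells:
        # Every cell sees the variables created by the cells run before it.
        exec(source, namespace)
    return namespace


namespace = run_cells(cells)
print(namespace["result"])  # [('a', 4), ('b', 2)]
```

Because all cells run against the same namespace, a single cell can be edited and rerun without repeating the earlier steps, which is what makes this style practical for step-by-step development.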

EMR has traditionally supported this style of development with Apache Zeppelin, which can be installed on the cluster to run multiple types of notebooks including Spark or Bash, with multiple...

