Reusing libraries in your Glue job
Spark provides a rich data processing framework that can be extended with additional plugins, libraries, and Python modules. As you build more jobs, you will likely want to reuse your own code, whether it's UDFs that process data in ways the built-in Spark functions can't, or pipeline code you apply regularly; for instance, a function bundling transformations you perform in many jobs.
In this recipe, you will see how you can reuse Python code on Glue for Spark jobs.
Getting ready
This recipe requires a bash shell with the AWS CLI installed and configured, and the GLUE_ROLE_ARN and GLUE_BUCKET environment variables set, as indicated in the Technical requirements section at the beginning of the chapter.
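If they are not set in your current shell, you can export both variables as follows; the values shown are placeholders, not part of the recipe, and must be replaced with your own role ARN and bucket name:

# Placeholder values -- replace with your own role ARN and bucket name
export GLUE_ROLE_ARN=arn:aws:iam::123456789012:role/MyGlueRole
export GLUE_BUCKET=my-glue-bucket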
How to do it...
- The following bash commands will create a Python module and config file:

mkdir my_module
cat <<EOF > my_module/__init__.py
from random import randint

def do_some_calculation(a):
    return ...
EOF
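Once the module exists, the usual way to make it available to a Glue for Spark job is to package it, upload it to Amazon S3, and reference it through the --extra-py-files job parameter; the job script can then use a regular import such as from my_module import do_some_calculation. The following bash commands are a minimal sketch of that flow; the job name, script location, and S3 paths are illustrative assumptions, not part of the recipe:

# Package the module so Python can import it from the zip root
zip -r my_module.zip my_module
# Upload the archive to S3 (illustrative path)
aws s3 cp my_module.zip s3://$GLUE_BUCKET/libs/my_module.zip
# Create a job that loads the module through the --extra-py-files special
# parameter (assumes a job script was already uploaded to the ScriptLocation below)
aws glue create-job \
    --name reuse-my-module \
    --role "$GLUE_ROLE_ARN" \
    --command Name=glueetl,ScriptLocation=s3://$GLUE_BUCKET/scripts/job.py \
    --default-arguments '{"--extra-py-files":"s3://'$GLUE_BUCKET'/libs/my_module.zip"}'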