Apache Airflow Best Practices

You're reading from Apache Airflow Best Practices: A practical guide to orchestrating data workflow with Apache Airflow

Product type: Paperback
Published in: Oct 2024
Publisher: Packt
ISBN-13: 9781805123750
Length: 188 pages
Edition: 1st Edition
Authors (3):
Dylan Storey
Dylan Intorf
Kendrick van Doorn
Table of Contents

Preface
Part 1: Apache Airflow: History, What, and Why
  Chapter 1: Getting Started with Airflow 2.0
  Chapter 2: Core Airflow Concepts
Part 2: Airflow Basics
  Chapter 3: Components of Airflow
  Chapter 4: Basics of Airflow and DAG Authoring
Part 3: Common Use Cases
  Chapter 5: Connecting to External Sources
  Chapter 6: Extending Functionality with UI Plugins
  Chapter 7: Writing and Distributing Custom Providers
  Chapter 8: Orchestrating a Machine Learning Workflow
  Chapter 9: Using Airflow as a Driving Service
Part 4: Scale with Your Deployed Instance
  Chapter 10: Airflow Ops: Development and Deployment
  Chapter 11: Airflow Ops Best Practices: Observation and Monitoring
  Chapter 12: Multi-Tenancy in Airflow
  Chapter 13: Migrating Airflow
Index
Other Books You May Enjoy

Extracting images from the NASA API

This pipeline extracts one image every day, stores it in a folder, and notifies you when the run completes. Apache Airflow orchestrates the entire process, and its scheduler handles the automatic daily re-runs. As stated earlier, it helps to first work through the API calls in a Jupyter notebook or a similar tool to confirm that the calls and connections behave as expected and to troubleshoot any issues.
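The three stages described above (extract, store, notify) can be prototyped as plain Python functions before any Airflow code is written. The following is a minimal sketch under stated assumptions, not the chapter's implementation: the function names, the injected `fetch` callable, the `.jpg` extension, and the message format are all hypothetical. In an Airflow deployment, each function would typically become a task, and the daily cadence would come from the DAG's schedule rather than from this script.

```python
import tempfile
from datetime import date
from pathlib import Path
from typing import Callable


def extract_image(fetch: Callable[[date], bytes], day: date) -> bytes:
    """Step 1: get the raw image bytes for one day (the fetcher is injected)."""
    return fetch(day)


def store_image(image: bytes, folder: Path, day: date) -> Path:
    """Step 2: write the bytes to <folder>/<YYYY-MM-DD>.jpg and return the path.

    The .jpg extension is an assumption made for this sketch.
    """
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{day.isoformat()}.jpg"
    target.write_bytes(image)
    return target


def notify(path: Path) -> str:
    """Step 3: build a completion message (a real pipeline might send it)."""
    return f"Saved image to {path}"


def run_pipeline(fetch: Callable[[date], bytes], folder: Path, day: date) -> str:
    """Chain the three steps in the order the pipeline describes."""
    image = extract_image(fetch, day)
    path = store_image(image, folder, day)
    return notify(path)


if __name__ == "__main__":
    # Stand-in fetcher so the sketch runs without network access.
    def fake_fetch(day: date) -> bytes:
        return b"not-a-real-image"

    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(fake_fetch, Path(tmp) / "images", date(2024, 10, 1)))
```

Injecting the fetcher keeps the storage and notification steps testable in a notebook before the real API call is wired in.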

The NASA API

For this data pipeline, we will extract data from NASA. My favorite API is the Astronomy Picture of the Day (APOD), which selects and displays a new photo each day. You can easily swap in another API that interests you, but for this example, I recommend sticking with APOD and exploring the others once you have completed it.

A NASA API key is required for the next step:

  1. Create a NASA API key (https://api.nasa.gov/).
  2. Input your name, email, and planned...
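Once a key is available, the APOD request can be tried out in a notebook. The sketch below uses only the Python standard library and is not taken from the chapter; the helper names `build_apod_url`, `fetch_apod`, and `image_filename` are hypothetical. It assumes the public APOD endpoint `https://api.nasa.gov/planetary/apod`, its `api_key` and `date` query parameters, and the `media_type` and `url` fields of the JSON response.

```python
import json
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlencode, urlparse
from urllib.request import urlopen

APOD_ENDPOINT = "https://api.nasa.gov/planetary/apod"


def build_apod_url(api_key: str, day: Optional[str] = None) -> str:
    """Build the APOD request URL; `day` is an optional YYYY-MM-DD string."""
    params = {"api_key": api_key}
    if day is not None:
        params["date"] = day
    return f"{APOD_ENDPOINT}?{urlencode(params)}"


def fetch_apod(api_key: str, day: Optional[str] = None, timeout: float = 30) -> dict:
    """Call the API and return the decoded JSON (requires network access)."""
    with urlopen(build_apod_url(api_key, day), timeout=timeout) as response:
        return json.load(response)


def image_filename(apod: dict) -> Optional[str]:
    """Return the file name from the response's `url`, or None if the
    entry for that day is not an image (APOD sometimes features a video)."""
    if apod.get("media_type") != "image":
        return None
    return PurePosixPath(urlparse(apod["url"]).path).name


# Example usage (commented out because it needs a key and network access):
# apod = fetch_apod("YOUR_API_KEY", "2024-10-01")
# print(apod["title"], image_filename(apod))
```

Keeping the URL construction separate from the network call makes it easy to check the request by eye before spending any of the key's rate limit.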