Apache Airflow Best Practices

You're reading from Apache Airflow Best Practices: A practical guide to orchestrating data workflow with Apache Airflow

Product type Paperback
Published in Oct 2024
Publisher Packt
ISBN-13 9781805123750
Length 188 pages
Edition 1st Edition
Authors (3):
Dylan Storey
Dylan Intorf
Kendrick van Doorn

Table of Contents

Preface
Part 1: Apache Airflow: History, What, and Why
  Chapter 1: Getting Started with Airflow 2.0
  Chapter 2: Core Airflow Concepts
Part 2: Airflow Basics
  Chapter 3: Components of Airflow
  Chapter 4: Basics of Airflow and DAG Authoring
Part 3: Common Use Cases
  Chapter 5: Connecting to External Sources
  Chapter 6: Extending Functionality with UI Plugins
  Chapter 7: Writing and Distributing Custom Providers
  Chapter 8: Orchestrating a Machine Learning Workflow
  Chapter 9: Using Airflow as a Driving Service
Part 4: Scale with Your Deployed Instance
  Chapter 10: Airflow Ops: Development and Deployment
  Chapter 11: Airflow Ops Best Practices: Observation and Monitoring
  Chapter 12: Multi-Tenancy in Airflow
  Chapter 13: Migrating Airflow
Index
Other Books You May Enjoy

Scheduler

The previous sections covered how tasks are executed and how to support the different use cases for running task instances. To determine when these tasks should be scheduled for execution, we need to take a closer look at the Scheduler and its multiple responsibilities:

  • DAG Parsing: The scheduler continuously parses DAG files in the DAG Directory to look for new tasks to schedule. It determines the execution order based on dependencies set within the DAGs.
  • Heartbeat Mechanism: The scheduler operates in a loop, often referred to as the “heartbeat”, where it continually checks for tasks to run, schedules them, and then sleeps for a short duration before checking again.
  • Dynamic Task Scheduling: Unlike traditional cron setups where jobs are fixed, the Airflow scheduler dynamically determines which tasks should run based on their dependencies and state. This allows for more complex workflows with conditional execution paths.
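The responsibilities above can be illustrated with a toy loop. This is only a minimal sketch in plain Python, not Airflow's actual scheduler code: the task names, the `DEPENDENCIES` mapping, and the functions `runnable_tasks` and `run_scheduler` are all hypothetical, and "running" a task is reduced to marking it successful. It shows the general idea of a heartbeat that repeatedly checks which tasks are ready based on the state of their upstream dependencies.

```python
import time

# Hypothetical toy DAG: task name -> set of upstream task names.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}


def runnable_tasks(dependencies, state):
    """Return tasks that have not run yet and whose upstream tasks all succeeded."""
    return sorted(
        task
        for task, upstream in dependencies.items()
        if state[task] == "none"
        and all(state[u] == "success" for u in upstream)
    )


def run_scheduler(dependencies, heartbeat_seconds=0.0, max_beats=10):
    """Heartbeat loop: find ready tasks, 'run' them, sleep, then check again."""
    state = {task: "none" for task in dependencies}
    history = []  # one list of scheduled tasks per heartbeat
    for _ in range(max_beats):
        ready = runnable_tasks(dependencies, state)
        if not ready:
            break  # nothing left to schedule
        for task in ready:
            # Stand-in for handing the task to an executor.
            state[task] = "success"
        history.append(ready)
        time.sleep(heartbeat_seconds)
    return history, state


history, final_state = run_scheduler(DEPENDENCIES)
print(history)  # [['extract'], ['transform'], ['load', 'report']]
```

Because readiness is recomputed on every pass from the current task states, `load` and `report` become eligible in the same heartbeat once `transform` has succeeded, and a task whose upstream has failed would simply never be selected.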

Some...
