Airflow Ops Best Practices: Observation and Monitoring
In this chapter, we will continue to explore the application of modern “ops” practices within Apache Airflow, focusing on the observation and monitoring of your systems and DAGs after they’ve been deployed.
We’ll divide this discussion into two segments: the core Airflow system and individual DAGs. For each segment, we will cover the specific metrics and measurements you should monitor to drive alerting and, where necessary, intervention.
When we discuss monitoring in this section, we will consider two types: active and suppressive.
In an active monitoring scenario, a process explicitly checks a service’s health, records the result, and may take action directly based on the value it gets back.
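As an illustration of the active pattern, here is a minimal Python sketch. It assumes an Airflow 2.x webserver whose `/health` endpoint returns JSON in which each component (for example, `metadatabase` and `scheduler`) carries a `status` field. The path and payload shape may differ in other Airflow versions. The helper names `fetch_health`, `unhealthy_components`, and `check_once` are hypothetical, and the `on_unhealthy` hook stands in for whatever action you choose to take, such as paging someone or restarting a service.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Callable


def fetch_health(base_url: str, timeout: float = 5.0) -> dict:
    """Poll the webserver's health endpoint.

    Assumption: an Airflow 2.x webserver that serves JSON at /health.
    """
    url = f"{base_url.rstrip('/')}/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_components(payload: dict) -> list[str]:
    """Return the names of components whose reported status is not 'healthy'.

    Components that report no status at all (status is None) are skipped.
    This sketch treats them as "not reported" rather than as failures.
    """
    bad = []
    for name, info in payload.items():
        if not isinstance(info, dict):
            continue
        status = info.get("status")
        if status is not None and status != "healthy":
            bad.append(name)
    return sorted(bad)


def check_once(payload: dict, on_unhealthy: Callable[[list[str]], None]) -> bool:
    """Evaluate one health result and act on it directly.

    Calls the hook with the failing component names when anything is
    unhealthy. Returns True only if every reported component is healthy.
    """
    bad = unhealthy_components(payload)
    if bad:
        on_unhealthy(bad)
    return not bad
```

In practice, a scheduler such as cron or a monitoring agent would call `check_once(fetch_health("http://localhost:8080"), notify)` on a fixed interval, where `notify` is your own hypothetical alerting function. Keeping the parsing separate from the HTTP call makes the decision logic easy to test without a running Airflow instance.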
In a suppressive monitoring scenario, the absence of a state (or state change) is usually meaningful. In these scenarios, the monitored application sends an active schedule...