Making robust alerts
The ability to build more robust alerts is one of the features that distinguishes Prometheus from traditional, check-based monitoring solutions such as Nagios. It allows you to take multiple factors into account when creating an alert. For example, rather than alerting on high memory usage alone, you can create an alert that fires only when a server shows both high memory usage and a high rate of major page faults, a combination that is generally a better indicator of memory pressure. The goal is to craft alerts that produce as few false positives as possible, so that they fire only when real, visible impact is occurring. This is part of a larger discussion on the philosophy of alerting on symptoms vs. causes, which is covered comprehensively in Rob Ewaschuk’s excellent document, My Philosophy on Alerting (linked at the end of this chapter).
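As a rough illustration of the memory example above, the following rule file is a minimal sketch rather than a recommended configuration. It assumes the metrics are exposed by a recent Node Exporter (node_memory_MemAvailable_bytes, node_memory_MemTotal_bytes, and node_vmstat_pgmajfault). The group name, alert name, thresholds, for duration, and severity label are all hypothetical values chosen for illustration.

```yaml
groups:
  - name: memory-pressure-example   # hypothetical group name
    rules:
      - alert: HostMemoryPressure   # hypothetical alert name
        # Fire only when BOTH conditions hold for the same target:
        #   1. more than 90% of memory is in use (assumed threshold)
        #   2. more than 100 major page faults per second (assumed threshold)
        expr: |
          (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
          and
          rate(node_vmstat_pgmajfault[5m]) > 100
        for: 10m                    # assumed duration; tune to your environment
        labels:
          severity: page            # hypothetical routing label
        annotations:
          summary: "Memory pressure on {{ $labels.instance }}"
```

The and operator keeps a series from the left-hand side only if the right-hand side has a series with the same labels, so the alert fires per instance only when both conditions are true at once. Suitable thresholds depend heavily on the workload, so treat the numbers here as placeholders.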
Use logical/set binary operators
In order to make robust alerts...