diff --git a/docs/plans/prometheus-scrape-target-labels.md b/docs/plans/prometheus-scrape-target-labels.md
index cc754b9..a6a76f2 100644
--- a/docs/plans/prometheus-scrape-target-labels.md
+++ b/docs/plans/prometheus-scrape-target-labels.md
@@ -32,6 +32,24 @@ Values: free-form string, e.g. `"dns"`, `"build-host"`, `"database"`, `"monitori
 
 Recommendation: start with a single primary role string. If multi-role matching becomes a real need, switch to separate boolean labels.
 
+### `dns_role`
+
+For DNS servers specifically, distinguish between primary and secondary resolvers. The secondary resolver (ns2) receives very little traffic and has a cold cache, making generic cache hit ratio alerts inappropriate.
+
+Values: `"primary"`, `"secondary"`
+
+Example use case: The `unbound_low_cache_hit_ratio` alert fires on ns2 because its cache hit ratio (~62%) is lower than ns1 (~90%). This is expected behavior since ns2 gets ~100x less traffic. With a `dns_role` label, the alert can either exclude secondaries or use different thresholds:
+
+```promql
+# Only alert on primary DNS
+unbound_cache_hit_ratio < 0.7 and on(instance) unbound_up{dns_role="primary"}
+
+# Or use different thresholds
+(unbound_cache_hit_ratio < 0.7 and on(instance) unbound_up{dns_role="primary"})
+or
+(unbound_cache_hit_ratio < 0.5 and on(instance) unbound_up{dns_role="secondary"})
+```
+
 ## Implementation
 
 ### 1. Add `labels` option to `homelab.monitoring`