From 787c14c7a6944ea479ae2b1ca72591e302a98436 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Torjus=20H=C3=A5kestad?=
Date: Fri, 6 Feb 2026 01:23:34 +0100
Subject: [PATCH] docs: add dns_role label to scrape target labels plan

Add proposed dns_role label to distinguish primary/secondary DNS resolvers.
This addresses the unbound_low_cache_hit_ratio alert firing on ns2, which has
a cold cache due to low traffic.

Co-Authored-By: Claude Opus 4.5
---
 docs/plans/prometheus-scrape-target-labels.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/docs/plans/prometheus-scrape-target-labels.md b/docs/plans/prometheus-scrape-target-labels.md
index cc754b9..a6a76f2 100644
--- a/docs/plans/prometheus-scrape-target-labels.md
+++ b/docs/plans/prometheus-scrape-target-labels.md
@@ -32,6 +32,24 @@ Values: free-form string, e.g. `"dns"`, `"build-host"`, `"database"`, `"monitori
 
 Recommendation: start with a single primary role string. If multi-role matching becomes a real need, switch to separate boolean labels.
 
+### `dns_role`
+
+For DNS servers specifically, distinguish between primary and secondary resolvers. The secondary resolver (ns2) receives very little traffic and has a cold cache, making generic cache hit ratio alerts inappropriate.
+
+Values: `"primary"`, `"secondary"`
+
+Example use case: The `unbound_low_cache_hit_ratio` alert fires on ns2 because its cache hit ratio (~62%) is lower than ns1 (~90%). This is expected behavior since ns2 gets ~100x less traffic. With a `dns_role` label, the alert can either exclude secondaries or use different thresholds:
+
+```promql
+# Only alert on primary DNS
+unbound_cache_hit_ratio < 0.7 and on(instance) unbound_up{dns_role="primary"}
+
+# Or use different thresholds
+(unbound_cache_hit_ratio < 0.7 and on(instance) unbound_up{dns_role="primary"})
+or
+(unbound_cache_hit_ratio < 0.5 and on(instance) unbound_up{dns_role="secondary"})
+```
+
 ## Implementation
 
 ### 1. Add `labels` option to `homelab.monitoring`