docs: add dns_role label to scrape target labels plan
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m3s
Add proposed dns_role label to distinguish primary/secondary DNS resolvers. This addresses the unbound_low_cache_hit_ratio alert firing on ns2, which has a cold cache due to low traffic.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -32,6 +32,24 @@ Values: free-form string, e.g. `"dns"`, `"build-host"`, `"database"`, `"monitori
Recommendation: start with a single primary role string. If multi-role matching becomes a real need, switch to separate boolean labels.

### `dns_role`

For DNS servers specifically, distinguish between primary and secondary resolvers. The secondary resolver (ns2) receives very little traffic and has a cold cache, making generic cache hit ratio alerts inappropriate.

Values: `"primary"`, `"secondary"`

Example use case: The `unbound_low_cache_hit_ratio` alert fires on ns2 because its cache hit ratio (~62%) is lower than ns1 (~90%). This is expected behavior since ns2 gets ~100x less traffic. With a `dns_role` label, the alert can either exclude secondaries or use different thresholds:

```promql
# Only alert on primary DNS
unbound_cache_hit_ratio < 0.7 and on(instance) unbound_up{dns_role="primary"}

# Or use different thresholds
(unbound_cache_hit_ratio < 0.7 and on(instance) unbound_up{dns_role="primary"})
or
(unbound_cache_hit_ratio < 0.5 and on(instance) unbound_up{dns_role="secondary"})
```
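
For these selectors to match, each resolver has to carry the label on its scrape target. As a rough sketch only, the fragment below shows how ns2 might declare it, assuming the `labels` option proposed under Implementation ends up as an attribute set of strings on `homelab.monitoring`. The option's final shape is not settled by this plan, so treat the attribute path as hypothetical.

```nix
# Hypothetical host configuration for ns2.
# Assumes `homelab.monitoring.labels` is an attribute set of strings;
# this plan proposes the option but does not fix its exact form.
{
  homelab.monitoring.labels = {
    # Proposed values: "primary" or "secondary"
    dns_role = "secondary";
  };
}
```

Under the same assumption, ns1 would set `dns_role = "primary"`.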
## Implementation

### 1. Add `labels` option to `homelab.monitoring`