monitoring: propagate host labels to Prometheus scrape targets
Extract homelab.host metadata (tier, priority, role, labels) from host configurations and propagate them to Prometheus scrape targets. This enables semantic alert filtering using labels instead of hardcoded instance names. Changes: - lib/monitoring.nix: Extract host metadata, group targets by labels - prometheus.nix: Use structured static_configs with labels - rules.yml: Replace instance filters with role-based filters Example labels in Prometheus: - ns1/ns2: role=dns, dns_role=primary/secondary - nix-cache01: role=build-host - testvm*: tier=test Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
@@ -5,20 +5,19 @@
|
||||
| Step | Status | Notes |
|
||||
|------|--------|-------|
|
||||
| 1. Create `homelab.host` module | ✅ Complete | `modules/homelab/host.nix` |
|
||||
| 2. Update `lib/monitoring.nix` | ❌ Not started | Labels not extracted or propagated |
|
||||
| 3. Update Prometheus config | ❌ Not started | Still uses flat target list |
|
||||
| 4. Set metadata on hosts | ⚠️ Partial | Some hosts configured, see below |
|
||||
| 5. Update alert rules | ❌ Not started | |
|
||||
| 6. Labels for service targets | ❌ Not started | Optional |
|
||||
| 2. Update `lib/monitoring.nix` | ✅ Complete | Labels extracted and propagated |
|
||||
| 3. Update Prometheus config | ✅ Complete | Uses structured static_configs |
|
||||
| 4. Set metadata on hosts | ✅ Complete | All relevant hosts configured |
|
||||
| 5. Update alert rules | ✅ Complete | Role-based filtering implemented |
|
||||
| 6. Labels for service targets | ✅ Complete | Host labels propagated to all services |
|
||||
|
||||
**Hosts with metadata configured:**
|
||||
- `ns1`, `ns2`: `role = "dns"`, `labels.dns_role = "primary"/"secondary"`
|
||||
- `nix-cache01`: `role = "build-host"` (missing `priority = "low"` from plan)
|
||||
- `nix-cache01`: `role = "build-host"`
|
||||
- `vault01`: `role = "vault"`
|
||||
- `jump`: `role = "bastion"`
|
||||
- `template`, `template2`, `testvm*`: `tier` and `priority` set
|
||||
- `testvm01/02/03`: `tier = "test"`
|
||||
|
||||
**Key gap:** The `homelab.host` module exists and some hosts use it, but `lib/monitoring.nix` does not extract these values—they are not propagated to Prometheus scrape targets.
|
||||
**Implementation complete.** Branch: `prometheus-scrape-target-labels`
|
||||
|
||||
---
|
||||
|
||||
@@ -119,7 +118,7 @@ Import this module in `modules/homelab/default.nix`.
|
||||
|
||||
### 2. Update `lib/monitoring.nix`
|
||||
|
||||
❌ **Not started.** The current implementation does not extract `homelab.host` values.
|
||||
✅ **Complete.** Labels are now extracted and propagated.
|
||||
|
||||
- `extractHostMonitoring` should also extract `homelab.host` values (priority, role, labels).
|
||||
- Build the combined label set from `homelab.host`:
|
||||
@@ -149,7 +148,7 @@ This requires grouping hosts by their label attrset and producing one `static_co
|
||||
|
||||
### 3. Update `services/monitoring/prometheus.nix`
|
||||
|
||||
❌ **Not started.** Still uses flat target list (`static_configs = [{ targets = nodeExporterTargets; }]`).
|
||||
✅ **Complete.** Now uses structured static_configs output.
|
||||
|
||||
Change the node-exporter scrape config to use the new structured output:
|
||||
|
||||
@@ -163,7 +162,7 @@ static_configs = nodeExporterTargets;
|
||||
|
||||
### 4. Set metadata on hosts
|
||||
|
||||
⚠️ **Partial.** Some hosts configured (see status table above). Current `nix-cache01` only has `role`, missing the `priority = "low"` suggested below.
|
||||
✅ **Complete.** All relevant hosts have metadata configured. Note: The implementation filters by `role` rather than `priority`, which matches the existing nix-cache01 configuration.
|
||||
|
||||
Example in `hosts/nix-cache01/configuration.nix`:
|
||||
|
||||
@@ -189,17 +188,11 @@ homelab.host = {
|
||||
|
||||
### 5. Update alert rules
|
||||
|
||||
❌ **Not started.** Requires steps 2-3 to be completed first.
|
||||
✅ **Complete.** Updated `services/monitoring/rules.yml`:
|
||||
|
||||
After implementing labels, review and update `services/monitoring/rules.yml`:
|
||||
- `high_cpu_load`: Replaced `instance!="nix-cache01..."` with `role!="build-host"` for standard hosts (15m duration) and `role="build-host"` for build hosts (2h duration).
|
||||
- `unbound_low_cache_hit_ratio`: Added `dns_role="primary"` filter to only alert on the primary DNS resolver (secondary has a cold cache).
|
||||
|
||||
- Replace instance-name exclusions with label-based filters (e.g. `{priority!="low"}` instead of `{instance!="nix-cache01.home.2rjus.net:9100"}`).
|
||||
- Consider whether any other rules should differentiate by priority or role.
|
||||
### 6. Labels for `generateScrapeConfigs` (service targets)
|
||||
|
||||
Specifically, the `high_cpu_load` rule currently has a nix-cache01 exclusion that should be replaced with a `priority`-based filter.
|
||||
|
||||
### 6. Consider labels for `generateScrapeConfigs` (service targets)
|
||||
|
||||
❌ **Not started.** Optional enhancement.
|
||||
|
||||
The same label propagation could be applied to service-level scrape targets. This is optional and can be deferred -- service targets are more specialized and less likely to need generic label-based filtering.
|
||||
✅ **Complete.** Host labels are now propagated to all auto-generated service scrape targets (unbound, homelab-deploy, nixos-exporter, etc.). This enables semantic filtering on any service metric, such as using `dns_role="primary"` with the unbound job.
|
||||
|
||||
Reference in New Issue
Block a user