docs: update plan status and move completed nats-deploy plan
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
- Move nats-deploy-service.md to completed/ folder
- Update prometheus-scrape-target-labels.md with implementation status
- Add status table showing which steps are complete/partial/not started
- Update cross-references to point to new location

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -1,10 +1,32 @@
# Prometheus Scrape Target Labels

## Implementation Status

| Step | Status | Notes |
|------|--------|-------|
| 1. Create `homelab.host` module | ✅ Complete | `modules/homelab/host.nix` |
| 2. Update `lib/monitoring.nix` | ❌ Not started | Labels not extracted or propagated |
| 3. Update Prometheus config | ❌ Not started | Still uses flat target list |
| 4. Set metadata on hosts | ⚠️ Partial | Some hosts configured, see below |
| 5. Update alert rules | ❌ Not started | |
| 6. Labels for service targets | ❌ Not started | Optional |

**Hosts with metadata configured:**

- `ns1`, `ns2`: `role = "dns"`, `labels.dns_role = "primary"/"secondary"`
- `nix-cache01`: `role = "build-host"` (missing `priority = "low"` from plan)
- `vault01`: `role = "vault"`
- `jump`: `role = "bastion"`
- `template`, `template2`, `testvm*`: `tier` and `priority` set

**Key gap:** The `homelab.host` module exists and some hosts use it, but `lib/monitoring.nix` does not extract these values; they are not propagated to Prometheus scrape targets.

---

## Goal

Add support for custom per-host labels on Prometheus scrape targets, enabling alert rules to reference host metadata (priority, role) instead of hardcoding instance names.

-**Related:** This plan shares the `homelab.host` module with `docs/plans/nats-deploy-service.md`, which uses the same metadata for deployment tier assignment.
+**Related:** This plan shares the `homelab.host` module with `docs/plans/completed/nats-deploy-service.md`, which uses the same metadata for deployment tier assignment.

## Motivation

@@ -54,12 +76,11 @@ or

## Implementation

-This implementation uses a shared `homelab.host` module that provides host metadata for multiple consumers (Prometheus labels, deployment tiers, etc.). See also `docs/plans/nats-deploy-service.md` which uses the same module for deployment tier assignment.
+This implementation uses a shared `homelab.host` module that provides host metadata for multiple consumers (Prometheus labels, deployment tiers, etc.). See also `docs/plans/completed/nats-deploy-service.md` which uses the same module for deployment tier assignment.
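
For orientation, here is a minimal sketch of what such a shared module might look like. The option names (`tier`, `priority`, `role`, `labels`) come from this plan, and the defaults `"prod"` and `"high"` follow the note in step 4. The exact types, the enum values, and the descriptions are assumptions and may differ from the real `modules/homelab/host.nix`.

```nix
# Hypothetical sketch of a shared host-metadata module.
# Types, enum values, and descriptions are assumed, not copied from the repository.
{ lib, ... }:
{
  options.homelab.host = {
    tier = lib.mkOption {
      type = lib.types.enum [ "test" "prod" ];
      default = "prod";
      description = "Deployment tier, consumed by the deploy tooling.";
    };
    priority = lib.mkOption {
      type = lib.types.enum [ "high" "low" ];
      default = "high";
      description = "Alerting priority, intended to become a Prometheus label.";
    };
    role = lib.mkOption {
      type = lib.types.nullOr lib.types.str;
      default = null;
      description = "Primary role of the host, for example dns or vault.";
    };
    labels = lib.mkOption {
      type = lib.types.attrsOf lib.types.str;
      default = { };
      description = "Free-form extra labels to attach to scrape targets.";
    };
  };
}
```

Keeping `labels` as a free-form attribute set would let individual hosts add keys such as `dns_role` without changing the module.
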

### 1. Create `homelab.host` module

-**Status:** Step 1 (Create `homelab.host` module) is complete. The module is in
-`modules/homelab/host.nix` with tier, priority, role, and labels options.
+✅ **Complete.** The module is in `modules/homelab/host.nix`.

Create `modules/homelab/host.nix` with shared host metadata options:

@@ -98,6 +119,8 @@ Import this module in `modules/homelab/default.nix`.

### 2. Update `lib/monitoring.nix`

❌ **Not started.** The current implementation does not extract `homelab.host` values.
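
To make the missing extraction concrete, the following sketch flattens a host's `homelab.host` values into a single label set. `hostLabels` is a hypothetical helper name, not an existing function in `lib/monitoring.nix`, and leaving `tier` out of the labels is an assumption based on the values listed in this step (priority, role, labels).

```nix
# Sketch only: `hostLabels` is a hypothetical helper, not part of lib/monitoring.nix.
let
  # Build one flat label set from a host's `homelab.host` values.
  # `role` is only included when it is set; free-form labels are merged last.
  hostLabels = host:
    { inherit (host) priority; }
    // (if host.role != null then { inherit (host) role; } else { })
    // host.labels;
in
  # Values for ns1 as listed in the status section, with the default priority.
  hostLabels {
    tier = "prod";
    priority = "high";
    role = "dns";
    labels = { dns_role = "primary"; };
  }
# evaluates to { dns_role = "primary"; priority = "high"; role = "dns"; }
```

Because `host.labels` is merged last, a free-form label with the same key would override `priority` or `role`; whether that is desirable is a design choice the plan does not settle.
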

- `extractHostMonitoring` should also extract `homelab.host` values (priority, role, labels).
- Build the combined label set from `homelab.host`:

@@ -126,6 +149,8 @@ This requires grouping hosts by their label attrset and producing one `static_co

### 3. Update `services/monitoring/prometheus.nix`

❌ **Not started.** Still uses flat target list (`static_configs = [{ targets = nodeExporterTargets; }]`).
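
As a rough sketch of the alternative to a flat list, the following groups targets by their label set and emits one `static_configs` entry per group. The input shape, the host names, and the helper name `toStaticConfig` are hypothetical; the real output of `lib/monitoring.nix` may be structured differently.

```nix
# Sketch only: input shape, host names, and helper names are assumptions.
let
  # One entry per host: the scrape target plus its already-computed labels.
  hosts = [
    { target = "host-a.example:9100"; labels = { priority = "low"; }; }
    { target = "host-b.example:9100"; labels = { priority = "low"; }; }
    { target = "host-c.example:9100"; labels = { priority = "high"; role = "vault"; }; }
  ];

  # Hosts with identical label sets serialize to the same JSON string,
  # so that string works as a grouping key.
  grouped = builtins.groupBy (h: builtins.toJSON h.labels) hosts;

  # Turn one group of hosts into a Prometheus static_configs entry.
  toStaticConfig = group: {
    targets = map (h: h.target) group;
    labels = (builtins.head group).labels;
  };
in
  # Result: two static_configs entries, one per distinct label set.
  map toStaticConfig (builtins.attrValues grouped)
```

This relies on `builtins.groupBy`, which needs a reasonably recent Nix; `lib.groupBy` from nixpkgs would be an alternative.
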

Change the node-exporter scrape config to use the new structured output:

```nix
@@ -138,29 +163,34 @@ static_configs = nodeExporterTargets;

### 4. Set metadata on hosts

⚠️ **Partial.** Some hosts configured (see status table above). Current `nix-cache01` only has `role`, missing the `priority = "low"` suggested below.

Example in `hosts/nix-cache01/configuration.nix`:

```nix
homelab.host = {
  tier = "test"; # can be deployed by MCP (used by homelab-deploy)
  priority = "low"; # relaxed alerting thresholds
  role = "build-host";
};
```

**Note:** Current implementation only sets `role = "build-host"`. Consider adding `priority = "low"` when label propagation is implemented.

Example in `hosts/ns1/configuration.nix`:

```nix
homelab.host = {
  tier = "prod";
  priority = "high";
  role = "dns";
  labels.dns_role = "primary";
};
```

**Note:** `tier` and `priority` use defaults ("prod" and "high"), which is the intended behavior. The current ns1/ns2 configurations match this pattern.

### 5. Update alert rules

❌ **Not started.** Requires steps 2-3 to be completed first.

After implementing labels, review and update `services/monitoring/rules.yml`:

- Replace instance-name exclusions with label-based filters (e.g. `{priority!="low"}` instead of `{instance!="nix-cache01.home.2rjus.net:9100"}`).

@@ -170,4 +200,6 @@ Specifically, the `high_cpu_load` rule currently has a nix-cache01 exclusion tha

### 6. Consider labels for `generateScrapeConfigs` (service targets)

❌ **Not started.** Optional enhancement.

The same label propagation could be applied to service-level scrape targets. This is optional and can be deferred -- service targets are more specialized and less likely to need generic label-based filtering.