docs: add homelab-deploy plan, unify host metadata
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled

Add plan for NATS-based deployment service (homelab-deploy) that enables
on-demand NixOS configuration updates via messaging. Features tiered
permissions (test/prod) enforced at NATS layer.

Update prometheus-scrape-target-labels plan to share the homelab.host
module for host metadata (tier, priority, role, labels) - single source
of truth for both deployment tiers and prometheus labels.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
2026-02-07 02:10:54 +01:00
parent 881e70df27
commit 4d724329a6
2 changed files with 388 additions and 14 deletions

View File

@@ -4,6 +4,8 @@
Add support for custom per-host labels on Prometheus scrape targets, enabling alert rules to reference host metadata (priority, role) instead of hardcoding instance names.
**Related:** This plan shares the `homelab.host` module with `docs/plans/nats-deploy-service.md`, which uses the same metadata for deployment tier assignment.
## Motivation
Some hosts have workloads that make generic alert thresholds inappropriate. For example, `nix-cache01` regularly hits high CPU during builds, requiring a longer `for` duration on `high_cpu_load`. Currently this is handled by excluding specific instance names in PromQL expressions, which is brittle and doesn't scale.
@@ -52,22 +54,59 @@ or
## Implementation
### 1. Add `labels` option to `homelab.monitoring`
This implementation uses a shared `homelab.host` module that provides host metadata for multiple consumers (Prometheus labels, deployment tiers, etc.). See also `docs/plans/nats-deploy-service.md` which uses the same module for deployment tier assignment.
In `modules/homelab/monitoring.nix`, add:
### 1. Create `homelab.host` module
Create `modules/homelab/host.nix` with shared host metadata options:
```nix
labels = lib.mkOption {
type = lib.types.attrsOf lib.types.str;
default = { };
description = "Custom labels to attach to this host's scrape targets";
};
{ lib, ... }:
{
options.homelab.host = {
tier = lib.mkOption {
type = lib.types.enum [ "test" "prod" ];
default = "prod";
description = "Deployment tier - controls which credentials can deploy to this host";
};
priority = lib.mkOption {
type = lib.types.enum [ "high" "low" ];
default = "high";
description = "Alerting priority - low priority hosts have relaxed thresholds";
};
role = lib.mkOption {
type = lib.types.nullOr lib.types.str;
default = null;
description = "Primary role of this host (dns, database, monitoring, etc.)";
};
labels = lib.mkOption {
type = lib.types.attrsOf lib.types.str;
default = { };
description = "Additional free-form labels (e.g., dns_role = 'primary')";
};
};
}
```
Import this module in `modules/homelab/default.nix`.
### 2. Update `lib/monitoring.nix`
- `extractHostMonitoring` should carry `labels` through in its return value.
- `generateNodeExporterTargets` currently returns a flat list of target strings. It needs to return structured `static_configs` entries instead, grouping targets by their label sets:
- `extractHostMonitoring` should also extract `homelab.host` values (priority, role, labels).
- Build the combined label set from `homelab.host`:
```nix
# Combine structured options + free-form labels
effectiveLabels =
(lib.optionalAttrs (host.priority != "high") { priority = host.priority; })
// (lib.optionalAttrs (host.role != null) { role = host.role; })
// host.labels;
```
- `generateNodeExporterTargets` returns structured `static_configs` entries, grouping targets by their label sets:
```nix
# Before (flat list):
@@ -80,7 +119,7 @@ labels = lib.mkOption {
]
```
This requires grouping hosts by their label attrset and producing one `static_configs` entry per unique label combination. Hosts with no custom labels get grouped together with no extra labels (preserving current behavior).
This requires grouping hosts by their label attrset and producing one `static_configs` entry per unique label combination. Hosts with default values (priority=high, no role, no labels) get grouped together with no extra labels (preserving current behavior).
### 3. Update `services/monitoring/prometheus.nix`
@@ -94,17 +133,29 @@ static_configs = [{ targets = nodeExporterTargets; }];
static_configs = nodeExporterTargets;
```
### 4. Set labels on hosts
### 4. Set metadata on hosts
Example in `hosts/nix-cache01/configuration.nix` or the relevant service module:
Example in `hosts/nix-cache01/configuration.nix`:
```nix
homelab.monitoring.labels = {
priority = "low";
homelab.host = {
tier = "test"; # can be deployed by MCP (used by homelab-deploy)
priority = "low"; # relaxed alerting thresholds
role = "build-host";
};
```
Example in `hosts/ns1/configuration.nix`:
```nix
homelab.host = {
tier = "prod";
priority = "high";
role = "dns";
labels.dns_role = "primary";
};
```
### 5. Update alert rules
After implementing labels, review and update `services/monitoring/rules.yml`: