docs: add homelab-deploy plan, unify host metadata
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Add plan for NATS-based deployment service (homelab-deploy) that enables on-demand NixOS configuration updates via messaging. Features tiered permissions (test/prod) enforced at NATS layer.

Update prometheus-scrape-target-labels plan to share the homelab.host module for host metadata (tier, priority, role, labels) - single source of truth for both deployment tiers and prometheus labels.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -4,6 +4,8 @@

Add support for custom per-host labels on Prometheus scrape targets, enabling alert rules to reference host metadata (priority, role) instead of hardcoding instance names.

**Related:** This plan shares the `homelab.host` module with `docs/plans/nats-deploy-service.md`, which uses the same metadata for deployment tier assignment.

## Motivation

Some hosts have workloads that make generic alert thresholds inappropriate. For example, `nix-cache01` regularly hits high CPU during builds, requiring a longer `for` duration on `high_cpu_load`. Currently this is handled by excluding specific instance names in PromQL expressions, which is brittle and doesn't scale.
@@ -52,22 +54,59 @@ or

## Implementation

### 1. Add `labels` option to `homelab.monitoring`
This implementation uses a shared `homelab.host` module that provides host metadata for multiple consumers (Prometheus labels, deployment tiers, etc.). See also `docs/plans/nats-deploy-service.md` which uses the same module for deployment tier assignment.

In `modules/homelab/monitoring.nix`, add:
### 1. Create `homelab.host` module

Create `modules/homelab/host.nix` with shared host metadata options:

```nix
labels = lib.mkOption {
  type = lib.types.attrsOf lib.types.str;
  default = { };
  description = "Custom labels to attach to this host's scrape targets";
};
{ lib, ... }:
{
  options.homelab.host = {
    tier = lib.mkOption {
      type = lib.types.enum [ "test" "prod" ];
      default = "prod";
      description = "Deployment tier - controls which credentials can deploy to this host";
    };

    priority = lib.mkOption {
      type = lib.types.enum [ "high" "low" ];
      default = "high";
      description = "Alerting priority - low priority hosts have relaxed thresholds";
    };

    role = lib.mkOption {
      type = lib.types.nullOr lib.types.str;
      default = null;
      description = "Primary role of this host (dns, database, monitoring, etc.)";
    };

    labels = lib.mkOption {
      type = lib.types.attrsOf lib.types.str;
      default = { };
      description = "Additional free-form labels (e.g., dns_role = 'primary')";
    };
  };
}
```

Import this module in `modules/homelab/default.nix`.
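
As a rough illustration of that import, the sketch below shows what `modules/homelab/default.nix` might look like. The existing contents of that file are not shown in this plan, so the assumption that it already imports `monitoring.nix` through an `imports` list is a guess.

```nix
# modules/homelab/default.nix (sketch; the current contents of this file are assumed)
{ ... }:
{
  imports = [
    ./monitoring.nix # existing module referenced by this plan
    ./host.nix       # new shared host metadata module
  ];
}
```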

### 2. Update `lib/monitoring.nix`

- `extractHostMonitoring` should carry `labels` through in its return value.
- `generateNodeExporterTargets` currently returns a flat list of target strings. It needs to return structured `static_configs` entries instead, grouping targets by their label sets:
- `extractHostMonitoring` should also extract `homelab.host` values (priority, role, labels).
- Build the combined label set from `homelab.host`:

```nix
# Combine structured options + free-form labels
effectiveLabels =
  (lib.optionalAttrs (host.priority != "high") { priority = host.priority; })
  // (lib.optionalAttrs (host.role != null) { role = host.role; })
  // host.labels;
```

- `generateNodeExporterTargets` returns structured `static_configs` entries, grouping targets by their label sets:

```nix
# Before (flat list):
@@ -80,7 +119,7 @@ labels = lib.mkOption {
]
```

This requires grouping hosts by their label attrset and producing one `static_configs` entry per unique label combination. Hosts with no custom labels get grouped together with no extra labels (preserving current behavior).
This requires grouping hosts by their label attrset and producing one `static_configs` entry per unique label combination. Hosts with default values (priority=high, no role, no labels) get grouped together with no extra labels (preserving current behavior).
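
One possible way to do this grouping is sketched below. It is only an illustration: the function shape, the input format (a list of attrsets with `target` and `labels`), and the use of `builtins.toJSON` as the grouping key are assumptions, not the plan's actual implementation.

```nix
# Hypothetical helper; the name and input shape are assumptions for illustration.
{ lib }:

# hosts: list of { target = "host.example:9100"; labels = { ... }; }
hosts:
let
  # Attribute sets serialize with sorted keys, so equal label sets
  # always produce the same JSON string and land in the same group.
  grouped = lib.groupBy (h: builtins.toJSON h.labels) hosts;
in
lib.mapAttrsToList
  (_key: members:
    let
      labels = (builtins.head members).labels;
    in
    { targets = map (h: h.target) members; }
    # Omit the `labels` key entirely for hosts with default metadata.
    // lib.optionalAttrs (labels != { }) { inherit labels; })
  grouped
```

With two hosts carrying empty label sets and one carrying `{ priority = "low"; }`, this would yield two entries: one with both default hosts and no `labels` key, and one with the single low-priority host and its labels.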

### 3. Update `services/monitoring/prometheus.nix`

@@ -94,17 +133,29 @@ static_configs = [{ targets = nodeExporterTargets; }];
static_configs = nodeExporterTargets;
```

### 4. Set labels on hosts
### 4. Set metadata on hosts

Example in `hosts/nix-cache01/configuration.nix` or the relevant service module:
Example in `hosts/nix-cache01/configuration.nix`:

```nix
homelab.monitoring.labels = {
  priority = "low";
homelab.host = {
  tier = "test"; # can be deployed by MCP (used by homelab-deploy)
  priority = "low"; # relaxed alerting thresholds
  role = "build-host";
};
```

Example in `hosts/ns1/configuration.nix`:

```nix
homelab.host = {
  tier = "prod";
  priority = "high";
  role = "dns";
  labels.dns_role = "primary";
};
```
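
To make the effect of these settings concrete, the following sketch applies the `effectiveLabels` combination from step 2 to the two example hosts. Wrapping the expression in a function and loading `lib` from `<nixpkgs/lib>` are assumptions made so the snippet can be evaluated on its own, for example with `nix-instantiate --eval --strict`.

```nix
let
  lib = import <nixpkgs/lib>;

  # Same combination logic as in step 2, wrapped in a function for the example.
  effectiveLabels = host:
    (lib.optionalAttrs (host.priority != "high") { priority = host.priority; })
    // (lib.optionalAttrs (host.role != null) { role = host.role; })
    // host.labels;
in
{
  nix-cache01 = effectiveLabels {
    priority = "low";
    role = "build-host";
    labels = { };
  };
  # => { priority = "low"; role = "build-host"; }

  ns1 = effectiveLabels {
    priority = "high";
    role = "dns";
    labels = { dns_role = "primary"; };
  };
  # => { dns_role = "primary"; role = "dns"; }
}
```

Note that `tier` does not appear in either result, since the combination in step 2 only uses `priority`, `role`, and `labels`. The default `priority = "high"` is also omitted, so `ns1` gets no `priority` label.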

### 5. Update alert rules

After implementing labels, review and update `services/monitoring/rules.yml`: