prometheus-scrape-target-labels #30

Merged
torjus merged 5 commits from prometheus-scrape-target-labels into master 2026-02-07 16:27:38 +00:00

Summary

Propagate homelab.host metadata to Prometheus scrape targets, enabling semantic alert filtering using labels instead of hardcoded instance names.

Changes

  • lib/monitoring.nix: Extract host metadata (tier, priority, role, labels) and propagate to all scrape targets
  • services/monitoring/prometheus.nix: Use structured static_configs with labels
  • services/monitoring/rules.yml: Replace hardcoded instance exclusions with role-based filters
  • .claude/skills/observability/SKILL.md: Document new labels for troubleshooting
  • CLAUDE.md: Document homelab-deploy CLI for prod hosts

New Labels

All Prometheus scrape targets now include:

| Label | Description |
|-------|-------------|
| hostname | Short hostname (e.g., ns1, monitoring01) - always present |
| role | Host role (e.g., dns, build-host, vault) - when configured |
| tier | Deployment tier (test for test VMs) - when non-default |
| dns_role | primary or secondary for DNS servers |
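
To make the "always present" / "when configured" / "when non-default" rules concrete, here is a minimal Python sketch of how such a label set could be derived. It is an illustration only: the actual logic lives in lib/monitoring.nix and is written in Nix. The default tier name (`prod`), the `example.org` domain, the `testvm01` host, and the assumption that `dns_role` arrives through the host's free-form `labels` are all hypothetical.

```python
# Illustrative only: the real implementation is Nix code in lib/monitoring.nix.
DEFAULT_TIER = "prod"  # assumption: the PR only says "when non-default"


def target_labels(hostname, meta):
    """Build the label set for one scrape target from its host metadata."""
    labels = {"hostname": hostname}  # always present
    if meta.get("role"):
        labels["role"] = meta["role"]  # only when configured
    tier = meta.get("tier", DEFAULT_TIER)
    if tier != DEFAULT_TIER:
        labels["tier"] = tier  # only when non-default
    # Assumption: extra labels such as dns_role are passed through unchanged.
    labels.update(meta.get("labels", {}))
    return labels


def static_configs(hosts, domain="example.org", port=9100):
    """Return one static_configs-style entry per host, sorted by name."""
    return [
        {
            "targets": [f"{name}.{domain}:{port}"],
            "labels": target_labels(name, meta),
        }
        for name, meta in sorted(hosts.items())
    ]


# Hypothetical host metadata, shaped after the examples in this PR.
hosts = {
    "ns1": {"role": "dns", "labels": {"dns_role": "primary"}},
    "nix-cache01": {"role": "build-host"},
    "testvm01": {"tier": "test"},
}

for entry in static_configs(hosts):
    print(entry)
```

Each printed entry has the same shape as an item under `static_configs` in a Prometheus scrape job: a `targets` list plus a `labels` map that Prometheus attaches to every series scraped from those targets.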

Example Queries

{hostname="ns1"}                    # All metrics from ns1
up{role="dns"}                      # All DNS servers
node_load5{role="build-host"}       # Build hosts only
up{dns_role="primary"}              # Primary DNS only

Alert Rule Updates

  • high_cpu_load: Now uses role!="build-host" instead of hardcoded instance!="nix-cache01..."
  • unbound_low_cache_hit_ratio: Now filters by dns_role="primary" to avoid alerting on secondary DNS
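
One detail worth keeping in mind with these filters: in PromQL a label that is absent from a series is treated as the empty string. So `role!="build-host"` still matches hosts that have no role configured, while `dns_role="primary"` only matches hosts that carry that label. The Python sketch below imitates this behaviour for the two matchers; the target list is hypothetical, and `monitoring01` is assumed to have no role set.

```python
def matches(labels, name, op, value):
    """Evaluate a PromQL-style equality matcher; a missing label counts as ""."""
    actual = labels.get(name, "")
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    raise ValueError(f"unsupported operator: {op}")


# Hypothetical label sets, modelled on the examples in this PR.
targets = [
    {"hostname": "nix-cache01", "role": "build-host"},
    {"hostname": "ns1", "role": "dns", "dns_role": "primary"},
    {"hostname": "ns2", "role": "dns", "dns_role": "secondary"},
    {"hostname": "monitoring01"},  # assumption: no role configured
]

# Hosts still covered by high_cpu_load after the change.
cpu_alert_hosts = [
    t["hostname"] for t in targets if matches(t, "role", "!=", "build-host")
]

# Hosts covered by unbound_low_cache_hit_ratio after the change.
cache_alert_hosts = [
    t["hostname"] for t in targets if matches(t, "dns_role", "=", "primary")
]

print(cpu_alert_hosts)
print(cache_alert_hosts)
```

With this input, the first list contains ns1, ns2 and monitoring01 (only the build host is excluded), and the second contains only ns1.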

Verified

Deployed to monitoring01 and confirmed labels are working in Prometheus.

torjus added 5 commits 2026-02-07 16:27:05 +00:00
Extract homelab.host metadata (tier, priority, role, labels) from host
configurations and propagate them to Prometheus scrape targets. This
enables semantic alert filtering using labels instead of hardcoded
instance names.

Changes:
- lib/monitoring.nix: Extract host metadata, group targets by labels
- prometheus.nix: Use structured static_configs with labels
- rules.yml: Replace instance filters with role-based filters

Example labels in Prometheus:
- ns1/ns2: role=dns, dns_role=primary/secondary
- nix-cache01: role=build-host
- testvm*: tier=test

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a `hostname` label to all Prometheus scrape targets, making it easy
to query all metrics for a host without wildcarding the instance label.

Example queries:
- {hostname="ns1"} - all metrics from ns1
- node_cpu_seconds_total{hostname="monitoring01"} - specific metric

For external targets (like gunter), the hostname is extracted from the
target string.
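
A rough Python sketch of what extracting a hostname from a target string could look like follows. The real code is Nix and may differ; the target formats handled here (`host:port`, optionally with a scheme or path) and the `example.org` domain are assumptions. IP-address and IPv6 targets are deliberately not handled.

```python
def short_hostname(target):
    """Return the first DNS label of a "host[:port]" or URL-style target."""
    # Drop a scheme such as "https://" if one is present.
    host = target.split("://", 1)[-1]
    # Drop any path, then any port.
    host = host.split("/", 1)[0].split(":", 1)[0]
    # Keep only the part before the first dot.
    return host.split(".", 1)[0]


print(short_hostname("gunter.example.org:9100"))
print(short_hostname("gunter:9100"))
```

Both calls print `gunter`, which is the value that would end up in the `hostname` label under these assumptions.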

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
skills: update observability with new target labels
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
b794aa89db
Document the new hostname and host metadata labels available on all
Prometheus scrape targets:
- hostname: short hostname for easy filtering
- role: host role (dns, build-host, vault)
- tier: deployment tier (test for test VMs)
- dns_role: primary/secondary for DNS servers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CLAUDE.md: document homelab-deploy CLI for prod hosts
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
Run nix flake check / flake-check (pull_request) Failing after 1s
116abf3bec
Add instructions for deploying to prod hosts using the CLI directly,
since the MCP server only handles test-tier deployments.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
torjus merged commit 0b462f0a96 into master 2026-02-07 16:27:38 +00:00
torjus deleted branch prometheus-scrape-target-labels 2026-02-07 16:27:38 +00:00

Reference: torjus/nixos-servers#30