monitoring: auto-generate Prometheus scrape targets from host configs #16
Summary
- `homelab.monitoring` NixOS options module (`enable`, `scrapeTargets`) following the same pattern as `homelab.dns`
- `lib/monitoring.nix` library functions to auto-generate Prometheus scrape targets from flake host configurations
- `services/monitoring/external-targets.nix` for non-flake hosts (gunter, restic)
- `scrapeTargets` in service modules: step-ca, home-assistant, caddy, jellyfin, nix-cache caddy, wireguard exporter

Alert rules cleanup
- `snake_case` (`SmartCriticalWarning` → `smart_critical_warning`, etc.)
- `zigbee2qmtt_down` typo → `zigbee2mqtt_down`
- `pushgateway_not_running` alert
- `for: 5m` clauses to all `monitoring_rules` alerts
- `wireguard_handshake_timeout`

New alerts
- `certificate_expiring_soon`
- `certificate_check_error`
- `step_ca_certificate_expiring`
- `pve_node_down`
- `pve_guest_stopped`
- `caddy_upstream_unhealthy`
- `caddy_high_error_rate`
- `smartctl_high_temperature`
- `filesystem_filling_up`
- `systemd_not_running`
- `high_file_descriptors`
- `host_reboot`

Node-exporter target coverage
Previously 10 hardcoded + 1 external. Now auto-discovers 17 flake hosts + 1 external, adding: auth01, media1, ns3, ns4, vault01, vaulttest01, nixos-test1.
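
For reviewers unfamiliar with the approach, the auto-discovery described in the Summary could look roughly like the sketch below. This is not the code in `lib/monitoring.nix`: the `hosts` stand-in, the `job_name`/`port` fields of a `scrapeTargets` entry, the Home Assistant port, and all function names are assumptions made for illustration. The only details taken as given are the `homelab.monitoring.enable` and `homelab.monitoring.scrapeTargets` option names and node_exporter's default port 9100.

```nix
# Hypothetical sketch; names and data shapes are assumptions, not the PR's code.
let
  # Stand-in for the flake's nixosConfigurations (normally `self.nixosConfigurations`).
  hosts = {
    ns3.config = {
      networking.hostName = "ns3";
      homelab.monitoring = { enable = true; scrapeTargets = [ ]; };
    };
    ha1.config = {
      networking.hostName = "ha1";
      homelab.monitoring = {
        enable = true;
        # Assumed entry shape: a job name plus the port to scrape.
        scrapeTargets = [ { job_name = "home-assistant"; port = 8123; } ];
      };
    };
  };

  nodeExporterPort = 9100; # node_exporter's default listen port

  # Keep only hosts that opted in to monitoring.
  enabledHosts = builtins.filter
    (h: h.config.homelab.monitoring.enable)
    (builtins.attrValues hosts);

  # Build a "host:port" target string for a host.
  mkTarget = h: port: "${h.config.networking.hostName}:${toString port}";

  # One node-exporter target per enabled host.
  nodeExporterTargets = map (h: mkTarget h nodeExporterPort) enabledHosts;

  # One job per declared service target. A real implementation would likely
  # group targets that share a job_name, since Prometheus rejects duplicates.
  serviceJobs = builtins.concatMap
    (h: map
      (t: {
        inherit (t) job_name;
        static_configs = [ { targets = [ (mkTarget h t.port) ]; } ];
      })
      h.config.homelab.monitoring.scrapeTargets)
    enabledHosts;
in
{
  # Same shape as the list accepted by services.prometheus.scrapeConfigs.
  scrapeConfigs = [
    {
      job_name = "node-exporter";
      static_configs = [ { targets = nodeExporterTargets; } ];
    }
  ] ++ serviceJobs;
}
```

Saved to a file, the expression can be inspected with `nix-instantiate --eval --strict --json <file>`. Under this scheme, adding a host to the flake is enough for it to appear in the node-exporter job, which is the effect the coverage numbers above describe.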
Test plan
- `nix build .#nixosConfigurations.monitoring01.config.system.build.toplevel`: Prometheus config and rules validation pass
- `nix build` for ca, ha1, http-proxy, jelly01, nix-cache01: all hosts with new `scrapeTargets` build successfully
- […] (`/targets`)