# Homelab Infrastructure Exporter

## Overview

Build a Prometheus exporter for metrics specific to our homelab infrastructure. Unlike the generic nixos-exporter, this covers services and patterns unique to our environment.

## Current State

### Existing Exporters

- **node-exporter** (all hosts): System metrics
- **systemd-exporter** (all hosts): Service restart counts, IP accounting
- **labmon** (monitoring01): TLS certificate monitoring, step-ca health
- **Service-specific**: unbound, postgres, nats, jellyfin, home-assistant, caddy, step-ca

### Gaps

- No visibility into Vault/OpenBao lease expiry
- No ACME certificate expiry from internal CA
- No Proxmox guest agent metrics from inside VMs

## Metrics

### Vault/OpenBao Metrics

| Metric | Description | Source |
|--------|-------------|--------|
| `homelab_vault_token_expiry_seconds` | Seconds until AppRole token expires | Token metadata or lease file |
| `homelab_vault_token_renewable` | 1 if token is renewable | Token metadata |

Labels: `role` (AppRole name)

### ACME Certificate Metrics

| Metric | Description | Source |
|--------|-------------|--------|
| `homelab_acme_cert_expiry_seconds` | Seconds until certificate expires | Parse cert from `/var/lib/acme/*/cert.pem` |
| `homelab_acme_cert_not_after` | Unix timestamp of cert expiry | Certificate `NotAfter` field |

Labels: `domain`, `issuer`

Note: labmon already monitors external TLS endpoints. This covers local ACME-managed certs.

### Proxmox Guest Metrics (future)

| Metric | Description | Source |
|--------|-------------|--------|
| `homelab_proxmox_guest_info` | Info gauge with VM ID, name | QEMU guest agent |
| `homelab_proxmox_guest_agent_running` | 1 if guest agent is responsive | Agent ping |

### DNS Zone Metrics (future)

| Metric | Description | Source |
|--------|-------------|--------|
| `homelab_dns_zone_serial` | Current zone serial number | DNS AXFR or zone file |

Labels: `zone`

## Architecture

Single binary with collectors enabled via config.
The exporter runs on the hosts that need specific collectors.

```
homelab-exporter
├── main.go
├── collector/
│   ├── vault.go      # Vault/OpenBao token metrics
│   ├── acme.go       # ACME certificate metrics
│   └── proxmox.go    # Proxmox guest agent (future)
└── config/
    └── config.go
```

## Configuration

```yaml
listen_addr: ":9970"
collectors:
  vault:
    enabled: true
    token_path: "/var/lib/vault/token"
  acme:
    enabled: true
    cert_dirs:
      - "/var/lib/acme"
  proxmox:
    enabled: false
```

## NixOS Module

```nix
services.homelab-exporter = {
  enable = true;
  port = 9970;
  collectors = {
    vault = {
      enable = true;
      tokenPath = "/var/lib/vault/token";
    };
    acme = {
      enable = true;
      certDirs = [ "/var/lib/acme" ];
    };
  };
};

# Auto-register scrape target
homelab.monitoring.scrapeTargets = [{
  job_name = "homelab-exporter";
  port = 9970;
}];
```

## Integration

### Deployment

Deploy on hosts that have relevant data:

- **All hosts with ACME certs**: acme collector
- **All hosts with Vault**: vault collector
- **Proxmox VMs**: proxmox collector (when implemented)

### Relationship with nixos-exporter

These are complementary:

- **nixos-exporter** (port 9971): Generic NixOS metrics, deploy everywhere
- **homelab-exporter** (port 9970): Infrastructure-specific, deploy selectively

Both can run on the same host if needed.

## Implementation

### Language

Go, consistent with labmon and nixos-exporter.

### Phase 1: Core + ACME

1. Create git repository (git.t-juice.club/torjus/homelab-exporter)
2. Implement ACME certificate collector
3. HTTP server exposing `/metrics`
4. NixOS module

### Phase 2: Vault Collector

1. Implement token expiry detection
2. Handle missing/expired tokens gracefully

### Phase 3: Dashboard

1. Create Grafana dashboard for infrastructure health
2. Add to existing monitoring service module

## Alert Examples

```yaml
- alert: VaultTokenExpiringSoon
  expr: homelab_vault_token_expiry_seconds < 3600
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Vault token on {{ $labels.instance }} expires in < 1 hour"

- alert: ACMECertExpiringSoon
  expr: homelab_acme_cert_expiry_seconds < 7 * 24 * 3600
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "ACME cert {{ $labels.domain }} on {{ $labels.instance }} expires in < 7 days"
```

## Open Questions

- [ ] How to read Vault token expiry without re-authenticating?
- [ ] Should ACME collector also check key/cert match?

## Notes

- Port 9970 (labmon uses 9969, nixos-exporter will use 9971)
- Keep infrastructure-specific logic here, generic NixOS stuff in nixos-exporter
- Consider merging Proxmox metrics with pve-exporter if overlap is significant