Homelab Infrastructure Exporter

Overview

Build a Prometheus exporter for metrics specific to our homelab infrastructure. Unlike the generic nixos-exporter, this covers services and patterns unique to our environment.

Current State

Existing Exporters

  • node-exporter (all hosts): System metrics
  • systemd-exporter (all hosts): Service restart counts, IP accounting
  • labmon (monitoring01): TLS certificate monitoring, step-ca health
  • Service-specific: unbound, postgres, nats, jellyfin, home-assistant, caddy, step-ca

Gaps

  • No visibility into Vault/OpenBao lease expiry
  • No expiry metrics for locally stored ACME certificates issued by the internal CA
  • No Proxmox guest agent metrics from inside VMs

Metrics

Vault/OpenBao Metrics

| Metric | Description | Source |
|---|---|---|
| homelab_vault_token_expiry_seconds | Seconds until AppRole token expires | Token metadata or lease file |
| homelab_vault_token_renewable | 1 if token is renewable | Token metadata |

Labels: role (AppRole name)
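
One possible way to read both values without a fresh AppRole login is the token lookup-self endpoint (GET /v1/auth/token/lookup-self), which reports ttl and renewable for the token sent in the X-Vault-Token header. The sketch below is an illustration under assumptions, not a settled design: it assumes VAULT_ADDR is set and reuses the token path from this plan, and the names tokenInfo, parseLookupSelf and lookupSelf are hypothetical. The role label is not derived here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// tokenInfo holds the two lookup-self fields the proposed metrics need.
type tokenInfo struct {
	TTLSeconds float64 // remaining lifetime as reported by the server
	Renewable  bool
}

// parseLookupSelf extracts ttl and renewable from a lookup-self response body.
func parseLookupSelf(body []byte) (tokenInfo, error) {
	var resp struct {
		Data *struct {
			TTL       float64 `json:"ttl"`
			Renewable bool    `json:"renewable"`
		} `json:"data"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return tokenInfo{}, err
	}
	if resp.Data == nil {
		return tokenInfo{}, errors.New("lookup-self response has no data field")
	}
	return tokenInfo{TTLSeconds: resp.Data.TTL, Renewable: resp.Data.Renewable}, nil
}

// lookupSelf asks the server about the presented token; it performs no login.
func lookupSelf(addr, token string) (tokenInfo, error) {
	req, err := http.NewRequest(http.MethodGet, addr+"/v1/auth/token/lookup-self", nil)
	if err != nil {
		return tokenInfo{}, err
	}
	req.Header.Set("X-Vault-Token", token)
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return tokenInfo{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		// An expired or revoked token is rejected here rather than returning a TTL.
		return tokenInfo{}, fmt.Errorf("lookup-self returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return tokenInfo{}, err
	}
	return parseLookupSelf(body)
}

func main() {
	addr := os.Getenv("VAULT_ADDR") // assumption: server address comes from the environment
	raw, err := os.ReadFile("/var/lib/vault/token")
	if addr == "" || err != nil {
		fmt.Println("VAULT_ADDR or token file not available; nothing to look up")
		return
	}
	info, err := lookupSelf(strings.TrimRight(addr, "/"), strings.TrimSpace(string(raw)))
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	renewable := 0
	if info.Renewable {
		renewable = 1
	}
	fmt.Printf("homelab_vault_token_expiry_seconds %g\n", info.TTLSeconds)
	fmt.Printf("homelab_vault_token_renewable %d\n", renewable)
}
```

The token's policy must allow lookup-self for this to work. If it does not, the lease file mentioned in the table remains the fallback source.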

ACME Certificate Metrics

| Metric | Description | Source |
|---|---|---|
| homelab_acme_cert_expiry_seconds | Seconds until certificate expires | Parse cert from /var/lib/acme/*/cert.pem |
| homelab_acme_cert_not_after | Unix timestamp of cert expiry | Certificate NotAfter field |

Labels: domain, issuer

Note: labmon already monitors external TLS endpoints over the network. This collector instead reads the ACME-managed certificate files on the local host.
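
A minimal sketch of how the ACME collector might derive both values from a cert.pem file, using only the Go standard library. Deriving the domain label from the directory name and the issuer label from the issuer's common name are assumptions, and parseLeaf is a hypothetical helper name.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// parseLeaf returns the issuer common name and NotAfter of the first
// CERTIFICATE block in pemBytes (the leaf, if the file holds a chain).
func parseLeaf(pemBytes []byte) (string, time.Time, error) {
	for len(pemBytes) > 0 {
		block, rest := pem.Decode(pemBytes)
		if block == nil {
			break // no further PEM blocks
		}
		if block.Type == "CERTIFICATE" {
			cert, err := x509.ParseCertificate(block.Bytes)
			if err != nil {
				return "", time.Time{}, err
			}
			return cert.Issuer.CommonName, cert.NotAfter, nil
		}
		pemBytes = rest
	}
	return "", time.Time{}, errors.New("no CERTIFICATE block found")
}

func main() {
	paths, err := filepath.Glob("/var/lib/acme/*/cert.pem")
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad glob pattern:", err)
		return
	}
	now := time.Now()
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			fmt.Fprintln(os.Stderr, "skipping", p, err)
			continue
		}
		issuer, notAfter, err := parseLeaf(data)
		if err != nil {
			fmt.Fprintln(os.Stderr, "skipping", p, err)
			continue
		}
		// Assumption: the directory name under /var/lib/acme identifies the domain.
		domain := filepath.Base(filepath.Dir(p))
		fmt.Printf("homelab_acme_cert_not_after{domain=%q,issuer=%q} %d\n",
			domain, issuer, notAfter.Unix())
		fmt.Printf("homelab_acme_cert_expiry_seconds{domain=%q,issuer=%q} %.0f\n",
			domain, issuer, notAfter.Sub(now).Seconds())
	}
}
```

An unreadable or malformed file is skipped rather than aborting the scrape, so one bad certificate does not hide the others.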

Proxmox Guest Metrics (future)

| Metric | Description | Source |
|---|---|---|
| homelab_proxmox_guest_info | Info gauge with VM ID, name | QEMU guest agent |
| homelab_proxmox_guest_agent_running | 1 if guest agent is responsive | Agent ping |

DNS Zone Metrics (future)

| Metric | Description | Source |
|---|---|---|
| homelab_dns_zone_serial | Current zone serial number | DNS AXFR or zone file |

Labels: zone
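
For the zone-file source, the serial is the third field of the SOA record's data (after MNAME and RNAME). The following is a rough sketch only: soaSerial is a hypothetical helper, the sample zone is made up, and the parsing handles ';' comments and the common parenthesised layout but not every zone-file edge case.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// soaSerial returns the serial of the first SOA record found in zone-file text.
func soaSerial(zone string) (uint32, error) {
	parens := strings.NewReplacer("(", " ", ")", " ")
	var tokens []string
	for _, line := range strings.Split(zone, "\n") {
		if i := strings.IndexByte(line, ';'); i >= 0 {
			line = line[:i] // drop trailing comment
		}
		tokens = append(tokens, strings.Fields(parens.Replace(line))...)
	}
	for i, tok := range tokens {
		// SOA data order: MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM.
		if strings.EqualFold(tok, "SOA") && i+3 < len(tokens) {
			n, err := strconv.ParseUint(tokens[i+3], 10, 32)
			if err != nil {
				return 0, fmt.Errorf("bad serial %q: %w", tokens[i+3], err)
			}
			return uint32(n), nil
		}
	}
	return 0, errors.New("no SOA record found")
}

func main() {
	// Made-up zone used only to exercise the parser.
	zone := "example.test. 3600 IN SOA ns1.example.test. admin.example.test. (\n" +
		"    2024010101 ; serial\n" +
		"    3600 900 604800 300 )\n"
	serial, err := soaSerial(zone)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("homelab_dns_zone_serial{zone=%q} %d\n", "example.test", serial)
}
```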

Architecture

A single binary whose collectors are enabled individually via config. It runs only on hosts that need at least one of those collectors.

homelab-exporter
├── main.go
├── collector/
│   ├── vault.go     # Vault/OpenBao token metrics
│   ├── acme.go      # ACME certificate metrics
│   └── proxmox.go   # Proxmox guest agent (future)
└── config/
    └── config.go
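
To illustrate the "collectors enabled via config" idea, here is a small sketch of a shared collector interface and a selection step. The Collector interface, Sample type and selectEnabled function are hypothetical names, and staticCollector exists only to make the example runnable. A real implementation would more likely build on the Prometheus Go client than on hand-rolled types.

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is one metric value with its labels.
type Sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// Collector is a hypothetical interface each file under collector/ could implement.
type Collector interface {
	Name() string
	Collect() ([]Sample, error)
}

// staticCollector is a stand-in that returns fixed samples.
type staticCollector struct {
	name    string
	samples []Sample
}

func (c staticCollector) Name() string               { return c.name }
func (c staticCollector) Collect() ([]Sample, error) { return c.samples, nil }

// selectEnabled keeps the collectors switched on in the config,
// sorted by name so the result is deterministic.
func selectEnabled(all []Collector, enabled map[string]bool) []Collector {
	var out []Collector
	for _, c := range all {
		if enabled[c.Name()] {
			out = append(out, c)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name() < out[j].Name() })
	return out
}

func main() {
	all := []Collector{
		staticCollector{name: "vault"},
		staticCollector{name: "acme"},
		staticCollector{name: "proxmox"},
	}
	// Mirrors the enabled flags in the configuration example of this plan.
	enabled := map[string]bool{"vault": true, "acme": true, "proxmox": false}
	for _, c := range selectEnabled(all, enabled) {
		samples, err := c.Collect()
		if err != nil {
			fmt.Println("collector failed:", c.Name(), err)
			continue // one failing collector should not block the rest
		}
		fmt.Printf("%s: %d samples\n", c.Name(), len(samples))
	}
}
```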

Configuration

listen_addr: ":9970"
collectors:
  vault:
    enabled: true
    token_path: "/var/lib/vault/token"
  acme:
    enabled: true
    cert_dirs:
      - "/var/lib/acme"
  proxmox:
    enabled: false
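
On the Go side, the configuration above could map onto a struct like the following. This is a sketch under assumptions: the type names and the Validate rules are illustrative, and decoding the YAML itself (which would need a third-party YAML package) is left out so the example stays standard-library only.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// VaultConfig mirrors collectors.vault.
type VaultConfig struct {
	Enabled   bool   `yaml:"enabled"`
	TokenPath string `yaml:"token_path"`
}

// ACMEConfig mirrors collectors.acme.
type ACMEConfig struct {
	Enabled  bool     `yaml:"enabled"`
	CertDirs []string `yaml:"cert_dirs"`
}

// ProxmoxConfig mirrors collectors.proxmox.
type ProxmoxConfig struct {
	Enabled bool `yaml:"enabled"`
}

// CollectorsConfig groups the per-collector sections.
type CollectorsConfig struct {
	Vault   VaultConfig   `yaml:"vault"`
	ACME    ACMEConfig    `yaml:"acme"`
	Proxmox ProxmoxConfig `yaml:"proxmox"`
}

// Config mirrors the top-level file.
type Config struct {
	ListenAddr string           `yaml:"listen_addr"`
	Collectors CollectorsConfig `yaml:"collectors"`
}

// Validate rejects settings that would only fail later at runtime.
func (c Config) Validate() error {
	if _, _, err := net.SplitHostPort(c.ListenAddr); err != nil {
		return fmt.Errorf("listen_addr: %w", err)
	}
	if c.Collectors.Vault.Enabled && c.Collectors.Vault.TokenPath == "" {
		return errors.New("collectors.vault.token_path is required when vault is enabled")
	}
	if c.Collectors.ACME.Enabled && len(c.Collectors.ACME.CertDirs) == 0 {
		return errors.New("collectors.acme.cert_dirs is required when acme is enabled")
	}
	return nil
}

func main() {
	cfg := Config{
		ListenAddr: ":9970",
		Collectors: CollectorsConfig{
			Vault: VaultConfig{Enabled: true, TokenPath: "/var/lib/vault/token"},
			ACME:  ACMEConfig{Enabled: true, CertDirs: []string{"/var/lib/acme"}},
		},
	}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config ok, listening on", cfg.ListenAddr)
}
```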

NixOS Module

services.homelab-exporter = {
  enable = true;
  port = 9970;
  collectors = {
    vault = {
      enable = true;
      tokenPath = "/var/lib/vault/token";
    };
    acme = {
      enable = true;
      certDirs = [ "/var/lib/acme" ];
    };
  };
};

# Auto-register scrape target
homelab.monitoring.scrapeTargets = [{
  job_name = "homelab-exporter";
  port = 9970;
}];

Integration

Deployment

Deploy on hosts that have relevant data:

  • All hosts with ACME certs: acme collector
  • All hosts with Vault: vault collector
  • Proxmox VMs: proxmox collector (when implemented)

Relationship with nixos-exporter

These are complementary:

  • nixos-exporter (port 9971): Generic NixOS metrics, deploy everywhere
  • homelab-exporter (port 9970): Infrastructure-specific, deploy selectively

Both can run on the same host if needed.

Implementation

Language

Go - consistent with labmon and nixos-exporter.

Phase 1: Core + ACME

  1. Create git repository (git.t-juice.club/torjus/homelab-exporter)
  2. Implement ACME certificate collector
  3. Add HTTP server exposing /metrics
  4. Write NixOS module
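
As a rough picture of the /metrics step, the sketch below renders gauges in the Prometheus text exposition format by hand and serves them over net/http. It is deliberately standard-library only; an actual implementation would probably use the Prometheus Go client instead. The sample type, the render function and the -listen flag are assumptions made for this example, as is the "monitoring" role value.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
	"sort"
	"strconv"
	"strings"
)

// sample is one gauge value with optional labels.
type sample struct {
	name   string
	help   string
	labels map[string]string
	value  float64
}

// escaper applies the label-value escaping required by the text format.
var escaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// render formats samples as text exposition output, grouped by metric name.
func render(samples []sample) string {
	sorted := append([]sample(nil), samples...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].name < sorted[j].name })
	var b strings.Builder
	last := ""
	for _, s := range sorted {
		if s.name != last { // emit HELP/TYPE once per metric family
			fmt.Fprintf(&b, "# HELP %s %s\n# TYPE %s gauge\n", s.name, s.help, s.name)
			last = s.name
		}
		b.WriteString(s.name)
		if len(s.labels) > 0 {
			keys := make([]string, 0, len(s.labels))
			for k := range s.labels {
				keys = append(keys, k)
			}
			sort.Strings(keys)
			pairs := make([]string, 0, len(keys))
			for _, k := range keys {
				pairs = append(pairs, k+`="`+escaper.Replace(s.labels[k])+`"`)
			}
			b.WriteString("{" + strings.Join(pairs, ",") + "}")
		}
		b.WriteString(" " + strconv.FormatFloat(s.value, 'g', -1, 64) + "\n")
	}
	return b.String()
}

func main() {
	listen := flag.String("listen", "", "address to serve /metrics on, e.g. :9970 (empty: print once and exit)")
	flag.Parse()
	demo := []sample{{
		name:   "homelab_vault_token_expiry_seconds",
		help:   "Seconds until AppRole token expires",
		labels: map[string]string{"role": "monitoring"},
		value:  3600,
	}}
	if *listen == "" {
		fmt.Print(render(demo))
		return
	}
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
		fmt.Fprint(w, render(demo))
	})
	if err := http.ListenAndServe(*listen, nil); err != nil {
		fmt.Println("server error:", err)
	}
}
```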

Phase 2: Vault Collector

  1. Implement token expiry detection
  2. Handle missing/expired tokens gracefully
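
One way "gracefully" could look in practice: treat a missing or empty token file as a normal state rather than an error, and report 0 seconds remaining so that a threshold alert on homelab_vault_token_expiry_seconds still fires. This is a design suggestion, not something the plan has decided. readToken and expiryValue are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"strings"
)

// readToken returns the trimmed token and true, or "" and false when the
// file is missing or empty. Only unexpected I/O failures are returned as errors.
func readToken(path string) (string, bool, error) {
	raw, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return "", false, nil // a missing token is an expected state
	}
	if err != nil {
		return "", false, err
	}
	tok := strings.TrimSpace(string(raw))
	return tok, tok != "", nil
}

// expiryValue maps token state to the gauge value: absent or already
// expired tokens report 0, otherwise the remaining TTL in seconds.
// Caveat: a token that never expires would need separate handling.
func expiryValue(present bool, ttlSeconds float64) float64 {
	if !present || ttlSeconds <= 0 {
		return 0
	}
	return ttlSeconds
}

func main() {
	_, present, err := readToken("/var/lib/vault/token")
	if err != nil {
		fmt.Println("cannot read token file:", err)
		return
	}
	// The TTL would come from the server; 0 is a placeholder in this sketch.
	fmt.Printf("homelab_vault_token_expiry_seconds %g\n", expiryValue(present, 0))
}
```

The alternative is to omit the series when no token exists, but then an alert of the form "value < threshold" stays silent for exactly the hosts that have lost their token.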

Phase 3: Dashboard

  1. Create Grafana dashboard for infrastructure health
  2. Add to existing monitoring service module

Alert Examples

- alert: VaultTokenExpiringSoon
  expr: homelab_vault_token_expiry_seconds < 3600
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Vault token on {{ $labels.instance }} expires in < 1 hour"

- alert: ACMECertExpiringSoon
  expr: homelab_acme_cert_expiry_seconds < 7 * 24 * 3600
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "ACME cert {{ $labels.domain }} on {{ $labels.instance }} expires in < 7 days"

Open Questions

  • How to read Vault token expiry without re-authenticating?
  • Should ACME collector also check key/cert match?

Notes

  • The exporter listens on port 9970 (labmon uses 9969; nixos-exporter will use 9971)
  • Keep infrastructure-specific logic here, generic NixOS stuff in nixos-exporter
  • Consider merging Proxmox metrics with pve-exporter if overlap is significant