Homelab Infrastructure Exporter
Overview
Build a Prometheus exporter for metrics specific to our homelab infrastructure. Unlike the generic nixos-exporter, this covers services and patterns unique to our environment.
Current State
Existing Exporters
- node-exporter (all hosts): System metrics
- systemd-exporter (all hosts): Service restart counts, IP accounting
- labmon (monitoring01): TLS certificate monitoring, step-ca health
- Service-specific: unbound, postgres, nats, jellyfin, home-assistant, caddy, step-ca
Gaps
- No visibility into Vault/OpenBao lease expiry
- No expiry metrics for ACME certificates issued by the internal CA
- No Proxmox guest agent metrics from inside VMs
Metrics
Vault/OpenBao Metrics
| Metric | Description | Source |
|---|---|---|
| homelab_vault_token_expiry_seconds | Seconds until AppRole token expires | Token metadata or lease file |
| homelab_vault_token_renewable | 1 if token is renewable | Token metadata |
Labels: role (AppRole name)
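One possible source for the token metadata is the token lookup-self endpoint (`GET /v1/auth/token/lookup-self`), which reports the remaining TTL and renewability of the token sent with the request and does not issue a new token. The sketch below only shows decoding such a response into the two gauge values; the HTTP call is omitted, and the `lookupSelf` type, the `tokenMetrics` helper, the sample body, and the `role="example"` label value are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lookupSelf mirrors only the response fields this sketch needs.
type lookupSelf struct {
	Data struct {
		TTL       float64 `json:"ttl"` // seconds remaining on the token
		Renewable bool    `json:"renewable"`
	} `json:"data"`
}

// tokenMetrics turns a lookup-self response body into the values for
// homelab_vault_token_expiry_seconds and homelab_vault_token_renewable.
func tokenMetrics(body []byte) (float64, float64, error) {
	var resp lookupSelf
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, 0, fmt.Errorf("decode lookup-self response: %w", err)
	}
	renewable := 0.0
	if resp.Data.Renewable {
		renewable = 1
	}
	return resp.Data.TTL, renewable, nil
}

func main() {
	// Trimmed sample body; real responses carry more fields.
	sample := []byte(`{"data":{"ttl":3540,"renewable":true}}`)
	ttl, renewable, err := tokenMetrics(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("homelab_vault_token_expiry_seconds{role=\"example\"} %g\n", ttl)
	fmt.Printf("homelab_vault_token_renewable{role=\"example\"} %g\n", renewable)
}
```

Whether a lookup call is acceptable for tokens with a limited use count would need checking before relying on this approach.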
ACME Certificate Metrics
| Metric | Description | Source |
|---|---|---|
| homelab_acme_cert_expiry_seconds | Seconds until certificate expires | Parse cert from /var/lib/acme/*/cert.pem |
| homelab_acme_cert_not_after | Unix timestamp of cert expiry | Certificate NotAfter field |
Labels: domain, issuer
Note: labmon already monitors external TLS endpoints. This covers local ACME-managed certs.
Proxmox Guest Metrics (future)
| Metric | Description | Source |
|---|---|---|
| homelab_proxmox_guest_info | Info gauge with VM ID, name | QEMU guest agent |
| homelab_proxmox_guest_agent_running | 1 if guest agent is responsive | Agent ping |
DNS Zone Metrics (future)
| Metric | Description | Source |
|---|---|---|
| homelab_dns_zone_serial | Current zone serial number | DNS AXFR or zone file |
Labels: zone
Architecture
A single binary whose collectors are enabled individually via config. It runs only on hosts that need at least one of the collectors.
homelab-exporter
├── main.go
├── collector/
│   ├── vault.go    # Vault/OpenBao token metrics
│   ├── acme.go     # ACME certificate metrics
│   └── proxmox.go  # Proxmox guest agent (future)
└── config/
    └── config.go
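The "collectors enabled via config" idea could be wired up roughly as follows. This is a standard-library-only sketch: the `Collector` interface, the `Sample` type, and `enabledCollectors` are hypothetical names, and an actual implementation might instead build on the `prometheus.Collector` interface from client_golang.

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is one metric value with its labels.
type Sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// Collector is a hypothetical interface each file in collector/ could satisfy.
type Collector interface {
	Name() string
	Collect() ([]Sample, error)
}

// staticCollector stands in for the real vault/acme collectors in this sketch.
type staticCollector struct {
	name    string
	samples []Sample
}

func (c staticCollector) Name() string               { return c.name }
func (c staticCollector) Collect() ([]Sample, error) { return c.samples, nil }

// enabledCollectors keeps only the collectors switched on in the config,
// sorted by name so the result is stable.
func enabledCollectors(all []Collector, enabled map[string]bool) []Collector {
	var out []Collector
	for _, c := range all {
		if enabled[c.Name()] {
			out = append(out, c)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name() < out[j].Name() })
	return out
}

func main() {
	all := []Collector{
		staticCollector{name: "vault"},
		staticCollector{name: "acme"},
		staticCollector{name: "proxmox"},
	}
	enabled := map[string]bool{"vault": true, "acme": true, "proxmox": false}
	for _, c := range enabledCollectors(all, enabled) {
		fmt.Println("enabled collector:", c.Name())
	}
}
```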
Configuration
listen_addr: ":9970"
collectors:
  vault:
    enabled: true
    token_path: "/var/lib/vault/token"
  acme:
    enabled: true
    cert_dirs:
      - "/var/lib/acme"
  proxmox:
    enabled: false
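On the Go side, config/config.go might mirror this layout with plain structs. The sketch below is an assumption about that shape rather than the actual code: the type names, `Default`, and `Validate` are hypothetical, the `yaml` struct tags presume a YAML library that honours them, and decoding itself is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// Config mirrors the configuration keys listed above.
type Config struct {
	ListenAddr string     `yaml:"listen_addr"`
	Collectors Collectors `yaml:"collectors"`
}

type Collectors struct {
	Vault   VaultConfig   `yaml:"vault"`
	ACME    ACMEConfig    `yaml:"acme"`
	Proxmox ProxmoxConfig `yaml:"proxmox"`
}

type VaultConfig struct {
	Enabled   bool   `yaml:"enabled"`
	TokenPath string `yaml:"token_path"`
}

type ACMEConfig struct {
	Enabled  bool     `yaml:"enabled"`
	CertDirs []string `yaml:"cert_dirs"`
}

type ProxmoxConfig struct {
	Enabled bool `yaml:"enabled"`
}

// Default returns the values from the example configuration.
func Default() Config {
	return Config{
		ListenAddr: ":9970",
		Collectors: Collectors{
			Vault: VaultConfig{Enabled: true, TokenPath: "/var/lib/vault/token"},
			ACME:  ACMEConfig{Enabled: true, CertDirs: []string{"/var/lib/acme"}},
		},
	}
}

// Validate rejects configs where an enabled collector lacks its inputs.
func (c Config) Validate() error {
	if c.ListenAddr == "" {
		return errors.New("listen_addr must not be empty")
	}
	if c.Collectors.Vault.Enabled && c.Collectors.Vault.TokenPath == "" {
		return errors.New("vault collector enabled but token_path is empty")
	}
	if c.Collectors.ACME.Enabled && len(c.Collectors.ACME.CertDirs) == 0 {
		return errors.New("acme collector enabled but cert_dirs is empty")
	}
	return nil
}

func main() {
	cfg := Default()
	if err := cfg.Validate(); err != nil {
		panic(err)
	}
	fmt.Println("listening on", cfg.ListenAddr)
}
```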
NixOS Module
services.homelab-exporter = {
  enable = true;
  port = 9970;
  collectors = {
    vault = {
      enable = true;
      tokenPath = "/var/lib/vault/token";
    };
    acme = {
      enable = true;
      certDirs = [ "/var/lib/acme" ];
    };
  };
};

# Auto-register scrape target
homelab.monitoring.scrapeTargets = [{
  job_name = "homelab-exporter";
  port = 9970;
}];
Integration
Deployment
Deploy on hosts that have relevant data:
- All hosts with ACME certs: acme collector
- All hosts with Vault: vault collector
- Proxmox VMs: proxmox collector (when implemented)
Relationship with nixos-exporter
These are complementary:
- nixos-exporter (port 9971): Generic NixOS metrics, deploy everywhere
- homelab-exporter (port 9970): Infrastructure-specific, deploy selectively
Both can run on the same host if needed.
Implementation
Language
Go - consistent with labmon and nixos-exporter.
Phase 1: Core + ACME
- Create git repository (git.t-juice.club/torjus/homelab-exporter)
- Implement ACME certificate collector
- HTTP server with /metrics
- NixOS module
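For the /metrics endpoint, a standard-library-only handler could look like the sketch below. `formatSample` and `metricsHandler` are hypothetical names, the label values are placeholders, and `strconv.Quote` is used as an approximation of label-value escaping that is adequate for plain ASCII values. The handler is exercised in-process so the example terminates; a real binary would register it and call `http.ListenAndServe` with the configured listen address.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strconv"
	"strings"
)

// formatSample renders one sample as a line of the Prometheus text format,
// with label names sorted for stable output.
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+"="+strconv.Quote(labels[k]))
	}
	line := name
	if len(pairs) > 0 {
		line += "{" + strings.Join(pairs, ",") + "}"
	}
	return line + " " + strconv.FormatFloat(value, 'g', -1, 64)
}

// metricsHandler serves one fixed sample; a real exporter would run its
// enabled collectors here instead.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprintln(w, formatSample("homelab_acme_cert_expiry_seconds",
		map[string]string{"domain": "example.org", "issuer": "example-ca"}, 86400))
}

func main() {
	// Call the handler through a recorder instead of opening a socket.
	rec := httptest.NewRecorder()
	metricsHandler(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	fmt.Print(rec.Body.String())
}
```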
Phase 2: Vault Collector
- Implement token expiry detection
- Handle missing/expired tokens gracefully
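Graceful handling could mean that an absent token file is not treated as a scrape failure and that an already expired token reports zero rather than a negative value. The following is a sketch under those assumptions; `readToken`, `clampTTL`, and the path used in `main` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"strings"
)

// readToken returns the trimmed token stored at path. A missing file yields
// present=false with a nil error, so the caller can skip the Vault metrics
// without failing the whole scrape.
func readToken(path string) (token string, present bool, err error) {
	raw, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return "", false, nil
	}
	if err != nil {
		return "", false, fmt.Errorf("read token file: %w", err)
	}
	token = strings.TrimSpace(string(raw))
	return token, token != "", nil
}

// clampTTL keeps the expiry gauge at zero once a token has expired.
func clampTTL(ttl float64) float64 {
	if ttl < 0 {
		return 0
	}
	return ttl
}

func main() {
	_, present, err := readToken("/nonexistent/homelab-exporter/token")
	fmt.Println("token present:", present, "error:", err)
	fmt.Println("clamped ttl:", clampTTL(-30))
}
```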
Phase 3: Dashboard
- Create Grafana dashboard for infrastructure health
- Add to existing monitoring service module
Alert Examples
- alert: VaultTokenExpiringSoon
  expr: homelab_vault_token_expiry_seconds < 3600
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Vault token on {{ $labels.instance }} expires in < 1 hour"
- alert: ACMECertExpiringSoon
  expr: homelab_acme_cert_expiry_seconds < 7 * 24 * 3600
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "ACME cert {{ $labels.domain }} on {{ $labels.instance }} expires in < 7 days"
Open Questions
- How to read Vault token expiry without re-authenticating?
- Should ACME collector also check key/cert match?
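If the key/cert check is wanted, Go's `tls.X509KeyPair` already rejects a private key that does not belong to the certificate, so the check could be a thin wrapper around it. The sketch below assumes the private key is available as PEM next to the certificate; `keyMatchesCert` and `demoPair` are hypothetical helpers, and the throwaway pairs exist only to keep the example runnable.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// keyMatchesCert reports whether keyPEM is the private key for certPEM.
// Unparsable input also yields false.
func keyMatchesCert(certPEM, keyPEM []byte) bool {
	_, err := tls.X509KeyPair(certPEM, keyPEM)
	return err == nil
}

// demoPair generates a self-signed certificate and its key for this sketch.
func demoPair() (certPEM, keyPEM []byte, err error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "demo.example"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return nil, nil, err
	}
	certPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM = pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
	return certPEM, keyPEM, nil
}

func main() {
	certA, keyA, err := demoPair()
	if err != nil {
		panic(err)
	}
	_, keyB, err := demoPair()
	if err != nil {
		panic(err)
	}
	fmt.Println("own key matches:", keyMatchesCert(certA, keyA))
	fmt.Println("foreign key matches:", keyMatchesCert(certA, keyB))
}
```

A check like this requires the exporter to read private keys, which is a permission trade-off to weigh against the value of the extra signal.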
Notes
- Port 9970 (labmon uses 9969, nixos-exporter will use 9971)
- Keep infrastructure-specific logic here, generic NixOS stuff in nixos-exporter
- Consider merging Proxmox metrics with pve-exporter if overlap is significant