monitoring: close monitoring coverage gaps
Checks: nix flake check / flake-check (push) failing after 7m36s

Add exporters and scrape targets for services lacking monitoring:
- PostgreSQL: postgres-exporter on pgdb1
- Authelia: native telemetry metrics on auth01
- Unbound: unbound-exporter with remote-control on ns1/ns2
- NATS: HTTP monitoring endpoint on nats1
- OpenBao: telemetry config and Prometheus scrape with token auth
- Systemd: systemd-exporter on all hosts for per-service metrics
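
On the NixOS side, the exporter additions above map onto the upstream `services.prometheus.exporters` modules. A minimal sketch for two of the hosts (option names follow nixpkgs; the exact module options used in this commit are not shown in the diff):

```nix
# Sketch, assuming the standard NixOS prometheus-exporter modules;
# ports fall back to the upstream defaults (postgres: 9187).
{ config, ... }:
{
  # pgdb1: database metrics via postgres-exporter
  services.prometheus.exporters.postgres = {
    enable = true;
    runAsLocalSuperUser = true; # scrape the local cluster via peer auth
  };

  # all hosts: per-unit metrics via systemd-exporter
  services.prometheus.exporters.systemd.enable = true;
}
```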

Add alert rules for postgres, auth (authelia + lldap), jellyfin, and
vault (openbao), and extend the existing nats and unbound rules.
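
A representative rule from the new set might look like the following, written in the Prometheus rule-file format embedded as a Nix string (the alert name, expression, and threshold are illustrative, not the exact rules in the commit):

```nix
# Illustrative alert rule; services.prometheus.rules takes a list of
# rule-file strings in the standard Prometheus YAML format.
services.prometheus.rules = [''
  groups:
    - name: postgres
      rules:
        - alert: PostgresDown
          expr: pg_up == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "postgres-exporter on {{ $labels.instance }} reports PostgreSQL down"
''];
```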

Add Terraform config for a Prometheus metrics policy and token. The
token is created via a vault_token resource and stored in KV, so no
manual token creation is needed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
commit 3cccfc0487 (parent 41d4226812)
2026-02-05 21:42:38 +01:00
12 changed files with 217 additions and 0 deletions


@@ -0,0 +1,21 @@
# Generic policies for services (not host-specific)
resource "vault_policy" "prometheus_metrics" {
  name   = "prometheus-metrics"
  policy = <<EOT
path "sys/metrics" {
  capabilities = ["read"]
}
EOT
}

# Long-lived token for Prometheus to scrape OpenBao metrics
resource "vault_token" "prometheus_metrics" {
  policies  = [vault_policy.prometheus_metrics.name]
  ttl       = "8760h" # 1 year
  renewable = true
  metadata = {
    purpose = "prometheus-metrics-scraping"
  }
}
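
On monitoring01, the token stored in KV then feeds a scrape job against OpenBao's metrics endpoint, which serves Prometheus-format output at `/v1/sys/metrics?format=prometheus` when called with a token. A minimal sketch of that scrape config (the target host, port, and secret path are assumptions, not taken from the diff):

```nix
# Sketch of the scrape side; OpenBao accepts the token as a
# Bearer credential, which Prometheus reads from a file on disk.
services.prometheus.scrapeConfigs = [{
  job_name     = "openbao";
  metrics_path = "/v1/sys/metrics";
  params.format = [ "prometheus" ];
  scheme = "https";
  # Token from KV, deployed to disk by the secrets machinery (path assumed)
  authorization.credentials_file = "/run/secrets/openbao-token";
  static_configs = [{ targets = [ "vault01:8200" ]; }];
}];
```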


@@ -92,6 +92,13 @@ locals {
      auto_generate = false
      data          = { token = var.actions_token_1 }
    }
    # Prometheus OpenBao token for scraping metrics
    # Token is created by vault_token.prometheus_metrics in policies.tf
    "hosts/monitoring01/openbao-token" = {
      auto_generate = false
      data          = { token = vault_token.prometheus_metrics.client_token }
    }
  }
}


@@ -51,3 +51,4 @@ variable "actions_token_1" {
  type      = string
  sensitive = true
}