monitoring: implement monitoring gaps coverage
Some checks failed
Run nix flake check / flake-check (push) Failing after 7m36s

Add exporters and scrape targets for services lacking monitoring:
- PostgreSQL: postgres-exporter on pgdb1
- Authelia: native telemetry metrics on auth01
- Unbound: unbound-exporter with remote-control on ns1/ns2
- NATS: HTTP monitoring endpoint on nats1
- OpenBao: telemetry config and Prometheus scrape with token auth
- Systemd: systemd-exporter on all hosts for per-service metrics

Add alert rules for postgres, auth (authelia + lldap), jellyfin,
vault (openbao), plus extend existing nats and unbound rules.

Add Terraform config for Prometheus metrics policy and token. The
token is created via vault_token resource and stored in KV, so no
manual token creation is needed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
2026-02-05 21:42:38 +01:00
parent 41d4226812
commit 3cccfc0487
12 changed files with 217 additions and 0 deletions

View File

@@ -1,10 +1,24 @@
{ pkgs, ... }: {
homelab.monitoring.scrapeTargets = [{
job_name = "unbound";
port = 9167;
}];
networking.firewall.allowedTCPPorts = [
53
];
networking.firewall.allowedUDPPorts = [
53
];
services.prometheus.exporters.unbound = {
enable = true;
unbound.host = "unix:///run/unbound/unbound.ctl";
};
# Grant exporter access to unbound socket
systemd.services.prometheus-unbound-exporter.serviceConfig.SupplementaryGroups = [ "unbound" ];
services.unbound = {
enable = true;
@@ -23,6 +37,11 @@
do-ip6 = "no";
do-udp = "yes";
do-tcp = "yes";
extended-statistics = true;
};
remote-control = {
control-enable = true;
control-interface = "/run/unbound/unbound.ctl";
};
stub-zone = {
name = "home.2rjus.net";