The tcp-reuse-timeout=15 and infra-host-ttl=120 changes from 5c111c8
caused unbound to fail resolving external domains via DNS-over-TLS.
Reverting to defaults (tcp-reuse-timeout=60, infra-host-ttl=900).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Lower infra-host-ttl (900s → 120s) and tcp-reuse-timeout (60s → 15s)
so unbound recovers faster from upstream TLS forwarder failures
instead of staying stuck after ISP outages.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Configure Unbound to query both ns1 and ns2 for the home.2rjus.net
zone, in addition to local NSD. This provides redundancy during
bootstrap or if local NSD is temporarily unavailable.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add exporters and scrape targets for services lacking monitoring:
- PostgreSQL: postgres-exporter on pgdb1
- Authelia: native telemetry metrics on auth01
- Unbound: unbound-exporter with remote-control on ns1/ns2
- NATS: HTTP monitoring endpoint on nats1
- OpenBao: telemetry config and Prometheus scrape with token auth
- Systemd: systemd-exporter on all hosts for per-service metrics
Add alert rules for postgres, auth (authelia + lldap), jellyfin,
vault (openbao), plus extend existing nats and unbound rules.
Add Terraform config for Prometheus metrics policy and token. The
token is created via vault_token resource and stored in KV, so no
manual token creation is needed.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>