6 Commits

Author SHA1 Message Date
5c111c8d78 unbound: tune timeouts for faster recovery after network outages
Lower infra-host-ttl (900s → 120s) and tcp-reuse-timeout (60s → 15s)
so unbound recovers faster from upstream TLS forwarder failures
instead of staying stuck after ISP outages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 01:53:11 +01:00
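
For illustration, the timeout change described in 5c111c8d78 might look roughly like the fragment below. This is a sketch, not the commit's actual diff: it assumes the resolver is configured through the NixOS `services.unbound.settings` option, and the surrounding module layout is hypothetical.

```nix
# Sketch only: assumes the NixOS services.unbound module is in use.
{
  services.unbound.settings.server = {
    # Seconds to cache upstream host state (RTT, lameness); lowered from 900.
    infra-host-ttl = 120;
    # unbound.conf documents this value in milliseconds, so 15s is 15000
    # (the 60s value it replaces corresponds to 60000).
    tcp-reuse-timeout = 15000;
  };
}
```

A shorter infra-host-ttl means a forwarder that was marked unreachable during an outage is retried sooner, and a shorter tcp-reuse-timeout closes idle upstream TCP/TLS streams earlier so stale connections are less likely to be reused.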
bf199bd7c6 ns/resolver: add redundant stub-zone addresses
Configure Unbound to query both ns1 and ns2 for the home.2rjus.net
zone, in addition to local NSD. This provides redundancy during
bootstrap or if local NSD is temporarily unavailable.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 20:10:17 +01:00
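
A redundant stub zone of the kind described in bf199bd7c6 could be sketched as follows. Everything except the zone name is an assumption: the addresses are documentation placeholders, the local NSD port is hypothetical, and the fragment again assumes the NixOS `services.unbound.settings` option.

```nix
# Sketch only: all addresses and the port are placeholders, not values from the repository.
{
  services.unbound.settings = {
    # Unbound refuses to send queries to localhost by default;
    # a local NSD stub address needs this turned off.
    server.do-not-query-localhost = false;

    stub-zone = [
      {
        name = "home.2rjus.net";
        stub-addr = [
          "127.0.0.1@5353" # local NSD (hypothetical port)
          "192.0.2.1"      # ns1 (placeholder address)
          "192.0.2.2"      # ns2 (placeholder address)
        ];
      }
    ];
  };
}
```

Listing several stub-addr entries lets Unbound fall back to another authoritative server for the zone when the local NSD is not answering, for example during bootstrap.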
3cccfc0487 monitoring: implement monitoring gaps coverage
Add exporters and scrape targets for services lacking monitoring:
- PostgreSQL: postgres-exporter on pgdb1
- Authelia: native telemetry metrics on auth01
- Unbound: unbound-exporter with remote-control on ns1/ns2
- NATS: HTTP monitoring endpoint on nats1
- OpenBao: telemetry config and Prometheus scrape with token auth
- Systemd: systemd-exporter on all hosts for per-service metrics

Add alert rules for postgres, auth (authelia + lldap), jellyfin,
vault (openbao), plus extend existing nats and unbound rules.

Add Terraform config for Prometheus metrics policy and token. The
token is created via vault_token resource and stored in KV, so no
manual token creation is needed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 21:44:13 +01:00
e39e3cf0cb Add more dns servers to unbound
2024-06-27 21:19:11 +02:00
e451957df3 Start changing ns stuff to home.2rjus.net
2024-03-12 19:44:41 +01:00
5b838771e3 Improve ns stuff
2024-03-11 21:26:52 +01:00