5 Commits

Author SHA1 Message Date
75210805d5 nix-cache01: decommission and remove all references
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Removed:
- hosts/nix-cache01/ directory
- services/nix-cache/build-flakes.{nix,sh} (replaced by NATS builder)
- Vault secret and AppRole for nix-cache01
- Old signing key variable from terraform
- Old trusted public key from system/nix.nix

Updated:
- flake.nix: removed nixosConfiguration
- README.md: nix-cache01 -> nix-cache02
- Monitoring rules: removed build-flakes alerts, updated harmonia to nix-cache02
- Simplified proxy.nix (no longer needs hostname conditional)

nix-cache02 is now the sole binary cache host.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-10 23:40:51 +01:00
9bd48e0808 monitoring: explicitly list valid HTTP status codes
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m6s
Empty valid_status_codes defaults to 2xx only, not "any".
Explicitly list common status codes (2xx, 3xx, 4xx, 5xx) so
services returning 400/401 like ha and nzbget pass the probe.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 22:41:47 +01:00
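
The behaviour described in 9bd48e0808 can be sketched as a blackbox exporter module. This is an illustration only, not the repository's actual configuration: the module name `https_cert` and the exact list of status codes are assumptions, since the commit message names only the code classes and the two observed responses (400 and 401). The `valid_status_codes` and `fail_if_not_ssl` options are standard blackbox exporter HTTP probe settings.

```yaml
# Hypothetical blackbox exporter module; name and code list are assumptions.
modules:
  https_cert:
    prober: http
    timeout: 10s
    http:
      # An empty or omitted list is treated as "2xx only", so every status
      # that should count as a successful probe has to be listed explicitly.
      valid_status_codes:
        - 200
        - 204
        - 301
        - 302
        - 400   # returned by ha, per the commit message
        - 401   # returned by nzbget, per the commit message
        - 403
        - 404
        - 500
        - 503
      # Fail the probe if the connection is not TLS, since the goal is
      # certificate monitoring rather than application health.
      fail_if_not_ssl: true
```

The list has to be explicit because the option accepts only concrete integer codes, which is why the earlier attempt in d1b0a5dc20 did not behave as "accept any status".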
d1b0a5dc20 monitoring: accept any HTTP status in TLS probe
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Only care about TLS handshake success for certificate monitoring.
Services like nzbget (401) and ha (400) return non-2xx but have
valid certificates.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 22:33:45 +01:00
4d32707130 monitoring: remove duplicate rules from blackbox.nix
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m7s
The rules were already added to rules.yml but the blackbox.nix file
still had them, causing duplicate 'groups' key errors.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 22:28:42 +01:00
75e4fb61a5 monitoring: add blackbox exporter for TLS certificate monitoring
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m6s
Add blackbox exporter to monitoring01 to probe TLS endpoints and alert
on expiring certificates. Monitors all ACME-managed certificates from
OpenBao PKI including Caddy auto-TLS services.

Alerts:
- tls_certificate_expiring_soon (< 7 days, warning)
- tls_certificate_expiring_critical (< 24h, critical)
- tls_probe_failed (connectivity issues)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 22:21:42 +01:00
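
The three alerts listed in 75e4fb61a5 could be expressed roughly as the Prometheus rule group below. This is a sketch under stated assumptions, not the contents of the repository's rules.yml: the group name, the `for` durations, and the severity of `tls_probe_failed` are hypothetical, while the alert names and the 7 day / 24 hour thresholds come from the commit message. `probe_ssl_earliest_cert_expiry` and `probe_success` are metrics exported by the blackbox exporter.

```yaml
# Hypothetical rule group; only alert names and thresholds are from the commit.
groups:
  - name: tls_certificates          # assumed group name
    rules:
      - alert: tls_certificate_expiring_soon
        # Seconds until the earliest certificate in the chain expires < 7 days.
        expr: probe_ssl_earliest_cert_expiry - time() < 7 * 24 * 3600
        for: 10m                    # assumed
        labels:
          severity: warning
      - alert: tls_certificate_expiring_critical
        # Less than 24 hours of validity remaining.
        expr: probe_ssl_earliest_cert_expiry - time() < 24 * 3600
        for: 10m                    # assumed
        labels:
          severity: critical
      - alert: tls_probe_failed
        # The probe itself failed (connection, TLS handshake, or status check).
        expr: probe_success == 0
        for: 5m                     # assumed
        labels:
          severity: warning         # assumed; the commit does not state it
```

A rules file may contain only one top-level `groups` key, which is consistent with the duplicate-key error fixed in 4d32707130 when the same rules were defined in two places.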