nixos-servers

Author	SHA1	Message	Date
Torjus Håkestad	4f593126c0	monitoring01: remove host and migrate services to monitoring02 Remove monitoring01 host configuration and unused service modules (prometheus, grafana, loki, tempo, pyroscope). Migrate blackbox, exportarr, and pve exporters to monitoring02 with scrape configs moved to VictoriaMetrics. Update alert rules, terraform vault policies/secrets, http-proxy entries, and documentation to reflect the monitoring02 migration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 21:50:20 +01:00
Torjus Håkestad	75210805d5	nix-cache01: decommission and remove all references Removed: - hosts/nix-cache01/ directory - services/nix-cache/build-flakes.{nix,sh} (replaced by NATS builder) - Vault secret and AppRole for nix-cache01 - Old signing key variable from terraform - Old trusted public key from system/nix.nix Updated: - flake.nix: removed nixosConfiguration - README.md: nix-cache01 -> nix-cache02 - Monitoring rules: removed build-flakes alerts, updated harmonia to nix-cache02 - Simplified proxy.nix (no longer needs hostname conditional) nix-cache02 is now the sole binary cache host. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-10 23:40:51 +01:00
Torjus Håkestad	9bd48e0808	monitoring: explicitly list valid HTTP status codes Empty valid_status_codes defaults to 2xx only, not "any". Explicitly list common status codes (2xx, 3xx, 4xx, 5xx) so services returning 400/401 like ha and nzbget pass the probe. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 22:41:47 +01:00
Torjus Håkestad	d1b0a5dc20	monitoring: accept any HTTP status in TLS probe Only care about TLS handshake success for certificate monitoring. Services like nzbget (401) and ha (400) return non-2xx but have valid certificates. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 22:33:45 +01:00
Torjus Håkestad	4d32707130	monitoring: remove duplicate rules from blackbox.nix The rules were already added to rules.yml but the blackbox.nix file still had them, causing duplicate 'groups' key errors. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 22:28:42 +01:00
Torjus Håkestad	75e4fb61a5	monitoring: add blackbox exporter for TLS certificate monitoring Add blackbox exporter to monitoring01 to probe TLS endpoints and alert on expiring certificates. Monitors all ACME-managed certificates from OpenBao PKI including Caddy auto-TLS services. Alerts: - tls_certificate_expiring_soon (< 7 days, warning) - tls_certificate_expiring_critical (< 24h, critical) - tls_probe_failed (connectivity issues) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 22:21:42 +01:00

6 Commits
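
The probe behaviour described in commits 75e4fb61a5 and 9bd48e0808 can be illustrated with a small Python sketch. It is not taken from the repository: the function names `classify_expiry` and `is_valid_status` are hypothetical, and the logic only mirrors what the commit messages state (warning below 7 days, critical below 24 hours, and an empty `valid_status_codes` list meaning 2xx only). The real alert rules may fire both alerts at once; this sketch simply reports the most severe one.

```python
from typing import Optional, Sequence

DAY_SECONDS = 24 * 60 * 60


def classify_expiry(seconds_left: float) -> Optional[str]:
    """Return the most severe alert level for a certificate, or None.

    Thresholds follow the commit message: < 24h is critical,
    < 7 days is warning. Returning a single level is an assumption
    made for this sketch.
    """
    if seconds_left < DAY_SECONDS:
        return "critical"
    if seconds_left < 7 * DAY_SECONDS:
        return "warning"
    return None


def is_valid_status(code: int, valid_status_codes: Sequence[int]) -> bool:
    """Mimic the status check described in commit 9bd48e0808.

    An empty list falls back to accepting 2xx only, not any status.
    """
    if not valid_status_codes:
        return 200 <= code < 300
    return code in valid_status_codes


if __name__ == "__main__":
    print(classify_expiry(3 * DAY_SECONDS))
    print(is_valid_status(401, []))
    print(is_valid_status(401, [200, 400, 401]))
```

Under these assumptions, a service such as nzbget that answers 401 fails the check with an empty list and passes once 401 is listed explicitly, which is the situation the later commit addresses.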