nixos-servers

Author	SHA1	Message	Date
Torjus Håkestad	4f593126c0	monitoring01: remove host and migrate services to monitoring02 Remove monitoring01 host configuration and unused service modules (prometheus, grafana, loki, tempo, pyroscope). Migrate blackbox, exportarr, and pve exporters to monitoring02 with scrape configs moved to VictoriaMetrics. Update alert rules, terraform vault policies/secrets, http-proxy entries, and documentation to reflect the monitoring02 migration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 21:50:20 +01:00
Torjus Håkestad	a6013d3950	monitoring02: enable alerting and migrate CNAMEs from http-proxy - Switch vmalert from blackhole mode to sending alerts to local Alertmanager - Import alerttonotify service so alerts route to NATS notifications - Move alertmanager and grafana CNAMEs from http-proxy to monitoring02 - Add monitoring CNAME to monitoring02 - Add Caddy reverse proxy entries for alertmanager and grafana - Remove prometheus, alertmanager, and grafana Caddy entries from http-proxy (now served directly by monitoring02) - Move monitoring02 Vault AppRole to hosts-generated.tf with extra_policies support and prometheus-metrics policy - Update Promtail to use authenticated loki.home.2rjus.net endpoint only (remove unauthenticated monitoring01 client) - Update pipe-to-loki and bootstrap to use loki.home.2rjus.net with basic auth from Vault secret - Move migration plan to completed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 21:23:21 +01:00
Torjus Håkestad	43c81f6688	terraform: fix loki-push policy for generated hosts Revert ns1/ns2 from approle.tf (they're in hosts-generated.tf) and add loki-push policy to generated AppRoles instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:13:22 +01:00
Torjus Håkestad	58f901ad3e	terraform: add ns1 and ns2 to AppRole policies They were missing from the host_policies map, so they didn't get shared policies like loki-push. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:10:37 +01:00
Torjus Håkestad	c13921d302	loki: add basic auth for log push and dual-ship promtail - Loki bound to localhost, Caddy reverse proxy with basic_auth - Vault secret (shared/loki/push-auth) for password, bcrypt hash generated at boot for Caddy environment - Promtail dual-ships to monitoring01 (direct) and loki.home.2rjus.net (with basic auth), conditional on vault.enable - Terraform: new shared loki-push policy added to all AppRoles Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:00:08 +01:00
Torjus Håkestad	a013e80f1a	terraform: grant monitoring02 access to apiary-token secret Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 00:55:08 +01:00
Torjus Håkestad	75210805d5	nix-cache01: decommission and remove all references Removed: - hosts/nix-cache01/ directory - services/nix-cache/build-flakes.{nix,sh} (replaced by NATS builder) - Vault secret and AppRole for nix-cache01 - Old signing key variable from terraform - Old trusted public key from system/nix.nix Updated: - flake.nix: removed nixosConfiguration - README.md: nix-cache01 -> nix-cache02 - Monitoring rules: removed build-flakes alerts, updated harmonia to nix-cache02 - Simplified proxy.nix (no longer needs hostname conditional) nix-cache02 is now the sole binary cache host. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-10 23:40:51 +01:00
Torjus Håkestad	0a28c5f495	terraform: add radarr/sonarr API keys for exportarr Add vault secrets for Radarr and Sonarr API keys to enable exportarr metrics collection on monitoring01. - services/exportarr/radarr - Radarr API key - services/exportarr/sonarr - Sonarr API key - Grant monitoring01 access to services/exportarr/* Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 22:52:34 +01:00
Torjus Håkestad	e85f15b73d	vault: add OpenBao OIDC integration with Kanidm Enable Kanidm users to authenticate to OpenBao via OIDC for Web UI access. Members of the admins group get full read/write access to secrets. Changes: - Add OIDC auth backend in Terraform (oidc.tf) - Add oidc-admin and oidc-default policies - Add openbao OAuth2 client to Kanidm - Enable legacy crypto (RS256) for OpenBao compatibility - Allow imperative group membership management in Kanidm Limitations: - CLI login not supported (Kanidm requires HTTPS for confidential client redirects) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 19:42:26 +01:00
Torjus Håkestad	016f8c9119	terraform: add nixos-exporter shared policy - Create shared policy granting all hosts access to nixos-exporter nkey - Add policy to both manual and generated host AppRoles - Remove duplicate kanidm01/monitoring02 entries from hosts-generated.tf Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 00:04:17 +01:00
Torjus Håkestad	030e8518c5	grafana: add Grafana on monitoring02 with Kanidm OIDC Deploy Grafana test instance on monitoring02 with: - Kanidm OIDC authentication (admins -> Admin role, others -> Viewer) - PKCE enabled for secure OAuth2 flow (required by Kanidm) - Declarative datasources for Prometheus and Loki on monitoring01 - Local Caddy for TLS termination via internal ACME CA - DNS CNAME grafana-test.home.2rjus.net Terraform changes add OAuth2 client secret and AppRole policies for kanidm01 and monitoring02. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 20:23:26 +01:00
Torjus Håkestad	94feae82a0	ns1: recreate with OpenTofu workflow Old VM had incorrect hardware-configuration.nix with hardcoded UUIDs that didn't match actual disk layout, causing boot failure (emergency mode). Recreated using template2-based configuration for OpenTofu provisioning. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 23:18:08 +01:00
Torjus Håkestad	b7e398c9a7	terraform: remove pgdb1 vault approle Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 22:55:39 +01:00
Torjus Håkestad	536daee4c7	ns2: migrate to OpenTofu management - Remove hosts/template/ (legacy template1) and give each legacy host its own hardware-configuration.nix copy - Recreate ns2 using create-host with template2 base - Add secondary DNS services (NSD + Unbound resolver) - Configure Vault policy for shared DNS secrets - Fix create-host IP uniqueness validator to check CIDR notation (prevents false positives from DNS resolver entries) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 19:28:35 +01:00
Torjus Håkestad	979040aaf7	vault01: enable homelab-deploy listener Enable vault.enable and homelab.deploy.enable on vault01 so it can receive NATS-based remote deployments. Vault fetches secrets from itself using AppRole after auto-unseal. Add systemd ordering to ensure vault-secret services wait for openbao to be unsealed before attempting to fetch secrets. Also adds vault01 AppRole entry to Terraform. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 17:55:09 +01:00
Torjus Håkestad	03e70ac094	hosts: remove vaulttest01 Test host no longer needed. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 12:55:38 +01:00
Torjus Håkestad	c214f8543c	homelab: add deploy.enable option with assertion - Add homelab.deploy.enable option (requires vault.enable) - Create shared homelab-deploy Vault policy for all hosts - Enable homelab.deploy on all vault-enabled hosts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 06:54:42 +01:00
Torjus Håkestad	7933127d77	system: enable homelab-deploy listener for all vault hosts Add system/homelab-deploy.nix module that automatically enables the listener on all hosts with vault.enable=true. Uses homelab.host.tier and homelab.host.role for NATS subject subscriptions. - Add homelab-deploy access to all host AppRole policies - Remove manual listener config from vaulttest01 (now handled by system module) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 06:54:42 +01:00
Torjus Håkestad	ad8570f8db	homelab-deploy: add NATS-based deployment system Add homelab-deploy flake input and NixOS module for message-based deployments across the fleet. Configure DEPLOY account in NATS with tiered access control (listener, test-deployer, admin-deployer). Enable listener on vaulttest01 as initial test host. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 05:22:06 +01:00
Torjus Håkestad	e9857afc11	monitoring: use AppRole token for OpenBao metrics scraping Instead of creating a long-lived Vault token in Terraform (which gets invalidated when Terraform recreates it), monitoring01 now uses its existing AppRole credentials to fetch a fresh token for Prometheus. Changes: - Add prometheus-metrics policy to monitoring01's AppRole - Remove vault_token.prometheus_metrics resource from Terraform - Remove openbao-token KV secret from Terraform - Add systemd service to fetch AppRole token on boot - Add systemd timer to refresh token every 30 minutes This ensures Prometheus always has a valid token without depending on Terraform state or manual intervention. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 23:51:11 +01:00
Torjus Håkestad	0700033c0a	secrets: migrate all hosts from sops to OpenBao vault Replace sops-nix secrets with OpenBao vault secrets across all hosts. Hardcode root password hash, add extractKey option to vault-secrets module, update Terraform with secrets/policies for all hosts, and create AppRole provisioning playbook. Hosts migrated: ha1, monitoring01, ns1, ns2, http-proxy, nix-cache01 Wave 1 hosts (nats1, jelly01, pgdb1) get AppRole policies only. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 18:43:09 +01:00
Torjus Håkestad	5d513fd5af	terraform: add vault secret management to terraform	2026-02-01 23:07:47 +01:00

22 Commits