nixos-servers

Author	SHA1	Message	Date
Torjus Håkestad	2f89d564f7	vault: add approles for pn01/pn02, fix provision playbook. Add pn01 and pn02 to hosts-generated.tf for Vault AppRole access. Fix provision-approle.yml: the localhost play was skipped when using the -l filter, since localhost didn't match the target. Merged into a single play using delegate_to: localhost for the bao commands. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 23:51:56 +01:00
Torjus Håkestad	4f593126c0	monitoring01: remove host and migrate services to monitoring02. Remove monitoring01 host configuration and unused service modules (prometheus, grafana, loki, tempo, pyroscope). Migrate blackbox, exportarr, and pve exporters to monitoring02 with scrape configs moved to VictoriaMetrics. Update alert rules, terraform vault policies/secrets, http-proxy entries, and documentation to reflect the monitoring02 migration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 21:50:20 +01:00
Torjus Håkestad	a6013d3950	monitoring02: enable alerting and migrate CNAMEs from http-proxy - Switch vmalert from blackhole mode to sending alerts to local Alertmanager - Import alerttonotify service so alerts route to NATS notifications - Move alertmanager and grafana CNAMEs from http-proxy to monitoring02 - Add monitoring CNAME to monitoring02 - Add Caddy reverse proxy entries for alertmanager and grafana - Remove prometheus, alertmanager, and grafana Caddy entries from http-proxy (now served directly by monitoring02) - Move monitoring02 Vault AppRole to hosts-generated.tf with extra_policies support and prometheus-metrics policy - Update Promtail to use authenticated loki.home.2rjus.net endpoint only (remove unauthenticated monitoring01 client) - Update pipe-to-loki and bootstrap to use loki.home.2rjus.net with basic auth from Vault secret - Move migration plan to completed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 21:23:21 +01:00
Torjus Håkestad	43c81f6688	terraform: fix loki-push policy for generated hosts. Revert ns1/ns2 from approle.tf (they're in hosts-generated.tf) and add loki-push policy to generated AppRoles instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:13:22 +01:00
Torjus Håkestad	58f901ad3e	terraform: add ns1 and ns2 to AppRole policies. They were missing from the host_policies map, so they didn't get shared policies like loki-push. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:10:37 +01:00
Torjus Håkestad	c13921d302	loki: add basic auth for log push and dual-ship promtail - Loki bound to localhost, Caddy reverse proxy with basic_auth - Vault secret (shared/loki/push-auth) for password, bcrypt hash generated at boot for Caddy environment - Promtail dual-ships to monitoring01 (direct) and loki.home.2rjus.net (with basic auth), conditional on vault.enable - Terraform: new shared loki-push policy added to all AppRoles Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:00:08 +01:00
Torjus Håkestad	a013e80f1a	terraform: grant monitoring02 access to apiary-token secret Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 00:55:08 +01:00
Torjus Håkestad	1942591d2e	monitoring: add apiary metrics scraping with bearer token auth Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 16:36:26 +01:00
Torjus Håkestad	ffaf95d109	terraform: add Vault secret for garage01 environment Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 21:27:43 +01:00
Torjus Håkestad	b2b6ab4799	garage01: add Garage S3 service with Caddy HTTPS proxy. Configure Garage object storage on garage01 with S3 API, Vault secrets for RPC secret and admin token, and Caddy reverse proxy for HTTPS access at s3.home.2rjus.net via internal ACME CA. Includes flake entry, VM definition, and Vault policy for the host. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 21:24:25 +01:00
Torjus Håkestad	ed1821b073	nix-cache02: add scheduled builds timer. Add a systemd timer that triggers builds for all hosts every 2 hours via NATS, keeping the binary cache warm. - Add scheduler.nix with timer (every 2h) and oneshot service - Add scheduler NATS user to DEPLOY account - Add Vault secret and variable for scheduler NKey - Increase nix-cache02 memory from 16GB to 20GB Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-12 00:50:09 +01:00
Torjus Håkestad	75210805d5	nix-cache01: decommission and remove all references. Removed: - hosts/nix-cache01/ directory - services/nix-cache/build-flakes.{nix,sh} (replaced by NATS builder) - Vault secret and AppRole for nix-cache01 - Old signing key variable from terraform - Old trusted public key from system/nix.nix. Updated: - flake.nix: removed nixosConfiguration - README.md: nix-cache01 -> nix-cache02 - Monitoring rules: removed build-flakes alerts, updated harmonia to nix-cache02 - Simplified proxy.nix (no longer needs hostname conditional). nix-cache02 is now the sole binary cache host. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-10 23:40:51 +01:00
Torjus Håkestad	751edfc11d	nix-cache02: add Harmonia binary cache service - Parameterize harmonia.nix to use hostname-based Vault paths - Add nix-cache services to nix-cache02 - Add Vault secret and variable for nix-cache02 signing key - Add nix-cache02 public key to trusted-public-keys on all hosts - Update plan doc to remove actions runner references Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-10 23:08:48 +01:00
Torjus Håkestad	98a7301985	nix-cache: remove unused Gitea Actions runner. The actions runner on nix-cache01 was never actively used. Removing it before migrating to nix-cache02. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-10 22:57:08 +01:00
Torjus Håkestad	47747329c4	nix-cache02: add homelab-deploy builder service - Configure builder to build nixos-servers and nixos (gunter) repos - Add builder NKey to Vault secrets - Update NATS permissions for builder, test-deployer, and admin-deployer - Grant nix-cache02 access to shared homelab-deploy secrets Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-10 22:26:40 +01:00
Torjus Håkestad	2d9ca2a73f	hosts: add nix-cache02 build host. New build host to replace nix-cache01 with: - 8 CPU cores, 16GB RAM, 200GB disk - Static IP 10.69.13.25 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-10 21:53:29 +01:00
Torjus Håkestad	0a28c5f495	terraform: add radarr/sonarr API keys for exportarr. Add vault secrets for Radarr and Sonarr API keys to enable exportarr metrics collection on monitoring01. - services/exportarr/radarr: Radarr API key - services/exportarr/sonarr: Sonarr API key - Grant monitoring01 access to services/exportarr/* Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 22:52:34 +01:00
Torjus Håkestad	e85f15b73d	vault: add OpenBao OIDC integration with Kanidm. Enable Kanidm users to authenticate to OpenBao via OIDC for Web UI access. Members of the admins group get full read/write access to secrets. Changes: - Add OIDC auth backend in Terraform (oidc.tf) - Add oidc-admin and oidc-default policies - Add openbao OAuth2 client to Kanidm - Enable legacy crypto (RS256) for OpenBao compatibility - Allow imperative group membership management in Kanidm. Limitations: - CLI login not supported (Kanidm requires HTTPS for confidential client redirects) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 19:42:26 +01:00
Torjus Håkestad	016f8c9119	terraform: add nixos-exporter shared policy - Create shared policy granting all hosts access to nixos-exporter nkey - Add policy to both manual and generated host AppRoles - Remove duplicate kanidm01/monitoring02 entries from hosts-generated.tf Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 00:04:17 +01:00
Torjus Håkestad	60c04a2052	nixos-exporter: enable NATS cache sharing. When one host fetches the latest flake revision, it publishes to NATS and all other hosts receive the update immediately. This reduces redundant nix flake metadata calls across the fleet. - Add nkeys to devshell for key generation - Add nixos-exporter user to NATS HOMELAB account - Add Vault secret for NKey storage - Configure all hosts to use NATS for revision sharing - Update nixos-exporter input to version with NATS support Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 23:57:28 +01:00
Torjus Håkestad	030e8518c5	grafana: add Grafana on monitoring02 with Kanidm OIDC. Deploy Grafana test instance on monitoring02 with: - Kanidm OIDC authentication (admins -> Admin role, others -> Viewer) - PKCE enabled for secure OAuth2 flow (required by Kanidm) - Declarative datasources for Prometheus and Loki on monitoring01 - Local Caddy for TLS termination via internal ACME CA - DNS CNAME grafana-test.home.2rjus.net. Terraform changes add OAuth2 client secret and AppRole policies for kanidm01 and monitoring02. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 20:23:26 +01:00
Torjus Håkestad	0b977808ca	hosts: add monitoring02 configuration. New test-tier host for monitoring stack expansion with: - Static IP 10.69.13.24 - 4 CPU cores, 4GB RAM, 20GB disk - Vault integration and NATS-based deployment enabled Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 19:19:38 +01:00
Torjus Håkestad	ca0e3fd629	kanidm01: add kanidm authentication server - New test-tier VM at 10.69.13.23 with role=auth - Kanidm 1.8 server with HTTPS (443) and LDAPS (636) - ACME certificate from internal CA (auth.home.2rjus.net) - Provisioned groups: admins, users, ssh-users - Provisioned user: torjus - Daily backups at 22:00 (7 versions) - Prometheus monitoring scrape target Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 00:13:59 +01:00
Torjus Håkestad	735b8a9ee3	terraform: add dns and homelab-deploy secrets to ns1 policy. ns1 needs access to shared/dns/* for zone transfer key and shared/homelab-deploy/* for the NATS listener. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 23:33:36 +01:00
Torjus Håkestad	94feae82a0	ns1: recreate with OpenTofu workflow. Old VM had incorrect hardware-configuration.nix with hardcoded UUIDs that didn't match actual disk layout, causing boot failure (emergency mode). Recreated using template2-based configuration for OpenTofu provisioning. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 23:18:08 +01:00
Torjus Håkestad	b7e398c9a7	terraform: remove pgdb1 vault approle Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 22:55:39 +01:00
Torjus Håkestad	536daee4c7	ns2: migrate to OpenTofu management - Remove hosts/template/ (legacy template1) and give each legacy host its own hardware-configuration.nix copy - Recreate ns2 using create-host with template2 base - Add secondary DNS services (NSD + Unbound resolver) - Configure Vault policy for shared DNS secrets - Fix create-host IP uniqueness validator to check CIDR notation (prevents false positives from DNS resolver entries) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 19:28:35 +01:00
Torjus Håkestad	979040aaf7	vault01: enable homelab-deploy listener. Enable vault.enable and homelab.deploy.enable on vault01 so it can receive NATS-based remote deployments. Vault fetches secrets from itself using AppRole after auto-unseal. Add systemd ordering to ensure vault-secret services wait for openbao to be unsealed before attempting to fetch secrets. Also adds vault01 AppRole entry to Terraform. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 17:55:09 +01:00
Torjus Håkestad	38348c5980	vault: add homelab-deploy policy to generated hosts. The homelab-deploy listener requires access to shared/homelab-deploy/* secrets. Update hosts-generated.tf and the generator script to include this policy automatically. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 14:05:42 +01:00
Torjus Håkestad	7bc465b414	hosts: add testvm01, testvm02, testvm03 test hosts. Three permanent test hosts for validating deployment and bootstrapping workflow. Each host configured with: - Static IP (10.69.13.20-22/24) - Vault AppRole integration - Bootstrap from deploy-test-hosts branch Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 13:34:16 +01:00
Torjus Håkestad	03e70ac094	hosts: remove vaulttest01. Test host no longer needed. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 12:55:38 +01:00
Torjus Håkestad	c214f8543c	homelab: add deploy.enable option with assertion - Add homelab.deploy.enable option (requires vault.enable) - Create shared homelab-deploy Vault policy for all hosts - Enable homelab.deploy on all vault-enabled hosts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 06:54:42 +01:00
Torjus Håkestad	7933127d77	system: enable homelab-deploy listener for all vault hosts. Add system/homelab-deploy.nix module that automatically enables the listener on all hosts with vault.enable=true. Uses homelab.host.tier and homelab.host.role for NATS subject subscriptions. - Add homelab-deploy access to all host AppRole policies - Remove manual listener config from vaulttest01 (now handled by system module) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 06:54:42 +01:00
Torjus Håkestad	ad8570f8db	homelab-deploy: add NATS-based deployment system. Add homelab-deploy flake input and NixOS module for message-based deployments across the fleet. Configure DEPLOY account in NATS with tiered access control (listener, test-deployer, admin-deployer). Enable listener on vaulttest01 as initial test host. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 05:22:06 +01:00
Torjus Håkestad	e9857afc11	monitoring: use AppRole token for OpenBao metrics scraping. Instead of creating a long-lived Vault token in Terraform (which gets invalidated when Terraform recreates it), monitoring01 now uses its existing AppRole credentials to fetch a fresh token for Prometheus. Changes: - Add prometheus-metrics policy to monitoring01's AppRole - Remove vault_token.prometheus_metrics resource from Terraform - Remove openbao-token KV secret from Terraform - Add systemd service to fetch AppRole token on boot - Add systemd timer to refresh token every 30 minutes. This ensures Prometheus always has a valid token without depending on Terraform state or manual intervention. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 23:51:11 +01:00
Torjus Håkestad	3cccfc0487	monitoring: implement monitoring gaps coverage. Add exporters and scrape targets for services lacking monitoring: - PostgreSQL: postgres-exporter on pgdb1 - Authelia: native telemetry metrics on auth01 - Unbound: unbound-exporter with remote-control on ns1/ns2 - NATS: HTTP monitoring endpoint on nats1 - OpenBao: telemetry config and Prometheus scrape with token auth - Systemd: systemd-exporter on all hosts for per-service metrics. Add alert rules for postgres, auth (authelia + lldap), jellyfin, vault (openbao), plus extend existing nats and unbound rules. Add Terraform config for Prometheus metrics policy and token. The token is created via vault_token resource and stored in KV, so no manual token creation is needed. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 21:44:13 +01:00
Torjus Håkestad	ccb1c3fe2e	terraform: auto-generate backup password instead of manual. Remove backup_helper_secret variable and switch shared/backup/password to auto_generate. New password will be added alongside existing restic repository key. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 18:58:39 +01:00
Torjus Håkestad	0700033c0a	secrets: migrate all hosts from sops to OpenBao vault. Replace sops-nix secrets with OpenBao vault secrets across all hosts. Hardcode root password hash, add extractKey option to vault-secrets module, update Terraform with secrets/policies for all hosts, and create AppRole provisioning playbook. Hosts migrated: ha1, monitoring01, ns1, ns2, http-proxy, nix-cache01. Wave 1 hosts (nats1, jelly01, pgdb1) get AppRole policies only. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 18:43:09 +01:00
Torjus Håkestad	7ae474fd3e	pki: add new vault root ca to pki	2026-02-03 06:53:59 +01:00
Torjus Håkestad	01d4812280	vault: implement bootstrap integration	2026-02-03 01:10:36 +01:00
Torjus Håkestad	3f2f91aedd	terraform: add vault pki management to terraform	2026-02-01 23:23:03 +01:00
Torjus Håkestad	5d513fd5af	terraform: add vault secret management to terraform	2026-02-01 23:07:47 +01:00

42 Commits