- Switch vmalert from blackhole mode to sending alerts to local
Alertmanager
- Import alerttonotify service so alerts route to NATS notifications
- Move alertmanager and grafana CNAMEs from http-proxy to monitoring02
- Add monitoring CNAME to monitoring02
- Add Caddy reverse proxy entries for alertmanager and grafana
- Remove prometheus, alertmanager, and grafana Caddy entries from
http-proxy (now served directly by monitoring02)
- Move monitoring02 Vault AppRole to hosts-generated.tf with
extra_policies support and prometheus-metrics policy
- Update Promtail to use authenticated loki.home.2rjus.net endpoint
only (remove unauthenticated monitoring01 client)
- Update pipe-to-loki and bootstrap to use loki.home.2rjus.net with
basic auth from Vault secret
- Move migration plan to completed
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Revert ns1/ns2 from approle.tf (they're in hosts-generated.tf) and add
loki-push policy to generated AppRoles instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
They were missing from the host_policies map, so they didn't get
shared policies like loki-push.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Loki bound to localhost, Caddy reverse proxy with basic_auth
- Vault secret (shared/loki/push-auth) for password, bcrypt hash
generated at boot for Caddy environment
- Promtail dual-ships to monitoring01 (direct) and loki.home.2rjus.net
(with basic auth), conditional on vault.enable
- Terraform: new shared loki-push policy added to all AppRoles
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removed:
- hosts/nix-cache01/ directory
- services/nix-cache/build-flakes.{nix,sh} (replaced by NATS builder)
- Vault secret and AppRole for nix-cache01
- Old signing key variable from terraform
- Old trusted public key from system/nix.nix
Updated:
- flake.nix: removed nixosConfiguration
- README.md: nix-cache01 -> nix-cache02
- Monitoring rules: removed build-flakes alerts, updated harmonia to nix-cache02
- Simplified proxy.nix (no longer needs hostname conditional)
nix-cache02 is now the sole binary cache host.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add vault secrets for Radarr and Sonarr API keys to enable
exportarr metrics collection on monitoring01.
- services/exportarr/radarr - Radarr API key
- services/exportarr/sonarr - Sonarr API key
- Grant monitoring01 access to services/exportarr/*
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable Kanidm users to authenticate to OpenBao via OIDC for Web UI access.
Members of the admins group get full read/write access to secrets.
Changes:
- Add OIDC auth backend in Terraform (oidc.tf)
- Add oidc-admin and oidc-default policies
- Add openbao OAuth2 client to Kanidm
- Enable legacy crypto (RS256) for OpenBao compatibility
- Allow imperative group membership management in Kanidm
Limitations:
- CLI login not supported (Kanidm requires HTTPS for confidential client redirects)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create shared policy granting all hosts access to nixos-exporter nkey
- Add policy to both manual and generated host AppRoles
- Remove duplicate kanidm01/monitoring02 entries from hosts-generated.tf
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Deploy Grafana test instance on monitoring02 with:
- Kanidm OIDC authentication (admins -> Admin role, others -> Viewer)
- PKCE enabled for secure OAuth2 flow (required by Kanidm)
- Declarative datasources for Prometheus and Loki on monitoring01
- Local Caddy for TLS termination via internal ACME CA
- DNS CNAME grafana-test.home.2rjus.net
Terraform changes add OAuth2 client secret and AppRole policies for
kanidm01 and monitoring02.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Old VM had incorrect hardware-configuration.nix with hardcoded UUIDs
that didn't match actual disk layout, causing boot failure (emergency mode).
Recreated using template2-based configuration for OpenTofu provisioning.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove hosts/template/ (legacy template1) and give each legacy host
its own hardware-configuration.nix copy
- Recreate ns2 using create-host with template2 base
- Add secondary DNS services (NSD + Unbound resolver)
- Configure Vault policy for shared DNS secrets
- Fix create-host IP uniqueness validator to check CIDR notation
(prevents false positives from DNS resolver entries)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable vault.enable and homelab.deploy.enable on vault01 so it can
receive NATS-based remote deployments. Vault fetches secrets from
itself using AppRole after auto-unseal.
Add systemd ordering to ensure vault-secret services wait for openbao
to be unsealed before attempting to fetch secrets.
Also adds vault01 AppRole entry to Terraform.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add homelab.deploy.enable option (requires vault.enable)
- Create shared homelab-deploy Vault policy for all hosts
- Enable homelab.deploy on all vault-enabled hosts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add system/homelab-deploy.nix module that automatically enables the
listener on all hosts with vault.enable=true. Uses homelab.host.tier
and homelab.host.role for NATS subject subscriptions.
- Add homelab-deploy access to all host AppRole policies
- Remove manual listener config from vaulttest01 (now handled by system module)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add homelab-deploy flake input and NixOS module for message-based
deployments across the fleet. Configure DEPLOY account in NATS with
tiered access control (listener, test-deployer, admin-deployer).
Enable listener on vaulttest01 as initial test host.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Instead of creating a long-lived Vault token in Terraform (which gets
invalidated when Terraform recreates it), monitoring01 now uses its
existing AppRole credentials to fetch a fresh token for Prometheus.
Changes:
- Add prometheus-metrics policy to monitoring01's AppRole
- Remove vault_token.prometheus_metrics resource from Terraform
- Remove openbao-token KV secret from Terraform
- Add systemd service to fetch AppRole token on boot
- Add systemd timer to refresh token every 30 minutes
This ensures Prometheus always has a valid token without depending on
Terraform state or manual intervention.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace sops-nix secrets with OpenBao vault secrets across all hosts.
Hardcode root password hash, add extractKey option to vault-secrets
module, update Terraform with secrets/policies for all hosts, and
create AppRole provisioning playbook.
Hosts migrated: ha1, monitoring01, ns1, ns2, http-proxy, nix-cache01
Wave 1 hosts (nats1, jelly01, pgdb1) get AppRole policies only.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>