Update CLAUDE.md and README.md to reflect that secrets are now managed by OpenBao, with sops only remaining for ca. Update migration plans with sops cleanup checklist and auth01 decommission. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sops to OpenBao Secrets Migration Plan
Status: Complete (except ca, deferred)
Remaining sops cleanup
The sops-nix flake input, system/sops.nix, .sops.yaml, and secrets/ directory are
still present because ca still uses sops for its step-ca secrets (5 secrets in
services/ca/default.nix). The services/authelia/ and services/lldap/ modules also
reference sops but are only used by auth01 (decommissioned).
Once ca is migrated to OpenBao PKI (Phase 4c in host-migration-to-opentofu.md), remove:
- sops-nix input from flake.nix
- sops-nix.nixosModules.sops from all host module lists in flake.nix
- inherit sops-nix from all specialArgs in flake.nix
- system/sops.nix and its import in system/default.nix
- .sops.yaml
- secrets/ directory
- All sops.secrets.* declarations in services/ca/, services/authelia/, services/lldap/
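Once the items above are removed, a quick scan can confirm nothing was missed. The following is a minimal sketch, not part of the repository: the function name `remaining_sops_refs` is hypothetical, and it assumes every reference lives in a `.nix` file or in the `.sops.yaml` and `secrets/` paths listed above.

```shell
# Hypothetical helper: report leftover sops references under a repo root.
# Prints matching lines and returns 1 if anything is left, otherwise returns 0.
remaining_sops_refs() {
  root="${1:-.}"
  # Any mention of sops in Nix files (flake.nix, system/, services/, ...).
  if grep -rn --include='*.nix' 'sops' "$root"; then
    return 1
  fi
  # The sops config file and the encrypted secrets directory.
  if [ -e "$root/.sops.yaml" ] || [ -d "$root/secrets" ]; then
    echo "leftover: .sops.yaml or secrets/ still present"
    return 1
  fi
  echo "no sops references left"
}
```

Run it from the repository root as `remaining_sops_refs .` after the cleanup commit. flake.lock is not scanned here, so it still needs to be regenerated separately.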
Overview
Migrate all hosts from sops-nix secrets to OpenBao (vault) secrets management. Pilot with ha1, then roll out to remaining hosts in waves.
Pre-requisites (completed)
- Hardcoded root password hash in system/root-user.nix (removes sops dependency for all hosts)
- Added extractKey option to system/vault-secrets.nix (extracts single key as file)
Deployment Order
Pilot: ha1
- Terraform: shared/backup/password secret, ha1 AppRole policy
- Provision AppRole credentials via playbooks/provision-approle.yml
- NixOS: vault.enable + backup-helper vault secret
Wave 1: nats1, jelly01, pgdb1
- No service secrets (only root password, already handled)
- Just need AppRole policies + credential provisioning
Wave 2: monitoring01
- 3 secrets: backup password, nats nkey, pve-exporter config
- Updates: alerttonotify.nix, pve.nix, configuration.nix
Wave 3: ns1, then ns2 (critical - deploy ns1 first, verify, then ns2)
- DNS zone transfer key (shared/dns/xfer-key)
Wave 4: http-proxy
- WireGuard private key
Wave 5: nix-cache01
- Cache signing key + Gitea Actions token
Wave 6: ca (DEFERRED - waiting for PKI migration)
Skipped: auth01 (decommissioned)
Terraform variables needed
User must extract from sops and add to terraform/vault/terraform.tfvars:
| Variable | Source |
|---|---|
| backup_helper_secret | sops -d secrets/secrets.yaml |
| ns_xfer_key | sops -d secrets/secrets.yaml |
| nats_nkey | sops -d secrets/secrets.yaml |
| pve_exporter_config | sops -d secrets/monitoring01/pve-exporter.yaml |
| wireguard_private_key | sops -d secrets/http-proxy/wireguard.yaml |
| cache_signing_key | sops -d secrets/nix-cache01/cache-secret |
| actions_token_1 | sops -d secrets/nix-cache01/actions_token_1 |
Provisioning AppRole credentials
export BAO_ADDR='https://vault01.home.2rjus.net:8200'
export BAO_TOKEN='<root-token>'
nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=<host>
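For waves with several hosts, the command above can be wrapped in a loop. This is a sketch only: the function name `provision_wave` is hypothetical, and it assumes the playbook takes just the `hostname` variable shown above and that `BAO_ADDR` and `BAO_TOKEN` are already exported.

```shell
# Hypothetical wrapper: run the provisioning playbook once per host.
# Stops at the first host that fails so the problem can be inspected.
provision_wave() {
  if [ -z "${BAO_ADDR:-}" ] || [ -z "${BAO_TOKEN:-}" ]; then
    echo "BAO_ADDR and BAO_TOKEN must be set" >&2
    return 1
  fi
  for host in "$@"; do
    echo "provisioning AppRole credentials for ${host}"
    nix develop -c ansible-playbook playbooks/provision-approle.yml \
      -e "hostname=${host}" || return 1
  done
}
```

For example, Wave 1 would be `provision_wave nats1 jelly01 pgdb1`. Wave 3 is better run one host at a time, since ns1 should be verified before ns2 is touched.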
Verification (per host)
- systemctl status vault-secret-* - all secret fetch services succeeded
- Check secret files exist at expected paths with correct permissions
- Verify dependent services are running
- Check /var/lib/vault/cache/ is populated (fallback ready)
- Reboot host to verify boot-time secret fetching works
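The unit and cache checks from this list can be scripted. The sketch below is an assumption-laden illustration, not part of the repository: the function name `verify_vault_secrets` is hypothetical, and it covers only the failed-unit and cache checks. File permissions, dependent services, and the reboot test still need a manual look.

```shell
# Hypothetical helper: check secret fetch units and the fallback cache on a host.
# Optional first argument overrides the cache directory.
verify_vault_secrets() {
  cache_dir="${1:-/var/lib/vault/cache}"
  # Any vault-secret-* unit in the failed state means a fetch did not succeed.
  failed=$(systemctl list-units 'vault-secret-*' --state=failed --no-legend --plain)
  if [ -n "$failed" ]; then
    echo "failed secret fetch units:"
    echo "$failed"
    return 1
  fi
  # The cache must contain at least one entry for the fallback to be usable.
  if [ -z "$(ls -A "$cache_dir" 2>/dev/null)" ]; then
    echo "cache directory $cache_dir is empty or missing"
    return 1
  fi
  echo "ok"
}
```

Run it as root on the host after deployment, and again after the reboot, since reading the cache directory may require elevated permissions.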