nixos-servers/docs/plans/completed/sops-to-openbao-migration.md
Torjus Håkestad 7d92c55d37
docs: update for sops-to-openbao migration completion
Update CLAUDE.md and README.md to reflect that secrets are now managed
by OpenBao, with sops only remaining for ca. Update migration plans
with sops cleanup checklist and auth01 decommission.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 20:06:21 +01:00

Sops to OpenBao Secrets Migration Plan

Status: Complete (except ca, deferred)

Remaining sops cleanup

The sops-nix flake input, system/sops.nix, .sops.yaml, and secrets/ directory are still present because ca still uses sops for its step-ca secrets (5 secrets in services/ca/default.nix). The services/authelia/ and services/lldap/ modules also reference sops but are only used by auth01 (decommissioned).

Once ca is migrated to OpenBao PKI (Phase 4c in host-migration-to-opentofu.md), remove:

  • sops-nix input from flake.nix
  • sops-nix.nixosModules.sops from all host module lists in flake.nix
  • inherit sops-nix from all specialArgs in flake.nix
  • system/sops.nix and its import in system/default.nix
  • .sops.yaml
  • secrets/ directory
  • All sops.secrets.* declarations in services/ca/, services/authelia/, services/lldap/
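Before deleting anything, it may help to list what is still left. The following is a minimal sketch, not part of the repository: the function name sops_leftovers is hypothetical, and it assumes the repository layout named in the checklist above (flake.nix, system/sops.nix, .sops.yaml, secrets/).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report sops leftovers under a repository root.
sops_leftovers() {
  local root="${1:-.}"
  local path

  # Nix files that still mention sops (flake input, module lists,
  # specialArgs, sops.secrets.* declarations). grep exits non-zero
  # when nothing matches, which is the desired end state, so ignore it.
  grep -rn --include='*.nix' 'sops' "$root" || true

  # Files and directories the checklist says to remove.
  for path in .sops.yaml secrets system/sops.nix; do
    if [ -e "$root/$path" ]; then
      echo "still present: $path"
    fi
  done
  return 0
}
```

Running sops_leftovers from the repository root should print nothing once the cleanup is complete.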

Overview

Migrate all hosts from sops-nix secrets to OpenBao (vault) secrets management. Pilot with ha1, then roll out to remaining hosts in waves.

Pre-requisites (completed)

  1. Hardcoded root password hash in system/root-user.nix (removes sops dependency for all hosts)
  2. Added extractKey option to system/vault-secrets.nix (extracts single key as file)

Deployment Order

Pilot: ha1

  • Terraform: shared/backup/password secret, ha1 AppRole policy
  • Provision AppRole credentials via playbooks/provision-approle.yml
  • NixOS: vault.enable + backup-helper vault secret

Wave 1: nats1, jelly01, pgdb1

  • No service secrets (only root password, already handled)
  • Just need AppRole policies + credential provisioning

Wave 2: monitoring01

  • 3 secrets: backup password, nats nkey, pve-exporter config
  • Updates: alerttonotify.nix, pve.nix, configuration.nix

Wave 3: ns1, then ns2 (critical - deploy ns1 first, verify, then ns2)

  • DNS zone transfer key (shared/dns/xfer-key)

Wave 4: http-proxy

  • WireGuard private key

Wave 5: nix-cache01

  • Cache signing key + Gitea Actions token

Wave 6: ca (DEFERRED - waiting for PKI migration)

Skipped: auth01 (decommissioned)

Terraform variables needed

Extract the following values from sops and add them to terraform/vault/terraform.tfvars:

| Variable              | Source                                            |
|-----------------------|---------------------------------------------------|
| backup_helper_secret  | sops -d secrets/secrets.yaml                      |
| ns_xfer_key           | sops -d secrets/secrets.yaml                      |
| nats_nkey             | sops -d secrets/secrets.yaml                      |
| pve_exporter_config   | sops -d secrets/monitoring01/pve-exporter.yaml    |
| wireguard_private_key | sops -d secrets/http-proxy/wireguard.yaml         |
| cache_signing_key     | sops -d secrets/nix-cache01/cache-secret          |
| actions_token_1       | sops -d secrets/nix-cache01/actions_token_1       |

Provisioning AppRole credentials

export BAO_ADDR='https://vault01.home.2rjus.net:8200'
export BAO_TOKEN='<root-token>'
nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=<host>
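For a wave with several hosts, the playbook command above could be wrapped in a loop. This is a sketch only: the function name provision_hosts and the DRY_RUN variable are hypothetical and not part of the repository. It assumes BAO_ADDR and BAO_TOKEN are already exported as shown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: provision AppRole credentials for each host in order.
# Set DRY_RUN=1 to print the commands instead of running them.
provision_hosts() {
  local host
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=$host"
    else
      # Stop at the first failure so later hosts are not touched.
      nix develop -c ansible-playbook playbooks/provision-approle.yml \
        -e "hostname=$host" || return 1
    fi
  done
}
```

For example, DRY_RUN=1 provision_hosts nats1 jelly01 pgdb1 prints the three Wave 1 commands without contacting OpenBao.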

Verification (per host)

  1. systemctl status 'vault-secret-*' - all secret fetch services succeeded (quote the pattern so the shell does not expand it)
  2. Check secret files exist at expected paths with correct permissions
  3. Verify dependent services are running
  4. Check /var/lib/vault/cache/ is populated (fallback ready)
  5. Reboot host to verify boot-time secret fetching works
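Steps 1 and 4 of the list above lend themselves to scripting. The sketch below is an illustration, not part of the repository: both function names are hypothetical, and it assumes the fetch units match vault-secret-* and that the cache lives in /var/lib/vault/cache/, as stated in the list. Steps 2, 3, and 5 still need a manual check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the cache directory exists and
# contains at least one entry (step 4).
cache_populated() {
  local dir="${1:-/var/lib/vault/cache}"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Hypothetical helper: prints any vault-secret-* units in the failed
# state and returns non-zero if there are any (step 1).
no_failed_secret_units() {
  local failed
  failed=$(systemctl list-units --state=failed --no-legend --plain 'vault-secret-*')
  if [ -n "$failed" ]; then
    echo "$failed"
    return 1
  fi
}
```

On a migrated host, no_failed_secret_units && cache_populated should exit successfully before the reboot test in step 5.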