# Sops to OpenBao Secrets Migration Plan

## Status: Complete (except ca, deferred)

## Remaining sops cleanup

The `sops-nix` flake input, `system/sops.nix`, `.sops.yaml`, and the `secrets/` directory are still present because `ca` continues to use sops for its five step-ca secrets (in `services/ca/default.nix`). The `services/authelia/` and `services/lldap/` modules also reference sops, but they are only used by auth01, which has been decommissioned.

Once `ca` is migrated to OpenBao PKI (Phase 4c in `host-migration-to-opentofu.md`), remove:

- `sops-nix` input from `flake.nix`
- `sops-nix.nixosModules.sops` from all host module lists in `flake.nix`
- `inherit sops-nix` from all `specialArgs` in `flake.nix`
- `system/sops.nix` and its import in `system/default.nix`
- `.sops.yaml`
- `secrets/` directory
- All `sops.secrets.*` declarations in `services/ca/`, `services/authelia/`, `services/lldap/`

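A quick search after the cleanup can confirm that nothing still depends on sops. This is a sketch rather than part of the repo: `sops_leftovers` is a hypothetical helper, and it assumes it is run against the repository root with `grep` supporting `-r` and `--include`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report sops leftovers under a repository root.
# Prints each leftover and returns 0 only when nothing sops-related remains.
sops_leftovers() {
  local root="${1:-.}" found=0 path
  # Files and directories the cleanup checklist removes.
  for path in .sops.yaml secrets system/sops.nix; do
    if [ -e "$root/$path" ]; then
      echo "still present: $path"
      found=1
    fi
  done
  # The flake input, module lists, specialArgs, imports and
  # sops.secrets.* declarations all contain the string "sops".
  if grep -rn --include='*.nix' 'sops' "$root"; then
    found=1
  fi
  return "$found"
}
```

Running `sops_leftovers .` prints every file or reference still to remove. The substring match is deliberately broad, so comments that merely mention sops will show up too.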
## Overview

Migrate all hosts from sops-nix secrets to OpenBao (vault) secrets management. Pilot with ha1, then roll out to the remaining hosts in waves.

## Pre-requisites (completed)

1. Hardcoded the root password hash in `system/root-user.nix` (removes the sops dependency for all hosts)
2. Added an `extractKey` option to `system/vault-secrets.nix` (extracts a single key as a file)

## Deployment Order

### Pilot: ha1

- Terraform: `shared/backup/password` secret, ha1 AppRole policy
- Provision AppRole credentials via `playbooks/provision-approle.yml`
- NixOS: `vault.enable` + backup-helper vault secret

### Wave 1: nats1, jelly01, pgdb1

- No service secrets (only the root password, already handled)
- Only AppRole policies and credential provisioning are needed

### Wave 2: monitoring01

- 3 secrets: backup password, NATS nkey, pve-exporter config
- Updates: `alerttonotify.nix`, `pve.nix`, `configuration.nix`

### Wave 3: ns1, then ns2 (critical: deploy ns1 first, verify, then ns2)

- DNS zone transfer key (`shared/dns/xfer-key`)

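One way to carry out the "verify" step between ns1 and ns2 is to compare the SOA serial each server returns for a zone. This sketch assumes ns2 transfers zones from ns1 using the migrated key, that `dig` is installed, and that the servers resolve as `ns1` and `ns2`; `soa_serial` and `zone_in_sync` are hypothetical names. Matching serials after a zone change suggest transfers still work, while matching serials with no recent change prove little.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare the SOA serial two name servers return for a zone.

# Read `dig +short SOA` output on stdin and print the serial (third field).
soa_serial() {
  awk 'NF >= 7 { print $3; exit }'
}

# zone_in_sync ZONE PRIMARY SECONDARY
zone_in_sync() {
  local zone="$1" primary="$2" secondary="$3" s1 s2
  s1=$(dig +short SOA "$zone" "@$primary" | soa_serial)
  s2=$(dig +short SOA "$zone" "@$secondary" | soa_serial)
  if [[ -n "$s1" && "$s1" == "$s2" ]]; then
    echo "in sync: $zone serial $s1"
  else
    echo "out of sync: $zone $primary=${s1:-none} $secondary=${s2:-none}" >&2
    return 1
  fi
}
```

Usage: `zone_in_sync <zone> ns1 ns2`, where `<zone>` is one of the zones ns1 serves.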
### Wave 4: http-proxy

- WireGuard private key

### Wave 5: nix-cache01

- Cache signing key + Gitea Actions token

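After nix-cache01 starts fetching its signing key from OpenBao, a shape check can catch a truncated or wrongly extracted value. This sketch assumes the `<name>:<base64>` secret-key format produced by `nix-store --generate-binary-cache-key`, where the base64 part decodes to a 64-byte Ed25519 secret key; `check_nix_signing_key` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a Nix binary-cache secret key file.
# Expected format: "<key-name>:<base64 of a 64-byte Ed25519 secret key>".
check_nix_signing_key() {
  local file="$1" content name b64 nbytes
  content=$(cat "$file") || return 1   # command substitution drops trailing newlines
  name=${content%%:*}
  b64=${content#*:}
  if [[ "$content" != *:* || -z "$name" ]]; then
    echo "$file: missing '<name>:' prefix" >&2
    return 1
  fi
  # Arithmetic expansion strips any padding that wc adds around the count.
  nbytes=$(( $(printf '%s' "$b64" | base64 -d 2>/dev/null | wc -c) ))
  if [ "$nbytes" -ne 64 ]; then
    echo "$file: key decodes to $nbytes bytes, expected 64" >&2
    return 1
  fi
  echo "$file: ok (key name '$name')"
}
```

Pass it the path the vault secret is written to on nix-cache01. It only validates the format; it does not prove the key matches the public key that clients trust.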
### Wave 6: ca (DEFERRED, waiting for PKI migration)

### Skipped: auth01 (decommissioned)

## Terraform variables needed

Extract these values from sops and add them to `terraform/vault/terraform.tfvars`:

| Variable | Source |
|----------|--------|
| `backup_helper_secret` | `sops -d secrets/secrets.yaml` |
| `ns_xfer_key` | `sops -d secrets/secrets.yaml` |
| `nats_nkey` | `sops -d secrets/secrets.yaml` |
| `pve_exporter_config` | `sops -d secrets/monitoring01/pve-exporter.yaml` |
| `wireguard_private_key` | `sops -d secrets/http-proxy/wireguard.yaml` |
| `cache_signing_key` | `sops -d secrets/nix-cache01/cache-secret` |
| `actions_token_1` | `sops -d secrets/nix-cache01/actions_token_1` |

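Values pasted into `terraform.tfvars` must be valid HCL strings, and multi-line secrets such as the pve-exporter config need a heredoc. The helpers below are a sketch with hypothetical names; they escape `\`, `"`, and HCL's `${` and `%{` template sequences, which HCL would otherwise try to interpolate.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for writing secrets into terraform.tfvars.

# Escape HCL template sequences: "${" -> "$${" and "%{" -> "%%{".
hcl_escape_templates() {
  sed -e 's/\${/$${/g' -e 's/%{/%%{/g'
}

# tfvar_string NAME VALUE: print NAME = "VALUE" for a single-line secret.
tfvar_string() {
  local name="$1" value="$2" escaped
  if [[ "$value" == *$'\n'* ]]; then
    echo "tfvar_string: $name is multi-line, use tfvar_heredoc" >&2
    return 1
  fi
  # Escape backslashes first, then double quotes, then template sequences.
  escaped=$(printf '%s' "$value" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' | hcl_escape_templates)
  printf '%s = "%s"\n' "$name" "$escaped"
}

# tfvar_heredoc NAME < FILE: print a heredoc entry for a multi-line secret.
# The resulting value ends with a single newline.
tfvar_heredoc() {
  local name="$1" body
  body=$(hcl_escape_templates)
  if printf '%s\n' "$body" | grep -qx 'EOT'; then
    echo "tfvar_heredoc: $name contains an EOT line" >&2
    return 1
  fi
  printf '%s = <<EOT\n%s\nEOT\n' "$name" "$body"
}
```

For example: `sops -d secrets/monitoring01/pve-exporter.yaml | tfvar_heredoc pve_exporter_config >> terraform/vault/terraform.tfvars`. For a single value inside `secrets/secrets.yaml`, `sops -d --extract '["<key>"]' secrets/secrets.yaml` prints just that key; the key names in that file are not listed in this plan. A heredoc value keeps a trailing newline, so single-line secrets such as passwords and keys should go through `tfvar_string` instead.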
## Provisioning AppRole credentials

```bash
export BAO_ADDR='https://vault01.home.2rjus.net:8200'
export BAO_TOKEN='<root-token>'
nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=<host>
```

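Waves with several hosts, such as Wave 1, need one playbook run per host. This is a sketch of a loop that assumes the playbook takes a single `hostname` per run, as in the command above; `provision_hosts` and its `DRY_RUN` switch are hypothetical additions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the AppRole provisioning playbook once per host.
# Set DRY_RUN=1 to print the commands instead of running them.
provision_hosts() {
  local host
  local -a cmd
  if [ -z "${BAO_ADDR:-}" ] || [ -z "${BAO_TOKEN:-}" ]; then
    echo "provision_hosts: export BAO_ADDR and BAO_TOKEN first" >&2
    return 1
  fi
  for host in "$@"; do
    cmd=(nix develop -c ansible-playbook playbooks/provision-approle.yml -e "hostname=$host")
    echo "==> $host"
    if [ -n "${DRY_RUN:-}" ]; then
      echo "${cmd[*]}"
    else
      # Stop at the first host whose playbook run fails.
      "${cmd[@]}" || return 1
    fi
  done
}
```

With `BAO_ADDR` and `BAO_TOKEN` exported as above, `provision_hosts nats1 jelly01 pgdb1` covers Wave 1 and stops at the first failure; `DRY_RUN=1 provision_hosts nats1` only prints the command.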
## Verification (per host)

1. `systemctl status vault-secret-*`: confirm that all secret-fetch services succeeded
2. Check that secret files exist at the expected paths with the correct permissions
3. Verify that dependent services are running
4. Check that `/var/lib/vault/cache/` is populated (fallback ready)
5. Reboot the host to verify that boot-time secret fetching works

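Steps 1, 2 and 4 can be scripted and run on each host. This sketch assumes systemd units matching `vault-secret-*` (as in step 1), GNU `stat`, and the cache path from step 4; the function names, and any secret path, mode or owner passed to `secret_file_ok`, are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical per-host checks for the vault secrets migration.

# Step 1: list vault-secret-* units that are in the failed state.
failed_secret_units() {
  systemctl list-units --state=failed --no-legend --plain 'vault-secret-*'
}

# Step 4: succeed if the cache directory exists and is not empty.
cache_populated() {
  local dir="${1:-/var/lib/vault/cache}"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]
}

# Step 2: secret_file_ok FILE MODE OWNER, e.g. secret_file_ok /path/to/secret 400 root
secret_file_ok() {
  local file="$1" mode="$2" owner="$3" actual
  actual=$(stat -c '%a %U' "$file" 2>/dev/null) || {
    echo "missing: $file" >&2
    return 1
  }
  if [ "$actual" != "$mode $owner" ]; then
    echo "$file: got '$actual', want '$mode $owner'" >&2
    return 1
  fi
}

# Steps 1 and 4 together; returns non-zero if either check fails.
verify_host() {
  local rc=0 failed
  failed=$(failed_secret_units) || {
    echo "could not query systemd" >&2
    rc=1
  }
  if [ -n "$failed" ]; then
    echo "failed secret units:" >&2
    printf '%s\n' "$failed" >&2
    rc=1
  fi
  if ! cache_populated; then
    echo "vault cache missing or empty" >&2
    rc=1
  fi
  return "$rc"
}
```

For example, `verify_host && secret_file_ok /path/to/secret 400 root`, with the path, mode and owner taken from the host's vault secret definitions. Steps 3 and 5 still need a manual check.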