docs: update for sops-to-openbao migration completion

Update CLAUDE.md and README.md to reflect that secrets are now managed
by OpenBao, with sops only remaining for ca. Update migration plans
with sops cleanup checklist and auth01 decommission.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 20:06:21 +01:00
parent 6d117d68ca
commit 7d92c55d37
4 changed files with 81 additions and 26 deletions


@@ -52,7 +52,12 @@ nix develop
### Secrets Management
Secrets are handled by sops. Do not edit any `.sops.yaml` or any file within `secrets/`. Ask the user to modify if necessary.
Secrets are managed by OpenBao (Vault) using AppRole authentication. Most hosts use the
`vault.secrets` option defined in `system/vault-secrets.nix` to fetch secrets at boot.
Terraform manages the secrets and AppRole policies in `terraform/vault/`.
Legacy sops-nix is still present but only actively used by the `ca` host. Do not edit any
`.sops.yaml` or any file within `secrets/`. Ask the user to modify if necessary.
### Git Workflow
@@ -119,7 +124,7 @@ This ensures documentation matches the exact nixpkgs version (currently NixOS 25
- `default.nix` - Entry point, imports configuration.nix and services
- `configuration.nix` - Host-specific settings (networking, hardware, users)
- `/system/` - Shared system-level configurations applied to ALL hosts
- Core modules: nix.nix, sshd.nix, sops.nix, acme.nix, autoupgrade.nix
- Core modules: nix.nix, sshd.nix, sops.nix (legacy), vault-secrets.nix, acme.nix, autoupgrade.nix
- Monitoring: node-exporter and promtail on every host
- `/modules/` - Custom NixOS modules
- `homelab/` - Homelab-specific options (DNS automation, monitoring scrape targets)
@@ -131,13 +136,13 @@ This ensures documentation matches the exact nixpkgs version (currently NixOS 25
- `monitoring/` - Observability stack (Prometheus, Grafana, Loki, Tempo)
- `ns/` - DNS services (authoritative, resolver, zone generation)
- `http-proxy/`, `ca/`, `postgres/`, `nats/`, `jellyfin/`, etc.
- `/secrets/` - SOPS-encrypted secrets with age encryption
- `/secrets/` - SOPS-encrypted secrets with age encryption (legacy, only used by ca)
- `/common/` - Shared configurations (e.g., VM guest agent)
- `/docs/` - Documentation and plans
- `plans/` - Future plans and proposals
- `plans/completed/` - Completed plans (moved here when done)
- `/playbooks/` - Ansible playbooks for fleet management
- `/.sops.yaml` - SOPS configuration with age keys for all servers
- `/.sops.yaml` - SOPS configuration with age keys (legacy, only used by ca)
### Configuration Inheritance
@@ -153,7 +158,7 @@ hosts/<hostname>/default.nix
All hosts automatically get:
- Nix binary cache (nix-cache.home.2rjus.net)
- SSH with root login enabled
- SOPS secrets management with auto-generated age keys
- OpenBao (Vault) secrets management via AppRole
- Internal ACME CA integration (ca.home.2rjus.net)
- Daily auto-upgrades with auto-reboot
- Prometheus node-exporter + Promtail (logs to monitoring01)
@@ -173,7 +178,6 @@ Production servers managed by `rebuild-all.sh`:
- `nix-cache01` - Binary cache server
- `pgdb1` - PostgreSQL database
- `nats1` - NATS messaging server
- `auth01` - Authentication service
Template/test hosts:
- `template1` - Base template for cloning new hosts
@@ -182,7 +186,7 @@ Template/test hosts:
- `nixpkgs` - NixOS 25.11 stable (primary)
- `nixpkgs-unstable` - Unstable channel (available via overlay as `pkgs.unstable.<package>`)
- `sops-nix` - Secrets management
- `sops-nix` - Secrets management (legacy, only used by ca)
- Custom packages from git.t-juice.club:
- `alerttonotify` - Alert routing
- `labmon` - Lab monitoring
@@ -198,12 +202,21 @@ Template/test hosts:
### Secrets Management
- Uses SOPS with age encryption
- Each server has unique age key in `.sops.yaml`
- Keys auto-generated at `/var/lib/sops-nix/key.txt` on first boot
Most hosts use OpenBao (Vault) for secrets:
- Vault server at `vault01.home.2rjus.net:8200`
- AppRole authentication with credentials at `/var/lib/vault/approle/`
- Secrets defined in Terraform (`terraform/vault/secrets.tf`)
- AppRole policies in Terraform (`terraform/vault/approle.tf`)
- NixOS module: `system/vault-secrets.nix` with `vault.secrets.<name>` options
- `extractKey` option extracts a single key from vault JSON as a plain file
- Secrets fetched at boot by `vault-secret-<name>.service` systemd units
- Fallback to cached secrets in `/var/lib/vault/cache/` when Vault is unreachable
- Provision AppRole credentials: `nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=<host>`
Legacy SOPS (only used by `ca` host):
- SOPS with age encryption, keys in `.sops.yaml`
- Shared secrets: `/secrets/secrets.yaml`
- Per-host secrets: `/secrets/<hostname>/`
- All production servers can decrypt shared secrets; host-specific secrets require specific host keys
### Auto-Upgrade System
@@ -303,13 +316,15 @@ This means:
3. Add host entry to `flake.nix` nixosConfigurations
4. Configure networking in `configuration.nix` (static IP via `systemd.network.networks`, DNS servers)
5. (Optional) Add `homelab.dns.cnames` if the host needs CNAME aliases
6. User clones template host
7. User runs `prepare-host.sh` on new host, this deletes files which should be regenerated, like ssh host keys, machine-id etc. It also creates a new age key, and prints the public key
8. This key is then added to `.sops.yaml`
9. Create `/secrets/<hostname>/` if needed
10. Commit changes, and merge to master.
11. Deploy by running `nixos-rebuild boot --flake URL#<hostname>` on the host.
12. Run auto-upgrade on DNS servers (ns1, ns2) to pick up the new host's DNS entry
6. Add `vault.enable = true;` to the host configuration
7. Add AppRole policy in `terraform/vault/approle.tf` and any secrets in `secrets.tf`
8. Run `tofu apply` in `terraform/vault/`
9. User clones template host
10. User runs `prepare-host.sh` on new host
11. Provision AppRole credentials: `nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=<host>`
12. Commit changes, and merge to master.
13. Deploy by running `nixos-rebuild boot --flake URL#<hostname>` on the host.
14. Run auto-upgrade on DNS servers (ns1, ns2) to pick up the new host's DNS entry
**Note:** DNS A records and Prometheus node-exporter scrape targets are auto-generated from the host's `systemd.network.networks` static IP configuration. No manual zone file or Prometheus config editing is required.
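
As a reading aid for the updated onboarding steps above, the workstation-side commands (steps 8 and 11) can be sketched as a small dry-run script. This is only a sketch under stated assumptions: the `onboard` and `run` helpers, the `DRY_RUN` variable, and the host name `newhost01` are hypothetical and not part of the repository, and the `-chdir` flag stands in for changing into `terraform/vault/` by hand. Only the `tofu apply` and `ansible-playbook` invocations come from the documented steps.

```shell
#!/usr/bin/env bash
# Dry-run sketch of onboarding steps 8 and 11 (run from the repository root).
# Helper names, DRY_RUN, and the default host name are illustrative assumptions.
set -eu

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

onboard() {
  local host="${1:-newhost01}"  # hypothetical host name

  # Step 8: apply the AppRole policy and secrets defined in terraform/vault/.
  run tofu -chdir=terraform/vault apply

  # Step 11: provision AppRole credentials onto the new host.
  run nix develop -c ansible-playbook playbooks/provision-approle.yml -e "hostname=${host}"
}

onboard "${1:-newhost01}"
```

With the default `DRY_RUN=1` the script only prints the two commands; setting `DRY_RUN=0` would execute them. Step 13 (`nixos-rebuild boot`) is left out because it runs on the new host itself rather than on the workstation.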