vault: implement bootstrap integration
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m20s
Run nix flake check / flake-check (pull_request) Successful in 2m20s

This commit is contained in:
2026-02-02 22:27:28 +01:00
parent 7fc69c40a6
commit 2b4dc424cc
28 changed files with 2298 additions and 77 deletions

138
TODO.md

@@ -185,7 +185,7 @@ create-host \
**Current Architecture:**
```
vault.home.2rjus.net (10.69.13.19)
vault01.home.2rjus.net (10.69.13.19)
├─ KV Secrets Engine (ready to replace sops-nix)
│ ├─ secret/hosts/{hostname}/*
│ ├─ secret/services/{service}/*
@@ -197,18 +197,18 @@ vault.home.2rjus.net (10.69.13.19)
├─ SSH CA Engine (TODO: Phase 4c)
└─ AppRole Auth (per-host authentication configured)
[Phase 4d] New hosts authenticate on first boot
[Phase 4d] Fetch secrets via Vault API
[Phase 4d] New hosts authenticate on first boot
[Phase 4d] Fetch secrets via Vault API
No manual key distribution needed
```
**Completed:**
- ✅ Phase 4a: OpenBao server with TPM2 auto-unseal
- ✅ Phase 4b: Infrastructure-as-code (secrets, policies, AppRoles, PKI)
- ✅ Phase 4d: Bootstrap integration for automated secrets access
**Next Steps:**
- Phase 4c: Migrate from step-ca to OpenBao PKI
- Phase 4d: Bootstrap integration for automated secrets access
---
@@ -243,7 +243,7 @@ vault.home.2rjus.net (10.69.13.19)
- [x] File storage backend
- [x] Self-signed TLS certificates via LoadCredential
- [x] Deploy to infrastructure
- [x] DNS entry added for vault.home.2rjus.net
- [x] DNS entry added for vault01.home.2rjus.net
- [x] VM deployed via terraform
- [x] Verified OpenBao running and auto-unsealing
@@ -353,7 +353,7 @@ vault.home.2rjus.net (10.69.13.19)
- [x] Enabled ACME on intermediate CA
- [x] Created PKI role for `*.home.2rjus.net`
- [x] Set certificate TTLs (30 day max) and allowed domains
- [x] ACME directory: `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
- [x] ACME directory: `https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory`
- [ ] Download and distribute root CA certificate
- [ ] Export root CA: `bao read -field=certificate pki/cert/ca > homelab-root-ca.crt`
- [ ] Add to NixOS trust store on all hosts via `security.pki.certificateFiles`
@@ -368,7 +368,7 @@ vault.home.2rjus.net (10.69.13.19)
- [ ] Update service configuration
- [ ] Migrate hosts from step-ca to OpenBao
- [ ] Update `system/acme.nix` to use OpenBao ACME endpoint
- [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
- [ ] Change server to `https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory`
- [ ] Test on one host (non-critical service)
- [ ] Roll out to all hosts via auto-upgrade
- [ ] Configure SSH CA in OpenBao (optional, future work)
@@ -388,55 +388,99 @@ vault.home.2rjus.net (10.69.13.19)
---
#### Phase 4d: Bootstrap Integration
#### Phase 4d: Bootstrap Integration ✅ COMPLETED (2026-02-02)
**Goal:** New hosts automatically authenticate to Vault on first boot, no manual steps
**Tasks:**
- [ ] Update create-host tool
- [ ] Generate AppRole role_id + secret_id for new host
- [ ] Or create wrapped token for one-time bootstrap
- [ ] Add host-specific policy to Vault (via terraform)
- [ ] Store bootstrap credentials for cloud-init injection
- [ ] Update template2 for Vault authentication
- [ ] Create Vault authentication module
- [ ] Reads bootstrap credentials from cloud-init
- [ ] Authenticates to Vault, retrieves permanent AppRole credentials
- [ ] Stores role_id + secret_id locally for services to use
- [ ] Create NixOS Vault secrets module
- [ ] Replacement for sops.secrets
- [ ] Fetches secrets from Vault at nixos-rebuild/activation time
- [ ] Or runtime secret fetching for services
- [ ] Handle Vault token renewal
- [ ] Update bootstrap service
- [ ] After authenticating to Vault, fetch any bootstrap secrets
- [ ] Run nixos-rebuild with host configuration
- [ ] Services automatically fetch their secrets from Vault
- [ ] Update terraform cloud-init
- [ ] Inject Vault address and bootstrap credentials
- [ ] Pass via cloud-init user-data or write_files
- [ ] Credentials scoped to single use or short TTL
- [ ] Test complete flow
- [ ] Run create-host to generate new host config
- [ ] Deploy with terraform
- [ ] Verify host bootstraps and authenticates to Vault
- [ ] Verify services can fetch secrets
- [ ] Confirm no manual steps required
- [x] Update create-host tool
- [x] Generate wrapped token (24h TTL, single-use) for new host
- [x] Add host-specific policy to Vault (via terraform/vault/hosts-generated.tf)
- [x] Store wrapped token in terraform/vms.tf for cloud-init injection
- [x] Add `--regenerate-token` flag to regenerate only the token without overwriting config
- [x] Update template2 for Vault authentication
- [x] Reads wrapped token from cloud-init (/run/cloud-init-env)
- [x] Unwraps token to get role_id + secret_id
- [x] Stores AppRole credentials in /var/lib/vault/approle/ (persistent)
- [x] Graceful fallback if Vault unavailable during bootstrap
- [x] Create NixOS Vault secrets module (system/vault-secrets.nix)
- [x] Runtime secret fetching (services fetch on start, not at nixos-rebuild time)
- [x] Secrets cached in /var/lib/vault/cache/ for fallback when Vault unreachable
- [x] Secrets written to /run/secrets/ (tmpfs, cleared on reboot)
- [x] Fresh authentication per service start (no token renewal needed)
- [x] Optional periodic rotation with systemd timers
- [x] Critical service protection (no auto-restart for DNS, CA, Vault itself)
- [x] Create vault-fetch helper script
- [x] Standalone tool for fetching secrets from Vault
- [x] Authenticates using AppRole credentials
- [x] Writes individual files per secret key
- [x] Handles caching and fallback logic
- [x] Update bootstrap service (hosts/template2/bootstrap.nix)
- [x] Unwraps Vault token on first boot
- [x] Stores persistent AppRole credentials
- [x] Continues with nixos-rebuild
- [x] Services fetch secrets when they start
- [x] Update terraform cloud-init (terraform/cloud-init.tf)
- [x] Inject VAULT_ADDR and VAULT_WRAPPED_TOKEN via write_files
- [x] Write to /run/cloud-init-env (tmpfs, cleaned on reboot)
- [x] Fixed YAML indentation issues (write_files at top level)
- [x] Support flake_branch alongside vault credentials
- [x] Test complete flow
- [x] Created vaulttest01 test host
- [x] Verified bootstrap with Vault integration
- [x] Verified service secret fetching
- [x] Tested cache fallback when Vault unreachable
- [x] Tested wrapped token single-use (second bootstrap fails as expected)
- [x] Confirmed zero manual steps required
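The checklist says cloud-init writes `VAULT_ADDR` and `VAULT_WRAPPED_TOKEN` to `/run/cloud-init-env`. A minimal sketch of how a bootstrap script might read that file follows. It assumes the file holds plain `KEY=VALUE` lines, which the source does not spell out, and `parse_env_file` is a hypothetical helper name, not part of the repository.

```python
from typing import Dict


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines (assumed format) into a dictionary."""
    env: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Tolerate an optional shell-style "export " prefix.
        if line.startswith("export "):
            line = line[len("export "):]
        key, separator, value = line.partition("=")
        if not separator:
            continue  # Not a KEY=VALUE line; ignore it.
        # Drop surrounding whitespace and quote characters from the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env
```

On a host, the input would come from something like `Path("/run/cloud-init-env").read_text()`; a missing `VAULT_WRAPPED_TOKEN` key would then be the signal to skip Vault setup and continue the bootstrap.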
**Bootstrap flow:**
**Implementation Details:**
**Wrapped Token Security:**
- Single-use tokens prevent reuse if leaked
- 24h TTL limits exposure window
- Safe to commit to git (expired or already-used tokens are useless)
- Regenerate with `create-host --hostname X --regenerate-token`
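To illustrate the single-use property, here is a hedged sketch of the unwrap call against the `sys/wrapping/unwrap` HTTP endpoint, where the wrapping token is sent as `X-Vault-Token`. The helper names are hypothetical, and the sketch assumes the wrapped payload is a JSON object with `role_id` and `secret_id` keys, as the checklist suggests. It only builds the request and parses a response; it does not reproduce the actual bootstrap script.

```python
import json
import urllib.request
from typing import Tuple


def build_unwrap_request(vault_addr: str, wrapped_token: str) -> urllib.request.Request:
    """Build the POST request that unwraps a response-wrapped token.

    The wrapping token authenticates the call, so no request body is needed.
    """
    url = vault_addr.rstrip("/") + "/v1/sys/wrapping/unwrap"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"X-Vault-Token": wrapped_token},
    )


def extract_approle_credentials(response_body: bytes) -> Tuple[str, str]:
    """Return (role_id, secret_id) from an unwrap response.

    Assumption: the wrapped data contains exactly these two keys.
    """
    data = json.loads(response_body)["data"]
    return data["role_id"], data["secret_id"]
```

Sending the request with `urllib.request.urlopen` a second time for the same token should fail with an HTTP error, which matches the "second bootstrap fails as expected" test noted in the checklist.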
**Secret Fetching:**
- Runtime (not build-time) keeps secrets out of Nix store
- Cache fallback keeps services available when Vault is down
- Fresh authentication per service start (no renewal complexity)
- Individual files per secret key for easy consumption
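The caching behaviour described above can be sketched roughly as follows. This is not the `vault-fetch` implementation: the function names are hypothetical, the fetch step is passed in as a callable so the example runs without a Vault server, and it assumes an unreachable Vault surfaces as an `OSError` (which covers `urllib.error.URLError`).

```python
import os
from pathlib import Path
from typing import Callable, Dict


def write_secret_files(secrets: Dict[str, str], directory: Path) -> None:
    """Write one file per secret key into the given directory."""
    directory.mkdir(parents=True, exist_ok=True)
    for key, value in secrets.items():
        # New files are created with mode 0600 (further limited by the umask).
        fd = os.open(directory / key, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "w") as handle:
            handle.write(value)


def read_secret_files(directory: Path) -> Dict[str, str]:
    """Read every regular file in the directory back into a dictionary."""
    return {p.name: p.read_text() for p in sorted(directory.iterdir()) if p.is_file()}


def fetch_with_fallback(
    fetch: Callable[[], Dict[str, str]], output_dir: Path, cache_dir: Path
) -> str:
    """Fetch secrets, refresh the cache, and fall back to it on failure.

    Returns "vault" or "cache" to show where the secrets came from.
    """
    try:
        secrets = fetch()
    except OSError:
        # Vault unreachable: reuse the last cached copy, if there is one.
        if not cache_dir.is_dir():
            raise
        write_secret_files(read_secret_files(cache_dir), output_dir)
        return "cache"
    write_secret_files(secrets, cache_dir)
    write_secret_files(secrets, output_dir)
    return "vault"
```

In the layout from the checklist, `output_dir` would sit under `/run/secrets/` and `cache_dir` under `/var/lib/vault/cache/`. A host that has never reached Vault has no cache, so the error is re-raised and the service start fails visibly.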
**Bootstrap Flow:**
```
1. terraform apply (deploys VM with cloud-init)
2. Cloud-init sets hostname + Vault bootstrap credentials
1. create-host --hostname myhost --ip 10.69.13.x/24
↓ Generates wrapped token, updates terraform
2. tofu apply (deploys VM with cloud-init)
↓ Cloud-init writes wrapped token to /run/cloud-init-env
3. nixos-bootstrap.service runs:
- Authenticates to Vault with bootstrap credentials
- Retrieves permanent AppRole credentials
- Stores locally for service use
- Runs nixos-rebuild
4. Host services fetch secrets from Vault as needed
5. Done - no manual intervention
↓ Unwraps token → gets role_id + secret_id
↓ Stores in /var/lib/vault/approle/ (persistent)
↓ Runs nixos-rebuild boot
4. Service starts → fetches secrets from Vault
↓ Uses stored AppRole credentials
↓ Caches secrets for fallback
5. Done - zero manual intervention
```
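Step 4 of the flow above relies on the stored AppRole credentials. As a rough sketch, a login against the AppRole HTTP API could be built as shown here. It assumes the auth method is mounted at the default `approle` path, and the helper names are hypothetical; the source does not show how `vault-fetch` performs this step.

```python
import json
import urllib.request


def build_approle_login_request(
    vault_addr: str, role_id: str, secret_id: str
) -> urllib.request.Request:
    """Build the POST request for an AppRole login (default mount assumed)."""
    body = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode("utf-8")
    return urllib.request.Request(
        vault_addr.rstrip("/") + "/v1/auth/approle/login",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def client_token_from_login(response_body: bytes) -> str:
    """Extract the short-lived client token from a login response."""
    return json.loads(response_body)["auth"]["client_token"]
```

Logging in afresh on every service start, as the notes describe, means the returned token only has to live long enough for one fetch, so no renewal logic is required.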
**Deliverable:** Fully automated secrets access from first boot, zero manual steps
**Files Created:**
- `scripts/vault-fetch/` - Secret fetching helper (Nix package)
- `system/vault-secrets.nix` - NixOS module for declarative Vault secrets
- `scripts/create-host/vault_helper.py` - Vault API integration
- `terraform/vault/hosts-generated.tf` - Auto-generated host policies
- `docs/vault-bootstrap-implementation.md` - Architecture documentation
- `docs/vault-bootstrap-testing.md` - Testing guide
**Configuration:**
- Vault address: `https://vault01.home.2rjus.net:8200` (configurable)
- All defaults remain configurable via environment variables or NixOS options
**Next Steps:**
- Gradually migrate existing services from sops-nix to Vault
- Add CNAME for vault.home.2rjus.net → vault01.home.2rjus.net
- Phase 4c: Migrate from step-ca to OpenBao PKI (future)
**Deliverable:** ✅ Fully automated secrets access from first boot, zero manual steps
---