vault: implement bootstrap integration
138
TODO.md
@@ -185,7 +185,7 @@ create-host \
 
 **Current Architecture:**
 ```
-vault.home.2rjus.net (10.69.13.19)
+vault01.home.2rjus.net (10.69.13.19)
 ├─ KV Secrets Engine (ready to replace sops-nix)
 │ ├─ secret/hosts/{hostname}/*
 │ ├─ secret/services/{service}/*
@@ -197,18 +197,18 @@ vault.home.2rjus.net (10.69.13.19)
 ├─ SSH CA Engine (TODO: Phase 4c)
 └─ AppRole Auth (per-host authentication configured)
 ↓
-[Phase 4d] New hosts authenticate on first boot
-[Phase 4d] Fetch secrets via Vault API
+[✅ Phase 4d] New hosts authenticate on first boot
+[✅ Phase 4d] Fetch secrets via Vault API
 No manual key distribution needed
 ```
 
 **Completed:**
 - ✅ Phase 4a: OpenBao server with TPM2 auto-unseal
 - ✅ Phase 4b: Infrastructure-as-code (secrets, policies, AppRoles, PKI)
+- ✅ Phase 4d: Bootstrap integration for automated secrets access
 
 **Next Steps:**
 - Phase 4c: Migrate from step-ca to OpenBao PKI
-- Phase 4d: Bootstrap integration for automated secrets access
 
 ---
 
@@ -243,7 +243,7 @@ vault.home.2rjus.net (10.69.13.19)
 - [x] File storage backend
 - [x] Self-signed TLS certificates via LoadCredential
 - [x] Deploy to infrastructure
-- [x] DNS entry added for vault.home.2rjus.net
+- [x] DNS entry added for vault01.home.2rjus.net
 - [x] VM deployed via terraform
 - [x] Verified OpenBao running and auto-unsealing
 
@@ -353,7 +353,7 @@ vault.home.2rjus.net (10.69.13.19)
 - [x] Enabled ACME on intermediate CA
 - [x] Created PKI role for `*.home.2rjus.net`
 - [x] Set certificate TTLs (30 day max) and allowed domains
-- [x] ACME directory: `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
+- [x] ACME directory: `https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory`
 - [ ] Download and distribute root CA certificate
 - [ ] Export root CA: `bao read -field=certificate pki/cert/ca > homelab-root-ca.crt`
 - [ ] Add to NixOS trust store on all hosts via `security.pki.certificateFiles`
@@ -368,7 +368,7 @@ vault.home.2rjus.net (10.69.13.19)
 - [ ] Update service configuration
 - [ ] Migrate hosts from step-ca to OpenBao
 - [ ] Update `system/acme.nix` to use OpenBao ACME endpoint
-- [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
+- [ ] Change server to `https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory`
 - [ ] Test on one host (non-critical service)
 - [ ] Roll out to all hosts via auto-upgrade
 - [ ] Configure SSH CA in OpenBao (optional, future work)
@@ -388,55 +388,99 @@ vault.home.2rjus.net (10.69.13.19)
 
 ---
 
-#### Phase 4d: Bootstrap Integration
+#### Phase 4d: Bootstrap Integration ✅ COMPLETED (2026-02-02)
 
 **Goal:** New hosts automatically authenticate to Vault on first boot, no manual steps
 
 **Tasks:**
-- [ ] Update create-host tool
-  - [ ] Generate AppRole role_id + secret_id for new host
-  - [ ] Or create wrapped token for one-time bootstrap
-  - [ ] Add host-specific policy to Vault (via terraform)
-  - [ ] Store bootstrap credentials for cloud-init injection
-- [ ] Update template2 for Vault authentication
-  - [ ] Create Vault authentication module
-  - [ ] Reads bootstrap credentials from cloud-init
-  - [ ] Authenticates to Vault, retrieves permanent AppRole credentials
-  - [ ] Stores role_id + secret_id locally for services to use
-- [ ] Create NixOS Vault secrets module
-  - [ ] Replacement for sops.secrets
-  - [ ] Fetches secrets from Vault at nixos-rebuild/activation time
-  - [ ] Or runtime secret fetching for services
-  - [ ] Handle Vault token renewal
-- [ ] Update bootstrap service
-  - [ ] After authenticating to Vault, fetch any bootstrap secrets
-  - [ ] Run nixos-rebuild with host configuration
-  - [ ] Services automatically fetch their secrets from Vault
-- [ ] Update terraform cloud-init
-  - [ ] Inject Vault address and bootstrap credentials
-  - [ ] Pass via cloud-init user-data or write_files
-  - [ ] Credentials scoped to single use or short TTL
-- [ ] Test complete flow
-  - [ ] Run create-host to generate new host config
-  - [ ] Deploy with terraform
-  - [ ] Verify host bootstraps and authenticates to Vault
-  - [ ] Verify services can fetch secrets
-  - [ ] Confirm no manual steps required
+- [x] Update create-host tool
+  - [x] Generate wrapped token (24h TTL, single-use) for new host
+  - [x] Add host-specific policy to Vault (via terraform/vault/hosts-generated.tf)
+  - [x] Store wrapped token in terraform/vms.tf for cloud-init injection
+  - [x] Add `--regenerate-token` flag to regenerate only the token without overwriting config
+- [x] Update template2 for Vault authentication
+  - [x] Reads wrapped token from cloud-init (/run/cloud-init-env)
+  - [x] Unwraps token to get role_id + secret_id
+  - [x] Stores AppRole credentials in /var/lib/vault/approle/ (persistent)
+  - [x] Graceful fallback if Vault unavailable during bootstrap
+- [x] Create NixOS Vault secrets module (system/vault-secrets.nix)
+  - [x] Runtime secret fetching (services fetch on start, not at nixos-rebuild time)
+  - [x] Secrets cached in /var/lib/vault/cache/ for fallback when Vault unreachable
+  - [x] Secrets written to /run/secrets/ (tmpfs, cleared on reboot)
+  - [x] Fresh authentication per service start (no token renewal needed)
+  - [x] Optional periodic rotation with systemd timers
+  - [x] Critical service protection (no auto-restart for DNS, CA, Vault itself)
+- [x] Create vault-fetch helper script
+  - [x] Standalone tool for fetching secrets from Vault
+  - [x] Authenticates using AppRole credentials
+  - [x] Writes individual files per secret key
+  - [x] Handles caching and fallback logic
+- [x] Update bootstrap service (hosts/template2/bootstrap.nix)
+  - [x] Unwraps Vault token on first boot
+  - [x] Stores persistent AppRole credentials
+  - [x] Continues with nixos-rebuild
+  - [x] Services fetch secrets when they start
+- [x] Update terraform cloud-init (terraform/cloud-init.tf)
+  - [x] Inject VAULT_ADDR and VAULT_WRAPPED_TOKEN via write_files
+  - [x] Write to /run/cloud-init-env (tmpfs, cleaned on reboot)
+  - [x] Fixed YAML indentation issues (write_files at top level)
+  - [x] Support flake_branch alongside vault credentials
+- [x] Test complete flow
+  - [x] Created vaulttest01 test host
+  - [x] Verified bootstrap with Vault integration
+  - [x] Verified service secret fetching
+  - [x] Tested cache fallback when Vault unreachable
+  - [x] Tested wrapped token single-use (second bootstrap fails as expected)
+  - [x] Confirmed zero manual steps required
 
-**Bootstrap flow:**
+**Implementation Details:**
+
+**Wrapped Token Security:**
+- Single-use tokens prevent reuse if leaked
+- 24h TTL limits exposure window
+- Safe to commit to git (expired/used tokens useless)
+- Regenerate with `create-host --hostname X --regenerate-token`
+
+**Secret Fetching:**
+- Runtime (not build-time) keeps secrets out of Nix store
+- Cache fallback enables service availability when Vault down
+- Fresh authentication per service start (no renewal complexity)
+- Individual files per secret key for easy consumption
+
+**Bootstrap Flow:**
 ```
-1. terraform apply (deploys VM with cloud-init)
-2. Cloud-init sets hostname + Vault bootstrap credentials
+1. create-host --hostname myhost --ip 10.69.13.x/24
+   ↓ Generates wrapped token, updates terraform
+2. tofu apply (deploys VM with cloud-init)
+   ↓ Cloud-init writes wrapped token to /run/cloud-init-env
 3. nixos-bootstrap.service runs:
-   - Authenticates to Vault with bootstrap credentials
-   - Retrieves permanent AppRole credentials
-   - Stores locally for service use
-   - Runs nixos-rebuild
-4. Host services fetch secrets from Vault as needed
-5. Done - no manual intervention
+   ↓ Unwraps token → gets role_id + secret_id
+   ↓ Stores in /var/lib/vault/approle/ (persistent)
+   ↓ Runs nixos-rebuild boot
+4. Service starts → fetches secrets from Vault
+   ↓ Uses stored AppRole credentials
+   ↓ Caches secrets for fallback
+5. Done - zero manual intervention
 ```
 
-**Deliverable:** Fully automated secrets access from first boot, zero manual steps
+**Files Created:**
+- `scripts/vault-fetch/` - Secret fetching helper (Nix package)
+- `system/vault-secrets.nix` - NixOS module for declarative Vault secrets
+- `scripts/create-host/vault_helper.py` - Vault API integration
+- `terraform/vault/hosts-generated.tf` - Auto-generated host policies
+- `docs/vault-bootstrap-implementation.md` - Architecture documentation
+- `docs/vault-bootstrap-testing.md` - Testing guide
+
+**Configuration:**
+- Vault address: `https://vault01.home.2rjus.net:8200` (configurable)
+- All defaults remain configurable via environment variables or NixOS options
+
+**Next Steps:**
+- Gradually migrate existing services from sops-nix to Vault
+- Add CNAME for vault.home.2rjus.net → vault01.home.2rjus.net
+- Phase 4c: Migrate from step-ca to OpenBao PKI (future)
+
+**Deliverable:** ✅ Fully automated secrets access from first boot, zero manual steps
 
 ---
 
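Reviewer note on step 3 of the Bootstrap Flow (unwrap the token, then persist the AppRole credentials): the following is a minimal Python sketch of that step, not the code in `hosts/template2/bootstrap.nix`. The `KEY=VALUE` layout of `/run/cloud-init-env`, the file names `role_id` and `secret_id`, and all function names are assumptions made for illustration. The only external interface it relies on is the `sys/wrapping/unwrap` HTTP endpoint, where the wrapping token is sent as the request token.

```python
import json
import os
import urllib.request
from pathlib import Path
from typing import Dict


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines (assumed layout of /run/cloud-init-env)."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env


def build_unwrap_request(vault_addr: str, wrapped_token: str) -> urllib.request.Request:
    """Build the POST to sys/wrapping/unwrap, authenticated with the wrapping token."""
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/sys/wrapping/unwrap",
        method="POST",
        headers={"X-Vault-Token": wrapped_token},
    )


def unwrap(vault_addr: str, wrapped_token: str) -> Dict[str, str]:
    """Redeem the single-use token; a second call with the same token fails."""
    request = build_unwrap_request(vault_addr, wrapped_token)
    with urllib.request.urlopen(request, timeout=10) as response:
        # The wrapped payload (assumed to hold role_id and secret_id) is under "data".
        return json.load(response)["data"]


def store_approle(credentials: Dict[str, str], directory: Path) -> None:
    """Write role_id and secret_id as owner-only files (file names are hypothetical)."""
    directory.mkdir(parents=True, exist_ok=True)
    for name in ("role_id", "secret_id"):
        fd = os.open(directory / name, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "w") as handle:
            handle.write(credentials[name])
```

On a host this would be chained as `env = parse_env_file(Path("/run/cloud-init-env").read_text())`, then `store_approle(unwrap(env["VAULT_ADDR"], env["VAULT_WRAPPED_TOKEN"]), Path("/var/lib/vault/approle"))`. Because the token is single-use, the unwrap result has to be persisted before anything else can fail.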
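Reviewer note on the "Secret Fetching" bullets (cache fallback, one file per secret key): a small Python sketch of that behaviour, under stated assumptions. It is not the `vault-fetch` script from this commit. The JSON cache format, the function names, and the injected `fetch` callable are hypothetical; only the idea of falling back to a cached copy when Vault is unreachable comes from the TODO text.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict


def write_private(path: Path, text: str) -> None:
    """Create or truncate a file that only its owner can read."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)


def fetch_with_cache(fetch: Callable[[], Dict[str, str]], cache_file: Path) -> Dict[str, str]:
    """Return fresh secrets and refresh the cache; use the cache on network errors."""
    try:
        secrets = fetch()
    except OSError:
        # urllib and socket failures are OSError subclasses.
        if not cache_file.exists():
            raise  # nothing cached yet, so the service cannot start
        return json.loads(cache_file.read_text())
    write_private(cache_file, json.dumps(secrets))
    return secrets


def write_secret_files(secrets: Dict[str, str], out_dir: Path) -> None:
    """Write each secret key to its own file under out_dir."""
    for key, value in secrets.items():
        write_private(out_dir / key, value)


if __name__ == "__main__":
    def vault_down() -> Dict[str, str]:
        raise ConnectionError("Vault unreachable")

    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "cache" / "myservice.json"
        fetch_with_cache(lambda: {"password": "example"}, cache)  # primes the cache
        secrets = fetch_with_cache(vault_down, cache)             # served from the cache
        write_secret_files(secrets, Path(tmp) / "secrets")
        print(sorted(p.name for p in (Path(tmp) / "secrets").iterdir()))
```

Passing the fetch step in as a callable keeps the fallback logic testable without a running Vault. On a host the output directory would be `/run/secrets/` and the cache would live under `/var/lib/vault/cache/`, as listed in the task list.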