pki-migration #12
74
TODO.md
@@ -155,7 +155,7 @@ create-host \

### Phase 4: Secrets Management with OpenBao (Vault)

**Status:** 🚧 Phases 4a & 4b Complete, 4c & 4d In Progress
**Status:** 🚧 Phases 4a, 4b, 4c (partial), & 4d Complete

**Challenge:** Current sops-nix approach has chicken-and-egg problem with age keys
@@ -339,6 +339,8 @@ vault01.home.2rjus.net (10.69.13.19)

#### Phase 4c: PKI Migration (Replace step-ca)

**Status:** 🚧 Partially Complete - vault01 and test host migrated, remaining hosts pending

**Goal:** Migrate hosts from step-ca to OpenBao PKI for TLS certificates

**Note:** PKI infrastructure already set up in Phase 4b (root CA, intermediate CA, ACME support)
@@ -349,27 +351,33 @@ vault01.home.2rjus.net (10.69.13.19)
- [x] Intermediate CA (`pki_int/` mount, 5 year TTL, EC P-384)
- [x] Signed intermediate with root CA
- [x] Configured CRL, OCSP, and issuing certificate URLs
- [x] Enable ACME support (completed in Phase 4b)
- [x] Enable ACME support (completed in Phase 4b, fixed in Phase 4c)
- [x] Enabled ACME on intermediate CA
- [x] Created PKI role for `*.home.2rjus.net`
- [x] Set certificate TTLs (30 day max) and allowed domains
- [x] ACME directory: `https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory`
- [ ] Download and distribute root CA certificate
- [ ] Export root CA: `bao read -field=certificate pki/cert/ca > homelab-root-ca.crt`
- [ ] Add to NixOS trust store on all hosts via `security.pki.certificateFiles`
- [ ] Deploy via auto-upgrade
- [ ] Test certificate issuance
- [ ] Issue test certificate using ACME client (lego/certbot)
- [ ] Or issue static certificate via OpenBao CLI
- [ ] Verify certificate chain and trust
- [ ] Migrate vault01's own certificate
- [ ] Issue new certificate from OpenBao PKI (self-issued)
- [ ] Replace self-signed bootstrap certificate
- [ ] Update service configuration
- [x] Fixed ACME response headers (added Replay-Nonce, Link, Location to allowed_response_headers)
- [x] Configured cluster path for ACME
- [x] Download and distribute root CA certificate
- [x] Added root CA to `system/pki/root-ca.nix`
- [x] Distributed to all hosts via system imports
- [x] Test certificate issuance
- [x] Tested ACME issuance on vaulttest01 successfully
- [x] Verified certificate chain and trust
- [x] Migrate vault01's own certificate
- [x] Created `bootstrap-vault-cert` script for initial certificate issuance via bao CLI
- [x] Issued certificate with SANs (vault01.home.2rjus.net + vault.home.2rjus.net)
- [x] Updated service to read certificates from `/var/lib/acme/vault01.home.2rjus.net/`
- [x] Configured ACME for automatic renewals
- [ ] Migrate hosts from step-ca to OpenBao
- [x] Tested on vaulttest01 (non-production host)
- [ ] Standardize hostname usage across all configurations
- [ ] Use `vault.home.2rjus.net` (CNAME) consistently everywhere
- [ ] Update NixOS configurations to use CNAME instead of vault01
- [ ] Update Terraform configurations to use CNAME
- [ ] Audit and fix mixed usage of vault01.home.2rjus.net vs vault.home.2rjus.net
- [ ] Update `system/acme.nix` to use OpenBao ACME endpoint
- [ ] Change server to `https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory`
- [ ] Test on one host (non-critical service)
- [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
- [ ] Roll out to all hosts via auto-upgrade
- [ ] Configure SSH CA in OpenBao (optional, future work)
- [ ] Enable SSH secrets engine (`ssh/` mount)
@@ -384,7 +392,39 @@ vault01.home.2rjus.net (10.69.13.19)
- [ ] Archive step-ca configuration for backup
- [ ] Update documentation

**Deliverable:** All TLS certificates issued by OpenBao PKI, step-ca retired
**Implementation Details (2026-02-03):**

**ACME Configuration Fix:**
The key blocker was that OpenBao's PKI mount was filtering out required ACME response headers. The solution was to add `allowed_response_headers` to the Terraform mount configuration:
```hcl
allowed_response_headers = [
  "Replay-Nonce", # Required for ACME nonce generation
  "Link",         # Required for ACME navigation
  "Location"      # Required for ACME resource location
]
```
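
For context, the header list above might sit inside the mount definition roughly as follows. This is a minimal sketch, not the actual contents of `terraform/vault/pki.tf`: the resource name `pki_int` is taken from the `vault_mount.pki_int.path` reference used elsewhere in these notes, while the `description` and the TTL value (5 years, from the checklist) are assumptions. It also presumes a Vault Terraform provider version whose `vault_mount` resource supports `allowed_response_headers`.

```hcl
# Sketch only: every value except allowed_response_headers is an assumption.
resource "vault_mount" "pki_int" {
  path        = "pki_int"
  type        = "pki"
  description = "Intermediate CA (illustrative description)"

  # 5 years in seconds (5 * 365 * 24 * 3600), matching the checklist's 5 year TTL
  max_lease_ttl_seconds = 157680000

  # Without these, the mount filters out headers that ACME clients rely on
  allowed_response_headers = [
    "Replay-Nonce",
    "Link",
    "Location",
  ]
}
```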

**Cluster Path Configuration:**
ACME requires the cluster path to include the full API path:
```hcl
path     = "${var.vault_address}/v1/${vault_mount.pki_int.path}"
aia_path = "${var.vault_address}/v1/${vault_mount.pki_int.path}"
```
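
The two attributes above could belong to a cluster configuration resource such as the one sketched below. This is an assumption about how `terraform/vault/pki.tf` is laid out, not a copy of it. It uses the Vault provider's `vault_pki_secret_backend_config_cluster` and `vault_pki_secret_backend_config_acme` resources and presumes `vault_mount.pki_int` is defined elsewhere. The variable description, the resource labels, and the `depends_on` ordering are illustrative.

```hcl
# Hypothetical variable declaration; only the name var.vault_address comes from the notes.
variable "vault_address" {
  type        = string
  description = "Base URL of the OpenBao server, e.g. https://vault01.home.2rjus.net:8200"
}

# Cluster config: tells the PKI mount its externally reachable API path.
resource "vault_pki_secret_backend_config_cluster" "pki_int" {
  backend  = vault_mount.pki_int.path
  path     = "${var.vault_address}/v1/${vault_mount.pki_int.path}"
  aia_path = "${var.vault_address}/v1/${vault_mount.pki_int.path}"
}

# ACME config: enabled only after the cluster path is in place (ordering is an assumption).
resource "vault_pki_secret_backend_config_acme" "pki_int" {
  backend = vault_mount.pki_int.path
  enabled = true

  depends_on = [vault_pki_secret_backend_config_cluster.pki_int]
}
```

With `vault_address` set to `https://vault01.home.2rjus.net:8200`, the resulting path lines up with the ACME directory URL listed in the checklist (`.../v1/pki_int/acme/directory`).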

**Bootstrap Process:**
Since vault01 needed a certificate from its own PKI (chicken-and-egg problem), we created a `bootstrap-vault-cert` script that:
1. Uses the Unix socket (no TLS) to issue a certificate via `bao` CLI
2. Places it in the ACME directory structure
3. Includes both vault01.home.2rjus.net and vault.home.2rjus.net as SANs
4. After restart, ACME manages renewals automatically

**Files Modified:**
- `terraform/vault/pki.tf` - Added allowed_response_headers, cluster config, ACME config
- `services/vault/default.nix` - Updated cert paths, added bootstrap script, configured ACME
- `system/pki/root-ca.nix` - Added OpenBao root CA to trust store
- `hosts/vaulttest01/configuration.nix` - Overrode ACME server for testing

**Deliverable:** ✅ vault01 and vaulttest01 using OpenBao PKI, remaining hosts still on step-ca

---