From 63662b89e0b3bea2e3a0cd55211977fb201ebf62 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Torjus=20H=C3=A5kestad?=
Date: Tue, 3 Feb 2026 06:53:38 +0100
Subject: [PATCH] docs: update TODO.md

---
 TODO.md | 74 ++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 57 insertions(+), 17 deletions(-)

diff --git a/TODO.md b/TODO.md
index bf3d136..909aa92 100644
--- a/TODO.md
+++ b/TODO.md
@@ -155,7 +155,7 @@ create-host \
 
 ### Phase 4: Secrets Management with OpenBao (Vault)
 
-**Status:** 🚧 Phases 4a & 4b Complete, 4c & 4d In Progress
+**Status:** 🚧 Phases 4a, 4b, 4c (partial), & 4d Complete
 
 **Challenge:** Current sops-nix approach has chicken-and-egg problem with age keys
 
@@ -339,6 +339,8 @@ vault01.home.2rjus.net (10.69.13.19)
 
 #### Phase 4c: PKI Migration (Replace step-ca)
 
+**Status:** 🚧 Partially Complete - vault01 and test host migrated, remaining hosts pending
+
 **Goal:** Migrate hosts from step-ca to OpenBao PKI for TLS certificates
 
 **Note:** PKI infrastructure already set up in Phase 4b (root CA, intermediate CA, ACME support)
@@ -349,27 +351,33 @@ vault01.home.2rjus.net (10.69.13.19)
   - [x] Intermediate CA (`pki_int/` mount, 5 year TTL, EC P-384)
   - [x] Signed intermediate with root CA
   - [x] Configured CRL, OCSP, and issuing certificate URLs
-- [x] Enable ACME support (completed in Phase 4b)
+- [x] Enable ACME support (completed in Phase 4b, fixed in Phase 4c)
   - [x] Enabled ACME on intermediate CA
   - [x] Created PKI role for `*.home.2rjus.net`
   - [x] Set certificate TTLs (30 day max) and allowed domains
   - [x] ACME directory: `https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory`
-- [ ] Download and distribute root CA certificate
-  - [ ] Export root CA: `bao read -field=certificate pki/cert/ca > homelab-root-ca.crt`
-  - [ ] Add to NixOS trust store on all hosts via `security.pki.certificateFiles`
-  - [ ] Deploy via auto-upgrade
-- [ ] Test certificate issuance
-  - [ ] Issue test certificate using ACME client (lego/certbot)
-  - [ ] Or issue static certificate via OpenBao CLI
-  - [ ] Verify certificate chain and trust
-- [ ] Migrate vault01's own certificate
-  - [ ] Issue new certificate from OpenBao PKI (self-issued)
-  - [ ] Replace self-signed bootstrap certificate
-  - [ ] Update service configuration
+  - [x] Fixed ACME response headers (added Replay-Nonce, Link, Location to allowed_response_headers)
+  - [x] Configured cluster path for ACME
+- [x] Download and distribute root CA certificate
+  - [x] Added root CA to `system/pki/root-ca.nix`
+  - [x] Distributed to all hosts via system imports
+- [x] Test certificate issuance
+  - [x] Tested ACME issuance on vaulttest01 successfully
+  - [x] Verified certificate chain and trust
+- [x] Migrate vault01's own certificate
+  - [x] Created `bootstrap-vault-cert` script for initial certificate issuance via bao CLI
+  - [x] Issued certificate with SANs (vault01.home.2rjus.net + vault.home.2rjus.net)
+  - [x] Updated service to read certificates from `/var/lib/acme/vault01.home.2rjus.net/`
+  - [x] Configured ACME for automatic renewals
 - [ ] Migrate hosts from step-ca to OpenBao
+  - [x] Tested on vaulttest01 (non-production host)
+  - [ ] Standardize hostname usage across all configurations
+    - [ ] Use `vault.home.2rjus.net` (CNAME) consistently everywhere
+    - [ ] Update NixOS configurations to use CNAME instead of vault01
+    - [ ] Update Terraform configurations to use CNAME
+    - [ ] Audit and fix mixed usage of vault01.home.2rjus.net vs vault.home.2rjus.net
   - [ ] Update `system/acme.nix` to use OpenBao ACME endpoint
-    - [ ] Change server to `https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory`
-    - [ ] Test on one host (non-critical service)
+    - [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
     - [ ] Roll out to all hosts via auto-upgrade
 - [ ] Configure SSH CA in OpenBao (optional, future work)
   - [ ] Enable SSH secrets engine (`ssh/` mount)
@@ -384,7 +392,39 @@ vault01.home.2rjus.net (10.69.13.19)
   - [ ] Archive step-ca configuration for backup
   - [ ] Update documentation
 
-**Deliverable:** All TLS certificates issued by OpenBao PKI, step-ca retired
+**Implementation Details (2026-02-03):**
+
+**ACME Configuration Fix:**
+The key blocker was that OpenBao's PKI mount was filtering out required ACME response headers. The solution was to add `allowed_response_headers` to the Terraform mount configuration:
+```hcl
+allowed_response_headers = [
+  "Replay-Nonce",  # Required for ACME nonce generation
+  "Link",          # Required for ACME navigation
+  "Location"       # Required for ACME resource location
+]
+```
+
+**Cluster Path Configuration:**
+ACME requires the cluster path to include the full API path:
+```hcl
+path     = "${var.vault_address}/v1/${vault_mount.pki_int.path}"
+aia_path = "${var.vault_address}/v1/${vault_mount.pki_int.path}"
+```
+
+**Bootstrap Process:**
+Since vault01 needed a certificate from its own PKI (chicken-and-egg problem), we created a `bootstrap-vault-cert` script that:
+1. Uses the Unix socket (no TLS) to issue a certificate via `bao` CLI
+2. Places it in the ACME directory structure
+3. Includes both vault01.home.2rjus.net and vault.home.2rjus.net as SANs
+4. After restart, ACME manages renewals automatically
+
+**Files Modified:**
+- `terraform/vault/pki.tf` - Added allowed_response_headers, cluster config, ACME config
+- `services/vault/default.nix` - Updated cert paths, added bootstrap script, configured ACME
+- `system/pki/root-ca.nix` - Added OpenBao root CA to trust store
+- `hosts/vaulttest01/configuration.nix` - Overrode ACME server for testing
+
+**Deliverable:** ✅ vault01 and vaulttest01 using OpenBao PKI, remaining hosts still on step-ca
 
 ---
 