diff --git a/TODO.md b/TODO.md
index bf3d136..909aa92 100644
--- a/TODO.md
+++ b/TODO.md
@@ -155,7 +155,7 @@ create-host \

 ### Phase 4: Secrets Management with OpenBao (Vault)

-**Status:** 🚧 Phases 4a & 4b Complete, 4c & 4d In Progress
+**Status:** 🚧 Phases 4a, 4b, 4c (partial), & 4d Complete

 **Challenge:** Current sops-nix approach has chicken-and-egg problem with age keys

@@ -339,6 +339,8 @@ vault01.home.2rjus.net (10.69.13.19)

 #### Phase 4c: PKI Migration (Replace step-ca)

+**Status:** 🚧 Partially Complete - vault01 and test host migrated, remaining hosts pending
+
 **Goal:** Migrate hosts from step-ca to OpenBao PKI for TLS certificates

 **Note:** PKI infrastructure already set up in Phase 4b (root CA, intermediate CA, ACME support)
@@ -349,27 +351,33 @@ vault01.home.2rjus.net (10.69.13.19)
   - [x] Intermediate CA (`pki_int/` mount, 5 year TTL, EC P-384)
   - [x] Signed intermediate with root CA
   - [x] Configured CRL, OCSP, and issuing certificate URLs
-- [x] Enable ACME support (completed in Phase 4b)
+- [x] Enable ACME support (completed in Phase 4b, fixed in Phase 4c)
   - [x] Enabled ACME on intermediate CA
   - [x] Created PKI role for `*.home.2rjus.net`
   - [x] Set certificate TTLs (30 day max) and allowed domains
   - [x] ACME directory: `https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory`
-- [ ] Download and distribute root CA certificate
-  - [ ] Export root CA: `bao read -field=certificate pki/cert/ca > homelab-root-ca.crt`
-  - [ ] Add to NixOS trust store on all hosts via `security.pki.certificateFiles`
-  - [ ] Deploy via auto-upgrade
-- [ ] Test certificate issuance
-  - [ ] Issue test certificate using ACME client (lego/certbot)
-  - [ ] Or issue static certificate via OpenBao CLI
-  - [ ] Verify certificate chain and trust
-- [ ] Migrate vault01's own certificate
-  - [ ] Issue new certificate from OpenBao PKI (self-issued)
-  - [ ] Replace self-signed bootstrap certificate
-  - [ ] Update service configuration
+  - [x] Fixed ACME response headers (added Replay-Nonce, Link, Location to allowed_response_headers)
+  - [x] Configured cluster path for ACME
+- [x] Download and distribute root CA certificate
+  - [x] Added root CA to `system/pki/root-ca.nix`
+  - [x] Distributed to all hosts via system imports
+- [x] Test certificate issuance
+  - [x] Tested ACME issuance on vaulttest01 successfully
+  - [x] Verified certificate chain and trust
+- [x] Migrate vault01's own certificate
+  - [x] Created `bootstrap-vault-cert` script for initial certificate issuance via bao CLI
+  - [x] Issued certificate with SANs (vault01.home.2rjus.net + vault.home.2rjus.net)
+  - [x] Updated service to read certificates from `/var/lib/acme/vault01.home.2rjus.net/`
+  - [x] Configured ACME for automatic renewals
 - [ ] Migrate hosts from step-ca to OpenBao
+  - [x] Tested on vaulttest01 (non-production host)
+  - [ ] Standardize hostname usage across all configurations
+    - [ ] Use `vault.home.2rjus.net` (CNAME) consistently everywhere
+    - [ ] Update NixOS configurations to use CNAME instead of vault01
+    - [ ] Update Terraform configurations to use CNAME
+    - [ ] Audit and fix mixed usage of vault01.home.2rjus.net vs vault.home.2rjus.net
   - [ ] Update `system/acme.nix` to use OpenBao ACME endpoint
-    - [ ] Change server to `https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory`
-    - [ ] Test on one host (non-critical service)
+    - [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
   - [ ] Roll out to all hosts via auto-upgrade
 - [ ] Configure SSH CA in OpenBao (optional, future work)
   - [ ] Enable SSH secrets engine (`ssh/` mount)
@@ -384,7 +392,39 @@ vault01.home.2rjus.net (10.69.13.19)
   - [ ] Archive step-ca configuration for backup
   - [ ] Update documentation

-**Deliverable:** All TLS certificates issued by OpenBao PKI, step-ca retired
+**Implementation Details (2026-02-03):**
+
+**ACME Configuration Fix:**
+The key blocker was that OpenBao's PKI mount was filtering out required ACME response headers. The solution was to add `allowed_response_headers` to the Terraform mount configuration:
+```hcl
+allowed_response_headers = [
+  "Replay-Nonce", # Required for ACME nonce generation
+  "Link",         # Required for ACME navigation
+  "Location"      # Required for ACME resource location
+]
+```
+
+**Cluster Path Configuration:**
+ACME requires the cluster path to include the full API path:
+```hcl
+path     = "${var.vault_address}/v1/${vault_mount.pki_int.path}"
+aia_path = "${var.vault_address}/v1/${vault_mount.pki_int.path}"
+```
+
+**Bootstrap Process:**
+Since vault01 needed a certificate from its own PKI (chicken-and-egg problem), we created a `bootstrap-vault-cert` script that:
+1. Uses the Unix socket (no TLS) to issue a certificate via `bao` CLI
+2. Places it in the ACME directory structure
+3. Includes both vault01.home.2rjus.net and vault.home.2rjus.net as SANs
+4. After restart, ACME manages renewals automatically
+
+**Files Modified:**
+- `terraform/vault/pki.tf` - Added allowed_response_headers, cluster config, ACME config
+- `services/vault/default.nix` - Updated cert paths, added bootstrap script, configured ACME
+- `system/pki/root-ca.nix` - Added OpenBao root CA to trust store
+- `hosts/vaulttest01/configuration.nix` - Overrode ACME server for testing
+
+**Deliverable:** ✅ vault01 and vaulttest01 using OpenBao PKI, remaining hosts still on step-ca

 ---

diff --git a/hosts/vaulttest01/configuration.nix b/hosts/vaulttest01/configuration.nix
index 76342ff..96baf1e 100644
--- a/hosts/vaulttest01/configuration.nix
+++ b/hosts/vaulttest01/configuration.nix
@@ -105,6 +105,17 @@
     };
   };

+  # Test ACME certificate issuance from OpenBao PKI
+  # Override the global ACME server (from system/acme.nix) to use OpenBao instead of step-ca
+  security.acme.defaults.server = lib.mkForce "https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory";
+
+  # Request a certificate for this host
+  # Using HTTP-01 challenge with standalone listener on port 80
+  security.acme.certs."vaulttest01.home.2rjus.net" = {
+    listenHTTP = ":80";
+    enableDebugLogs = true;
+  };
+
   system.stateVersion = "25.11"; # Did you read the comment?
 }
diff --git a/services/vault/default.nix b/services/vault/default.nix
index bb30d60..3439a67 100644
--- a/services/vault/default.nix
+++ b/services/vault/default.nix
@@ -77,8 +77,89 @@ let
       fi
     '';
   };
+
+  bootstrapCertScript = pkgs.writeShellApplication {
+    name = "bootstrap-vault-cert";
+    runtimeInputs = with pkgs; [
+      openbao
+      jq
+      openssl
+      coreutils
+    ];
+    text = ''
+      # Bootstrap vault01 with a proper certificate from its own PKI
+      # This solves the chicken-and-egg problem where ACME clients can't trust
+      # vault01's self-signed certificate.
+
+      echo "=== Bootstrapping vault01 certificate ==="
+
+      # Use Unix socket to avoid TLS issues
+      export BAO_ADDR='unix:///run/openbao/openbao.sock'
+
+      # ACME certificate directory
+      CERT_DIR="/var/lib/acme/vault01.home.2rjus.net"
+
+      # Issue certificate for vault01 with vault as SAN
+      echo "Issuing certificate for vault01.home.2rjus.net (with SAN: vault.home.2rjus.net)..."
+      OUTPUT=$(bao write -format=json pki_int/issue/homelab \
+        common_name="vault01.home.2rjus.net" \
+        alt_names="vault.home.2rjus.net" \
+        ttl="720h")
+
+      # Create ACME directory structure
+      echo "Creating ACME certificate directory..."
+      mkdir -p "$CERT_DIR"
+
+      # Extract certificate components to temp files
+      echo "$OUTPUT" | jq -r '.data.certificate' > /tmp/vault01-cert.pem
+      echo "$OUTPUT" | jq -r '.data.private_key' > /tmp/vault01-key.pem
+      echo "$OUTPUT" | jq -r '.data.issuing_ca' > /tmp/vault01-ca.pem
+
+      # Create fullchain (cert + CA)
+      cat /tmp/vault01-cert.pem /tmp/vault01-ca.pem > /tmp/vault01-fullchain.pem
+
+      # Backup old certificates if they exist
+      if [ -f "$CERT_DIR/fullchain.pem" ]; then
+        echo "Backing up old certificate..."
+        cp "$CERT_DIR/fullchain.pem" "$CERT_DIR/fullchain.pem.backup"
+        cp "$CERT_DIR/key.pem" "$CERT_DIR/key.pem.backup"
+      fi
+
+      # Install new certificates
+      echo "Installing new certificate..."
+      mv /tmp/vault01-fullchain.pem "$CERT_DIR/fullchain.pem"
+      mv /tmp/vault01-cert.pem "$CERT_DIR/cert.pem"
+      mv /tmp/vault01-ca.pem "$CERT_DIR/chain.pem"
+      mv /tmp/vault01-key.pem "$CERT_DIR/key.pem"
+
+      # Set proper ownership and permissions (ACME-style)
+      chown -R acme:acme "$CERT_DIR"
+      chmod 750 "$CERT_DIR"
+      chmod 640 "$CERT_DIR"/*.pem
+
+      echo "Certificate installed successfully!"
+      echo ""
+      echo "Certificate details:"
+      openssl x509 -in "$CERT_DIR/cert.pem" -noout -subject -issuer -dates
+      echo ""
+      echo "Subject Alternative Names:"
+      openssl x509 -in "$CERT_DIR/cert.pem" -noout -ext subjectAltName
+
+      echo ""
+      echo "Now restart openbao service:"
+      echo " systemctl restart openbao"
+      echo ""
+      echo "After restart, verify ACME endpoint is accessible:"
+      echo " curl https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory"
+      echo ""
+      echo "Once working, ACME will automatically manage certificate renewals."
+    '';
+  };
 in
 {
+  # Make bootstrap script available as a command
+  environment.systemPackages = [ bootstrapCertScript ];
+
   services.openbao = {
     enable = true;

@@ -101,8 +182,8 @@ in

   systemd.services.openbao.serviceConfig = {
     LoadCredential = [
-      "key.pem:/var/lib/openbao/key.pem"
-      "cert.pem:/var/lib/openbao/cert.pem"
+      "key.pem:/var/lib/acme/vault01.home.2rjus.net/key.pem"
+      "cert.pem:/var/lib/acme/vault01.home.2rjus.net/fullchain.pem"
     ];
     # TPM2-encrypted unseal key (created manually, see setup instructions)
     LoadCredentialEncrypted = [
@@ -110,5 +191,16 @@ in
     ];
     # Auto-unseal on service start
     ExecStartPost = "${unsealScript}/bin/openbao-unseal";
+    # Add openbao user to acme group to read certificates
+    SupplementaryGroups = [ "acme" ];
+  };
+
+  # ACME certificate management
+  # Bootstrapped with bootstrap-vault-cert, now managed by ACME
+  security.acme.certs."vault01.home.2rjus.net" = {
+    server = "https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory";
+    listenHTTP = ":80";
+    reloadServices = [ "openbao" ];
+    extraDomainNames = [ "vault.home.2rjus.net" ];
   };
 }
diff --git a/system/default.nix b/system/default.nix
index 7957c30..3c59c8c 100644
--- a/system/default.nix
+++ b/system/default.nix
@@ -7,7 +7,7 @@
     ./packages.nix
     ./nix.nix
     ./root-user.nix
-    ./root-ca.nix
+    ./pki/root-ca.nix
     ./sops.nix
     ./sshd.nix
     ./vault-secrets.nix
diff --git a/system/root-ca.crt b/system/pki/root-ca.crt
similarity index 100%
rename from system/root-ca.crt
rename to system/pki/root-ca.crt
diff --git a/system/root-ca.nix b/system/pki/root-ca.nix
similarity index 84%
rename from system/root-ca.nix
rename to system/pki/root-ca.nix
index 5e5ff78..0397351 100644
--- a/system/root-ca.nix
+++ b/system/pki/root-ca.nix
@@ -4,6 +4,7 @@
     certificateFiles = [
       "${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt"
       ./root-ca.crt
+      ./vault-root-ca.crt
     ];
   };
 }
diff --git a/system/pki/vault-root-ca.crt b/system/pki/vault-root-ca.crt
new file mode 100644
index 0000000..c45d391
--- /dev/null
+++ b/system/pki/vault-root-ca.crt
@@ -0,0 +1,14 @@
+-----BEGIN CERTIFICATE-----
+MIICIjCCAaigAwIBAgIUQ/Bd/4kNvkPjQjgGLUMynIVzGeAwCgYIKoZIzj0EAwMw
+QDELMAkGA1UEBhMCTk8xEDAOBgNVBAoTB0hvbWVsYWIxHzAdBgNVBAMTFmhvbWUu
+MnJqdXMubmV0IFJvb3QgQ0EwHhcNMjYwMjAxMjIxODA5WhcNMzYwMTMwMjIxODM5
+WjBAMQswCQYDVQQGEwJOTzEQMA4GA1UEChMHSG9tZWxhYjEfMB0GA1UEAxMWaG9t
+ZS4ycmp1cy5uZXQgUm9vdCBDQTB2MBAGByqGSM49AgEGBSuBBAAiA2IABH8xhIOl
+Nd1Yb1OFhgIJQZM+OkwoFenOQiKfuQ4oPMxaF+fnXdKc77qPDVRjeDy61oGS38X3
+CjPOZAzS9kjo7FmVbzdqlYK7ut/OylF+8MJkCT8mFO1xvuzIXhufnyAD4aNjMGEw
+DgYDVR0PAQH/BAQDAgEGMA8GA1UdEwEB/wQFMAMBAf8wHQYDVR0OBBYEFEimBeAg
+3JVeF4BqdC9hMZ8MYKw2MB8GA1UdIwQYMBaAFEimBeAg3JVeF4BqdC9hMZ8MYKw2
+MAoGCCqGSM49BAMDA2gAMGUCMQCvhRElHBra/XyT93SKcG6ZzIG+K+DH3J5jm6Xr
+zaGj2VtdhBRVmEKaUcjU7htgSxcCMA9qHKYFcUH72W7By763M6sy8OOiGQNDSERY
+VgnNv9rLCvCef1C8G2bYh/sKGZTPGQ==
+-----END CERTIFICATE-----
\ No newline at end of file
diff --git a/terraform/vault/pki.tf b/terraform/vault/pki.tf
index 7583d41..23d47c8 100644
--- a/terraform/vault/pki.tf
+++ b/terraform/vault/pki.tf
@@ -62,6 +62,13 @@ resource "vault_mount" "pki_int" {
   description               = "Intermediate CA"
   default_lease_ttl_seconds = 157680000 # 5 years
   max_lease_ttl_seconds     = 157680000 # 5 years
+
+  # Required for ACME support - allow ACME-specific response headers
+  allowed_response_headers = [
+    "Replay-Nonce",
+    "Link",
+    "Location"
+  ]
 }

 resource "vault_pki_secret_backend_intermediate_cert_request" "intermediate" {
@@ -139,6 +146,33 @@ resource "vault_pki_secret_backend_config_urls" "config_urls" {
   ]
 }

+# Configure cluster path (required for ACME)
+resource "vault_pki_secret_backend_config_cluster" "cluster" {
+  backend  = vault_mount.pki_int.path
+  path     = "${var.vault_address}/v1/${vault_mount.pki_int.path}"
+  aia_path = "${var.vault_address}/v1/${vault_mount.pki_int.path}"
+}
+
+# Enable ACME support
+resource "vault_generic_endpoint" "acme_config" {
+  depends_on = [
+    vault_pki_secret_backend_config_cluster.cluster,
+    vault_pki_secret_backend_role.homelab
+  ]
+
+  path                 = "${vault_mount.pki_int.path}/config/acme"
+  ignore_absent_fields = true
+  disable_read         = true
+  disable_delete       = true
+
+  data_json = jsonencode({
+    enabled                  = true
+    allowed_issuers          = ["*"]
+    allowed_roles            = ["*"]
+    default_directory_policy = "sign-verbatim"
+  })
+}
+
 # ============================================================================
 # Direct Certificate Issuance (Non-ACME)
 # ============================================================================
diff --git a/terraform/vms.tf b/terraform/vms.tf
index cf15eef..de8e2af 100644
--- a/terraform/vms.tf
+++ b/terraform/vms.tf
@@ -50,8 +50,8 @@ locals {
       cpu_cores           = 2
       memory              = 2048
       disk_size           = "20G"
-      flake_branch        = "vault-bootstrap-integration"
-      vault_wrapped_token = "s.HwNenAYvXBsPs8uICh4CbE11"
+      flake_branch        = "pki-migration"
+      vault_wrapped_token = "s.UCpQCOp7cOKDdtGGBvfRWwAt"
     }
   }
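
Review note: the ACME header fix recorded in TODO.md (letting `Replay-Nonce`, `Link`, and `Location` through the `pki_int` mount) can be sanity-checked from any host after deployment. The following is a minimal sketch, not part of this change: the function name `check_acme_headers` is hypothetical, and it only assumes that raw HTTP response headers arrive on stdin and that the header names to look for are passed as arguments.

```shell
# Hypothetical helper (not part of this diff): read HTTP response headers
# from stdin and report which of the named headers are missing.
check_acme_headers() {
  local headers missing="" name
  headers=$(cat)
  for name in "$@"; do
    # Header names are case-insensitive; match "Name:" at the start of a line.
    if ! printf '%s\n' "$headers" | grep -qi "^${name}:"; then
      missing="${missing}${missing:+ }${name}"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing: ${missing}"
    return 1
  fi
  echo "ok"
}
```

One possible use, assuming `NEW_NONCE_URL` holds the `newNonce` URL taken from the ACME directory document, is `curl -sI "$NEW_NONCE_URL" | check_acme_headers Replay-Nonce Link`. `Location` is only expected on account and order creation responses, so it is left out of that particular check.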