# Phase 4d: Vault Bootstrap Integration - Testing Guide

This guide walks through testing the complete Vault bootstrap workflow implemented in Phase 4d.

## Prerequisites

Before testing, ensure:

1. **Vault server is running**: vault01 (`vault01.home.2rjus.net:8200`) is accessible
2. **Vault access**: You have a Vault token with admin permissions (set the `BAO_TOKEN` environment variable)
3. **OpenTofu installed**: `tofu` is available in your PATH
4. **Clean Git repository**: All Phase 4d changes are committed to a branch

## Test Scenario: Create vaulttest01

### Step 1: Create Test Host Configuration

Run the create-host tool with Vault integration:

```bash
# Ensure you have a Vault token
export BAO_TOKEN="your-vault-admin-token"

# Create test host
nix run .#create-host -- \
  --hostname vaulttest01 \
  --ip 10.69.13.150/24 \
  --cpu 2 \
  --memory 2048 \
  --disk 20G

# To regenerate everything (e.g., after the wrapped token expired).
# --force overwrites the host configuration, including manual edits;
# to replace only the token, use --regenerate-token (see Scenario 5).
nix run .#create-host -- \
  --hostname vaulttest01 \
  --ip 10.69.13.150/24 \
  --force
```

**What this does:**

- Creates `hosts/vaulttest01/` configuration
- Updates `flake.nix` with the new host
- Updates `terraform/vms.tf` with the VM definition
- Generates `terraform/vault/hosts-generated.tf` with an AppRole and policy
- Creates a wrapped token (24h TTL, single-use)
- Adds the wrapped token to the VM configuration

**Expected output:**

```
✓ All validations passed
✓ Created hosts/vaulttest01/default.nix
✓ Created hosts/vaulttest01/configuration.nix
✓ Updated flake.nix
✓ Updated terraform/vms.tf

Configuring Vault integration...
✓ Updated terraform/vault/hosts-generated.tf
Applying Vault Terraform configuration...
✓ Terraform applied successfully
Reading AppRole credentials for vaulttest01...
✓ Retrieved role_id
✓ Generated secret_id
Creating wrapped token (24h TTL, single-use)...
✓ Created wrapped token: hvs.CAESIBw...
⚠️ Token expires in 24 hours
⚠️ Token can only be used once
✓ Added wrapped token to terraform/vms.tf

✓ Host configuration generated successfully!
```

### Step 2: Add Test Service Configuration

Edit `hosts/vaulttest01/configuration.nix` to enable Vault and add a test service:

```nix
{ config, pkgs, lib, ... }:

{
  imports = [
    ../../system
    ../../common/vm
  ];

  # Enable Vault secrets management
  vault.enable = true;

  # Define a test secret
  vault.secrets.test-service = {
    secretPath = "hosts/vaulttest01/test-service";
    restartTrigger = true;
    restartInterval = "daily";
    services = [ "vault-test" ];
  };

  # Create a test service that uses the secret
  systemd.services.vault-test = {
    description = "Test Vault secret fetching";
    wantedBy = [ "multi-user.target" ];
    after = [ "vault-secret-test-service.service" ];

    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;

      ExecStart = pkgs.writeShellScript "vault-test" ''
        echo "=== Vault Secret Test ==="
        echo "Secret path: hosts/vaulttest01/test-service"

        if [ -f /run/secrets/test-service/password ]; then
          echo "✓ Password file exists"
          echo "Password length: $(wc -c < /run/secrets/test-service/password)"
        else
          echo "✗ Password file missing!"
          exit 1
        fi

        if [ -d /var/lib/vault/cache/test-service ]; then
          echo "✓ Cache directory exists"
        else
          echo "✗ Cache directory missing!"
          exit 1
        fi

        echo "Test successful!"
      '';

      StandardOutput = "journal+console";
    };
  };

  # Rest of configuration...
  networking.hostName = "vaulttest01";
  networking.domain = "home.2rjus.net";

  systemd.network.networks."10-lan" = {
    matchConfig.Name = "ens18";
    address = [ "10.69.13.150/24" ];
    gateway = [ "10.69.13.1" ];
    dns = [ "10.69.13.5" "10.69.13.6" ];
    domains = [ "home.2rjus.net" ];
  };

  system.stateVersion = "25.11";
}
```

### Step 3: Create Test Secrets in Vault

Add the test secret to Vault using Terraform by editing `terraform/vault/secrets.tf`:

```hcl
locals {
  secrets = {
    # ... existing secrets ...

    # Test secret for vaulttest01
    "hosts/vaulttest01/test-service" = {
      auto_generate   = true
      password_length = 24
    }
  }
}
```

Apply the Vault configuration:

```bash
cd terraform/vault
tofu apply
```

**Verify the secret exists:**

```bash
export VAULT_ADDR=https://vault01.home.2rjus.net:8200
export VAULT_SKIP_VERIFY=1
vault kv get secret/hosts/vaulttest01/test-service
```

### Step 4: Deploy the VM

**Important**: Deploy within 24 hours of creating the host (the wrapped token's TTL).

```bash
cd terraform   # from the repository root
tofu plan      # Review changes
tofu apply     # Deploy VM
```

### Step 5: Monitor Bootstrap Process

SSH into the VM and monitor the bootstrap:

```bash
# Watch bootstrap logs
ssh root@vaulttest01
journalctl -fu nixos-bootstrap.service

# Expected log output:
# Starting NixOS bootstrap for host: vaulttest01
# Network connectivity confirmed
# Unwrapping Vault token to get AppRole credentials...
# Vault credentials unwrapped and stored successfully
# Fetching and building NixOS configuration from flake...
# Successfully built configuration for vaulttest01
# Rebooting into new configuration...
```

### Step 6: Verify Vault Integration

After the VM reboots, verify the integration:

```bash
ssh root@vaulttest01

# Check AppRole credentials were stored
ls -la /var/lib/vault/approle/
# Expected: role-id and secret-id files with 600 permissions

cat /var/lib/vault/approle/role-id
# Should show a UUID

# Check vault-secret service ran successfully
systemctl status vault-secret-test-service.service
# Should be active (exited)

journalctl -u vault-secret-test-service.service
# Should show successful secret fetch:
# [vault-fetch] Authenticating to Vault at https://vault01.home.2rjus.net:8200
# [vault-fetch] Successfully authenticated to Vault
# [vault-fetch] Fetching secret from path: hosts/vaulttest01/test-service
# [vault-fetch] Writing secrets to /run/secrets/test-service
# [vault-fetch]   - Wrote secret key: password
# [vault-fetch] Successfully fetched and cached secrets

# Check test service passed
systemctl status vault-test.service
journalctl -u vault-test.service
# Should include:
# === Vault Secret Test ===
# ✓ Password file exists
# ✓ Cache directory exists
# Test successful!
```
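
The file modes listed in this step, plus the 600 mode that the Troubleshooting section gives for the cached password, can be checked in one pass. Below is a minimal sketch, assuming GNU coreutils `stat` (BSD/macOS `stat` uses different flags) and a root shell on vaulttest01; `check_mode` and `check_vault_files` are hypothetical helpers, not part of the Vault module.

```shell
# Hypothetical helper: compare a file's octal mode with an expected value.
# Relies on GNU coreutils `stat -c %a`, which prints the mode as e.g. "600".
check_mode() {
  local path=$1 want=$2 got
  if [ ! -e "$path" ]; then
    echo "✗ $path is missing"
    return 1
  fi
  got=$(stat -c %a "$path")
  if [ "$got" = "$want" ]; then
    echo "✓ $path has mode $got"
  else
    echo "✗ $path has mode $got, expected $want"
    return 1
  fi
}

# Hypothetical helper: check every file whose mode this guide documents.
# Runs all checks, then returns non-zero if any of them failed.
check_vault_files() {
  local rc=0
  check_mode /var/lib/vault/approle/role-id 600 || rc=1
  check_mode /var/lib/vault/approle/secret-id 600 || rc=1
  check_mode /run/secrets/test-service/password 400 || rc=1
  check_mode /var/lib/vault/cache/test-service/password 600 || rc=1
  return "$rc"
}
```

Paste both functions into the shell on vaulttest01 and run `check_vault_files`; a non-zero exit status means at least one file is missing or has an unexpected mode.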
```bash
# Verify secret files exist
ls -la /run/secrets/test-service/
# Should show password file with 400 permissions

# Verify cache exists
ls -la /var/lib/vault/cache/test-service/
# Should show cached password file
```

## Test Scenarios

### Scenario 1: Fresh Deployment

✅ **Expected**: All secrets fetched successfully from Vault

### Scenario 2: Service Restart

```bash
systemctl restart vault-test.service
```

✅ **Expected**: Secrets re-fetched from Vault, service starts successfully

### Scenario 3: Vault Unreachable

```bash
# On vault01, stop Vault temporarily
ssh root@vault01
systemctl stop openbao

# On vaulttest01, restart test service
ssh root@vaulttest01
systemctl restart vault-test.service
journalctl -u vault-secret-test-service.service | tail -20
```

✅ **Expected**:

- Warning logged: "Using cached secrets from /var/lib/vault/cache/test-service"
- Service starts successfully using cached secrets

```bash
# Restore Vault
ssh root@vault01
systemctl start openbao
```

### Scenario 4: Secret Rotation

```bash
# Update secret in Vault
vault kv put secret/hosts/vaulttest01/test-service password="new-secret-value"

# On vaulttest01, trigger rotation
ssh root@vaulttest01
systemctl restart vault-secret-test-service.service

# Verify new secret
cat /run/secrets/test-service/password
# Should show new value
```

✅ **Expected**: New secret fetched and cached

### Scenario 5: Expired Wrapped Token

```bash
# Wait 24+ hours after create-host, then try to deploy
cd terraform
tofu apply
```

❌ **Expected**: Bootstrap fails with message about expired token

**Fix (Option 1 - Regenerate token only):**

```bash
# Only regenerates the wrapped token, preserves all other configuration
nix run .#create-host -- --hostname vaulttest01 --regenerate-token
cd terraform
tofu apply
```

**Fix (Option 2 - Full regeneration with --force):**

```bash
# Overwrites entire host configuration (including any manual changes)
nix run .#create-host -- --hostname vaulttest01 --force
cd terraform
tofu apply
```
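
When a deployment may be close to the 24-hour limit, you can check the wrapped token before running `tofu apply` instead of waiting for the bootstrap to fail. Vault's `sys/wrapping/lookup` endpoint returns a wrapping token's creation metadata without consuming it, and returns an error once the token has expired or been unwrapped. A minimal sketch, assuming the `vault` CLI can reach vault01 with `VAULT_ADDR` set as in Step 3 (plus a token in the environment if your server requires one for this endpoint); `check_wrapped_token` is a hypothetical helper:

```shell
# Hypothetical helper: report whether a response-wrapping token can still be
# unwrapped. `sys/wrapping/lookup` reads the token's metadata without
# consuming it; Vault answers with an error for expired or already-used tokens.
check_wrapped_token() {
  local token=$1
  if [ -z "$token" ]; then
    echo "usage: check_wrapped_token <wrapped-token>" >&2
    return 2
  fi
  if vault write sys/wrapping/lookup token="$token" >/dev/null 2>&1; then
    echo "✓ Wrapped token is still valid"
  else
    # A lookup failure cannot distinguish expiry, prior use, or a network error.
    echo "✗ Wrapped token is expired, already used, or Vault is unreachable"
    return 1
  fi
}
```

Call it with the full wrapped token that create-host wrote to `terraform/vms.tf`. A failure also covers the already-used case from Scenario 6, because an unwrapped token no longer exists.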
**Recommendation**: Prefer Option 1 (`--regenerate-token`) to avoid losing manual configuration changes.

### Scenario 6: Already-Used Wrapped Token

Deploy the same VM a second time without regenerating the token.

❌ **Expected**: Second bootstrap fails with "token already used" message

## Cleanup

After testing:

```bash
# Destroy test VM (quote the address so the shell does not expand the brackets)
cd terraform
tofu destroy -target='proxmox_vm_qemu.vm["vaulttest01"]'

# Remove test secrets from Vault. Also delete the entry from
# terraform/vault/secrets.tf; otherwise Terraform still manages it and
# may recreate it on the next `tofu apply` in terraform/vault.
vault kv delete secret/hosts/vaulttest01/test-service

# Remove host configuration (optional)
git rm -r hosts/vaulttest01
# Edit flake.nix to remove nixosConfigurations.vaulttest01
# Edit terraform/vms.tf to remove vaulttest01
# Edit terraform/vault/hosts-generated.tf to remove vaulttest01
```

## Success Criteria Checklist

Phase 4d is considered successful when:

- [x] create-host generates Vault configuration automatically
- [x] New hosts receive AppRole credentials via cloud-init
- [x] Bootstrap stores credentials in `/var/lib/vault/approle/`
- [x] Services can fetch secrets using the `vault.secrets` option
- [x] Secrets are extracted to individual files in `/run/secrets/`
- [x] Cached secrets work when Vault is unreachable
- [x] Periodic restart timers work for secret rotation
- [x] Critical services are excluded from auto-restart
- [x] Test host deploys and passes verification
- [x] sops-nix continues to work for existing services

## Troubleshooting

### Bootstrap fails with "Failed to unwrap Vault token"

**Possible causes:**

- Token already used (wrapped tokens are single-use)
- Token expired (24h TTL)
- Invalid token
- Vault unreachable

**Solution:**

```bash
# Regenerate only the wrapped token
# (use --force instead to regenerate the whole host configuration)
nix run .#create-host -- --hostname vaulttest01 --regenerate-token
cd terraform && tofu apply
```

### Secret fetch fails with authentication error

**Check:**

```bash
# Verify AppRole exists
vault read auth/approle/role/vaulttest01

# Verify policy exists
vault policy read host-vaulttest01

# Test authentication manually (on vaulttest01)
ROLE_ID=$(cat /var/lib/vault/approle/role-id)
SECRET_ID=$(cat /var/lib/vault/approle/secret-id)
vault write auth/approle/login role_id="$ROLE_ID" secret_id="$SECRET_ID"
```

### Cache not working

**Check:**

```bash
# Verify cache directory exists and has files
ls -la /var/lib/vault/cache/test-service/

# Check permissions
stat /var/lib/vault/cache/test-service/password
# Should be 600 (rw-------)
```

## Next Steps

After successful testing:

1. Gradually migrate existing services from sops-nix to Vault
2. Consider implementing a secret watcher for faster rotation (future enhancement)
3. Phase 4c: Migrate from step-ca to OpenBao PKI
4. Eventually deprecate and remove sops-nix