# Phase 4d: Vault Bootstrap Integration - Testing Guide

This guide walks through testing the complete Vault bootstrap workflow implemented in Phase 4d.

## Prerequisites

Before testing, ensure:

1. **Vault server is running**: vault01 (vault01.home.2rjus.net:8200) is reachable
2. **Vault access**: You have a Vault token with admin permissions (set the `BAO_TOKEN` env var)
3. **OpenTofu installed**: `tofu` is available in your PATH
4. **Git repository clean**: All Phase 4d changes are committed to a branch

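The local items in this checklist can be checked with a short script before starting. The following is a minimal sketch, not part of the Phase 4d tooling: `check_prereqs` is a hypothetical helper name, and it only covers the token variable, the `tofu` binary, and the state of the Git working tree. Reachability of vault01 still has to be confirmed separately.

```bash
# Hypothetical helper (not part of create-host): checks the local prerequisites.
# Prints one line per problem and returns non-zero if any check fails.
check_prereqs() {
  prereq_failed=0

  # The admin token is expected in BAO_TOKEN.
  if [ -z "${BAO_TOKEN:-}" ]; then
    echo "BAO_TOKEN is not set"
    prereq_failed=1
  fi

  # OpenTofu provides the `tofu` binary.
  if ! command -v tofu >/dev/null 2>&1; then
    echo "tofu not found in PATH"
    prereq_failed=1
  fi

  # A clean tree means `git status --porcelain` prints nothing.
  if ! git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "not inside a git repository"
    prereq_failed=1
  elif [ -n "$(git status --porcelain)" ]; then
    echo "git working tree has uncommitted changes"
    prereq_failed=1
  fi

  return "$prereq_failed"
}
```

Run `check_prereqs` from the repository root; it prints nothing and returns 0 when all three checks pass.
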
## Test Scenario: Create vaulttest01

### Step 1: Create Test Host Configuration

Run the create-host tool with Vault integration:

```bash
# Ensure you have Vault token
export BAO_TOKEN="your-vault-admin-token"

# Create test host
nix run .#create-host -- \
  --hostname vaulttest01 \
  --ip 10.69.13.150/24 \
  --cpu 2 \
  --memory 2048 \
  --disk 20G

# If you need to regenerate (e.g., wrapped token expired):
nix run .#create-host -- \
  --hostname vaulttest01 \
  --ip 10.69.13.150/24 \
  --force
```

**What this does:**
- Creates `hosts/vaulttest01/` configuration
- Updates `flake.nix` with new host
- Updates `terraform/vms.tf` with VM definition
- Generates `terraform/vault/hosts-generated.tf` with AppRole and policy
- Creates a wrapped token (24h TTL, single-use)
- Adds wrapped token to VM configuration

**Expected output:**
```
✓ All validations passed
✓ Created hosts/vaulttest01/default.nix
✓ Created hosts/vaulttest01/configuration.nix
✓ Updated flake.nix
✓ Updated terraform/vms.tf

Configuring Vault integration...
✓ Updated terraform/vault/hosts-generated.tf
Applying Vault Terraform configuration...
✓ Terraform applied successfully
Reading AppRole credentials for vaulttest01...
✓ Retrieved role_id
✓ Generated secret_id
Creating wrapped token (24h TTL, single-use)...
✓ Created wrapped token: hvs.CAESIBw...
⚠️ Token expires in 24 hours
⚠️ Token can only be used once
✓ Added wrapped token to terraform/vms.tf

✓ Host configuration generated successfully!
```

### Step 2: Add Test Service Configuration

Edit `hosts/vaulttest01/configuration.nix` to enable Vault and add a test service:

```nix
{ config, pkgs, lib, ... }:
{
  imports = [
    ../../system
    ../../common/vm
  ];

  # Enable Vault secrets management
  vault.enable = true;

  # Define a test secret
  vault.secrets.test-service = {
    secretPath = "hosts/vaulttest01/test-service";
    restartTrigger = true;
    restartInterval = "daily";
    services = [ "vault-test" ];
  };

  # Create a test service that uses the secret
  systemd.services.vault-test = {
    description = "Test Vault secret fetching";
    wantedBy = [ "multi-user.target" ];
    after = [ "vault-secret-test-service.service" ];

    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;

      ExecStart = pkgs.writeShellScript "vault-test" ''
        echo "=== Vault Secret Test ==="
        echo "Secret path: hosts/vaulttest01/test-service"

        if [ -f /run/secrets/test-service/password ]; then
          echo "✓ Password file exists"
          echo "Password length: $(wc -c < /run/secrets/test-service/password)"
        else
          echo "✗ Password file missing!"
          exit 1
        fi

        if [ -d /var/lib/vault/cache/test-service ]; then
          echo "✓ Cache directory exists"
        else
          echo "✗ Cache directory missing!"
          exit 1
        fi

        echo "Test successful!"
      '';

      StandardOutput = "journal+console";
    };
  };

  # Rest of configuration...
  networking.hostName = "vaulttest01";
  networking.domain = "home.2rjus.net";

  systemd.network.networks."10-lan" = {
    matchConfig.Name = "ens18";
    address = [ "10.69.13.150/24" ];
    gateway = [ "10.69.13.1" ];
    dns = [ "10.69.13.5" "10.69.13.6" ];
    domains = [ "home.2rjus.net" ];
  };

  system.stateVersion = "25.11";
}
```

### Step 3: Create Test Secrets in Vault

Add test secrets to Vault using Terraform:

Edit `terraform/vault/secrets.tf`:

```hcl
locals {
  secrets = {
    # ... existing secrets ...

    # Test secret for vaulttest01
    "hosts/vaulttest01/test-service" = {
      auto_generate   = true
      password_length = 24
    }
  }
}
```

Apply the Vault configuration:

```bash
cd terraform/vault
tofu apply
```

**Verify the secret exists:**
```bash
export VAULT_ADDR=https://vault01.home.2rjus.net:8200
export VAULT_SKIP_VERIFY=1

vault kv get secret/hosts/vaulttest01/test-service
```

### Step 4: Deploy the VM

**Important**: Deploy within 24 hours of creating the host (wrapped token TTL)

```bash
cd terraform
tofu plan   # Review changes
tofu apply  # Deploy VM
```

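To avoid discovering an expired token only at bootstrap, the remaining lifetime can be estimated from the time create-host was run. This is a minimal sketch under two assumptions: you recorded that time yourself as a Unix timestamp, and the TTL is exactly the 24 hours stated above. `wrapped_token_seconds_left` is a hypothetical helper, not part of the tooling.

```bash
# Hypothetical helper: seconds left before a 24h wrapped token expires.
# $1 = time create-host was run, as Unix epoch seconds (recorded by you)
# $2 = current time as epoch seconds (optional, defaults to now)
wrapped_token_seconds_left() {
  created=$1
  now=${2:-$(date +%s)}
  ttl=$((24 * 60 * 60))   # 24h TTL reported by create-host
  echo $((created + ttl - now))
}
```

A negative result means the token has expired and must be regenerated before running `tofu apply`.
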
### Step 5: Monitor Bootstrap Process

SSH into the VM and monitor the bootstrap:

```bash
# Watch bootstrap logs
ssh root@vaulttest01
journalctl -fu nixos-bootstrap.service

# Expected log output:
# Starting NixOS bootstrap for host: vaulttest01
# Network connectivity confirmed
# Unwrapping Vault token to get AppRole credentials...
# Vault credentials unwrapped and stored successfully
# Fetching and building NixOS configuration from flake...
# Successfully built configuration for vaulttest01
# Rebooting into new configuration...
```

### Step 6: Verify Vault Integration

After the VM reboots, verify the integration:

```bash
ssh root@vaulttest01

# Check AppRole credentials were stored
ls -la /var/lib/vault/approle/
# Expected: role-id and secret-id files with 600 permissions

cat /var/lib/vault/approle/role-id
# Should show a UUID

# Check vault-secret service ran successfully
systemctl status vault-secret-test-service.service
# Should be active (exited)

journalctl -u vault-secret-test-service.service
# Should show successful secret fetch:
# [vault-fetch] Authenticating to Vault at https://vault01.home.2rjus.net:8200
# [vault-fetch] Successfully authenticated to Vault
# [vault-fetch] Fetching secret from path: hosts/vaulttest01/test-service
# [vault-fetch] Writing secrets to /run/secrets/test-service
# [vault-fetch] - Wrote secret key: password
# [vault-fetch] Successfully fetched and cached secrets

# Check test service passed
systemctl status vault-test.service
journalctl -u vault-test.service
# Should show:
# === Vault Secret Test ===
# ✓ Password file exists
# ✓ Cache directory exists
# Test successful!

# Verify secret files exist
ls -la /run/secrets/test-service/
# Should show password file with 400 permissions

# Verify cache exists
ls -la /var/lib/vault/cache/test-service/
# Should show cached password file
```

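The permission expectations above (600 for the AppRole files, 400 for the secret file) can be asserted rather than read off `ls` output. This is a minimal sketch: `has_mode` is a hypothetical helper, and it relies on GNU `stat -c %a`, which is an assumption about the tools available on the VM.

```bash
# Hypothetical helper: succeeds only if path $1 exists and has octal mode $2.
# Relies on GNU coreutils `stat -c %a`; BSD stat uses different flags.
has_mode() {
  [ -e "$1" ] && [ "$(stat -c %a "$1")" = "$2" ]
}

# Example usage on vaulttest01 (paths taken from the steps above):
#   has_mode /var/lib/vault/approle/role-id 600
#   has_mode /var/lib/vault/approle/secret-id 600
#   has_mode /run/secrets/test-service/password 400
```

Each call returns 0 on a match, so the three checks can be chained with `&&` in a script.
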
## Test Scenarios

### Scenario 1: Fresh Deployment
✅ **Expected**: All secrets fetched successfully from Vault

### Scenario 2: Service Restart
```bash
systemctl restart vault-test.service
```
✅ **Expected**: Secrets re-fetched from Vault, service starts successfully

### Scenario 3: Vault Unreachable
```bash
# On vault01, stop Vault temporarily
ssh root@vault01
systemctl stop openbao

# On vaulttest01, restart test service
ssh root@vaulttest01
systemctl restart vault-test.service
journalctl -u vault-secret-test-service.service | tail -20
```
✅ **Expected**:
- Warning logged: "Using cached secrets from /var/lib/vault/cache/test-service"
- Service starts successfully using cached secrets

```bash
# Restore Vault
ssh root@vault01
systemctl start openbao
```

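The cache fallback in this scenario can be asserted from the journal instead of being read by eye. This is a minimal sketch that matches only the warning text quoted above; `used_cached_secrets` is a hypothetical helper, and the exact log prefix is not assumed.

```bash
# Hypothetical helper: reads log text on stdin and succeeds if the
# cache-fallback warning quoted in this scenario appears in it.
used_cached_secrets() {
  grep -q "Using cached secrets from"
}

# Example usage on vaulttest01 while Vault is stopped:
#   journalctl -u vault-secret-test-service.service --no-pager | used_cached_secrets
```

Because the journal keeps earlier runs, narrow the input (for example with `tail -20` as above) if the warning may have been logged before this test.
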
### Scenario 4: Secret Rotation
```bash
# Update secret in Vault
vault kv put secret/hosts/vaulttest01/test-service password="new-secret-value"

# On vaulttest01, trigger rotation
ssh root@vaulttest01
systemctl restart vault-secret-test-service.service

# Verify new secret
cat /run/secrets/test-service/password
# Should show new value
```
✅ **Expected**: New secret fetched and cached

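"Fetched and cached" implies that the cache copy was refreshed as well as the live file. The following sketch compares the two directories byte for byte. It assumes the cache uses the same file names as `/run/secrets/test-service/` (both contain `password` in the steps above); `cache_matches` is a hypothetical helper.

```bash
# Hypothetical helper: succeeds if every regular file in the live secret
# directory ($1) has a byte-identical file of the same name in the cache ($2).
# An empty live directory trivially succeeds.
cache_matches() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    cmp -s "$f" "$2/$(basename "$f")" || return 1
  done
  return 0
}

# Example usage on vaulttest01 after the rotation:
#   cache_matches /run/secrets/test-service /var/lib/vault/cache/test-service
```

A non-zero return after rotation would suggest the new value reached `/run/secrets/` but the cache still holds the old one.
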
### Scenario 5: Expired Wrapped Token
```bash
# Wait 24+ hours after create-host, then try to deploy
cd terraform
tofu apply
```
❌ **Expected**: Bootstrap fails with message about expired token

**Fix (Option 1 - Regenerate token only):**
```bash
# Only regenerates the wrapped token, preserves all other configuration
nix run .#create-host -- --hostname vaulttest01 --regenerate-token
cd terraform
tofu apply
```

**Fix (Option 2 - Full regeneration with --force):**
```bash
# Overwrites entire host configuration (including any manual changes)
nix run .#create-host -- --hostname vaulttest01 --force
cd terraform
tofu apply
```

**Recommendation**: Use `--regenerate-token` to avoid losing manual configuration changes.

### Scenario 6: Already-Used Wrapped Token
Try to deploy the same VM twice without regenerating the token.

❌ **Expected**: Second bootstrap fails with "token already used" message

## Cleanup

After testing:

```bash
# Destroy test VM (single quotes keep the shell from interpreting the brackets)
cd terraform
tofu destroy -target='proxmox_vm_qemu.vm["vaulttest01"]'

# Remove test secrets from Vault
vault kv delete secret/hosts/vaulttest01/test-service

# Remove host configuration (optional)
git rm -r hosts/vaulttest01
# Edit flake.nix to remove nixosConfigurations.vaulttest01
# Edit terraform/vms.tf to remove vaulttest01
# Edit terraform/vault/hosts-generated.tf to remove vaulttest01
```

## Success Criteria Checklist

Phase 4d is considered successful when:

- [x] create-host generates Vault configuration automatically
- [x] New hosts receive AppRole credentials via cloud-init
- [x] Bootstrap stores credentials in /var/lib/vault/approle/
- [x] Services can fetch secrets using vault.secrets option
- [x] Secrets extracted to individual files in /run/secrets/
- [x] Cached secrets work when Vault is unreachable
- [x] Periodic restart timers work for secret rotation
- [x] Critical services excluded from auto-restart
- [x] Test host deploys and verifies working
- [x] sops-nix continues to work for existing services

## Troubleshooting

### Bootstrap fails with "Failed to unwrap Vault token"

**Possible causes:**
- Token already used (wrapped tokens are single-use)
- Token expired (24h TTL)
- Invalid token
- Vault unreachable

**Solution:**
```bash
# Regenerate the wrapped token only (keeps manual configuration changes);
# use --force instead to regenerate the entire host configuration
nix run .#create-host -- --hostname vaulttest01 --regenerate-token
cd terraform && tofu apply
```

### Secret fetch fails with authentication error

**Check:**
```bash
# Verify AppRole exists
vault read auth/approle/role/vaulttest01

# Verify policy exists
vault policy read host-vaulttest01

# Test authentication manually
ROLE_ID=$(cat /var/lib/vault/approle/role-id)
SECRET_ID=$(cat /var/lib/vault/approle/secret-id)
vault write auth/approle/login role_id="$ROLE_ID" secret_id="$SECRET_ID"
```

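Before testing the login by hand, it can help to rule out an empty or truncated credentials file. Step 6 states that the role-id should be a UUID, so its shape can be checked directly. This is a minimal sketch; `is_uuid` is a hypothetical helper and only validates the format, not whether Vault recognises the value.

```bash
# Hypothetical helper: succeeds if $1 has the canonical 8-4-4-4-12
# hexadecimal UUID shape. It does not contact Vault.
is_uuid() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$'
}

# Example usage on vaulttest01:
#   is_uuid "$(cat /var/lib/vault/approle/role-id)" || echo "role-id looks malformed"
```

If the format check passes and the manual login still fails, the problem is more likely on the Vault side (role or policy) than in the stored file.
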
### Cache not working

**Check:**
```bash
# Verify cache directory exists and has files
ls -la /var/lib/vault/cache/test-service/

# Check permissions
stat /var/lib/vault/cache/test-service/password
# Should be 600 (rw-------)
```

## Next Steps

After successful testing:

1. Gradually migrate existing services from sops-nix to Vault
2. Consider implementing secret watcher for faster rotation (future enhancement)
3. Phase 4c: Migrate from step-ca to OpenBao PKI
4. Eventually deprecate and remove sops-nix