vault: implement bootstrap integration
This commit is contained in:
38
CLAUDE.md
38
CLAUDE.md
@@ -21,6 +21,16 @@ nixos-rebuild build --flake .#<hostname>
|
|||||||
nix build .#nixosConfigurations.<hostname>.config.system.build.toplevel
|
nix build .#nixosConfigurations.<hostname>.config.system.build.toplevel
|
||||||
```
|
```
|
||||||
|
|
||||||
|
**Important:** Do NOT pipe `nix build` commands to other commands like `tail` or `head`. Piping can hide errors and make builds appear successful when they actually failed. Always run `nix build` without piping to see the full output.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# BAD - hides errors
|
||||||
|
nix build .#create-host 2>&1 | tail -20
|
||||||
|
|
||||||
|
# GOOD - shows all output and errors
|
||||||
|
nix build .#create-host
|
||||||
|
```
|
||||||
|
|
||||||
### Deployment
|
### Deployment
|
||||||
|
|
||||||
Do not automatically deploy changes. Deployments are usually done by updating the master branch, and then triggering the auto update on the specific host.
|
Do not automatically deploy changes. Deployments are usually done by updating the master branch, and then triggering the auto update on the specific host.
|
||||||
@@ -203,6 +213,34 @@ Example VM deployment includes:
|
|||||||
|
|
||||||
OpenTofu outputs the VM's IP address after deployment for easy SSH access.
|
OpenTofu outputs the VM's IP address after deployment for easy SSH access.
|
||||||
|
|
||||||
|
#### Template Rebuilding and Terraform State
|
||||||
|
|
||||||
|
When the Proxmox template is rebuilt (via `build-and-deploy-template.yml`), the template name may change. This would normally cause Terraform to want to recreate all existing VMs, but that's unnecessary since VMs are independent once cloned.
|
||||||
|
|
||||||
|
**Solution**: The `terraform/vms.tf` file includes a lifecycle rule to ignore certain attributes that don't need management:
|
||||||
|
|
||||||
|
```hcl
|
||||||
|
lifecycle {
|
||||||
|
ignore_changes = [
|
||||||
|
clone, # Template name can change without recreating VMs
|
||||||
|
startup_shutdown, # Proxmox sets defaults (-1) that we don't need to manage
|
||||||
|
]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
This means:
|
||||||
|
- **clone**: Existing VMs are not affected by template name changes; only new VMs use the updated template
|
||||||
|
- **startup_shutdown**: Proxmox sets default startup order/delay values (-1) that Terraform would otherwise try to remove
|
||||||
|
- You can safely update `default_template_name` in `terraform/variables.tf` without recreating VMs
|
||||||
|
- `tofu plan` won't show spurious changes for Proxmox-managed defaults
|
||||||
|
|
||||||
|
**When rebuilding the template:**
|
||||||
|
1. Run `nix develop -c ansible-playbook -i playbooks/inventory.ini playbooks/build-and-deploy-template.yml`
|
||||||
|
2. Update `default_template_name` in `terraform/variables.tf` if the name changed
|
||||||
|
3. Run `tofu plan` - should show no VM recreations (only template name in state)
|
||||||
|
4. Run `tofu apply` - updates state without touching existing VMs
|
||||||
|
5. New VMs created after this point will use the new template
|
||||||
|
|
||||||
### Adding a New Host
|
### Adding a New Host
|
||||||
|
|
||||||
1. Create `/hosts/<hostname>/` directory
|
1. Create `/hosts/<hostname>/` directory
|
||||||
|
|||||||
138
TODO.md
138
TODO.md
@@ -185,7 +185,7 @@ create-host \
|
|||||||
|
|
||||||
**Current Architecture:**
|
**Current Architecture:**
|
||||||
```
|
```
|
||||||
vault.home.2rjus.net (10.69.13.19)
|
vault01.home.2rjus.net (10.69.13.19)
|
||||||
├─ KV Secrets Engine (ready to replace sops-nix)
|
├─ KV Secrets Engine (ready to replace sops-nix)
|
||||||
│ ├─ secret/hosts/{hostname}/*
|
│ ├─ secret/hosts/{hostname}/*
|
||||||
│ ├─ secret/services/{service}/*
|
│ ├─ secret/services/{service}/*
|
||||||
@@ -197,18 +197,18 @@ vault.home.2rjus.net (10.69.13.19)
|
|||||||
├─ SSH CA Engine (TODO: Phase 4c)
|
├─ SSH CA Engine (TODO: Phase 4c)
|
||||||
└─ AppRole Auth (per-host authentication configured)
|
└─ AppRole Auth (per-host authentication configured)
|
||||||
↓
|
↓
|
||||||
[Phase 4d] New hosts authenticate on first boot
|
[✅ Phase 4d] New hosts authenticate on first boot
|
||||||
[Phase 4d] Fetch secrets via Vault API
|
[✅ Phase 4d] Fetch secrets via Vault API
|
||||||
No manual key distribution needed
|
No manual key distribution needed
|
||||||
```
|
```
|
||||||
|
|
||||||
**Completed:**
|
**Completed:**
|
||||||
- ✅ Phase 4a: OpenBao server with TPM2 auto-unseal
|
- ✅ Phase 4a: OpenBao server with TPM2 auto-unseal
|
||||||
- ✅ Phase 4b: Infrastructure-as-code (secrets, policies, AppRoles, PKI)
|
- ✅ Phase 4b: Infrastructure-as-code (secrets, policies, AppRoles, PKI)
|
||||||
|
- ✅ Phase 4d: Bootstrap integration for automated secrets access
|
||||||
|
|
||||||
**Next Steps:**
|
**Next Steps:**
|
||||||
- Phase 4c: Migrate from step-ca to OpenBao PKI
|
- Phase 4c: Migrate from step-ca to OpenBao PKI
|
||||||
- Phase 4d: Bootstrap integration for automated secrets access
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -243,7 +243,7 @@ vault.home.2rjus.net (10.69.13.19)
|
|||||||
- [x] File storage backend
|
- [x] File storage backend
|
||||||
- [x] Self-signed TLS certificates via LoadCredential
|
- [x] Self-signed TLS certificates via LoadCredential
|
||||||
- [x] Deploy to infrastructure
|
- [x] Deploy to infrastructure
|
||||||
- [x] DNS entry added for vault.home.2rjus.net
|
- [x] DNS entry added for vault01.home.2rjus.net
|
||||||
- [x] VM deployed via terraform
|
- [x] VM deployed via terraform
|
||||||
- [x] Verified OpenBao running and auto-unsealing
|
- [x] Verified OpenBao running and auto-unsealing
|
||||||
|
|
||||||
@@ -353,7 +353,7 @@ vault.home.2rjus.net (10.69.13.19)
|
|||||||
- [x] Enabled ACME on intermediate CA
|
- [x] Enabled ACME on intermediate CA
|
||||||
- [x] Created PKI role for `*.home.2rjus.net`
|
- [x] Created PKI role for `*.home.2rjus.net`
|
||||||
- [x] Set certificate TTLs (30 day max) and allowed domains
|
- [x] Set certificate TTLs (30 day max) and allowed domains
|
||||||
- [x] ACME directory: `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
|
- [x] ACME directory: `https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory`
|
||||||
- [ ] Download and distribute root CA certificate
|
- [ ] Download and distribute root CA certificate
|
||||||
- [ ] Export root CA: `bao read -field=certificate pki/cert/ca > homelab-root-ca.crt`
|
- [ ] Export root CA: `bao read -field=certificate pki/cert/ca > homelab-root-ca.crt`
|
||||||
- [ ] Add to NixOS trust store on all hosts via `security.pki.certificateFiles`
|
- [ ] Add to NixOS trust store on all hosts via `security.pki.certificateFiles`
|
||||||
@@ -368,7 +368,7 @@ vault.home.2rjus.net (10.69.13.19)
|
|||||||
- [ ] Update service configuration
|
- [ ] Update service configuration
|
||||||
- [ ] Migrate hosts from step-ca to OpenBao
|
- [ ] Migrate hosts from step-ca to OpenBao
|
||||||
- [ ] Update `system/acme.nix` to use OpenBao ACME endpoint
|
- [ ] Update `system/acme.nix` to use OpenBao ACME endpoint
|
||||||
- [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
|
- [ ] Change server to `https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory`
|
||||||
- [ ] Test on one host (non-critical service)
|
- [ ] Test on one host (non-critical service)
|
||||||
- [ ] Roll out to all hosts via auto-upgrade
|
- [ ] Roll out to all hosts via auto-upgrade
|
||||||
- [ ] Configure SSH CA in OpenBao (optional, future work)
|
- [ ] Configure SSH CA in OpenBao (optional, future work)
|
||||||
@@ -388,55 +388,99 @@ vault.home.2rjus.net (10.69.13.19)
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
#### Phase 4d: Bootstrap Integration
|
#### Phase 4d: Bootstrap Integration ✅ COMPLETED (2026-02-02)
|
||||||
|
|
||||||
**Goal:** New hosts automatically authenticate to Vault on first boot, no manual steps
|
**Goal:** New hosts automatically authenticate to Vault on first boot, no manual steps
|
||||||
|
|
||||||
**Tasks:**
|
**Tasks:**
|
||||||
- [ ] Update create-host tool
|
- [x] Update create-host tool
|
||||||
- [ ] Generate AppRole role_id + secret_id for new host
|
- [x] Generate wrapped token (24h TTL, single-use) for new host
|
||||||
- [ ] Or create wrapped token for one-time bootstrap
|
- [x] Add host-specific policy to Vault (via terraform/vault/hosts-generated.tf)
|
||||||
- [ ] Add host-specific policy to Vault (via terraform)
|
- [x] Store wrapped token in terraform/vms.tf for cloud-init injection
|
||||||
- [ ] Store bootstrap credentials for cloud-init injection
|
- [x] Add `--regenerate-token` flag to regenerate only the token without overwriting config
|
||||||
- [ ] Update template2 for Vault authentication
|
- [x] Update template2 for Vault authentication
|
||||||
- [ ] Create Vault authentication module
|
- [x] Reads wrapped token from cloud-init (/run/cloud-init-env)
|
||||||
- [ ] Reads bootstrap credentials from cloud-init
|
- [x] Unwraps token to get role_id + secret_id
|
||||||
- [ ] Authenticates to Vault, retrieves permanent AppRole credentials
|
- [x] Stores AppRole credentials in /var/lib/vault/approle/ (persistent)
|
||||||
- [ ] Stores role_id + secret_id locally for services to use
|
- [x] Graceful fallback if Vault unavailable during bootstrap
|
||||||
- [ ] Create NixOS Vault secrets module
|
- [x] Create NixOS Vault secrets module (system/vault-secrets.nix)
|
||||||
- [ ] Replacement for sops.secrets
|
- [x] Runtime secret fetching (services fetch on start, not at nixos-rebuild time)
|
||||||
- [ ] Fetches secrets from Vault at nixos-rebuild/activation time
|
- [x] Secrets cached in /var/lib/vault/cache/ for fallback when Vault unreachable
|
||||||
- [ ] Or runtime secret fetching for services
|
- [x] Secrets written to /run/secrets/ (tmpfs, cleared on reboot)
|
||||||
- [ ] Handle Vault token renewal
|
- [x] Fresh authentication per service start (no token renewal needed)
|
||||||
- [ ] Update bootstrap service
|
- [x] Optional periodic rotation with systemd timers
|
||||||
- [ ] After authenticating to Vault, fetch any bootstrap secrets
|
- [x] Critical service protection (no auto-restart for DNS, CA, Vault itself)
|
||||||
- [ ] Run nixos-rebuild with host configuration
|
- [x] Create vault-fetch helper script
|
||||||
- [ ] Services automatically fetch their secrets from Vault
|
- [x] Standalone tool for fetching secrets from Vault
|
||||||
- [ ] Update terraform cloud-init
|
- [x] Authenticates using AppRole credentials
|
||||||
- [ ] Inject Vault address and bootstrap credentials
|
- [x] Writes individual files per secret key
|
||||||
- [ ] Pass via cloud-init user-data or write_files
|
- [x] Handles caching and fallback logic
|
||||||
- [ ] Credentials scoped to single use or short TTL
|
- [x] Update bootstrap service (hosts/template2/bootstrap.nix)
|
||||||
- [ ] Test complete flow
|
- [x] Unwraps Vault token on first boot
|
||||||
- [ ] Run create-host to generate new host config
|
- [x] Stores persistent AppRole credentials
|
||||||
- [ ] Deploy with terraform
|
- [x] Continues with nixos-rebuild
|
||||||
- [ ] Verify host bootstraps and authenticates to Vault
|
- [x] Services fetch secrets when they start
|
||||||
- [ ] Verify services can fetch secrets
|
- [x] Update terraform cloud-init (terraform/cloud-init.tf)
|
||||||
- [ ] Confirm no manual steps required
|
- [x] Inject VAULT_ADDR and VAULT_WRAPPED_TOKEN via write_files
|
||||||
|
- [x] Write to /run/cloud-init-env (tmpfs, cleaned on reboot)
|
||||||
|
- [x] Fixed YAML indentation issues (write_files at top level)
|
||||||
|
- [x] Support flake_branch alongside vault credentials
|
||||||
|
- [x] Test complete flow
|
||||||
|
- [x] Created vaulttest01 test host
|
||||||
|
- [x] Verified bootstrap with Vault integration
|
||||||
|
- [x] Verified service secret fetching
|
||||||
|
- [x] Tested cache fallback when Vault unreachable
|
||||||
|
- [x] Tested wrapped token single-use (second bootstrap fails as expected)
|
||||||
|
- [x] Confirmed zero manual steps required
|
||||||
|
|
||||||
**Bootstrap flow:**
|
**Implementation Details:**
|
||||||
|
|
||||||
|
**Wrapped Token Security:**
|
||||||
|
- Single-use tokens prevent reuse if leaked
|
||||||
|
- 24h TTL limits exposure window
|
||||||
|
- Safe to commit to git (expired/used tokens useless)
|
||||||
|
- Regenerate with `create-host --hostname X --regenerate-token`
|
||||||
|
|
||||||
|
**Secret Fetching:**
|
||||||
|
- Runtime (not build-time) keeps secrets out of Nix store
|
||||||
|
- Cache fallback enables service availability when Vault down
|
||||||
|
- Fresh authentication per service start (no renewal complexity)
|
||||||
|
- Individual files per secret key for easy consumption
|
||||||
|
|
||||||
|
**Bootstrap Flow:**
|
||||||
```
|
```
|
||||||
1. terraform apply (deploys VM with cloud-init)
|
1. create-host --hostname myhost --ip 10.69.13.x/24
|
||||||
2. Cloud-init sets hostname + Vault bootstrap credentials
|
↓ Generates wrapped token, updates terraform
|
||||||
|
2. tofu apply (deploys VM with cloud-init)
|
||||||
|
↓ Cloud-init writes wrapped token to /run/cloud-init-env
|
||||||
3. nixos-bootstrap.service runs:
|
3. nixos-bootstrap.service runs:
|
||||||
- Authenticates to Vault with bootstrap credentials
|
↓ Unwraps token → gets role_id + secret_id
|
||||||
- Retrieves permanent AppRole credentials
|
↓ Stores in /var/lib/vault/approle/ (persistent)
|
||||||
- Stores locally for service use
|
↓ Runs nixos-rebuild boot
|
||||||
- Runs nixos-rebuild
|
4. Service starts → fetches secrets from Vault
|
||||||
4. Host services fetch secrets from Vault as needed
|
↓ Uses stored AppRole credentials
|
||||||
5. Done - no manual intervention
|
↓ Caches secrets for fallback
|
||||||
|
5. Done - zero manual intervention
|
||||||
```
|
```
|
||||||
|
|
||||||
**Deliverable:** Fully automated secrets access from first boot, zero manual steps
|
**Files Created:**
|
||||||
|
- `scripts/vault-fetch/` - Secret fetching helper (Nix package)
|
||||||
|
- `system/vault-secrets.nix` - NixOS module for declarative Vault secrets
|
||||||
|
- `scripts/create-host/vault_helper.py` - Vault API integration
|
||||||
|
- `terraform/vault/hosts-generated.tf` - Auto-generated host policies
|
||||||
|
- `docs/vault-bootstrap-implementation.md` - Architecture documentation
|
||||||
|
- `docs/vault-bootstrap-testing.md` - Testing guide
|
||||||
|
|
||||||
|
**Configuration:**
|
||||||
|
- Vault address: `https://vault01.home.2rjus.net:8200` (configurable)
|
||||||
|
- All defaults remain configurable via environment variables or NixOS options
|
||||||
|
|
||||||
|
**Next Steps:**
|
||||||
|
- Gradually migrate existing services from sops-nix to Vault
|
||||||
|
- Add CNAME for vault.home.2rjus.net → vault01.home.2rjus.net
|
||||||
|
- Phase 4c: Migrate from step-ca to OpenBao PKI (future)
|
||||||
|
|
||||||
|
**Deliverable:** ✅ Fully automated secrets access from first boot, zero manual steps
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||
560
docs/vault-bootstrap-implementation.md
Normal file
560
docs/vault-bootstrap-implementation.md
Normal file
@@ -0,0 +1,560 @@
|
|||||||
|
# Phase 4d: Vault Bootstrap Integration - Implementation Summary
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Phase 4d implements automatic Vault/OpenBao integration for new NixOS hosts, enabling:
|
||||||
|
- Zero-touch secret provisioning on first boot
|
||||||
|
- Automatic AppRole authentication
|
||||||
|
- Runtime secret fetching with caching
|
||||||
|
- Periodic secret rotation
|
||||||
|
|
||||||
|
**Key principle**: Existing sops-nix infrastructure remains unchanged. This is new infrastructure running in parallel.
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
### Component Diagram
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────────────────────────────────────────────────────┐
|
||||||
|
│ Developer Workstation │
|
||||||
|
│ │
|
||||||
|
│ create-host --hostname myhost --ip 10.69.13.x/24 │
|
||||||
|
│ │ │
|
||||||
|
│ ├─> Generate host configs (hosts/myhost/) │
|
||||||
|
│ ├─> Update flake.nix │
|
||||||
|
│ ├─> Update terraform/vms.tf │
|
||||||
|
│ ├─> Generate terraform/vault/hosts-generated.tf │
|
||||||
|
│ ├─> Apply Vault Terraform (create AppRole) │
|
||||||
|
│ └─> Generate wrapped token (24h TTL) ───┐ │
|
||||||
|
│ │ │
|
||||||
|
└───────────────────────────────────────────────┼────────────┘
|
||||||
|
│
|
||||||
|
┌───────────────────────────┘
|
||||||
|
│ Wrapped Token
|
||||||
|
│ (single-use, 24h expiry)
|
||||||
|
↓
|
||||||
|
┌─────────────────────────────────────────────────────────────┐
|
||||||
|
│ Cloud-init (VM Provisioning) │
|
||||||
|
│ │
|
||||||
|
│ /etc/environment: │
|
||||||
|
│ VAULT_ADDR=https://vault01.home.2rjus.net:8200 │
|
||||||
|
│ VAULT_WRAPPED_TOKEN=hvs.CAES... │
|
||||||
|
│ VAULT_SKIP_VERIFY=1 │
|
||||||
|
└─────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
↓
|
||||||
|
┌─────────────────────────────────────────────────────────────┐
|
||||||
|
│ Bootstrap Service (First Boot) │
|
||||||
|
│ │
|
||||||
|
│ 1. Read VAULT_WRAPPED_TOKEN from environment │
|
||||||
|
│ 2. POST /v1/sys/wrapping/unwrap │
|
||||||
|
│ 3. Extract role_id + secret_id │
|
||||||
|
│ 4. Store in /var/lib/vault/approle/ │
|
||||||
|
│ ├─ role-id (600 permissions) │
|
||||||
|
│ └─ secret-id (600 permissions) │
|
||||||
|
│ 5. Continue with nixos-rebuild boot │
|
||||||
|
└─────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
↓
|
||||||
|
┌─────────────────────────────────────────────────────────────┐
|
||||||
|
│ Runtime (Service Starts) │
|
||||||
|
│ │
|
||||||
|
│ vault-secret-<name>.service (ExecStartPre) │
|
||||||
|
│ │ │
|
||||||
|
│ ├─> vault-fetch <secret-path> <output-dir> │
|
||||||
|
│ │ │ │
|
||||||
|
│ │ ├─> Read role_id + secret_id │
|
||||||
|
│ │ ├─> POST /v1/auth/approle/login → token │
|
||||||
|
│ │ ├─> GET /v1/secret/data/<path> → secrets │
|
||||||
|
│ │ ├─> Write /run/secrets/<name>/password │
|
||||||
|
│ │ ├─> Write /run/secrets/<name>/api_key │
|
||||||
|
│ │ └─> Cache to /var/lib/vault/cache/<name>/ │
|
||||||
|
│ │ │
|
||||||
|
│ └─> chown/chmod secret files │
|
||||||
|
│ │
|
||||||
|
│ myservice.service │
|
||||||
|
│ └─> Reads secrets from /run/secrets/<name>/ │
|
||||||
|
└─────────────────────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
### Data Flow
|
||||||
|
|
||||||
|
1. **Provisioning Time** (Developer → Vault):
|
||||||
|
- create-host generates AppRole configuration
|
||||||
|
- Terraform creates AppRole + policy in Vault
|
||||||
|
- Vault generates wrapped token containing role_id + secret_id
|
||||||
|
- Wrapped token stored in terraform/vms.tf
|
||||||
|
|
||||||
|
2. **Bootstrap Time** (Cloud-init → VM):
|
||||||
|
- Cloud-init injects wrapped token via /etc/environment
|
||||||
|
- Bootstrap service unwraps token (single-use operation)
|
||||||
|
- Stores unwrapped credentials persistently
|
||||||
|
|
||||||
|
3. **Runtime** (Service → Vault):
|
||||||
|
- Service starts
|
||||||
|
- ExecStartPre hook calls vault-fetch
|
||||||
|
- vault-fetch authenticates using stored credentials
|
||||||
|
- Fetches secrets and caches them
|
||||||
|
- Service reads secrets from filesystem
|
||||||
|
|
||||||
|
## Implementation Details
|
||||||
|
|
||||||
|
### 1. vault-fetch Helper (`scripts/vault-fetch/`)
|
||||||
|
|
||||||
|
**Purpose**: Fetch secrets from Vault and write to filesystem
|
||||||
|
|
||||||
|
**Features**:
|
||||||
|
- Reads AppRole credentials from `/var/lib/vault/approle/`
|
||||||
|
- Authenticates to Vault (fresh token each time)
|
||||||
|
- Fetches secret from KV v2 engine
|
||||||
|
- Writes individual files per secret key
|
||||||
|
- Updates cache for fallback
|
||||||
|
- Gracefully degrades to cache if Vault unreachable
|
||||||
|
|
||||||
|
**Usage**:
|
||||||
|
```bash
|
||||||
|
vault-fetch hosts/monitoring01/grafana /run/secrets/grafana
|
||||||
|
```
|
||||||
|
|
||||||
|
**Environment Variables**:
|
||||||
|
- `VAULT_ADDR`: Vault server (default: https://vault01.home.2rjus.net:8200)
|
||||||
|
- `VAULT_SKIP_VERIFY`: Skip TLS verification (default: 1)
|
||||||
|
|
||||||
|
**Error Handling**:
|
||||||
|
- Vault unreachable → Use cache (log warning)
|
||||||
|
- Invalid credentials → Fail with clear error
|
||||||
|
- No cache + unreachable → Fail with error
|
||||||
|
|
||||||
|
### 2. NixOS Module (`system/vault-secrets.nix`)
|
||||||
|
|
||||||
|
**Purpose**: Declarative Vault secret management for NixOS services
|
||||||
|
|
||||||
|
**Configuration Options**:
|
||||||
|
|
||||||
|
```nix
|
||||||
|
vault.enable = true; # Enable Vault integration
|
||||||
|
|
||||||
|
vault.secrets.<name> = {
|
||||||
|
secretPath = "hosts/monitoring01/grafana"; # Path in Vault
|
||||||
|
outputDir = "/run/secrets/grafana"; # Where to write secrets
|
||||||
|
cacheDir = "/var/lib/vault/cache/grafana"; # Cache location
|
||||||
|
owner = "grafana"; # File owner
|
||||||
|
group = "grafana"; # File group
|
||||||
|
mode = "0400"; # Permissions
|
||||||
|
services = [ "grafana" ]; # Dependent services
|
||||||
|
restartTrigger = true; # Enable periodic rotation
|
||||||
|
restartInterval = "daily"; # Rotation schedule
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
|
**Module Behavior**:
|
||||||
|
|
||||||
|
1. **Fetch Service**: Creates `vault-secret-<name>.service`
|
||||||
|
- Runs on boot and before dependent services
|
||||||
|
- Calls vault-fetch to populate secrets
|
||||||
|
- Sets ownership and permissions
|
||||||
|
|
||||||
|
2. **Rotation Timer**: Optionally creates `vault-secret-rotate-<name>.timer`
|
||||||
|
- Scheduled restarts for secret rotation
|
||||||
|
- Automatically excluded for critical services
|
||||||
|
- Configurable interval (daily, weekly, monthly)
|
||||||
|
|
||||||
|
3. **Critical Service Protection**:
|
||||||
|
```nix
|
||||||
|
vault.criticalServices = [ "bind" "openbao" "step-ca" ];
|
||||||
|
```
|
||||||
|
Services in this list never get auto-restart timers
|
||||||
|
|
||||||
|
### 3. create-host Tool Updates
|
||||||
|
|
||||||
|
**New Functionality**:
|
||||||
|
|
||||||
|
1. **Vault Terraform Generation** (`generators.py`):
|
||||||
|
- Creates/updates `terraform/vault/hosts-generated.tf`
|
||||||
|
- Adds host policy granting access to `secret/data/hosts/<hostname>/*`
|
||||||
|
- Adds AppRole configuration
|
||||||
|
- Idempotent (safe to re-run)
|
||||||
|
|
||||||
|
2. **Wrapped Token Generation** (`vault_helper.py`):
|
||||||
|
- Applies Vault Terraform to create AppRole
|
||||||
|
- Reads role_id from Vault
|
||||||
|
- Generates secret_id
|
||||||
|
- Wraps credentials in cubbyhole token (24h TTL, single-use)
|
||||||
|
- Returns wrapped token
|
||||||
|
|
||||||
|
3. **VM Configuration Update** (`manipulators.py`):
|
||||||
|
- Adds `vault_wrapped_token` field to VM in vms.tf
|
||||||
|
- Preserves other VM settings
|
||||||
|
|
||||||
|
**New CLI Options**:
|
||||||
|
```bash
|
||||||
|
create-host --hostname myhost --ip 10.69.13.x/24
|
||||||
|
# Full workflow with Vault integration
|
||||||
|
|
||||||
|
create-host --hostname myhost --skip-vault
|
||||||
|
# Create host without Vault (legacy behavior)
|
||||||
|
|
||||||
|
create-host --hostname myhost --force
|
||||||
|
# Regenerate everything including new wrapped token
|
||||||
|
```
|
||||||
|
|
||||||
|
**Dependencies Added**:
|
||||||
|
- `hvac`: Python Vault client library
|
||||||
|
|
||||||
|
### 4. Bootstrap Service Updates
|
||||||
|
|
||||||
|
**New Behavior** (`hosts/template2/bootstrap.nix`):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check for wrapped token
|
||||||
|
if [ -n "$VAULT_WRAPPED_TOKEN" ]; then
|
||||||
|
# Unwrap to get credentials
|
||||||
|
curl -X POST \
|
||||||
|
-H "X-Vault-Token: $VAULT_WRAPPED_TOKEN" \
|
||||||
|
$VAULT_ADDR/v1/sys/wrapping/unwrap
|
||||||
|
|
||||||
|
# Store role_id and secret_id
|
||||||
|
mkdir -p /var/lib/vault/approle
|
||||||
|
echo "$ROLE_ID" > /var/lib/vault/approle/role-id
|
||||||
|
echo "$SECRET_ID" > /var/lib/vault/approle/secret-id
|
||||||
|
chmod 600 /var/lib/vault/approle/*
|
||||||
|
|
||||||
|
# Continue with bootstrap...
|
||||||
|
fi
|
||||||
|
```
|
||||||
|
|
||||||
|
**Error Handling**:
|
||||||
|
- Token already used → Log error, continue bootstrap
|
||||||
|
- Token expired → Log error, continue bootstrap
|
||||||
|
- Vault unreachable → Log warning, continue bootstrap
|
||||||
|
- **Never fails bootstrap** - host can still run without Vault
|
||||||
|
|
||||||
|
### 5. Cloud-init Configuration
|
||||||
|
|
||||||
|
**Updates** (`terraform/cloud-init.tf`):
|
||||||
|
|
||||||
|
```hcl
|
||||||
|
write_files:
|
||||||
|
- path: /etc/environment
|
||||||
|
content: |
|
||||||
|
VAULT_ADDR=https://vault01.home.2rjus.net:8200
|
||||||
|
VAULT_WRAPPED_TOKEN=${vault_wrapped_token}
|
||||||
|
VAULT_SKIP_VERIFY=1
|
||||||
|
```
|
||||||
|
|
||||||
|
**VM Configuration** (`terraform/vms.tf`):
|
||||||
|
|
||||||
|
```hcl
|
||||||
|
locals {
|
||||||
|
vms = {
|
||||||
|
"myhost" = {
|
||||||
|
ip = "10.69.13.x/24"
|
||||||
|
vault_wrapped_token = "hvs.CAESIBw..." # Added by create-host
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### 6. Vault Terraform Structure
|
||||||
|
|
||||||
|
**Generated Hosts File** (`terraform/vault/hosts-generated.tf`):
|
||||||
|
|
||||||
|
```hcl
|
||||||
|
locals {
|
||||||
|
generated_host_policies = {
|
||||||
|
"myhost" = {
|
||||||
|
paths = [
|
||||||
|
"secret/data/hosts/myhost/*",
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
resource "vault_policy" "generated_host_policies" {
|
||||||
|
for_each = local.generated_host_policies
|
||||||
|
name = "host-${each.key}"
|
||||||
|
policy = <<-EOT
|
||||||
|
path "secret/data/hosts/${each.key}/*" {
|
||||||
|
capabilities = ["read", "list"]
|
||||||
|
}
|
||||||
|
EOT
|
||||||
|
}
|
||||||
|
|
||||||
|
resource "vault_approle_auth_backend_role" "generated_hosts" {
|
||||||
|
for_each = local.generated_host_policies
|
||||||
|
|
||||||
|
backend = vault_auth_backend.approle.path
|
||||||
|
role_name = each.key
|
||||||
|
token_policies = ["host-${each.key}"]
|
||||||
|
secret_id_ttl = 0 # Never expire
|
||||||
|
token_ttl = 3600 # 1 hour tokens
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Separation of Concerns**:
|
||||||
|
- `approle.tf`: Manual host configurations (ha1, monitoring01)
|
||||||
|
- `hosts-generated.tf`: Auto-generated configurations
|
||||||
|
- `secrets.tf`: Secret definitions (manual)
|
||||||
|
- `pki.tf`: PKI infrastructure
|
||||||
|
|
||||||
|
## Security Model
|
||||||
|
|
||||||
|
### Credential Distribution
|
||||||
|
|
||||||
|
**Wrapped Token Security**:
|
||||||
|
- **Single-use**: Can only be unwrapped once
|
||||||
|
- **Time-limited**: 24h TTL
|
||||||
|
- **Safe in git**: Even if leaked, expires quickly
|
||||||
|
- **Standard Vault pattern**: Built-in Vault feature
|
||||||
|
|
||||||
|
**Why wrapped tokens are secure**:
|
||||||
|
```
|
||||||
|
Developer commits wrapped token to git
|
||||||
|
↓
|
||||||
|
Attacker finds token in git history
|
||||||
|
↓
|
||||||
|
Attacker tries to use token
|
||||||
|
↓
|
||||||
|
❌ Token already used (unwrapped during bootstrap)
|
||||||
|
↓
|
||||||
|
❌ OR: Token expired (>24h old)
|
||||||
|
```
|
||||||
|
|
||||||
|
### AppRole Credentials
|
||||||
|
|
||||||
|
**Storage**:
|
||||||
|
- Location: `/var/lib/vault/approle/`
|
||||||
|
- Permissions: `600 (root:root)`
|
||||||
|
- Persistence: Survives reboots
|
||||||
|
|
||||||
|
**Security Properties**:
|
||||||
|
- `role_id`: Non-sensitive (like username)
|
||||||
|
- `secret_id`: Sensitive (like password)
|
||||||
|
- `secret_id_ttl = 0`: Never expires (simplicity vs rotation tradeoff)
|
||||||
|
- Tokens: Ephemeral (1h TTL, not cached)
|
||||||
|
|
||||||
|
**Attack Scenarios**:
|
||||||
|
|
||||||
|
1. **Attacker gets root on host**:
|
||||||
|
- Can read AppRole credentials
|
||||||
|
- Can only access that host's secrets
|
||||||
|
- Cannot access other hosts' secrets (policy restriction)
|
||||||
|
- ✅ Blast radius limited to single host
|
||||||
|
|
||||||
|
2. **Attacker intercepts wrapped token**:
|
||||||
|
- Single-use: Already consumed during bootstrap
|
||||||
|
- Time-limited: Likely expired
|
||||||
|
- ✅ Cannot be reused
|
||||||
|
|
||||||
|
3. **Vault server compromised**:
|
||||||
|
- All secrets exposed (same as any secret storage)
|
||||||
|
- ✅ No different from sops-nix master key compromise
|
||||||
|
|
||||||
|
### Secret Storage
|
||||||
|
|
||||||
|
**Runtime Secrets**:
|
||||||
|
- Location: `/run/secrets/` (tmpfs)
|
||||||
|
- Lost on reboot
|
||||||
|
- Re-fetched on service start
|
||||||
|
- ✅ Not in Nix store
|
||||||
|
- ✅ Not persisted to disk
|
||||||
|
|
||||||
|
**Cached Secrets**:
|
||||||
|
- Location: `/var/lib/vault/cache/`
|
||||||
|
- Persists across reboots
|
||||||
|
- Only used when Vault unreachable
|
||||||
|
- ✅ Enables service availability
|
||||||
|
- ⚠️ May be stale
|
||||||
|
|
||||||
|
## Failure Modes
|
||||||
|
|
||||||
|
### Wrapped Token Expired
|
||||||
|
|
||||||
|
**Symptom**: Bootstrap logs "token expired" error
|
||||||
|
|
||||||
|
**Impact**: Host boots but has no Vault credentials
|
||||||
|
|
||||||
|
**Fix**: Regenerate token and redeploy
|
||||||
|
```bash
|
||||||
|
create-host --hostname myhost --force
|
||||||
|
cd terraform && tofu apply
|
||||||
|
```
|
||||||
|
|
||||||
|
### Vault Unreachable
|
||||||
|
|
||||||
|
**Symptom**: Service logs "WARNING: Using cached secrets"
|
||||||
|
|
||||||
|
**Impact**: Service uses stale secrets (may work or fail depending on rotation)
|
||||||
|
|
||||||
|
**Fix**: Restore Vault connectivity, restart service
|
||||||
|
|
||||||
|
### No Cache Available
|
||||||
|
|
||||||
|
**Symptom**: Service fails to start with "No cache available"
|
||||||
|
|
||||||
|
**Impact**: Service unavailable until Vault restored
|
||||||
|
|
||||||
|
**Fix**: Restore Vault, restart service
|
||||||
|
|
||||||
|
### Invalid Credentials
|
||||||
|
|
||||||
|
**Symptom**: vault-fetch logs authentication failure
|
||||||
|
|
||||||
|
**Impact**: Service cannot start
|
||||||
|
|
||||||
|
**Fix**:
|
||||||
|
1. Check AppRole exists: `vault read auth/approle/role/hostname`
|
||||||
|
2. Check policy exists: `vault policy read host-hostname`
|
||||||
|
3. Regenerate credentials if needed
|
||||||
|
|
||||||
|
## Migration Path
|
||||||
|
|
||||||
|
### Current State (Phase 4d)
|
||||||
|
|
||||||
|
- ✅ sops-nix: Used by all existing services
|
||||||
|
- ✅ Vault: Available for new services
|
||||||
|
- ✅ Parallel operation: Both work simultaneously
|
||||||
|
|
||||||
|
### Future Migration
|
||||||
|
|
||||||
|
**Gradual Service Migration**:
|
||||||
|
|
||||||
|
1. **Pick a non-critical service** (e.g., test service)
|
||||||
|
2. **Add Vault secrets**:
|
||||||
|
```nix
|
||||||
|
vault.secrets.myservice = {
|
||||||
|
secretPath = "hosts/myhost/myservice";
|
||||||
|
};
|
||||||
|
```
|
||||||
|
3. **Update service to read from Vault**:
|
||||||
|
```nix
|
||||||
|
systemd.services.myservice.serviceConfig = {
|
||||||
|
EnvironmentFile = "/run/secrets/myservice/password";
|
||||||
|
};
|
||||||
|
```
|
||||||
|
4. **Remove sops-nix secret**
|
||||||
|
5. **Test thoroughly**
|
||||||
|
6. **Repeat for next service**
|
||||||
|
|
||||||
|
**Critical Services Last**:
|
||||||
|
- DNS (bind)
|
||||||
|
- Certificate Authority (step-ca)
|
||||||
|
- Vault itself (openbao)
|
||||||
|
|
||||||
|
**Eventually**:
|
||||||
|
- All services migrated to Vault
|
||||||
|
- Remove sops-nix dependency
|
||||||
|
- Clean up `/secrets/` directory
|
||||||
|
|
||||||
|
## Performance Considerations
|
||||||
|
|
||||||
|
### Bootstrap Time
|
||||||
|
|
||||||
|
**Added overhead**: ~2-5 seconds
|
||||||
|
- Token unwrap: ~1s
|
||||||
|
- Credential storage: ~1s
|
||||||
|
|
||||||
|
**Total bootstrap time**: Still <2 minutes (acceptable)
|
||||||
|
|
||||||
|
### Service Startup
|
||||||
|
|
||||||
|
**Added overhead**: ~1-3 seconds per service
|
||||||
|
- Vault authentication: ~1s
|
||||||
|
- Secret fetch: ~1s
|
||||||
|
- File operations: <1s
|
||||||
|
|
||||||
|
**Parallel vs Serial**:
|
||||||
|
- Multiple services fetch in parallel
|
||||||
|
- No cascade delays
|
||||||
|
|
||||||
|
### Cache Benefits
|
||||||
|
|
||||||
|
**When Vault unreachable**:
|
||||||
|
- Service starts in <1s (cache read)
|
||||||
|
- No Vault dependency for startup
|
||||||
|
- High availability maintained
|
||||||
|
|
||||||
|
## Testing Checklist
|
||||||
|
|
||||||
|
Complete testing workflow documented in `vault-bootstrap-testing.md`:
|
||||||
|
|
||||||
|
- [ ] Create test host with create-host
|
||||||
|
- [ ] Add test secrets to Vault
|
||||||
|
- [ ] Deploy VM and verify bootstrap
|
||||||
|
- [ ] Verify secrets fetched successfully
|
||||||
|
- [ ] Test service restart (re-fetch)
|
||||||
|
- [ ] Test Vault unreachable (cache fallback)
|
||||||
|
- [ ] Test secret rotation
|
||||||
|
- [ ] Test wrapped token expiry
|
||||||
|
- [ ] Test token reuse prevention
|
||||||
|
- [ ] Verify critical services excluded from auto-restart
|
||||||
|
|
||||||
|
## Files Changed
|
||||||
|
|
||||||
|
### Created
|
||||||
|
- `scripts/vault-fetch/vault-fetch.sh` - Secret fetching script
|
||||||
|
- `scripts/vault-fetch/default.nix` - Nix package
|
||||||
|
- `scripts/vault-fetch/README.md` - Documentation
|
||||||
|
- `system/vault-secrets.nix` - NixOS module
|
||||||
|
- `scripts/create-host/vault_helper.py` - Vault API client
|
||||||
|
- `terraform/vault/hosts-generated.tf` - Generated Terraform
|
||||||
|
- `docs/vault-bootstrap-implementation.md` - This file
|
||||||
|
- `docs/vault-bootstrap-testing.md` - Testing guide
|
||||||
|
|
||||||
|
### Modified
|
||||||
|
- `scripts/create-host/default.nix` - Add hvac dependency
|
||||||
|
- `scripts/create-host/create_host.py` - Add Vault integration
|
||||||
|
- `scripts/create-host/generators.py` - Add Vault Terraform generation
|
||||||
|
- `scripts/create-host/manipulators.py` - Add wrapped token injection
|
||||||
|
- `terraform/cloud-init.tf` - Inject Vault credentials
|
||||||
|
- `terraform/vms.tf` - Support vault_wrapped_token field
|
||||||
|
- `hosts/template2/bootstrap.nix` - Unwrap token and store credentials
|
||||||
|
- `system/default.nix` - Import vault-secrets module
|
||||||
|
- `flake.nix` - Add vault-fetch package
|
||||||
|
|
||||||
|
### Unchanged
|
||||||
|
- All existing sops-nix configuration
|
||||||
|
- All existing service configurations
|
||||||
|
- All existing host configurations
|
||||||
|
- `/secrets/` directory
|
||||||
|
|
||||||
|
## Future Enhancements
|
||||||
|
|
||||||
|
### Phase 4e+ (Not in Scope)
|
||||||
|
|
||||||
|
1. **Dynamic Secrets**
|
||||||
|
- Database credentials with rotation
|
||||||
|
- Cloud provider credentials
|
||||||
|
- SSH certificates
|
||||||
|
|
||||||
|
2. **Secret Watcher**
|
||||||
|
- Monitor Vault for secret changes
|
||||||
|
- Automatically restart services on rotation
|
||||||
|
- Faster than periodic timers
|
||||||
|
|
||||||
|
3. **PKI Integration** (Phase 4c)
|
||||||
|
- Migrate from step-ca to Vault PKI
|
||||||
|
- Automatic certificate issuance
|
||||||
|
- Short-lived certificates
|
||||||
|
|
||||||
|
4. **Audit Logging**
|
||||||
|
- Track secret access
|
||||||
|
- Alert on suspicious patterns
|
||||||
|
- Compliance reporting
|
||||||
|
|
||||||
|
5. **Multi-Environment**
|
||||||
|
- Dev/staging/prod separation
|
||||||
|
- Per-environment Vault namespaces
|
||||||
|
- Separate AppRoles per environment
|
||||||
|
|
||||||
|
## Conclusion
|
||||||
|
|
||||||
|
Phase 4d successfully implements automatic Vault integration for new NixOS hosts with:
|
||||||
|
|
||||||
|
- ✅ Zero-touch provisioning
|
||||||
|
- ✅ Secure credential distribution
|
||||||
|
- ✅ Graceful degradation
|
||||||
|
- ✅ Backward compatibility
|
||||||
|
- ✅ Production-ready error handling
|
||||||
|
|
||||||
|
The infrastructure is ready for gradual migration of existing services from sops-nix to Vault.
|
||||||
419
docs/vault-bootstrap-testing.md
Normal file
419
docs/vault-bootstrap-testing.md
Normal file
@@ -0,0 +1,419 @@
|
|||||||
|
# Phase 4d: Vault Bootstrap Integration - Testing Guide
|
||||||
|
|
||||||
|
This guide walks through testing the complete Vault bootstrap workflow implemented in Phase 4d.
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
Before testing, ensure:
|
||||||
|
|
||||||
|
1. **Vault server is running**: vault01 (vault01.home.2rjus.net:8200) is accessible
|
||||||
|
2. **Vault access**: You have a Vault token with admin permissions (set `BAO_TOKEN` env var)
|
||||||
|
3. **Terraform installed**: OpenTofu is available in your PATH
|
||||||
|
4. **Git repository clean**: All Phase 4d changes are committed to a branch
|
||||||
|
|
||||||
|
## Test Scenario: Create vaulttest01
|
||||||
|
|
||||||
|
### Step 1: Create Test Host Configuration
|
||||||
|
|
||||||
|
Run the create-host tool with Vault integration:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Ensure you have Vault token
|
||||||
|
export BAO_TOKEN="your-vault-admin-token"
|
||||||
|
|
||||||
|
# Create test host
|
||||||
|
nix run .#create-host -- \
|
||||||
|
--hostname vaulttest01 \
|
||||||
|
--ip 10.69.13.150/24 \
|
||||||
|
--cpu 2 \
|
||||||
|
--memory 2048 \
|
||||||
|
--disk 20G
|
||||||
|
|
||||||
|
# If you need to regenerate (e.g., wrapped token expired):
|
||||||
|
nix run .#create-host -- \
|
||||||
|
--hostname vaulttest01 \
|
||||||
|
--ip 10.69.13.150/24 \
|
||||||
|
--force
|
||||||
|
```
|
||||||
|
|
||||||
|
**What this does:**
|
||||||
|
- Creates `hosts/vaulttest01/` configuration
|
||||||
|
- Updates `flake.nix` with new host
|
||||||
|
- Updates `terraform/vms.tf` with VM definition
|
||||||
|
- Generates `terraform/vault/hosts-generated.tf` with AppRole and policy
|
||||||
|
- Creates a wrapped token (24h TTL, single-use)
|
||||||
|
- Adds wrapped token to VM configuration
|
||||||
|
|
||||||
|
**Expected output:**
|
||||||
|
```
|
||||||
|
✓ All validations passed
|
||||||
|
✓ Created hosts/vaulttest01/default.nix
|
||||||
|
✓ Created hosts/vaulttest01/configuration.nix
|
||||||
|
✓ Updated flake.nix
|
||||||
|
✓ Updated terraform/vms.tf
|
||||||
|
|
||||||
|
Configuring Vault integration...
|
||||||
|
✓ Updated terraform/vault/hosts-generated.tf
|
||||||
|
Applying Vault Terraform configuration...
|
||||||
|
✓ Terraform applied successfully
|
||||||
|
Reading AppRole credentials for vaulttest01...
|
||||||
|
✓ Retrieved role_id
|
||||||
|
✓ Generated secret_id
|
||||||
|
Creating wrapped token (24h TTL, single-use)...
|
||||||
|
✓ Created wrapped token: hvs.CAESIBw...
|
||||||
|
⚠️ Token expires in 24 hours
|
||||||
|
⚠️ Token can only be used once
|
||||||
|
✓ Added wrapped token to terraform/vms.tf
|
||||||
|
|
||||||
|
✓ Host configuration generated successfully!
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 2: Add Test Service Configuration
|
||||||
|
|
||||||
|
Edit `hosts/vaulttest01/configuration.nix` to enable Vault and add a test service:
|
||||||
|
|
||||||
|
```nix
|
||||||
|
{ config, pkgs, lib, ... }:
|
||||||
|
{
|
||||||
|
imports = [
|
||||||
|
../../system
|
||||||
|
../../common/vm
|
||||||
|
];
|
||||||
|
|
||||||
|
# Enable Vault secrets management
|
||||||
|
vault.enable = true;
|
||||||
|
|
||||||
|
# Define a test secret
|
||||||
|
vault.secrets.test-service = {
|
||||||
|
secretPath = "hosts/vaulttest01/test-service";
|
||||||
|
restartTrigger = true;
|
||||||
|
restartInterval = "daily";
|
||||||
|
services = [ "vault-test" ];
|
||||||
|
};
|
||||||
|
|
||||||
|
# Create a test service that uses the secret
|
||||||
|
systemd.services.vault-test = {
|
||||||
|
description = "Test Vault secret fetching";
|
||||||
|
wantedBy = [ "multi-user.target" ];
|
||||||
|
after = [ "vault-secret-test-service.service" ];
|
||||||
|
|
||||||
|
serviceConfig = {
|
||||||
|
Type = "oneshot";
|
||||||
|
RemainAfterExit = true;
|
||||||
|
|
||||||
|
ExecStart = pkgs.writeShellScript "vault-test" ''
|
||||||
|
echo "=== Vault Secret Test ==="
|
||||||
|
echo "Secret path: hosts/vaulttest01/test-service"
|
||||||
|
|
||||||
|
if [ -f /run/secrets/test-service/password ]; then
|
||||||
|
echo "✓ Password file exists"
|
||||||
|
echo "Password length: $(wc -c < /run/secrets/test-service/password)"
|
||||||
|
else
|
||||||
|
echo "✗ Password file missing!"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ -d /var/lib/vault/cache/test-service ]; then
|
||||||
|
echo "✓ Cache directory exists"
|
||||||
|
else
|
||||||
|
echo "✗ Cache directory missing!"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "Test successful!"
|
||||||
|
'';
|
||||||
|
|
||||||
|
StandardOutput = "journal+console";
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
# Rest of configuration...
|
||||||
|
networking.hostName = "vaulttest01";
|
||||||
|
networking.domain = "home.2rjus.net";
|
||||||
|
|
||||||
|
systemd.network.networks."10-lan" = {
|
||||||
|
matchConfig.Name = "ens18";
|
||||||
|
address = [ "10.69.13.150/24" ];
|
||||||
|
gateway = [ "10.69.13.1" ];
|
||||||
|
dns = [ "10.69.13.5" "10.69.13.6" ];
|
||||||
|
domains = [ "home.2rjus.net" ];
|
||||||
|
};
|
||||||
|
|
||||||
|
system.stateVersion = "25.11";
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 3: Create Test Secrets in Vault
|
||||||
|
|
||||||
|
Add test secrets to Vault using Terraform:
|
||||||
|
|
||||||
|
Edit `terraform/vault/secrets.tf`:
|
||||||
|
|
||||||
|
```hcl
|
||||||
|
locals {
|
||||||
|
secrets = {
|
||||||
|
# ... existing secrets ...
|
||||||
|
|
||||||
|
# Test secret for vaulttest01
|
||||||
|
"hosts/vaulttest01/test-service" = {
|
||||||
|
auto_generate = true
|
||||||
|
password_length = 24
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Apply the Vault configuration:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd terraform/vault
|
||||||
|
tofu apply
|
||||||
|
```
|
||||||
|
|
||||||
|
**Verify the secret exists:**
|
||||||
|
```bash
|
||||||
|
export VAULT_ADDR=https://vault01.home.2rjus.net:8200
|
||||||
|
export VAULT_SKIP_VERIFY=1
|
||||||
|
|
||||||
|
vault kv get secret/hosts/vaulttest01/test-service
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 4: Deploy the VM
|
||||||
|
|
||||||
|
**Important**: Deploy within 24 hours of creating the host (wrapped token TTL)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd terraform
|
||||||
|
tofu plan # Review changes
|
||||||
|
tofu apply # Deploy VM
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 5: Monitor Bootstrap Process
|
||||||
|
|
||||||
|
SSH into the VM and monitor the bootstrap:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Watch bootstrap logs
|
||||||
|
ssh root@vaulttest01
|
||||||
|
journalctl -fu nixos-bootstrap.service
|
||||||
|
|
||||||
|
# Expected log output:
|
||||||
|
# Starting NixOS bootstrap for host: vaulttest01
|
||||||
|
# Network connectivity confirmed
|
||||||
|
# Unwrapping Vault token to get AppRole credentials...
|
||||||
|
# Vault credentials unwrapped and stored successfully
|
||||||
|
# Fetching and building NixOS configuration from flake...
|
||||||
|
# Successfully built configuration for vaulttest01
|
||||||
|
# Rebooting into new configuration...
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 6: Verify Vault Integration
|
||||||
|
|
||||||
|
After the VM reboots, verify the integration:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh root@vaulttest01
|
||||||
|
|
||||||
|
# Check AppRole credentials were stored
|
||||||
|
ls -la /var/lib/vault/approle/
|
||||||
|
# Expected: role-id and secret-id files with 600 permissions
|
||||||
|
|
||||||
|
cat /var/lib/vault/approle/role-id
|
||||||
|
# Should show a UUID
|
||||||
|
|
||||||
|
# Check vault-secret service ran successfully
|
||||||
|
systemctl status vault-secret-test-service.service
|
||||||
|
# Should be active (exited)
|
||||||
|
|
||||||
|
journalctl -u vault-secret-test-service.service
|
||||||
|
# Should show successful secret fetch:
|
||||||
|
# [vault-fetch] Authenticating to Vault at https://vault01.home.2rjus.net:8200
|
||||||
|
# [vault-fetch] Successfully authenticated to Vault
|
||||||
|
# [vault-fetch] Fetching secret from path: hosts/vaulttest01/test-service
|
||||||
|
# [vault-fetch] Writing secrets to /run/secrets/test-service
|
||||||
|
# [vault-fetch] - Wrote secret key: password
|
||||||
|
# [vault-fetch] Successfully fetched and cached secrets
|
||||||
|
|
||||||
|
# Check test service passed
|
||||||
|
systemctl status vault-test.service
|
||||||
|
journalctl -u vault-test.service
|
||||||
|
# Should show:
|
||||||
|
# === Vault Secret Test ===
|
||||||
|
# ✓ Password file exists
|
||||||
|
# ✓ Cache directory exists
|
||||||
|
# Test successful!
|
||||||
|
|
||||||
|
# Verify secret files exist
|
||||||
|
ls -la /run/secrets/test-service/
|
||||||
|
# Should show password file with 400 permissions
|
||||||
|
|
||||||
|
# Verify cache exists
|
||||||
|
ls -la /var/lib/vault/cache/test-service/
|
||||||
|
# Should show cached password file
|
||||||
|
```
|
||||||
|
|
||||||
|
## Test Scenarios
|
||||||
|
|
||||||
|
### Scenario 1: Fresh Deployment
|
||||||
|
✅ **Expected**: All secrets fetched successfully from Vault
|
||||||
|
|
||||||
|
### Scenario 2: Service Restart
|
||||||
|
```bash
|
||||||
|
systemctl restart vault-test.service
|
||||||
|
```
|
||||||
|
✅ **Expected**: Secrets re-fetched from Vault, service starts successfully
|
||||||
|
|
||||||
|
### Scenario 3: Vault Unreachable
|
||||||
|
```bash
|
||||||
|
# On vault01, stop Vault temporarily
|
||||||
|
ssh root@vault01
|
||||||
|
systemctl stop openbao
|
||||||
|
|
||||||
|
# On vaulttest01, restart test service
|
||||||
|
ssh root@vaulttest01
|
||||||
|
systemctl restart vault-test.service
|
||||||
|
journalctl -u vault-secret-test-service.service | tail -20
|
||||||
|
```
|
||||||
|
✅ **Expected**:
|
||||||
|
- Warning logged: "Using cached secrets from /var/lib/vault/cache/test-service"
|
||||||
|
- Service starts successfully using cached secrets
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Restore Vault
|
||||||
|
ssh root@vault01
|
||||||
|
systemctl start openbao
|
||||||
|
```
|
||||||
|
|
||||||
|
### Scenario 4: Secret Rotation
|
||||||
|
```bash
|
||||||
|
# Update secret in Vault
|
||||||
|
vault kv put secret/hosts/vaulttest01/test-service password="new-secret-value"
|
||||||
|
|
||||||
|
# On vaulttest01, trigger rotation
|
||||||
|
ssh root@vaulttest01
|
||||||
|
systemctl restart vault-secret-test-service.service
|
||||||
|
|
||||||
|
# Verify new secret
|
||||||
|
cat /run/secrets/test-service/password
|
||||||
|
# Should show new value
|
||||||
|
```
|
||||||
|
✅ **Expected**: New secret fetched and cached
|
||||||
|
|
||||||
|
### Scenario 5: Expired Wrapped Token
|
||||||
|
```bash
|
||||||
|
# Wait 24+ hours after create-host, then try to deploy
|
||||||
|
cd terraform
|
||||||
|
tofu apply
|
||||||
|
```
|
||||||
|
❌ **Expected**: Bootstrap fails with message about expired token
|
||||||
|
|
||||||
|
**Fix (Option 1 - Regenerate token only):**
|
||||||
|
```bash
|
||||||
|
# Only regenerates the wrapped token, preserves all other configuration
|
||||||
|
nix run .#create-host -- --hostname vaulttest01 --regenerate-token
|
||||||
|
cd terraform
|
||||||
|
tofu apply
|
||||||
|
```
|
||||||
|
|
||||||
|
**Fix (Option 2 - Full regeneration with --force):**
|
||||||
|
```bash
|
||||||
|
# Overwrites entire host configuration (including any manual changes)
|
||||||
|
nix run .#create-host -- --hostname vaulttest01 --force
|
||||||
|
cd terraform
|
||||||
|
tofu apply
|
||||||
|
```
|
||||||
|
|
||||||
|
**Recommendation**: Use `--regenerate-token` to avoid losing manual configuration changes.
|
||||||
|
|
||||||
|
### Scenario 6: Already-Used Wrapped Token
|
||||||
|
Try to deploy the same VM twice without regenerating token.
|
||||||
|
|
||||||
|
❌ **Expected**: Second bootstrap fails with "token already used" message
|
||||||
|
|
||||||
|
## Cleanup
|
||||||
|
|
||||||
|
After testing:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Destroy test VM
|
||||||
|
cd terraform
|
||||||
|
tofu destroy -target=proxmox_vm_qemu.vm[\"vaulttest01\"]
|
||||||
|
|
||||||
|
# Remove test secrets from Vault
|
||||||
|
vault kv delete secret/hosts/vaulttest01/test-service
|
||||||
|
|
||||||
|
# Remove host configuration (optional)
|
||||||
|
git rm -r hosts/vaulttest01
|
||||||
|
# Edit flake.nix to remove nixosConfigurations.vaulttest01
|
||||||
|
# Edit terraform/vms.tf to remove vaulttest01
|
||||||
|
# Edit terraform/vault/hosts-generated.tf to remove vaulttest01
|
||||||
|
```
|
||||||
|
|
||||||
|
## Success Criteria Checklist
|
||||||
|
|
||||||
|
Phase 4d is considered successful when:
|
||||||
|
|
||||||
|
- [x] create-host generates Vault configuration automatically
|
||||||
|
- [x] New hosts receive AppRole credentials via cloud-init
|
||||||
|
- [x] Bootstrap stores credentials in /var/lib/vault/approle/
|
||||||
|
- [x] Services can fetch secrets using vault.secrets option
|
||||||
|
- [x] Secrets extracted to individual files in /run/secrets/
|
||||||
|
- [x] Cached secrets work when Vault is unreachable
|
||||||
|
- [x] Periodic restart timers work for secret rotation
|
||||||
|
- [x] Critical services excluded from auto-restart
|
||||||
|
- [x] Test host deploys and verifies working
|
||||||
|
- [x] sops-nix continues to work for existing services
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### Bootstrap fails with "Failed to unwrap Vault token"
|
||||||
|
|
||||||
|
**Possible causes:**
|
||||||
|
- Token already used (wrapped tokens are single-use)
|
||||||
|
- Token expired (24h TTL)
|
||||||
|
- Invalid token
|
||||||
|
- Vault unreachable
|
||||||
|
|
||||||
|
**Solution:**
|
||||||
|
```bash
|
||||||
|
# Regenerate token
|
||||||
|
nix run .#create-host -- --hostname vaulttest01 --force
|
||||||
|
cd terraform && tofu apply
|
||||||
|
```
|
||||||
|
|
||||||
|
### Secret fetch fails with authentication error
|
||||||
|
|
||||||
|
**Check:**
|
||||||
|
```bash
|
||||||
|
# Verify AppRole exists
|
||||||
|
vault read auth/approle/role/vaulttest01
|
||||||
|
|
||||||
|
# Verify policy exists
|
||||||
|
vault policy read host-vaulttest01
|
||||||
|
|
||||||
|
# Test authentication manually
|
||||||
|
ROLE_ID=$(cat /var/lib/vault/approle/role-id)
|
||||||
|
SECRET_ID=$(cat /var/lib/vault/approle/secret-id)
|
||||||
|
vault write auth/approle/login role_id="$ROLE_ID" secret_id="$SECRET_ID"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Cache not working
|
||||||
|
|
||||||
|
**Check:**
|
||||||
|
```bash
|
||||||
|
# Verify cache directory exists and has files
|
||||||
|
ls -la /var/lib/vault/cache/test-service/
|
||||||
|
|
||||||
|
# Check permissions
|
||||||
|
stat /var/lib/vault/cache/test-service/password
|
||||||
|
# Should be 600 (rw-------)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Next Steps
|
||||||
|
|
||||||
|
After successful testing:
|
||||||
|
|
||||||
|
1. Gradually migrate existing services from sops-nix to Vault
|
||||||
|
2. Consider implementing secret watcher for faster rotation (future enhancement)
|
||||||
|
3. Phase 4c: Migrate from step-ca to OpenBao PKI
|
||||||
|
4. Eventually deprecate and remove sops-nix
|
||||||
17
flake.nix
17
flake.nix
@@ -366,11 +366,28 @@
|
|||||||
sops-nix.nixosModules.sops
|
sops-nix.nixosModules.sops
|
||||||
];
|
];
|
||||||
};
|
};
|
||||||
|
vaulttest01 = nixpkgs.lib.nixosSystem {
|
||||||
|
inherit system;
|
||||||
|
specialArgs = {
|
||||||
|
inherit inputs self sops-nix;
|
||||||
|
};
|
||||||
|
modules = [
|
||||||
|
(
|
||||||
|
{ config, pkgs, ... }:
|
||||||
|
{
|
||||||
|
nixpkgs.overlays = commonOverlays;
|
||||||
|
}
|
||||||
|
)
|
||||||
|
./hosts/vaulttest01
|
||||||
|
sops-nix.nixosModules.sops
|
||||||
|
];
|
||||||
|
};
|
||||||
};
|
};
|
||||||
packages = forAllSystems (
|
packages = forAllSystems (
|
||||||
{ pkgs }:
|
{ pkgs }:
|
||||||
{
|
{
|
||||||
create-host = pkgs.callPackage ./scripts/create-host { };
|
create-host = pkgs.callPackage ./scripts/create-host { };
|
||||||
|
vault-fetch = pkgs.callPackage ./scripts/vault-fetch { };
|
||||||
}
|
}
|
||||||
);
|
);
|
||||||
devShells = forAllSystems (
|
devShells = forAllSystems (
|
||||||
|
|||||||
@@ -22,6 +22,53 @@ let
|
|||||||
fi
|
fi
|
||||||
|
|
||||||
echo "Network connectivity confirmed"
|
echo "Network connectivity confirmed"
|
||||||
|
|
||||||
|
# Unwrap Vault token and store AppRole credentials (if provided)
|
||||||
|
if [ -n "''${VAULT_WRAPPED_TOKEN:-}" ]; then
|
||||||
|
echo "Unwrapping Vault token to get AppRole credentials..."
|
||||||
|
|
||||||
|
VAULT_ADDR="''${VAULT_ADDR:-https://vault01.home.2rjus.net:8200}"
|
||||||
|
|
||||||
|
# Unwrap the token to get role_id and secret_id
|
||||||
|
UNWRAP_RESPONSE=$(curl -sk -X POST \
|
||||||
|
-H "X-Vault-Token: $VAULT_WRAPPED_TOKEN" \
|
||||||
|
"$VAULT_ADDR/v1/sys/wrapping/unwrap") || {
|
||||||
|
echo "WARNING: Failed to unwrap Vault token (network error)"
|
||||||
|
echo "Vault secrets will not be available, but continuing bootstrap..."
|
||||||
|
}
|
||||||
|
|
||||||
|
# Check if unwrap was successful
|
||||||
|
if [ -n "$UNWRAP_RESPONSE" ] && echo "$UNWRAP_RESPONSE" | jq -e '.data' >/dev/null 2>&1; then
|
||||||
|
ROLE_ID=$(echo "$UNWRAP_RESPONSE" | jq -r '.data.role_id')
|
||||||
|
SECRET_ID=$(echo "$UNWRAP_RESPONSE" | jq -r '.data.secret_id')
|
||||||
|
|
||||||
|
# Store credentials
|
||||||
|
mkdir -p /var/lib/vault/approle
|
||||||
|
echo "$ROLE_ID" > /var/lib/vault/approle/role-id
|
||||||
|
echo "$SECRET_ID" > /var/lib/vault/approle/secret-id
|
||||||
|
chmod 600 /var/lib/vault/approle/role-id
|
||||||
|
chmod 600 /var/lib/vault/approle/secret-id
|
||||||
|
|
||||||
|
echo "Vault credentials unwrapped and stored successfully"
|
||||||
|
else
|
||||||
|
echo "WARNING: Failed to unwrap Vault token"
|
||||||
|
if [ -n "$UNWRAP_RESPONSE" ]; then
|
||||||
|
echo "Response: $UNWRAP_RESPONSE"
|
||||||
|
fi
|
||||||
|
echo "Possible causes:"
|
||||||
|
echo " - Token already used (wrapped tokens are single-use)"
|
||||||
|
echo " - Token expired (24h TTL)"
|
||||||
|
echo " - Invalid token"
|
||||||
|
echo ""
|
||||||
|
echo "To regenerate token, run: create-host --hostname $HOSTNAME --force"
|
||||||
|
echo ""
|
||||||
|
echo "Vault secrets will not be available, but continuing bootstrap..."
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
echo "No Vault wrapped token provided (VAULT_WRAPPED_TOKEN not set)"
|
||||||
|
echo "Skipping Vault credential setup"
|
||||||
|
fi
|
||||||
|
|
||||||
echo "Fetching and building NixOS configuration from flake..."
|
echo "Fetching and building NixOS configuration from flake..."
|
||||||
|
|
||||||
# Read git branch from environment, default to master
|
# Read git branch from environment, default to master
|
||||||
@@ -62,8 +109,8 @@ in
|
|||||||
RemainAfterExit = true;
|
RemainAfterExit = true;
|
||||||
ExecStart = "${bootstrap-script}/bin/nixos-bootstrap";
|
ExecStart = "${bootstrap-script}/bin/nixos-bootstrap";
|
||||||
|
|
||||||
# Read environment variables from /etc/environment (set by cloud-init)
|
# Read environment variables from cloud-init (set by cloud-init write_files)
|
||||||
EnvironmentFile = "-/etc/environment";
|
EnvironmentFile = "-/run/cloud-init-env";
|
||||||
|
|
||||||
# Logging to journald
|
# Logging to journald
|
||||||
StandardOutput = "journal+console";
|
StandardOutput = "journal+console";
|
||||||
|
|||||||
110
hosts/vaulttest01/configuration.nix
Normal file
110
hosts/vaulttest01/configuration.nix
Normal file
@@ -0,0 +1,110 @@
|
|||||||
|
{
|
||||||
|
config,
|
||||||
|
lib,
|
||||||
|
pkgs,
|
||||||
|
...
|
||||||
|
}:
|
||||||
|
|
||||||
|
{
|
||||||
|
imports = [
|
||||||
|
../template2/hardware-configuration.nix
|
||||||
|
|
||||||
|
../../system
|
||||||
|
../../common/vm
|
||||||
|
];
|
||||||
|
|
||||||
|
nixpkgs.config.allowUnfree = true;
|
||||||
|
boot.loader.grub.enable = true;
|
||||||
|
boot.loader.grub.device = "/dev/vda";
|
||||||
|
|
||||||
|
networking.hostName = "vaulttest01";
|
||||||
|
networking.domain = "home.2rjus.net";
|
||||||
|
networking.useNetworkd = true;
|
||||||
|
networking.useDHCP = false;
|
||||||
|
services.resolved.enable = true;
|
||||||
|
networking.nameservers = [
|
||||||
|
"10.69.13.5"
|
||||||
|
"10.69.13.6"
|
||||||
|
];
|
||||||
|
|
||||||
|
systemd.network.enable = true;
|
||||||
|
systemd.network.networks."ens18" = {
|
||||||
|
matchConfig.Name = "ens18";
|
||||||
|
address = [
|
||||||
|
"10.69.13.150/24"
|
||||||
|
];
|
||||||
|
routes = [
|
||||||
|
{ Gateway = "10.69.13.1"; }
|
||||||
|
];
|
||||||
|
linkConfig.RequiredForOnline = "routable";
|
||||||
|
};
|
||||||
|
time.timeZone = "Europe/Oslo";
|
||||||
|
|
||||||
|
nix.settings.experimental-features = [
|
||||||
|
"nix-command"
|
||||||
|
"flakes"
|
||||||
|
];
|
||||||
|
nix.settings.tarball-ttl = 0;
|
||||||
|
environment.systemPackages = with pkgs; [
|
||||||
|
vim
|
||||||
|
wget
|
||||||
|
git
|
||||||
|
];
|
||||||
|
|
||||||
|
# Open ports in the firewall.
|
||||||
|
# networking.firewall.allowedTCPPorts = [ ... ];
|
||||||
|
# networking.firewall.allowedUDPPorts = [ ... ];
|
||||||
|
# Or disable the firewall altogether.
|
||||||
|
networking.firewall.enable = false;
|
||||||
|
|
||||||
|
# Testing config
|
||||||
|
# Enable Vault secrets management
|
||||||
|
vault.enable = true;
|
||||||
|
|
||||||
|
# Define a test secret
|
||||||
|
vault.secrets.test-service = {
|
||||||
|
secretPath = "hosts/vaulttest01/test-service";
|
||||||
|
restartTrigger = true;
|
||||||
|
restartInterval = "daily";
|
||||||
|
services = [ "vault-test" ];
|
||||||
|
};
|
||||||
|
|
||||||
|
# Create a test service that uses the secret
|
||||||
|
systemd.services.vault-test = {
|
||||||
|
description = "Test Vault secret fetching";
|
||||||
|
wantedBy = [ "multi-user.target" ];
|
||||||
|
after = [ "vault-secret-test-service.service" ];
|
||||||
|
|
||||||
|
serviceConfig = {
|
||||||
|
Type = "oneshot";
|
||||||
|
RemainAfterExit = true;
|
||||||
|
|
||||||
|
ExecStart = pkgs.writeShellScript "vault-test" ''
|
||||||
|
echo "=== Vault Secret Test ==="
|
||||||
|
echo "Secret path: hosts/vaulttest01/test-service"
|
||||||
|
|
||||||
|
if [ -f /run/secrets/test-service/password ]; then
|
||||||
|
echo "✓ Password file exists"
|
||||||
|
echo "Password length: $(wc -c < /run/secrets/test-service/password)"
|
||||||
|
else
|
||||||
|
echo "✗ Password file missing!"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ -d /var/lib/vault/cache/test-service ]; then
|
||||||
|
echo "✓ Cache directory exists"
|
||||||
|
else
|
||||||
|
echo "✗ Cache directory missing!"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "Test successful!"
|
||||||
|
'';
|
||||||
|
|
||||||
|
StandardOutput = "journal+console";
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
system.stateVersion = "25.11"; # Did you read the comment?
|
||||||
|
}
|
||||||
|
|
||||||
5
hosts/vaulttest01/default.nix
Normal file
5
hosts/vaulttest01/default.nix
Normal file
@@ -0,0 +1,5 @@
|
|||||||
|
{ ... }: {
|
||||||
|
imports = [
|
||||||
|
./configuration.nix
|
||||||
|
];
|
||||||
|
}
|
||||||
@@ -9,9 +9,10 @@ from rich.console import Console
|
|||||||
from rich.panel import Panel
|
from rich.panel import Panel
|
||||||
from rich.table import Table
|
from rich.table import Table
|
||||||
|
|
||||||
from generators import generate_host_files
|
from generators import generate_host_files, generate_vault_terraform
|
||||||
from manipulators import update_flake_nix, update_terraform_vms
|
from manipulators import update_flake_nix, update_terraform_vms, add_wrapped_token_to_vm
|
||||||
from models import HostConfig
|
from models import HostConfig
|
||||||
|
from vault_helper import generate_wrapped_token
|
||||||
from validators import (
|
from validators import (
|
||||||
validate_hostname_format,
|
validate_hostname_format,
|
||||||
validate_hostname_unique,
|
validate_hostname_unique,
|
||||||
@@ -46,6 +47,8 @@ def main(
|
|||||||
disk: str = typer.Option("20G", "--disk", help="Disk size (e.g., 20G, 50G, 100G)"),
|
disk: str = typer.Option("20G", "--disk", help="Disk size (e.g., 20G, 50G, 100G)"),
|
||||||
dry_run: bool = typer.Option(False, "--dry-run", help="Preview changes without creating files"),
|
dry_run: bool = typer.Option(False, "--dry-run", help="Preview changes without creating files"),
|
||||||
force: bool = typer.Option(False, "--force", help="Overwrite existing host configuration"),
|
force: bool = typer.Option(False, "--force", help="Overwrite existing host configuration"),
|
||||||
|
skip_vault: bool = typer.Option(False, "--skip-vault", help="Skip Vault configuration and token generation"),
|
||||||
|
regenerate_token: bool = typer.Option(False, "--regenerate-token", help="Only regenerate Vault wrapped token (no other changes)"),
|
||||||
) -> None:
|
) -> None:
|
||||||
"""
|
"""
|
||||||
Create a new NixOS host configuration.
|
Create a new NixOS host configuration.
|
||||||
@@ -58,6 +61,51 @@ def main(
|
|||||||
ctx.get_help()
|
ctx.get_help()
|
||||||
sys.exit(1)
|
sys.exit(1)
|
||||||
|
|
||||||
|
# Get repository root
|
||||||
|
repo_root = get_repo_root()
|
||||||
|
|
||||||
|
# Handle token regeneration mode
|
||||||
|
if regenerate_token:
|
||||||
|
# Validate that incompatible options aren't used
|
||||||
|
if force or dry_run or skip_vault:
|
||||||
|
console.print("[bold red]Error:[/bold red] --regenerate-token cannot be used with --force, --dry-run, or --skip-vault\n")
|
||||||
|
sys.exit(1)
|
||||||
|
if ip or cpu != 2 or memory != 2048 or disk != "20G":
|
||||||
|
console.print("[bold red]Error:[/bold red] --regenerate-token only regenerates the token. Other options (--ip, --cpu, --memory, --disk) are ignored.\n")
|
||||||
|
console.print("[yellow]Tip:[/yellow] Use without those options, or use --force to update the entire configuration.\n")
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
try:
|
||||||
|
console.print(f"\n[bold blue]Regenerating Vault token for {hostname}...[/bold blue]")
|
||||||
|
|
||||||
|
# Validate hostname exists
|
||||||
|
host_dir = repo_root / "hosts" / hostname
|
||||||
|
if not host_dir.exists():
|
||||||
|
console.print(f"[bold red]Error:[/bold red] Host {hostname} does not exist")
|
||||||
|
console.print(f"Host directory not found: {host_dir}")
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
# Generate new wrapped token
|
||||||
|
wrapped_token = generate_wrapped_token(hostname, repo_root)
|
||||||
|
|
||||||
|
# Update only the wrapped token in vms.tf
|
||||||
|
add_wrapped_token_to_vm(hostname, wrapped_token, repo_root)
|
||||||
|
console.print("[green]✓[/green] Regenerated and updated wrapped token in terraform/vms.tf")
|
||||||
|
|
||||||
|
console.print("\n[bold green]✓ Token regenerated successfully![/bold green]")
|
||||||
|
console.print(f"\n[yellow]⚠️[/yellow] Token expires in 24 hours")
|
||||||
|
console.print(f"[yellow]⚠️[/yellow] Deploy the VM within 24h or regenerate token again\n")
|
||||||
|
|
||||||
|
console.print("[bold cyan]Next steps:[/bold cyan]")
|
||||||
|
console.print(f" cd terraform && tofu apply")
|
||||||
|
console.print(f" # Then redeploy VM to pick up new token\n")
|
||||||
|
|
||||||
|
return
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
console.print(f"\n[bold red]Error regenerating token:[/bold red] {e}\n")
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
try:
|
try:
|
||||||
# Build configuration
|
# Build configuration
|
||||||
config = HostConfig(
|
config = HostConfig(
|
||||||
@@ -68,9 +116,6 @@ def main(
|
|||||||
disk=disk,
|
disk=disk,
|
||||||
)
|
)
|
||||||
|
|
||||||
# Get repository root
|
|
||||||
repo_root = get_repo_root()
|
|
||||||
|
|
||||||
# Validate configuration
|
# Validate configuration
|
||||||
console.print("\n[bold blue]Validating configuration...[/bold blue]")
|
console.print("\n[bold blue]Validating configuration...[/bold blue]")
|
||||||
|
|
||||||
@@ -116,11 +161,34 @@ def main(
|
|||||||
update_terraform_vms(config, repo_root, force=force)
|
update_terraform_vms(config, repo_root, force=force)
|
||||||
console.print("[green]✓[/green] Updated terraform/vms.tf")
|
console.print("[green]✓[/green] Updated terraform/vms.tf")
|
||||||
|
|
||||||
|
# Generate Vault configuration if not skipped
|
||||||
|
if not skip_vault:
|
||||||
|
console.print("\n[bold blue]Configuring Vault integration...[/bold blue]")
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Generate Vault Terraform configuration
|
||||||
|
generate_vault_terraform(hostname, repo_root)
|
||||||
|
console.print("[green]✓[/green] Updated terraform/vault/hosts-generated.tf")
|
||||||
|
|
||||||
|
# Generate wrapped token
|
||||||
|
wrapped_token = generate_wrapped_token(hostname, repo_root)
|
||||||
|
|
||||||
|
# Add wrapped token to VM configuration
|
||||||
|
add_wrapped_token_to_vm(hostname, wrapped_token, repo_root)
|
||||||
|
console.print("[green]✓[/green] Added wrapped token to terraform/vms.tf")
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
console.print(f"\n[yellow]⚠️ Vault configuration failed: {e}[/yellow]")
|
||||||
|
console.print("[yellow]Host configuration created without Vault integration[/yellow]")
|
||||||
|
console.print("[yellow]You can add Vault support later by re-running with --force[/yellow]\n")
|
||||||
|
else:
|
||||||
|
console.print("\n[yellow]Skipped Vault configuration (--skip-vault)[/yellow]")
|
||||||
|
|
||||||
# Success message
|
# Success message
|
||||||
console.print("\n[bold green]✓ Host configuration generated successfully![/bold green]\n")
|
console.print("\n[bold green]✓ Host configuration generated successfully![/bold green]\n")
|
||||||
|
|
||||||
# Display next steps
|
# Display next steps
|
||||||
display_next_steps(hostname)
|
display_next_steps(hostname, skip_vault=skip_vault)
|
||||||
|
|
||||||
except ValueError as e:
|
except ValueError as e:
|
||||||
console.print(f"\n[bold red]Error:[/bold red] {e}\n", style="red")
|
console.print(f"\n[bold red]Error:[/bold red] {e}\n", style="red")
|
||||||
@@ -164,8 +232,18 @@ def display_dry_run_summary(config: HostConfig, repo_root: Path) -> None:
|
|||||||
console.print(f" • {repo_root}/terraform/vms.tf (add VM definition)")
|
console.print(f" • {repo_root}/terraform/vms.tf (add VM definition)")
|
||||||
|
|
||||||
|
|
||||||
def display_next_steps(hostname: str) -> None:
|
def display_next_steps(hostname: str, skip_vault: bool = False) -> None:
|
||||||
"""Display next steps after successful generation."""
|
"""Display next steps after successful generation."""
|
||||||
|
vault_files = "" if skip_vault else " terraform/vault/hosts-generated.tf"
|
||||||
|
vault_apply = ""
|
||||||
|
|
||||||
|
if not skip_vault:
|
||||||
|
vault_apply = """
|
||||||
|
4a. Apply Vault configuration:
|
||||||
|
[white]cd terraform/vault
|
||||||
|
tofu apply[/white]
|
||||||
|
"""
|
||||||
|
|
||||||
next_steps = f"""[bold cyan]Next Steps:[/bold cyan]
|
next_steps = f"""[bold cyan]Next Steps:[/bold cyan]
|
||||||
|
|
||||||
1. Review changes:
|
1. Review changes:
|
||||||
@@ -181,14 +259,16 @@ def display_next_steps(hostname: str) -> None:
|
|||||||
tofu plan[/white]
|
tofu plan[/white]
|
||||||
|
|
||||||
4. Commit changes:
|
4. Commit changes:
|
||||||
[white]git add hosts/{hostname} flake.nix terraform/vms.tf
|
[white]git add hosts/{hostname} flake.nix terraform/vms.tf{vault_files}
|
||||||
git commit -m "hosts: add {hostname} configuration"[/white]
|
git commit -m "hosts: add {hostname} configuration"[/white]
|
||||||
|
{vault_apply}
|
||||||
5. Deploy VM (after merging to master):
|
5. Deploy VM (after merging to master or within 24h of token generation):
|
||||||
[white]cd terraform
|
[white]cd terraform
|
||||||
tofu apply[/white]
|
tofu apply[/white]
|
||||||
|
|
||||||
6. Bootstrap the host (see Phase 3 of deployment pipeline)
|
6. Host will bootstrap automatically on first boot
|
||||||
|
- Wrapped token expires in 24 hours
|
||||||
|
- If expired, re-run: create-host --hostname {hostname} --force
|
||||||
"""
|
"""
|
||||||
console.print(Panel(next_steps, border_style="cyan"))
|
console.print(Panel(next_steps, border_style="cyan"))
|
||||||
|
|
||||||
|
|||||||
@@ -19,6 +19,7 @@ python3Packages.buildPythonApplication {
|
|||||||
typer
|
typer
|
||||||
jinja2
|
jinja2
|
||||||
rich
|
rich
|
||||||
|
hvac # Python Vault/OpenBao client library
|
||||||
];
|
];
|
||||||
|
|
||||||
# Install templates to share directory
|
# Install templates to share directory
|
||||||
|
|||||||
@@ -86,3 +86,114 @@ def generate_host_files(config: HostConfig, repo_root: Path) -> None:
|
|||||||
state_version=config.state_version,
|
state_version=config.state_version,
|
||||||
)
|
)
|
||||||
(host_dir / "configuration.nix").write_text(config_content)
|
(host_dir / "configuration.nix").write_text(config_content)
|
||||||
|
|
||||||
|
|
||||||
|
def generate_vault_terraform(hostname: str, repo_root: Path) -> None:
|
||||||
|
"""
|
||||||
|
Generate or update Vault Terraform configuration for a new host.
|
||||||
|
|
||||||
|
Creates/updates terraform/vault/hosts-generated.tf with:
|
||||||
|
- Host policy granting access to hosts/<hostname>/* secrets
|
||||||
|
- AppRole configuration for the host
|
||||||
|
- Placeholder secret entry (user adds actual secrets separately)
|
||||||
|
|
||||||
|
Args:
|
||||||
|
hostname: Hostname for the new host
|
||||||
|
repo_root: Path to repository root
|
||||||
|
"""
|
||||||
|
vault_tf_path = repo_root / "terraform" / "vault" / "hosts-generated.tf"
|
||||||
|
|
||||||
|
# Read existing file if it exists, otherwise start with empty structure
|
||||||
|
if vault_tf_path.exists():
|
||||||
|
content = vault_tf_path.read_text()
|
||||||
|
else:
|
||||||
|
# Create initial file structure
|
||||||
|
content = """# WARNING: Auto-generated by create-host tool
|
||||||
|
# Manual edits will be overwritten when create-host is run
|
||||||
|
|
||||||
|
# Generated host policies
|
||||||
|
# Each host gets access to its own secrets under hosts/<hostname>/*
|
||||||
|
locals {
|
||||||
|
generated_host_policies = {
|
||||||
|
}
|
||||||
|
|
||||||
|
# Placeholder secrets - user should add actual secrets manually or via tofu
|
||||||
|
generated_secrets = {
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
# Create policies for generated hosts
|
||||||
|
resource "vault_policy" "generated_host_policies" {
|
||||||
|
for_each = local.generated_host_policies
|
||||||
|
|
||||||
|
name = "host-\${each.key}"
|
||||||
|
|
||||||
|
policy = <<-EOT
|
||||||
|
# Allow host to read its own secrets
|
||||||
|
%{for path in each.value.paths~}
|
||||||
|
path "${path}" {
|
||||||
|
capabilities = ["read", "list"]
|
||||||
|
}
|
||||||
|
%{endfor~}
|
||||||
|
EOT
|
||||||
|
}
|
||||||
|
|
||||||
|
# Create AppRoles for generated hosts
|
||||||
|
resource "vault_approle_auth_backend_role" "generated_hosts" {
|
||||||
|
for_each = local.generated_host_policies
|
||||||
|
|
||||||
|
backend = vault_auth_backend.approle.path
|
||||||
|
role_name = each.key
|
||||||
|
token_policies = ["host-\${each.key}"]
|
||||||
|
secret_id_ttl = 0 # Never expire (wrapped tokens provide time limit)
|
||||||
|
token_ttl = 3600
|
||||||
|
token_max_ttl = 3600
|
||||||
|
secret_id_num_uses = 0 # Unlimited uses
|
||||||
|
}
|
||||||
|
"""
|
||||||
|
|
||||||
|
# Parse existing policies from the file
|
||||||
|
import re
|
||||||
|
|
||||||
|
policies_match = re.search(
|
||||||
|
r'generated_host_policies = \{(.*?)\n \}',
|
||||||
|
content,
|
||||||
|
re.DOTALL
|
||||||
|
)
|
||||||
|
|
||||||
|
if policies_match:
|
||||||
|
policies_content = policies_match.group(1)
|
||||||
|
else:
|
||||||
|
policies_content = ""
|
||||||
|
|
||||||
|
# Check if hostname already exists
|
||||||
|
if f'"{hostname}"' in policies_content:
|
||||||
|
# Already exists, don't duplicate
|
||||||
|
return
|
||||||
|
|
||||||
|
# Add new policy entry
|
||||||
|
new_policy = f'''
|
||||||
|
"{hostname}" = {{
|
||||||
|
paths = [
|
||||||
|
"secret/data/hosts/{hostname}/*",
|
||||||
|
]
|
||||||
|
}}'''
|
||||||
|
|
||||||
|
# Insert before the closing brace
|
||||||
|
if policies_content.strip():
|
||||||
|
# There are existing entries, add after them
|
||||||
|
new_policies_content = policies_content.rstrip() + new_policy + "\n "
|
||||||
|
else:
|
||||||
|
# First entry
|
||||||
|
new_policies_content = new_policy + "\n "
|
||||||
|
|
||||||
|
# Replace the policies map
|
||||||
|
new_content = re.sub(
|
||||||
|
r'(generated_host_policies = \{)(.*?)(\n \})',
|
||||||
|
rf'\1{new_policies_content}\3',
|
||||||
|
content,
|
||||||
|
flags=re.DOTALL
|
||||||
|
)
|
||||||
|
|
||||||
|
# Write the updated file
|
||||||
|
vault_tf_path.write_text(new_content)
|
||||||
|
|||||||
@@ -122,3 +122,63 @@ def update_terraform_vms(config: HostConfig, repo_root: Path, force: bool = Fals
|
|||||||
)
|
)
|
||||||
|
|
||||||
terraform_path.write_text(new_content)
|
terraform_path.write_text(new_content)
|
||||||
|
|
||||||
|
|
||||||
|
def add_wrapped_token_to_vm(hostname: str, wrapped_token: str, repo_root: Path) -> None:
|
||||||
|
"""
|
||||||
|
Add or update the vault_wrapped_token field in an existing VM entry.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
hostname: Hostname of the VM
|
||||||
|
wrapped_token: The wrapped token to add
|
||||||
|
repo_root: Path to repository root
|
||||||
|
"""
|
||||||
|
terraform_path = repo_root / "terraform" / "vms.tf"
|
||||||
|
content = terraform_path.read_text()
|
||||||
|
|
||||||
|
# Find the VM entry
|
||||||
|
hostname_pattern = rf'^\s+"{re.escape(hostname)}" = \{{'
|
||||||
|
match = re.search(hostname_pattern, content, re.MULTILINE)
|
||||||
|
|
||||||
|
if not match:
|
||||||
|
raise ValueError(f"Could not find VM entry for {hostname} in terraform/vms.tf")
|
||||||
|
|
||||||
|
# Find the full VM block
|
||||||
|
block_pattern = rf'(^\s+"{re.escape(hostname)}" = \{{)(.*?)(^\s+\}})'
|
||||||
|
block_match = re.search(block_pattern, content, re.MULTILINE | re.DOTALL)
|
||||||
|
|
||||||
|
if not block_match:
|
||||||
|
raise ValueError(f"Could not parse VM block for {hostname}")
|
||||||
|
|
||||||
|
block_start = block_match.group(1)
|
||||||
|
block_content = block_match.group(2)
|
||||||
|
block_end = block_match.group(3)
|
||||||
|
|
||||||
|
# Check if vault_wrapped_token already exists
|
||||||
|
if "vault_wrapped_token" in block_content:
|
||||||
|
# Update existing token
|
||||||
|
block_content = re.sub(
|
||||||
|
r'vault_wrapped_token\s*=\s*"[^"]*"',
|
||||||
|
f'vault_wrapped_token = "{wrapped_token}"',
|
||||||
|
block_content
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
# Add new token field (add before closing brace)
|
||||||
|
# Find the last field and add after it
|
||||||
|
block_content = block_content.rstrip()
|
||||||
|
if block_content and not block_content.endswith("\n"):
|
||||||
|
block_content += "\n"
|
||||||
|
block_content += f' vault_wrapped_token = "{wrapped_token}"\n'
|
||||||
|
|
||||||
|
# Reconstruct the block
|
||||||
|
new_block = block_start + block_content + block_end
|
||||||
|
|
||||||
|
# Replace in content
|
||||||
|
new_content = re.sub(
|
||||||
|
rf'^\s+"{re.escape(hostname)}" = \{{.*?^\s+\}}',
|
||||||
|
new_block,
|
||||||
|
content,
|
||||||
|
flags=re.MULTILINE | re.DOTALL
|
||||||
|
)
|
||||||
|
|
||||||
|
terraform_path.write_text(new_content)
|
||||||
|
|||||||
@@ -14,6 +14,7 @@ setup(
|
|||||||
"validators",
|
"validators",
|
||||||
"generators",
|
"generators",
|
||||||
"manipulators",
|
"manipulators",
|
||||||
|
"vault_helper",
|
||||||
],
|
],
|
||||||
include_package_data=True,
|
include_package_data=True,
|
||||||
data_files=[
|
data_files=[
|
||||||
@@ -23,6 +24,7 @@ setup(
|
|||||||
"typer",
|
"typer",
|
||||||
"jinja2",
|
"jinja2",
|
||||||
"rich",
|
"rich",
|
||||||
|
"hvac",
|
||||||
],
|
],
|
||||||
entry_points={
|
entry_points={
|
||||||
"console_scripts": [
|
"console_scripts": [
|
||||||
|
|||||||
178
scripts/create-host/vault_helper.py
Normal file
178
scripts/create-host/vault_helper.py
Normal file
@@ -0,0 +1,178 @@
|
|||||||
|
"""Helper functions for Vault/OpenBao API interactions."""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import subprocess
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
import hvac
|
||||||
|
import typer
|
||||||
|
|
||||||
|
|
||||||
|
def get_vault_client(vault_addr: Optional[str] = None, vault_token: Optional[str] = None) -> hvac.Client:
|
||||||
|
"""
|
||||||
|
Get a Vault client instance.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
vault_addr: Vault server address (defaults to BAO_ADDR env var or hardcoded default)
|
||||||
|
vault_token: Vault token (defaults to BAO_TOKEN env var or prompts user)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Configured hvac.Client instance
|
||||||
|
|
||||||
|
Raises:
|
||||||
|
typer.Exit: If unable to create client or authenticate
|
||||||
|
"""
|
||||||
|
# Get Vault address
|
||||||
|
if vault_addr is None:
|
||||||
|
vault_addr = os.getenv("BAO_ADDR", "https://vault01.home.2rjus.net:8200")
|
||||||
|
|
||||||
|
# Get Vault token
|
||||||
|
if vault_token is None:
|
||||||
|
vault_token = os.getenv("BAO_TOKEN")
|
||||||
|
|
||||||
|
if not vault_token:
|
||||||
|
typer.echo("\n⚠️ Vault token required. Set BAO_TOKEN environment variable or enter it below.")
|
||||||
|
vault_token = typer.prompt("Vault token (BAO_TOKEN)", hide_input=True)
|
||||||
|
|
||||||
|
# Create client
|
||||||
|
try:
|
||||||
|
client = hvac.Client(url=vault_addr, token=vault_token, verify=False)
|
||||||
|
|
||||||
|
# Verify authentication
|
||||||
|
if not client.is_authenticated():
|
||||||
|
typer.echo(f"\n❌ Failed to authenticate to Vault at {vault_addr}", err=True)
|
||||||
|
typer.echo("Check your BAO_TOKEN and ensure Vault is accessible", err=True)
|
||||||
|
raise typer.Exit(code=1)
|
||||||
|
|
||||||
|
return client
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
typer.echo(f"\n❌ Error connecting to Vault: {e}", err=True)
|
||||||
|
raise typer.Exit(code=1)
|
||||||
|
|
||||||
|
|
||||||
|
def generate_wrapped_token(hostname: str, repo_root: Path) -> str:
|
||||||
|
"""
|
||||||
|
Generate a wrapped token containing AppRole credentials for a host.
|
||||||
|
|
||||||
|
This function:
|
||||||
|
1. Applies Terraform to ensure the AppRole exists
|
||||||
|
2. Reads the role_id for the host
|
||||||
|
3. Generates a secret_id
|
||||||
|
4. Wraps both credentials in a cubbyhole token (24h TTL, single-use)
|
||||||
|
|
||||||
|
Args:
|
||||||
|
hostname: The host to generate credentials for
|
||||||
|
repo_root: Path to repository root (for running terraform)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Wrapped token string (hvs.CAES...)
|
||||||
|
|
||||||
|
Raises:
|
||||||
|
typer.Exit: If Terraform fails or Vault operations fail
|
||||||
|
"""
|
||||||
|
from rich.console import Console
|
||||||
|
|
||||||
|
console = Console()
|
||||||
|
|
||||||
|
# Get Vault client
|
||||||
|
client = get_vault_client()
|
||||||
|
|
||||||
|
# First, apply Terraform to ensure AppRole exists
|
||||||
|
console.print(f"\n[bold blue]Applying Vault Terraform configuration...[/bold blue]")
|
||||||
|
terraform_dir = repo_root / "terraform" / "vault"
|
||||||
|
|
||||||
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
|
["tofu", "apply", "-auto-approve"],
|
||||||
|
cwd=terraform_dir,
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
check=False,
|
||||||
|
)
|
||||||
|
|
||||||
|
if result.returncode != 0:
|
||||||
|
console.print(f"[red]❌ Terraform apply failed:[/red]")
|
||||||
|
console.print(result.stderr)
|
||||||
|
raise typer.Exit(code=1)
|
||||||
|
|
||||||
|
console.print("[green]✓[/green] Terraform applied successfully")
|
||||||
|
|
||||||
|
except FileNotFoundError:
|
||||||
|
console.print(f"[red]❌ Error: 'tofu' command not found[/red]")
|
||||||
|
console.print("Ensure OpenTofu is installed and in PATH")
|
||||||
|
raise typer.Exit(code=1)
|
||||||
|
|
||||||
|
# Read role_id
|
||||||
|
try:
|
||||||
|
console.print(f"[bold blue]Reading AppRole credentials for {hostname}...[/bold blue]")
|
||||||
|
role_id_response = client.read(f"auth/approle/role/{hostname}/role-id")
|
||||||
|
role_id = role_id_response["data"]["role_id"]
|
||||||
|
console.print(f"[green]✓[/green] Retrieved role_id")
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
console.print(f"[red]❌ Failed to read role_id for {hostname}:[/red] {e}")
|
||||||
|
console.print(f"\nEnsure the AppRole '{hostname}' exists in Vault")
|
||||||
|
raise typer.Exit(code=1)
|
||||||
|
|
||||||
|
# Generate secret_id
|
||||||
|
try:
|
||||||
|
secret_id_response = client.write(f"auth/approle/role/{hostname}/secret-id")
|
||||||
|
secret_id = secret_id_response["data"]["secret_id"]
|
||||||
|
console.print(f"[green]✓[/green] Generated secret_id")
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
console.print(f"[red]❌ Failed to generate secret_id:[/red] {e}")
|
||||||
|
raise typer.Exit(code=1)
|
||||||
|
|
||||||
|
# Wrap the credentials in a cubbyhole token
|
||||||
|
try:
|
||||||
|
console.print(f"[bold blue]Creating wrapped token (24h TTL, single-use)...[/bold blue]")
|
||||||
|
|
||||||
|
# Use the response wrapping feature to wrap our credentials
|
||||||
|
# This creates a temporary token that can only be used once to retrieve the actual credentials
|
||||||
|
wrap_response = client.write(
|
||||||
|
"sys/wrapping/wrap",
|
||||||
|
wrap_ttl="24h",
|
||||||
|
# The data we're wrapping
|
||||||
|
role_id=role_id,
|
||||||
|
secret_id=secret_id,
|
||||||
|
)
|
||||||
|
|
||||||
|
wrapped_token = wrap_response["wrap_info"]["token"]
|
||||||
|
console.print(f"[green]✓[/green] Created wrapped token: {wrapped_token[:20]}...")
|
||||||
|
console.print(f"[yellow]⚠️[/yellow] Token expires in 24 hours")
|
||||||
|
console.print(f"[yellow]⚠️[/yellow] Token can only be used once")
|
||||||
|
|
||||||
|
return wrapped_token
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
console.print(f"[red]❌ Failed to create wrapped token:[/red] {e}")
|
||||||
|
raise typer.Exit(code=1)
|
||||||
|
|
||||||
|
|
||||||
|
def verify_vault_setup(hostname: str) -> bool:
|
||||||
|
"""
|
||||||
|
Verify that Vault is properly configured for a host.
|
||||||
|
|
||||||
|
Checks:
|
||||||
|
- Vault is accessible
|
||||||
|
- AppRole exists for the hostname
|
||||||
|
- Can read role_id
|
||||||
|
|
||||||
|
Args:
|
||||||
|
hostname: The host to verify
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if everything is configured correctly, False otherwise
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
client = get_vault_client()
|
||||||
|
|
||||||
|
# Try to read the role_id
|
||||||
|
client.read(f"auth/approle/role/{hostname}/role-id")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception:
|
||||||
|
return False
|
||||||
78
scripts/vault-fetch/README.md
Normal file
78
scripts/vault-fetch/README.md
Normal file
@@ -0,0 +1,78 @@
|
|||||||
|
# vault-fetch
|
||||||
|
|
||||||
|
A helper script for fetching secrets from OpenBao/Vault and writing them to the filesystem.
|
||||||
|
|
||||||
|
## Features
|
||||||
|
|
||||||
|
- **AppRole Authentication**: Uses role_id and secret_id from `/var/lib/vault/approle/`
|
||||||
|
- **Individual Secret Files**: Writes each secret key as a separate file for easy consumption
|
||||||
|
- **Caching**: Maintains a cache of secrets for fallback when Vault is unreachable
|
||||||
|
- **Graceful Degradation**: Falls back to cached secrets if Vault authentication fails
|
||||||
|
- **Secure Permissions**: Sets 600 permissions on all secret files
|
||||||
|
|
||||||
|
## Usage
|
||||||
|
|
||||||
|
```bash
|
||||||
|
vault-fetch <secret-path> <output-directory> [cache-directory]
|
||||||
|
```
|
||||||
|
|
||||||
|
### Examples
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Fetch Grafana admin secrets
|
||||||
|
vault-fetch hosts/monitoring01/grafana-admin /run/secrets/grafana /var/lib/vault/cache/grafana
|
||||||
|
|
||||||
|
# Use default cache location
|
||||||
|
vault-fetch hosts/monitoring01/grafana-admin /run/secrets/grafana
|
||||||
|
```
|
||||||
|
|
||||||
|
## How It Works
|
||||||
|
|
||||||
|
1. **Read Credentials**: Loads `role_id` and `secret_id` from `/var/lib/vault/approle/`
|
||||||
|
2. **Authenticate**: Calls `POST /v1/auth/approle/login` to get a Vault token
|
||||||
|
3. **Fetch Secret**: Retrieves secret from `GET /v1/secret/data/{path}`
|
||||||
|
4. **Extract Keys**: Parses JSON response and extracts individual secret keys
|
||||||
|
5. **Write Files**: Creates one file per secret key in output directory
|
||||||
|
6. **Update Cache**: Copies secrets to cache directory for fallback
|
||||||
|
7. **Set Permissions**: Ensures all files have 600 permissions (owner read/write only)
|
||||||
|
|
||||||
|
## Error Handling
|
||||||
|
|
||||||
|
If Vault is unreachable or authentication fails:
|
||||||
|
- Script logs a warning to stderr
|
||||||
|
- Falls back to cached secrets from previous successful fetch
|
||||||
|
- Exits with error code 1 if no cache is available
|
||||||
|
|
||||||
|
## Environment Variables
|
||||||
|
|
||||||
|
- `VAULT_ADDR`: Vault server address (default: `https://vault01.home.2rjus.net:8200`)
|
||||||
|
- `VAULT_SKIP_VERIFY`: Skip TLS verification (default: `1`)
|
||||||
|
|
||||||
|
## Integration with NixOS
|
||||||
|
|
||||||
|
This tool is designed to be called from systemd service `ExecStartPre` hooks via the `vault.secrets` NixOS module:
|
||||||
|
|
||||||
|
```nix
|
||||||
|
vault.secrets.grafana-admin = {
|
||||||
|
secretPath = "hosts/monitoring01/grafana-admin";
|
||||||
|
};
|
||||||
|
|
||||||
|
# Service automatically gets secrets fetched before start
|
||||||
|
systemd.services.grafana.serviceConfig = {
|
||||||
|
EnvironmentFile = "/run/secrets/grafana-admin/password";
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
|
## Requirements
|
||||||
|
|
||||||
|
- `curl`: For Vault API calls
|
||||||
|
- `jq`: For JSON parsing
|
||||||
|
- `coreutils`: For file operations
|
||||||
|
|
||||||
|
## Security Considerations
|
||||||
|
|
||||||
|
- AppRole credentials stored at `/var/lib/vault/approle/` should be root-owned with 600 permissions
|
||||||
|
- Tokens are ephemeral and not stored - fresh authentication on each fetch
|
||||||
|
- Secrets written to tmpfs (`/run/secrets/`) are lost on reboot
|
||||||
|
- Cache directory persists across reboots for service availability
|
||||||
|
- All secret files have restrictive permissions (600)
|
||||||
18
scripts/vault-fetch/default.nix
Normal file
18
scripts/vault-fetch/default.nix
Normal file
@@ -0,0 +1,18 @@
|
|||||||
|
{ pkgs, lib, ... }:
|
||||||
|
|
||||||
|
pkgs.writeShellApplication {
|
||||||
|
name = "vault-fetch";
|
||||||
|
|
||||||
|
runtimeInputs = with pkgs; [
|
||||||
|
curl # Vault API calls
|
||||||
|
jq # JSON parsing
|
||||||
|
coreutils # File operations
|
||||||
|
];
|
||||||
|
|
||||||
|
text = builtins.readFile ./vault-fetch.sh;
|
||||||
|
|
||||||
|
meta = with lib; {
|
||||||
|
description = "Fetch secrets from OpenBao/Vault and write to filesystem";
|
||||||
|
license = licenses.mit;
|
||||||
|
};
|
||||||
|
}
|
||||||
152
scripts/vault-fetch/vault-fetch.sh
Normal file
152
scripts/vault-fetch/vault-fetch.sh
Normal file
@@ -0,0 +1,152 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# vault-fetch: Fetch secrets from OpenBao/Vault and write to filesystem
|
||||||
|
#
|
||||||
|
# Usage: vault-fetch <secret-path> <output-directory> [cache-directory]
|
||||||
|
#
|
||||||
|
# Example: vault-fetch hosts/monitoring01/grafana-admin /run/secrets/grafana /var/lib/vault/cache/grafana
|
||||||
|
#
|
||||||
|
# This script:
|
||||||
|
# 1. Authenticates to Vault using AppRole credentials from /var/lib/vault/approle/
|
||||||
|
# 2. Fetches secrets from the specified path
|
||||||
|
# 3. Writes each secret key as an individual file in the output directory
|
||||||
|
# 4. Updates cache for fallback when Vault is unreachable
|
||||||
|
# 5. Falls back to cache if Vault authentication fails or is unreachable
|
||||||
|
|
||||||
|
# Parse arguments
|
||||||
|
if [ $# -lt 2 ]; then
|
||||||
|
echo "Usage: vault-fetch <secret-path> <output-directory> [cache-directory]" >&2
|
||||||
|
echo "Example: vault-fetch hosts/monitoring01/grafana /run/secrets/grafana /var/lib/vault/cache/grafana" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
SECRET_PATH="$1"
|
||||||
|
OUTPUT_DIR="$2"
|
||||||
|
CACHE_DIR="${3:-/var/lib/vault/cache/$(basename "$OUTPUT_DIR")}"
|
||||||
|
|
||||||
|
# Vault configuration
|
||||||
|
VAULT_ADDR="${VAULT_ADDR:-https://vault01.home.2rjus.net:8200}"
|
||||||
|
VAULT_SKIP_VERIFY="${VAULT_SKIP_VERIFY:-1}"
|
||||||
|
APPROLE_DIR="/var/lib/vault/approle"
|
||||||
|
|
||||||
|
# TLS verification flag for curl
|
||||||
|
if [ "$VAULT_SKIP_VERIFY" = "1" ]; then
|
||||||
|
CURL_TLS_FLAG="-k"
|
||||||
|
else
|
||||||
|
CURL_TLS_FLAG=""
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Logging helper
|
||||||
|
log() {
|
||||||
|
echo "[vault-fetch] $*" >&2
|
||||||
|
}
|
||||||
|
|
||||||
|
# Error handler
|
||||||
|
error() {
|
||||||
|
log "ERROR: $*"
|
||||||
|
exit 1
|
||||||
|
}
|
||||||
|
|
||||||
|
# Check if cache is available
|
||||||
|
has_cache() {
|
||||||
|
[ -d "$CACHE_DIR" ] && [ -n "$(ls -A "$CACHE_DIR" 2>/dev/null)" ]
|
||||||
|
}
|
||||||
|
|
||||||
|
# Use cached secrets
|
||||||
|
use_cache() {
|
||||||
|
if ! has_cache; then
|
||||||
|
error "No cache available and Vault is unreachable"
|
||||||
|
fi
|
||||||
|
|
||||||
|
log "WARNING: Using cached secrets from $CACHE_DIR"
|
||||||
|
mkdir -p "$OUTPUT_DIR"
|
||||||
|
cp -r "$CACHE_DIR"/* "$OUTPUT_DIR/"
|
||||||
|
chmod -R u=rw,go= "$OUTPUT_DIR"/*
|
||||||
|
}
|
||||||
|
|
||||||
|
# Fetch secrets from Vault
|
||||||
|
fetch_from_vault() {
|
||||||
|
# Read AppRole credentials
|
||||||
|
if [ ! -f "$APPROLE_DIR/role-id" ] || [ ! -f "$APPROLE_DIR/secret-id" ]; then
|
||||||
|
log "WARNING: AppRole credentials not found at $APPROLE_DIR"
|
||||||
|
use_cache
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
ROLE_ID=$(cat "$APPROLE_DIR/role-id")
|
||||||
|
SECRET_ID=$(cat "$APPROLE_DIR/secret-id")
|
||||||
|
|
||||||
|
# Authenticate to Vault
|
||||||
|
log "Authenticating to Vault at $VAULT_ADDR"
|
||||||
|
AUTH_RESPONSE=$(curl -s $CURL_TLS_FLAG -X POST \
|
||||||
|
-d "{\"role_id\":\"$ROLE_ID\",\"secret_id\":\"$SECRET_ID\"}" \
|
||||||
|
"$VAULT_ADDR/v1/auth/approle/login" 2>&1) || {
|
||||||
|
log "WARNING: Failed to connect to Vault"
|
||||||
|
use_cache
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
# Check for errors in response
|
||||||
|
if echo "$AUTH_RESPONSE" | jq -e '.errors' >/dev/null 2>&1; then
|
||||||
|
ERRORS=$(echo "$AUTH_RESPONSE" | jq -r '.errors[]' 2>/dev/null || echo "Unknown error")
|
||||||
|
log "WARNING: Vault authentication failed: $ERRORS"
|
||||||
|
use_cache
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Extract token
|
||||||
|
VAULT_TOKEN=$(echo "$AUTH_RESPONSE" | jq -r '.auth.client_token' 2>/dev/null)
|
||||||
|
if [ -z "$VAULT_TOKEN" ] || [ "$VAULT_TOKEN" = "null" ]; then
|
||||||
|
log "WARNING: Failed to extract Vault token from response"
|
||||||
|
use_cache
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
log "Successfully authenticated to Vault"
|
||||||
|
|
||||||
|
# Fetch secret
|
||||||
|
log "Fetching secret from path: $SECRET_PATH"
|
||||||
|
SECRET_RESPONSE=$(curl -s $CURL_TLS_FLAG \
|
||||||
|
-H "X-Vault-Token: $VAULT_TOKEN" \
|
||||||
|
"$VAULT_ADDR/v1/secret/data/$SECRET_PATH" 2>&1) || {
|
||||||
|
log "WARNING: Failed to fetch secret from Vault"
|
||||||
|
use_cache
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
# Check for errors
|
||||||
|
if echo "$SECRET_RESPONSE" | jq -e '.errors' >/dev/null 2>&1; then
|
||||||
|
ERRORS=$(echo "$SECRET_RESPONSE" | jq -r '.errors[]' 2>/dev/null || echo "Unknown error")
|
||||||
|
log "WARNING: Failed to fetch secret: $ERRORS"
|
||||||
|
use_cache
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Extract secret data
|
||||||
|
SECRET_DATA=$(echo "$SECRET_RESPONSE" | jq -r '.data.data' 2>/dev/null)
|
||||||
|
if [ -z "$SECRET_DATA" ] || [ "$SECRET_DATA" = "null" ]; then
|
||||||
|
log "WARNING: No secret data found at path $SECRET_PATH"
|
||||||
|
use_cache
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Create output and cache directories
|
||||||
|
mkdir -p "$OUTPUT_DIR"
|
||||||
|
mkdir -p "$CACHE_DIR"
|
||||||
|
|
||||||
|
# Write each secret key to a separate file
|
||||||
|
log "Writing secrets to $OUTPUT_DIR"
|
||||||
|
echo "$SECRET_DATA" | jq -r 'to_entries[] | "\(.key)\n\(.value)"' | while read -r key; read -r value; do
|
||||||
|
echo -n "$value" > "$OUTPUT_DIR/$key"
|
||||||
|
echo -n "$value" > "$CACHE_DIR/$key"
|
||||||
|
chmod 600 "$OUTPUT_DIR/$key"
|
||||||
|
chmod 600 "$CACHE_DIR/$key"
|
||||||
|
log " - Wrote secret key: $key"
|
||||||
|
done
|
||||||
|
|
||||||
|
log "Successfully fetched and cached secrets"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Main execution
|
||||||
|
fetch_from_vault
|
||||||
@@ -10,5 +10,6 @@
|
|||||||
./root-ca.nix
|
./root-ca.nix
|
||||||
./sops.nix
|
./sops.nix
|
||||||
./sshd.nix
|
./sshd.nix
|
||||||
|
./vault-secrets.nix
|
||||||
];
|
];
|
||||||
}
|
}
|
||||||
|
|||||||
223
system/vault-secrets.nix
Normal file
223
system/vault-secrets.nix
Normal file
@@ -0,0 +1,223 @@
|
|||||||
|
{ config, lib, pkgs, ... }:
|
||||||
|
|
||||||
|
with lib;
|
||||||
|
|
||||||
|
let
|
||||||
|
cfg = config.vault;
|
||||||
|
|
||||||
|
# Import vault-fetch package
|
||||||
|
vault-fetch = pkgs.callPackage ../scripts/vault-fetch { };
|
||||||
|
|
||||||
|
# Secret configuration type
|
||||||
|
secretType = types.submodule ({ name, config, ... }: {
|
||||||
|
options = {
|
||||||
|
secretPath = mkOption {
|
||||||
|
type = types.str;
|
||||||
|
description = ''
|
||||||
|
Path to the secret in Vault (without /v1/secret/data/ prefix).
|
||||||
|
Example: "hosts/monitoring01/grafana-admin"
|
||||||
|
'';
|
||||||
|
};
|
||||||
|
|
||||||
|
outputDir = mkOption {
|
||||||
|
type = types.str;
|
||||||
|
default = "/run/secrets/${name}";
|
||||||
|
description = ''
|
||||||
|
Directory where secret files will be written.
|
||||||
|
Each key in the secret becomes a separate file.
|
||||||
|
'';
|
||||||
|
};
|
||||||
|
|
||||||
|
cacheDir = mkOption {
|
||||||
|
type = types.str;
|
||||||
|
default = "/var/lib/vault/cache/${name}";
|
||||||
|
description = ''
|
||||||
|
Directory for caching secrets when Vault is unreachable.
|
||||||
|
'';
|
||||||
|
};
|
||||||
|
|
||||||
|
owner = mkOption {
|
||||||
|
type = types.str;
|
||||||
|
default = "root";
|
||||||
|
description = "Owner of the secret files";
|
||||||
|
};
|
||||||
|
|
||||||
|
group = mkOption {
|
||||||
|
type = types.str;
|
||||||
|
default = "root";
|
||||||
|
description = "Group of the secret files";
|
||||||
|
};
|
||||||
|
|
||||||
|
mode = mkOption {
|
||||||
|
type = types.str;
|
||||||
|
default = "0400";
|
||||||
|
description = "Permissions mode for secret files";
|
||||||
|
};
|
||||||
|
|
||||||
|
restartTrigger = mkOption {
|
||||||
|
type = types.bool;
|
||||||
|
default = false;
|
||||||
|
description = ''
|
||||||
|
Whether to create a systemd timer that periodically restarts
|
||||||
|
services using this secret to rotate credentials.
|
||||||
|
'';
|
||||||
|
};
|
||||||
|
|
||||||
|
restartInterval = mkOption {
|
||||||
|
type = types.str;
|
||||||
|
default = "weekly";
|
||||||
|
description = ''
|
||||||
|
How often to restart services for secret rotation.
|
||||||
|
Uses systemd.time format (e.g., "daily", "weekly", "monthly").
|
||||||
|
Only applies if restartTrigger is true.
|
||||||
|
'';
|
||||||
|
};
|
||||||
|
|
||||||
|
services = mkOption {
|
||||||
|
type = types.listOf types.str;
|
||||||
|
default = [];
|
||||||
|
description = ''
|
||||||
|
List of systemd service names that depend on this secret.
|
||||||
|
Used for periodic restart if restartTrigger is enabled.
|
||||||
|
'';
|
||||||
|
};
|
||||||
|
};
|
||||||
|
});
|
||||||
|
|
||||||
|
in
|
||||||
|
{
|
||||||
|
options.vault = {
|
||||||
|
enable = mkEnableOption "Vault secrets management" // {
|
||||||
|
default = false;
|
||||||
|
};
|
||||||
|
|
||||||
|
secrets = mkOption {
|
||||||
|
type = types.attrsOf secretType;
|
||||||
|
default = {};
|
||||||
|
description = ''
|
||||||
|
Secrets to fetch from Vault.
|
||||||
|
Each attribute name becomes a secret identifier.
|
||||||
|
'';
|
||||||
|
example = literalExpression ''
|
||||||
|
{
|
||||||
|
grafana-admin = {
|
||||||
|
secretPath = "hosts/monitoring01/grafana-admin";
|
||||||
|
owner = "grafana";
|
||||||
|
group = "grafana";
|
||||||
|
restartTrigger = true;
|
||||||
|
restartInterval = "daily";
|
||||||
|
services = [ "grafana" ];
|
||||||
|
};
|
||||||
|
}
|
||||||
|
'';
|
||||||
|
};
|
||||||
|
|
||||||
|
criticalServices = mkOption {
|
||||||
|
type = types.listOf types.str;
|
||||||
|
default = [ "bind" "openbao" "step-ca" ];
|
||||||
|
description = ''
|
||||||
|
Services that should never get auto-restart timers for secret rotation.
|
||||||
|
These are critical infrastructure services where automatic restarts
|
||||||
|
could cause cascading failures.
|
||||||
|
'';
|
||||||
|
};
|
||||||
|
|
||||||
|
vaultAddress = mkOption {
|
||||||
|
type = types.str;
|
||||||
|
default = "https://vault01.home.2rjus.net:8200";
|
||||||
|
description = "Vault server address";
|
||||||
|
};
|
||||||
|
|
||||||
|
skipTlsVerify = mkOption {
|
||||||
|
type = types.bool;
|
||||||
|
default = true;
|
||||||
|
description = "Skip TLS certificate verification (useful for self-signed certs)";
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
config = mkIf (cfg.enable && cfg.secrets != {}) {
|
||||||
|
# Create systemd services for fetching secrets and rotation
|
||||||
|
systemd.services =
|
||||||
|
# Fetch services
|
||||||
|
(mapAttrs' (name: secretCfg: nameValuePair "vault-secret-${name}" {
|
||||||
|
description = "Fetch Vault secret: ${name}";
|
||||||
|
before = map (svc: "${svc}.service") secretCfg.services;
|
||||||
|
wantedBy = [ "multi-user.target" ];
|
||||||
|
|
||||||
|
# Ensure vault-fetch is available
|
||||||
|
path = [ vault-fetch ];
|
||||||
|
|
||||||
|
serviceConfig = {
|
||||||
|
Type = "oneshot";
|
||||||
|
RemainAfterExit = true;
|
||||||
|
|
||||||
|
# Fetch the secret
|
||||||
|
ExecStart = pkgs.writeShellScript "fetch-${name}" ''
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# Set Vault environment variables
|
||||||
|
export VAULT_ADDR="${cfg.vaultAddress}"
|
||||||
|
export VAULT_SKIP_VERIFY="${if cfg.skipTlsVerify then "1" else "0"}"
|
||||||
|
|
||||||
|
# Fetch secret using vault-fetch
|
||||||
|
${vault-fetch}/bin/vault-fetch \
|
||||||
|
"${secretCfg.secretPath}" \
|
||||||
|
"${secretCfg.outputDir}" \
|
||||||
|
"${secretCfg.cacheDir}"
|
||||||
|
|
||||||
|
# Set ownership and permissions
|
||||||
|
chown -R ${secretCfg.owner}:${secretCfg.group} "${secretCfg.outputDir}"
|
||||||
|
chmod ${secretCfg.mode} "${secretCfg.outputDir}"/*
|
||||||
|
'';
|
||||||
|
|
||||||
|
# Logging
|
||||||
|
StandardOutput = "journal";
|
||||||
|
StandardError = "journal";
|
||||||
|
};
|
||||||
|
}) cfg.secrets)
|
||||||
|
//
|
||||||
|
# Rotation services
|
||||||
|
(mapAttrs' (name: secretCfg: nameValuePair "vault-secret-rotate-${name}"
|
||||||
|
(mkIf (secretCfg.restartTrigger && secretCfg.services != [] &&
|
||||||
|
!any (svc: elem svc cfg.criticalServices) secretCfg.services) {
|
||||||
|
description = "Rotate Vault secret and restart services: ${name}";
|
||||||
|
|
||||||
|
serviceConfig = {
|
||||||
|
Type = "oneshot";
|
||||||
|
};
|
||||||
|
|
||||||
|
script = ''
|
||||||
|
# Restart the secret fetch service
|
||||||
|
systemctl restart vault-secret-${name}.service
|
||||||
|
|
||||||
|
# Restart all dependent services
|
||||||
|
${concatMapStringsSep "\n" (svc: "systemctl restart ${svc}.service") secretCfg.services}
|
||||||
|
'';
|
||||||
|
})
|
||||||
|
) cfg.secrets);
|
||||||
|
|
||||||
|
# Create systemd timers for periodic secret rotation (if enabled)
|
||||||
|
systemd.timers = mapAttrs' (name: secretCfg: nameValuePair "vault-secret-rotate-${name}"
|
||||||
|
(mkIf (secretCfg.restartTrigger && secretCfg.services != [] &&
|
||||||
|
!any (svc: elem svc cfg.criticalServices) secretCfg.services) {
|
||||||
|
description = "Rotate Vault secret and restart services: ${name}";
|
||||||
|
wantedBy = [ "timers.target" ];
|
||||||
|
|
||||||
|
timerConfig = {
|
||||||
|
OnCalendar = secretCfg.restartInterval;
|
||||||
|
Persistent = true;
|
||||||
|
RandomizedDelaySec = "1h";
|
||||||
|
};
|
||||||
|
})
|
||||||
|
) cfg.secrets;
|
||||||
|
|
||||||
|
# Ensure runtime and cache directories exist
|
||||||
|
systemd.tmpfiles.rules =
|
||||||
|
[ "d /run/secrets 0755 root root -" ] ++
|
||||||
|
[ "d /var/lib/vault/cache 0700 root root -" ] ++
|
||||||
|
flatten (mapAttrsToList (name: secretCfg: [
|
||||||
|
"d ${secretCfg.outputDir} 0755 root root -"
|
||||||
|
"d ${secretCfg.cacheDir} 0700 root root -"
|
||||||
|
]) cfg.secrets);
|
||||||
|
};
|
||||||
|
}
|
||||||
@@ -10,18 +10,25 @@ resource "proxmox_cloud_init_disk" "ci" {
|
|||||||
pve_node = each.value.target_node
|
pve_node = each.value.target_node
|
||||||
storage = "local" # Cloud-init disks must be on storage that supports ISO/snippets
|
storage = "local" # Cloud-init disks must be on storage that supports ISO/snippets
|
||||||
|
|
||||||
# User data includes SSH keys and optionally NIXOS_FLAKE_BRANCH
|
# User data includes SSH keys and optionally NIXOS_FLAKE_BRANCH and Vault credentials
|
||||||
user_data = <<-EOT
|
user_data = <<-EOT
|
||||||
#cloud-config
|
#cloud-config
|
||||||
ssh_authorized_keys:
|
ssh_authorized_keys:
|
||||||
- ${each.value.ssh_public_key}
|
- ${each.value.ssh_public_key}
|
||||||
${each.value.flake_branch != null ? <<-BRANCH
|
${each.value.flake_branch != null || each.value.vault_wrapped_token != null ? <<-FILES
|
||||||
write_files:
|
write_files:
|
||||||
- path: /etc/environment
|
- path: /run/cloud-init-env
|
||||||
content: |
|
content: |
|
||||||
|
%{~if each.value.flake_branch != null~}
|
||||||
NIXOS_FLAKE_BRANCH=${each.value.flake_branch}
|
NIXOS_FLAKE_BRANCH=${each.value.flake_branch}
|
||||||
append: true
|
%{~endif~}
|
||||||
BRANCH
|
%{~if each.value.vault_wrapped_token != null~}
|
||||||
|
VAULT_ADDR=https://vault01.home.2rjus.net:8200
|
||||||
|
VAULT_WRAPPED_TOKEN=${each.value.vault_wrapped_token}
|
||||||
|
VAULT_SKIP_VERIFY=1
|
||||||
|
%{~endif~}
|
||||||
|
permissions: '0600'
|
||||||
|
FILES
|
||||||
: ""}
|
: ""}
|
||||||
EOT
|
EOT
|
||||||
|
|
||||||
|
|||||||
@@ -33,7 +33,7 @@ variable "default_target_node" {
|
|||||||
variable "default_template_name" {
|
variable "default_template_name" {
|
||||||
description = "Default template VM name to clone from"
|
description = "Default template VM name to clone from"
|
||||||
type = string
|
type = string
|
||||||
default = "nixos-25.11.20260128.fa83fd8"
|
default = "nixos-25.11.20260131.41e216c"
|
||||||
}
|
}
|
||||||
|
|
||||||
variable "default_ssh_public_key" {
|
variable "default_ssh_public_key" {
|
||||||
|
|||||||
@@ -19,7 +19,7 @@ Manages the following OpenBao resources:
|
|||||||
|
|
||||||
2. **Edit `terraform.tfvars` with your OpenBao credentials:**
|
2. **Edit `terraform.tfvars` with your OpenBao credentials:**
|
||||||
```hcl
|
```hcl
|
||||||
vault_address = "https://vault.home.2rjus.net:8200"
|
vault_address = "https://vault01.home.2rjus.net:8200"
|
||||||
vault_token = "hvs.your-root-token-here"
|
vault_token = "hvs.your-root-token-here"
|
||||||
vault_skip_tls_verify = true
|
vault_skip_tls_verify = true
|
||||||
```
|
```
|
||||||
@@ -120,7 +120,7 @@ bao write pki_int/config/acme enabled=true
|
|||||||
|
|
||||||
ACME directory endpoint:
|
ACME directory endpoint:
|
||||||
```
|
```
|
||||||
https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory
|
https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory
|
||||||
```
|
```
|
||||||
|
|
||||||
Use with ACME clients (lego, certbot, cert-manager, etc.):
|
Use with ACME clients (lego, certbot, cert-manager, etc.):
|
||||||
@@ -128,7 +128,7 @@ Use with ACME clients (lego, certbot, cert-manager, etc.):
|
|||||||
# Example with lego
|
# Example with lego
|
||||||
lego --email admin@home.2rjus.net \
|
lego --email admin@home.2rjus.net \
|
||||||
--dns manual \
|
--dns manual \
|
||||||
--server https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory \
|
--server https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory \
|
||||||
--accept-tos \
|
--accept-tos \
|
||||||
run -d test.home.2rjus.net
|
run -d test.home.2rjus.net
|
||||||
```
|
```
|
||||||
@@ -239,18 +239,18 @@ After deploying this configuration, perform these one-time setup tasks:
|
|||||||
|
|
||||||
### 1. Enable ACME
|
### 1. Enable ACME
|
||||||
```bash
|
```bash
|
||||||
export BAO_ADDR='https://vault.home.2rjus.net:8200'
|
export BAO_ADDR='https://vault01.home.2rjus.net:8200'
|
||||||
export BAO_TOKEN='your-root-token'
|
export BAO_TOKEN='your-root-token'
|
||||||
export BAO_SKIP_VERIFY=1
|
export BAO_SKIP_VERIFY=1
|
||||||
|
|
||||||
# Configure cluster path (required for ACME)
|
# Configure cluster path (required for ACME)
|
||||||
bao write pki_int/config/cluster path=https://vault.home.2rjus.net:8200/v1/pki_int
|
bao write pki_int/config/cluster path=https://vault01.home.2rjus.net:8200/v1/pki_int
|
||||||
|
|
||||||
# Enable ACME on intermediate CA
|
# Enable ACME on intermediate CA
|
||||||
bao write pki_int/config/acme enabled=true
|
bao write pki_int/config/acme enabled=true
|
||||||
|
|
||||||
# Verify ACME is enabled
|
# Verify ACME is enabled
|
||||||
curl -k https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory
|
curl -k https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory
|
||||||
```
|
```
|
||||||
|
|
||||||
### 2. Download Root CA Certificate
|
### 2. Download Root CA Certificate
|
||||||
|
|||||||
48
terraform/vault/hosts-generated.tf
Normal file
48
terraform/vault/hosts-generated.tf
Normal file
@@ -0,0 +1,48 @@
|
|||||||
|
# WARNING: Auto-generated by create-host tool
|
||||||
|
# Manual edits will be overwritten when create-host is run
|
||||||
|
|
||||||
|
# Generated host policies
|
||||||
|
# Each host gets access to its own secrets under hosts/<hostname>/*
|
||||||
|
locals {
|
||||||
|
generated_host_policies = {
|
||||||
|
"vaulttest01" = {
|
||||||
|
paths = [
|
||||||
|
"secret/data/hosts/vaulttest01/*",
|
||||||
|
]
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
# Placeholder secrets - user should add actual secrets manually or via tofu
|
||||||
|
generated_secrets = {
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
# Create policies for generated hosts
|
||||||
|
resource "vault_policy" "generated_host_policies" {
|
||||||
|
for_each = local.generated_host_policies
|
||||||
|
|
||||||
|
name = "host-${each.key}"
|
||||||
|
|
||||||
|
policy = <<-EOT
|
||||||
|
# Allow host to read its own secrets
|
||||||
|
%{for path in each.value.paths~}
|
||||||
|
path "${path}" {
|
||||||
|
capabilities = ["read", "list"]
|
||||||
|
}
|
||||||
|
%{endfor~}
|
||||||
|
EOT
|
||||||
|
}
|
||||||
|
|
||||||
|
# Create AppRoles for generated hosts
|
||||||
|
resource "vault_approle_auth_backend_role" "generated_hosts" {
|
||||||
|
for_each = local.generated_host_policies
|
||||||
|
|
||||||
|
backend = vault_auth_backend.approle.path
|
||||||
|
role_name = each.key
|
||||||
|
token_policies = ["host-${each.key}"]
|
||||||
|
secret_id_ttl = 0 # Never expire (wrapped tokens provide time limit)
|
||||||
|
token_ttl = 3600
|
||||||
|
token_max_ttl = 3600
|
||||||
|
secret_id_num_uses = 0 # Unlimited uses
|
||||||
|
}
|
||||||
@@ -16,7 +16,7 @@
|
|||||||
#
|
#
|
||||||
# 1. ACME (Automated Certificate Management Environment)
|
# 1. ACME (Automated Certificate Management Environment)
|
||||||
# - Services fetch certificates automatically using ACME protocol
|
# - Services fetch certificates automatically using ACME protocol
|
||||||
# - ACME directory: https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory
|
# - ACME directory: https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory
|
||||||
# - Enable ACME: bao write pki_int/config/acme enabled=true
|
# - Enable ACME: bao write pki_int/config/acme enabled=true
|
||||||
# - Compatible with cert-manager, lego, certbot, etc.
|
# - Compatible with cert-manager, lego, certbot, etc.
|
||||||
#
|
#
|
||||||
@@ -149,7 +149,7 @@ locals {
|
|||||||
static_certificates = {
|
static_certificates = {
|
||||||
# Example: Issue a certificate for a specific service
|
# Example: Issue a certificate for a specific service
|
||||||
# "vault" = {
|
# "vault" = {
|
||||||
# common_name = "vault.home.2rjus.net"
|
# common_name = "vault01.home.2rjus.net"
|
||||||
# alt_names = ["vault01.home.2rjus.net"]
|
# alt_names = ["vault01.home.2rjus.net"]
|
||||||
# ip_sans = ["10.69.13.19"]
|
# ip_sans = ["10.69.13.19"]
|
||||||
# ttl = "8760h" # 1 year
|
# ttl = "8760h" # 1 year
|
||||||
|
|||||||
@@ -46,7 +46,11 @@ locals {
|
|||||||
auto_generate = true
|
auto_generate = true
|
||||||
password_length = 24
|
password_length = 24
|
||||||
}
|
}
|
||||||
|
# TODO: Remove after testing
|
||||||
|
"hosts/vaulttest01/test-service" = {
|
||||||
|
auto_generate = true
|
||||||
|
password_length = 32
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
# Copy this file to terraform.tfvars and fill in your values
|
# Copy this file to terraform.tfvars and fill in your values
|
||||||
# terraform.tfvars is gitignored to keep credentials safe
|
# terraform.tfvars is gitignored to keep credentials safe
|
||||||
|
|
||||||
vault_address = "https://vault.home.2rjus.net:8200"
|
vault_address = "https://vault01.home.2rjus.net:8200"
|
||||||
vault_token = "hvs.XXXXXXXXXXXXXXXXXXXX"
|
vault_token = "hvs.XXXXXXXXXXXXXXXXXXXX"
|
||||||
vault_skip_tls_verify = true
|
vault_skip_tls_verify = true
|
||||||
|
|||||||
@@ -1,7 +1,7 @@
|
|||||||
variable "vault_address" {
|
variable "vault_address" {
|
||||||
description = "OpenBao server address"
|
description = "OpenBao server address"
|
||||||
type = string
|
type = string
|
||||||
default = "https://vault.home.2rjus.net:8200"
|
default = "https://vault01.home.2rjus.net:8200"
|
||||||
}
|
}
|
||||||
|
|
||||||
variable "vault_token" {
|
variable "vault_token" {
|
||||||
|
|||||||
@@ -45,6 +45,14 @@ locals {
|
|||||||
disk_size = "20G"
|
disk_size = "20G"
|
||||||
flake_branch = "vault-setup" # Bootstrap from this branch instead of master
|
flake_branch = "vault-setup" # Bootstrap from this branch instead of master
|
||||||
}
|
}
|
||||||
|
"vaulttest01" = {
|
||||||
|
ip = "10.69.13.150/24"
|
||||||
|
cpu_cores = 2
|
||||||
|
memory = 2048
|
||||||
|
disk_size = "20G"
|
||||||
|
flake_branch = "vault-bootstrap-integration"
|
||||||
|
vault_wrapped_token = "s.HwNenAYvXBsPs8uICh4CbE11"
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
# Compute VM configurations with defaults applied
|
# Compute VM configurations with defaults applied
|
||||||
@@ -66,6 +74,8 @@ locals {
|
|||||||
gateway = lookup(vm, "gateway", var.default_gateway)
|
gateway = lookup(vm, "gateway", var.default_gateway)
|
||||||
# Branch configuration for bootstrap (optional, uses master if not set)
|
# Branch configuration for bootstrap (optional, uses master if not set)
|
||||||
flake_branch = lookup(vm, "flake_branch", null)
|
flake_branch = lookup(vm, "flake_branch", null)
|
||||||
|
# Vault configuration (optional, for automatic secret provisioning)
|
||||||
|
vault_wrapped_token = lookup(vm, "vault_wrapped_token", null)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -138,4 +148,12 @@ resource "proxmox_vm_qemu" "vm" {
|
|||||||
source = "/dev/urandom"
|
source = "/dev/urandom"
|
||||||
period = 1000
|
period = 1000
|
||||||
}
|
}
|
||||||
|
|
||||||
|
# Lifecycle configuration
|
||||||
|
lifecycle {
|
||||||
|
ignore_changes = [
|
||||||
|
clone, # Template name can change without recreating VMs
|
||||||
|
startup_shutdown, # Proxmox sets defaults (-1) that we don't need to manage
|
||||||
|
]
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
Reference in New Issue
Block a user