chore: update TODO.md
290
TODO.md
@@ -153,7 +153,9 @@ create-host \
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### Phase 4: Secrets Management with HashiCorp Vault
|
### Phase 4: Secrets Management with OpenBao (Vault)
|
||||||
|
|
||||||
|
**Status:** 🚧 Phases 4a & 4b Complete, 4c & 4d In Progress
|
||||||
|
|
||||||
**Challenge:** The current sops-nix approach has a chicken-and-egg problem with age keys
|
**Challenge:** The current sops-nix approach has a chicken-and-egg problem with age keys
|
||||||
|
|
||||||
@@ -164,161 +166,225 @@ create-host \
|
|||||||
4. User commits, pushes
|
4. User commits, pushes
|
||||||
5. VM can now decrypt secrets
|
5. VM can now decrypt secrets
|
||||||
|
|
||||||
**Selected approach:** Migrate to HashiCorp Vault for centralized secrets management
|
**Selected approach:** Migrate to OpenBao (Vault fork) for centralized secrets management
|
||||||
|
|
||||||
|
**Why OpenBao instead of HashiCorp Vault:**
|
||||||
|
- HashiCorp Vault switched to the BSL (Business Source License), so it is not available from the NixOS binary cache
|
||||||
|
- OpenBao is the community fork maintaining the pre-BSL MPL 2.0 license
|
||||||
|
- API-compatible with Vault and uses the same Terraform provider
|
||||||
|
- Maintains all Vault features we need
|
||||||
|
|
||||||
**Benefits:**
|
**Benefits:**
|
||||||
- Industry-standard secrets management (Vault experience transferable to work)
|
- Industry-standard secrets management (Vault-compatible experience)
|
||||||
- Eliminates manual age key distribution step
|
- Eliminates manual age key distribution step
|
||||||
- Secrets-as-code via OpenTofu (infrastructure-as-code aligned)
|
- Secrets-as-code via OpenTofu (infrastructure-as-code aligned)
|
||||||
- Centralized PKI management (replaces step-ca, consolidates TLS + SSH CA)
|
- Centralized PKI management with ACME support (ready to replace step-ca)
|
||||||
- Automatic secret rotation capabilities
|
- Automatic secret rotation capabilities
|
||||||
- Audit logging for all secret access
|
- Audit logging for all secret access (not yet enabled)
|
||||||
- AppRole authentication enables automated bootstrap
|
- AppRole authentication enables automated bootstrap
|
||||||
|
|
||||||
**Architecture:**
|
**Current Architecture:**
|
||||||
```
|
```
|
||||||
vault.home.2rjus.net
|
vault.home.2rjus.net (10.69.13.19)
|
||||||
├─ KV Secrets Engine (replaces sops-nix)
|
├─ KV Secrets Engine (ready to replace sops-nix)
|
||||||
├─ PKI Engine (replaces step-ca for TLS)
|
│ ├─ secret/hosts/{hostname}/*
|
||||||
├─ SSH CA Engine (replaces step-ca SSH CA)
|
│ ├─ secret/services/{service}/*
|
||||||
└─ AppRole Auth (per-host authentication)
|
│ └─ secret/shared/{category}/*
|
||||||
|
├─ PKI Engine (ready to replace step-ca for TLS)
|
||||||
|
│ ├─ Root CA (EC P-384, 10 year)
|
||||||
|
│ ├─ Intermediate CA (EC P-384, 5 year)
|
||||||
|
│ └─ ACME endpoint enabled
|
||||||
|
├─ SSH CA Engine (TODO: Phase 4c)
|
||||||
|
└─ AppRole Auth (per-host authentication configured)
|
||||||
↓
|
↓
|
||||||
New hosts authenticate on first boot
|
[Phase 4d] New hosts authenticate on first boot
|
||||||
Fetch secrets via Vault API
|
[Phase 4d] Fetch secrets via Vault API
|
||||||
No manual key distribution needed
|
No manual key distribution needed
|
||||||
```
|
```
|
||||||
|
|
||||||
|
**Completed:**
|
||||||
|
- ✅ Phase 4a: OpenBao server with TPM2 auto-unseal
|
||||||
|
- ✅ Phase 4b: Infrastructure-as-code (secrets, policies, AppRoles, PKI)
|
||||||
|
|
||||||
|
**Next Steps:**
|
||||||
|
- Phase 4c: Migrate from step-ca to OpenBao PKI
|
||||||
|
- Phase 4d: Bootstrap integration for automated secrets access
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
#### Phase 4a: Vault Server Setup
|
#### Phase 4a: Vault Server Setup ✅ COMPLETED
|
||||||
|
|
||||||
|
**Status:** ✅ Fully implemented and tested
|
||||||
|
**Completed:** 2026-02-02
|
||||||
|
|
||||||
**Goal:** Deploy and configure Vault server with auto-unseal
|
**Goal:** Deploy and configure Vault server with auto-unseal
|
||||||
|
|
||||||
**Tasks:**
|
**Implementation:**
|
||||||
- [ ] Create `hosts/vault01/` configuration
|
- Used **OpenBao** (Vault fork) instead of HashiCorp Vault due to BSL licensing concerns
|
||||||
- [ ] Basic NixOS configuration (hostname, networking, etc.)
|
- TPM2-based auto-unseal using systemd's native `LoadCredentialEncrypted`
|
||||||
- [ ] Vault service configuration
|
- Self-signed bootstrap TLS certificates (avoiding circular dependency with step-ca)
|
||||||
- [ ] Firewall rules (8200 for API, 8201 for cluster)
|
- File-based storage backend at `/var/lib/openbao`
|
||||||
- [ ] Add to flake.nix and terraform
|
- Unix socket + TCP listener (0.0.0.0:8200) configuration
|
||||||
- [ ] Implement auto-unseal mechanism
|
|
||||||
- [ ] **Preferred:** TPM-based auto-unseal if hardware supports it
|
|
||||||
- [ ] Use tpm2-tools to seal/unseal Vault keys
|
|
||||||
- [ ] Systemd service to unseal on boot
|
|
||||||
- [ ] **Fallback:** Shamir secret sharing with systemd automation
|
|
||||||
- [ ] Generate 3 keys, threshold 2
|
|
||||||
- [ ] Store 2 keys on disk (encrypted), keep 1 offline
|
|
||||||
- [ ] Systemd service auto-unseals using 2 keys
|
|
||||||
- [ ] Initial Vault setup
|
|
||||||
- [ ] Initialize Vault
|
|
||||||
- [ ] Configure storage backend (integrated raft or file)
|
|
||||||
- [ ] Set up root token management
|
|
||||||
- [ ] Enable audit logging
|
|
||||||
- [ ] Deploy to infrastructure
|
|
||||||
- [ ] Add DNS entry for vault.home.2rjus.net
|
|
||||||
- [ ] Deploy VM via terraform
|
|
||||||
- [ ] Bootstrap and verify Vault is running
|
|
||||||
|
|
||||||
**Deliverable:** Running Vault server that auto-unseals on boot
|
**Tasks:**
|
||||||
|
- [x] Create `hosts/vault01/` configuration
|
||||||
|
- [x] Basic NixOS configuration (hostname: vault01, IP: 10.69.13.19/24)
|
||||||
|
- [x] Created reusable `services/vault` module
|
||||||
|
- [x] Firewall not needed (trusted network)
|
||||||
|
- [x] Already in flake.nix, deployed via terraform
|
||||||
|
- [x] Implement auto-unseal mechanism
|
||||||
|
- [x] **TPM2-based auto-unseal** (preferred option)
|
||||||
|
- [x] systemd `LoadCredentialEncrypted` with TPM2 binding
|
||||||
|
- [x] `writeShellApplication` script with proper runtime dependencies
|
||||||
|
- [x] Reads multiple unseal keys (one per line) until unsealed
|
||||||
|
- [x] Auto-unseals on service start via `ExecStartPost`
|
||||||
|
- [x] Initial Vault setup
|
||||||
|
- [x] Initialized OpenBao with Shamir secret sharing (5 keys, threshold 3)
|
||||||
|
- [x] File storage backend
|
||||||
|
- [x] Self-signed TLS certificates via LoadCredential
|
||||||
|
- [x] Deploy to infrastructure
|
||||||
|
- [x] DNS entry added for vault.home.2rjus.net
|
||||||
|
- [x] VM deployed via terraform
|
||||||
|
- [x] Verified OpenBao running and auto-unsealing
|
||||||
|
|
||||||
|
**Changes from Original Plan:**
|
||||||
|
- Used OpenBao instead of HashiCorp Vault (licensing)
|
||||||
|
- Used systemd's native TPM2 support instead of tpm2-tools directly
|
||||||
|
- Skipped audit logging (can be enabled later)
|
||||||
|
- Used self-signed certs initially (will migrate to OpenBao PKI later)
|
||||||
|
|
||||||
|
**Deliverable:** ✅ Running OpenBao server that auto-unseals on boot using TPM2
|
||||||
|
|
||||||
|
**Documentation:**
|
||||||
|
- `/services/vault/README.md` - Service module overview
|
||||||
|
- `/docs/vault/auto-unseal.md` - Complete TPM2 auto-unseal setup guide
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
#### Phase 4b: Vault-as-Code with OpenTofu
|
#### Phase 4b: Vault-as-Code with OpenTofu ✅ COMPLETED
|
||||||
|
|
||||||
|
**Status:** ✅ Fully implemented and tested
|
||||||
|
**Completed:** 2026-02-02
|
||||||
|
|
||||||
**Goal:** Manage all Vault configuration (secrets structure, policies, roles) as code
|
**Goal:** Manage all Vault configuration (secrets structure, policies, roles) as code
|
||||||
|
|
||||||
|
**Implementation:**
|
||||||
|
- Complete Terraform/OpenTofu configuration in `terraform/vault/`
|
||||||
|
- Locals-based pattern (similar to `vms.tf`) for declaring secrets and policies
|
||||||
|
- Auto-generation of secrets using the `random_password` resource (from the `random` provider)
|
||||||
|
- Three-tier secrets path hierarchy: `hosts/`, `services/`, `shared/`
|
||||||
|
- PKI infrastructure with **Elliptic Curve certificates** (P-384 for CAs, P-256 for leaf certs)
|
||||||
|
- ACME support enabled on intermediate CA
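
A minimal sketch of how an auto-generated secret could be declared under the three-tier hierarchy. The resource names, the `services/grafana` path, and the password length are illustrative assumptions, not taken from the actual `terraform/vault/secrets.tf`.

```hcl
# Hypothetical example: generate a password once and store it in KV v2.
# `random_password` is a resource of the `random` provider.
resource "random_password" "grafana_admin" {
  length  = 32
  special = false
}

resource "vault_kv_secret_v2" "grafana" {
  mount = "secret"           # KV v2 engine mounted at secret/
  name  = "services/grafana" # service-wide tier; path is an assumption

  data_json = jsonencode({
    admin_password = random_password.grafana_admin.result
  })
}
```

The generated value ends up in the OpenTofu state file, so the state needs the same protection as the gitignored `terraform.tfvars`.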
|
||||||
|
|
||||||
**Tasks:**
|
**Tasks:**
|
||||||
- [ ] Set up Vault Terraform provider
|
- [x] Set up Vault Terraform provider
|
||||||
- [ ] Create `terraform/vault/` directory
|
- [x] Created `terraform/vault/` directory
|
||||||
- [ ] Configure Vault provider (address, auth)
|
- [x] Configured the Vault provider (the HashiCorp `vault` provider, which is compatible with OpenBao)
|
||||||
- [ ] Store Vault token securely (terraform.tfvars, gitignored)
|
- [x] Credentials in terraform.tfvars (gitignored)
|
||||||
- [ ] Enable and configure secrets engines
|
- [x] terraform.tfvars.example for reference
|
||||||
- [ ] Enable KV v2 secrets engine at `secret/`
|
- [x] Enable and configure secrets engines
|
||||||
- [ ] Define secret path structure (per-service, per-host)
|
- [x] KV v2 engine at `secret/`
|
||||||
- [ ] Example: `secret/monitoring/grafana`, `secret/postgres/ha1`
|
- [x] Three-tier path structure:
|
||||||
- [ ] Define policies as code
|
- `secret/hosts/{hostname}/*` - Host-specific secrets
|
||||||
- [ ] Create policies for different service tiers
|
- `secret/services/{service}/*` - Service-wide secrets
|
||||||
- [ ] Principle of least privilege (hosts only read their secrets)
|
- `secret/shared/{category}/*` - Shared secrets (SMTP, backups, etc.)
|
||||||
- [ ] Example: monitoring-policy allows read on `secret/monitoring/*`
|
- [x] Define policies as code
|
||||||
- [ ] Set up AppRole authentication
|
- [x] Policies auto-generated from `locals.host_policies`
|
||||||
- [ ] Enable AppRole auth backend
|
- [x] Per-host policies with read/list on designated paths
|
||||||
- [ ] Create role per host type (monitoring, dns, database, etc.)
|
- [x] Principle of least privilege enforced
|
||||||
- [ ] Bind policies to roles
|
- [x] Set up AppRole authentication
|
||||||
- [ ] Configure TTL and token policies
|
- [x] AppRole backend enabled at `approle/`
|
||||||
- [ ] Migrate existing secrets from sops-nix
|
- [x] Roles auto-generated per host from `locals.host_policies`
|
||||||
- [ ] Create migration script/playbook
|
- [x] Token TTL: 1 hour, max 24 hours
|
||||||
- [ ] Decrypt sops secrets and load into Vault KV
|
- [x] Policies bound to roles
|
||||||
- [ ] Verify all secrets migrated successfully
|
- [x] Implement secrets-as-code patterns
|
||||||
- [ ] Keep sops as backup during transition
|
- [x] Auto-generated secrets using the `random_password` resource (from the `random` provider)
|
||||||
- [ ] Implement secrets-as-code patterns
|
- [x] Manual secrets supported via variables in terraform.tfvars
|
||||||
- [ ] Secret values in gitignored terraform.tfvars
|
- [x] Secret structure versioned in .tf files
|
||||||
- [ ] Or use random_password for auto-generated secrets
|
- [x] Secret values excluded from git
|
||||||
- [ ] Secret structure/paths in version-controlled .tf files
|
- [x] Set up PKI infrastructure
|
||||||
|
- [x] Root CA (10 year TTL, EC P-384)
|
||||||
|
- [x] Intermediate CA (5 year TTL, EC P-384)
|
||||||
|
- [x] PKI role for `*.home.2rjus.net` (30 day max TTL, EC P-256)
|
||||||
|
- [x] ACME enabled on intermediate CA
|
||||||
|
- [x] Support for static certificate issuance via Terraform
|
||||||
|
- [x] CRL, OCSP, and issuing certificate URLs configured
|
||||||
|
|
||||||
**Example OpenTofu:**
|
**Changes from Original Plan:**
|
||||||
```hcl
|
- Used Elliptic Curve instead of RSA for all certificates (better performance, smaller keys)
|
||||||
resource "vault_kv_secret_v2" "monitoring_grafana" {
|
- Implemented PKI infrastructure in Phase 4b instead of Phase 4c (more logical grouping)
|
||||||
mount = "secret"
|
- ACME support configured immediately (ready for migration from step-ca)
|
||||||
name = "monitoring/grafana"
|
- Did not migrate existing sops-nix secrets yet (deferred to gradual migration)
|
||||||
data_json = jsonencode({
|
|
||||||
admin_password = var.grafana_admin_password
|
|
||||||
smtp_password = var.smtp_password
|
|
||||||
})
|
|
||||||
}
|
|
||||||
|
|
||||||
resource "vault_policy" "monitoring" {
|
**Files:**
|
||||||
name = "monitoring-policy"
|
- `terraform/vault/main.tf` - Provider configuration
|
||||||
policy = <<EOT
|
- `terraform/vault/variables.tf` - Variable definitions
|
||||||
path "secret/data/monitoring/*" {
|
- `terraform/vault/approle.tf` - AppRole authentication (locals-based pattern)
|
||||||
capabilities = ["read"]
|
- `terraform/vault/pki.tf` - PKI infrastructure with EC certificates
|
||||||
}
|
- `terraform/vault/secrets.tf` - KV secrets engine (auto-generation support)
|
||||||
EOT
|
- `terraform/vault/README.md` - Complete documentation and usage examples
|
||||||
}
|
- `terraform/vault/terraform.tfvars.example` - Example credentials
|
||||||
|
|
||||||
resource "vault_approle_auth_backend_role" "monitoring01" {
|
**Deliverable:** ✅ All secrets, policies, AppRoles, and PKI managed as OpenTofu code in `terraform/vault/`
|
||||||
backend = "approle"
|
|
||||||
role_name = "monitoring01"
|
|
||||||
token_policies = ["monitoring-policy"]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
**Deliverable:** All secrets and policies managed as OpenTofu code in `terraform/vault/`
|
**Documentation:**
|
||||||
|
- `/terraform/vault/README.md` - Comprehensive guide covering:
|
||||||
|
- Setup and deployment
|
||||||
|
- AppRole usage and host access patterns
|
||||||
|
- PKI certificate issuance (ACME, static, manual)
|
||||||
|
- Secrets management patterns
|
||||||
|
- ACME configuration and troubleshooting
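
The locals-based pattern for policies and AppRoles described in this phase could look roughly like the sketch below. The shape of `host_policies`, the `monitoring01` entry, and the resource labels are assumptions for illustration; the real definitions live in `terraform/vault/approle.tf` and may differ.

```hcl
locals {
  # Hypothetical map: one entry per host, listing the paths it may access.
  # With KV v2, reads go through secret/data/... and listing through
  # secret/metadata/..., so both prefixes appear here.
  host_policies = {
    "monitoring01" = {
      paths = [
        "secret/data/hosts/monitoring01/*",
        "secret/metadata/hosts/monitoring01/*",
      ]
    }
  }
}

resource "vault_auth_backend" "approle" {
  type = "approle"
  path = "approle"
}

# One read/list policy per host, rendered from the paths above.
resource "vault_policy" "host" {
  for_each = local.host_policies
  name     = "${each.key}-policy"

  policy = join("\n", [
    for p in each.value.paths :
    "path \"${p}\" {\n  capabilities = [\"read\", \"list\"]\n}"
  ])
}

# One AppRole per host, bound to that host's policy.
resource "vault_approle_auth_backend_role" "host" {
  for_each       = local.host_policies
  backend        = vault_auth_backend.approle.path
  role_name      = each.key
  token_policies = [vault_policy.host[each.key].name]
  token_ttl      = 3600  # 1 hour
  token_max_ttl  = 86400 # 24 hours
}
```

Adding a host then only requires a new entry in the map; the policy and the role follow from `for_each`.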
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
#### Phase 4c: PKI Migration (Replace step-ca)
|
#### Phase 4c: PKI Migration (Replace step-ca)
|
||||||
|
|
||||||
**Goal:** Consolidate PKI infrastructure into Vault
|
**Goal:** Migrate hosts from step-ca to OpenBao PKI for TLS certificates
|
||||||
|
|
||||||
|
**Note:** PKI infrastructure already set up in Phase 4b (root CA, intermediate CA, ACME support)
|
||||||
|
|
||||||
**Tasks:**
|
**Tasks:**
|
||||||
- [ ] Set up Vault PKI engines
|
- [x] Set up OpenBao PKI engines (completed in Phase 4b)
|
||||||
- [ ] Create root CA in Vault (`pki/` mount, 10 year TTL)
|
- [x] Root CA (`pki/` mount, 10 year TTL, EC P-384)
|
||||||
- [ ] Create intermediate CA (`pki_int/` mount, 5 year TTL)
|
- [x] Intermediate CA (`pki_int/` mount, 5 year TTL, EC P-384)
|
||||||
- [ ] Sign intermediate with root CA
|
- [x] Signed intermediate with root CA
|
||||||
- [ ] Configure CRL and OCSP
|
- [x] Configured CRL, OCSP, and issuing certificate URLs
|
||||||
- [ ] Enable ACME support
|
- [x] Enable ACME support (completed in Phase 4b)
|
||||||
- [ ] Enable ACME on intermediate CA (Vault 1.14+)
|
- [x] Enabled ACME on intermediate CA
|
||||||
- [ ] Create PKI role for homelab domain
|
- [x] Created PKI role for `*.home.2rjus.net`
|
||||||
- [ ] Set certificate TTLs and allowed domains
|
- [x] Set certificate TTLs (30 day max) and allowed domains
|
||||||
- [ ] Configure SSH CA in Vault
|
- [x] ACME directory: `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
|
||||||
|
- [ ] Download and distribute root CA certificate
|
||||||
|
- [ ] Export root CA: `bao read -field=certificate pki/cert/ca > homelab-root-ca.crt`
|
||||||
|
- [ ] Add to NixOS trust store on all hosts via `security.pki.certificateFiles`
|
||||||
|
- [ ] Deploy via auto-upgrade
|
||||||
|
- [ ] Test certificate issuance
|
||||||
|
- [ ] Issue test certificate using ACME client (lego/certbot)
|
||||||
|
- [ ] Or issue static certificate via OpenBao CLI
|
||||||
|
- [ ] Verify certificate chain and trust
|
||||||
|
- [ ] Migrate vault01's own certificate
|
||||||
|
- [ ] Issue new certificate from OpenBao PKI (self-issued)
|
||||||
|
- [ ] Replace self-signed bootstrap certificate
|
||||||
|
- [ ] Update service configuration
|
||||||
|
- [ ] Migrate hosts from step-ca to OpenBao
|
||||||
|
- [ ] Update `system/acme.nix` to use OpenBao ACME endpoint
|
||||||
|
- [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
|
||||||
|
- [ ] Test on one host (non-critical service)
|
||||||
|
- [ ] Roll out to all hosts via auto-upgrade
|
||||||
|
- [ ] Configure SSH CA in OpenBao (optional, future work)
|
||||||
- [ ] Enable SSH secrets engine (`ssh/` mount)
|
- [ ] Enable SSH secrets engine (`ssh/` mount)
|
||||||
- [ ] Generate SSH signing keys
|
- [ ] Generate SSH signing keys
|
||||||
- [ ] Create roles for host and user certificates
|
- [ ] Create roles for host and user certificates
|
||||||
- [ ] Configure TTLs and allowed principals
|
- [ ] Configure TTLs and allowed principals
|
||||||
- [ ] Migrate hosts from step-ca to Vault
|
- [ ] Distribute SSH CA public key to all hosts
|
||||||
- [ ] Update system/acme.nix to use Vault ACME endpoint
|
- [ ] Update sshd_config to trust OpenBao CA
|
||||||
- [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
|
|
||||||
- [ ] Test certificate issuance on one host
|
|
||||||
- [ ] Roll out to all hosts via auto-upgrade
|
|
||||||
- [ ] Migrate SSH CA trust
|
|
||||||
- [ ] Distribute Vault SSH CA public key to all hosts
|
|
||||||
- [ ] Update sshd_config to trust Vault CA
|
|
||||||
- [ ] Test SSH certificate authentication
|
|
||||||
- [ ] Decommission step-ca
|
- [ ] Decommission step-ca
|
||||||
- [ ] Verify all services migrated
|
- [ ] Verify all ACME services migrated and working
|
||||||
- [ ] Stop step-ca service on ca host
|
- [ ] Stop step-ca service on ca host
|
||||||
- [ ] Archive step-ca configuration for backup
|
- [ ] Archive step-ca configuration for backup
|
||||||
|
- [ ] Update documentation
|
||||||
|
|
||||||
**Deliverable:** All TLS and SSH certificates issued by Vault, step-ca retired
|
**Deliverable:** All TLS certificates issued by OpenBao PKI, step-ca retired
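
For reference, the intermediate mount and leaf-certificate role that this migration relies on could be expressed along these lines. This is a sketch under assumptions: the role name `homelab` and the resource labels are hypothetical, and the CA generation and signing steps are omitted.

```hcl
# Intermediate CA mount; 5 years expressed in seconds.
resource "vault_mount" "pki_int" {
  path                  = "pki_int"
  type                  = "pki"
  max_lease_ttl_seconds = 157680000
}

# Role for *.home.2rjus.net leaf certificates (EC P-256, 30 day max TTL).
resource "vault_pki_secret_backend_role" "homelab" {
  backend          = vault_mount.pki_int.path
  name             = "homelab" # hypothetical role name
  allowed_domains  = ["home.2rjus.net"]
  allow_subdomains = true
  key_type         = "ec"
  key_bits         = 256
  max_ttl          = "2592000" # 30 days in seconds
}
```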
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||