chore: update TODO.md
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m53s
Run nix flake check / flake-check (pull_request) Successful in 2m16s

2026-02-02 00:47:31 +01:00
parent c694b9889a
commit 238ad45c14

290
TODO.md

@@ -153,7 +153,9 @@ create-host \
---
### Phase 4: Secrets Management with HashiCorp Vault
### Phase 4: Secrets Management with OpenBao (Vault)
**Status:** 🚧 Phases 4a & 4b Complete, 4c & 4d In Progress
**Challenge:** Current sops-nix approach has chicken-and-egg problem with age keys
@@ -164,161 +166,225 @@ create-host \
4. User commits, pushes
5. VM can now decrypt secrets
**Selected approach:** Migrate to HashiCorp Vault for centralized secrets management
**Selected approach:** Migrate to OpenBao (Vault fork) for centralized secrets management
**Why OpenBao instead of HashiCorp Vault:**
- HashiCorp Vault switched to the BSL (Business Source License), so it is unfree in nixpkgs and not available from the NixOS binary cache
- OpenBao is the community fork maintaining the pre-BSL MPL 2.0 license
- API-compatible with Vault, uses same Terraform provider
- Maintains all Vault features we need
**Benefits:**
- Industry-standard secrets management (Vault experience transferable to work)
- Industry-standard secrets management (Vault-compatible experience)
- Eliminates manual age key distribution step
- Secrets-as-code via OpenTofu (infrastructure-as-code aligned)
- Centralized PKI management (replaces step-ca, consolidates TLS + SSH CA)
- Centralized PKI management with ACME support (ready to replace step-ca)
- Automatic secret rotation capabilities
- Audit logging for all secret access
- Audit logging for all secret access (not yet enabled)
- AppRole authentication enables automated bootstrap
**Architecture:**
**Current Architecture:**
```
vault.home.2rjus.net
├─ KV Secrets Engine (replaces sops-nix)
├─ PKI Engine (replaces step-ca for TLS)
├─ SSH CA Engine (replaces step-ca SSH CA)
└─ AppRole Auth (per-host authentication)
vault.home.2rjus.net (10.69.13.19)
├─ KV Secrets Engine (ready to replace sops-nix)
│ ├─ secret/hosts/{hostname}/*
│ ├─ secret/services/{service}/*
│ └─ secret/shared/{category}/*
├─ PKI Engine (ready to replace step-ca for TLS)
│ ├─ Root CA (EC P-384, 10 year)
│ ├─ Intermediate CA (EC P-384, 5 year)
│ └─ ACME endpoint enabled
├─ SSH CA Engine (TODO: Phase 4c)
└─ AppRole Auth (per-host authentication configured)
New hosts authenticate on first boot
Fetch secrets via Vault API
[Phase 4d] New hosts authenticate on first boot
[Phase 4d] Fetch secrets via Vault API
No manual key distribution needed
```
**Completed:**
- ✅ Phase 4a: OpenBao server with TPM2 auto-unseal
- ✅ Phase 4b: Infrastructure-as-code (secrets, policies, AppRoles, PKI)
**Next Steps:**
- Phase 4c: Migrate from step-ca to OpenBao PKI
- Phase 4d: Bootstrap integration for automated secrets access
---
#### Phase 4a: Vault Server Setup
#### Phase 4a: Vault Server Setup ✅ COMPLETED
**Status:** ✅ Fully implemented and tested
**Completed:** 2026-02-02
**Goal:** Deploy and configure Vault server with auto-unseal
**Tasks:**
- [ ] Create `hosts/vault01/` configuration
- [ ] Basic NixOS configuration (hostname, networking, etc.)
- [ ] Vault service configuration
- [ ] Firewall rules (8200 for API, 8201 for cluster)
- [ ] Add to flake.nix and terraform
- [ ] Implement auto-unseal mechanism
- [ ] **Preferred:** TPM-based auto-unseal if hardware supports it
- [ ] Use tpm2-tools to seal/unseal Vault keys
- [ ] Systemd service to unseal on boot
- [ ] **Fallback:** Shamir secret sharing with systemd automation
- [ ] Generate 3 keys, threshold 2
- [ ] Store 2 keys on disk (encrypted), keep 1 offline
- [ ] Systemd service auto-unseals using 2 keys
- [ ] Initial Vault setup
- [ ] Initialize Vault
- [ ] Configure storage backend (integrated raft or file)
- [ ] Set up root token management
- [ ] Enable audit logging
- [ ] Deploy to infrastructure
- [ ] Add DNS entry for vault.home.2rjus.net
- [ ] Deploy VM via terraform
- [ ] Bootstrap and verify Vault is running
**Implementation:**
- Used **OpenBao** (Vault fork) instead of HashiCorp Vault due to BSL licensing concerns
- TPM2-based auto-unseal using systemd's native `LoadCredentialEncrypted`
- Self-signed bootstrap TLS certificates (avoiding circular dependency with step-ca)
- File-based storage backend at `/var/lib/openbao`
- Unix socket + TCP listener (0.0.0.0:8200) configuration
**Deliverable:** Running Vault server that auto-unseals on boot
**Tasks:**
- [x] Create `hosts/vault01/` configuration
- [x] Basic NixOS configuration (hostname: vault01, IP: 10.69.13.19/24)
- [x] Created reusable `services/vault` module
- [x] Firewall not needed (trusted network)
- [x] Already in flake.nix, deployed via terraform
- [x] Implement auto-unseal mechanism
- [x] **TPM2-based auto-unseal** (preferred option)
- [x] systemd `LoadCredentialEncrypted` with TPM2 binding
- [x] `writeShellApplication` script with proper runtime dependencies
- [x] Reads multiple unseal keys (one per line) until unsealed
- [x] Auto-unseals on service start via `ExecStartPost`
- [x] Initial Vault setup
- [x] Initialized OpenBao with Shamir secret sharing (5 keys, threshold 3)
- [x] File storage backend
- [x] Self-signed TLS certificates via LoadCredential
- [x] Deploy to infrastructure
- [x] DNS entry added for vault.home.2rjus.net
- [x] VM deployed via terraform
- [x] Verified OpenBao running and auto-unsealing
**Changes from Original Plan:**
- Used OpenBao instead of HashiCorp Vault (licensing)
- Used systemd's native TPM2 support instead of tpm2-tools directly
- Skipped audit logging (can be enabled later)
- Used self-signed certs initially (will migrate to OpenBao PKI later)
**Deliverable:** ✅ Running OpenBao server that auto-unseals on boot using TPM2
**Documentation:**
- `/services/vault/README.md` - Service module overview
- `/docs/vault/auto-unseal.md` - Complete TPM2 auto-unseal setup guide
---
#### Phase 4b: Vault-as-Code with OpenTofu
#### Phase 4b: Vault-as-Code with OpenTofu ✅ COMPLETED
**Status:** ✅ Fully implemented and tested
**Completed:** 2026-02-02
**Goal:** Manage all Vault configuration (secrets structure, policies, roles) as code
**Implementation:**
- Complete Terraform/OpenTofu configuration in `terraform/vault/`
- Locals-based pattern (similar to `vms.tf`) for declaring secrets and policies
- Auto-generation of secrets using the `random_password` resource (from the `random` provider)
- Three-tier secrets path hierarchy: `hosts/`, `services/`, `shared/`
- PKI infrastructure with **Elliptic Curve certificates** (P-384 for CAs, P-256 for leaf certs)
- ACME support enabled on intermediate CA
**Tasks:**
- [ ] Set up Vault Terraform provider
- [ ] Create `terraform/vault/` directory
- [ ] Configure Vault provider (address, auth)
- [ ] Store Vault token securely (terraform.tfvars, gitignored)
- [ ] Enable and configure secrets engines
- [ ] Enable KV v2 secrets engine at `secret/`
- [ ] Define secret path structure (per-service, per-host)
- [ ] Example: `secret/monitoring/grafana`, `secret/postgres/ha1`
- [ ] Define policies as code
- [ ] Create policies for different service tiers
- [ ] Principle of least privilege (hosts only read their secrets)
- [ ] Example: monitoring-policy allows read on `secret/monitoring/*`
- [ ] Set up AppRole authentication
- [ ] Enable AppRole auth backend
- [ ] Create role per host type (monitoring, dns, database, etc.)
- [ ] Bind policies to roles
- [ ] Configure TTL and token policies
- [ ] Migrate existing secrets from sops-nix
- [ ] Create migration script/playbook
- [ ] Decrypt sops secrets and load into Vault KV
- [ ] Verify all secrets migrated successfully
- [ ] Keep sops as backup during transition
- [ ] Implement secrets-as-code patterns
- [ ] Secret values in gitignored terraform.tfvars
- [ ] Or use random_password for auto-generated secrets
- [ ] Secret structure/paths in version-controlled .tf files
- [x] Set up Vault Terraform provider
- [x] Created `terraform/vault/` directory
- [x] Configured Vault provider (uses HashiCorp provider, compatible with OpenBao)
- [x] Credentials in terraform.tfvars (gitignored)
- [x] terraform.tfvars.example for reference
- [x] Enable and configure secrets engines
- [x] KV v2 engine at `secret/`
- [x] Three-tier path structure:
- `secret/hosts/{hostname}/*` - Host-specific secrets
- `secret/services/{service}/*` - Service-wide secrets
- `secret/shared/{category}/*` - Shared secrets (SMTP, backups, etc.)
- [x] Define policies as code
- [x] Policies auto-generated from `locals.host_policies`
- [x] Per-host policies with read/list on designated paths
- [x] Principle of least privilege enforced
- [x] Set up AppRole authentication
- [x] AppRole backend enabled at `approle/`
- [x] Roles auto-generated per host from `locals.host_policies`
- [x] Token TTL: 1 hour, max 24 hours
- [x] Policies bound to roles
- [x] Implement secrets-as-code patterns
- [x] Auto-generated secrets using the `random_password` resource (`random` provider)
- [x] Manual secrets supported via variables in terraform.tfvars
- [x] Secret structure versioned in .tf files
- [x] Secret values excluded from git
- [x] Set up PKI infrastructure
- [x] Root CA (10 year TTL, EC P-384)
- [x] Intermediate CA (5 year TTL, EC P-384)
- [x] PKI role for `*.home.2rjus.net` (30 day max TTL, EC P-256)
- [x] ACME enabled on intermediate CA
- [x] Support for static certificate issuance via Terraform
- [x] CRL, OCSP, and issuing certificate URLs configured
**Example OpenTofu:**
```hcl
resource "vault_kv_secret_v2" "monitoring_grafana" {
mount = "secret"
name = "monitoring/grafana"
data_json = jsonencode({
admin_password = var.grafana_admin_password
smtp_password = var.smtp_password
})
}
**Changes from Original Plan:**
- Used Elliptic Curve instead of RSA for all certificates (better performance, smaller keys)
- Implemented PKI infrastructure in Phase 4b instead of Phase 4c (more logical grouping)
- ACME support configured immediately (ready for migration from step-ca)
- Did not migrate existing sops-nix secrets yet (deferred to gradual migration)
resource "vault_policy" "monitoring" {
name = "monitoring-policy"
policy = <<EOT
path "secret/data/monitoring/*" {
capabilities = ["read"]
}
EOT
}
**Files:**
- `terraform/vault/main.tf` - Provider configuration
- `terraform/vault/variables.tf` - Variable definitions
- `terraform/vault/approle.tf` - AppRole authentication (locals-based pattern)
- `terraform/vault/pki.tf` - PKI infrastructure with EC certificates
- `terraform/vault/secrets.tf` - KV secrets engine (auto-generation support)
- `terraform/vault/README.md` - Complete documentation and usage examples
- `terraform/vault/terraform.tfvars.example` - Example credentials
resource "vault_approle_auth_backend_role" "monitoring01" {
backend = "approle"
role_name = "monitoring01"
token_policies = ["monitoring-policy"]
}
```
**Deliverable:** ✅ All secrets, policies, AppRoles, and PKI managed as OpenTofu code in `terraform/vault/`
**Deliverable:** All secrets and policies managed as OpenTofu code in `terraform/vault/`
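**Sketch (auto-generated secret):** A minimal illustration of the secrets-as-code pattern, assuming a KV v2 mount at `secret/` and the three-tier path layout. The resource names and the `services/grafana` path are hypothetical; the real definitions live in `terraform/vault/secrets.tf` and may differ.

```hcl
# KV v2 secrets engine mounted at secret/
resource "vault_mount" "kv" {
  path    = "secret"
  type    = "kv"
  options = { version = "2" }
}

# Generated value: never written to git, only held in state and in the KV store
resource "random_password" "grafana_admin" {
  length  = 32
  special = false
}

# Hypothetical service-wide secret under secret/services/{service}
resource "vault_kv_secret_v2" "grafana" {
  mount = vault_mount.kv.path
  name  = "services/grafana"
  data_json = jsonencode({
    admin_password = random_password.grafana_admin.result
  })
}
```

Generated values still end up in the OpenTofu state file, so the state needs the same protection as `terraform.tfvars`.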
**Documentation:**
- `/terraform/vault/README.md` - Comprehensive guide covering:
- Setup and deployment
- AppRole usage and host access patterns
- PKI certificate issuance (ACME, static, manual)
- Secrets management patterns
- ACME configuration and troubleshooting
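
**Sketch (locals-based policies and AppRoles):** A rough picture of how per-host policies and roles can be generated from one map. The shape of `host_policies`, the `monitoring01` entry, and the policy naming are assumptions for illustration, not a copy of `terraform/vault/approle.tf`.

```hcl
locals {
  # Hypothetical entry: paths are relative to the KV v2 mount "secret/"
  host_policies = {
    "monitoring01" = {
      paths = [
        "hosts/monitoring01/*",
        "services/grafana/*",
      ]
    }
  }
}

# One policy per host. KV v2 reads go through secret/data/...,
# listing goes through secret/metadata/...
resource "vault_policy" "host" {
  for_each = local.host_policies

  name = "${each.key}-policy"
  policy = join("\n", flatten([
    for p in each.value.paths : [
      "path \"secret/data/${p}\" { capabilities = [\"read\"] }",
      "path \"secret/metadata/${p}\" { capabilities = [\"list\"] }",
    ]
  ]))
}

resource "vault_auth_backend" "approle" {
  type = "approle"
  path = "approle"
}

# One AppRole per host, bound to that host's policy only
resource "vault_approle_auth_backend_role" "host" {
  for_each = local.host_policies

  backend        = vault_auth_backend.approle.path
  role_name      = each.key
  token_policies = [vault_policy.host[each.key].name]
  token_ttl      = 3600  # 1 hour
  token_max_ttl  = 86400 # 24 hours
}
```

With this layout, adding a host is a single new entry in `host_policies`; the policy and the role follow from `for_each`.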
---
#### Phase 4c: PKI Migration (Replace step-ca)
**Goal:** Consolidate PKI infrastructure into Vault
**Goal:** Migrate hosts from step-ca to OpenBao PKI for TLS certificates
**Note:** PKI infrastructure already set up in Phase 4b (root CA, intermediate CA, ACME support)
**Tasks:**
- [ ] Set up Vault PKI engines
- [ ] Create root CA in Vault (`pki/` mount, 10 year TTL)
- [ ] Create intermediate CA (`pki_int/` mount, 5 year TTL)
- [ ] Sign intermediate with root CA
- [ ] Configure CRL and OCSP
- [ ] Enable ACME support
- [ ] Enable ACME on intermediate CA (Vault 1.14+)
- [ ] Create PKI role for homelab domain
- [ ] Set certificate TTLs and allowed domains
- [ ] Configure SSH CA in Vault
- [x] Set up OpenBao PKI engines (completed in Phase 4b)
- [x] Root CA (`pki/` mount, 10 year TTL, EC P-384)
- [x] Intermediate CA (`pki_int/` mount, 5 year TTL, EC P-384)
- [x] Signed intermediate with root CA
- [x] Configured CRL, OCSP, and issuing certificate URLs
- [x] Enable ACME support (completed in Phase 4b)
- [x] Enabled ACME on intermediate CA
- [x] Created PKI role for `*.home.2rjus.net`
- [x] Set certificate TTLs (30 day max) and allowed domains
- [x] ACME directory: `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
- [ ] Download and distribute root CA certificate
- [ ] Export root CA: `bao read -field=certificate pki/cert/ca > homelab-root-ca.crt`
- [ ] Add to NixOS trust store on all hosts via `security.pki.certificateFiles`
- [ ] Deploy via auto-upgrade
- [ ] Test certificate issuance
- [ ] Issue test certificate using ACME client (lego/certbot)
- [ ] Or issue static certificate via OpenBao CLI
- [ ] Verify certificate chain and trust
- [ ] Migrate vault01's own certificate
- [ ] Issue new certificate from OpenBao PKI (self-issued)
- [ ] Replace self-signed bootstrap certificate
- [ ] Update service configuration
- [ ] Migrate hosts from step-ca to OpenBao
- [ ] Update `system/acme.nix` to use OpenBao ACME endpoint
- [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
- [ ] Test on one host (non-critical service)
- [ ] Roll out to all hosts via auto-upgrade
- [ ] Configure SSH CA in OpenBao (optional, future work)
- [ ] Enable SSH secrets engine (`ssh/` mount)
- [ ] Generate SSH signing keys
- [ ] Create roles for host and user certificates
- [ ] Configure TTLs and allowed principals
- [ ] Migrate hosts from step-ca to Vault
- [ ] Update system/acme.nix to use Vault ACME endpoint
- [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
- [ ] Test certificate issuance on one host
- [ ] Roll out to all hosts via auto-upgrade
- [ ] Migrate SSH CA trust
- [ ] Distribute Vault SSH CA public key to all hosts
- [ ] Update sshd_config to trust Vault CA
- [ ] Test SSH certificate authentication
- [ ] Distribute SSH CA public key to all hosts
- [ ] Update sshd_config to trust OpenBao CA
- [ ] Decommission step-ca
- [ ] Verify all services migrated
- [ ] Verify all ACME services migrated and working
- [ ] Stop step-ca service on ca host
- [ ] Archive step-ca configuration for backup
- [ ] Update documentation
**Deliverable:** All TLS and SSH certificates issued by Vault, step-ca retired
**Deliverable:** All TLS certificates issued by OpenBao PKI, step-ca retired
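
**Sketch (PKI role the migration relies on):** For orientation, the role for `*.home.2rjus.net` and a static certificate could look roughly like this. The role name `homelab` and the host `example.home.2rjus.net` are hypothetical, and the snippet assumes the intermediate CA on `pki_int/` has already been generated and signed; see `terraform/vault/pki.tf` for the actual configuration.

```hcl
# Intermediate CA mount (CA generation and signing omitted here)
resource "vault_mount" "pki_int" {
  path                  = "pki_int"
  type                  = "pki"
  max_lease_ttl_seconds = 157680000 # 5 years
}

# Leaf certificates: EC P-256, at most 30 days, any subdomain of home.2rjus.net
resource "vault_pki_secret_backend_role" "homelab" {
  backend          = vault_mount.pki_int.path
  name             = "homelab" # hypothetical role name
  allowed_domains  = ["home.2rjus.net"]
  allow_subdomains = true
  key_type         = "ec"
  key_bits         = 256
  max_ttl          = "2592000" # 30 days, in seconds
}

# Static issuance via OpenTofu for a host that cannot use ACME
resource "vault_pki_secret_backend_cert" "example" {
  backend     = vault_mount.pki_int.path
  name        = vault_pki_secret_backend_role.homelab.name
  common_name = "example.home.2rjus.net" # hypothetical host
  ttl         = "720h"
}
```

Hosts using ACME do not need the static certificate resource; they only need the root CA in their trust store and the ACME directory URL listed above.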
---