From e0ad445341716737b50b053bcec4ffaddbcf0c8d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Torjus=20H=C3=A5kestad?=
Date: Sun, 1 Feb 2026 20:05:56 +0100
Subject: [PATCH] planning: update TODO.md

---
 TODO.md | 224 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 205 insertions(+), 19 deletions(-)

diff --git a/TODO.md b/TODO.md
index 5ee72c5..46f314f 100644
--- a/TODO.md
+++ b/TODO.md
@@ -153,9 +153,9 @@ create-host \
 
 ---
 
-### Phase 4: Secrets Management Automation
+### Phase 4: Secrets Management with HashiCorp Vault
 
-**Challenge:** sops needs age key, but age key is generated on first boot
+**Challenge:** Current sops-nix approach has chicken-and-egg problem with age keys
 
 **Current workflow:**
 1. VM boots, generates age key at `/var/lib/sops-nix/key.txt`
@@ -164,27 +164,213 @@ create-host \
 4. User commits, pushes
 5. VM can now decrypt secrets
 
-**Proposed solution:**
+**Selected approach:** Migrate to HashiCorp Vault for centralized secrets management
 
-**Option A: Pre-generate age keys**
-- [ ] Generate age key pair during `create-host-config.sh`
-- [ ] Add public key to `.sops.yaml` immediately
-- [ ] Store private key temporarily (secure location)
-- [ ] Inject private key via cloud-init write_files or Terraform file provisioner
-- [ ] VM uses pre-configured key from first boot
+**Benefits:**
+- Industry-standard secrets management (Vault experience transferable to work)
+- Eliminates manual age key distribution step
+- Secrets-as-code via OpenTofu (infrastructure-as-code aligned)
+- Centralized PKI management (replaces step-ca, consolidates TLS + SSH CA)
+- Automatic secret rotation capabilities
+- Audit logging for all secret access
+- AppRole authentication enables automated bootstrap
 
-**Option B: Post-deployment secret injection**
-- [ ] VM boots with template, generates its own key
-- [ ] Fetch public key via SSH after first boot
-- [ ] Automatically add to `.sops.yaml` and commit
-- [ ] Trigger rebuild on VM to pick up secrets access
+**Architecture:**
+```
+vault.home.2rjus.net
+  ├─ KV Secrets Engine (replaces sops-nix)
+  ├─ PKI Engine (replaces step-ca for TLS)
+  ├─ SSH CA Engine (replaces step-ca SSH CA)
+  └─ AppRole Auth (per-host authentication)
+         ↓
+  New hosts authenticate on first boot
+  Fetch secrets via Vault API
+  No manual key distribution needed
+```
 
-**Option C: Separate secrets from initial deployment**
-- [ ] Initial deployment works without secrets
-- [ ] After VM is running, user manually adds age key
-- [ ] Subsequent auto-upgrades pick up secrets
+---
 
-**Decision needed:** Option A is most automated, but requires secure key handling
+#### Phase 4a: Vault Server Setup
+
+**Goal:** Deploy and configure Vault server with auto-unseal
+
+**Tasks:**
+- [ ] Create `hosts/vault01/` configuration
+  - [ ] Basic NixOS configuration (hostname, networking, etc.)
+  - [ ] Vault service configuration
+  - [ ] Firewall rules (8200 for API, 8201 for cluster)
+  - [ ] Add to flake.nix and terraform
+- [ ] Implement auto-unseal mechanism
+  - [ ] **Preferred:** TPM-based auto-unseal if hardware supports it
+    - [ ] Use tpm2-tools to seal/unseal Vault keys
+    - [ ] Systemd service to unseal on boot
+  - [ ] **Fallback:** Shamir secret sharing with systemd automation
+    - [ ] Generate 3 keys, threshold 2
+    - [ ] Store 2 keys on disk (encrypted), keep 1 offline
+    - [ ] Systemd service auto-unseals using 2 keys
+- [ ] Initial Vault setup
+  - [ ] Initialize Vault
+  - [ ] Configure storage backend (integrated raft or file)
+  - [ ] Set up root token management
+  - [ ] Enable audit logging
+- [ ] Deploy to infrastructure
+  - [ ] Add DNS entry for vault.home.2rjus.net
+  - [ ] Deploy VM via terraform
+  - [ ] Bootstrap and verify Vault is running
+
+**Deliverable:** Running Vault server that auto-unseals on boot
+
+---
+
+#### Phase 4b: Vault-as-Code with OpenTofu
+
+**Goal:** Manage all Vault configuration (secrets structure, policies, roles) as code
+
+**Tasks:**
+- [ ] Set up Vault Terraform provider
+  - [ ] Create `terraform/vault/` directory
+  - [ ] Configure Vault provider (address, auth)
+  - [ ] Store Vault token securely (terraform.tfvars, gitignored)
+- [ ] Enable and configure secrets engines
+  - [ ] Enable KV v2 secrets engine at `secret/`
+  - [ ] Define secret path structure (per-service, per-host)
+  - [ ] Example: `secret/monitoring/grafana`, `secret/postgres/ha1`
+- [ ] Define policies as code
+  - [ ] Create policies for different service tiers
+  - [ ] Principle of least privilege (hosts only read their secrets)
+  - [ ] Example: monitoring-policy allows read on `secret/monitoring/*`
+- [ ] Set up AppRole authentication
+  - [ ] Enable AppRole auth backend
+  - [ ] Create role per host type (monitoring, dns, database, etc.)
+  - [ ] Bind policies to roles
+  - [ ] Configure TTL and token policies
+- [ ] Migrate existing secrets from sops-nix
+  - [ ] Create migration script/playbook
+  - [ ] Decrypt sops secrets and load into Vault KV
+  - [ ] Verify all secrets migrated successfully
+  - [ ] Keep sops as backup during transition
+- [ ] Implement secrets-as-code patterns
+  - [ ] Secret values in gitignored terraform.tfvars
+  - [ ] Or use random_password for auto-generated secrets
+  - [ ] Secret structure/paths in version-controlled .tf files
+
+**Example OpenTofu:**
+```hcl
+resource "vault_kv_secret_v2" "monitoring_grafana" {
+  mount = "secret"
+  name = "monitoring/grafana"
+  data_json = jsonencode({
+    admin_password = var.grafana_admin_password
+    smtp_password = var.smtp_password
+  })
+}
+
+resource "vault_policy" "monitoring" {
+  name = "monitoring-policy"
+  policy = <