planning: update TODO.md
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
224
TODO.md
@@ -153,9 +153,9 @@ create-host \

 ---

-### Phase 4: Secrets Management Automation
+### Phase 4: Secrets Management with HashiCorp Vault

-**Challenge:** sops needs age key, but age key is generated on first boot
+**Challenge:** Current sops-nix approach has chicken-and-egg problem with age keys

 **Current workflow:**
 1. VM boots, generates age key at `/var/lib/sops-nix/key.txt`
@@ -164,27 +164,213 @@ create-host \
 4. User commits, pushes
 5. VM can now decrypt secrets

-**Proposed solution:**
+**Selected approach:** Migrate to HashiCorp Vault for centralized secrets management

-**Option A: Pre-generate age keys**
-- [ ] Generate age key pair during `create-host-config.sh`
-- [ ] Add public key to `.sops.yaml` immediately
-- [ ] Store private key temporarily (secure location)
-- [ ] Inject private key via cloud-init write_files or Terraform file provisioner
-- [ ] VM uses pre-configured key from first boot
+**Benefits:**
+- Industry-standard secrets management (Vault experience transferable to work)
+- Eliminates manual age key distribution step
+- Secrets-as-code via OpenTofu (infrastructure-as-code aligned)
+- Centralized PKI management (replaces step-ca, consolidates TLS + SSH CA)
+- Automatic secret rotation capabilities
+- Audit logging for all secret access
+- AppRole authentication enables automated bootstrap

-**Option B: Post-deployment secret injection**
-- [ ] VM boots with template, generates its own key
-- [ ] Fetch public key via SSH after first boot
-- [ ] Automatically add to `.sops.yaml` and commit
-- [ ] Trigger rebuild on VM to pick up secrets access
+**Architecture:**
+```
+vault.home.2rjus.net
+├─ KV Secrets Engine (replaces sops-nix)
+├─ PKI Engine (replaces step-ca for TLS)
+├─ SSH CA Engine (replaces step-ca SSH CA)
+└─ AppRole Auth (per-host authentication)
+↓
+New hosts authenticate on first boot
+Fetch secrets via Vault API
+No manual key distribution needed
+```

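A minimal OpenTofu sketch of how the KV engine and the AppRole auth method from the diagram might be enabled, together with a file audit device for the audit-logging benefit. The mount paths `secret` and `approle` match the paths used elsewhere in this TODO; the resource labels, the description text, and the audit log path are assumptions made for illustration.

```hcl
# KV v2 engine mounted at secret/ (the sops-nix replacement in the diagram).
resource "vault_mount" "kv" {
  path        = "secret"
  type        = "kv"
  options     = { version = "2" }
  description = "Per-service and per-host secrets"
}

# AppRole auth method used for per-host authentication.
resource "vault_auth_backend" "approle" {
  type = "approle"
  path = "approle"
}

# File audit device. The log path is an assumption, not taken from the TODO.
resource "vault_audit" "file" {
  type = "file"

  options = {
    file_path = "/var/log/vault/audit.log"
  }
}
```

Because the engine is KV v2, policies have to address `secret/data/...` rather than `secret/...`, which is why the policy example in this TODO uses `secret/data/monitoring/*`.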
-**Option C: Separate secrets from initial deployment**
-- [ ] Initial deployment works without secrets
-- [ ] After VM is running, user manually adds age key
-- [ ] Subsequent auto-upgrades pick up secrets
+---

-**Decision needed:** Option A is most automated, but requires secure key handling
+#### Phase 4a: Vault Server Setup
+
+**Goal:** Deploy and configure Vault server with auto-unseal
+
+**Tasks:**
+- [ ] Create `hosts/vault01/` configuration
+  - [ ] Basic NixOS configuration (hostname, networking, etc.)
+  - [ ] Vault service configuration
+  - [ ] Firewall rules (8200 for API, 8201 for cluster)
+  - [ ] Add to flake.nix and terraform
+- [ ] Implement auto-unseal mechanism
+  - [ ] **Preferred:** TPM-based auto-unseal if hardware supports it
+    - [ ] Use tpm2-tools to seal/unseal Vault keys
+    - [ ] Systemd service to unseal on boot
+  - [ ] **Fallback:** Shamir secret sharing with systemd automation
+    - [ ] Generate 3 keys, threshold 2
+    - [ ] Store 2 keys on disk (encrypted), keep 1 offline
+    - [ ] Systemd service auto-unseals using 2 keys
+- [ ] Initial Vault setup
+  - [ ] Initialize Vault
+  - [ ] Configure storage backend (integrated raft or file)
+  - [ ] Set up root token management
+  - [ ] Enable audit logging
+- [ ] Deploy to infrastructure
+  - [ ] Add DNS entry for vault.home.2rjus.net
+  - [ ] Deploy VM via terraform
+  - [ ] Bootstrap and verify Vault is running
+
+**Deliverable:** Running Vault server that auto-unseals on boot
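A sketch of what the Vault server configuration for these tasks could look like, written in Vault's own HCL config format. It assumes the integrated raft storage option and the API and cluster ports named in the firewall task (8200 and 8201). The data directory, node ID, and TLS file paths are hypothetical, and on NixOS the equivalent would likely be generated from the Vault service module options rather than written by hand.

```hcl
# Enable the web UI; optional for a homelab setup.
ui = true

# Commonly disabled when using integrated storage.
disable_mlock = true

# Addresses advertised to clients and to other cluster members.
api_addr     = "https://vault.home.2rjus.net:8200"
cluster_addr = "https://vault.home.2rjus.net:8201"

# Integrated raft storage. Path and node_id are assumptions.
storage "raft" {
  path    = "/var/lib/vault"
  node_id = "vault01"
}

# API listener on 8200, cluster traffic on 8201.
listener "tcp" {
  address         = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_cert_file   = "/var/lib/vault/tls/cert.pem" # hypothetical path
  tls_key_file    = "/var/lib/vault/tls/key.pem"  # hypothetical path
}
```

For the Shamir fallback, `vault operator init -key-shares=3 -key-threshold=2` produces the 3-key, threshold-2 split listed in the tasks. The TPM-based unseal is not a Vault seal type, so it would live in the separate systemd unit rather than in this file.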
+
+---
+
+#### Phase 4b: Vault-as-Code with OpenTofu
+
+**Goal:** Manage all Vault configuration (secrets structure, policies, roles) as code
+
+**Tasks:**
+- [ ] Set up Vault Terraform provider
+  - [ ] Create `terraform/vault/` directory
+  - [ ] Configure Vault provider (address, auth)
+  - [ ] Store Vault token securely (terraform.tfvars, gitignored)
+- [ ] Enable and configure secrets engines
+  - [ ] Enable KV v2 secrets engine at `secret/`
+  - [ ] Define secret path structure (per-service, per-host)
+  - [ ] Example: `secret/monitoring/grafana`, `secret/postgres/ha1`
+- [ ] Define policies as code
+  - [ ] Create policies for different service tiers
+  - [ ] Principle of least privilege (hosts only read their secrets)
+  - [ ] Example: monitoring-policy allows read on `secret/monitoring/*`
+- [ ] Set up AppRole authentication
+  - [ ] Enable AppRole auth backend
+  - [ ] Create role per host type (monitoring, dns, database, etc.)
+  - [ ] Bind policies to roles
+  - [ ] Configure TTL and token policies
+- [ ] Migrate existing secrets from sops-nix
+  - [ ] Create migration script/playbook
+  - [ ] Decrypt sops secrets and load into Vault KV
+  - [ ] Verify all secrets migrated successfully
+  - [ ] Keep sops as backup during transition
+- [ ] Implement secrets-as-code patterns
+  - [ ] Secret values in gitignored terraform.tfvars
+  - [ ] Or use random_password for auto-generated secrets
+  - [ ] Secret structure/paths in version-controlled .tf files
+
+**Example OpenTofu:**
+```hcl
+resource "vault_kv_secret_v2" "monitoring_grafana" {
+  mount = "secret"
+  name = "monitoring/grafana"
+  data_json = jsonencode({
+    admin_password = var.grafana_admin_password
+    smtp_password = var.smtp_password
+  })
+}
+
+resource "vault_policy" "monitoring" {
+  name = "monitoring-policy"
+  policy = <<EOT
+path "secret/data/monitoring/*" {
+  capabilities = ["read"]
+}
+EOT
+}
+
+resource "vault_approle_auth_backend_role" "monitoring01" {
+  backend = "approle"
+  role_name = "monitoring01"
+  token_policies = ["monitoring-policy"]
+}
+```
+
+**Deliverable:** All secrets and policies managed as OpenTofu code in `terraform/vault/`
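The provider setup and the `random_password` variant from the task list might look roughly like the sketch below. The variable name `vault_token`, the resource labels, and the secret path `monitoring/example-service` are hypothetical; the provider address reuses the hostname from this TODO, and the KV mount at `secret` is assumed to exist already.

```hcl
terraform {
  required_providers {
    vault = {
      source = "hashicorp/vault"
    }
    random = {
      source = "hashicorp/random"
    }
  }
}

# Supplied through a gitignored terraform.tfvars, as the tasks suggest.
variable "vault_token" {
  type      = string
  sensitive = true
}

provider "vault" {
  address = "https://vault.home.2rjus.net:8200"
  token   = var.vault_token
}

# Auto-generated secret value: no plaintext in tfvars, only in state.
resource "random_password" "example_service" {
  length  = 32
  special = false
}

# Hypothetical secret path used only for this illustration.
resource "vault_kv_secret_v2" "example_service" {
  mount = "secret"
  name  = "monitoring/example-service"

  data_json = jsonencode({
    password = random_password.example_service.result
  })
}
```

Either way the secret values end up in the OpenTofu state file, so the state needs the same protection as `terraform.tfvars`.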
+
+---
+
+#### Phase 4c: PKI Migration (Replace step-ca)
+
+**Goal:** Consolidate PKI infrastructure into Vault
+
+**Tasks:**
+- [ ] Set up Vault PKI engines
+  - [ ] Create root CA in Vault (`pki/` mount, 10 year TTL)
+  - [ ] Create intermediate CA (`pki_int/` mount, 5 year TTL)
+  - [ ] Sign intermediate with root CA
+  - [ ] Configure CRL and OCSP
+- [ ] Enable ACME support
+  - [ ] Enable ACME on intermediate CA (Vault 1.14+)
+  - [ ] Create PKI role for homelab domain
+  - [ ] Set certificate TTLs and allowed domains
+- [ ] Configure SSH CA in Vault
+  - [ ] Enable SSH secrets engine (`ssh/` mount)
+  - [ ] Generate SSH signing keys
+  - [ ] Create roles for host and user certificates
+  - [ ] Configure TTLs and allowed principals
+- [ ] Migrate hosts from step-ca to Vault
+  - [ ] Update system/acme.nix to use Vault ACME endpoint
+    - [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
+  - [ ] Test certificate issuance on one host
+  - [ ] Roll out to all hosts via auto-upgrade
+- [ ] Migrate SSH CA trust
+  - [ ] Distribute Vault SSH CA public key to all hosts
+  - [ ] Update sshd_config to trust Vault CA
+  - [ ] Test SSH certificate authentication
+- [ ] Decommission step-ca
+  - [ ] Verify all services migrated
+  - [ ] Stop step-ca service on ca host
+  - [ ] Archive step-ca configuration for backup
+
+**Deliverable:** All TLS and SSH certificates issued by Vault, step-ca retired
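One possible OpenTofu shape for the root and intermediate CA chain and the SSH CA described above. The mount paths and the 10-year and 5-year TTLs come from the task list; the common names, the role names, the 90-day and 30-day certificate TTLs, and the use of `home.2rjus.net` as the allowed domain are assumptions. ACME and CRL/OCSP configuration are left out of this sketch.

```hcl
resource "vault_mount" "pki_root" {
  path                  = "pki"
  type                  = "pki"
  max_lease_ttl_seconds = 315360000 # 10 years
}

resource "vault_mount" "pki_int" {
  path                  = "pki_int"
  type                  = "pki"
  max_lease_ttl_seconds = 157680000 # 5 years
}

# Root CA generated inside Vault; the private key never leaves it.
resource "vault_pki_secret_backend_root_cert" "root" {
  backend     = vault_mount.pki_root.path
  type        = "internal"
  common_name = "Homelab Root CA" # hypothetical name
  ttl         = "87600h"
}

# The intermediate generates a CSR, the root signs it, and the signed
# certificate is installed back into the intermediate mount.
resource "vault_pki_secret_backend_intermediate_cert_request" "int" {
  backend     = vault_mount.pki_int.path
  type        = "internal"
  common_name = "Homelab Intermediate CA" # hypothetical name
}

resource "vault_pki_secret_backend_root_sign_intermediate" "int" {
  backend     = vault_mount.pki_root.path
  csr         = vault_pki_secret_backend_intermediate_cert_request.int.csr
  common_name = "Homelab Intermediate CA"
  ttl         = "43800h"
  depends_on  = [vault_pki_secret_backend_root_cert.root]
}

resource "vault_pki_secret_backend_intermediate_set_signed" "int" {
  backend     = vault_mount.pki_int.path
  certificate = vault_pki_secret_backend_root_sign_intermediate.int.certificate
}

# Issuing role for the homelab domain; the TTL is an assumption.
resource "vault_pki_secret_backend_role" "homelab" {
  backend          = vault_mount.pki_int.path
  name             = "homelab"
  allowed_domains  = ["home.2rjus.net"]
  allow_subdomains = true
  max_ttl          = "2160h"
}

# SSH CA: mount, generated signing key, and a host-certificate role.
resource "vault_mount" "ssh" {
  path = "ssh"
  type = "ssh"
}

resource "vault_ssh_secret_backend_ca" "ca" {
  backend              = vault_mount.ssh.path
  generate_signing_key = true
}

resource "vault_ssh_secret_backend_role" "host" {
  backend                 = vault_mount.ssh.path
  name                    = "host"
  key_type                = "ca"
  allow_host_certificates = true
  allowed_domains         = "home.2rjus.net"
  allow_subdomains        = true
  ttl                     = "720h"
}
```

A user-certificate role would follow the same pattern with `allow_user_certificates = true` and a list of allowed principals.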
+
+---
+
+#### Phase 4d: Bootstrap Integration
+
+**Goal:** New hosts automatically authenticate to Vault on first boot, no manual steps
+
+**Tasks:**
+- [ ] Update create-host tool
+  - [ ] Generate AppRole role_id + secret_id for new host
+  - [ ] Or create wrapped token for one-time bootstrap
+  - [ ] Add host-specific policy to Vault (via terraform)
+  - [ ] Store bootstrap credentials for cloud-init injection
+- [ ] Update template2 for Vault authentication
+  - [ ] Create Vault authentication module
+    - [ ] Reads bootstrap credentials from cloud-init
+    - [ ] Authenticates to Vault, retrieves permanent AppRole credentials
+    - [ ] Stores role_id + secret_id locally for services to use
+- [ ] Create NixOS Vault secrets module
+  - [ ] Replacement for sops.secrets
+  - [ ] Fetches secrets from Vault at nixos-rebuild/activation time
+  - [ ] Or runtime secret fetching for services
+  - [ ] Handle Vault token renewal
+- [ ] Update bootstrap service
+  - [ ] After authenticating to Vault, fetch any bootstrap secrets
+  - [ ] Run nixos-rebuild with host configuration
+  - [ ] Services automatically fetch their secrets from Vault
+- [ ] Update terraform cloud-init
+  - [ ] Inject Vault address and bootstrap credentials
+  - [ ] Pass via cloud-init user-data or write_files
+  - [ ] Credentials scoped to single use or short TTL
+- [ ] Test complete flow
+  - [ ] Run create-host to generate new host config
+  - [ ] Deploy with terraform
+  - [ ] Verify host bootstraps and authenticates to Vault
+  - [ ] Verify services can fetch secrets
+  - [ ] Confirm no manual steps required
+
+**Bootstrap flow:**
+```
+1. terraform apply (deploys VM with cloud-init)
+2. Cloud-init sets hostname + Vault bootstrap credentials
+3. nixos-bootstrap.service runs:
+   - Authenticates to Vault with bootstrap credentials
+   - Retrieves permanent AppRole credentials
+   - Stores locally for service use
+   - Runs nixos-rebuild
+4. Host services fetch secrets from Vault as needed
+5. Done - no manual intervention
+```
+
+**Deliverable:** Fully automated secrets access from first boot, zero manual steps
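A hedged sketch of the "wrapped token for one-time bootstrap" option from the task list, expressed in OpenTofu. The host name `example01`, its policy name, the TTL values, and the file paths under `/run/vault-bootstrap/` are all hypothetical, and the AppRole backend is assumed to be mounted at `approle`. The `secret_id` is response-wrapped, so cloud-init only carries a short-lived wrapping token that can be unwrapped once.

```hcl
locals {
  host = "example01" # hypothetical host name
}

resource "vault_approle_auth_backend_role" "host" {
  backend        = "approle"
  role_name      = local.host
  token_policies = ["${local.host}-policy"]
  token_ttl      = 3600  # seconds
  token_max_ttl  = 86400 # seconds
}

# With wrapping_ttl set, the provider returns a wrapping token
# instead of the plain secret_id.
resource "vault_approle_auth_backend_role_secret_id" "bootstrap" {
  backend      = vault_approle_auth_backend_role.host.backend
  role_name    = vault_approle_auth_backend_role.host.role_name
  wrapping_ttl = "15m"
}

# cloud-init user-data that drops the bootstrap credentials on disk.
locals {
  bootstrap_user_data = join("\n", [
    "#cloud-config",
    yamlencode({
      write_files = [
        {
          path        = "/run/vault-bootstrap/role-id"
          permissions = "0600"
          content     = vault_approle_auth_backend_role.host.role_id
        },
        {
          path        = "/run/vault-bootstrap/wrapped-secret-id"
          permissions = "0600"
          content     = vault_approle_auth_backend_role_secret_id.bootstrap.wrapping_token
        },
      ]
    }),
  ])
}

output "bootstrap_user_data" {
  value     = local.bootstrap_user_data
  sensitive = true
}
```

On the host, the bootstrap service would unwrap the token (for example with `vault unwrap`) to obtain the real `secret_id` before logging in via AppRole. The wrapping token still passes through the OpenTofu state and the VM's user-data, which is why the short TTL and single-use property matter here.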

 ---
