From ce6d2b1d33653c8f6d4a0b60c720ae4c07444626 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Torjus=20H=C3=A5kestad?=
Date: Sat, 31 Jan 2026 22:22:19 +0100
Subject: [PATCH] docs: add TODO.md for automated deployment pipeline

Document multi-phase plan for automating NixOS host creation, deployment,
and configuration on Proxmox including OpenTofu parameterization, config
generation, bootstrap mechanism, secrets management, and Nix-based DNS
automation.

Co-Authored-By: Claude Sonnet 4.5
---
 TODO.md | 231 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 231 insertions(+)
 create mode 100644 TODO.md

diff --git a/TODO.md b/TODO.md
new file mode 100644
index 0000000..0000856
--- /dev/null
+++ b/TODO.md
@@ -0,0 +1,231 @@
+# TODO: Automated Host Deployment Pipeline
+
+## Vision
+
+Automate the entire process of creating, configuring, and deploying new NixOS hosts on Proxmox from a single command or script.
+
+**Desired workflow:**
+```bash
+./scripts/create-host.sh --hostname myhost --ip 10.69.13.50
+# Script creates config, deploys VM, bootstraps NixOS, and you're ready to go
+```
+
+**Current manual workflow (from CLAUDE.md):**
+1. Create `/hosts//` directory structure
+2. Add host to `flake.nix`
+3. Add DNS entries
+4. Clone template VM manually
+5. Run `prepare-host.sh` on new VM
+6. Add generated age key to `.sops.yaml`
+7. Configure networking
+8. Commit and push
+9. Run `nixos-rebuild boot --flake URL#` on host
+
+## The Plan
+
+### Phase 1: Parameterized OpenTofu Deployments ✓ (Partially Complete)
+
+**Status:** Template building and single VM deployment both work; parameterization is still needed
+
+**Tasks:**
+- [ ] Create module/template structure in terraform for repeatable VM deployments
+- [ ] Parameterize VM configuration (hostname, CPU, memory, disk, IP)
+- [ ] Support both DHCP and static IP configuration via cloud-init
+- [ ] Test deploying multiple VMs from the same template
+
+**Deliverable:** Can deploy a VM with custom parameters via OpenTofu
+
+---
+
+### Phase 2: Host Configuration Generator
+
+**Goal:** Automate creation of host configuration files
+
+This doesn't have to be a plain shell script; something like Python would probably make templating easier.
+
+**Tasks:**
+- [ ] Create script `scripts/create-host-config.sh`
+  - [ ] Takes parameters: hostname, IP, CPU cores, memory, disk size
+  - [ ] Generates `/hosts//` directory structure from template
+  - [ ] Creates `configuration.nix` with proper hostname and networking
+  - [ ] Generates `default.nix` with standard imports
+  - [ ] Copies/links `hardware-configuration.nix` from template
+- [ ] Add host entry to `flake.nix` programmatically
+  - [ ] Parse flake.nix
+  - [ ] Insert new nixosConfiguration entry
+  - [ ] Maintain formatting
+- [ ] Generate corresponding OpenTofu configuration
+  - [ ] Create `terraform/hosts/.tf` with VM definition
+  - [ ] Use parameters from script input
+
+**Deliverable:** Script generates all config files for a new host
+
+---
+
+### Phase 3: Bootstrap Mechanism
+
+**Goal:** Get freshly deployed VM to apply its specific host configuration
+
+**Challenge:** Chicken-and-egg problem - VM needs to know its hostname and pull the right config
+
+**Option A: Cloud-init bootstrap script**
+- [ ] Add cloud-init `runcmd` to template2 that:
+  - [ ] Reads hostname from cloud-init metadata
+  - [ ] Runs `nixos-rebuild boot --flake git+https://git.t-juice.club/torjus/nixos-servers.git#${hostname}`
+  - [ ] Reboots into the new configuration
+- [ ] Test cloud-init script execution on fresh VM
+- [ ] Handle failure cases (flake doesn't exist, network issues)
+
+**Option B: Terraform provisioner**
+- [ ] Use OpenTofu's `remote-exec` provisioner
+- [ ] SSH into new VM after creation
+- [ ] Run `nixos-rebuild boot --flake #`
+- [ ] Trigger reboot via SSH
+
+**Option C: Two-stage deployment**
+- [ ] Deploy VM with template2 (minimal config)
+- [ ] Run Ansible playbook to bootstrap specific config
+- [ ] Similar to existing `run-upgrade.yml` pattern
+
+**Decision needed:** Which approach fits best? (Recommend Option A for automation)
+
+---
+
+### Phase 4: Secrets Management Automation
+
+**Challenge:** sops needs an age key, but the age key is generated on first boot
+
+**Current workflow:**
+1. VM boots, generates age key at `/var/lib/sops-nix/key.txt`
+2. User runs `prepare-host.sh` which prints public key
+3. User manually adds public key to `.sops.yaml`
+4. User commits, pushes
+5. VM can now decrypt secrets
+
+**Proposed solution:**
+
+**Option A: Pre-generate age keys**
+- [ ] Generate age key pair during `create-host-config.sh`
+- [ ] Add public key to `.sops.yaml` immediately
+- [ ] Store private key temporarily (secure location)
+- [ ] Inject private key via cloud-init write_files or Terraform file provisioner
+- [ ] VM uses pre-configured key from first boot
+
+**Option B: Post-deployment secret injection**
+- [ ] VM boots with template, generates its own key
+- [ ] Fetch public key via SSH after first boot
+- [ ] Automatically add to `.sops.yaml` and commit
+- [ ] Trigger rebuild on VM to pick up secrets access
+
+**Option C: Separate secrets from initial deployment**
+- [ ] Initial deployment works without secrets
+- [ ] After VM is running, user manually adds age key
+- [ ] Subsequent auto-upgrades pick up secrets
+
+**Decision needed:** Option A is most automated, but requires secure key handling
+
+---
+
+### Phase 5: DNS Automation
+
+**Goal:** Automatically generate DNS entries from host configurations
+
+**Approach:** Leverage Nix to generate zone file entries from flake host configurations
+
+Since most hosts use static IPs defined in their NixOS configurations, we can extract this information and automatically generate A records. This keeps DNS in sync with the actual host configs.
+
+**Tasks:**
+- [ ] Add optional CNAME field to host configurations
+  - [ ] Add `networking.cnames = [ "alias1" "alias2" ]` or similar option
+  - [ ] Document in host configuration template
+- [ ] Create Nix function to extract DNS records from all hosts
+  - [ ] Parse each host's `networking.hostName` and IP configuration
+  - [ ] Collect any defined CNAMEs
+  - [ ] Generate zone file fragment with A and CNAME records
+- [ ] Integrate auto-generated records into zone files
+  - [ ] Keep manual entries separate (for non-flake hosts/services)
+  - [ ] Include generated fragment in main zone file
+  - [ ] Add comments showing which records are auto-generated
+- [ ] Update zone file serial number automatically
+- [ ] Test zone file validity after generation
+- [ ] Either:
+  - [ ] Automatically trigger DNS server reload (Ansible)
+  - [ ] Or document manual step: merge to master, run upgrade on ns1/ns2
+
+**Deliverable:** DNS A records and CNAMEs automatically generated from host configs
+
+---
+
+### Phase 6: Integration Script
+
+**Goal:** Single command to create and deploy a new host
+
+**Tasks:**
+- [ ] Create `scripts/create-host.sh` master script that orchestrates:
+  1. Prompts for: hostname, IP (or DHCP), CPU, memory, disk
+  2. Validates inputs (IP not in use, hostname unique, etc.)
+  3. Calls host config generator (Phase 2)
+  4. Generates OpenTofu config (Phase 2)
+  5. Handles secrets (Phase 4)
+  6. Updates DNS (Phase 5)
+  7. Commits all changes to git
+  8. Runs `tofu apply` to deploy VM
+  9. Waits for bootstrap to complete (Phase 3)
+  10. Prints success message with IP and SSH command
+- [ ] Add `--dry-run` flag to preview changes
+- [ ] Add `--interactive` mode vs `--batch` mode
+- [ ] Error handling and rollback on failures
+
+**Deliverable:** `./scripts/create-host.sh --hostname myhost --ip 10.69.13.50` creates a fully working host
+
+---
+
+### Phase 7: Testing & Documentation
+
+**Tasks:**
+- [ ] Test full pipeline end-to-end
+- [ ] Create test host and verify all steps
+- [ ] Document the new workflow in CLAUDE.md
+- [ ] Add troubleshooting section
+- [ ] Create examples for common scenarios (DHCP host, static IP host, etc.)
+
+---
+
+## Open Questions
+
+1. **Bootstrap method:** Cloud-init runcmd vs Terraform provisioner vs Ansible?
+2. **Secrets handling:** Pre-generate keys vs post-deployment injection?
+3. **DNS automation:** Auto-commit or manual merge?
+4. **Git workflow:** Auto-push changes or leave for user review?
+5. **Template selection:** Single template2 or multiple templates for different host types?
+6. **Networking:** Always DHCP initially, or support static IP from start?
+7. **Error recovery:** What happens if bootstrap fails? Manual intervention or retry?
+
+## Implementation Order
+
+Recommended sequence:
+1. Phase 1: Parameterize OpenTofu (foundation for testing)
+2. Phase 3: Bootstrap mechanism (core automation)
+3. Phase 2: Config generator (automate the boilerplate)
+4. Phase 4: Secrets (solves biggest chicken-and-egg)
+5. Phase 5: DNS (nice-to-have automation)
+6. Phase 6: Integration script (ties it all together)
+7. Phase 7: Testing & docs
+
+## Success Criteria
+
+When complete, creating a new host should:
+- Take < 5 minutes of human time
+- Require minimal user input (hostname, IP, basic specs)
+- Result in a fully configured, secret-enabled, DNS-registered host
+- Be reproducible and documented
+- Handle common errors gracefully
+
+---
+
+## Notes
+
+- Keep incremental commits at each phase
+- Test each phase independently before moving to next
+- Maintain backward compatibility with manual workflow
+- Document any manual steps that can't be automated
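
Reviewer sketch (not part of the patch): the Phase 6 step "Validates inputs (IP not in use, hostname unique, etc.)" could look roughly like the following if the generator ends up in Python, as the Phase 2 note suggests it might. Everything here is an assumption for illustration: the function name `validate_new_host`, the `EXISTING_HOSTS` inventory and its addresses are made up, and the `/24` subnet is only inferred from the example IP `10.69.13.50`. A real version would read the inventory from the flake or zone data.

```python
import ipaddress
import re

# Hypothetical inventory of already-deployed hosts. The addresses are
# invented for this example; a real script would derive them from the
# flake's host configurations or the DNS zone data.
EXISTING_HOSTS = {
    "ns1": "10.69.13.5",
    "ns2": "10.69.13.6",
}

# Assumed subnet, inferred only from the example IP 10.69.13.50.
SUBNET = ipaddress.ip_network("10.69.13.0/24")

# A single DNS label: lowercase letters, digits and hyphens, at most 63
# characters, with no leading or trailing hyphen.
HOSTNAME_RE = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def validate_new_host(hostname, ip, existing=EXISTING_HOSTS, subnet=SUBNET):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    # Hostname must be a valid label and not already taken.
    if not HOSTNAME_RE.fullmatch(hostname):
        problems.append(f"invalid hostname: {hostname!r}")
    elif hostname in existing:
        problems.append(f"hostname already in use: {hostname}")

    # IP must parse; the remaining IP checks make no sense if it does not.
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        problems.append(f"invalid IP address: {ip!r}")
        return problems

    if addr not in subnet:
        problems.append(f"{ip} is outside {subnet}")
    if str(addr) in existing.values():
        problems.append(f"IP already in use: {ip}")
    return problems
```

An orchestrating script would call `validate_new_host("myhost", "10.69.13.50")` before generating any files and stop if the returned list is non-empty, which keeps the later steps (commit, `tofu apply`) from running on bad input.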