opentofu-experiments #4
231
TODO.md
Normal file
@@ -0,0 +1,231 @@
# TODO: Automated Host Deployment Pipeline

## Vision

Automate the entire process of creating, configuring, and deploying new NixOS hosts on Proxmox from a single command or script.

**Desired workflow:**
```bash
./scripts/create-host.sh --hostname myhost --ip 10.69.13.50
# Script creates config, deploys VM, bootstraps NixOS, and you're ready to go
```

**Current manual workflow (from CLAUDE.md):**
1. Create `/hosts/<hostname>/` directory structure
2. Add host to `flake.nix`
3. Add DNS entries
4. Clone template VM manually
5. Run `prepare-host.sh` on new VM
6. Add generated age key to `.sops.yaml`
7. Configure networking
8. Commit and push
9. Run `nixos-rebuild boot --flake URL#<hostname>` on host

## The Plan

### Phase 1: Parameterized OpenTofu Deployments ✓ (Partially Complete)

**Status:** Template building and single-VM deployment work; parameterization is still needed

**Tasks:**
- [ ] Create module/template structure in terraform for repeatable VM deployments
- [ ] Parameterize VM configuration (hostname, CPU, memory, disk, IP)
- [ ] Support both DHCP and static IP configuration via cloud-init
- [ ] Test deploying multiple VMs from same template

**Deliverable:** Can deploy a VM with custom parameters via OpenTofu

---

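For the DHCP-versus-static task in Phase 1, one option is to compute the Proxmox cloud-init `ipconfig0` value in a single place and pass it to OpenTofu as a parameter. The following is a minimal sketch, not the repository's implementation: the function name `ipconfig_for` is hypothetical, and the `/24` prefix and the gateway at `.1` of the host's subnet are assumptions that this TODO does not state.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the value for Proxmox's cloud-init ipconfig0 option.
# Assumptions (not from this TODO): /24 prefix, gateway is the .1 address
# of the host's subnet.
ipconfig_for() {
  local ip=${1:-dhcp} prefix=${2:-24}

  # No address (or the literal "dhcp") means the VM should use DHCP.
  if [[ $ip == dhcp ]]; then
    echo "ip=dhcp"
    return 0
  fi

  # Strip the last octet and append .1 to guess the gateway.
  local gateway="${ip%.*}.1"
  echo "ip=${ip}/${prefix},gw=${gateway}"
}
```

Calling `ipconfig_for 10.69.13.50` prints `ip=10.69.13.50/24,gw=10.69.13.1`, and calling it with no argument prints `ip=dhcp`, so the same VM definition can serve both cases.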
### Phase 2: Host Configuration Generator

**Goal:** Automate creation of host configuration files

This doesn't have to be a plain shell script; something like Python would probably make templating easier.

**Tasks:**
- [ ] Create script `scripts/create-host-config.sh`
  - [ ] Takes parameters: hostname, IP, CPU cores, memory, disk size
  - [ ] Generates `/hosts/<hostname>/` directory structure from template
  - [ ] Creates `configuration.nix` with proper hostname and networking
  - [ ] Generates `default.nix` with standard imports
  - [ ] Copies/links `hardware-configuration.nix` from template
- [ ] Add host entry to `flake.nix` programmatically
  - [ ] Parse `flake.nix`
  - [ ] Insert new nixosConfiguration entry
  - [ ] Maintain formatting
- [ ] Generate corresponding OpenTofu configuration
  - [ ] Create `terraform/hosts/<hostname>.tf` with VM definition
  - [ ] Use parameters from script input

**Deliverable:** Script generates all config files for a new host

---

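The last Phase 2 task (generating `terraform/hosts/<hostname>.tf` from script input) could be sketched with a heredoc, as below. This is an illustration under assumptions: the module path `../modules/nixos-vm`, its variable names, and the default sizes are all hypothetical and would need to match whatever module Phase 1 produces.

```bash
#!/usr/bin/env bash
# Sketch: write <out_dir>/<hostname>.tf from script parameters.
# Hypothetical: the module path, the variable names, and the defaults
# (2 cores, 2048 MB, 20 GB) are placeholders, not values from this repo.
generate_tofu_config() {
  local hostname=$1 ip=$2
  local cores=${3:-2} memory_mb=${4:-2048} disk_gb=${5:-20}
  local out_dir=${6:-terraform/hosts}

  mkdir -p "$out_dir"

  # Unquoted EOF so the shell expands the variables inside the template.
  cat > "$out_dir/$hostname.tf" <<EOF
module "$hostname" {
  source    = "../modules/nixos-vm"

  hostname  = "$hostname"
  ip        = "$ip"
  cores     = $cores
  memory_mb = $memory_mb
  disk_gb   = $disk_gb
}
EOF

  # Print the path so callers can stage the file in git.
  echo "$out_dir/$hostname.tf"
}
```

For example, `generate_tofu_config myhost 10.69.13.50 4 4096 32` would write `terraform/hosts/myhost.tf`. Keeping one file per host means removing a host later is a single file deletion rather than an edit to a shared file.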
### Phase 3: Bootstrap Mechanism

**Goal:** Get freshly deployed VM to apply its specific host configuration

**Challenge:** Chicken-and-egg problem: the VM needs to know its hostname before it can pull the right config

**Option A: Cloud-init bootstrap script**
- [ ] Add cloud-init `runcmd` to template2 that:
  - [ ] Reads hostname from cloud-init metadata
  - [ ] Runs `nixos-rebuild boot --flake git+https://git.t-juice.club/torjus/nixos-servers.git#${hostname}`
  - [ ] Reboots into the new configuration
- [ ] Test cloud-init script execution on fresh VM
- [ ] Handle failure cases (flake doesn't exist, network issues)

**Option B: Terraform provisioner**
- [ ] Use OpenTofu's `remote-exec` provisioner
  - [ ] SSH into new VM after creation
  - [ ] Run `nixos-rebuild boot --flake <url>#<hostname>`
  - [ ] Trigger reboot via SSH

**Option C: Two-stage deployment**
- [ ] Deploy VM with template2 (minimal config)
- [ ] Run Ansible playbook to bootstrap specific config
  - [ ] Similar to existing `run-upgrade.yml` pattern

**Decision needed:** Which approach fits best? (Recommend Option A for automation)

---

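For Option A in Phase 3, the script that `runcmd` invokes might look roughly like the sketch below. It assumes cloud-init has already set the system hostname by the time the script runs, so the hostname is read from the kernel rather than from cloud-init metadata directly. The retry count and delay are arbitrary placeholders, and `flake_ref`, `retry`, and `bootstrap` are hypothetical names.

```bash
#!/usr/bin/env bash
# Flake URL taken from the Option A task list.
FLAKE_URL="git+https://git.t-juice.club/torjus/nixos-servers.git"

# Build the flake reference for a given host, e.g. <url>#myhost.
flake_ref() {
  printf '%s#%s\n' "$FLAKE_URL" "$1"
}

# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Re-runs COMMAND until it succeeds or ATTEMPTS is exhausted, which covers
# transient failures such as the network not being up yet.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Apply the host-specific configuration and reboot into it.
# Assumption: the hostname was already applied by cloud-init.
bootstrap() {
  local host=${1:-$(</proc/sys/kernel/hostname)}
  # 5 attempts, 30 s apart: placeholder values, tune as needed.
  retry 5 30 nixos-rebuild boot --flake "$(flake_ref "$host")" || return 1
  systemctl reboot
}
```

If the flake has no entry for the host, every attempt fails and `bootstrap` returns non-zero without rebooting, which leaves the VM on the template configuration for manual inspection.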
### Phase 4: Secrets Management Automation

**Challenge:** sops needs the host's age key, but that key is only generated on first boot

**Current workflow:**
1. VM boots, generates age key at `/var/lib/sops-nix/key.txt`
2. User runs `prepare-host.sh` which prints public key
3. User manually adds public key to `.sops.yaml`
4. User commits, pushes
5. VM can now decrypt secrets

**Proposed solutions:**

**Option A: Pre-generate age keys**
- [ ] Generate age key pair during `create-host-config.sh`
- [ ] Add public key to `.sops.yaml` immediately
- [ ] Store private key temporarily (secure location)
- [ ] Inject private key via cloud-init `write_files` or Terraform `file` provisioner
- [ ] VM uses pre-configured key from first boot

**Option B: Post-deployment secret injection**
- [ ] VM boots with template, generates its own key
- [ ] Fetch public key via SSH after first boot
- [ ] Automatically add to `.sops.yaml` and commit
- [ ] Trigger rebuild on VM to pick up secrets access

**Option C: Separate secrets from initial deployment**
- [ ] Initial deployment works without secrets
- [ ] After VM is running, user manually adds age key
- [ ] Subsequent auto-upgrades pick up secrets

**Decision needed:** Option A is most automated, but requires secure key handling

---

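The first two steps of Phase 4's Option A could be sketched as follows. `age-keygen -o` is the real command for writing a new key file; everything else is an assumption: the staging directory `./.age-staging`, the function names, and the idea that `.sops.yaml` lists keys as YAML anchors (`- &name key`) are hypothetical and should be checked against the actual file.

```bash
#!/usr/bin/env bash
# Extract the public key from the "# public key: age1..." comment line
# that age-keygen writes at the top of a key file.
pubkey_from_keyfile() {
  sed -n 's/^# public key: //p' "$1"
}

# Generate a key pair for a host and print its public key.
# Hypothetical layout: private keys are staged in a directory that must be
# excluded from git and cleaned up after injection into the VM.
generate_host_key() {
  local hostname=$1 staging_dir=${2:-./.age-staging}
  local keyfile="$staging_dir/$hostname.txt"

  mkdir -p "$staging_dir"
  chmod 700 "$staging_dir"

  # umask keeps the private key unreadable to other users.
  # age-keygen also echoes the public key on stderr; silence that copy.
  (umask 077; age-keygen -o "$keyfile" 2>/dev/null) || return 1
  pubkey_from_keyfile "$keyfile"
}

# Render one key entry for .sops.yaml (assumed anchor-style layout).
sops_key_entry() {
  printf '  - &%s %s\n' "$1" "$2"
}
```

A caller could then run `sops_key_entry myhost "$(generate_host_key myhost)"` and append the result to the key list. `age-keygen -y <keyfile>` is an alternative way to derive the public key from an existing private key.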
### Phase 5: DNS Automation

**Goal:** Automatically generate DNS entries from host configurations

**Approach:** Leverage Nix to generate zone file entries from flake host configurations

Since most hosts use static IPs defined in their NixOS configurations, we can extract this information and automatically generate A records. This keeps DNS in sync with the actual host configs.

**Tasks:**
- [ ] Add optional CNAME field to host configurations
  - [ ] Add `networking.cnames = [ "alias1" "alias2" ]` or similar option
  - [ ] Document in host configuration template
- [ ] Create Nix function to extract DNS records from all hosts
  - [ ] Parse each host's `networking.hostName` and IP configuration
  - [ ] Collect any defined CNAMEs
  - [ ] Generate zone file fragment with A and CNAME records
- [ ] Integrate auto-generated records into zone files
  - [ ] Keep manual entries separate (for non-flake hosts/services)
  - [ ] Include generated fragment in main zone file
  - [ ] Add comments showing which records are auto-generated
- [ ] Update zone file serial number automatically
- [ ] Test zone file validity after generation
- [ ] Either:
  - [ ] Automatically trigger DNS server reload (Ansible)
  - [ ] Or document manual step: merge to master, run upgrade on ns1/ns2

**Deliverable:** DNS A records and CNAMEs automatically generated from host configs

---

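Two of the Phase 5 tasks (bumping the serial and marking generated records) are independent of how the host data is extracted, so they can be sketched on their own. The sketch below assumes the zone uses date-based `YYYYMMDDNN` serials, which this TODO does not confirm, and that the Nix extraction step emits plain `name type value` lines. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Bump a YYYYMMDDNN serial: increment if it is already from today,
# otherwise start at <today>01. Date-based serials are an assumption.
next_serial() {
  local current=$1 today=${2:-$(date +%Y%m%d)}
  if [[ ${current:0:8} == "$today" ]]; then
    echo $((current + 1))
  else
    echo "${today}01"
  fi
}

# Read "name type value" lines on stdin and print zone-file records,
# wrapped in comments that mark them as auto-generated.
zone_fragment() {
  local name type value
  echo "; BEGIN auto-generated records"
  while read -r name type value; do
    # Skip blank lines and '#' comments in the input.
    [[ -z $name || $name == \#* ]] && continue
    printf '%-16s IN %-5s %s\n' "$name" "$type" "$value"
  done
  echo "; END auto-generated records"
}
```

For instance, `next_serial 2024010101 20240101` prints `2024010102`. Note that this simple increment does not guard against more than 99 changes in one day.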
### Phase 6: Integration Script

**Goal:** Single command to create and deploy a new host

**Tasks:**
- [ ] Create `scripts/create-host.sh` master script that orchestrates:
  1. Prompts for: hostname, IP (or DHCP), CPU, memory, disk
  2. Validates inputs (IP not in use, hostname unique, etc.)
  3. Calls host config generator (Phase 2)
  4. Generates OpenTofu config (Phase 2)
  5. Handles secrets (Phase 4)
  6. Updates DNS (Phase 5)
  7. Commits all changes to git
  8. Runs `tofu apply` to deploy VM
  9. Waits for bootstrap to complete (Phase 3)
  10. Prints success message with IP and SSH command
- [ ] Add `--dry-run` flag to preview changes
- [ ] Add `--interactive` mode vs `--batch` mode
- [ ] Error handling and rollback on failures

**Deliverable:** `./scripts/create-host.sh --hostname myhost --ip 10.69.13.50` creates a fully working host

---

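For Phase 6, the input validation in step 2 and the `--dry-run` flag could start from helpers like these. This is a sketch only: it covers syntactic checks and a directory-existence check, not the "IP not in use" probe, and the names `valid_ipv4`, `hostname_available`, `run`, and the `DRY_RUN` variable are hypothetical. The default `hosts` directory assumes the script runs from the repository root.

```bash
#!/usr/bin/env bash
# Return 0 if $1 is a dotted-quad IPv4 address with each octet in 0..255.
valid_ipv4() {
  local ip=$1 octet
  [[ $ip =~ ^[0-9]{1,3}(\.[0-9]{1,3}){3}$ ]] || return 1
  local IFS=.
  for octet in $ip; do
    # 10# forces base 10 so a leading zero is not read as octal.
    ((10#$octet <= 255)) || return 1
  done
}

# Return 0 if $1 is a valid lowercase DNS label and no directory for it
# exists yet under the hosts directory (assumed to be ./hosts).
hostname_available() {
  local name=$1 hosts_dir=${2:-hosts}
  [[ $name =~ ^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$ ]] || return 1
  [[ ! -e $hosts_dir/$name ]]
}

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}
```

With `DRY_RUN=1`, a call such as `run tofu apply` prints `[dry-run] tofu apply` instead of executing it, so routing every mutating step through `run` gives a preview of the whole pipeline.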
### Phase 7: Testing & Documentation

**Tasks:**
- [ ] Test full pipeline end-to-end
  - [ ] Create test host and verify all steps
- [ ] Document the new workflow in CLAUDE.md
  - [ ] Add troubleshooting section
- [ ] Create examples for common scenarios (DHCP host, static IP host, etc.)

---

## Open Questions

1. **Bootstrap method:** Cloud-init runcmd vs Terraform provisioner vs Ansible?
2. **Secrets handling:** Pre-generate keys vs post-deployment injection?
3. **DNS automation:** Auto-commit or manual merge?
4. **Git workflow:** Auto-push changes or leave for user review?
5. **Template selection:** Single template2 or multiple templates for different host types?
6. **Networking:** Always DHCP initially, or support static IP from start?
7. **Error recovery:** What happens if bootstrap fails? Manual intervention or retry?

## Implementation Order

Recommended sequence:
1. Phase 1: Parameterize OpenTofu (foundation for testing)
2. Phase 3: Bootstrap mechanism (core automation)
3. Phase 2: Config generator (automate the boilerplate)
4. Phase 4: Secrets (solves biggest chicken-and-egg)
5. Phase 5: DNS (nice-to-have automation)
6. Phase 6: Integration script (ties it all together)
7. Phase 7: Testing & docs

## Success Criteria

When complete, creating a new host should:
- Take < 5 minutes of human time
- Require minimal user input (hostname, IP, basic specs)
- Result in a fully configured, secret-enabled, DNS-registered host
- Be reproducible and documented
- Handle common errors gracefully

---

## Notes

- Keep incremental commits at each phase
- Test each phase independently before moving to next
- Maintain backward compatibility with manual workflow
- Document any manual steps that can't be automated