Document multi-phase plan for automating NixOS host creation, deployment, and configuration on Proxmox including OpenTofu parameterization, config generation, bootstrap mechanism, secrets management, and Nix-based DNS automation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
TODO: Automated Host Deployment Pipeline
Vision
Automate the entire process of creating, configuring, and deploying new NixOS hosts on Proxmox from a single command or script.
Desired workflow:
./scripts/create-host.sh --hostname myhost --ip 10.69.13.50
# Script creates config, deploys VM, bootstraps NixOS, and you're ready to go
Current manual workflow (from CLAUDE.md):
- Create /hosts/<hostname>/ directory structure
- Add host to flake.nix
- Add DNS entries
- Clone template VM manually
- Run prepare-host.sh on new VM
- Add generated age key to .sops.yaml
- Configure networking
- Commit and push
- Run nixos-rebuild boot --flake URL#<hostname> on host
The Plan
Phase 1: Parameterized OpenTofu Deployments ✓ (Partially Complete)
Status: Template building and single-VM deployment both work; the deployment still needs to be parameterized
Tasks:
- Create module/template structure in terraform for repeatable VM deployments
- Parameterize VM configuration (hostname, CPU, memory, disk, IP)
- Support both DHCP and static IP configuration via cloud-init
- Test deploying multiple VMs from same template
Deliverable: Can deploy a VM with custom parameters via OpenTofu
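One possible way to pass per-host parameters to OpenTofu is a JSON variables file generated by a script. The sketch below is only an illustration: the field names (hostname, cpu_cores, memory_mb, disk_gb, ip_address) are hypothetical and would have to match whatever variables the module ends up declaring, and treating a missing IP as "use DHCP" is an assumption.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class VMSpec:
    """Per-host VM parameters. All field names are hypothetical."""

    hostname: str
    cpu_cores: int = 2
    memory_mb: int = 2048
    disk_gb: int = 20
    ip_address: Optional[str] = None  # None is assumed to mean DHCP


def to_tfvars_json(spec: VMSpec) -> str:
    """Serialize the spec as the contents of a *.tfvars.json file."""
    return json.dumps(asdict(spec), indent=2, sort_keys=True)
```

The resulting text could be written to something like myhost.tfvars.json and passed with tofu apply -var-file=myhost.tfvars.json.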
Phase 2: Host Configuration Generator
Goal: Automate creation of host configuration files
This doesn't have to be a plain shell script. Something like Python would probably make templating easier.
Tasks:
- Create script scripts/create-host-config.sh
  - Takes parameters: hostname, IP, CPU cores, memory, disk size
  - Generates /hosts/<hostname>/ directory structure from template
  - Creates configuration.nix with proper hostname and networking
  - Generates default.nix with standard imports
  - Copies/links hardware-configuration.nix from template
- Add host entry to flake.nix programmatically
  - Parse flake.nix
  - Insert new nixosConfiguration entry
  - Maintain formatting
- Generate corresponding OpenTofu configuration
  - Create terraform/hosts/<hostname>.tf with VM definition
  - Use parameters from script input
Deliverable: Script generates all config files for a new host
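If the generator is written in Python, the configuration.nix step could look roughly like the sketch below. It uses @KEY@ placeholders because Nix's own ${...} and braces clash with Python's usual templating syntax. The template body, the interface name ens18, and the default prefix length are assumptions for illustration, not the repository's actual template.

```python
import re
from pathlib import Path

# Hypothetical template; the real one would live in the repository.
CONFIGURATION_TEMPLATE = """\
{ ... }:
{
  networking.hostName = "@HOSTNAME@";
  networking.interfaces.@INTERFACE@.ipv4.addresses = [
    { address = "@IP@"; prefixLength = @PREFIX@; }
  ];
}
"""


def render(template: str, values: dict) -> str:
    """Replace @KEY@ placeholders and fail loudly if any are left over."""
    out = template
    for key, value in values.items():
        out = out.replace(f"@{key}@", value)
    leftover = re.findall(r"@[A-Z_]+@", out)
    if leftover:
        raise ValueError(f"unfilled placeholders: {leftover}")
    return out


def write_host_config(
    repo_root: Path,
    hostname: str,
    ip: str,
    prefix_length: int = 24,   # assumed default
    interface: str = "ens18",  # assumed interface name
) -> Path:
    """Create hosts/<hostname>/configuration.nix and return its path."""
    host_dir = repo_root / "hosts" / hostname
    # exist_ok=False refuses to overwrite an already configured host.
    host_dir.mkdir(parents=True, exist_ok=False)
    config_path = host_dir / "configuration.nix"
    config_path.write_text(
        render(
            CONFIGURATION_TEMPLATE,
            {
                "HOSTNAME": hostname,
                "INTERFACE": interface,
                "IP": ip,
                "PREFIX": str(prefix_length),
            },
        )
    )
    return config_path
```

The same render helper could produce default.nix and terraform/hosts/<hostname>.tf from their own templates.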
Phase 3: Bootstrap Mechanism
Goal: Get freshly deployed VM to apply its specific host configuration
Challenge: Chicken-and-egg problem - VM needs to know its hostname and pull the right config
Option A: Cloud-init bootstrap script
- Add cloud-init runcmd to template2 that:
  - Reads hostname from cloud-init metadata
  - Runs nixos-rebuild boot --flake git+https://git.t-juice.club/torjus/nixos-servers.git#${hostname}
  - Reboots into the new configuration
- Test cloud-init script execution on fresh VM
- Handle failure cases (flake doesn't exist, network issues)
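The core of such a bootstrap step, including retries for the network-failure case, might be sketched as follows. This is an illustration only: the retry count and delay are arbitrary, and how the hostname is obtained (for example from the system hostname once cloud-init has set it) is left to the caller. The runner and sleep parameters exist so the logic can be exercised without actually running nixos-rebuild.

```python
import subprocess
import time

# Flake URL as given in the plan.
FLAKE_URL = "git+https://git.t-juice.club/torjus/nixos-servers.git"


def rebuild_command(hostname: str, flake_url: str = FLAKE_URL) -> list:
    """Build the nixos-rebuild invocation for this host's flake output."""
    return ["nixos-rebuild", "boot", "--flake", f"{flake_url}#{hostname}"]


def run_with_retries(
    cmd,
    attempts: int = 3,            # arbitrary choice
    delay_seconds: float = 30.0,  # arbitrary choice
    runner=subprocess.run,
    sleep=time.sleep,
) -> bool:
    """Run cmd until it exits 0 or attempts run out; report success."""
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return True
        if attempt < attempts:
            sleep(delay_seconds)  # wait before the next try
    return False
```

A caller would reboot only when run_with_retries(rebuild_command(hostname)) returns True, and otherwise leave the VM on the template configuration for inspection.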
Option B: Terraform provisioner
- Use OpenTofu's remote-exec provisioner
- SSH into new VM after creation
- Run nixos-rebuild boot --flake <url>#<hostname>
- Trigger reboot via SSH
Option C: Two-stage deployment
- Deploy VM with template2 (minimal config)
- Run Ansible playbook to bootstrap specific config
- Similar to existing run-upgrade.yml pattern
Decision needed: Which approach fits best? (Recommend Option A for automation)
Phase 4: Secrets Management Automation
Challenge: sops needs age key, but age key is generated on first boot
Current workflow:
- VM boots, generates age key at /var/lib/sops-nix/key.txt
- User runs prepare-host.sh which prints public key
- User manually adds public key to .sops.yaml
- User commits, pushes
- VM can now decrypt secrets
Proposed solution:
Option A: Pre-generate age keys
- Generate age key pair during create-host-config.sh
- Add public key to .sops.yaml immediately
- Store private key temporarily (secure location)
- Inject private key via cloud-init write_files or Terraform file provisioner
- VM uses pre-configured key from first boot
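For Option A, two small pieces of glue are needed: reading the public key out of a pre-generated key file, and describing where the private key should land on the VM. The sketch below assumes the key file has the usual age-keygen layout, with a "# public key:" comment line above the secret key, and builds a cloud-init write_files entry as a plain dict. The permissions and owner values are assumptions, and how the entry is merged into the VM's user data is not shown.

```python
import re

# Path named in the current workflow.
AGE_KEY_PATH = "/var/lib/sops-nix/key.txt"


def public_key_from_keyfile(keyfile_text: str) -> str:
    """Extract the recipient from the '# public key:' comment line."""
    match = re.search(
        r"^# public key: (age1[0-9a-z]+)$", keyfile_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no public key comment found in key file")
    return match.group(1)


def write_files_entry(keyfile_text: str, path: str = AGE_KEY_PATH) -> dict:
    """Describe the key file as one cloud-init write_files item."""
    return {
        "path": path,
        "permissions": "0600",  # assumed: readable by root only
        "owner": "root:root",   # assumed owner
        "content": keyfile_text,
    }
```

The extracted public key is what would be added to .sops.yaml. The private key text still passes through the cloud-init user data, which is where the secure handling mentioned above matters.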
Option B: Post-deployment secret injection
- VM boots with template, generates its own key
- Fetch public key via SSH after first boot
- Automatically add to .sops.yaml and commit
- Trigger rebuild on VM to pick up secrets access
Option C: Separate secrets from initial deployment
- Initial deployment works without secrets
- After VM is running, user manually adds age key
- Subsequent auto-upgrades pick up secrets
Decision needed: Option A is most automated, but requires secure key handling
Phase 5: DNS Automation
Goal: Automatically generate DNS entries from host configurations
Approach: Leverage Nix to generate zone file entries from flake host configurations
Since most hosts use static IPs defined in their NixOS configurations, we can extract this information and automatically generate A records. This keeps DNS in sync with the actual host configs.
Tasks:
- Add optional CNAME field to host configurations
  - Add networking.cnames = [ "alias1" "alias2" ] or similar option
  - Document in host configuration template
- Create Nix function to extract DNS records from all hosts
  - Parse each host's networking.hostName and IP configuration
  - Collect any defined CNAMEs
  - Generate zone file fragment with A and CNAME records
- Integrate auto-generated records into zone files
  - Keep manual entries separate (for non-flake hosts/services)
  - Include generated fragment in main zone file
  - Add comments showing which records are auto-generated
- Update zone file serial number automatically
- Test zone file validity after generation
- Either:
  - Automatically trigger DNS server reload (Ansible)
  - Or document manual step: merge to master, run upgrade on ns1/ns2
Deliverable: DNS A records and CNAMEs automatically generated from host configs
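The plan is to do the record generation in Nix. The Python sketch below only illustrates the intended logic: turning a mapping of hosts into A and CNAME lines wrapped in marker comments, and bumping the serial. The input shape (a dict with ip and optional cnames per host) and the YYYYMMDDNN serial convention are assumptions, not something the current zone files are known to use.

```python
from datetime import date


def zone_fragment(hosts: dict) -> str:
    """Render A and CNAME records for all hosts, sorted by name.

    hosts maps hostname -> {"ip": str, "cnames": [str, ...]} (assumed shape).
    """
    lines = ["; BEGIN auto-generated records (do not edit by hand)"]
    for name in sorted(hosts):
        host = hosts[name]
        lines.append(f"{name}\tIN\tA\t{host['ip']}")
        for alias in host.get("cnames", []):
            # Target is relative to the zone origin, like the A record owner.
            lines.append(f"{alias}\tIN\tCNAME\t{name}")
    lines.append("; END auto-generated records")
    return "\n".join(lines) + "\n"


def next_serial(current: int, today: date) -> int:
    """Bump a YYYYMMDDNN serial (assumed convention).

    A new day starts at NN = 00; a repeat on the same day increments by one.
    """
    base = int(today.strftime("%Y%m%d")) * 100
    return base if current < base else current + 1
```

Sorting the hosts keeps the generated fragment stable between runs, so a diff of the zone file shows only real changes.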
Phase 6: Integration Script
Goal: Single command to create and deploy a new host
Tasks:
- Create scripts/create-host.sh master script that orchestrates:
  - Prompts for: hostname, IP (or DHCP), CPU, memory, disk
  - Validates inputs (IP not in use, hostname unique, etc.)
  - Calls host config generator (Phase 2)
  - Generates OpenTofu config (Phase 2)
  - Handles secrets (Phase 4)
  - Updates DNS (Phase 5)
  - Commits all changes to git
  - Runs tofu apply to deploy VM
  - Waits for bootstrap to complete (Phase 3)
  - Prints success message with IP and SSH command
- Add --dry-run flag to preview changes
- Add --interactive mode vs --batch mode
- Error handling and rollback on failures
Deliverable: ./scripts/create-host.sh --hostname myhost --ip 10.69.13.50 creates a fully working host
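The input validation step could be sketched as below. It checks only against hosts already known to the repository, passed in as a hostname-to-IP mapping (None for DHCP hosts), so it would not catch addresses used by machines outside the flake. The mapping shape and the lowercase-only hostname rule are assumptions.

```python
import ipaddress
import re
from typing import Optional

# Single DNS label: lowercase letters, digits, hyphens; no leading or
# trailing hyphen; at most 63 characters. Lowercase-only is an assumption.
HOSTNAME_RE = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def validate_new_host(hostname: str, ip: Optional[str], existing: dict) -> list:
    """Return a list of problems; an empty list means the input is usable.

    existing maps known hostnames to their IP, or None for DHCP hosts.
    """
    errors = []
    if not HOSTNAME_RE.fullmatch(hostname):
        errors.append(f"invalid hostname: {hostname!r}")
    if hostname in existing:
        errors.append(f"hostname already exists: {hostname}")
    if ip is not None:
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            errors.append(f"invalid IP address: {ip!r}")
        else:
            for other, other_ip in existing.items():
                if other_ip is not None and ipaddress.ip_address(other_ip) == addr:
                    errors.append(f"IP {ip} already used by {other}")
    return errors
```

Collecting all problems instead of stopping at the first one fits the --dry-run idea: a preview can report everything that needs fixing in one pass.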
Phase 7: Testing & Documentation
Tasks:
- Test full pipeline end-to-end
- Create test host and verify all steps
- Document the new workflow in CLAUDE.md
- Add troubleshooting section
- Create examples for common scenarios (DHCP host, static IP host, etc.)
Open Questions
- Bootstrap method: Cloud-init runcmd vs Terraform provisioner vs Ansible?
- Secrets handling: Pre-generate keys vs post-deployment injection?
- DNS automation: Auto-commit or manual merge?
- Git workflow: Auto-push changes or leave for user review?
- Template selection: Single template2 or multiple templates for different host types?
- Networking: Always DHCP initially, or support static IP from start?
- Error recovery: What happens if bootstrap fails? Manual intervention or retry?
Implementation Order
Recommended sequence:
- Phase 1: Parameterize OpenTofu (foundation for testing)
- Phase 3: Bootstrap mechanism (core automation)
- Phase 2: Config generator (automate the boilerplate)
- Phase 4: Secrets (solves biggest chicken-and-egg)
- Phase 5: DNS (nice-to-have automation)
- Phase 6: Integration script (ties it all together)
- Phase 7: Testing & docs
Success Criteria
When complete, creating a new host should:
- Take < 5 minutes of human time
- Require minimal user input (hostname, IP, basic specs)
- Result in a fully configured, secret-enabled, DNS-registered host
- Be reproducible and documented
- Handle common errors gracefully
Notes
- Keep incremental commits at each phase
- Test each phase independently before moving to next
- Maintain backward compatibility with manual workflow
- Document any manual steps that can't be automated