From 7f72a720438722288000caed591ffcd2f60c8469 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Torjus=20H=C3=A5kestad?= Date: Sat, 31 Jan 2026 16:12:49 +0100 Subject: [PATCH 1/3] flake: add opentofu to devshell --- flake.nix | 1 + 1 file changed, 1 insertion(+) diff --git a/flake.nix b/flake.nix index 9243cad..779eabb 100644 --- a/flake.nix +++ b/flake.nix @@ -325,6 +325,7 @@ default = pkgs.mkShell { packages = with pkgs; [ ansible + opentofu python3 ]; }; -- 2.49.1 From 3a464bc323ded3ccf258ab4ab2e925745842a9fd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Torjus=20H=C3=A5kestad?= Date: Sat, 31 Jan 2026 21:54:08 +0100 Subject: [PATCH 2/3] proxmox: add VM automation with OpenTofu and Ansible Add automated workflow for building and deploying NixOS VMs on Proxmox including template2 host configuration, Ansible playbook for image building/deployment, and OpenTofu configuration for VM provisioning with cloud-init. Co-Authored-By: Claude Sonnet 4.5 --- .gitignore | 10 ++ CLAUDE.md | 60 ++++++++++++ flake.nix | 16 ++++ hosts/template2/configuration.nix | 75 +++++++++++++++ hosts/template2/default.nix | 9 ++ hosts/template2/hardware-configuration.nix | 36 ++++++++ hosts/template2/scripts.nix | 33 +++++++ playbooks/build-and-deploy-template.yml | 101 +++++++++++++++++++++ playbooks/inventory.ini | 5 + terraform/README.md | 37 ++++++++ terraform/main.tf | 18 ++++ terraform/terraform.tfvars.example | 7 ++ terraform/variables.tf | 22 +++++ terraform/vm.tf | 90 ++++++++++++++++++ 14 files changed, 519 insertions(+) create mode 100644 hosts/template2/configuration.nix create mode 100644 hosts/template2/default.nix create mode 100644 hosts/template2/hardware-configuration.nix create mode 100644 hosts/template2/scripts.nix create mode 100644 playbooks/build-and-deploy-template.yml create mode 100644 playbooks/inventory.ini create mode 100644 terraform/README.md create mode 100644 terraform/main.tf create mode 100644 terraform/terraform.tfvars.example create mode 100644 terraform/variables.tf 
create mode 100644 terraform/vm.tf diff --git a/.gitignore b/.gitignore index d53e06f..8068363 100644 --- a/.gitignore +++ b/.gitignore @@ -1,2 +1,12 @@ .direnv/ result + +# Terraform/OpenTofu +terraform/.terraform/ +terraform/.terraform.lock.hcl +terraform/*.tfstate +terraform/*.tfstate.* +terraform/terraform.tfvars +terraform/*.auto.tfvars +terraform/crash.log +terraform/crash.*.log diff --git a/CLAUDE.md b/CLAUDE.md index dd02cd7..8443a48 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -44,6 +44,15 @@ nix develop Secrets are handled by sops. Do not edit any `.sops.yaml` or any file within `secrets/`. Ask the user to modify if necessary. +### Git Commit Messages + +Commit messages should follow the format: `topic: short description` + +Examples: +- `flake: add opentofu to devshell` +- `template2: add proxmox image configuration` +- `terraform: add VM deployment configuration` + ## Architecture ### Directory Structure @@ -143,6 +152,57 @@ Configured in `/system/autoupgrade.nix`: - Auto-reboot after successful upgrade - Systemd service: `nixos-upgrade.service` +### Proxmox VM Provisioning with OpenTofu + +The repository includes automated workflows for building Proxmox VM templates and deploying VMs using OpenTofu (Terraform). + +#### Building and Deploying Templates + +Template VMs are built from `hosts/template2` and deployed to Proxmox using Ansible: + +```bash +# Build NixOS image and deploy to Proxmox as template +nix develop -c ansible-playbook -i playbooks/inventory.ini playbooks/build-and-deploy-template.yml +``` + +This playbook: +1. Builds the Proxmox image using `nixos-rebuild build-image --image-variant proxmox` +2. Uploads the `.vma.zst` image to Proxmox at `/var/lib/vz/dump` +3. Restores it as VM ID 9000 +4. 
Converts it to a template + +Template configuration (`hosts/template2`): +- Minimal base system with essential packages (age, vim, wget, git) +- Cloud-init configured for NoCloud datasource (no EC2 metadata timeout) +- DHCP networking on ens18 +- SSH key-based root login +- `prepare-host.sh` script for cleaning machine-id, SSH keys, and regenerating age keys + +#### Deploying VMs with OpenTofu + +VMs are deployed from templates using OpenTofu in the `/terraform` directory: + +```bash +cd terraform +tofu init # First time only +tofu apply # Deploy VMs +``` + +Configuration files: +- `main.tf` - Proxmox provider configuration +- `variables.tf` - Provider variables (API credentials) +- `vm.tf` - VM resource definitions +- `terraform.tfvars` - Actual credentials (gitignored) + +Example VM deployment includes: +- Clone from template VM +- Cloud-init configuration (SSH keys, network, DNS) +- Custom CPU/memory/disk sizing +- VLAN tagging +- QEMU guest agent + +OpenTofu outputs the VM's IP address after deployment for easy SSH access. + ### Adding a New Host 1. Create `/hosts//` directory diff --git a/flake.nix b/flake.nix index 779eabb..4db4e11 100644 --- a/flake.nix +++ b/flake.nix @@ -172,6 +172,22 @@ sops-nix.nixosModules.sops ]; }; + template2 = nixpkgs.lib.nixosSystem { + inherit system; + specialArgs = { + inherit inputs self sops-nix; + }; + modules = [ + ( + { config, pkgs, ... }: + { + nixpkgs.overlays = commonOverlays; + } + ) + ./hosts/template2 + sops-nix.nixosModules.sops + ]; + }; http-proxy = nixpkgs.lib.nixosSystem { inherit system; specialArgs = { diff --git a/hosts/template2/configuration.nix b/hosts/template2/configuration.nix new file mode 100644 index 0000000..7daad62 --- /dev/null +++ b/hosts/template2/configuration.nix @@ -0,0 +1,75 @@ +{ + config, + lib, + pkgs, + ... 
+}: + +{ + imports = [ + ./hardware-configuration.nix + ../../system/sshd.nix + ]; + + # Root user with no password but SSH key access for bootstrapping + users.users.root = { + hashedPassword = ""; + openssh.authorizedKeys.keys = [ + "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAwfb2jpKrBnCw28aevnH8HbE5YbcMXpdaVv2KmueDu6 torjus@gunter" + ]; + }; + + # Proxmox image-specific configuration + # Configure storage to use local-zfs instead of local-lvm + image.modules.proxmox = { + proxmox.qemuConf.virtio0 = lib.mkForce "local-zfs:vm-9999-disk-0"; + proxmox.qemuConf.boot = lib.mkForce "order=virtio0"; + proxmox.cloudInit.defaultStorage = lib.mkForce "local-zfs"; + }; + + # Configure cloud-init to only use NoCloud datasource (no EC2 metadata service) + services.cloud-init.settings = { + datasource_list = [ "NoCloud" ]; + datasource = { + NoCloud = { + fs_label = "cidata"; + }; + }; + }; + + boot.loader.grub.enable = true; + boot.loader.grub.device = "/dev/vda"; + networking.hostName = "nixos-template2"; + networking.domain = "home.2rjus.net"; + networking.useNetworkd = true; + networking.useDHCP = false; + services.resolved.enable = true; + + systemd.network.enable = true; + systemd.network.networks."ens18" = { + matchConfig.Name = "ens18"; + networkConfig.DHCP = "ipv4"; + linkConfig.RequiredForOnline = "routable"; + }; + time.timeZone = "Europe/Oslo"; + + nix.settings.experimental-features = [ + "nix-command" + "flakes" + ]; + nix.settings.tarball-ttl = 0; + environment.systemPackages = with pkgs; [ + age + vim + wget + git + ]; + + # Open ports in the firewall. + # networking.firewall.allowedTCPPorts = [ ... ]; + # networking.firewall.allowedUDPPorts = [ ... ]; + # Or disable the firewall altogether. + networking.firewall.enable = false; + + system.stateVersion = "25.11"; +} diff --git a/hosts/template2/default.nix b/hosts/template2/default.nix new file mode 100644 index 0000000..711cc51 --- /dev/null +++ b/hosts/template2/default.nix @@ -0,0 +1,9 @@ +{ ... 
}: +{ + imports = [ + ./hardware-configuration.nix + ./configuration.nix + ./scripts.nix + ../../system/packages.nix + ]; +} diff --git a/hosts/template2/hardware-configuration.nix b/hosts/template2/hardware-configuration.nix new file mode 100644 index 0000000..7086fe9 --- /dev/null +++ b/hosts/template2/hardware-configuration.nix @@ -0,0 +1,36 @@ +{ + config, + lib, + pkgs, + modulesPath, + ... +}: + +{ + imports = [ + (modulesPath + "/profiles/qemu-guest.nix") + ]; + boot.initrd.availableKernelModules = [ + "ata_piix" + "uhci_hcd" + "virtio_pci" + "virtio_scsi" + "sd_mod" + "sr_mod" + ]; + boot.initrd.kernelModules = [ "dm-snapshot" ]; + boot.kernelModules = [ + "ptp_kvm" + "virtio_rng" # Provides entropy from host for fast SSH key generation + ]; + boot.extraModulePackages = [ ]; + + # Enables DHCP on each ethernet and wireless interface. In case of scripted networking + # (the default) this is the recommended approach. When using systemd-networkd it's + # still possible to use this option, but it's recommended to use it in conjunction + # with explicit per-interface declarations with `networking.interfaces..useDHCP`. + networking.useDHCP = lib.mkDefault true; + # networking.interfaces.ens18.useDHCP = lib.mkDefault true; + + nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux"; +} diff --git a/hosts/template2/scripts.nix b/hosts/template2/scripts.nix new file mode 100644 index 0000000..9ee1e75 --- /dev/null +++ b/hosts/template2/scripts.nix @@ -0,0 +1,33 @@ +{ pkgs, ... 
}: +let + prepare-host-script = pkgs.writeShellScriptBin "prepare-host.sh" + '' + echo "Removing machine-id" + rm -f /etc/machine-id || true + + echo "Removing SSH host keys" + rm -f /etc/ssh/ssh_host_* || true + + echo "Restarting SSH" + systemctl restart sshd + + echo "Removing temporary files" + rm -rf /tmp/* || true + + echo "Removing logs" + journalctl --rotate || true + journalctl --vacuum-time=1s || true + + echo "Removing cache" + rm -rf /var/cache/* || true + + echo "Generate age key" + rm -rf /var/lib/sops-nix || true + mkdir -p /var/lib/sops-nix + ${pkgs.age}/bin/age-keygen -o /var/lib/sops-nix/key.txt + ''; +in +{ + environment.systemPackages = [ prepare-host-script ]; + users.motd = "Prepare host by running 'prepare-host.sh'."; +} diff --git a/playbooks/build-and-deploy-template.yml b/playbooks/build-and-deploy-template.yml new file mode 100644 index 0000000..fdf29bd --- /dev/null +++ b/playbooks/build-and-deploy-template.yml @@ -0,0 +1,101 @@ +--- +- name: Build and deploy NixOS Proxmox template + hosts: localhost + gather_facts: false + + vars: + template_name: "template2" + nixos_config: "template2" + proxmox_node: "pve1.home.2rjus.net" # Change to your Proxmox node name + proxmox_host: "pve1.home.2rjus.net" # Change to your Proxmox host + template_vmid: 9000 # Template VM ID + storage: "local-zfs" + + tasks: + - name: Build NixOS image + ansible.builtin.command: + cmd: "nixos-rebuild build-image --image-variant proxmox --flake .#template2" + chdir: "{{ playbook_dir }}/.." 
+ register: build_result + changed_when: true + + - name: Find built image file + ansible.builtin.find: + paths: "{{ playbook_dir }}/../result" + patterns: "*.vma.zst" + recurse: true + register: image_files + + - name: Fail if no image found + ansible.builtin.fail: + msg: "No .vma.zst image found in build output" + when: image_files.matched == 0 + + - name: Set image path + ansible.builtin.set_fact: + image_path: "{{ image_files.files[0].path }}" + + - name: Extract image filename + ansible.builtin.set_fact: + image_filename: "{{ image_path | basename }}" + + - name: Display image info + ansible.builtin.debug: + msg: "Built image: {{ image_path }} ({{ image_filename }})" + +- name: Deploy template to Proxmox + hosts: proxmox + gather_facts: false + + vars: + template_name: "template2" + template_vmid: 9000 + storage: "local-zfs" + + tasks: + - name: Get image path and filename from localhost + ansible.builtin.set_fact: + image_path: "{{ hostvars['localhost']['image_path'] }}" + image_filename: "{{ hostvars['localhost']['image_filename'] }}" + + - name: Set destination path + ansible.builtin.set_fact: + image_dest: "/var/lib/vz/dump/{{ image_filename }}" + + - name: Copy image to Proxmox + ansible.builtin.copy: + src: "{{ image_path }}" + dest: "{{ image_dest }}" + mode: '0644' + + - name: Check if template VM already exists + ansible.builtin.command: + cmd: "qm status {{ template_vmid }}" + register: vm_status + failed_when: false + changed_when: false + + - name: Destroy existing template VM if it exists + ansible.builtin.command: + cmd: "qm destroy {{ template_vmid }} --purge" + when: vm_status.rc == 0 + changed_when: true + + - name: Import image + ansible.builtin.command: + cmd: "qmrestore {{ image_dest }} {{ template_vmid }}" + changed_when: true + + - name: Convert VM to template + ansible.builtin.command: + cmd: "qm template {{ template_vmid }}" + changed_when: true + + - name: Clean up uploaded image + ansible.builtin.file: + path: "{{ image_dest }}" + state:
absent + + - name: Display success message + ansible.builtin.debug: + msg: "Template VM {{ template_vmid }} created successfully on {{ storage }}" diff --git a/playbooks/inventory.ini b/playbooks/inventory.ini new file mode 100644 index 0000000..d8c057d --- /dev/null +++ b/playbooks/inventory.ini @@ -0,0 +1,5 @@ +[proxmox] +pve1.home.2rjus.net + +[proxmox:vars] +ansible_user=root diff --git a/terraform/README.md b/terraform/README.md new file mode 100644 index 0000000..be4ebb7 --- /dev/null +++ b/terraform/README.md @@ -0,0 +1,37 @@ +# OpenTofu Configuration for Proxmox + +This directory contains OpenTofu configuration for managing Proxmox VMs. + +## Setup + +1. **Create a Proxmox API token:** + - Log into Proxmox web UI + - Go to Datacenter → Permissions → API Tokens + - Click Add + - User: `root@pam`, Token ID: `terraform` + - Uncheck "Privilege Separation" + - Save the token secret (shown only once) + +2. **Configure credentials:** + ```bash + cd terraform + cp terraform.tfvars.example terraform.tfvars + # Edit terraform.tfvars with your Proxmox details + ``` + +3. **Initialize OpenTofu:** + ```bash + tofu init + ``` + +4. 
**Test connection:** + ```bash + tofu plan + ``` + +## Files + +- `main.tf` - Provider configuration +- `variables.tf` - Variable definitions +- `terraform.tfvars.example` - Example credentials file +- `terraform.tfvars` - Your actual credentials (gitignored) diff --git a/terraform/main.tf b/terraform/main.tf new file mode 100644 index 0000000..2def9ff --- /dev/null +++ b/terraform/main.tf @@ -0,0 +1,18 @@ +terraform { + required_version = ">= 1.0" + required_providers { + proxmox = { + source = "telmate/proxmox" + version = "3.0.2-rc07" + } + } +} + +provider "proxmox" { + pm_api_url = var.proxmox_api_url + pm_api_token_id = var.proxmox_api_token_id + pm_api_token_secret = var.proxmox_api_token_secret + pm_tls_insecure = var.proxmox_tls_insecure +} + +# Provider configured - ready to add resources diff --git a/terraform/terraform.tfvars.example b/terraform/terraform.tfvars.example new file mode 100644 index 0000000..fa6286e --- /dev/null +++ b/terraform/terraform.tfvars.example @@ -0,0 +1,7 @@ +# Copy this file to terraform.tfvars and fill in your values +# terraform.tfvars is gitignored to keep credentials safe + +proxmox_api_url = "https://your-proxmox-host.home.2rjus.net:8006/api2/json" +proxmox_api_token_id = "root@pam!terraform" +proxmox_api_token_secret = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" +proxmox_tls_insecure = true diff --git a/terraform/variables.tf b/terraform/variables.tf new file mode 100644 index 0000000..a1ec455 --- /dev/null +++ b/terraform/variables.tf @@ -0,0 +1,22 @@ +variable "proxmox_api_url" { + description = "Proxmox API URL (e.g., https://proxmox.home.2rjus.net:8006/api2/json)" + type = string +} + +variable "proxmox_api_token_id" { + description = "Proxmox API Token ID (e.g., root@pam!terraform)" + type = string + sensitive = true +} + +variable "proxmox_api_token_secret" { + description = "Proxmox API Token Secret" + type = string + sensitive = true +} + +variable "proxmox_tls_insecure" { + description = "Skip TLS
verification (set to true for self-signed certs)" + type = bool + default = true +} diff --git a/terraform/vm.tf b/terraform/vm.tf new file mode 100644 index 0000000..710c1fb --- /dev/null +++ b/terraform/vm.tf @@ -0,0 +1,90 @@ +# Example VM configuration - clone from template +# Before using this, you need to: +# 1. Upload the NixOS image to Proxmox +# 2. Restore it as a template VM (e.g., ID 9000) +# 3. Update the variables below + +variable "target_node" { + description = "Proxmox node to deploy to" + type = string + default = "pve1" +} + +variable "template_name" { + description = "Template VM name to clone from" + type = string + default = "nixos-25.11.20260128.fa83fd8" +} + +variable "ssh_public_key" { + description = "SSH public key for root user" + type = string + default = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAwfb2jpKrBnCw28aevnH8HbE5YbcMXpdaVv2KmueDu6 torjus@gunter" +} + +# Example test VM +resource "proxmox_vm_qemu" "test_vm" { + name = "nixos-test-tofu" + target_node = var.target_node + + # Clone from template + clone = var.template_name + + # Full clone (not linked) + full_clone = true + + # Boot configuration + boot = "order=virtio0" + scsihw = "virtio-scsi-single" + + # VM settings + cpu { + cores = 2 + } + memory = 2048 + + # Network + network { + id = 0 + model = "virtio" + bridge = "vmbr0" + tag = 13 + } + + # Disk settings + disks { + virtio { + virtio0 { + disk { + size = "20G" + storage = "local-zfs" + } + } + } + } + + # Start on boot + start_at_node_boot = true + + # Agent + agent = 1 + + # Cloud-init configuration + ciuser = "root" + sshkeys = var.ssh_public_key + ipconfig0 = "ip=dhcp" + nameserver = "10.69.13.5 10.69.13.6" + searchdomain = "home.2rjus.net" + + # Skip IPv6 since we don't use it + skip_ipv6 = true + + rng { + source = "/dev/urandom" + period = 1000 + } +} + +output "test_vm_ip" { + value = proxmox_vm_qemu.test_vm.default_ipv4_address +} -- 2.49.1 From ce6d2b1d33653c8f6d4a0b60c720ae4c07444626 Mon Sep 17 00:00:00 2001 From: 
=?UTF-8?q?Torjus=20H=C3=A5kestad?= Date: Sat, 31 Jan 2026 22:22:19 +0100 Subject: [PATCH 3/3] docs: add TODO.md for automated deployment pipeline Document multi-phase plan for automating NixOS host creation, deployment, and configuration on Proxmox including OpenTofu parameterization, config generation, bootstrap mechanism, secrets management, and Nix-based DNS automation. Co-Authored-By: Claude Sonnet 4.5 --- TODO.md | 231 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 231 insertions(+) create mode 100644 TODO.md diff --git a/TODO.md b/TODO.md new file mode 100644 index 0000000..0000856 --- /dev/null +++ b/TODO.md @@ -0,0 +1,231 @@ +# TODO: Automated Host Deployment Pipeline + +## Vision + +Automate the entire process of creating, configuring, and deploying new NixOS hosts on Proxmox from a single command or script. + +**Desired workflow:** +```bash +./scripts/create-host.sh --hostname myhost --ip 10.69.13.50 +# Script creates config, deploys VM, bootstraps NixOS, and you're ready to go +``` + +**Current manual workflow (from CLAUDE.md):** +1. Create `/hosts//` directory structure +2. Add host to `flake.nix` +3. Add DNS entries +4. Clone template VM manually +5. Run `prepare-host.sh` on new VM +6. Add generated age key to `.sops.yaml` +7. Configure networking +8. Commit and push +9. 
Run `nixos-rebuild boot --flake URL#` on host + +## The Plan + +### Phase 1: Parameterized OpenTofu Deployments ✓ (Partially Complete) + +**Status:** Template building and single VM deployment work; parameterization is still needed + +**Tasks:** +- [ ] Create module/template structure in terraform for repeatable VM deployments +- [ ] Parameterize VM configuration (hostname, CPU, memory, disk, IP) +- [ ] Support both DHCP and static IP configuration via cloud-init +- [ ] Test deploying multiple VMs from same template + +**Deliverable:** Can deploy a VM with custom parameters via OpenTofu + +--- + +### Phase 2: Host Configuration Generator + +**Goal:** Automate creation of host configuration files + +This doesn't have to be a plain shell script; something like Python would probably make templating easier. + +**Tasks:** +- [ ] Create script `scripts/create-host-config.sh` + - [ ] Takes parameters: hostname, IP, CPU cores, memory, disk size + - [ ] Generates `/hosts//` directory structure from template + - [ ] Creates `configuration.nix` with proper hostname and networking + - [ ] Generates `default.nix` with standard imports + - [ ] Copies/links `hardware-configuration.nix` from template +- [ ] Add host entry to `flake.nix` programmatically + - [ ] Parse flake.nix + - [ ] Insert new nixosConfiguration entry + - [ ] Maintain formatting +- [ ] Generate corresponding OpenTofu configuration + - [ ] Create `terraform/hosts/.tf` with VM definition + - [ ] Use parameters from script input + +**Deliverable:** Script generates all config files for a new host + +--- + +### Phase 3: Bootstrap Mechanism + +**Goal:** Get freshly deployed VM to apply its specific host configuration + +**Challenge:** Chicken-and-egg problem - VM needs to know its hostname and pull the right config + +**Option A: Cloud-init bootstrap script** +- [ ] Add cloud-init `runcmd` to template2 that: + - [ ] Reads hostname from cloud-init metadata + - [ ] Runs `nixos-rebuild boot --flake
git+https://git.t-juice.club/torjus/nixos-servers.git#${hostname}` + - [ ] Reboots into the new configuration +- [ ] Test cloud-init script execution on fresh VM +- [ ] Handle failure cases (flake doesn't exist, network issues) + +**Option B: Terraform provisioner** +- [ ] Use OpenTofu's `remote-exec` provisioner +- [ ] SSH into new VM after creation +- [ ] Run `nixos-rebuild boot --flake #` +- [ ] Trigger reboot via SSH + +**Option C: Two-stage deployment** +- [ ] Deploy VM with template2 (minimal config) +- [ ] Run Ansible playbook to bootstrap specific config +- [ ] Similar to existing `run-upgrade.yml` pattern + +**Decision needed:** Which approach fits best? (Recommend Option A for automation) + +--- + +### Phase 4: Secrets Management Automation + +**Challenge:** sops needs age key, but age key is generated on first boot + +**Current workflow:** +1. VM boots, generates age key at `/var/lib/sops-nix/key.txt` +2. User runs `prepare-host.sh` which prints public key +3. User manually adds public key to `.sops.yaml` +4. User commits, pushes +5. 
VM can now decrypt secrets + +**Proposed solution:** + +**Option A: Pre-generate age keys** +- [ ] Generate age key pair during `create-host-config.sh` +- [ ] Add public key to `.sops.yaml` immediately +- [ ] Store private key temporarily (secure location) +- [ ] Inject private key via cloud-init write_files or Terraform file provisioner +- [ ] VM uses pre-configured key from first boot + +**Option B: Post-deployment secret injection** +- [ ] VM boots with template, generates its own key +- [ ] Fetch public key via SSH after first boot +- [ ] Automatically add to `.sops.yaml` and commit +- [ ] Trigger rebuild on VM to pick up secrets access + +**Option C: Separate secrets from initial deployment** +- [ ] Initial deployment works without secrets +- [ ] After VM is running, user manually adds age key +- [ ] Subsequent auto-upgrades pick up secrets + +**Decision needed:** Option A is most automated, but requires secure key handling + +--- + +### Phase 5: DNS Automation + +**Goal:** Automatically generate DNS entries from host configurations + +**Approach:** Leverage Nix to generate zone file entries from flake host configurations + +Since most hosts use static IPs defined in their NixOS configurations, we can extract this information and automatically generate A records. This keeps DNS in sync with the actual host configs. 
+ +**Tasks:** +- [ ] Add optional CNAME field to host configurations + - [ ] Add `networking.cnames = [ "alias1" "alias2" ]` or similar option + - [ ] Document in host configuration template +- [ ] Create Nix function to extract DNS records from all hosts + - [ ] Parse each host's `networking.hostName` and IP configuration + - [ ] Collect any defined CNAMEs + - [ ] Generate zone file fragment with A and CNAME records +- [ ] Integrate auto-generated records into zone files + - [ ] Keep manual entries separate (for non-flake hosts/services) + - [ ] Include generated fragment in main zone file + - [ ] Add comments showing which records are auto-generated +- [ ] Update zone file serial number automatically +- [ ] Test zone file validity after generation +- [ ] Either: + - [ ] Automatically trigger DNS server reload (Ansible) + - [ ] Or document manual step: merge to master, run upgrade on ns1/ns2 + +**Deliverable:** DNS A records and CNAMEs automatically generated from host configs + +--- + +### Phase 6: Integration Script + +**Goal:** Single command to create and deploy a new host + +**Tasks:** +- [ ] Create `scripts/create-host.sh` master script that orchestrates: + 1. Prompts for: hostname, IP (or DHCP), CPU, memory, disk + 2. Validates inputs (IP not in use, hostname unique, etc.) + 3. Calls host config generator (Phase 2) + 4. Generates OpenTofu config (Phase 2) + 5. Handles secrets (Phase 4) + 6. Updates DNS (Phase 5) + 7. Commits all changes to git + 8. Runs `tofu apply` to deploy VM + 9. Waits for bootstrap to complete (Phase 3) + 10. 
Prints success message with IP and SSH command +- [ ] Add `--dry-run` flag to preview changes +- [ ] Add `--interactive` mode vs `--batch` mode +- [ ] Error handling and rollback on failures + +**Deliverable:** `./scripts/create-host.sh --hostname myhost --ip 10.69.13.50` creates a fully working host + +--- + +### Phase 7: Testing & Documentation + +**Tasks:** +- [ ] Test full pipeline end-to-end +- [ ] Create test host and verify all steps +- [ ] Document the new workflow in CLAUDE.md +- [ ] Add troubleshooting section +- [ ] Create examples for common scenarios (DHCP host, static IP host, etc.) + +--- + +## Open Questions + +1. **Bootstrap method:** Cloud-init runcmd vs Terraform provisioner vs Ansible? +2. **Secrets handling:** Pre-generate keys vs post-deployment injection? +3. **DNS automation:** Auto-commit or manual merge? +4. **Git workflow:** Auto-push changes or leave for user review? +5. **Template selection:** Single template2 or multiple templates for different host types? +6. **Networking:** Always DHCP initially, or support static IP from start? +7. **Error recovery:** What happens if bootstrap fails? Manual intervention or retry? + +## Implementation Order + +Recommended sequence: +1. Phase 1: Parameterize OpenTofu (foundation for testing) +2. Phase 3: Bootstrap mechanism (core automation) +3. Phase 2: Config generator (automate the boilerplate) +4. Phase 4: Secrets (solves biggest chicken-and-egg) +5. Phase 5: DNS (nice-to-have automation) +6. Phase 6: Integration script (ties it all together) +7. 
Phase 7: Testing & docs + +## Success Criteria + +When complete, creating a new host should: +- Take < 5 minutes of human time +- Require minimal user input (hostname, IP, basic specs) +- Result in a fully configured, secret-enabled, DNS-registered host +- Be reproducible and documented +- Handle common errors gracefully + +--- + +## Notes + +- Keep incremental commits at each phase +- Test each phase independently before moving to next +- Maintain backward compatibility with manual workflow +- Document any manual steps that can't be automated -- 2.49.1