opentofu-experiments #4

Merged
torjus merged 3 commits from opentofu-experiments into master 2026-01-31 22:07:23 +00:00
15 changed files with 751 additions and 0 deletions

.gitignore vendored

@@ -1,2 +1,12 @@
.direnv/
result
# Terraform/OpenTofu
terraform/.terraform/
terraform/.terraform.lock.hcl
terraform/*.tfstate
terraform/*.tfstate.*
terraform/terraform.tfvars
terraform/*.auto.tfvars
terraform/crash.log
terraform/crash.*.log

CLAUDE.md

@@ -44,6 +44,15 @@ nix develop
Secrets are handled by sops. Do not edit any `.sops.yaml` or any file within `secrets/`. Ask the user to modify if necessary.
### Git Commit Messages
Commit messages should follow the format: `topic: short description`
Examples:
- `flake: add opentofu to devshell`
- `template2: add proxmox image configuration`
- `terraform: add VM deployment configuration`
## Architecture
### Directory Structure
@@ -143,6 +152,57 @@ Configured in `/system/autoupgrade.nix`:
- Auto-reboot after successful upgrade
- Systemd service: `nixos-upgrade.service`
### Proxmox VM Provisioning with OpenTofu
The repository includes automated workflows for building Proxmox VM templates and deploying VMs using OpenTofu (Terraform).
#### Building and Deploying Templates
Template VMs are built from `hosts/template2` and deployed to Proxmox using Ansible:
```bash
# Build NixOS image and deploy to Proxmox as template
nix develop -c ansible-playbook -i playbooks/inventory.ini playbooks/build-and-deploy-template.yml
```
This playbook:
1. Builds the Proxmox image using `nixos-rebuild build-image --image-variant proxmox`
2. Uploads the `.vma.zst` image to Proxmox at `/var/lib/vz/dump`
3. Restores it as VM ID 9000
4. Converts it to a template
Template configuration (`hosts/template2`):
- Minimal base system with essential packages (age, vim, wget, git)
- Cloud-init configured for NoCloud datasource (no EC2 metadata timeout)
- DHCP networking on ens18
- SSH key-based root login
- `prepare-host.sh` script for cleaning machine-id, SSH keys, and regenerating age keys
#### Deploying VMs with OpenTofu
VMs are deployed from templates using OpenTofu in the `/terraform` directory:
```bash
cd terraform
tofu init # First time only
tofu apply # Deploy VMs
```
Configuration files:
- `main.tf` - Proxmox provider configuration
- `variables.tf` - Provider variables (API credentials)
- `vm.tf` - VM resource definitions
- `terraform.tfvars` - Actual credentials (gitignored)
Example VM deployment includes:
- Clone from template VM
- Cloud-init configuration (SSH keys, network, DNS)
- Custom CPU/memory/disk sizing
- VLAN tagging
- QEMU guest agent
OpenTofu outputs the VM's IP address after deployment for easy SSH access.
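For example, with the `test_vm_ip` output defined in `terraform/vm.tf`:
```bash
# Address reported by the QEMU guest agent
tofu output -raw test_vm_ip
# Or connect directly
ssh root@"$(tofu output -raw test_vm_ip)"
```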
### Adding a New Host
1. Create `/hosts/<hostname>/` directory

TODO.md Normal file

@@ -0,0 +1,231 @@
# TODO: Automated Host Deployment Pipeline
## Vision
Automate the entire process of creating, configuring, and deploying new NixOS hosts on Proxmox from a single command or script.
**Desired workflow:**
```bash
./scripts/create-host.sh --hostname myhost --ip 10.69.13.50
# Script creates config, deploys VM, bootstraps NixOS, and you're ready to go
```
**Current manual workflow (from CLAUDE.md):**
1. Create `/hosts/<hostname>/` directory structure
2. Add host to `flake.nix`
3. Add DNS entries
4. Clone template VM manually
5. Run `prepare-host.sh` on new VM
6. Add generated age key to `.sops.yaml`
7. Configure networking
8. Commit and push
9. Run `nixos-rebuild boot --flake URL#<hostname>` on host
## The Plan
### Phase 1: Parameterized OpenTofu Deployments ✓ (Partially Complete)
**Status:** Template building and single-VM deployment work; parameterization is still needed
**Tasks:**
- [ ] Create module/template structure in terraform for repeatable VM deployments
- [ ] Parameterize VM configuration (hostname, CPU, memory, disk, IP)
- [ ] Support both DHCP and static IP configuration via cloud-init
- [ ] Test deploying multiple VMs from same template
**Deliverable:** Can deploy a VM with custom parameters via OpenTofu
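A minimal sketch of what the parameterized layer could look like (the `vms` variable shape and resource name are hypothetical, not something that exists in `terraform/` yet):
```hcl
variable "vms" {
  description = "VMs to deploy, keyed by hostname"
  type = map(object({
    cores  = number
    memory = number
    ip     = string # "dhcp" or e.g. "ip=10.69.13.50/24,gw=10.69.13.1"
  }))
}

resource "proxmox_vm_qemu" "host" {
  for_each    = var.vms
  name        = each.key
  target_node = var.target_node
  clone       = var.template_name
  full_clone  = true

  cpu {
    cores = each.value.cores
  }
  memory    = each.value.memory
  ipconfig0 = each.value.ip == "dhcp" ? "ip=dhcp" : each.value.ip
  # ...remaining settings (network, disks, cloud-init) as in vm.tf...
}
```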
---
### Phase 2: Host Configuration Generator
**Goal:** Automate creation of host configuration files
This doesn't have to be a plain shell script; something like Python would probably make templating easier.
**Tasks:**
- [ ] Create script `scripts/create-host-config.sh`
- [ ] Takes parameters: hostname, IP, CPU cores, memory, disk size
- [ ] Generates `/hosts/<hostname>/` directory structure from template
- [ ] Creates `configuration.nix` with proper hostname and networking
- [ ] Generates `default.nix` with standard imports
- [ ] Copies/links `hardware-configuration.nix` from template
- [ ] Add host entry to `flake.nix` programmatically
- [ ] Parse flake.nix
- [ ] Insert new nixosConfiguration entry
- [ ] Maintain formatting
- [ ] Generate corresponding OpenTofu configuration
- [ ] Create `terraform/hosts/<hostname>.tf` with VM definition
- [ ] Use parameters from script input
**Deliverable:** Script generates all config files for a new host
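A rough sketch of the generator (the script name comes from the task list above; the template file and placeholder scheme are assumptions):
```bash
#!/usr/bin/env bash
# Hypothetical sketch of scripts/create-host-config.sh - not implemented yet
set -euo pipefail
hostname="$1" ip="$2"

mkdir -p "hosts/$hostname"
cp hosts/template2/hardware-configuration.nix "hosts/$hostname/"
# Assumes a templates/configuration.nix.tmpl with @HOSTNAME@/@IP@ placeholders
sed -e "s/@HOSTNAME@/$hostname/g" -e "s|@IP@|$ip|g" \
  templates/configuration.nix.tmpl > "hosts/$hostname/configuration.nix"
# The flake.nix entry and terraform/hosts/$hostname.tf would be generated similarly
```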
---
### Phase 3: Bootstrap Mechanism
**Goal:** Get freshly deployed VM to apply its specific host configuration
**Challenge:** Chicken-and-egg problem - VM needs to know its hostname and pull the right config
**Option A: Cloud-init bootstrap script**
- [ ] Add cloud-init `runcmd` to template2 that:
- [ ] Reads hostname from cloud-init metadata
- [ ] Runs `nixos-rebuild boot --flake git+https://git.t-juice.club/torjus/nixos-servers.git#${hostname}`
- [ ] Reboots into the new configuration
- [ ] Test cloud-init script execution on fresh VM
- [ ] Handle failure cases (flake doesn't exist, network issues)
**Option B: Terraform provisioner**
- [ ] Use OpenTofu's `remote-exec` provisioner
- [ ] SSH into new VM after creation
- [ ] Run `nixos-rebuild boot --flake <url>#<hostname>`
- [ ] Trigger reboot via SSH
**Option C: Two-stage deployment**
- [ ] Deploy VM with template2 (minimal config)
- [ ] Run Ansible playbook to bootstrap specific config
- [ ] Similar to existing `run-upgrade.yml` pattern
**Decision needed:** Which approach fits best? (Recommend Option A for automation)
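For reference, Option A could boil down to NoCloud user-data along these lines (a sketch only; how the user-data gets attached to the VM is part of the open work):
```yaml
#cloud-config
# Assumes cloud-init has already set the hostname from metadata before runcmd runs
runcmd:
  - nixos-rebuild boot --flake "git+https://git.t-juice.club/torjus/nixos-servers.git#$(hostname)"
  - reboot
```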
---
### Phase 4: Secrets Management Automation
**Challenge:** sops needs age key, but age key is generated on first boot
**Current workflow:**
1. VM boots, generates age key at `/var/lib/sops-nix/key.txt`
2. User runs `prepare-host.sh` which prints public key
3. User manually adds public key to `.sops.yaml`
4. User commits, pushes
5. VM can now decrypt secrets
**Proposed solution:**
**Option A: Pre-generate age keys**
- [ ] Generate age key pair during `create-host-config.sh`
- [ ] Add public key to `.sops.yaml` immediately
- [ ] Store private key temporarily (secure location)
- [ ] Inject private key via cloud-init write_files or Terraform file provisioner
- [ ] VM uses pre-configured key from first boot
**Option B: Post-deployment secret injection**
- [ ] VM boots with template, generates its own key
- [ ] Fetch public key via SSH after first boot
- [ ] Automatically add to `.sops.yaml` and commit
- [ ] Trigger rebuild on VM to pick up secrets access
**Option C: Separate secrets from initial deployment**
- [ ] Initial deployment works without secrets
- [ ] After VM is running, user manually adds age key
- [ ] Subsequent auto-upgrades pick up secrets
**Decision needed:** Option A is most automated, but requires secure key handling
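For Option A, the key handling could look roughly like this (file paths are placeholders; getting the private key onto the VM securely is the hard part):
```bash
# On the admin machine: pre-generate the host's age key
age-keygen -o "keys/$hostname.txt"
age-keygen -y "keys/$hostname.txt"   # prints the public key to add to .sops.yaml
# After .sops.yaml has been updated, re-encrypt existing secrets
sops updatekeys secrets/<file>.yaml
```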
---
### Phase 5: DNS Automation
**Goal:** Automatically generate DNS entries from host configurations
**Approach:** Leverage Nix to generate zone file entries from flake host configurations
Since most hosts use static IPs defined in their NixOS configurations, we can extract this information and automatically generate A records. This keeps DNS in sync with the actual host configs.
**Tasks:**
- [ ] Add optional CNAME field to host configurations
- [ ] Add `networking.cnames = [ "alias1" "alias2" ]` or similar option
- [ ] Document in host configuration template
- [ ] Create Nix function to extract DNS records from all hosts
- [ ] Parse each host's `networking.hostName` and IP configuration
- [ ] Collect any defined CNAMEs
- [ ] Generate zone file fragment with A and CNAME records
- [ ] Integrate auto-generated records into zone files
- [ ] Keep manual entries separate (for non-flake hosts/services)
- [ ] Include generated fragment in main zone file
- [ ] Add comments showing which records are auto-generated
- [ ] Update zone file serial number automatically
- [ ] Test zone file validity after generation
- [ ] Either:
- [ ] Automatically trigger DNS server reload (Ansible)
- [ ] Or document manual step: merge to master, run upgrade on ns1/ns2
**Deliverable:** DNS A records and CNAMEs automatically generated from host configs
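A rough Nix sketch of the extraction function (assumes every flake host defines exactly one static IPv4 address on `ens18`, plus the hypothetical `networking.cnames` option from the first task):
```nix
let
  hosts = self.nixosConfigurations;
  recordsFor = name:
    let
      cfg = hosts.${name}.config;
      addr = (builtins.head cfg.networking.interfaces.ens18.ipv4.addresses).address;
      cnames = map (alias: "${alias}  IN  CNAME  ${cfg.networking.hostName}") (cfg.networking.cnames or [ ]);
    in
    [ "${cfg.networking.hostName}  IN  A  ${addr}" ] ++ cnames;
in
builtins.concatStringsSep "\n" (builtins.concatMap recordsFor (builtins.attrNames hosts))
```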
---
### Phase 6: Integration Script
**Goal:** Single command to create and deploy a new host
**Tasks:**
- [ ] Create `scripts/create-host.sh` master script that orchestrates:
1. Prompts for: hostname, IP (or DHCP), CPU, memory, disk
2. Validates inputs (IP not in use, hostname unique, etc.)
3. Calls host config generator (Phase 2)
4. Generates OpenTofu config (Phase 2)
5. Handles secrets (Phase 4)
6. Updates DNS (Phase 5)
7. Commits all changes to git
8. Runs `tofu apply` to deploy VM
9. Waits for bootstrap to complete (Phase 3)
10. Prints success message with IP and SSH command
- [ ] Add `--dry-run` flag to preview changes
- [ ] Add `--interactive` mode vs `--batch` mode
- [ ] Error handling and rollback on failures
**Deliverable:** `./scripts/create-host.sh --hostname myhost --ip 10.69.13.50` creates a fully working host
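A skeleton of how the orchestration could hang together (every helper it calls is a planned script from the earlier phases, not something that exists yet):
```bash
#!/usr/bin/env bash
# Hypothetical scripts/create-host.sh skeleton
set -euo pipefail
hostname="$1" ip="$2"

./scripts/create-host-config.sh --hostname "$hostname" --ip "$ip"  # Phase 2: generate configs
git add hosts flake.nix terraform && git commit -m "hosts: add $hostname"
(cd terraform && tofu apply)                                       # deploy the VM
echo "Done - try: ssh root@$ip"
```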
---
### Phase 7: Testing & Documentation
**Tasks:**
- [ ] Test full pipeline end-to-end
- [ ] Create test host and verify all steps
- [ ] Document the new workflow in CLAUDE.md
- [ ] Add troubleshooting section
- [ ] Create examples for common scenarios (DHCP host, static IP host, etc.)
---
## Open Questions
1. **Bootstrap method:** Cloud-init runcmd vs Terraform provisioner vs Ansible?
2. **Secrets handling:** Pre-generate keys vs post-deployment injection?
3. **DNS automation:** Auto-commit or manual merge?
4. **Git workflow:** Auto-push changes or leave for user review?
5. **Template selection:** Single template2 or multiple templates for different host types?
6. **Networking:** Always DHCP initially, or support static IP from start?
7. **Error recovery:** What happens if bootstrap fails? Manual intervention or retry?
## Implementation Order
Recommended sequence:
1. Phase 1: Parameterize OpenTofu (foundation for testing)
2. Phase 3: Bootstrap mechanism (core automation)
3. Phase 2: Config generator (automate the boilerplate)
4. Phase 4: Secrets (solves biggest chicken-and-egg)
5. Phase 5: DNS (nice-to-have automation)
6. Phase 6: Integration script (ties it all together)
7. Phase 7: Testing & docs
## Success Criteria
When complete, creating a new host should:
- Take < 5 minutes of human time
- Require minimal user input (hostname, IP, basic specs)
- Result in a fully configured, secret-enabled, DNS-registered host
- Be reproducible and documented
- Handle common errors gracefully
---
## Notes
- Keep incremental commits at each phase
- Test each phase independently before moving to next
- Maintain backward compatibility with manual workflow
- Document any manual steps that can't be automated

flake.nix

@@ -172,6 +172,22 @@
          sops-nix.nixosModules.sops
        ];
      };
      template2 = nixpkgs.lib.nixosSystem {
        inherit system;
        specialArgs = {
          inherit inputs self sops-nix;
        };
        modules = [
          (
            { config, pkgs, ... }:
            {
              nixpkgs.overlays = commonOverlays;
            }
          )
          ./hosts/template2
          sops-nix.nixosModules.sops
        ];
      };
      http-proxy = nixpkgs.lib.nixosSystem {
        inherit system;
        specialArgs = {
@@ -325,6 +341,7 @@
      default = pkgs.mkShell {
        packages = with pkgs; [
          ansible
          opentofu
          python3
        ];
      };

hosts/template2/configuration.nix

@@ -0,0 +1,75 @@
{
  config,
  lib,
  pkgs,
  ...
}:
{
  imports = [
    ./hardware-configuration.nix
    ../../system/sshd.nix
  ];

  # Root user with no password but SSH key access for bootstrapping
  users.users.root = {
    hashedPassword = "";
    openssh.authorizedKeys.keys = [
      "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAwfb2jpKrBnCw28aevnH8HbE5YbcMXpdaVv2KmueDu6 torjus@gunter"
    ];
  };

  # Proxmox image-specific configuration
  # Configure storage to use local-zfs instead of local-lvm
  image.modules.proxmox = {
    proxmox.qemuConf.virtio0 = lib.mkForce "local-zfs:vm-9999-disk-0";
    proxmox.qemuConf.boot = lib.mkForce "order=virtio0";
    proxmox.cloudInit.defaultStorage = lib.mkForce "local-zfs";
  };

  # Configure cloud-init to only use NoCloud datasource (no EC2 metadata service)
  services.cloud-init.settings = {
    datasource_list = [ "NoCloud" ];
    datasource = {
      NoCloud = {
        fs_label = "cidata";
      };
    };
  };

  boot.loader.grub.enable = true;
  boot.loader.grub.device = "/dev/vda";

  networking.hostName = "nixos-template2";
  networking.domain = "home.2rjus.net";
  networking.useNetworkd = true;
  networking.useDHCP = false;

  services.resolved.enable = true;

  systemd.network.enable = true;
  systemd.network.networks."ens18" = {
    matchConfig.Name = "ens18";
    networkConfig.DHCP = "ipv4";
    linkConfig.RequiredForOnline = "routable";
  };

  time.timeZone = "Europe/Oslo";

  nix.settings.experimental-features = [
    "nix-command"
    "flakes"
  ];
  nix.settings.tarball-ttl = 0;

  environment.systemPackages = with pkgs; [
    age
    vim
    wget
    git
  ];

  # Open ports in the firewall.
  # networking.firewall.allowedTCPPorts = [ ... ];
  # networking.firewall.allowedUDPPorts = [ ... ];
  # Or disable the firewall altogether.
  networking.firewall.enable = false;

  system.stateVersion = "25.11";
}

hosts/template2/default.nix

@@ -0,0 +1,9 @@
{ ... }:
{
  imports = [
    ./hardware-configuration.nix
    ./configuration.nix
    ./scripts.nix
    ../../system/packages.nix
  ];
}

hosts/template2/hardware-configuration.nix

@@ -0,0 +1,36 @@
{
  config,
  lib,
  pkgs,
  modulesPath,
  ...
}:
{
  imports = [
    (modulesPath + "/profiles/qemu-guest.nix")
  ];

  boot.initrd.availableKernelModules = [
    "ata_piix"
    "uhci_hcd"
    "virtio_pci"
    "virtio_scsi"
    "sd_mod"
    "sr_mod"
  ];
  boot.initrd.kernelModules = [ "dm-snapshot" ];
  boot.kernelModules = [
    "ptp_kvm"
    "virtio_rng" # Provides entropy from host for fast SSH key generation
  ];
  boot.extraModulePackages = [ ];

  # Enables DHCP on each ethernet and wireless interface. In case of scripted networking
  # (the default) this is the recommended approach. When using systemd-networkd it's
  # still possible to use this option, but it's recommended to use it in conjunction
  # with explicit per-interface declarations with `networking.interfaces.<interface>.useDHCP`.
  networking.useDHCP = lib.mkDefault true;
  # networking.interfaces.ens18.useDHCP = lib.mkDefault true;

  nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
}

hosts/template2/scripts.nix

@@ -0,0 +1,33 @@
{ pkgs, ... }:
let
  prepare-host-script = pkgs.writeShellScriptBin "prepare-host.sh" ''
    echo "Removing machine-id"
    rm -f /etc/machine-id || true
    echo "Removing SSH host keys"
    rm -f /etc/ssh/ssh_host_* || true
    echo "Restarting SSH"
    systemctl restart sshd
    echo "Removing temporary files"
    rm -rf /tmp/* || true
    echo "Removing logs"
    journalctl --rotate || true
    journalctl --vacuum-time=1s || true
    echo "Removing cache"
    rm -rf /var/cache/* || true
    echo "Generate age key"
    rm -rf /var/lib/sops-nix || true
    mkdir -p /var/lib/sops-nix
    ${pkgs.age}/bin/age-keygen -o /var/lib/sops-nix/key.txt
  '';
in
{
  environment.systemPackages = [ prepare-host-script ];
  users.motd = "Prepare host by running 'prepare-host.sh'.";
}

playbooks/build-and-deploy-template.yml

@@ -0,0 +1,101 @@
---
- name: Build and deploy NixOS Proxmox template
  hosts: localhost
  gather_facts: false
  vars:
    template_name: "template2"
    nixos_config: "template2"
    proxmox_node: "pve1.home.2rjus.net" # Change to your Proxmox node name
    proxmox_host: "pve1.home.2rjus.net" # Change to your Proxmox host
    template_vmid: 9000 # Template VM ID
    storage: "local-zfs"
  tasks:
    - name: Build NixOS image
      ansible.builtin.command:
        cmd: "nixos-rebuild build-image --image-variant proxmox --flake .#template2"
        chdir: "{{ playbook_dir }}/.."
      register: build_result
      changed_when: true

    - name: Find built image file
      ansible.builtin.find:
        paths: "{{ playbook_dir }}/../result"
        patterns: "*.vma.zst"
        recurse: true
      register: image_files
    - name: Fail if no image found
      ansible.builtin.fail:
        msg: "No .vma.zst image found in build output"
      when: image_files.matched == 0
    - name: Set image path
      ansible.builtin.set_fact:
        image_path: "{{ image_files.files[0].path }}"

    - name: Extract image filename
      ansible.builtin.set_fact:
        image_filename: "{{ image_path | basename }}"

    - name: Display image info
      ansible.builtin.debug:
        msg: "Built image: {{ image_path }} ({{ image_filename }})"

- name: Deploy template to Proxmox
  hosts: proxmox
  gather_facts: false
  vars:
    template_name: "template2"
    template_vmid: 9000
    storage: "local-zfs"
  tasks:
    - name: Get image path and filename from localhost
      ansible.builtin.set_fact:
        image_path: "{{ hostvars['localhost']['image_path'] }}"
        image_filename: "{{ hostvars['localhost']['image_filename'] }}"

    - name: Set destination path
      ansible.builtin.set_fact:
        image_dest: "/var/lib/vz/dump/{{ image_filename }}"

    - name: Copy image to Proxmox
      ansible.builtin.copy:
        src: "{{ image_path }}"
        dest: "{{ image_dest }}"
        mode: '0644'

    - name: Check if template VM already exists
      ansible.builtin.command:
        cmd: "qm status {{ template_vmid }}"
      register: vm_status
      failed_when: false
      changed_when: false

    - name: Destroy existing template VM if it exists
      ansible.builtin.command:
        cmd: "qm destroy {{ template_vmid }} --purge"
      when: vm_status.rc == 0
      changed_when: true

    - name: Import image
      ansible.builtin.command:
        cmd: "qmrestore {{ image_dest }} {{ template_vmid }}"
      changed_when: true

    - name: Convert VM to template
      ansible.builtin.command:
        cmd: "qm template {{ template_vmid }}"
      changed_when: true

    - name: Clean up uploaded image
      ansible.builtin.file:
        path: "{{ image_dest }}"
        state: absent

    - name: Display success message
      ansible.builtin.debug:
        msg: "Template VM {{ template_vmid }} created successfully on {{ storage }}"

playbooks/inventory.ini Normal file

@@ -0,0 +1,5 @@
[proxmox]
pve1.home.2rjus.net
[proxmox:vars]
ansible_user=root

terraform/README.md Normal file

@@ -0,0 +1,37 @@
# OpenTofu Configuration for Proxmox
This directory contains OpenTofu configuration for managing Proxmox VMs.
## Setup
1. **Create a Proxmox API token:**
- Log into Proxmox web UI
- Go to Datacenter → Permissions → API Tokens
- Click Add
- User: `root@pam`, Token ID: `terraform`
- Uncheck "Privilege Separation"
- Save the token secret (shown only once)
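The token can also be created from the Proxmox shell instead of the web UI (this should be equivalent to the steps above):
```bash
pveum user token add root@pam terraform --privsep 0
```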
2. **Configure credentials:**
```bash
cd terraform
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your Proxmox details
```
3. **Initialize OpenTofu:**
```bash
tofu init
```
4. **Test connection:**
```bash
tofu plan
```
## Files
- `main.tf` - Provider configuration
- `variables.tf` - Variable definitions
- `terraform.tfvars.example` - Example credentials file
- `terraform.tfvars` - Your actual credentials (gitignored)

terraform/main.tf Normal file

@@ -0,0 +1,18 @@
terraform {
  required_version = ">= 1.0"

  required_providers {
    proxmox = {
      source  = "telmate/proxmox"
      version = "3.0.2-rc07"
    }
  }
}

provider "proxmox" {
  pm_api_url          = var.proxmox_api_url
  pm_api_token_id     = var.proxmox_api_token_id
  pm_api_token_secret = var.proxmox_api_token_secret
  pm_tls_insecure     = var.proxmox_tls_insecure
}

# Provider configured - ready to add resources

terraform/terraform.tfvars.example

@@ -0,0 +1,7 @@
# Copy this file to terraform.tfvars and fill in your values
# terraform.tfvars is gitignored to keep credentials safe
proxmox_api_url = "https://your-proxmox-host.home.2rjus.net:8006/api2/json"
proxmox_api_token_id = "root@pam!terraform"
proxmox_api_token_secret = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
proxmox_tls_insecure = true

terraform/variables.tf Normal file

@@ -0,0 +1,22 @@
variable "proxmox_api_url" {
description = "Proxmox API URL (e.g., https://proxmox.home.2rjus.net:8006/api2/json)"
type = string
}
variable "proxmox_api_token_id" {
description = "Proxmox API Token ID (e.g., root@pam!terraform)"
type = string
sensitive = true
}
variable "proxmox_api_token_secret" {
description = "Proxmox API Token Secret"
type = string
sensitive = true
}
variable "proxmox_tls_insecure" {
description = "Skip TLS verification (set to true for self-signed certs)"
type = bool
default = true
}

terraform/vm.tf Normal file

@@ -0,0 +1,90 @@
# Example VM configuration - clone from template
# Before using this, you need to:
# 1. Upload the NixOS image to Proxmox
# 2. Restore it as a template VM (e.g., ID 9000)
# 3. Update the variables below

variable "target_node" {
  description = "Proxmox node to deploy to"
  type        = string
  default     = "pve1"
}

variable "template_name" {
  description = "Template VM name to clone from"
  type        = string
  default     = "nixos-25.11.20260128.fa83fd8"
}

variable "ssh_public_key" {
  description = "SSH public key for root user"
  type        = string
  default     = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAwfb2jpKrBnCw28aevnH8HbE5YbcMXpdaVv2KmueDu6 torjus@gunter"
}

# Example test VM
resource "proxmox_vm_qemu" "test_vm" {
  name        = "nixos-test-tofu"
  target_node = var.target_node

  # Clone from template
  clone = var.template_name

  # Full clone (not linked)
  full_clone = true

  # Boot configuration
  boot   = "order=virtio0"
  scsihw = "virtio-scsi-single"

  # VM settings
  cpu {
    cores = 2
  }
  memory = 2048

  # Network
  network {
    id     = 0
    model  = "virtio"
    bridge = "vmbr0"
    tag    = 13
  }

  # Disk settings
  disks {
    virtio {
      virtio0 {
        disk {
          size    = "20G"
          storage = "local-zfs"
        }
      }
    }
  }

  # Start on boot
  start_at_node_boot = true

  # Agent
  agent = 1

  # Cloud-init configuration
  ciuser       = "root"
  sshkeys      = var.ssh_public_key
  ipconfig0    = "ip=dhcp"
  nameserver   = "10.69.13.5 10.69.13.6"
  searchdomain = "home.2rjus.net"

  # Skip IPv6 since we don't use it
  skip_ipv6 = true

  rng {
    source = "/dev/urandom"
    period = 1000
  }
}

output "test_vm_ip" {
  value = proxmox_vm_qemu.test_vm.default_ipv4_address
}