25 Commits

Author SHA1 Message Date
4afb37d730 create-host: enable resolved in configuration.nix.j2
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m15s
2026-02-01 20:37:36 +01:00
a2c798bc30 vault: add minimal vault config
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2026-02-01 20:27:02 +01:00
6d64e53586 hosts: add vault01 host
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m20s
2026-02-01 20:08:48 +01:00
e0ad445341 planning: update TODO.md
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2026-02-01 20:05:56 +01:00
d194c147d6 Merge pull request 'pipeline-testing-improvements' (#9) from pipeline-testing-improvements into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m13s
Reviewed-on: #9
2026-02-01 16:45:04 +00:00
9908286062 scripts: fix create-host flake.nix insertion point
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m12s
Run nix flake check / flake-check (push) Failing after 8m24s
Fix bug where new hosts were added outside of nixosConfigurations block
instead of inside it.

Issues fixed:
1. Pattern was looking for "packages =" but actual text is "packages = forAllSystems"
2. Replacement was putting new entry AFTER closing brace instead of BEFORE
3. testvm01 was at top-level flake output instead of in nixosConfigurations

Changes:
- Update pattern to match "packages = forAllSystems"
- Put new entry BEFORE the closing brace of nixosConfigurations
- Move testvm01 to correct location inside nixosConfigurations block

Result: nix flake show now correctly shows testvm01 as NixOS configuration

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 17:41:04 +01:00
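The corrected insertion logic can be sketched in Python — a hypothetical reconstruction of the fix described above, not the actual create-host code (`mkHost` and the sample flake text are illustrative):

```python
import re

def insert_host(flake_text: str, hostname: str) -> str:
    # Anchor on the literal "packages = forAllSystems" attribute that
    # follows the nixosConfigurations block (not a bare "packages ="),
    # then insert the new entry BEFORE the block's closing brace.
    entry = f'      {hostname} = mkHost "{hostname}";\n'
    m = re.search(r"(\n\s*};\s*\n)\s*packages = forAllSystems", flake_text)
    if m is None:
        raise ValueError("could not locate nixosConfigurations block")
    pos = m.start(1) + 1  # just after the newline ending the last entry
    return flake_text[:pos] + entry + flake_text[pos:]

flake = (
    "  outputs = inputs: {\n"
    "    nixosConfigurations = {\n"
    '      ns1 = mkHost "ns1";\n'
    "    };\n"
    "    packages = forAllSystems (pkgs: { });\n"
    "  };\n"
)
updated = insert_host(flake, "testvm01")
print(updated)
```

The key detail matches the commit: the new entry lands inside the `nixosConfigurations` braces, so `nix flake show` sees it as a NixOS configuration.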
cec496dda7 terraform: use local storage for cloud-init disks
Fix error "500 can't upload to storage type 'zfspool'" by using "local"
storage pool for cloud-init disks instead of the VM's storage pool.

Cloud-init disks require storage that supports ISO/snippet content types,
which zfspool does not. The "local" storage pool (directory-based) supports
this content type.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 17:41:04 +01:00
fca50562c3 terraform: fix cloud-init conditional type inconsistency
Fix OpenTofu error where static IP and DHCP branches had different object
structures in the subnets array. Move conditional to network_config level
so both branches return complete, consistent yamlencode() results.

Error was: "The true and false result expressions must have consistent types"

Solution: Make network_config itself conditional rather than the subnets
array, ensuring both branches return the same type (string from yamlencode).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 17:41:04 +01:00
1f1829dc2f docs: update terraform README for cloud-init refactoring
Remove mention of .generated/ directory and clarify that cloud-init.tf
manages all cloud-init disks, not just branch-specific ones.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 17:41:04 +01:00
21a32e0521 terraform: refactor cloud-init to use proxmox_cloud_init_disk resource
Replace SSH upload approach with native proxmox_cloud_init_disk resource
for cleaner, more maintainable cloud-init management.

Changes:
- Use proxmox_cloud_init_disk for all VMs (not just branch-specific ones)
- Include SSH keys, network config, and metadata in cloud-init disk
- Conditionally include NIXOS_FLAKE_BRANCH for VMs with flake_branch set
- Replace ide2 cloudinit disk with cdrom reference to cloud-init disk
- Remove built-in cloud-init parameters (ciuser, sshkeys, etc.)
- Remove cicustom parameter (no longer needed)
- Remove proxmox_host variable (no SSH uploads required)
- Remove .gitignore entry for .generated/ directory

Benefits:
- No SSH access to Proxmox required
- All cloud-init config managed in Terraform
- Consistent approach for all VMs
- Cleaner state management

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 17:41:04 +01:00
7fe0aa0f54 test: add testvm01 for pipeline testing
2026-02-01 17:41:04 +01:00
83de9a3ffb pipeline: add testing improvements for branch-based workflows
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Implement several improvements to enable efficient testing of pipeline
changes without polluting the master branch:

1. Add --force flag to create-host script
   - Skip hostname/IP uniqueness validation
   - Overwrite existing host configurations
   - Update entries in flake.nix and terraform/vms.tf (no duplicates)
   - Useful for iterating on configurations during testing

2. Add branch support to bootstrap mechanism
   - Bootstrap service reads NIXOS_FLAKE_BRANCH environment variable
   - Defaults to master if not set
   - Uses branch in git URL via ?ref= parameter
   - Service loads environment from /etc/environment

3. Add cloud-init disk support for branch configuration
   - VMs can specify flake_branch field in terraform/vms.tf
   - Automatically generates cloud-init snippet setting NIXOS_FLAKE_BRANCH
   - Uploads snippet to Proxmox via SSH
   - Production VMs omit flake_branch and use master

4. Update documentation
   - Document --force flag usage in create-host README
   - Add branch testing examples in terraform README
   - Update TODO.md with testing workflow
   - Add .generated/ to gitignore

Testing workflow: Create feature branch, set flake_branch in VM definition,
deploy with terraform, iterate with --force flag, clean up before merging.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 16:34:28 +01:00
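The update-vs-insert behaviour behind `--force` can be sketched as follows (illustrative Python; the entry format and function name are stand-ins, not the real manipulators code):

```python
import re

def upsert_entry(text: str, hostname: str, new_entry: str) -> str:
    # With --force, an existing entry for the host is replaced in place;
    # otherwise the entry is appended. This is what avoids duplicates in
    # flake.nix and terraform/vms.tf when regenerating a test host.
    pattern = re.compile(rf"^\s*{re.escape(hostname)} = .*$", re.MULTILINE)
    if pattern.search(text):
        return pattern.sub(new_entry, text, count=1)  # --force path: update
    return text.rstrip("\n") + "\n" + new_entry + "\n"  # first run: insert

block = '  ns1 = mkHost "ns1";\n'
# Running twice leaves exactly one testvm01 entry:
block = upsert_entry(block, "testvm01", '  testvm01 = mkHost "testvm01";')
block = upsert_entry(block, "testvm01", '  testvm01 = mkHost "testvm01";')
print(block)
```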
30addc5116 Merge pull request 'template2: add filesystem definitions to support normal builds' (#8) from template2-fix-normal-build into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m24s
Reviewed-on: #8
2026-02-01 10:19:30 +00:00
2aeed8f231 template2: add filesystem definitions to support normal builds
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m17s
Run nix flake check / flake-check (push) Failing after 16m59s
Add filesystem configuration matching Proxmox image builder output
to allow template2 to build with both `nixos-rebuild build` and
`nixos-rebuild build-image --image-variant proxmox`.

Filesystem specs discovered from running VM:
- ext4 filesystem with label "nixos"
- x-systemd.growfs option for automatic partition growth
- No swap partition

Using lib.mkDefault ensures these definitions work for normal builds
while allowing the Proxmox image builder to override when needed.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 11:17:48 +01:00
c3180c1b2c Merge pull request 'bootstrap: implement automated VM bootstrap mechanism for Phase 3' (#7) from phase3-bootstrap-mechanism into master
Some checks failed
Run nix flake check / flake-check (push) Failing after 1m21s
Reviewed-on: #7
2026-02-01 09:40:09 +00:00
6f7aee3444 bootstrap: implement automated VM bootstrap mechanism for Phase 3
Some checks failed
Run nix flake check / flake-check (pull_request) Failing after 1m20s
Run nix flake check / flake-check (push) Failing after 1m54s
Add systemd service that automatically bootstraps freshly deployed VMs
with their host-specific NixOS configuration from the flake repository.

Changes:
- hosts/template2/bootstrap.nix: New systemd oneshot service that:
  - Runs after cloud-init completes (ensures hostname is set)
  - Reads hostname from hostnamectl (set by cloud-init from Terraform)
  - Checks network connectivity via HTTPS (curl)
  - Runs nixos-rebuild boot with flake URL
  - Reboots on success, fails gracefully with clear errors on failure

- hosts/template2/configuration.nix: Configure cloud-init datasource
  - Changed from NoCloud to ConfigDrive (used by Proxmox)
  - Allows cloud-init to receive config from Proxmox

- hosts/template2/default.nix: Import bootstrap.nix module

- terraform/vms.tf: Add cloud-init disk to VMs
  - Configure disks.ide.ide2.cloudinit block
  - Removed invalid cloudinit_cdrom_storage parameter
  - Enables Proxmox to inject cloud-init configuration

- TODO.md: Mark Phase 3 as completed

This eliminates the manual nixos-rebuild step from the deployment workflow.
VMs now automatically pull and apply their configuration on first boot.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 10:38:35 +01:00
af17387c7d Merge pull request 'scripts: add create-host tool for automated host configuration generation' (#6) from phase2-host-config-generator into master
Some checks failed
Run nix flake check / flake-check (push) Failing after 1m50s
Reviewed-on: #6
2026-02-01 01:48:19 +00:00
408554b477 scripts: add create-host tool for automated host configuration generation
Some checks failed
Run nix flake check / flake-check (push) Failing after 1m50s
Run nix flake check / flake-check (pull_request) Failing after 1m49s
Implements Phase 2 of the automated deployment pipeline.

This commit adds a Python CLI tool that automates the creation of NixOS host
configurations, eliminating manual boilerplate and reducing errors.

Features:
- Python CLI using typer framework with rich terminal UI
- Comprehensive validation (hostname format/uniqueness, IP subnet/uniqueness)
- Jinja2 templates for NixOS configurations
- Automatic updates to flake.nix and terraform/vms.tf
- Support for both static IP and DHCP configurations
- Dry-run mode for safe previews
- Packaged as Nix derivation and added to devShell

Usage:
  create-host --hostname myhost --ip 10.69.13.50/24

The tool generates:
- hosts/<hostname>/default.nix
- hosts/<hostname>/configuration.nix
- Updates flake.nix with new nixosConfigurations entry
- Updates terraform/vms.tf with new VM definition

All generated configurations include full system imports (monitoring, SOPS,
autoupgrade, etc.) and are validated with nix flake check and tofu validate.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 02:27:57 +01:00
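The validation the commit describes can be sketched roughly like this (the subnet, names, and exact checks are assumptions; the real tool is more thorough):

```python
import ipaddress
import re

HOSTNAME_RE = re.compile(r"^[a-z][a-z0-9-]{0,62}$")
HOMELAB_SUBNET = ipaddress.ip_network("10.69.13.0/24")  # assumed subnet

def validate(hostname, cidr, existing):
    # Returns a list of error strings; an empty list means the input is OK.
    errors = []
    if not HOSTNAME_RE.match(hostname):
        errors.append(f"invalid hostname: {hostname!r}")
    if hostname in existing:
        errors.append(f"hostname already in use: {hostname}")
    if cidr is not None:  # omitting the IP means DHCP, nothing to check
        iface = ipaddress.ip_interface(cidr)
        if iface.ip not in HOMELAB_SUBNET:
            errors.append(f"{iface.ip} not in {HOMELAB_SUBNET}")
    return errors

print(validate("vault01", "10.69.13.75/24", {"ns1", "ns2"}))  # []
print(validate("Bad_Host", "192.168.1.5/24", {"ns1"}))
```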
b20ad9c275 docs: mark Phase 1 of automated deployment pipeline as completed
Some checks failed
Run nix flake check / flake-check (push) Failing after 1m50s
Periodic flake update / flake-update (push) Successful in 1m6s
Phase 1 is now fully implemented with parameterized multi-VM deployments
via OpenTofu. Updated status, tasks, and added implementation details.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-31 23:33:14 +01:00
076e22c338 Merge pull request 'terraform: add parameterized multi-VM deployment system' (#5) from terraform-parameterized-deployments into master
Some checks failed
Run nix flake check / flake-check (push) Failing after 1m22s
Reviewed-on: #5
2026-01-31 22:31:14 +00:00
7aa5137039 terraform: add parameterized multi-VM deployment system
Some checks failed
Run nix flake check / flake-check (push) Failing after 1m52s
Run nix flake check / flake-check (pull_request) Failing after 1m24s
Implements Phase 1 of the OpenTofu deployment plan:
- Replace single-VM configuration with locals-based for_each pattern
- Support multiple VMs in single deployment
- Automatic DHCP vs static IP detection
- Configurable defaults with per-VM overrides
- Dynamic outputs for VM IPs and specifications

New files:
- outputs.tf: Dynamic outputs for deployed VMs
- vms.tf: VM definitions using locals.vms map

Updated files:
- variables.tf: Added default variables for VM configuration
- README.md: Comprehensive documentation and examples

Removed files:
- vm.tf: Replaced by new vms.tf (archived as vm.tf.old, then removed)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-31 23:30:00 +01:00
b3132fbe70 Merge pull request 'opentofu-experiments' (#4) from opentofu-experiments into master
Some checks failed
Run nix flake check / flake-check (push) Failing after 1m56s
Reviewed-on: #4
2026-01-31 22:07:23 +00:00
ce6d2b1d33 docs: add TODO.md for automated deployment pipeline
Some checks failed
Run nix flake check / flake-check (push) Failing after 1m56s
Run nix flake check / flake-check (pull_request) Failing after 1m30s
Document the multi-phase plan for automating NixOS host creation, deployment, and configuration on Proxmox, covering OpenTofu parameterization, config generation, the bootstrap mechanism, secrets management, and Nix-based DNS automation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-31 22:22:19 +01:00
3a464bc323 proxmox: add VM automation with OpenTofu and Ansible
Add an automated workflow for building and deploying NixOS VMs on Proxmox, including the template2 host configuration, an Ansible playbook for image building/deployment, and an OpenTofu configuration for VM provisioning with cloud-init.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-31 21:54:08 +01:00
7f72a72043 flake: add opentofu to devshell
Some checks failed
Run nix flake check / flake-check (push) Failing after 17m5s
2026-01-31 16:12:49 +01:00
36 changed files with 2749 additions and 1 deletion

.gitignore vendored

@@ -1,2 +1,12 @@
.direnv/
result
# Terraform/OpenTofu
terraform/.terraform/
terraform/.terraform.lock.hcl
terraform/*.tfstate
terraform/*.tfstate.*
terraform/terraform.tfvars
terraform/*.auto.tfvars
terraform/crash.log
terraform/crash.*.log


@@ -44,6 +44,15 @@ nix develop
Secrets are handled by sops. Do not edit any `.sops.yaml` or any file within `secrets/`. Ask the user to modify if necessary.
### Git Commit Messages
Commit messages should follow the format: `topic: short description`
Examples:
- `flake: add opentofu to devshell`
- `template2: add proxmox image configuration`
- `terraform: add VM deployment configuration`
## Architecture
### Directory Structure
@@ -143,6 +152,57 @@ Configured in `/system/autoupgrade.nix`:
- Auto-reboot after successful upgrade
- Systemd service: `nixos-upgrade.service`
### Proxmox VM Provisioning with OpenTofu
The repository includes automated workflows for building Proxmox VM templates and deploying VMs using OpenTofu (Terraform).
#### Building and Deploying Templates
Template VMs are built from `hosts/template2` and deployed to Proxmox using Ansible:
```bash
# Build NixOS image and deploy to Proxmox as template
nix develop -c ansible-playbook -i playbooks/inventory.ini playbooks/build-and-deploy-template.yml
```
This playbook:
1. Builds the Proxmox image using `nixos-rebuild build-image --image-variant proxmox`
2. Uploads the `.vma.zst` image to Proxmox at `/var/lib/vz/dump`
3. Restores it as VM ID 9000
4. Converts it to a template
Template configuration (`hosts/template2`):
- Minimal base system with essential packages (age, vim, wget, git)
- Cloud-init configured for NoCloud datasource (no EC2 metadata timeout)
- DHCP networking on ens18
- SSH key-based root login
- `prepare-host.sh` script for cleaning machine-id, SSH keys, and regenerating age keys
#### Deploying VMs with OpenTofu
VMs are deployed from templates using OpenTofu in the `/terraform` directory:
```bash
cd terraform
tofu init # First time only
tofu apply # Deploy VMs
```
Configuration files:
- `main.tf` - Proxmox provider configuration
- `variables.tf` - Provider variables (API credentials)
- `vm.tf` - VM resource definitions
- `terraform.tfvars` - Actual credentials (gitignored)
Example VM deployment includes:
- Clone from template VM
- Cloud-init configuration (SSH keys, network, DNS)
- Custom CPU/memory/disk sizing
- VLAN tagging
- QEMU guest agent
OpenTofu outputs the VM's IP address after deployment for easy SSH access.
### Adding a New Host
1. Create `/hosts/<hostname>/` directory

TODO.md Normal file

@@ -0,0 +1,549 @@
# TODO: Automated Host Deployment Pipeline
## Vision
Automate the entire process of creating, configuring, and deploying new NixOS hosts on Proxmox from a single command or script.
**Desired workflow:**
```bash
./scripts/create-host.sh --hostname myhost --ip 10.69.13.50
# Script creates config, deploys VM, bootstraps NixOS, and you're ready to go
```
**Current manual workflow (from CLAUDE.md):**
1. Create `/hosts/<hostname>/` directory structure
2. Add host to `flake.nix`
3. Add DNS entries
4. Clone template VM manually
5. Run `prepare-host.sh` on new VM
6. Add generated age key to `.sops.yaml`
7. Configure networking
8. Commit and push
9. Run `nixos-rebuild boot --flake URL#<hostname>` on host
## The Plan
### Phase 1: Parameterized OpenTofu Deployments ✅ COMPLETED
**Status:** Fully implemented and tested
**Implementation:**
- Locals-based structure using `for_each` pattern for multiple VM deployments
- All VM parameters configurable with smart defaults (CPU, memory, disk, IP, storage, etc.)
- Automatic DHCP vs static IP detection based on `ip` field presence
- Dynamic outputs showing deployed VM IPs and specifications
- Successfully tested deploying multiple VMs simultaneously
**Tasks:**
- [x] Create module/template structure in terraform for repeatable VM deployments
- [x] Parameterize VM configuration (hostname, CPU, memory, disk, IP)
- [x] Support both DHCP and static IP configuration via cloud-init
- [x] Test deploying multiple VMs from same template
**Deliverable:** ✅ Can deploy multiple VMs with custom parameters via OpenTofu in a single `tofu apply`
**Files:**
- `terraform/vms.tf` - VM definitions using locals map
- `terraform/outputs.tf` - Dynamic outputs for all VMs
- `terraform/variables.tf` - Configurable defaults
- `terraform/README.md` - Complete documentation
---
### Phase 2: Host Configuration Generator ✅ COMPLETED
**Status:** ✅ Fully implemented and tested
**Completed:** 2026-02-01
**Enhanced:** 2026-02-01 (added --force flag)
**Goal:** Automate creation of host configuration files
**Implementation:**
- Python CLI tool packaged as Nix derivation
- Available as `create-host` command in devShell
- Rich terminal UI with configuration previews
- Comprehensive validation (hostname format/uniqueness, IP subnet/uniqueness)
- Jinja2 templates for NixOS configurations
- Automatic updates to flake.nix and terraform/vms.tf
- `--force` flag for regenerating existing configurations (useful for testing)
**Tasks:**
- [x] Create Python CLI with typer framework
- [x] Takes parameters: hostname, IP, CPU cores, memory, disk size
- [x] Generates `/hosts/<hostname>/` directory structure
- [x] Creates `configuration.nix` with proper hostname and networking
- [x] Generates `default.nix` with standard imports
- [x] References shared `hardware-configuration.nix` from template
- [x] Add host entry to `flake.nix` programmatically
- [x] Text-based manipulation (regex insertion)
- [x] Inserts new nixosConfiguration entry
- [x] Maintains proper formatting
- [x] Generate corresponding OpenTofu configuration
- [x] Adds VM definition to `terraform/vms.tf`
- [x] Uses parameters from CLI input
- [x] Supports both static IP and DHCP modes
- [x] Package as Nix derivation with templates
- [x] Add to flake packages and devShell
- [x] Implement dry-run mode
- [x] Write comprehensive README
**Usage:**
```bash
# In nix develop shell.
# --ip is optional (omit for DHCP); --cpu, --memory, and --disk
# default to 2, 2048, and 20G; --dry-run previews without writing.
create-host \
  --hostname test01 \
  --ip 10.69.13.50/24 \
  --cpu 4 \
  --memory 4096 \
  --disk 50G \
  --dry-run
```
**Files:**
- `scripts/create-host/` - Complete Python package with Nix derivation
- `scripts/create-host/README.md` - Full documentation and examples
**Deliverable:** ✅ Tool generates all config files for a new host, validated with Nix and Terraform
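The template step amounts to parameter substitution into a configuration.nix skeleton; this stand-in uses `string.Template` instead of Jinja2 to show the shape (field names are illustrative, not the real template):

```python
from string import Template

# Minimal stand-in for the tool's configuration.nix template.
CONFIG_NIX = Template("""\
{ config, pkgs, ... }:
{
  networking.hostName = "$hostname";
  networking.interfaces.ens18.ipv4.addresses = [
    { address = "$address"; prefixLength = $prefix; }
  ];
}
""")

def render(hostname, cidr):
    # Split "10.69.13.50/24" into address and prefix length.
    address, prefix = cidr.split("/")
    return CONFIG_NIX.substitute(hostname=hostname, address=address, prefix=prefix)

print(render("test01", "10.69.13.50/24"))
```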
---
### Phase 3: Bootstrap Mechanism ✅ COMPLETED
**Status:** ✅ Fully implemented and tested
**Completed:** 2026-02-01
**Enhanced:** 2026-02-01 (added branch support for testing)
**Goal:** Get freshly deployed VM to apply its specific host configuration
**Implementation:** Systemd oneshot service that runs on first boot after cloud-init
**Approach taken:** Systemd service (variant of Option A)
- Systemd service `nixos-bootstrap.service` runs on first boot
- Depends on `cloud-config.service` to ensure hostname is set
- Reads hostname from `hostnamectl` (set by cloud-init via Terraform)
- Supports custom git branch via `NIXOS_FLAKE_BRANCH` environment variable
- Runs `nixos-rebuild boot --flake git+https://git.t-juice.club/torjus/nixos-servers.git?ref=$BRANCH#${hostname}`
- Reboots into new configuration on success
- Fails gracefully without reboot on errors (network issues, missing config)
- Service self-destructs after successful bootstrap (not in new config)
**Tasks:**
- [x] Create bootstrap service module in template2
- [x] systemd oneshot service with proper dependencies
- [x] Reads hostname from hostnamectl (cloud-init sets it)
- [x] Checks network connectivity via HTTPS (curl)
- [x] Runs nixos-rebuild boot with flake URL
- [x] Reboots on success, fails gracefully on error
- [x] Configure cloud-init datasource
- [x] Use ConfigDrive datasource (Proxmox provider)
- [x] Add cloud-init disk to Terraform VMs (disks.ide.ide2.cloudinit)
- [x] Hostname passed via cloud-init user-data from Terraform
- [x] Test bootstrap service execution on fresh VM
- [x] Handle failure cases (flake doesn't exist, network issues)
- [x] Clear error messages in journald
- [x] No reboot on failure
- [x] System remains accessible for debugging
**Files:**
- `hosts/template2/bootstrap.nix` - Bootstrap service definition
- `hosts/template2/configuration.nix` - Cloud-init ConfigDrive datasource
- `terraform/vms.tf` - Cloud-init disk configuration
**Deliverable:** ✅ VMs automatically bootstrap and reboot into host-specific configuration on first boot
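A minimal sketch of how the service derives its flake URL (the repo URL and `NIXOS_FLAKE_BRANCH` default are from the approach notes above; the helper name is illustrative):

```python
REPO = "git+https://git.t-juice.club/torjus/nixos-servers.git"

def flake_url(hostname, env):
    # NIXOS_FLAKE_BRANCH is injected via cloud-init into
    # /etc/environment; production VMs leave it unset.
    branch = env.get("NIXOS_FLAKE_BRANCH", "master")
    return f"{REPO}?ref={branch}#{hostname}"

print(flake_url("vault01", {}))
print(flake_url("testvm01", {"NIXOS_FLAKE_BRANCH": "test-pipeline"}))
```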
---
### Phase 4: Secrets Management with HashiCorp Vault
**Challenge:** Current sops-nix approach has chicken-and-egg problem with age keys
**Current workflow:**
1. VM boots, generates age key at `/var/lib/sops-nix/key.txt`
2. User runs `prepare-host.sh` which prints public key
3. User manually adds public key to `.sops.yaml`
4. User commits, pushes
5. VM can now decrypt secrets
**Selected approach:** Migrate to HashiCorp Vault for centralized secrets management
**Benefits:**
- Industry-standard secrets management (Vault experience transferable to work)
- Eliminates manual age key distribution step
- Secrets-as-code via OpenTofu (infrastructure-as-code aligned)
- Centralized PKI management (replaces step-ca, consolidates TLS + SSH CA)
- Automatic secret rotation capabilities
- Audit logging for all secret access
- AppRole authentication enables automated bootstrap
**Architecture:**
```
vault.home.2rjus.net
├─ KV Secrets Engine (replaces sops-nix)
├─ PKI Engine (replaces step-ca for TLS)
├─ SSH CA Engine (replaces step-ca SSH CA)
└─ AppRole Auth (per-host authentication)
New hosts authenticate on first boot
Fetch secrets via Vault API
No manual key distribution needed
```
---
#### Phase 4a: Vault Server Setup
**Goal:** Deploy and configure Vault server with auto-unseal
**Tasks:**
- [ ] Create `hosts/vault01/` configuration
- [ ] Basic NixOS configuration (hostname, networking, etc.)
- [ ] Vault service configuration
- [ ] Firewall rules (8200 for API, 8201 for cluster)
- [ ] Add to flake.nix and terraform
- [ ] Implement auto-unseal mechanism
- [ ] **Preferred:** TPM-based auto-unseal if hardware supports it
- [ ] Use tpm2-tools to seal/unseal Vault keys
- [ ] Systemd service to unseal on boot
- [ ] **Fallback:** Shamir secret sharing with systemd automation
- [ ] Generate 3 keys, threshold 2
- [ ] Store 2 keys on disk (encrypted), keep 1 offline
- [ ] Systemd service auto-unseals using 2 keys
- [ ] Initial Vault setup
- [ ] Initialize Vault
- [ ] Configure storage backend (integrated raft or file)
- [ ] Set up root token management
- [ ] Enable audit logging
- [ ] Deploy to infrastructure
- [ ] Add DNS entry for vault.home.2rjus.net
- [ ] Deploy VM via terraform
- [ ] Bootstrap and verify Vault is running
**Deliverable:** Running Vault server that auto-unseals on boot
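A rough sketch of the Shamir-fallback unseal loop (the `sys/unseal` endpoint is Vault's documented API; the transport is stubbed so the logic runs offline, and the threshold and key handling are assumptions):

```python
import json

VAULT_ADDR = "https://vault.home.2rjus.net:8200"

def unseal(keys, send):
    # Submit key shares to sys/unseal until Vault reports sealed=false.
    # send(url, body) -> decoded JSON reply; injected so the loop is
    # testable without a live Vault.
    for key in keys:
        resp = send(f"{VAULT_ADDR}/v1/sys/unseal", json.dumps({"key": key}))
        if not resp["sealed"]:
            return True  # threshold reached, Vault is unsealed
    return False  # ran out of shares while still sealed

# Stub standing in for a Vault with unseal threshold 2:
state = {"progress": 0}
def fake_send(url, body):
    state["progress"] += 1
    return {"sealed": state["progress"] < 2}

print(unseal(["share-1", "share-2"], fake_send))  # True
```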
---
#### Phase 4b: Vault-as-Code with OpenTofu
**Goal:** Manage all Vault configuration (secrets structure, policies, roles) as code
**Tasks:**
- [ ] Set up Vault Terraform provider
- [ ] Create `terraform/vault/` directory
- [ ] Configure Vault provider (address, auth)
- [ ] Store Vault token securely (terraform.tfvars, gitignored)
- [ ] Enable and configure secrets engines
- [ ] Enable KV v2 secrets engine at `secret/`
- [ ] Define secret path structure (per-service, per-host)
- [ ] Example: `secret/monitoring/grafana`, `secret/postgres/ha1`
- [ ] Define policies as code
- [ ] Create policies for different service tiers
- [ ] Principle of least privilege (hosts only read their secrets)
- [ ] Example: monitoring-policy allows read on `secret/monitoring/*`
- [ ] Set up AppRole authentication
- [ ] Enable AppRole auth backend
- [ ] Create role per host type (monitoring, dns, database, etc.)
- [ ] Bind policies to roles
- [ ] Configure TTL and token policies
- [ ] Migrate existing secrets from sops-nix
- [ ] Create migration script/playbook
- [ ] Decrypt sops secrets and load into Vault KV
- [ ] Verify all secrets migrated successfully
- [ ] Keep sops as backup during transition
- [ ] Implement secrets-as-code patterns
- [ ] Secret values in gitignored terraform.tfvars
- [ ] Or use random_password for auto-generated secrets
- [ ] Secret structure/paths in version-controlled .tf files
**Example OpenTofu:**
```hcl
resource "vault_kv_secret_v2" "monitoring_grafana" {
  mount = "secret"
  name  = "monitoring/grafana"
  data_json = jsonencode({
    admin_password = var.grafana_admin_password
    smtp_password  = var.smtp_password
  })
}

resource "vault_policy" "monitoring" {
  name   = "monitoring-policy"
  policy = <<EOT
path "secret/data/monitoring/*" {
  capabilities = ["read"]
}
EOT
}

resource "vault_approle_auth_backend_role" "monitoring01" {
  backend        = "approle"
  role_name      = "monitoring01"
  token_policies = ["monitoring-policy"]
}
```
**Deliverable:** All secrets and policies managed as OpenTofu code in `terraform/vault/`
---
#### Phase 4c: PKI Migration (Replace step-ca)
**Goal:** Consolidate PKI infrastructure into Vault
**Tasks:**
- [ ] Set up Vault PKI engines
- [ ] Create root CA in Vault (`pki/` mount, 10 year TTL)
- [ ] Create intermediate CA (`pki_int/` mount, 5 year TTL)
- [ ] Sign intermediate with root CA
- [ ] Configure CRL and OCSP
- [ ] Enable ACME support
- [ ] Enable ACME on intermediate CA (Vault 1.14+)
- [ ] Create PKI role for homelab domain
- [ ] Set certificate TTLs and allowed domains
- [ ] Configure SSH CA in Vault
- [ ] Enable SSH secrets engine (`ssh/` mount)
- [ ] Generate SSH signing keys
- [ ] Create roles for host and user certificates
- [ ] Configure TTLs and allowed principals
- [ ] Migrate hosts from step-ca to Vault
- [ ] Update system/acme.nix to use Vault ACME endpoint
- [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
- [ ] Test certificate issuance on one host
- [ ] Roll out to all hosts via auto-upgrade
- [ ] Migrate SSH CA trust
- [ ] Distribute Vault SSH CA public key to all hosts
- [ ] Update sshd_config to trust Vault CA
- [ ] Test SSH certificate authentication
- [ ] Decommission step-ca
- [ ] Verify all services migrated
- [ ] Stop step-ca service on ca host
- [ ] Archive step-ca configuration for backup
**Deliverable:** All TLS and SSH certificates issued by Vault, step-ca retired
---
#### Phase 4d: Bootstrap Integration
**Goal:** New hosts automatically authenticate to Vault on first boot, no manual steps
**Tasks:**
- [ ] Update create-host tool
- [ ] Generate AppRole role_id + secret_id for new host
- [ ] Or create wrapped token for one-time bootstrap
- [ ] Add host-specific policy to Vault (via terraform)
- [ ] Store bootstrap credentials for cloud-init injection
- [ ] Update template2 for Vault authentication
- [ ] Create Vault authentication module
- [ ] Reads bootstrap credentials from cloud-init
- [ ] Authenticates to Vault, retrieves permanent AppRole credentials
- [ ] Stores role_id + secret_id locally for services to use
- [ ] Create NixOS Vault secrets module
- [ ] Replacement for sops.secrets
- [ ] Fetches secrets from Vault at nixos-rebuild/activation time
- [ ] Or runtime secret fetching for services
- [ ] Handle Vault token renewal
- [ ] Update bootstrap service
- [ ] After authenticating to Vault, fetch any bootstrap secrets
- [ ] Run nixos-rebuild with host configuration
- [ ] Services automatically fetch their secrets from Vault
- [ ] Update terraform cloud-init
- [ ] Inject Vault address and bootstrap credentials
- [ ] Pass via cloud-init user-data or write_files
- [ ] Credentials scoped to single use or short TTL
- [ ] Test complete flow
- [ ] Run create-host to generate new host config
- [ ] Deploy with terraform
- [ ] Verify host bootstraps and authenticates to Vault
- [ ] Verify services can fetch secrets
- [ ] Confirm no manual steps required
**Bootstrap flow:**
```
1. terraform apply (deploys VM with cloud-init)
2. Cloud-init sets hostname + Vault bootstrap credentials
3. nixos-bootstrap.service runs:
- Authenticates to Vault with bootstrap credentials
- Retrieves permanent AppRole credentials
- Stores locally for service use
- Runs nixos-rebuild
4. Host services fetch secrets from Vault as needed
5. Done - no manual intervention
```
**Deliverable:** Fully automated secrets access from first boot, zero manual steps
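The first-boot credential exchange can be sketched as follows (endpoint shapes follow Vault's documented AppRole and KV v2 APIs; the transport is injectable and all names and paths are illustrative assumptions):

```python
import json

VAULT_ADDR = "https://vault.home.2rjus.net:8200"

def approle_login(role_id, secret_id, post):
    # Trade bootstrap credentials for a client token
    # (POST auth/approle/login per Vault's AppRole API).
    body = json.dumps({"role_id": role_id, "secret_id": secret_id})
    return post(f"{VAULT_ADDR}/v1/auth/approle/login", body)["auth"]["client_token"]

def read_secret(token, path, get):
    # KV v2 responses nest the payload under data.data.
    return get(f"{VAULT_ADDR}/v1/secret/data/{path}", token)["data"]["data"]

# Stub transport standing in for real HTTPS calls:
def fake_post(url, body):
    return {"auth": {"client_token": "s.bootstrap123"}}

def fake_get(url, token):
    return {"data": {"data": {"role_id": "perm-role", "secret_id": "perm-secret"}}}

token = approle_login("boot-role-id", "boot-secret-id", fake_post)
creds = read_secret(token, "bootstrap/monitoring01", fake_get)
print(creds)
```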
---
### Phase 5: DNS Automation
**Goal:** Automatically generate DNS entries from host configurations
**Approach:** Leverage Nix to generate zone file entries from flake host configurations
Since most hosts use static IPs defined in their NixOS configurations, we can extract this information and automatically generate A records. This keeps DNS in sync with the actual host configs.
**Tasks:**
- [ ] Add optional CNAME field to host configurations
- [ ] Add `networking.cnames = [ "alias1" "alias2" ]` or similar option
- [ ] Document in host configuration template
- [ ] Create Nix function to extract DNS records from all hosts
- [ ] Parse each host's `networking.hostName` and IP configuration
- [ ] Collect any defined CNAMEs
- [ ] Generate zone file fragment with A and CNAME records
- [ ] Integrate auto-generated records into zone files
- [ ] Keep manual entries separate (for non-flake hosts/services)
- [ ] Include generated fragment in main zone file
- [ ] Add comments showing which records are auto-generated
- [ ] Update zone file serial number automatically
- [ ] Test zone file validity after generation
- [ ] Either:
- [ ] Automatically trigger DNS server reload (Ansible)
- [ ] Or document manual step: merge to master, run upgrade on ns1/ns2
**Deliverable:** DNS A records and CNAMEs automatically generated from host configs
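A sketch of the planned record extraction (the host data and the YYYYMMDDnn serial convention are illustrative assumptions; the real implementation would be a Nix function over the flake's host configs):

```python
import datetime

def zone_fragment(hosts, today):
    # hosts: {name: {"ip": str|None, "cnames": [str, ...]}}
    lines = ["; --- auto-generated from flake host configs ---"]
    for name, cfg in sorted(hosts.items()):
        if cfg.get("ip"):  # DHCP hosts get no static A record
            lines.append(f"{name}\tIN\tA\t{cfg['ip']}")
        for alias in cfg.get("cnames", []):
            lines.append(f"{alias}\tIN\tCNAME\t{name}")
    serial = today.strftime("%Y%m%d") + "01"  # YYYYMMDDnn convention
    lines.append(f"; serial: {serial}")
    return "\n".join(lines)

hosts = {
    "vault01": {"ip": "10.69.13.75", "cnames": ["vault"]},
    "testvm01": {"ip": None},  # DHCP: skipped
}
print(zone_fragment(hosts, datetime.date(2026, 2, 1)))
```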
---
### Phase 6: Integration Script
**Goal:** Single command to create and deploy a new host
**Tasks:**
- [ ] Create `scripts/create-host.sh` master script that orchestrates:
1. Prompts for: hostname, IP (or DHCP), CPU, memory, disk
2. Validates inputs (IP not in use, hostname unique, etc.)
3. Calls host config generator (Phase 2)
4. Generates OpenTofu config (Phase 2)
5. Handles secrets (Phase 4)
6. Updates DNS (Phase 5)
7. Commits all changes to git
8. Runs `tofu apply` to deploy VM
9. Waits for bootstrap to complete (Phase 3)
10. Prints success message with IP and SSH command
- [ ] Add `--dry-run` flag to preview changes
- [ ] Add `--interactive` mode vs `--batch` mode
- [ ] Error handling and rollback on failures
**Deliverable:** `./scripts/create-host.sh --hostname myhost --ip 10.69.13.50` creates a fully working host
---
### Phase 7: Testing & Documentation
**Status:** 🚧 In Progress (testing improvements completed)
**Testing Improvements Implemented (2026-02-01):**
The pipeline now supports efficient testing without polluting master branch:
**1. --force Flag for create-host**
- Re-run `create-host` to regenerate existing configurations
- Updates existing entries in flake.nix and terraform/vms.tf (no duplicates)
- Skip uniqueness validation checks
- Useful for iterating on configuration templates during testing
**2. Branch Support for Bootstrap**
- Bootstrap service reads `NIXOS_FLAKE_BRANCH` environment variable
- Defaults to `master` if not set
- Allows testing pipeline changes on feature branches
- Cloud-init passes branch via `/etc/environment`
**3. Cloud-init Disk for Branch Configuration**
- Terraform generates custom cloud-init snippets for test VMs
- Set `flake_branch` field in VM definition to use non-master branch
- Production VMs omit this field and use master (default)
- Files automatically uploaded to Proxmox via SSH
**Testing Workflow:**
```bash
# 1. Create test branch
git checkout -b test-pipeline
# 2. Generate or update host config
create-host --hostname testvm01 --ip 10.69.13.100/24
# 3. Edit terraform/vms.tf to add test VM with branch
# vms = {
# "testvm01" = {
# ip = "10.69.13.100/24"
# flake_branch = "test-pipeline" # Bootstrap from this branch
# }
# }
# 4. Commit and push test branch
git add -A && git commit -m "test: add testvm01"
git push origin test-pipeline
# 5. Deploy VM
cd terraform && tofu apply
# 6. Watch bootstrap (VM fetches from test-pipeline branch)
ssh root@10.69.13.100
journalctl -fu nixos-bootstrap.service
# 7. Iterate: modify templates and regenerate with --force
cd .. && create-host --hostname testvm01 --ip 10.69.13.100/24 --force
git commit -am "test: update config" && git push
# Redeploy to test fresh bootstrap
cd terraform
tofu destroy -target=proxmox_vm_qemu.vm[\"testvm01\"] && tofu apply
# 8. Clean up when done: squash commits, merge to master, remove test VM
```
**Files:**
- `scripts/create-host/create_host.py` - Added --force parameter
- `scripts/create-host/manipulators.py` - Update vs insert logic
- `hosts/template2/bootstrap.nix` - Branch support via environment variable
- `terraform/vms.tf` - flake_branch field support
- `terraform/cloud-init.tf` - Custom cloud-init disk generation
- `terraform/variables.tf` - proxmox_host variable for SSH uploads
**Remaining Tasks:**
- [ ] Test full pipeline end-to-end on feature branch
- [ ] Update CLAUDE.md with testing workflow
- [ ] Add troubleshooting section
- [ ] Create examples for common scenarios (DHCP host, static IP host, etc.)
---
## Open Questions
1. **Bootstrap method:** Cloud-init runcmd vs Terraform provisioner vs Ansible?
2. **Secrets handling:** Pre-generate keys vs post-deployment injection?
3. **DNS automation:** Auto-commit or manual merge?
4. **Git workflow:** Auto-push changes or leave for user review?
5. **Template selection:** Single template2 or multiple templates for different host types?
6. **Networking:** Always DHCP initially, or support static IP from start?
7. **Error recovery:** What happens if bootstrap fails? Manual intervention or retry?
## Implementation Order
Recommended sequence:
1. Phase 1: Parameterize OpenTofu (foundation for testing)
2. Phase 3: Bootstrap mechanism (core automation)
3. Phase 2: Config generator (automate the boilerplate)
4. Phase 4: Secrets (solves biggest chicken-and-egg)
5. Phase 5: DNS (nice-to-have automation)
6. Phase 6: Integration script (ties it all together)
7. Phase 7: Testing & docs
## Success Criteria
When complete, creating a new host should:
- Take < 5 minutes of human time
- Require minimal user input (hostname, IP, basic specs)
- Result in a fully configured, secret-enabled, DNS-registered host
- Be reproducible and documented
- Handle common errors gracefully
---
## Notes
- Keep incremental commits at each phase
- Test each phase independently before moving to next
- Maintain backward compatibility with manual workflow
- Document any manual steps that can't be automated


@@ -172,6 +172,22 @@
sops-nix.nixosModules.sops
];
};
template2 = nixpkgs.lib.nixosSystem {
inherit system;
specialArgs = {
inherit inputs self sops-nix;
};
modules = [
(
{ config, pkgs, ... }:
{
nixpkgs.overlays = commonOverlays;
}
)
./hosts/template2
sops-nix.nixosModules.sops
];
};
http-proxy = nixpkgs.lib.nixosSystem {
inherit system;
specialArgs = {
@@ -318,14 +334,53 @@
sops-nix.nixosModules.sops
];
};
testvm01 = nixpkgs.lib.nixosSystem {
inherit system;
specialArgs = {
inherit inputs self sops-nix;
};
modules = [
(
{ config, pkgs, ... }:
{
nixpkgs.overlays = commonOverlays;
}
)
./hosts/testvm01
sops-nix.nixosModules.sops
];
};
vault01 = nixpkgs.lib.nixosSystem {
inherit system;
specialArgs = {
inherit inputs self sops-nix;
};
modules = [
(
{ config, pkgs, ... }:
{
nixpkgs.overlays = commonOverlays;
}
)
./hosts/vault01
sops-nix.nixosModules.sops
];
};
};
packages = forAllSystems (
{ pkgs }:
{
create-host = pkgs.callPackage ./scripts/create-host { };
}
);
devShells = forAllSystems (
{ pkgs }:
{
default = pkgs.mkShell {
packages = with pkgs; [
ansible
python3
opentofu
(pkgs.callPackage ./scripts/create-host { })
];
};
}


@@ -0,0 +1,73 @@
{ pkgs, config, lib, ... }:
let
bootstrap-script = pkgs.writeShellApplication {
name = "nixos-bootstrap";
runtimeInputs = with pkgs; [ systemd curl nixos-rebuild jq git ];
text = ''
set -euo pipefail
# Read hostname set by cloud-init (from Terraform VM name via user-data)
# Cloud-init sets the system hostname from user-data.txt, so we read it from hostnamectl
HOSTNAME=$(hostnamectl hostname)
echo "DEBUG: Hostname from hostnamectl: '$HOSTNAME'"
echo "Starting NixOS bootstrap for host: $HOSTNAME"
echo "Waiting for network connectivity..."
# Verify we can reach the git server via HTTPS (doesn't respond to ping)
if ! curl -s --connect-timeout 5 --max-time 10 https://git.t-juice.club >/dev/null 2>&1; then
echo "ERROR: Cannot reach git.t-juice.club via HTTPS"
echo "Check network configuration and DNS settings"
exit 1
fi
echo "Network connectivity confirmed"
echo "Fetching and building NixOS configuration from flake..."
# Read git branch from environment, default to master
BRANCH="''${NIXOS_FLAKE_BRANCH:-master}"
echo "Using git branch: $BRANCH"
# Build and activate the host-specific configuration
FLAKE_URL="git+https://git.t-juice.club/torjus/nixos-servers.git?ref=$BRANCH#''${HOSTNAME}"
if nixos-rebuild boot --flake "$FLAKE_URL"; then
echo "Successfully built configuration for $HOSTNAME"
echo "Rebooting into new configuration..."
sleep 2
systemctl reboot
else
echo "ERROR: nixos-rebuild failed for $HOSTNAME"
echo "Check that flake has configuration for this hostname"
echo "Manual intervention required - system will not reboot"
exit 1
fi
'';
};
in
{
systemd.services."nixos-bootstrap" = {
description = "Bootstrap NixOS configuration from flake on first boot";
# Wait for cloud-init to finish setting hostname and network to be online
after = [ "cloud-config.service" "network-online.target" ];
wants = [ "network-online.target" ];
requires = [ "cloud-config.service" ];
# Run on boot
wantedBy = [ "multi-user.target" ];
serviceConfig = {
Type = "oneshot";
RemainAfterExit = true;
ExecStart = "${bootstrap-script}/bin/nixos-bootstrap";
# Read environment variables from /etc/environment (set by cloud-init)
EnvironmentFile = "-/etc/environment";
# Logging to journald
StandardOutput = "journal+console";
StandardError = "journal+console";
};
};
}


@@ -0,0 +1,70 @@
{
config,
lib,
pkgs,
...
}:
{
imports = [
./hardware-configuration.nix
../../system/sshd.nix
];
# Root user with no password but SSH key access for bootstrapping
users.users.root = {
hashedPassword = "";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAwfb2jpKrBnCw28aevnH8HbE5YbcMXpdaVv2KmueDu6 torjus@gunter"
];
};
# Proxmox image-specific configuration
# Configure storage to use local-zfs instead of local-lvm
image.modules.proxmox = {
proxmox.qemuConf.virtio0 = lib.mkForce "local-zfs:vm-9999-disk-0";
proxmox.qemuConf.boot = lib.mkForce "order=virtio0";
proxmox.cloudInit.defaultStorage = lib.mkForce "local-zfs";
};
# Configure cloud-init to use ConfigDrive datasource (used by Proxmox)
services.cloud-init.settings = {
datasource_list = [ "ConfigDrive" "NoCloud" ];
};
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/vda";
networking.hostName = "nixos-template2";
networking.domain = "home.2rjus.net";
networking.useNetworkd = true;
networking.useDHCP = false;
services.resolved.enable = true;
systemd.network.enable = true;
systemd.network.networks."ens18" = {
matchConfig.Name = "ens18";
networkConfig.DHCP = "ipv4";
linkConfig.RequiredForOnline = "routable";
};
time.timeZone = "Europe/Oslo";
nix.settings.experimental-features = [
"nix-command"
"flakes"
];
nix.settings.tarball-ttl = 0;
environment.systemPackages = with pkgs; [
age
vim
wget
git
];
# Open ports in the firewall.
# networking.firewall.allowedTCPPorts = [ ... ];
# networking.firewall.allowedUDPPorts = [ ... ];
# Or disable the firewall altogether.
networking.firewall.enable = false;
system.stateVersion = "25.11";
}


@@ -0,0 +1,10 @@
{ ... }:
{
imports = [
./hardware-configuration.nix
./configuration.nix
./scripts.nix
./bootstrap.nix
../../system/packages.nix
];
}


@@ -0,0 +1,45 @@
{
config,
lib,
pkgs,
modulesPath,
...
}:
{
imports = [
(modulesPath + "/profiles/qemu-guest.nix")
];
boot.initrd.availableKernelModules = [
"ata_piix"
"uhci_hcd"
"virtio_pci"
"virtio_scsi"
"sd_mod"
"sr_mod"
];
boot.initrd.kernelModules = [ "dm-snapshot" ];
boot.kernelModules = [
"ptp_kvm"
"virtio_rng" # Provides entropy from host for fast SSH key generation
];
boot.extraModulePackages = [ ];
# Filesystem configuration matching Proxmox image builder output
fileSystems."/" = lib.mkDefault {
device = "/dev/disk/by-label/nixos";
fsType = "ext4";
options = [ "x-systemd.growfs" ];
};
swapDevices = lib.mkDefault [ ];
# Enables DHCP on each ethernet and wireless interface. In case of scripted networking
# (the default) this is the recommended approach. When using systemd-networkd it's
# still possible to use this option, but it's recommended to use it in conjunction
# with explicit per-interface declarations with `networking.interfaces.<interface>.useDHCP`.
networking.useDHCP = lib.mkDefault true;
# networking.interfaces.ens18.useDHCP = lib.mkDefault true;
nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
}


@@ -0,0 +1,33 @@
{ pkgs, ... }:
let
prepare-host-script = pkgs.writeShellScriptBin "prepare-host.sh"
''
echo "Removing machine-id"
rm -f /etc/machine-id || true
echo "Removing SSH host keys"
rm -f /etc/ssh/ssh_host_* || true
echo "Restarting SSH"
systemctl restart sshd
echo "Removing temporary files"
rm -rf /tmp/* || true
echo "Removing logs"
journalctl --rotate || true
journalctl --vacuum-time=1s || true
echo "Removing cache"
rm -rf /var/cache/* || true
echo "Generate age key"
rm -rf /var/lib/sops-nix || true
mkdir -p /var/lib/sops-nix
${pkgs.age}/bin/age-keygen -o /var/lib/sops-nix/key.txt
'';
in
{
environment.systemPackages = [ prepare-host-script ];
users.motd = "Prepare host by running 'prepare-host.sh'.";
}


@@ -0,0 +1,61 @@
{
config,
lib,
pkgs,
...
}:
{
imports = [
../template2/hardware-configuration.nix
../../system
../../common/vm
];
nixpkgs.config.allowUnfree = true;
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/vda";
networking.hostName = "testvm01";
networking.domain = "home.2rjus.net";
networking.useNetworkd = true;
networking.useDHCP = false;
services.resolved.enable = false;
networking.nameservers = [
"10.69.13.5"
"10.69.13.6"
];
systemd.network.enable = true;
systemd.network.networks."ens18" = {
matchConfig.Name = "ens18";
address = [
"10.69.13.101/24"
];
routes = [
{ Gateway = "10.69.13.1"; }
];
linkConfig.RequiredForOnline = "routable";
};
time.timeZone = "Europe/Oslo";
nix.settings.experimental-features = [
"nix-command"
"flakes"
];
nix.settings.tarball-ttl = 0;
environment.systemPackages = with pkgs; [
vim
wget
git
];
# Open ports in the firewall.
# networking.firewall.allowedTCPPorts = [ ... ];
# networking.firewall.allowedUDPPorts = [ ... ];
# Or disable the firewall altogether.
networking.firewall.enable = false;
system.stateVersion = "25.11"; # Did you read the comment?
}


@@ -0,0 +1,5 @@
{ ... }: {
imports = [
./configuration.nix
];
}


@@ -0,0 +1,63 @@
{
config,
lib,
pkgs,
...
}:
{
imports = [
../template2/hardware-configuration.nix
../../system
../../common/vm
../../services/vault
];
nixpkgs.config.allowUnfree = true;
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/vda";
networking.hostName = "vault01";
networking.domain = "home.2rjus.net";
networking.useNetworkd = true;
networking.useDHCP = false;
services.resolved.enable = true;
networking.nameservers = [
"10.69.13.5"
"10.69.13.6"
];
systemd.network.enable = true;
systemd.network.networks."ens18" = {
matchConfig.Name = "ens18";
address = [
"10.69.13.19/24"
];
routes = [
{ Gateway = "10.69.13.1"; }
];
linkConfig.RequiredForOnline = "routable";
};
time.timeZone = "Europe/Oslo";
nix.settings.experimental-features = [
"nix-command"
"flakes"
];
nix.settings.tarball-ttl = 0;
environment.systemPackages = with pkgs; [
vim
wget
git
];
# Open ports in the firewall.
# networking.firewall.allowedTCPPorts = [ ... ];
# networking.firewall.allowedUDPPorts = [ ... ];
# Or disable the firewall altogether.
networking.firewall.enable = false;
system.stateVersion = "25.11"; # Did you read the comment?
}


@@ -0,0 +1,5 @@
{ ... }: {
imports = [
./configuration.nix
];
}


@@ -0,0 +1,101 @@
---
- name: Build and deploy NixOS Proxmox template
hosts: localhost
gather_facts: false
vars:
template_name: "template2"
nixos_config: "template2"
proxmox_node: "pve1.home.2rjus.net" # Change to your Proxmox node name
proxmox_host: "pve1.home.2rjus.net" # Change to your Proxmox host
template_vmid: 9000 # Template VM ID
storage: "local-zfs"
tasks:
- name: Build NixOS image
ansible.builtin.command:
cmd: "nixos-rebuild build-image --image-variant proxmox --flake .#template2"
chdir: "{{ playbook_dir }}/.."
register: build_result
changed_when: true
- name: Find built image file
ansible.builtin.find:
    paths: "{{ playbook_dir }}/../result"
patterns: "*.vma.zst"
recurse: true
register: image_files
- name: Fail if no image found
ansible.builtin.fail:
    msg: "No VMA image (*.vma.zst) found in build output"
when: image_files.matched == 0
- name: Set image path
ansible.builtin.set_fact:
image_path: "{{ image_files.files[0].path }}"
- name: Extract image filename
ansible.builtin.set_fact:
image_filename: "{{ image_path | basename }}"
- name: Display image info
ansible.builtin.debug:
msg: "Built image: {{ image_path }} ({{ image_filename }})"
- name: Deploy template to Proxmox
hosts: proxmox
gather_facts: false
vars:
template_name: "template2"
template_vmid: 9000
storage: "local-zfs"
tasks:
- name: Get image path and filename from localhost
ansible.builtin.set_fact:
image_path: "{{ hostvars['localhost']['image_path'] }}"
image_filename: "{{ hostvars['localhost']['image_filename'] }}"
- name: Set destination path
ansible.builtin.set_fact:
image_dest: "/var/lib/vz/dump/{{ image_filename }}"
- name: Copy image to Proxmox
ansible.builtin.copy:
src: "{{ image_path }}"
dest: "{{ image_dest }}"
mode: '0644'
- name: Check if template VM already exists
ansible.builtin.command:
cmd: "qm status {{ template_vmid }}"
register: vm_status
failed_when: false
changed_when: false
- name: Destroy existing template VM if it exists
ansible.builtin.command:
cmd: "qm destroy {{ template_vmid }} --purge"
when: vm_status.rc == 0
changed_when: true
- name: Import image
ansible.builtin.command:
cmd: "qmrestore {{ image_dest }} {{ template_vmid }}"
changed_when: true
- name: Convert VM to template
ansible.builtin.command:
cmd: "qm template {{ template_vmid }}"
changed_when: true
- name: Clean up uploaded image
ansible.builtin.file:
path: "{{ image_dest }}"
state: absent
- name: Display success message
ansible.builtin.debug:
msg: "Template VM {{ template_vmid }} created successfully on {{ storage }}"

playbooks/inventory.ini

@@ -0,0 +1,5 @@
[proxmox]
pve1.home.2rjus.net
[proxmox:vars]
ansible_user=root


@@ -0,0 +1 @@
recursive-include templates *.j2


@@ -0,0 +1,268 @@
# NixOS Host Configuration Generator
Automated tool for generating NixOS host configurations, flake.nix entries, and Terraform VM definitions for homelab infrastructure.
## Installation
The tool is available in the Nix development shell:
```bash
nix develop
```
## Usage
### Basic Usage
Create a new host with DHCP networking:
```bash
python -m scripts.create_host.create_host create --hostname test01
```
Create a new host with static IP:
```bash
python -m scripts.create_host.create_host create \
--hostname test01 \
--ip 10.69.13.50/24
```
Create a host with custom resources:
```bash
python -m scripts.create_host.create_host create \
--hostname bighost01 \
--ip 10.69.13.51/24 \
--cpu 8 \
--memory 8192 \
--disk 100G
```
### Dry Run Mode
Preview what would be created without making changes:
```bash
python -m scripts.create_host.create_host create \
--hostname test01 \
--ip 10.69.13.50/24 \
--dry-run
```
### Force Mode (Regenerate Existing Configuration)
Overwrite an existing host configuration (useful for testing):
```bash
python -m scripts.create_host.create_host create \
--hostname test01 \
--ip 10.69.13.50/24 \
--force
```
This mode:
- Skips hostname and IP uniqueness validation
- Overwrites files in `hosts/<hostname>/`
- Updates existing entries in `flake.nix` and `terraform/vms.tf` (doesn't duplicate)
- Useful for iterating on configuration templates during testing
### Options
- `--hostname` (required): Hostname for the new host
- Must be lowercase alphanumeric with hyphens
- Must be unique (must not already exist in the repository)
- `--ip` (optional): Static IP address with CIDR notation
- Format: `10.69.13.X/24`
- Must be in `10.69.13.0/24` subnet
- Last octet must be 1-254
- Omit this option for DHCP configuration
- `--cpu` (optional, default: 2): Number of CPU cores
- Must be at least 1
- `--memory` (optional, default: 2048): Memory in MB
- Must be at least 512
- `--disk` (optional, default: "20G"): Disk size
- Examples: "20G", "50G", "100G"
- `--dry-run` (flag): Preview changes without creating files
- `--force` (flag): Overwrite existing host configuration
- Skips uniqueness validation
- Updates existing entries instead of creating duplicates
## What It Does
The tool performs the following actions:
1. **Validates** the configuration:
- Hostname format (RFC 1123 compliance)
- Hostname uniqueness
- IP address format and subnet (if provided)
- IP address uniqueness (if provided)
2. **Generates** host configuration files:
- `hosts/<hostname>/default.nix` - Import wrapper
- `hosts/<hostname>/configuration.nix` - Full host configuration
3. **Updates** repository files:
- `flake.nix` - Adds new nixosConfigurations entry
- `terraform/vms.tf` - Adds new VM definition
4. **Displays** next steps for:
- Reviewing changes with git diff
- Verifying NixOS configuration
- Verifying Terraform configuration
- Committing changes
- Deploying the VM
## Generated Configuration
### Host Features
All generated hosts include:
- Full system imports from `../../system`:
- Nix binary cache integration
- SSH with root login
- SOPS secrets management
- Internal ACME CA integration
- Daily auto-upgrades with auto-reboot
- Prometheus node-exporter
- Promtail logging to monitoring01
- VM guest agent from `../../common/vm`
- Hardware configuration from `../template2/hardware-configuration.nix`
### Networking
**Static IP mode** (when `--ip` is provided):
```nix
systemd.network.networks."ens18" = {
matchConfig.Name = "ens18";
address = [ "10.69.13.50/24" ];
routes = [ { Gateway = "10.69.13.1"; } ];
linkConfig.RequiredForOnline = "routable";
};
```
**DHCP mode** (when `--ip` is omitted):
```nix
systemd.network.networks."ens18" = {
matchConfig.Name = "ens18";
networkConfig.DHCP = "ipv4";
linkConfig.RequiredForOnline = "routable";
};
```
### DNS Configuration
All hosts are configured with:
- DNS servers: `10.69.13.5`, `10.69.13.6` (ns1, ns2)
- Domain: `home.2rjus.net`
## Examples
### Create a test VM with defaults
```bash
python -m scripts.create_host.create_host create --hostname test99
```
This creates a DHCP VM with 2 CPU cores, 2048 MB memory, and 20G disk.
### Create a database server with static IP
```bash
python -m scripts.create_host.create_host create \
--hostname pgdb2 \
--ip 10.69.13.52/24 \
--cpu 4 \
--memory 4096 \
--disk 50G
```
### Preview changes before creating
```bash
python -m scripts.create_host.create_host create \
--hostname test99 \
--ip 10.69.13.99/24 \
--dry-run
```
## Error Handling
The tool validates input and provides clear error messages for:
- Invalid hostname format (must be lowercase alphanumeric with hyphens)
- Duplicate hostname (already exists in repository)
- Invalid IP format (must be X.X.X.X/24)
- Wrong subnet (must be 10.69.13.0/24)
- Invalid last octet (must be 1-254)
- Duplicate IP address (already in use)
- Resource constraints (CPU < 1, memory < 512 MB)
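The rules listed above map to a few small checks. A rough sketch mirroring the documented constraints (not necessarily the tool's exact code in `validators.py`):

```python
import ipaddress
import re

HOSTNAME_RE = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")

def check_hostname(hostname: str) -> None:
    """Reject hostnames that are not lowercase alphanumeric with hyphens."""
    if not HOSTNAME_RE.match(hostname):
        raise ValueError("hostname must be lowercase alphanumeric with hyphens")

def check_ip(ip: str) -> None:
    """Reject IPs outside 10.69.13.0/24 or with a bad last octet."""
    iface = ipaddress.ip_interface(ip)  # raises ValueError on malformed input
    if iface.network.prefixlen != 24 or iface.ip not in ipaddress.ip_network("10.69.13.0/24"):
        raise ValueError("IP must be inside 10.69.13.0/24 with a /24 prefix")
    last = int(str(iface.ip).rsplit(".", 1)[-1])
    if not 1 <= last <= 254:
        raise ValueError("last octet must be 1-254")
```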
## Integration with Deployment Pipeline
This tool implements **Phase 2** of the automated deployment pipeline:
1. **Phase 1**: Template building ✓ (build-and-deploy-template.yml)
2. **Phase 2**: Host configuration generation ✓ (this tool)
3. **Phase 3**: Bootstrap automation (planned)
4. **Phase 4**: Secrets management (planned)
5. **Phase 5**: DNS automation (planned)
6. **Phase 6**: Full integration (planned)
## Development
### Project Structure
```
scripts/create-host/
├── create_host.py # Main CLI entry point (typer app)
├── __init__.py # Package initialization
├── validators.py # Validation logic
├── generators.py # File generation using Jinja2
├── manipulators.py # Text manipulation for flake.nix and vms.tf
├── models.py # Data models (HostConfig)
├── templates/
│ ├── default.nix.j2 # Template for default.nix
│ └── configuration.nix.j2 # Template for configuration.nix
└── README.md # This file
```
### Testing
Run the test cases from the implementation plan:
```bash
# Test 1: DHCP host with defaults
python -m scripts.create_host.create_host create --hostname testdhcp --dry-run
# Test 2: Static IP host
python -m scripts.create_host.create_host create \
--hostname test50 --ip 10.69.13.50/24 --dry-run
# Test 3: Custom resources
python -m scripts.create_host.create_host create \
--hostname test51 --ip 10.69.13.51/24 \
--cpu 8 --memory 8192 --disk 100G --dry-run
# Test 4: Duplicate hostname (should error)
python -m scripts.create_host.create_host create --hostname ns1 --dry-run
# Test 5: Invalid subnet (should error)
python -m scripts.create_host.create_host create \
--hostname testbad --ip 192.168.1.50/24 --dry-run
# Test 6: Invalid hostname (should error)
python -m scripts.create_host.create_host create --hostname Test_Host --dry-run
```
## License
Part of the nixos-servers homelab infrastructure repository.


@@ -0,0 +1,3 @@
"""NixOS host configuration generator for homelab infrastructure."""
__version__ = "0.1.0"


@@ -0,0 +1,6 @@
"""Entry point for running the create-host module."""
from .create_host import app
if __name__ == "__main__":
app()


@@ -0,0 +1,197 @@
"""CLI tool for generating NixOS host configurations."""
import sys
from pathlib import Path
from typing import Optional
import typer
from rich.console import Console
from rich.panel import Panel
from rich.table import Table
from generators import generate_host_files
from manipulators import update_flake_nix, update_terraform_vms
from models import HostConfig
from validators import (
validate_hostname_format,
validate_hostname_unique,
validate_ip_subnet,
validate_ip_unique,
)
app = typer.Typer(
name="create-host",
help="Generate NixOS host configurations for homelab infrastructure",
add_completion=False,
)
console = Console()
def get_repo_root() -> Path:
"""Get the repository root directory."""
# Use current working directory as repo root
# The tool should be run from the repository root
return Path.cwd()
@app.callback(invoke_without_command=True)
def main(
ctx: typer.Context,
hostname: Optional[str] = typer.Option(None, "--hostname", help="Hostname for the new host"),
ip: Optional[str] = typer.Option(
None, "--ip", help="Static IP address with CIDR (e.g., 10.69.13.50/24). Omit for DHCP."
),
cpu: int = typer.Option(2, "--cpu", help="Number of CPU cores"),
memory: int = typer.Option(2048, "--memory", help="Memory in MB"),
disk: str = typer.Option("20G", "--disk", help="Disk size (e.g., 20G, 50G, 100G)"),
dry_run: bool = typer.Option(False, "--dry-run", help="Preview changes without creating files"),
force: bool = typer.Option(False, "--force", help="Overwrite existing host configuration"),
) -> None:
"""
Create a new NixOS host configuration.
Generates host configuration files, updates flake.nix, and adds Terraform VM definition.
"""
# Show help if no hostname provided
if hostname is None:
console.print("[bold red]Error:[/bold red] --hostname is required\n")
        console.print(ctx.get_help())
sys.exit(1)
try:
# Build configuration
config = HostConfig(
hostname=hostname,
ip=ip,
cpu=cpu,
memory=memory,
disk=disk,
)
# Get repository root
repo_root = get_repo_root()
# Validate configuration
console.print("\n[bold blue]Validating configuration...[/bold blue]")
config.validate()
validate_hostname_format(hostname)
# Skip uniqueness checks in force mode
if not force:
validate_hostname_unique(hostname, repo_root)
if ip:
validate_ip_unique(ip, repo_root)
else:
# Check if we're actually overwriting something
host_dir = repo_root / "hosts" / hostname
if host_dir.exists():
console.print(f"[yellow]⚠[/yellow] Updating existing host configuration for {hostname}")
if ip:
validate_ip_subnet(ip)
console.print("[green]✓[/green] All validations passed\n")
# Display configuration summary
display_config_summary(config)
# Dry run mode - exit before making changes
if dry_run:
console.print("\n[yellow]DRY RUN MODE - No files will be created[/yellow]\n")
display_dry_run_summary(config, repo_root)
return
# Generate files
console.print("\n[bold blue]Generating host configuration...[/bold blue]")
generate_host_files(config, repo_root)
action = "Updated" if force else "Created"
console.print(f"[green]✓[/green] {action} hosts/{hostname}/default.nix")
console.print(f"[green]✓[/green] {action} hosts/{hostname}/configuration.nix")
update_flake_nix(config, repo_root, force=force)
console.print("[green]✓[/green] Updated flake.nix")
update_terraform_vms(config, repo_root, force=force)
console.print("[green]✓[/green] Updated terraform/vms.tf")
# Success message
console.print("\n[bold green]✓ Host configuration generated successfully![/bold green]\n")
# Display next steps
display_next_steps(hostname)
except ValueError as e:
console.print(f"\n[bold red]Error:[/bold red] {e}\n", style="red")
sys.exit(1)
except Exception as e:
console.print(f"\n[bold red]Unexpected error:[/bold red] {e}\n", style="red")
sys.exit(1)
def display_config_summary(config: HostConfig) -> None:
"""Display configuration summary table."""
table = Table(title="Host Configuration", show_header=False)
table.add_column("Property", style="cyan")
table.add_column("Value", style="white")
table.add_row("Hostname", config.hostname)
table.add_row("Domain", config.domain)
table.add_row("Network Mode", "Static IP" if config.is_static_ip else "DHCP")
if config.is_static_ip:
table.add_row("IP Address", config.ip)
table.add_row("Gateway", config.gateway)
table.add_row("DNS Servers", ", ".join(config.nameservers))
table.add_row("CPU Cores", str(config.cpu))
table.add_row("Memory", f"{config.memory} MB")
table.add_row("Disk Size", config.disk)
table.add_row("State Version", config.state_version)
console.print(table)
def display_dry_run_summary(config: HostConfig, repo_root: Path) -> None:
"""Display what would be created in dry run mode."""
console.print("[bold]Files that would be created:[/bold]")
console.print(f"{repo_root}/hosts/{config.hostname}/default.nix")
console.print(f"{repo_root}/hosts/{config.hostname}/configuration.nix")
console.print("\n[bold]Files that would be modified:[/bold]")
console.print(f"{repo_root}/flake.nix (add nixosConfigurations.{config.hostname})")
console.print(f"{repo_root}/terraform/vms.tf (add VM definition)")
def display_next_steps(hostname: str) -> None:
"""Display next steps after successful generation."""
next_steps = f"""[bold cyan]Next Steps:[/bold cyan]
1. Review changes:
[white]git diff[/white]
2. Verify NixOS configuration:
[white]nix flake check
nix build .#nixosConfigurations.{hostname}.config.system.build.toplevel[/white]
3. Verify Terraform configuration:
[white]cd terraform
tofu validate
tofu plan[/white]
4. Commit changes:
[white]git add hosts/{hostname} flake.nix terraform/vms.tf
git commit -m "hosts: add {hostname} configuration"[/white]
5. Deploy VM (after merging to master):
[white]cd terraform
tofu apply[/white]
6. Bootstrap the host (see Phase 3 of deployment pipeline)
"""
console.print(Panel(next_steps, border_style="cyan"))
if __name__ == "__main__":
app()


@@ -0,0 +1,38 @@
{ lib
, python3
, python3Packages
}:
python3Packages.buildPythonApplication {
pname = "create-host";
version = "0.1.0";
src = ./.;
pyproject = true;
build-system = with python3Packages; [
setuptools
];
propagatedBuildInputs = with python3Packages; [
typer
jinja2
rich
];
# Install templates to share directory
postInstall = ''
mkdir -p $out/share/create-host
cp -r templates $out/share/create-host/
'';
# No tests yet
doCheck = false;
meta = with lib; {
description = "NixOS host configuration generator for homelab infrastructure";
license = licenses.mit;
maintainers = [ ];
};
}


@@ -0,0 +1,88 @@
"""File generation using Jinja2 templates."""
import sys
from pathlib import Path
from jinja2 import Environment, BaseLoader, TemplateNotFound
from models import HostConfig
class PackageTemplateLoader(BaseLoader):
"""Custom Jinja2 loader that works with both dev and installed packages."""
def __init__(self):
# Try to find templates in multiple locations
self.template_dirs = []
# Location 1: Development (scripts/create-host/templates)
dev_dir = Path(__file__).parent / "templates"
if dev_dir.exists():
self.template_dirs.append(dev_dir)
# Location 2: Installed via Nix (../share/create-host/templates from bin dir)
# When installed via Nix, __file__ is in lib/python3.X/site-packages/
# and templates are in ../../../share/create-host/templates
for site_path in sys.path:
site_dir = Path(site_path)
# Try to find the Nix store path
if "site-packages" in str(site_dir):
# Go up to the package root (e.g., /nix/store/xxx-create-host-0.1.0)
pkg_root = site_dir.parent.parent.parent
share_templates = pkg_root / "share" / "create-host" / "templates"
if share_templates.exists():
self.template_dirs.append(share_templates)
# Location 3: Fallback - sys.path templates
for site_path in sys.path:
site_templates = Path(site_path) / "templates"
if site_templates.exists():
self.template_dirs.append(site_templates)
def get_source(self, environment, template):
for template_dir in self.template_dirs:
template_path = template_dir / template
if template_path.exists():
mtime = template_path.stat().st_mtime
source = template_path.read_text()
return source, str(template_path), lambda: mtime == template_path.stat().st_mtime
raise TemplateNotFound(template)
def generate_host_files(config: HostConfig, repo_root: Path) -> None:
"""
Generate host configuration files from templates.
Args:
config: Host configuration
repo_root: Path to repository root
"""
# Setup Jinja2 environment with custom loader
env = Environment(
loader=PackageTemplateLoader(),
trim_blocks=True,
lstrip_blocks=True,
)
# Create host directory
host_dir = repo_root / "hosts" / config.hostname
host_dir.mkdir(parents=True, exist_ok=True)
# Generate default.nix
default_template = env.get_template("default.nix.j2")
default_content = default_template.render(hostname=config.hostname)
(host_dir / "default.nix").write_text(default_content)
# Generate configuration.nix
config_template = env.get_template("configuration.nix.j2")
config_content = config_template.render(
hostname=config.hostname,
domain=config.domain,
nameservers=config.nameservers,
is_static_ip=config.is_static_ip,
ip=config.ip,
gateway=config.gateway,
state_version=config.state_version,
)
(host_dir / "configuration.nix").write_text(config_content)


@@ -0,0 +1,124 @@
"""Text manipulation for flake.nix and Terraform files."""
import re
from pathlib import Path
from models import HostConfig
def update_flake_nix(config: HostConfig, repo_root: Path, force: bool = False) -> None:
"""
Add or update host entry in flake.nix nixosConfigurations.
Args:
config: Host configuration
repo_root: Path to repository root
force: If True, replace existing entry; if False, insert new entry
"""
flake_path = repo_root / "flake.nix"
content = flake_path.read_text()
# Create new entry
new_entry = f""" {config.hostname} = nixpkgs.lib.nixosSystem {{
inherit system;
specialArgs = {{
inherit inputs self sops-nix;
}};
modules = [
(
{{ config, pkgs, ... }}:
{{
nixpkgs.overlays = commonOverlays;
}}
)
./hosts/{config.hostname}
sops-nix.nixosModules.sops
];
}};
"""
# Check if hostname already exists
hostname_pattern = rf"^ {re.escape(config.hostname)} = nixpkgs\.lib\.nixosSystem"
existing_match = re.search(hostname_pattern, content, re.MULTILINE)
if existing_match and force:
# Replace existing entry
# Match the entire block from "hostname = " to "};"
replace_pattern = rf"^ {re.escape(config.hostname)} = nixpkgs\.lib\.nixosSystem \{{.*?^ \}};\n"
new_content, count = re.subn(replace_pattern, new_entry, content, flags=re.MULTILINE | re.DOTALL)
if count == 0:
raise ValueError(f"Could not find existing entry for {config.hostname} in flake.nix")
else:
# Insert new entry before closing brace of nixosConfigurations
# Pattern: " };\n packages = forAllSystems"
pattern = r"( \};)\n( packages = forAllSystems)"
replacement = rf"{new_entry}\g<1>\n\g<2>"
new_content, count = re.subn(pattern, replacement, content)
if count == 0:
raise ValueError(
"Could not find insertion point in flake.nix. "
"Looking for pattern: ' };\\n packages = forAllSystems'"
)
flake_path.write_text(new_content)
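The insertion logic is easiest to see on a toy input. A minimal sketch, assuming a simplified two-space-indented stand-in for flake.nix (hypothetical content, not the repo's real file):

```python
import re

# Toy stand-in for flake.nix; indentation simplified to two spaces.
content = (
    "  nixosConfigurations = {\n"
    "    host-a = nixpkgs.lib.nixosSystem {\n"
    "      # ...\n"
    "    };\n"
    "  };\n"
    "  packages = forAllSystems (system: { });\n"
)

new_entry = (
    "    host-b = nixpkgs.lib.nixosSystem {\n"
    "      # ...\n"
    "    };\n"
)

# Same idea as update_flake_nix: anchor on the closing brace of
# nixosConfigurations followed by the "packages = forAllSystems" line,
# and place the new entry BEFORE that closing brace (the fix described
# in the commit message).
pattern = r"(  \};)\n(  packages = forAllSystems)"
replacement = new_entry + r"\g<1>\n\g<2>"
new_content, count = re.subn(pattern, replacement, content)

assert count == 1
assert new_content.index("host-b") < new_content.index("packages")
```

The nested `};` of `host-a` does not match because it is never followed by the `packages = forAllSystems` line, so the entry lands exactly once, inside the block.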
def update_terraform_vms(config: HostConfig, repo_root: Path, force: bool = False) -> None:
"""
Add or update VM entry in terraform/vms.tf locals.vms map.
Args:
config: Host configuration
repo_root: Path to repository root
force: If True, replace existing entry; if False, insert new entry
"""
terraform_path = repo_root / "terraform" / "vms.tf"
content = terraform_path.read_text()
# Create new entry based on whether we have static IP or DHCP
if config.is_static_ip:
new_entry = f''' "{config.hostname}" = {{
ip = "{config.ip}"
cpu_cores = {config.cpu}
memory = {config.memory}
disk_size = "{config.disk}"
}}
'''
else:
new_entry = f''' "{config.hostname}" = {{
cpu_cores = {config.cpu}
memory = {config.memory}
disk_size = "{config.disk}"
}}
'''
# Check if hostname already exists
hostname_pattern = rf'^\s+"{re.escape(config.hostname)}" = \{{'
existing_match = re.search(hostname_pattern, content, re.MULTILINE)
if existing_match and force:
# Replace existing entry
# Match the entire block from "hostname" = { to }
replace_pattern = rf'^\s+"{re.escape(config.hostname)}" = \{{.*?^\s+\}}\n'
new_content, count = re.subn(replace_pattern, new_entry, content, flags=re.MULTILINE | re.DOTALL)
if count == 0:
raise ValueError(f"Could not find existing entry for {config.hostname} in terraform/vms.tf")
else:
# Insert new entry before closing brace
# Pattern: " }\n\n # Compute VM configurations"
pattern = r"( \})\n\n( # Compute VM configurations)"
replacement = rf"{new_entry}\g<1>\n\n\g<2>"
new_content, count = re.subn(pattern, replacement, content)
if count == 0:
raise ValueError(
"Could not find insertion point in terraform/vms.tf. "
"Looking for pattern: ' }\\n\\n # Compute VM configurations'"
)
terraform_path.write_text(new_content)


@@ -0,0 +1,54 @@
"""Data models for host configuration."""
from dataclasses import dataclass
from typing import Optional
@dataclass
class HostConfig:
"""Configuration for a new NixOS host."""
hostname: str
ip: Optional[str] = None
cpu: int = 2
memory: int = 2048
disk: str = "20G"
@property
def is_static_ip(self) -> bool:
"""Check if host uses static IP configuration."""
return self.ip is not None
@property
def gateway(self) -> str:
"""Default gateway for the network."""
return "10.69.13.1"
@property
def nameservers(self) -> list[str]:
"""DNS nameservers for the network."""
return ["10.69.13.5", "10.69.13.6"]
@property
def domain(self) -> str:
"""Domain name for the network."""
return "home.2rjus.net"
@property
def state_version(self) -> str:
"""NixOS state version for new hosts."""
return "25.11"
def validate(self) -> None:
"""Validate configuration constraints."""
if not self.hostname:
raise ValueError("Hostname cannot be empty")
if self.cpu < 1:
raise ValueError("CPU cores must be at least 1")
if self.memory < 512:
raise ValueError("Memory must be at least 512 MB")
if not self.disk:
raise ValueError("Disk size cannot be empty")
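The static-vs-DHCP distinction falls out of the optional `ip` field. A condensed usage sketch (trimmed copy of the dataclass for illustration; the real class lives in `models.py`):

```python
from dataclasses import dataclass
from typing import Optional

# Trimmed copy of HostConfig, kept only to show the is_static_ip behaviour.
@dataclass
class HostConfig:
    hostname: str
    ip: Optional[str] = None  # None means DHCP
    cpu: int = 2
    memory: int = 2048
    disk: str = "20G"

    @property
    def is_static_ip(self) -> bool:
        return self.ip is not None

    def validate(self) -> None:
        if not self.hostname:
            raise ValueError("Hostname cannot be empty")
        if self.cpu < 1:
            raise ValueError("CPU cores must be at least 1")
        if self.memory < 512:
            raise ValueError("Memory must be at least 512 MB")

static = HostConfig(hostname="vault01", ip="10.69.13.19/24")
static.validate()
assert static.is_static_ip

dhcp = HostConfig(hostname="testvm01")
dhcp.validate()
assert not dhcp.is_static_ip
```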


@@ -0,0 +1,33 @@
from setuptools import setup
from pathlib import Path
# Read templates
templates = [str(p.relative_to(".")) for p in Path("templates").glob("*.j2")]
setup(
name="create-host",
version="0.1.0",
description="NixOS host configuration generator for homelab infrastructure",
py_modules=[
"create_host",
"models",
"validators",
"generators",
"manipulators",
],
include_package_data=True,
data_files=[
("templates", templates),
],
install_requires=[
"typer",
"jinja2",
"rich",
],
entry_points={
"console_scripts": [
"create-host=create_host:app",
],
},
python_requires=">=3.9",
)


@@ -0,0 +1,66 @@
{
config,
lib,
pkgs,
...
}:
{
imports = [
../template2/hardware-configuration.nix
../../system
../../common/vm
];
nixpkgs.config.allowUnfree = true;
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/vda";
networking.hostName = "{{ hostname }}";
networking.domain = "{{ domain }}";
networking.useNetworkd = true;
networking.useDHCP = false;
services.resolved.enable = true;
networking.nameservers = [
{% for ns in nameservers %}
"{{ ns }}"
{% endfor %}
];
systemd.network.enable = true;
systemd.network.networks."ens18" = {
matchConfig.Name = "ens18";
{% if is_static_ip %}
address = [
"{{ ip }}"
];
routes = [
{ Gateway = "{{ gateway }}"; }
];
{% else %}
networkConfig.DHCP = "ipv4";
{% endif %}
linkConfig.RequiredForOnline = "routable";
};
time.timeZone = "Europe/Oslo";
nix.settings.experimental-features = [
"nix-command"
"flakes"
];
nix.settings.tarball-ttl = 0;
environment.systemPackages = with pkgs; [
vim
wget
git
];
# Open ports in the firewall.
# networking.firewall.allowedTCPPorts = [ ... ];
# networking.firewall.allowedUDPPorts = [ ... ];
# Or disable the firewall altogether.
networking.firewall.enable = false;
system.stateVersion = "{{ state_version }}"; # Did you read the comment?
}


@@ -0,0 +1,5 @@
{ ... }: {
imports = [
./configuration.nix
];
}


@@ -0,0 +1,159 @@
"""Validation functions for host configuration."""
import re
from pathlib import Path
from typing import Optional
def validate_hostname_format(hostname: str) -> None:
"""
Validate hostname format according to RFC 1123.
Args:
hostname: Hostname to validate
Raises:
ValueError: If hostname format is invalid
"""
# RFC 1123: lowercase, alphanumeric, hyphens, max 63 chars
pattern = r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$"
if not re.match(pattern, hostname):
raise ValueError(
f"Invalid hostname '{hostname}'. "
"Must be lowercase alphanumeric with hyphens, "
"start and end with alphanumeric, max 63 characters."
)
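The pattern's behaviour on a few representative hostnames (illustrative cases, same regex as above):

```python
import re

# RFC 1123 hostname pattern used by validate_hostname_format.
pattern = r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$"

assert re.match(pattern, "vault01")        # valid
assert re.match(pattern, "web-server")     # interior hyphen is fine
assert re.match(pattern, "a")              # single character is allowed
assert not re.match(pattern, "Vault01")    # uppercase rejected
assert not re.match(pattern, "-leading")   # must start alphanumeric
assert not re.match(pattern, "trailing-")  # must end alphanumeric
assert not re.match(pattern, "a" * 64)     # max 63 characters
```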
def validate_hostname_unique(hostname: str, repo_root: Path) -> None:
"""
Validate that hostname is unique in the repository.
Args:
hostname: Hostname to check
repo_root: Path to repository root
Raises:
ValueError: If hostname already exists
"""
# Check if host directory exists
host_dir = repo_root / "hosts" / hostname
if host_dir.exists():
raise ValueError(f"Host directory already exists: {host_dir}")
# Check if hostname exists in flake.nix
flake_path = repo_root / "flake.nix"
if flake_path.exists():
flake_content = flake_path.read_text()
# Look for pattern like " hostname = "
hostname_pattern = rf'^\s+{re.escape(hostname)}\s*='
if re.search(hostname_pattern, flake_content, re.MULTILINE):
raise ValueError(f"Hostname '{hostname}' already exists in flake.nix")
def validate_ip_format(ip: str) -> None:
"""
Validate IP address format with CIDR notation.
Args:
ip: IP address with CIDR (e.g., "10.69.13.50/24")
Raises:
ValueError: If IP format is invalid
"""
if not ip:
return
# Check CIDR notation
if "/" not in ip:
raise ValueError(f"IP address must include CIDR notation (e.g., {ip}/24)")
ip_part, cidr_part = ip.rsplit("/", 1)
# Validate CIDR is /24
if cidr_part != "24":
raise ValueError(f"CIDR notation must be /24, got /{cidr_part}")
# Validate IP format
octets = ip_part.split(".")
if len(octets) != 4:
raise ValueError(f"Invalid IP address format: {ip_part}")
try:
octet_values = [int(octet) for octet in octets]
except ValueError:
raise ValueError(f"Invalid IP address format: {ip_part}") from None
# Check each octet is 0-255
for i, value in enumerate(octet_values):
if not 0 <= value <= 255:
raise ValueError(f"Invalid octet value {value} in IP address")
# Check last octet is 1-254
if not 1 <= octet_values[3] <= 254:
raise ValueError(
f"Last octet must be 1-254, got {octet_values[3]}"
)
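The hand-rolled octet checks could equally be delegated to the stdlib `ipaddress` module. A hedged alternative sketch (not the repo's actual code, same error behaviour):

```python
import ipaddress

def validate_ip_format_alt(ip: str) -> None:
    """Same checks as validate_ip_format, via the stdlib ipaddress module."""
    if not ip:
        return
    if "/" not in ip:
        raise ValueError(f"IP address must include CIDR notation (e.g., {ip}/24)")
    ip_part, cidr_part = ip.rsplit("/", 1)
    if cidr_part != "24":
        raise ValueError(f"CIDR notation must be /24, got /{cidr_part}")
    try:
        addr = ipaddress.IPv4Address(ip_part)  # validates all four octets
    except ipaddress.AddressValueError:
        raise ValueError(f"Invalid IP address format: {ip_part}") from None
    last_octet = int(addr) & 0xFF
    if not 1 <= last_octet <= 254:
        raise ValueError(f"Last octet must be 1-254, got {last_octet}")

validate_ip_format_alt("10.69.13.50/24")  # valid, no exception
```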
def validate_ip_subnet(ip: str) -> None:
"""
Validate that IP address is in the correct subnet (10.69.13.0/24).
Args:
ip: IP address with CIDR (e.g., "10.69.13.50/24")
Raises:
ValueError: If IP is not in correct subnet
"""
if not ip:
return
validate_ip_format(ip)
ip_part = ip.split("/")[0]
octets = ip_part.split(".")
# Check subnet is 10.69.13.x
if octets[:3] != ["10", "69", "13"]:
raise ValueError(
f"IP address must be in 10.69.13.0/24 subnet, got {ip_part}"
)
def validate_ip_unique(ip: Optional[str], repo_root: Path) -> None:
"""
Validate that IP address is not already in use.
Args:
ip: IP address with CIDR to check (None for DHCP)
repo_root: Path to repository root
Raises:
ValueError: If IP is already in use
"""
if not ip:
return # DHCP mode, no uniqueness check needed
# Extract just the IP part without CIDR for searching
ip_part = ip.split("/")[0]
# Check all hosts/*/configuration.nix files
hosts_dir = repo_root / "hosts"
if hosts_dir.exists():
for config_file in hosts_dir.glob("*/configuration.nix"):
content = config_file.read_text()
if ip_part in content:
raise ValueError(
f"IP address {ip_part} already in use in {config_file}"
)
# Check terraform/vms.tf
terraform_file = repo_root / "terraform" / "vms.tf"
if terraform_file.exists():
content = terraform_file.read_text()
if ip_part in content:
raise ValueError(
f"IP address {ip_part} already in use in {terraform_file}"
)


@@ -0,0 +1,8 @@
{ ... }:
{
services.vault = {
enable = true;
storageBackend = "file";
};
}

terraform/README.md

@@ -0,0 +1,214 @@
# OpenTofu Configuration for Proxmox
This directory contains OpenTofu configuration for managing Proxmox VMs using a parameterized, multi-VM deployment system.
## Setup
1. **Create a Proxmox API token:**
- Log into Proxmox web UI
- Go to Datacenter → Permissions → API Tokens
- Click Add
- User: `root@pam`, Token ID: `terraform`
- Uncheck "Privilege Separation"
- Save the token secret (shown only once)
2. **Configure credentials:**
```bash
cd terraform
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your Proxmox details
```
3. **Initialize OpenTofu:**
```bash
tofu init
```
4. **Test connection:**
```bash
tofu plan
```
## Defining VMs
All VMs are defined in the `vms.tf` file in the `locals.vms` map. Each VM can specify custom configurations or use defaults from `variables.tf`.
### Example: DHCP VM
```hcl
vms = {
"simple-vm" = {
cpu_cores = 2
memory = 2048
disk_size = "20G"
# No "ip" field = DHCP
}
}
```
### Example: Static IP VM
```hcl
vms = {
"web-server" = {
ip = "10.69.13.50/24"
cpu_cores = 4
memory = 4096
disk_size = "50G"
}
}
```
### Example: Minimal VM (all defaults)
```hcl
vms = {
"test-vm" = {}
}
```
### Example: Multiple VMs
```hcl
vms = {
"vm1" = {
ip = "10.69.13.50/24"
}
"vm2" = {
ip = "10.69.13.51/24"
cpu_cores = 4
memory = 8192
}
"vm3" = {
# DHCP
cpu_cores = 2
memory = 2048
}
}
```
### Example: Test VM with Custom Git Branch
For testing pipeline changes without polluting master:
```hcl
vms = {
"test-vm" = {
ip = "10.69.13.100/24"
flake_branch = "test-pipeline" # Bootstrap from this branch
}
}
```
This VM will bootstrap from the `test-pipeline` branch instead of `master`. Production VMs should omit the `flake_branch` field.
## Configuration Options
Each VM in the `vms` map supports the following fields (all optional):
| Field | Description | Default |
|-------|-------------|---------|
| `ip` | Static IP with CIDR (e.g., "10.69.13.50/24"). Omit for DHCP | DHCP |
| `gateway` | Network gateway (used with static IP) | `10.69.13.1` |
| `cpu_cores` | Number of CPU cores | `2` |
| `memory` | Memory in MB | `2048` |
| `disk_size` | Disk size (e.g., "20G", "100G") | `"20G"` |
| `flake_branch` | Git branch for bootstrap (for testing, omit for production) | `master` |
| `target_node` | Proxmox node to deploy to | `"pve1"` |
| `template_name` | Template VM to clone from | `"nixos-25.11.20260128.fa83fd8"` |
| `storage` | Storage backend | `"local-zfs"` |
| `bridge` | Network bridge | `"vmbr0"` |
| `vlan_tag` | VLAN tag | `13` |
| `ssh_public_key` | SSH public key for root | See `variables.tf` |
| `nameservers` | DNS servers | `"10.69.13.5 10.69.13.6"` |
| `search_domain` | DNS search domain | `"home.2rjus.net"` |
Defaults are defined in `variables.tf` and can be changed globally.
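The `lookup(vm, field, default)` chain in `vms.tf` is essentially a dictionary merge. A Python sketch of the same defaults-application pattern (defaults and the vault01 entry mirror the values shown in this repo):

```python
# Defaults mirroring variables.tf (subset, for illustration).
DEFAULTS = {
    "cpu_cores": 2,
    "memory": 2048,
    "disk_size": "20G",
    "gateway": "10.69.13.1",
    "ip": None,            # None = DHCP
    "flake_branch": None,  # None = master
}

vms = {
    "vault01": {"ip": "10.69.13.19/24"},
    "minimal-vm": {},  # all defaults
}

# Equivalent of lookup(vm, field, default) applied per field:
# per-VM overrides win, everything else falls back to the defaults.
vm_configs = {name: {**DEFAULTS, **overrides} for name, overrides in vms.items()}

assert vm_configs["vault01"]["ip"] == "10.69.13.19/24"
assert vm_configs["vault01"]["cpu_cores"] == 2
assert vm_configs["minimal-vm"]["ip"] is None
```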
## Deployment Commands
### Deploy All VMs
```bash
tofu apply
```
### Deploy Specific VM
```bash
tofu apply -target=proxmox_vm_qemu.vm[\"vm-name\"]
```
### Destroy Specific VM
```bash
tofu destroy -target=proxmox_vm_qemu.vm[\"vm-name\"]
```
### View Deployed VMs
```bash
tofu output vm_ips
tofu output deployment_summary
```
### Plan Changes
```bash
tofu plan
```
## Outputs
After deployment, OpenTofu provides two outputs:
**vm_ips**: IP addresses and SSH commands for each VM
```
vm_ips = {
"vm1" = {
ip = "10.69.13.50"
ssh_command = "ssh root@10.69.13.50"
}
}
```
**deployment_summary**: Full specifications for each VM
```
deployment_summary = {
"vm1" = {
cpu_cores = 4
memory_mb = 4096
disk_size = "50G"
ip = "10.69.13.50"
node = "pve1"
}
}
```
## Workflow
1. Edit `vms.tf` to define your VMs in the `locals.vms` map
2. Run `tofu plan` to preview changes
3. Run `tofu apply` to deploy
4. Run `tofu output vm_ips` to get IP addresses
5. SSH to VMs and configure as needed
## Files
- `main.tf` - Provider configuration
- `variables.tf` - Variable definitions and defaults
- `vms.tf` - VM definitions and deployment logic
- `cloud-init.tf` - Cloud-init disk management (SSH keys, networking, branch config)
- `outputs.tf` - Output definitions for deployed VMs
- `terraform.tfvars.example` - Example credentials file
- `terraform.tfvars` - Your actual credentials (gitignored)
- `vm.tf.old` - Archived single-VM configuration (reference)
## Notes
- VMs are deployed as full clones (not linked clones)
- Cloud-init handles initial networking configuration
- QEMU guest agent is enabled on all VMs
- All VMs start on boot by default
- IPv6 is disabled
- Destroying VMs removes them from Proxmox but does not clean up DNS entries or NixOS configurations

terraform/cloud-init.tf

@@ -0,0 +1,58 @@
# Cloud-init configuration for all VMs
#
# This file manages cloud-init disks for all VMs using the proxmox_cloud_init_disk resource.
# VMs with flake_branch set will include NIXOS_FLAKE_BRANCH environment variable.
resource "proxmox_cloud_init_disk" "ci" {
for_each = local.vm_configs
name = each.key
pve_node = each.value.target_node
storage = "local" # Cloud-init disks must be on storage that supports ISO/snippets
# User data includes SSH keys and optionally NIXOS_FLAKE_BRANCH
user_data = <<-EOT
#cloud-config
ssh_authorized_keys:
- ${each.value.ssh_public_key}
${each.value.flake_branch != null ? <<-BRANCH
write_files:
- path: /etc/environment
content: |
NIXOS_FLAKE_BRANCH=${each.value.flake_branch}
append: true
BRANCH
: ""}
EOT
# Network configuration - static IP or DHCP
network_config = each.value.ip != null ? yamlencode({
version = 1
config = [{
type = "physical"
name = "ens18"
subnets = [{
type = "static"
address = each.value.ip
gateway = each.value.gateway
dns_nameservers = split(" ", each.value.nameservers)
dns_search = [each.value.search_domain]
}]
}]
}) : yamlencode({
version = 1
config = [{
type = "physical"
name = "ens18"
subnets = [{
type = "dhcp"
}]
}]
})
# Instance metadata
meta_data = yamlencode({
instance_id = sha1(each.key)
local-hostname = each.key
})
}

terraform/main.tf

@@ -0,0 +1,18 @@
terraform {
required_version = ">= 1.0"
required_providers {
proxmox = {
source = "telmate/proxmox"
version = "3.0.2-rc07"
}
}
}
provider "proxmox" {
pm_api_url = var.proxmox_api_url
pm_api_token_id = var.proxmox_api_token_id
pm_api_token_secret = var.proxmox_api_token_secret
pm_tls_insecure = var.proxmox_tls_insecure
}
# Provider configured - ready to add resources

terraform/outputs.tf

@@ -0,0 +1,24 @@
# Dynamic outputs for all deployed VMs
output "vm_ips" {
description = "IP addresses and SSH commands for deployed VMs"
value = {
for name, vm in proxmox_vm_qemu.vm : name => {
ip = vm.default_ipv4_address
ssh_command = "ssh root@${vm.default_ipv4_address}"
}
}
}
output "deployment_summary" {
description = "Summary of deployed VMs with their specifications"
value = {
for name, vm in proxmox_vm_qemu.vm : name => {
cpu_cores = vm.cpu[0].cores
memory_mb = vm.memory
disk_size = vm.disks[0].virtio[0].virtio0[0].disk[0].size
ip = vm.default_ipv4_address
node = vm.target_node
}
}
}


@@ -0,0 +1,7 @@
# Copy this file to terraform.tfvars and fill in your values
# terraform.tfvars is gitignored to keep credentials safe
proxmox_api_url = "https://your-proxmox-host.home.2rjus.net:8006/api2/json"
proxmox_api_token_id = "root@pam!terraform"
proxmox_api_token_secret = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
proxmox_tls_insecure = true

terraform/variables.tf

@@ -0,0 +1,97 @@
variable "proxmox_api_url" {
description = "Proxmox API URL (e.g., https://proxmox.home.2rjus.net:8006/api2/json)"
type = string
}
variable "proxmox_api_token_id" {
description = "Proxmox API Token ID (e.g., root@pam!terraform)"
type = string
sensitive = true
}
variable "proxmox_api_token_secret" {
description = "Proxmox API Token Secret"
type = string
sensitive = true
}
variable "proxmox_tls_insecure" {
description = "Skip TLS verification (set to true for self-signed certs)"
type = bool
default = true
}
# Default values for VM configurations
# These can be overridden per-VM in vms.tf
variable "default_target_node" {
description = "Default Proxmox node to deploy VMs to"
type = string
default = "pve1"
}
variable "default_template_name" {
description = "Default template VM name to clone from"
type = string
default = "nixos-25.11.20260128.fa83fd8"
}
variable "default_ssh_public_key" {
description = "Default SSH public key for root user"
type = string
default = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAwfb2jpKrBnCw28aevnH8HbE5YbcMXpdaVv2KmueDu6 torjus@gunter"
}
variable "default_storage" {
description = "Default storage backend for VM disks"
type = string
default = "local-zfs"
}
variable "default_bridge" {
description = "Default network bridge"
type = string
default = "vmbr0"
}
variable "default_vlan_tag" {
description = "Default VLAN tag"
type = number
default = 13
}
variable "default_gateway" {
description = "Default network gateway for static IP VMs"
type = string
default = "10.69.13.1"
}
variable "default_nameservers" {
description = "Default DNS nameservers"
type = string
default = "10.69.13.5 10.69.13.6"
}
variable "default_search_domain" {
description = "Default DNS search domain"
type = string
default = "home.2rjus.net"
}
variable "default_cpu_cores" {
description = "Default number of CPU cores"
type = number
default = 2
}
variable "default_memory" {
description = "Default memory in MB"
type = number
default = 2048
}
variable "default_disk_size" {
description = "Default disk size"
type = string
default = "20G"
}

terraform/vms.tf

@@ -0,0 +1,135 @@
# VM Definitions
# Define all VMs to deploy in the locals.vms map below
# Omit fields to use defaults from variables.tf
locals {
# Define VMs here
# Each VM can override defaults by specifying values
# Omit "ip" field for DHCP, include it for static IP
vms = {
# Example DHCP VM (uncomment to deploy):
# "example-dhcp-vm" = {
# cpu_cores = 2
# memory = 2048
# disk_size = "20G"
# }
# Example Static IP VM (uncomment to deploy):
# "example-static-vm" = {
# ip = "10.69.13.50/24"
# cpu_cores = 4
# memory = 4096
# disk_size = "50G"
# }
# Example Test VM with custom git branch (for testing pipeline changes):
# "test-vm" = {
# ip = "10.69.13.100/24"
# flake_branch = "test-pipeline" # Bootstrap from this branch instead of master
# }
# Example Minimal VM using all defaults (uncomment to deploy):
# "minimal-vm" = {}
# "bootstrap-verify-test" = {}
"testvm01" = {
ip = "10.69.13.101/24"
cpu_cores = 2
memory = 2048
disk_size = "20G"
flake_branch = "pipeline-testing-improvements"
}
"vault01" = {
ip = "10.69.13.19/24"
cpu_cores = 2
memory = 2048
disk_size = "20G"
}
}
# Compute VM configurations with defaults applied
vm_configs = {
for name, vm in local.vms : name => {
target_node = lookup(vm, "target_node", var.default_target_node)
template_name = lookup(vm, "template_name", var.default_template_name)
cpu_cores = lookup(vm, "cpu_cores", var.default_cpu_cores)
memory = lookup(vm, "memory", var.default_memory)
disk_size = lookup(vm, "disk_size", var.default_disk_size)
storage = lookup(vm, "storage", var.default_storage)
bridge = lookup(vm, "bridge", var.default_bridge)
vlan_tag = lookup(vm, "vlan_tag", var.default_vlan_tag)
ssh_public_key = lookup(vm, "ssh_public_key", var.default_ssh_public_key)
nameservers = lookup(vm, "nameservers", var.default_nameservers)
search_domain = lookup(vm, "search_domain", var.default_search_domain)
# Network configuration - detect DHCP vs static
ip = lookup(vm, "ip", null)
gateway = lookup(vm, "gateway", var.default_gateway)
# Branch configuration for bootstrap (optional, uses master if not set)
flake_branch = lookup(vm, "flake_branch", null)
}
}
}
# Deploy all VMs using for_each
resource "proxmox_vm_qemu" "vm" {
for_each = local.vm_configs
name = each.key
target_node = each.value.target_node
# Clone from template
clone = each.value.template_name
full_clone = true
# Boot configuration
boot = "order=virtio0"
scsihw = "virtio-scsi-single"
# VM settings
cpu {
cores = each.value.cpu_cores
}
memory = each.value.memory
# Network
network {
id = 0
model = "virtio"
bridge = each.value.bridge
tag = each.value.vlan_tag
}
# Disk settings
disks {
virtio {
virtio0 {
disk {
size = each.value.disk_size
storage = each.value.storage
}
}
}
ide {
ide2 {
# Reference the custom cloud-init disk created in cloud-init.tf
cdrom {
iso = proxmox_cloud_init_disk.ci[each.key].id
}
}
}
}
# Start on boot
start_at_node_boot = true
# Agent
agent = 1
# Skip IPv6 since we don't use it
skip_ipv6 = true
# RNG device for better entropy
rng {
source = "/dev/urandom"
period = 1000
}
}