12 Commits

Author SHA1 Message Date
4afb37d730 create-host: enable resolved in configuration.nix.j2
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m15s
2026-02-01 20:37:36 +01:00
a2c798bc30 vault: add minimal vault config
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2026-02-01 20:27:02 +01:00
6d64e53586 hosts: add vault01 host
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m20s
2026-02-01 20:08:48 +01:00
e0ad445341 planning: update TODO.md
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2026-02-01 20:05:56 +01:00
d194c147d6 Merge pull request 'pipeline-testing-improvements' (#9) from pipeline-testing-improvements into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m13s
Reviewed-on: #9
2026-02-01 16:45:04 +00:00
9908286062 scripts: fix create-host flake.nix insertion point
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m12s
Run nix flake check / flake-check (push) Failing after 8m24s
Fix bug where new hosts were added outside of nixosConfigurations block
instead of inside it.

Issues fixed:
1. Pattern was looking for "packages =" but actual text is "packages = forAllSystems"
2. Replacement was putting new entry AFTER closing brace instead of BEFORE
3. testvm01 was at top-level flake output instead of in nixosConfigurations

Changes:
- Update pattern to match "packages = forAllSystems"
- Put new entry BEFORE the closing brace of nixosConfigurations
- Move testvm01 to correct location inside nixosConfigurations block

Result: nix flake show now correctly shows testvm01 as NixOS configuration

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 17:41:04 +01:00
cec496dda7 terraform: use local storage for cloud-init disks
Fix error "500 can't upload to storage type 'zfspool'" by using "local"
storage pool for cloud-init disks instead of the VM's storage pool.

Cloud-init disks require storage that supports ISO/snippet content types,
which zfspool does not. The "local" storage pool (directory-based) supports
this content type.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 17:41:04 +01:00
fca50562c3 terraform: fix cloud-init conditional type inconsistency
Fix OpenTofu error where static IP and DHCP branches had different object
structures in the subnets array. Move conditional to network_config level
so both branches return complete, consistent yamlencode() results.

Error was: "The true and false result expressions must have consistent types"

Solution: Make network_config itself conditional rather than the subnets
array, ensuring both branches return the same type (string from yamlencode).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 17:41:04 +01:00
1f1829dc2f docs: update terraform README for cloud-init refactoring
Remove mention of .generated/ directory and clarify that cloud-init.tf
manages all cloud-init disks, not just branch-specific ones.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 17:41:04 +01:00
21a32e0521 terraform: refactor cloud-init to use proxmox_cloud_init_disk resource
Replace SSH upload approach with native proxmox_cloud_init_disk resource
for cleaner, more maintainable cloud-init management.

Changes:
- Use proxmox_cloud_init_disk for all VMs (not just branch-specific ones)
- Include SSH keys, network config, and metadata in cloud-init disk
- Conditionally include NIXOS_FLAKE_BRANCH for VMs with flake_branch set
- Replace ide2 cloudinit disk with cdrom reference to cloud-init disk
- Remove built-in cloud-init parameters (ciuser, sshkeys, etc.)
- Remove cicustom parameter (no longer needed)
- Remove proxmox_host variable (no SSH uploads required)
- Remove .gitignore entry for .generated/ directory

Benefits:
- No SSH access to Proxmox required
- All cloud-init config managed in Terraform
- Consistent approach for all VMs
- Cleaner state management

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 17:41:04 +01:00
7fe0aa0f54 test: add testvm01 for pipeline testing 2026-02-01 17:41:04 +01:00
83de9a3ffb pipeline: add testing improvements for branch-based workflows
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Implement dual improvements to enable efficient testing of pipeline changes
without polluting master branch:

1. Add --force flag to create-host script
   - Skip hostname/IP uniqueness validation
   - Overwrite existing host configurations
   - Update entries in flake.nix and terraform/vms.tf (no duplicates)
   - Useful for iterating on configurations during testing

2. Add branch support to bootstrap mechanism
   - Bootstrap service reads NIXOS_FLAKE_BRANCH environment variable
   - Defaults to master if not set
   - Uses branch in git URL via ?ref= parameter
   - Service loads environment from /etc/environment

3. Add cloud-init disk support for branch configuration
   - VMs can specify flake_branch field in terraform/vms.tf
   - Automatically generates cloud-init snippet setting NIXOS_FLAKE_BRANCH
   - Uploads snippet to Proxmox via SSH
   - Production VMs omit flake_branch and use master

4. Update documentation
   - Document --force flag usage in create-host README
   - Add branch testing examples in terraform README
   - Update TODO.md with testing workflow
   - Add .generated/ to gitignore

Testing workflow: Create feature branch, set flake_branch in VM definition,
deploy with terraform, iterate with --force flag, clean up before merging.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 16:34:28 +01:00
15 changed files with 658 additions and 74 deletions

308
TODO.md
View File

@@ -54,6 +54,7 @@ Automate the entire process of creating, configuring, and deploying new NixOS ho
**Status:** ✅ Fully implemented and tested
**Completed:** 2025-02-01
**Enhanced:** 2025-02-01 (added --force flag)
**Goal:** Automate creation of host configuration files
@@ -64,6 +65,7 @@ Automate the entire process of creating, configuring, and deploying new NixOS ho
- Comprehensive validation (hostname format/uniqueness, IP subnet/uniqueness)
- Jinja2 templates for NixOS configurations
- Automatic updates to flake.nix and terraform/vms.tf
- `--force` flag for regenerating existing configurations (useful for testing)
**Tasks:**
- [x] Create Python CLI with typer framework
@@ -109,6 +111,7 @@ create-host \
**Status:** ✅ Fully implemented and tested
**Completed:** 2025-02-01
**Enhanced:** 2025-02-01 (added branch support for testing)
**Goal:** Get freshly deployed VM to apply its specific host configuration
@@ -118,7 +121,8 @@ create-host \
- Systemd service `nixos-bootstrap.service` runs on first boot
- Depends on `cloud-config.service` to ensure hostname is set
- Reads hostname from `hostnamectl` (set by cloud-init via Terraform)
- Runs `nixos-rebuild boot --flake git+https://git.t-juice.club/torjus/nixos-servers.git#${hostname}`
- Supports custom git branch via `NIXOS_FLAKE_BRANCH` environment variable
- Runs `nixos-rebuild boot --flake git+https://git.t-juice.club/torjus/nixos-servers.git?ref=$BRANCH#${hostname}`
- Reboots into new configuration on success
- Fails gracefully without reboot on errors (network issues, missing config)
- Service self-destructs after successful bootstrap (not in new config)
@@ -149,9 +153,9 @@ create-host \
---
### Phase 4: Secrets Management Automation
### Phase 4: Secrets Management with HashiCorp Vault
**Challenge:** sops needs age key, but age key is generated on first boot
**Challenge:** Current sops-nix approach has chicken-and-egg problem with age keys
**Current workflow:**
1. VM boots, generates age key at `/var/lib/sops-nix/key.txt`
@@ -160,27 +164,213 @@ create-host \
4. User commits, pushes
5. VM can now decrypt secrets
**Proposed solution:**
**Selected approach:** Migrate to HashiCorp Vault for centralized secrets management
**Option A: Pre-generate age keys**
- [ ] Generate age key pair during `create-host-config.sh`
- [ ] Add public key to `.sops.yaml` immediately
- [ ] Store private key temporarily (secure location)
- [ ] Inject private key via cloud-init write_files or Terraform file provisioner
- [ ] VM uses pre-configured key from first boot
**Benefits:**
- Industry-standard secrets management (Vault experience transferable to work)
- Eliminates manual age key distribution step
- Secrets-as-code via OpenTofu (infrastructure-as-code aligned)
- Centralized PKI management (replaces step-ca, consolidates TLS + SSH CA)
- Automatic secret rotation capabilities
- Audit logging for all secret access
- AppRole authentication enables automated bootstrap
**Option B: Post-deployment secret injection**
- [ ] VM boots with template, generates its own key
- [ ] Fetch public key via SSH after first boot
- [ ] Automatically add to `.sops.yaml` and commit
- [ ] Trigger rebuild on VM to pick up secrets access
**Architecture:**
```
vault.home.2rjus.net
├─ KV Secrets Engine (replaces sops-nix)
├─ PKI Engine (replaces step-ca for TLS)
├─ SSH CA Engine (replaces step-ca SSH CA)
└─ AppRole Auth (per-host authentication)
New hosts authenticate on first boot
Fetch secrets via Vault API
No manual key distribution needed
```
**Option C: Separate secrets from initial deployment**
- [ ] Initial deployment works without secrets
- [ ] After VM is running, user manually adds age key
- [ ] Subsequent auto-upgrades pick up secrets
---
**Decision needed:** Option A is most automated, but requires secure key handling
#### Phase 4a: Vault Server Setup
**Goal:** Deploy and configure Vault server with auto-unseal
**Tasks:**
- [ ] Create `hosts/vault01/` configuration
- [ ] Basic NixOS configuration (hostname, networking, etc.)
- [ ] Vault service configuration
- [ ] Firewall rules (8200 for API, 8201 for cluster)
- [ ] Add to flake.nix and terraform
- [ ] Implement auto-unseal mechanism
- [ ] **Preferred:** TPM-based auto-unseal if hardware supports it
- [ ] Use tpm2-tools to seal/unseal Vault keys
- [ ] Systemd service to unseal on boot
- [ ] **Fallback:** Shamir secret sharing with systemd automation
- [ ] Generate 3 keys, threshold 2
- [ ] Store 2 keys on disk (encrypted), keep 1 offline
- [ ] Systemd service auto-unseals using 2 keys
- [ ] Initial Vault setup
- [ ] Initialize Vault
- [ ] Configure storage backend (integrated raft or file)
- [ ] Set up root token management
- [ ] Enable audit logging
- [ ] Deploy to infrastructure
- [ ] Add DNS entry for vault.home.2rjus.net
- [ ] Deploy VM via terraform
- [ ] Bootstrap and verify Vault is running
**Deliverable:** Running Vault server that auto-unseals on boot
---
#### Phase 4b: Vault-as-Code with OpenTofu
**Goal:** Manage all Vault configuration (secrets structure, policies, roles) as code
**Tasks:**
- [ ] Set up Vault Terraform provider
- [ ] Create `terraform/vault/` directory
- [ ] Configure Vault provider (address, auth)
- [ ] Store Vault token securely (terraform.tfvars, gitignored)
- [ ] Enable and configure secrets engines
- [ ] Enable KV v2 secrets engine at `secret/`
- [ ] Define secret path structure (per-service, per-host)
- [ ] Example: `secret/monitoring/grafana`, `secret/postgres/ha1`
- [ ] Define policies as code
- [ ] Create policies for different service tiers
- [ ] Principle of least privilege (hosts only read their secrets)
- [ ] Example: monitoring-policy allows read on `secret/monitoring/*`
- [ ] Set up AppRole authentication
- [ ] Enable AppRole auth backend
- [ ] Create role per host type (monitoring, dns, database, etc.)
- [ ] Bind policies to roles
- [ ] Configure TTL and token policies
- [ ] Migrate existing secrets from sops-nix
- [ ] Create migration script/playbook
- [ ] Decrypt sops secrets and load into Vault KV
- [ ] Verify all secrets migrated successfully
- [ ] Keep sops as backup during transition
- [ ] Implement secrets-as-code patterns
- [ ] Secret values in gitignored terraform.tfvars
- [ ] Or use random_password for auto-generated secrets
- [ ] Secret structure/paths in version-controlled .tf files
**Example OpenTofu:**
```hcl
resource "vault_kv_secret_v2" "monitoring_grafana" {
mount = "secret"
name = "monitoring/grafana"
data_json = jsonencode({
admin_password = var.grafana_admin_password
smtp_password = var.smtp_password
})
}
resource "vault_policy" "monitoring" {
name = "monitoring-policy"
policy = <<EOT
path "secret/data/monitoring/*" {
capabilities = ["read"]
}
EOT
}
resource "vault_approle_auth_backend_role" "monitoring01" {
backend = "approle"
role_name = "monitoring01"
token_policies = ["monitoring-policy"]
}
```
**Deliverable:** All secrets and policies managed as OpenTofu code in `terraform/vault/`
---
#### Phase 4c: PKI Migration (Replace step-ca)
**Goal:** Consolidate PKI infrastructure into Vault
**Tasks:**
- [ ] Set up Vault PKI engines
- [ ] Create root CA in Vault (`pki/` mount, 10 year TTL)
- [ ] Create intermediate CA (`pki_int/` mount, 5 year TTL)
- [ ] Sign intermediate with root CA
- [ ] Configure CRL and OCSP
- [ ] Enable ACME support
- [ ] Enable ACME on intermediate CA (Vault 1.14+)
- [ ] Create PKI role for homelab domain
- [ ] Set certificate TTLs and allowed domains
- [ ] Configure SSH CA in Vault
- [ ] Enable SSH secrets engine (`ssh/` mount)
- [ ] Generate SSH signing keys
- [ ] Create roles for host and user certificates
- [ ] Configure TTLs and allowed principals
- [ ] Migrate hosts from step-ca to Vault
- [ ] Update system/acme.nix to use Vault ACME endpoint
- [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
- [ ] Test certificate issuance on one host
- [ ] Roll out to all hosts via auto-upgrade
- [ ] Migrate SSH CA trust
- [ ] Distribute Vault SSH CA public key to all hosts
- [ ] Update sshd_config to trust Vault CA
- [ ] Test SSH certificate authentication
- [ ] Decommission step-ca
- [ ] Verify all services migrated
- [ ] Stop step-ca service on ca host
- [ ] Archive step-ca configuration for backup
**Deliverable:** All TLS and SSH certificates issued by Vault, step-ca retired
---
#### Phase 4d: Bootstrap Integration
**Goal:** New hosts automatically authenticate to Vault on first boot, no manual steps
**Tasks:**
- [ ] Update create-host tool
- [ ] Generate AppRole role_id + secret_id for new host
- [ ] Or create wrapped token for one-time bootstrap
- [ ] Add host-specific policy to Vault (via terraform)
- [ ] Store bootstrap credentials for cloud-init injection
- [ ] Update template2 for Vault authentication
- [ ] Create Vault authentication module
- [ ] Reads bootstrap credentials from cloud-init
- [ ] Authenticates to Vault, retrieves permanent AppRole credentials
- [ ] Stores role_id + secret_id locally for services to use
- [ ] Create NixOS Vault secrets module
- [ ] Replacement for sops.secrets
- [ ] Fetches secrets from Vault at nixos-rebuild/activation time
- [ ] Or runtime secret fetching for services
- [ ] Handle Vault token renewal
- [ ] Update bootstrap service
- [ ] After authenticating to Vault, fetch any bootstrap secrets
- [ ] Run nixos-rebuild with host configuration
- [ ] Services automatically fetch their secrets from Vault
- [ ] Update terraform cloud-init
- [ ] Inject Vault address and bootstrap credentials
- [ ] Pass via cloud-init user-data or write_files
- [ ] Credentials scoped to single use or short TTL
- [ ] Test complete flow
- [ ] Run create-host to generate new host config
- [ ] Deploy with terraform
- [ ] Verify host bootstraps and authenticates to Vault
- [ ] Verify services can fetch secrets
- [ ] Confirm no manual steps required
**Bootstrap flow:**
```
1. terraform apply (deploys VM with cloud-init)
2. Cloud-init sets hostname + Vault bootstrap credentials
3. nixos-bootstrap.service runs:
- Authenticates to Vault with bootstrap credentials
- Retrieves permanent AppRole credentials
- Stores locally for service use
- Runs nixos-rebuild
4. Host services fetch secrets from Vault as needed
5. Done - no manual intervention
```
**Deliverable:** Fully automated secrets access from first boot, zero manual steps
---
@@ -240,10 +430,80 @@ Since most hosts use static IPs defined in their NixOS configurations, we can ex
### Phase 7: Testing & Documentation
**Tasks:**
- [ ] Test full pipeline end-to-end
- [ ] Create test host and verify all steps
- [ ] Document the new workflow in CLAUDE.md
**Status:** 🚧 In Progress (testing improvements completed)
**Testing Improvements Implemented (2025-02-01):**
The pipeline now supports efficient testing without polluting master branch:
**1. --force Flag for create-host**
- Re-run `create-host` to regenerate existing configurations
- Updates existing entries in flake.nix and terraform/vms.tf (no duplicates)
- Skip uniqueness validation checks
- Useful for iterating on configuration templates during testing
**2. Branch Support for Bootstrap**
- Bootstrap service reads `NIXOS_FLAKE_BRANCH` environment variable
- Defaults to `master` if not set
- Allows testing pipeline changes on feature branches
- Cloud-init passes branch via `/etc/environment`
**3. Cloud-init Disk for Branch Configuration**
- Terraform generates custom cloud-init snippets for test VMs
- Set `flake_branch` field in VM definition to use non-master branch
- Production VMs omit this field and use master (default)
- Files automatically uploaded to Proxmox via SSH
**Testing Workflow:**
```bash
# 1. Create test branch
git checkout -b test-pipeline
# 2. Generate or update host config
create-host --hostname testvm01 --ip 10.69.13.100/24
# 3. Edit terraform/vms.tf to add test VM with branch
# vms = {
# "testvm01" = {
# ip = "10.69.13.100/24"
# flake_branch = "test-pipeline" # Bootstrap from this branch
# }
# }
# 4. Commit and push test branch
git add -A && git commit -m "test: add testvm01"
git push origin test-pipeline
# 5. Deploy VM
cd terraform && tofu apply
# 6. Watch bootstrap (VM fetches from test-pipeline branch)
ssh root@10.69.13.100
journalctl -fu nixos-bootstrap.service
# 7. Iterate: modify templates and regenerate with --force
cd .. && create-host --hostname testvm01 --ip 10.69.13.100/24 --force
git commit -am "test: update config" && git push
# Redeploy to test fresh bootstrap
cd terraform
tofu destroy -target=proxmox_vm_qemu.vm[\"testvm01\"] && tofu apply
# 8. Clean up when done: squash commits, merge to master, remove test VM
```
**Files:**
- `scripts/create-host/create_host.py` - Added --force parameter
- `scripts/create-host/manipulators.py` - Update vs insert logic
- `hosts/template2/bootstrap.nix` - Branch support via environment variable
- `terraform/vms.tf` - flake_branch field support
- `terraform/cloud-init.tf` - Custom cloud-init disk generation
- `terraform/variables.tf` - proxmox_host variable for SSH uploads
**Remaining Tasks:**
- [ ] Test full pipeline end-to-end on feature branch
- [ ] Update CLAUDE.md with testing workflow
- [ ] Add troubleshooting section
- [ ] Create examples for common scenarios (DHCP host, static IP host, etc.)

View File

@@ -334,6 +334,38 @@
sops-nix.nixosModules.sops
];
};
testvm01 = nixpkgs.lib.nixosSystem {
inherit system;
specialArgs = {
inherit inputs self sops-nix;
};
modules = [
(
{ config, pkgs, ... }:
{
nixpkgs.overlays = commonOverlays;
}
)
./hosts/testvm01
sops-nix.nixosModules.sops
];
};
vault01 = nixpkgs.lib.nixosSystem {
inherit system;
specialArgs = {
inherit inputs self sops-nix;
};
modules = [
(
{ config, pkgs, ... }:
{
nixpkgs.overlays = commonOverlays;
}
)
./hosts/vault01
sops-nix.nixosModules.sops
];
};
};
packages = forAllSystems (
{ pkgs }:

View File

@@ -24,8 +24,12 @@ let
echo "Network connectivity confirmed"
echo "Fetching and building NixOS configuration from flake..."
# Read git branch from environment, default to master
BRANCH="''${NIXOS_FLAKE_BRANCH:-master}"
echo "Using git branch: $BRANCH"
# Build and activate the host-specific configuration
FLAKE_URL="git+https://git.t-juice.club/torjus/nixos-servers.git#''${HOSTNAME}"
FLAKE_URL="git+https://git.t-juice.club/torjus/nixos-servers.git?ref=$BRANCH#''${HOSTNAME}"
if nixos-rebuild boot --flake "$FLAKE_URL"; then
echo "Successfully built configuration for $HOSTNAME"
@@ -58,6 +62,9 @@ in
RemainAfterExit = true;
ExecStart = "${bootstrap-script}/bin/nixos-bootstrap";
# Read environment variables from /etc/environment (set by cloud-init)
EnvironmentFile = "-/etc/environment";
# Logging to journald
StandardOutput = "journal+console";
StandardError = "journal+console";

View File

@@ -0,0 +1,61 @@
{
config,
lib,
pkgs,
...
}:
{
imports = [
../template2/hardware-configuration.nix
../../system
../../common/vm
];
nixpkgs.config.allowUnfree = true;
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/vda";
networking.hostName = "testvm01";
networking.domain = "home.2rjus.net";
networking.useNetworkd = true;
networking.useDHCP = false;
services.resolved.enable = false;
networking.nameservers = [
"10.69.13.5"
"10.69.13.6"
];
systemd.network.enable = true;
systemd.network.networks."ens18" = {
matchConfig.Name = "ens18";
address = [
"10.69.13.101/24"
];
routes = [
{ Gateway = "10.69.13.1"; }
];
linkConfig.RequiredForOnline = "routable";
};
time.timeZone = "Europe/Oslo";
nix.settings.experimental-features = [
"nix-command"
"flakes"
];
nix.settings.tarball-ttl = 0;
environment.systemPackages = with pkgs; [
vim
wget
git
];
# Open ports in the firewall.
# networking.firewall.allowedTCPPorts = [ ... ];
# networking.firewall.allowedUDPPorts = [ ... ];
# Or disable the firewall altogether.
networking.firewall.enable = false;
system.stateVersion = "25.11"; # Did you read the comment?
}

View File

@@ -0,0 +1,5 @@
{ ... }: {
imports = [
./configuration.nix
];
}

View File

@@ -0,0 +1,63 @@
{
config,
lib,
pkgs,
...
}:
{
imports = [
../template2/hardware-configuration.nix
../../system
../../common/vm
../../services/vault
];
nixpkgs.config.allowUnfree = true;
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/vda";
networking.hostName = "vault01";
networking.domain = "home.2rjus.net";
networking.useNetworkd = true;
networking.useDHCP = false;
services.resolved.enable = true;
networking.nameservers = [
"10.69.13.5"
"10.69.13.6"
];
systemd.network.enable = true;
systemd.network.networks."ens18" = {
matchConfig.Name = "ens18";
address = [
"10.69.13.19/24"
];
routes = [
{ Gateway = "10.69.13.1"; }
];
linkConfig.RequiredForOnline = "routable";
};
time.timeZone = "Europe/Oslo";
nix.settings.experimental-features = [
"nix-command"
"flakes"
];
nix.settings.tarball-ttl = 0;
environment.systemPackages = with pkgs; [
vim
wget
git
];
# Open ports in the firewall.
# networking.firewall.allowedTCPPorts = [ ... ];
# networking.firewall.allowedUDPPorts = [ ... ];
# Or disable the firewall altogether.
networking.firewall.enable = false;
system.stateVersion = "25.11"; # Did you read the comment?
}

View File

@@ -0,0 +1,5 @@
{ ... }: {
imports = [
./configuration.nix
];
}

View File

@@ -50,6 +50,23 @@ python -m scripts.create_host.create_host create \
--dry-run
```
### Force Mode (Regenerate Existing Configuration)
Overwrite an existing host configuration (useful for testing):
```bash
python -m scripts.create_host.create_host create \
--hostname test01 \
--ip 10.69.13.50/24 \
--force
```
This mode:
- Skips hostname and IP uniqueness validation
- Overwrites files in `hosts/<hostname>/`
- Updates existing entries in `flake.nix` and `terraform/vms.tf` (doesn't duplicate)
- Useful for iterating on configuration templates during testing
### Options
- `--hostname` (required): Hostname for the new host
@@ -73,6 +90,10 @@ python -m scripts.create_host.create_host create \
- `--dry-run` (flag): Preview changes without creating files
- `--force` (flag): Overwrite existing host configuration
- Skips uniqueness validation
- Updates existing entries instead of creating duplicates
## What It Does
The tool performs the following actions:

View File

@@ -45,6 +45,7 @@ def main(
memory: int = typer.Option(2048, "--memory", help="Memory in MB"),
disk: str = typer.Option("20G", "--disk", help="Disk size (e.g., 20G, 50G, 100G)"),
dry_run: bool = typer.Option(False, "--dry-run", help="Preview changes without creating files"),
force: bool = typer.Option(False, "--force", help="Overwrite existing host configuration"),
) -> None:
"""
Create a new NixOS host configuration.
@@ -75,11 +76,20 @@ def main(
config.validate()
validate_hostname_format(hostname)
validate_hostname_unique(hostname, repo_root)
# Skip uniqueness checks in force mode
if not force:
validate_hostname_unique(hostname, repo_root)
if ip:
validate_ip_unique(ip, repo_root)
else:
# Check if we're actually overwriting something
host_dir = repo_root / "hosts" / hostname
if host_dir.exists():
console.print(f"[yellow]⚠[/yellow] Updating existing host configuration for {hostname}")
if ip:
validate_ip_subnet(ip)
validate_ip_unique(ip, repo_root)
console.print("[green]✓[/green] All validations passed\n")
@@ -96,13 +106,14 @@ def main(
console.print("\n[bold blue]Generating host configuration...[/bold blue]")
generate_host_files(config, repo_root)
console.print(f"[green]✓[/green] Created hosts/{hostname}/default.nix")
console.print(f"[green]✓[/green] Created hosts/{hostname}/configuration.nix")
action = "Updated" if force else "Created"
console.print(f"[green]✓[/green] {action} hosts/{hostname}/default.nix")
console.print(f"[green]✓[/green] {action} hosts/{hostname}/configuration.nix")
update_flake_nix(config, repo_root)
update_flake_nix(config, repo_root, force=force)
console.print("[green]✓[/green] Updated flake.nix")
update_terraform_vms(config, repo_root)
update_terraform_vms(config, repo_root, force=force)
console.print("[green]✓[/green] Updated terraform/vms.tf")
# Success message

View File

@@ -6,21 +6,18 @@ from pathlib import Path
from models import HostConfig
def update_flake_nix(config: HostConfig, repo_root: Path) -> None:
def update_flake_nix(config: HostConfig, repo_root: Path, force: bool = False) -> None:
"""
Add new host entry to flake.nix nixosConfigurations.
Add or update host entry in flake.nix nixosConfigurations.
Args:
config: Host configuration
repo_root: Path to repository root
force: If True, replace existing entry; if False, insert new entry
"""
flake_path = repo_root / "flake.nix"
content = flake_path.read_text()
# Find the closing of nixosConfigurations block
# Pattern: " };\n packages ="
pattern = r"( \};)\n( packages =)"
# Create new entry
new_entry = f""" {config.hostname} = nixpkgs.lib.nixosSystem {{
inherit system;
@@ -40,35 +37,47 @@ def update_flake_nix(config: HostConfig, repo_root: Path) -> None:
}};
"""
# Insert new entry before closing brace
replacement = rf"\g<1>\n{new_entry}\g<2>"
# Check if hostname already exists
hostname_pattern = rf"^ {re.escape(config.hostname)} = nixpkgs\.lib\.nixosSystem"
existing_match = re.search(hostname_pattern, content, re.MULTILINE)
new_content, count = re.subn(pattern, replacement, content)
if existing_match and force:
# Replace existing entry
# Match the entire block from "hostname = " to "};"
replace_pattern = rf"^ {re.escape(config.hostname)} = nixpkgs\.lib\.nixosSystem \{{.*?^ \}};\n"
new_content, count = re.subn(replace_pattern, new_entry, content, flags=re.MULTILINE | re.DOTALL)
if count == 0:
raise ValueError(
"Could not find insertion point in flake.nix. "
"Looking for pattern: ' };\\n devShells ='"
)
if count == 0:
raise ValueError(f"Could not find existing entry for {config.hostname} in flake.nix")
else:
# Insert new entry before closing brace of nixosConfigurations
# Pattern: " };\n packages = forAllSystems"
pattern = r"( \};)\n( packages = forAllSystems)"
replacement = rf"{new_entry}\g<1>\n\g<2>"
new_content, count = re.subn(pattern, replacement, content)
if count == 0:
raise ValueError(
"Could not find insertion point in flake.nix. "
"Looking for pattern: ' };\\n packages = forAllSystems'"
)
flake_path.write_text(new_content)
def update_terraform_vms(config: HostConfig, repo_root: Path) -> None:
def update_terraform_vms(config: HostConfig, repo_root: Path, force: bool = False) -> None:
"""
Add new VM entry to terraform/vms.tf locals.vms map.
Add or update VM entry in terraform/vms.tf locals.vms map.
Args:
config: Host configuration
repo_root: Path to repository root
force: If True, replace existing entry; if False, insert new entry
"""
terraform_path = repo_root / "terraform" / "vms.tf"
content = terraform_path.read_text()
# Find the closing of locals.vms block
# Pattern: " }\n\n # Compute VM configurations"
pattern = r"( \})\n\n( # Compute VM configurations)"
# Create new entry based on whether we have static IP or DHCP
if config.is_static_ip:
new_entry = f''' "{config.hostname}" = {{
@@ -86,15 +95,30 @@ def update_terraform_vms(config: HostConfig, repo_root: Path) -> None:
}}
'''
# Insert new entry before closing brace
replacement = rf"{new_entry}\g<1>\n\n\g<2>"
# Check if hostname already exists
hostname_pattern = rf'^\s+"{re.escape(config.hostname)}" = \{{'
existing_match = re.search(hostname_pattern, content, re.MULTILINE)
new_content, count = re.subn(pattern, replacement, content)
if existing_match and force:
# Replace existing entry
# Match the entire block from "hostname" = { to }
replace_pattern = rf'^\s+"{re.escape(config.hostname)}" = \{{.*?^\s+\}}\n'
new_content, count = re.subn(replace_pattern, new_entry, content, flags=re.MULTILINE | re.DOTALL)
if count == 0:
raise ValueError(
"Could not find insertion point in terraform/vms.tf. "
"Looking for pattern: ' }\\n\\n # Compute VM configurations'"
)
if count == 0:
raise ValueError(f"Could not find existing entry for {config.hostname} in terraform/vms.tf")
else:
# Insert new entry before closing brace
# Pattern: " }\n\n # Compute VM configurations"
pattern = r"( \})\n\n( # Compute VM configurations)"
replacement = rf"{new_entry}\g<1>\n\n\g<2>"
new_content, count = re.subn(pattern, replacement, content)
if count == 0:
raise ValueError(
"Could not find insertion point in terraform/vms.tf. "
"Looking for pattern: ' }\\n\\n # Compute VM configurations'"
)
terraform_path.write_text(new_content)

View File

@@ -7,22 +7,21 @@
{
imports = [
../template/hardware-configuration.nix
../template2/hardware-configuration.nix
../../system
../../common/vm
];
nixpkgs.config.allowUnfree = true;
# Use the systemd-boot EFI boot loader.
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/sda";
boot.loader.grub.device = "/dev/vda";
networking.hostName = "{{ hostname }}";
networking.domain = "{{ domain }}";
networking.useNetworkd = true;
networking.useDHCP = false;
services.resolved.enable = false;
services.resolved.enable = true;
networking.nameservers = [
{% for ns in nameservers %}
"{{ ns }}"

View File

@@ -0,0 +1,8 @@
{ ... }:
{
services.vault = {
enable = true;
storageBackend = "file";
};
}

View File

@@ -87,6 +87,21 @@ vms = {
}
```
### Example: Test VM with Custom Git Branch
For testing pipeline changes without polluting master:
```hcl
vms = {
"test-vm" = {
ip = "10.69.13.100/24"
flake_branch = "test-pipeline" # Bootstrap from this branch
}
}
```
This VM will bootstrap from the `test-pipeline` branch instead of `master`. Production VMs should omit the `flake_branch` field.
## Configuration Options
Each VM in the `vms` map supports the following fields (all optional):
@@ -98,6 +113,7 @@ Each VM in the `vms` map supports the following fields (all optional):
| `cpu_cores` | Number of CPU cores | `2` |
| `memory` | Memory in MB | `2048` |
| `disk_size` | Disk size (e.g., "20G", "100G") | `"20G"` |
| `flake_branch` | Git branch for bootstrap (for testing, omit for production) | `master` |
| `target_node` | Proxmox node to deploy to | `"pve1"` |
| `template_name` | Template VM to clone from | `"nixos-25.11.20260128.fa83fd8"` |
| `storage` | Storage backend | `"local-zfs"` |
@@ -182,6 +198,7 @@ deployment_summary = {
- `main.tf` - Provider configuration
- `variables.tf` - Variable definitions and defaults
- `vms.tf` - VM definitions and deployment logic
- `cloud-init.tf` - Cloud-init disk management (SSH keys, networking, branch config)
- `outputs.tf` - Output definitions for deployed VMs
- `terraform.tfvars.example` - Example credentials file
- `terraform.tfvars` - Your actual credentials (gitignored)

58
terraform/cloud-init.tf Normal file
View File

@@ -0,0 +1,58 @@
# Cloud-init configuration for all VMs
#
# This file manages cloud-init disks for all VMs using the proxmox_cloud_init_disk resource.
# VMs with flake_branch set will include NIXOS_FLAKE_BRANCH environment variable.
resource "proxmox_cloud_init_disk" "ci" {
for_each = local.vm_configs
name = each.key
pve_node = each.value.target_node
storage = "local" # Cloud-init disks must be on storage that supports ISO/snippets
# User data includes SSH keys and optionally NIXOS_FLAKE_BRANCH
user_data = <<-EOT
#cloud-config
ssh_authorized_keys:
- ${each.value.ssh_public_key}
${each.value.flake_branch != null ? <<-BRANCH
write_files:
- path: /etc/environment
content: |
NIXOS_FLAKE_BRANCH=${each.value.flake_branch}
append: true
BRANCH
: ""}
EOT
# Network configuration - static IP or DHCP
network_config = each.value.ip != null ? yamlencode({
version = 1
config = [{
type = "physical"
name = "ens18"
subnets = [{
type = "static"
address = each.value.ip
gateway = each.value.gateway
dns_nameservers = split(" ", each.value.nameservers)
dns_search = [each.value.search_domain]
}]
}]
}) : yamlencode({
version = 1
config = [{
type = "physical"
name = "ens18"
subnets = [{
type = "dhcp"
}]
}]
})
# Instance metadata
meta_data = yamlencode({
instance_id = sha1(each.key)
local-hostname = each.key
})
}

View File

@@ -22,9 +22,28 @@ locals {
# disk_size = "50G"
# }
# Example Test VM with custom git branch (for testing pipeline changes):
# "test-vm" = {
# ip = "10.69.13.100/24"
# flake_branch = "test-pipeline" # Bootstrap from this branch instead of master
# }
# Example Minimal VM using all defaults (uncomment to deploy):
# "minimal-vm" = {}
# "bootstrap-verify-test" = {}
"testvm01" = {
ip = "10.69.13.101/24"
cpu_cores = 2
memory = 2048
disk_size = "20G"
flake_branch = "pipeline-testing-improvements"
}
"vault01" = {
ip = "10.69.13.19/24"
cpu_cores = 2
memory = 2048
disk_size = "20G"
}
}
# Compute VM configurations with defaults applied
@@ -44,6 +63,8 @@ locals {
# Network configuration - detect DHCP vs static
ip = lookup(vm, "ip", null)
gateway = lookup(vm, "gateway", var.default_gateway)
# Branch configuration for bootstrap (optional, uses master if not set)
flake_branch = lookup(vm, "flake_branch", null)
}
}
}
@@ -89,8 +110,9 @@ resource "proxmox_vm_qemu" "vm" {
}
ide {
ide2 {
cloudinit {
storage = each.value.storage
# Reference the custom cloud-init disk created in cloud-init.tf
cdrom {
iso = proxmox_cloud_init_disk.ci[each.key].id
}
}
}
@@ -102,15 +124,6 @@ resource "proxmox_vm_qemu" "vm" {
# Agent
agent = 1
# Cloud-init configuration
ciuser = "root"
sshkeys = each.value.ssh_public_key
nameserver = each.value.nameservers
searchdomain = each.value.search_domain
# Network configuration - DHCP or static IP
ipconfig0 = each.value.ip != null ? "ip=${each.value.ip},gw=${each.value.gateway}" : "ip=dhcp"
# Skip IPv6 since we don't use it
skip_ipv6 = true