1 commit

74071887ad  test: add testvm01 for pipeline testing  (2026-02-01 16:38:13 +01:00)
All checks were successful: "Run nix flake check / flake-check (push)" succeeded in 2m44s.

13 changed files with 105 additions and 371 deletions

.gitignore (vendored; 1 line changed)

@@ -10,3 +10,4 @@ terraform/terraform.tfvars
 terraform/*.auto.tfvars
 terraform/crash.log
 terraform/crash.*.log
+terraform/.generated/

TODO.md (224 lines changed)

@@ -153,9 +153,9 @@ create-host \
 ---
-### Phase 4: Secrets Management with HashiCorp Vault
+### Phase 4: Secrets Management Automation
-**Challenge:** Current sops-nix approach has chicken-and-egg problem with age keys
+**Challenge:** sops needs age key, but age key is generated on first boot
 **Current workflow:**
 1. VM boots, generates age key at `/var/lib/sops-nix/key.txt`
@@ -164,213 +164,27 @@ create-host \
 4. User commits, pushes
 5. VM can now decrypt secrets
-**Selected approach:** Migrate to HashiCorp Vault for centralized secrets management
+**Proposed solution:**
-**Benefits:**
+**Option A: Pre-generate age keys**
-- Industry-standard secrets management (Vault experience transferable to work)
+- [ ] Generate age key pair during `create-host-config.sh`
-- Eliminates manual age key distribution step
+- [ ] Add public key to `.sops.yaml` immediately
-- Secrets-as-code via OpenTofu (infrastructure-as-code aligned)
+- [ ] Store private key temporarily (secure location)
-- Centralized PKI management (replaces step-ca, consolidates TLS + SSH CA)
+- [ ] Inject private key via cloud-init write_files or Terraform file provisioner
-- Automatic secret rotation capabilities
+- [ ] VM uses pre-configured key from first boot
-- Audit logging for all secret access
-- AppRole authentication enables automated bootstrap
-**Architecture:**
+**Option B: Post-deployment secret injection**
-```
+- [ ] VM boots with template, generates its own key
-vault.home.2rjus.net
+- [ ] Fetch public key via SSH after first boot
-├─ KV Secrets Engine (replaces sops-nix)
+- [ ] Automatically add to `.sops.yaml` and commit
-├─ PKI Engine (replaces step-ca for TLS)
+- [ ] Trigger rebuild on VM to pick up secrets access
-├─ SSH CA Engine (replaces step-ca SSH CA)
-└─ AppRole Auth (per-host authentication)
-New hosts authenticate on first boot
-Fetch secrets via Vault API
-No manual key distribution needed
-```
----
+**Option C: Separate secrets from initial deployment**
+- [ ] Initial deployment works without secrets
+- [ ] After VM is running, user manually adds age key
+- [ ] Subsequent auto-upgrades pick up secrets
-#### Phase 4a: Vault Server Setup
+**Decision needed:** Option A is most automated, but requires secure key handling
-**Goal:** Deploy and configure Vault server with auto-unseal
-**Tasks:**
-- [ ] Create `hosts/vault01/` configuration
-  - [ ] Basic NixOS configuration (hostname, networking, etc.)
-  - [ ] Vault service configuration
-  - [ ] Firewall rules (8200 for API, 8201 for cluster)
-  - [ ] Add to flake.nix and terraform
-- [ ] Implement auto-unseal mechanism
-  - [ ] **Preferred:** TPM-based auto-unseal if hardware supports it
-    - [ ] Use tpm2-tools to seal/unseal Vault keys
-    - [ ] Systemd service to unseal on boot
-  - [ ] **Fallback:** Shamir secret sharing with systemd automation
-    - [ ] Generate 3 keys, threshold 2
-    - [ ] Store 2 keys on disk (encrypted), keep 1 offline
-    - [ ] Systemd service auto-unseals using 2 keys
-- [ ] Initial Vault setup
-  - [ ] Initialize Vault
-  - [ ] Configure storage backend (integrated raft or file)
-  - [ ] Set up root token management
-  - [ ] Enable audit logging
-- [ ] Deploy to infrastructure
-  - [ ] Add DNS entry for vault.home.2rjus.net
-  - [ ] Deploy VM via terraform
-  - [ ] Bootstrap and verify Vault is running
-**Deliverable:** Running Vault server that auto-unseals on boot
----
-#### Phase 4b: Vault-as-Code with OpenTofu
-**Goal:** Manage all Vault configuration (secrets structure, policies, roles) as code
-**Tasks:**
-- [ ] Set up Vault Terraform provider
-  - [ ] Create `terraform/vault/` directory
-  - [ ] Configure Vault provider (address, auth)
-  - [ ] Store Vault token securely (terraform.tfvars, gitignored)
-- [ ] Enable and configure secrets engines
-  - [ ] Enable KV v2 secrets engine at `secret/`
-  - [ ] Define secret path structure (per-service, per-host)
-  - [ ] Example: `secret/monitoring/grafana`, `secret/postgres/ha1`
-- [ ] Define policies as code
-  - [ ] Create policies for different service tiers
-  - [ ] Principle of least privilege (hosts only read their secrets)
-  - [ ] Example: monitoring-policy allows read on `secret/monitoring/*`
-- [ ] Set up AppRole authentication
-  - [ ] Enable AppRole auth backend
-  - [ ] Create role per host type (monitoring, dns, database, etc.)
-  - [ ] Bind policies to roles
-  - [ ] Configure TTL and token policies
-- [ ] Migrate existing secrets from sops-nix
-  - [ ] Create migration script/playbook
-  - [ ] Decrypt sops secrets and load into Vault KV
-  - [ ] Verify all secrets migrated successfully
-  - [ ] Keep sops as backup during transition
-- [ ] Implement secrets-as-code patterns
-  - [ ] Secret values in gitignored terraform.tfvars
-  - [ ] Or use random_password for auto-generated secrets
-  - [ ] Secret structure/paths in version-controlled .tf files
-**Example OpenTofu:**
-```hcl
-resource "vault_kv_secret_v2" "monitoring_grafana" {
-  mount     = "secret"
-  name      = "monitoring/grafana"
-  data_json = jsonencode({
-    admin_password = var.grafana_admin_password
-    smtp_password  = var.smtp_password
-  })
-}
-resource "vault_policy" "monitoring" {
-  name   = "monitoring-policy"
-  policy = <<EOT
-path "secret/data/monitoring/*" {
-  capabilities = ["read"]
-}
-EOT
-}
-resource "vault_approle_auth_backend_role" "monitoring01" {
-  backend        = "approle"
-  role_name      = "monitoring01"
-  token_policies = ["monitoring-policy"]
-}
-```
-**Deliverable:** All secrets and policies managed as OpenTofu code in `terraform/vault/`
----
-#### Phase 4c: PKI Migration (Replace step-ca)
-**Goal:** Consolidate PKI infrastructure into Vault
-**Tasks:**
-- [ ] Set up Vault PKI engines
-  - [ ] Create root CA in Vault (`pki/` mount, 10 year TTL)
-  - [ ] Create intermediate CA (`pki_int/` mount, 5 year TTL)
-  - [ ] Sign intermediate with root CA
-  - [ ] Configure CRL and OCSP
-- [ ] Enable ACME support
-  - [ ] Enable ACME on intermediate CA (Vault 1.14+)
-  - [ ] Create PKI role for homelab domain
-  - [ ] Set certificate TTLs and allowed domains
-- [ ] Configure SSH CA in Vault
-  - [ ] Enable SSH secrets engine (`ssh/` mount)
-  - [ ] Generate SSH signing keys
-  - [ ] Create roles for host and user certificates
-  - [ ] Configure TTLs and allowed principals
-- [ ] Migrate hosts from step-ca to Vault
-  - [ ] Update system/acme.nix to use Vault ACME endpoint
-  - [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
-  - [ ] Test certificate issuance on one host
-  - [ ] Roll out to all hosts via auto-upgrade
-- [ ] Migrate SSH CA trust
-  - [ ] Distribute Vault SSH CA public key to all hosts
-  - [ ] Update sshd_config to trust Vault CA
-  - [ ] Test SSH certificate authentication
-- [ ] Decommission step-ca
-  - [ ] Verify all services migrated
-  - [ ] Stop step-ca service on ca host
-  - [ ] Archive step-ca configuration for backup
-**Deliverable:** All TLS and SSH certificates issued by Vault, step-ca retired
----
-#### Phase 4d: Bootstrap Integration
-**Goal:** New hosts automatically authenticate to Vault on first boot, no manual steps
-**Tasks:**
-- [ ] Update create-host tool
-  - [ ] Generate AppRole role_id + secret_id for new host
-  - [ ] Or create wrapped token for one-time bootstrap
-  - [ ] Add host-specific policy to Vault (via terraform)
-  - [ ] Store bootstrap credentials for cloud-init injection
-- [ ] Update template2 for Vault authentication
-  - [ ] Create Vault authentication module
-    - [ ] Reads bootstrap credentials from cloud-init
-    - [ ] Authenticates to Vault, retrieves permanent AppRole credentials
-    - [ ] Stores role_id + secret_id locally for services to use
-- [ ] Create NixOS Vault secrets module
-  - [ ] Replacement for sops.secrets
-  - [ ] Fetches secrets from Vault at nixos-rebuild/activation time
-  - [ ] Or runtime secret fetching for services
-  - [ ] Handle Vault token renewal
-- [ ] Update bootstrap service
-  - [ ] After authenticating to Vault, fetch any bootstrap secrets
-  - [ ] Run nixos-rebuild with host configuration
-  - [ ] Services automatically fetch their secrets from Vault
-- [ ] Update terraform cloud-init
-  - [ ] Inject Vault address and bootstrap credentials
-  - [ ] Pass via cloud-init user-data or write_files
-  - [ ] Credentials scoped to single use or short TTL
-- [ ] Test complete flow
-  - [ ] Run create-host to generate new host config
-  - [ ] Deploy with terraform
-  - [ ] Verify host bootstraps and authenticates to Vault
-  - [ ] Verify services can fetch secrets
-  - [ ] Confirm no manual steps required
-**Bootstrap flow:**
-```
-1. terraform apply (deploys VM with cloud-init)
-2. Cloud-init sets hostname + Vault bootstrap credentials
-3. nixos-bootstrap.service runs:
-   - Authenticates to Vault with bootstrap credentials
-   - Retrieves permanent AppRole credentials
-   - Stores locally for service use
-   - Runs nixos-rebuild
-4. Host services fetch secrets from Vault as needed
-5. Done - no manual intervention
-```
-**Deliverable:** Fully automated secrets access from first boot, zero manual steps
 ---

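Option A above is the only variant that keeps first boot fully unattended, so a sketch of the pre-generation step may help. This is a minimal illustration assuming `age-keygen` is on PATH; the function names, key directory, and `.sops.yaml` handling are hypothetical, not the repo's actual create-host code:

```python
#!/usr/bin/env python3
"""Sketch of Option A: pre-generate an age key pair for a new host."""
import subprocess
from pathlib import Path


def pregenerate_age_key(hostname: str, keys_dir: Path) -> str:
    """Generate an age key pair; return the public key (age1...)."""
    keys_dir.mkdir(parents=True, exist_ok=True)
    key_file = keys_dir / f"{hostname}.txt"  # private key -- must never be committed
    # age-keygen -o writes the identity file; -y derives the recipient from it
    subprocess.run(["age-keygen", "-o", str(key_file)], check=True)
    result = subprocess.run(
        ["age-keygen", "-y", str(key_file)],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def register_in_sops_yaml(hostname: str, public_key: str, sops_yaml: Path) -> None:
    """Naively append the host recipient under a top-level `keys:` list."""
    text = sops_yaml.read_text()
    entry = f"  - &{hostname} {public_key}\n"
    # Assumes an anchor-style .sops.yaml; adjust to the real layout
    sops_yaml.write_text(text.replace("keys:\n", "keys:\n" + entry, 1))
```

The private key file is the sensitive artifact the "Decision needed" note worries about: it has to reach the VM (cloud-init write_files or a Terraform file provisioner) and then be removed from the workstation.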
flake.nix

@@ -334,6 +334,7 @@
       sops-nix.nixosModules.sops
     ];
   };
+  };
   testvm01 = nixpkgs.lib.nixosSystem {
     inherit system;
     specialArgs = {
@@ -350,23 +351,6 @@
       sops-nix.nixosModules.sops
     ];
   };
-  vault01 = nixpkgs.lib.nixosSystem {
-    inherit system;
-    specialArgs = {
-      inherit inputs self sops-nix;
-    };
-    modules = [
-      (
-        { config, pkgs, ... }:
-        {
-          nixpkgs.overlays = commonOverlays;
-        }
-      )
-      ./hosts/vault01
-      sops-nix.nixosModules.sops
-    ];
-  };
-};
   packages = forAllSystems (
     { pkgs }:
     {

hosts/testvm01/configuration.nix

@@ -7,15 +7,16 @@
 {
   imports = [
-    ../template2/hardware-configuration.nix
+    ../template/hardware-configuration.nix
     ../../system
     ../../common/vm
   ];
   nixpkgs.config.allowUnfree = true;
+  # Use the systemd-boot EFI boot loader.
   boot.loader.grub.enable = true;
-  boot.loader.grub.device = "/dev/vda";
+  boot.loader.grub.device = "/dev/sda";
   networking.hostName = "testvm01";
   networking.domain = "home.2rjus.net";

hosts/vault01/configuration.nix (deleted)

@@ -1,63 +0,0 @@
-{
-  config,
-  lib,
-  pkgs,
-  ...
-}:
-{
-  imports = [
-    ../template2/hardware-configuration.nix
-    ../../system
-    ../../common/vm
-    ../../services/vault
-  ];
-  nixpkgs.config.allowUnfree = true;
-  boot.loader.grub.enable = true;
-  boot.loader.grub.device = "/dev/vda";
-  networking.hostName = "vault01";
-  networking.domain = "home.2rjus.net";
-  networking.useNetworkd = true;
-  networking.useDHCP = false;
-  services.resolved.enable = true;
-  networking.nameservers = [
-    "10.69.13.5"
-    "10.69.13.6"
-  ];
-  systemd.network.enable = true;
-  systemd.network.networks."ens18" = {
-    matchConfig.Name = "ens18";
-    address = [
-      "10.69.13.19/24"
-    ];
-    routes = [
-      { Gateway = "10.69.13.1"; }
-    ];
-    linkConfig.RequiredForOnline = "routable";
-  };
-  time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
-  nix.settings.tarball-ttl = 0;
-  environment.systemPackages = with pkgs; [
-    vim
-    wget
-    git
-  ];
-  # Open ports in the firewall.
-  # networking.firewall.allowedTCPPorts = [ ... ];
-  # networking.firewall.allowedUDPPorts = [ ... ];
-  # Or disable the firewall altogether.
-  networking.firewall.enable = false;
-  system.stateVersion = "25.11"; # Did you read the comment?
-}

hosts/vault01/default.nix (deleted)

@@ -1,5 +0,0 @@
-{ ... }: {
-  imports = [
-    ./configuration.nix
-  ];
-}

create-host tool, update_flake_nix (Python)

@@ -50,17 +50,17 @@ def update_flake_nix(config: HostConfig, repo_root: Path, force: bool = False) -
         if count == 0:
             raise ValueError(f"Could not find existing entry for {config.hostname} in flake.nix")
     else:
-        # Insert new entry before closing brace of nixosConfigurations
-        # Pattern: " };\n packages = forAllSystems"
-        pattern = r"( \};)\n( packages = forAllSystems)"
-        replacement = rf"{new_entry}\g<1>\n\g<2>"
+        # Insert new entry before closing brace
+        # Pattern: " };\n packages ="
+        pattern = r"( \};)\n( packages =)"
+        replacement = rf"\g<1>\n{new_entry}\g<2>"
         new_content, count = re.subn(pattern, replacement, content)
         if count == 0:
             raise ValueError(
                 "Could not find insertion point in flake.nix. "
-                "Looking for pattern: ' };\\n packages = forAllSystems'"
+                "Looking for pattern: ' };\\n packages ='"
             )
     flake_path.write_text(new_content)

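The behavioural change in this hunk is the replacement template: the old version spliced `new_entry` before the matched `};`, the new one splices it between the brace and the `packages =` line. A toy reproduction (single-space indents to match the captured pattern; the repo's real indentation may differ):

```python
import re

# Toy tail of a flake.nix -- not the repo's real file
content = " };\n packages = forAllSystems (\n"
new_entry = " testhost = nixpkgs.lib.nixosSystem { ... };\n"

pattern = r"( \};)\n( packages =)"
replacement = rf"\g<1>\n{new_entry}\g<2>"  # brace, then new entry, then packages
new_content, count = re.subn(pattern, replacement, content)
assert count == 1
print(new_content)
#  };
#  testhost = nixpkgs.lib.nixosSystem { ... };
#  packages = forAllSystems (
```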
create-host configuration template (Jinja2)

@@ -7,21 +7,22 @@
 {
   imports = [
-    ../template2/hardware-configuration.nix
+    ../template/hardware-configuration.nix
     ../../system
     ../../common/vm
   ];
   nixpkgs.config.allowUnfree = true;
+  # Use the systemd-boot EFI boot loader.
   boot.loader.grub.enable = true;
-  boot.loader.grub.device = "/dev/vda";
+  boot.loader.grub.device = "/dev/sda";
   networking.hostName = "{{ hostname }}";
   networking.domain = "{{ domain }}";
   networking.useNetworkd = true;
   networking.useDHCP = false;
-  services.resolved.enable = true;
+  services.resolved.enable = false;
   networking.nameservers = [
 {% for ns in nameservers %}
     "{{ ns }}"

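The `{{ hostname }}` and `{% for %}` markers suggest this template is rendered with Jinja2 or a compatible engine. A quick render check of the nameserver loop, with example values (the engine and render context are assumptions):

```python
from jinja2 import Template

# Fragment mirroring the nameserver loop in the template above
fragment = Template(
    'networking.nameservers = [\n'
    '{% for ns in nameservers %}'
    '  "{{ ns }}"\n'
    '{% endfor %}'
    '];\n'
)
print(fragment.render(nameservers=["10.69.13.5", "10.69.13.6"]))
# networking.nameservers = [
#   "10.69.13.5"
#   "10.69.13.6"
# ];
```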
services/vault/default.nix (deleted)

@@ -1,8 +0,0 @@
-{ ... }:
-{
-  services.vault = {
-    enable = true;
-    storageBackend = "file";
-  };
-}

terraform/README.md

@@ -198,10 +198,11 @@ deployment_summary = {
 - `main.tf` - Provider configuration
 - `variables.tf` - Variable definitions and defaults
 - `vms.tf` - VM definitions and deployment logic
-- `cloud-init.tf` - Cloud-init disk management (SSH keys, networking, branch config)
+- `cloud-init.tf` - Custom cloud-init configuration for branch-specific bootstrap
 - `outputs.tf` - Output definitions for deployed VMs
 - `terraform.tfvars.example` - Example credentials file
 - `terraform.tfvars` - Your actual credentials (gitignored)
+- `.generated/` - Auto-generated cloud-init files (gitignored)
 - `vm.tf.old` - Archived single-VM configuration (reference)
 ## Notes

terraform/cloud-init.tf

@@ -1,58 +1,55 @@
-# Cloud-init configuration for all VMs
-#
-# This file manages cloud-init disks for all VMs using the proxmox_cloud_init_disk resource.
-# VMs with flake_branch set will include NIXOS_FLAKE_BRANCH environment variable.
-resource "proxmox_cloud_init_disk" "ci" {
-  for_each = local.vm_configs
-
-  name     = each.key
-  pve_node = each.value.target_node
-  storage  = "local" # Cloud-init disks must be on storage that supports ISO/snippets
-
-  # User data includes SSH keys and optionally NIXOS_FLAKE_BRANCH
-  user_data = <<-EOT
-    #cloud-config
-    ssh_authorized_keys:
-      - ${each.value.ssh_public_key}
-    ${each.value.flake_branch != null ? <<-BRANCH
-    write_files:
-      - path: /etc/environment
-        content: |
-          NIXOS_FLAKE_BRANCH=${each.value.flake_branch}
-        append: true
-    BRANCH
-    : ""}
-  EOT
-
-  # Network configuration - static IP or DHCP
-  network_config = each.value.ip != null ? yamlencode({
-    version = 1
-    config = [{
-      type = "physical"
-      name = "ens18"
-      subnets = [{
-        type            = "static"
-        address         = each.value.ip
-        gateway         = each.value.gateway
-        dns_nameservers = split(" ", each.value.nameservers)
-        dns_search      = [each.value.search_domain]
-      }]
-    }]
-  }) : yamlencode({
-    version = 1
-    config = [{
-      type = "physical"
-      name = "ens18"
-      subnets = [{
-        type = "dhcp"
-      }]
-    }]
-  })
-
-  # Instance metadata
-  meta_data = yamlencode({
-    instance_id    = sha1(each.key)
-    local-hostname = each.key
-  })
-}
+# Cloud-init configuration for branch-specific bootstrap
+#
+# This file manages custom cloud-init snippets for VMs that need to bootstrap
+# from a specific git branch (non-master). Production VMs omit flake_branch
+# and use the default master branch.
+
+# Generate cloud-init snippets for VMs with custom branch configuration
+resource "local_file" "cloud_init_branch" {
+  for_each = {
+    for name, vm in local.vm_configs : name => vm
+    if vm.flake_branch != null
+  }
+
+  filename = "${path.module}/.generated/cloud-init-${each.key}.yml"
+  content = yamlencode({
+    # Write NIXOS_FLAKE_BRANCH to /etc/environment
+    # This will be read by bootstrap.nix service via EnvironmentFile
+    write_files = [{
+      path    = "/etc/environment"
+      content = "NIXOS_FLAKE_BRANCH=${each.value.flake_branch}\n"
+      append  = true
+    }]
+  })
+  file_permission = "0644"
+}
+
+# Upload cloud-init snippets to Proxmox
+# Note: This requires SSH access to the Proxmox host
+# Alternative: Manually copy files or use Proxmox API if available
+resource "null_resource" "upload_cloud_init" {
+  for_each = {
+    for name, vm in local.vm_configs : name => vm
+    if vm.flake_branch != null
+  }
+
+  # Trigger re-upload when content changes
+  triggers = {
+    content_hash = local_file.cloud_init_branch[each.key].content
+  }
+
+  # Upload the cloud-init file to Proxmox snippets directory
+  provisioner "local-exec" {
+    command = <<-EOT
+      scp -o StrictHostKeyChecking=no \
+        ${local_file.cloud_init_branch[each.key].filename} \
+        ${var.proxmox_host}:/var/lib/vz/snippets/cloud-init-${each.key}.yml
+    EOT
+  }
+
+  depends_on = [local_file.cloud_init_branch]
+}
+
+# Ensure VMs depend on cloud-init being uploaded
+# This is handled implicitly by the cicustom reference in vms.tf

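For a concrete picture of what the `yamlencode()` call in `local_file.cloud_init_branch` emits, here is the same snippet rendered in Python with PyYAML (key order and quoting differ from Terraform's output, the data is identical; the VM name and branch are examples):

```python
from pathlib import Path

import yaml  # PyYAML

vm_name = "testvm01"                            # example VM
flake_branch = "pipeline-testing-improvements"  # example branch

snippet = {
    "write_files": [{
        "path": "/etc/environment",
        "content": f"NIXOS_FLAKE_BRANCH={flake_branch}\n",
        "append": True,  # keep existing /etc/environment entries
    }]
}

out = Path(".generated") / f"cloud-init-{vm_name}.yml"
out.parent.mkdir(exist_ok=True)
out.write_text(yaml.safe_dump(snippet))
```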
terraform/variables.tf

@@ -21,6 +21,12 @@ variable "proxmox_tls_insecure" {
   default = true
 }
 
+variable "proxmox_host" {
+  description = "Proxmox host for SSH access (used to upload cloud-init snippets)"
+  type        = string
+  default     = "pve1.home.2rjus.net"
+}
+
 # Default values for VM configurations
 # These can be overridden per-VM in vms.tf

terraform/vms.tf

@@ -32,17 +32,11 @@ locals {
   # "minimal-vm" = {}
   # "bootstrap-verify-test" = {}
   "testvm01" = {
     ip           = "10.69.13.101/24"
-    cpu_cores    = 2
-    memory       = 2048
-    disk_size    = "20G"
-    flake_branch = "pipeline-testing-improvements"
-  }
-  "vault01" = {
-    ip           = "10.69.13.19/24"
     cpu_cores    = 2
     memory       = 2048
     disk_size    = "20G"
+    flake_branch = "pipeline-testing-improvements"
   }
 }
@@ -110,9 +104,8 @@ resource "proxmox_vm_qemu" "vm" {
   }
   ide {
     ide2 {
-      # Reference the custom cloud-init disk created in cloud-init.tf
-      cdrom {
-        iso = proxmox_cloud_init_disk.ci[each.key].id
+      cloudinit {
+        storage = each.value.storage
       }
     }
   }
@@ -124,6 +117,18 @@
   # Agent
   agent = 1
+
+  # Cloud-init configuration
+  ciuser       = "root"
+  sshkeys      = each.value.ssh_public_key
+  nameserver   = each.value.nameservers
+  searchdomain = each.value.search_domain
+
+  # Network configuration - DHCP or static IP
+  ipconfig0 = each.value.ip != null ? "ip=${each.value.ip},gw=${each.value.gateway}" : "ip=dhcp"
+
+  # Custom cloud-init disk for branch configuration (if flake_branch is set)
+  cicustom = each.value.flake_branch != null ? "user=${each.value.storage}:snippets/cloud-init-${each.key}.yml" : null
+
   # Skip IPv6 since we don't use it
   skip_ipv6 = true
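The two conditional expressions near the end of the last hunk carry the design change: standard cloud-init settings (user, SSH keys, DNS, IP) now come from the provider's native `ciuser`/`sshkeys`/`ipconfig0` attributes, and only branch VMs get a `cicustom` user-data override. A plain-Python mirror of those two expressions, with example values, for eyeballing the rendered strings:

```python
def ipconfig0(ip: str | None, gateway: str | None) -> str:
    """Mirror of the HCL: static ip/gw pair, or DHCP when no ip is set."""
    return f"ip={ip},gw={gateway}" if ip is not None else "ip=dhcp"


def cicustom(name: str, storage: str, flake_branch: str | None) -> str | None:
    """Mirror of the HCL: snippet reference only when a branch is set."""
    if flake_branch is None:
        return None  # production VMs use the stock cloud-init settings
    return f"user={storage}:snippets/cloud-init-{name}.yml"


print(ipconfig0("10.69.13.101/24", "10.69.13.1"))
# ip=10.69.13.101/24,gw=10.69.13.1
print(cicustom("testvm01", "local", "pipeline-testing-improvements"))
# user=local:snippets/cloud-init-testvm01.yml
```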