All secrets are now managed by OpenBao (Vault). Remove the legacy
sops-nix infrastructure that is no longer in use.
Removed:
- sops-nix flake input
- system/sops.nix module
- .sops.yaml configuration file
- Age key generation from template prepare-host scripts
Updated:
- flake.nix - removed sops-nix references from all hosts
- flake.lock - removed sops-nix input
- scripts/create-host/ - removed sops references
- CLAUDE.md - removed SOPS documentation
Note: the secrets/ directory is left in place and must be removed manually.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove the step-ca host and labmon flake input now that ACME has been
migrated to OpenBao PKI.
Removed:
- hosts/ca/ - step-ca host configuration
- services/ca/ - step-ca service module
- labmon flake input and module (no longer used)
Updated:
- flake.nix - removed ca host and labmon references
- flake.lock - removed labmon input
- rebuild-all.sh - removed ca from host list
- CLAUDE.md - updated documentation
Note: secrets/ca/ is left in place and must be removed manually.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Set up a simple nginx server with an ACME certificate from the new
OpenBao PKI infrastructure. This allows testing the ACME migration
before deploying to production hosts.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Switch all ACME certificate issuance from step-ca (ca.home.2rjus.net)
to OpenBao PKI (vault.home.2rjus.net:8200/v1/pki_int/acme/directory).
- Update default ACME server in system/acme.nix
- Update Caddy acme_ca in http-proxy and nix-cache services
- Remove labmon service from monitoring01 (step-ca monitoring)
- Remove labmon scrape target and certificate_rules alerts
- Remove alloy.nix (only used for labmon profiling)
- Add docs/plans/cert-monitoring.md for future cert monitoring needs
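Before switching hosts over, it can help to confirm that the new directory
URL actually serves an ACME directory. A minimal sketch follows, assuming
the endpoint is reached over https:// and returns a standard RFC 8555
directory object; fetch_directory and missing_directory_keys are
hypothetical helpers, not part of this repository.

```python
import json
import urllib.request

# Directory URL from this commit; the https:// scheme is an assumption.
ACME_DIRECTORY = "https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory"

# Resources RFC 8555 (section 7.1.1) defines for the directory object.
# newAuthz is omitted by servers without pre-authorization and meta is
# optional, so neither is required here.
REQUIRED_KEYS = {"newNonce", "newAccount", "newOrder", "revokeCert", "keyChange"}


def missing_directory_keys(directory: dict) -> set:
    """Return the required ACME resources absent from a directory object."""
    return REQUIRED_KEYS - set(directory)


def fetch_directory(url: str = ACME_DIRECTORY) -> dict:
    """Fetch and decode the ACME directory (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Run missing_directory_keys(fetch_directory()) from a host that already
trusts the internal root CA; an empty set means the directory looks sane.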
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Set vault.enable and homelab.deploy.enable on vault01 so it can
receive NATS-based remote deployments. After auto-unseal, vault01
fetches its own secrets from OpenBao using AppRole.
Add systemd ordering so that the vault-secret services wait for
openbao to be unsealed before attempting to fetch secrets.
Also add a vault01 AppRole entry to Terraform.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Set vault.enable and homelab.deploy.enable on these hosts to allow
NATS-based remote deployments and expose metrics on port 9972.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a log_to_loki function that pushes structured log entries to Loki
at key bootstrap stages (starting, network_ok, vault_*, building,
success, failed). This makes bootstrap state queryable via LogQL
without console access.
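The request body follows the Loki push API (POST /loki/api/v1/push with
JSON streams of [nanosecond-timestamp, line] pairs). The function in the
bootstrap script is shell; the Python below only sketches the payload
shape. The endpoint URL and the label names (job, host, stage) are
assumptions, not taken from the script.

```python
import json
import time
import urllib.request
from typing import Optional

# Hypothetical endpoint; the real Loki URL is not stated in this commit.
LOKI_PUSH_URL = "http://loki.example:3100/loki/api/v1/push"


def build_push_payload(host: str, stage: str, message: str,
                       ts_ns: Optional[int] = None) -> dict:
    """Build a Loki push body with one log line labelled by host and stage."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    return {
        "streams": [
            {
                # Label names are illustrative assumptions.
                "stream": {"job": "bootstrap", "host": host, "stage": stage},
                # Loki expects [timestamp in ns as a string, log line] pairs.
                "values": [[str(ts_ns), message]],
            }
        ]
    }


def log_to_loki(host: str, stage: str, message: str,
                url: str = LOKI_PUSH_URL) -> None:
    """Push one entry; failures are swallowed so logging never aborts bootstrap."""
    body = json.dumps(build_push_payload(host, stage, message)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        urllib.request.urlopen(req, timeout=5).close()
    except OSError:
        pass
```

With labels like these, a LogQL selector such as {job="bootstrap",
stage="failed"} would list every failed bootstrap.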
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Writing to the TTY caused nixos-rebuild to fail. Keep the custom
greeting line that identifies the bootstrap image, but log to
journal+console for reliable output.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Display bootstrap banner and live progress on tty1 instead of the login prompt
- Add custom getty greeting on other ttys indicating this is a bootstrap image
- Disable getty on tty1 during bootstrap so output is visible
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add vault.enable = true to testvm01, testvm02, testvm03
- Add homelab.deploy.enable = true for remote deployment via NATS
- Update create-host template to include these by default
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add three permanent test hosts for validating the deployment and
bootstrapping workflow. Each host is configured with:
- Static IP (10.69.13.20-22/24)
- Vault AppRole integration
- Bootstrap from deploy-test-hosts branch
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add homelab.deploy.enable option (requires vault.enable)
- Create shared homelab-deploy Vault policy for all hosts
- Enable homelab.deploy on all vault-enabled hosts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add system/homelab-deploy.nix module that automatically enables the
listener on all hosts with vault.enable=true. Uses homelab.host.tier
and homelab.host.role for NATS subject subscriptions.
- Add homelab-deploy access to all host AppRole policies
- Remove manual listener config from vaulttest01 (now handled by system module)
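How tier and role could map onto NATS subscriptions is sketched below,
together with NATS wildcard matching ('*' matches one token, '>' matches
one or more trailing tokens). The subject layout in listener_subjects is
hypothetical; the subjects homelab-deploy actually uses are not stated
here.

```python
from typing import List


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS wildcard semantics: '*' matches exactly one token and '>'
    matches one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            # '>' is only valid as the last token and needs >= 1 token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if tok != "*" and tok != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def listener_subjects(hostname: str, tier: str, role: str) -> List[str]:
    """Hypothetical subject layout derived from homelab.host.tier/role."""
    return [
        f"deploy.{tier}.all",
        f"deploy.{tier}.role.{role}",
        f"deploy.{tier}.host.{hostname}",
    ]
```

Under this layout, a credential allowed to publish only on deploy.test.>
reaches every test-tier listener and no prod host, which is one way the
tiered deployer permissions could be expressed.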
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Ensure homelab-deploy-listener waits for the NKey secret to be
fetched from Vault before starting.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add homelab-deploy flake input and NixOS module for message-based
deployments across the fleet. Configure DEPLOY account in NATS with
tiered access control (listener, test-deployer, admin-deployer).
Enable the listener on vaulttest01 as the initial test host.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a shared `homelab.host` module that provides host metadata for
multiple consumers:
- tier: deployment tier (test/prod) for future homelab-deploy service
- priority: alerting priority (high/low) for Prometheus label filtering
- role: primary role of the host (dns, database, monitoring, etc.)
- labels: free-form labels for additional metadata
Host configurations updated with appropriate values:
- ns1, ns2: role=dns with dns_role labels
- nix-cache01: priority=low, role=build-host
- vault01: role=vault
- jump: role=bastion
- template, template2, testvm01, vaulttest01: tier=test, priority=low
The module is now imported via commonModules in flake.nix, making it
available to all hosts including minimal configurations like template2.
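To illustrate how one metadata record can serve several consumers, here
is a Python mirror of the options flattened into Prometheus target
labels. The defaults, the label names, and the helper functions are
assumptions for illustration, not the module's actual behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HostMeta:
    """Python mirror of the homelab.host options; defaults are assumptions."""
    tier: str = "prod"
    priority: str = "high"
    role: Optional[str] = None
    labels: Dict[str, str] = field(default_factory=dict)


def prometheus_labels(hostname: str, meta: HostMeta) -> Dict[str, str]:
    """Flatten host metadata into target labels. Free-form labels are applied
    first so they cannot overwrite the structured fields."""
    out = dict(meta.labels)
    out.update({"hostname": hostname, "tier": meta.tier, "priority": meta.priority})
    if meta.role is not None:
        out["role"] = meta.role
    return out


def high_priority_hosts(hosts: Dict[str, HostMeta]) -> List[str]:
    """Hosts that a priority="high" alert filter would keep."""
    return sorted(name for name, meta in hosts.items() if meta.priority == "high")
```

An alert expression can then be scoped with a label matcher, for example
up{priority="high"} == 0.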
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Backups to the shared restic repository were all scheduled at exactly
midnight, causing lock conflicts. Adding RandomizedDelaySec spreads
them out over a 2-hour window to prevent simultaneous access.
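RandomizedDelaySec delays each timer by an evenly distributed random
amount between zero and the configured value. The simulation below shows
why that removes the midnight pile-up; the fixed backup duration passed
to overlapping_pairs is an assumption, since real run times vary.

```python
import random
from typing import Dict, Iterable, List, Optional, Tuple

WINDOW_S = 2 * 60 * 60  # the 2-hour window from this commit


def start_offsets(hosts: Iterable[str], window_s: float = WINDOW_S,
                  seed: Optional[int] = None) -> Dict[str, float]:
    """Draw an evenly distributed delay between 0 and window_s per host,
    mimicking what RandomizedDelaySec does for each timer."""
    rng = random.Random(seed)
    return {host: rng.uniform(0, window_s) for host in hosts}


def overlapping_pairs(offsets: Dict[str, float],
                      duration_s: float) -> List[Tuple[str, str]]:
    """Host pairs whose backups (each lasting duration_s) would run at the
    same time and could therefore hit lock conflicts on the shared repo."""
    ordered = sorted(offsets.items(), key=lambda kv: kv[1])
    pairs = []
    for i, (h1, t1) in enumerate(ordered):
        for h2, t2 in ordered[i + 1:]:
            if t2 - t1 >= duration_s:
                break  # later starts are even further away from h1
            pairs.append((h1, h2))
    return pairs
```

With every offset at zero (the old behaviour) each pair of hosts
overlaps; with offsets drawn from start_offsets, overlaps only occur when
two draws land within one backup duration of each other.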
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Convert remaining writeShellScript usages to writeShellApplication for
shellcheck validation and strict bash options.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove auth01 host configuration and associated services in preparation
for new auth stack with different provisioning system.
Removed:
- hosts/auth01/ - host configuration
- services/authelia/ - authelia service module
- services/lldap/ - lldap service module
- secrets/auth01/ - sops secrets
- Reverse proxy entries for auth and lldap
- Monitoring alert rules for authelia and lldap
- SOPS configuration for auth01
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace sops-nix secrets with OpenBao vault secrets across all hosts.
Hardcode root password hash, add extractKey option to vault-secrets
module, update Terraform with secrets/policies for all hosts, and
create AppRole provisioning playbook.
Hosts migrated: ha1, monitoring01, ns1, ns2, http-proxy, nix-cache01
Wave 1 hosts (nats1, jelly01, pgdb1) get AppRole policies only.
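One plausible reading of the new extractKey option is that it writes a
single field (such as a password hash) to the secret file instead of the
whole secret. The sketch below assumes that reading and a KV v2 mount,
where the payload sits under data.data; render_secret is a hypothetical
helper, not the module's code.

```python
import json
from typing import Optional


def render_secret(response: dict, extract_key: Optional[str] = None) -> str:
    """File contents for a secret read from a KV v2 mount.

    Assumed extractKey semantics: when set, emit only that key's raw value;
    otherwise emit the whole secret as JSON.
    """
    secret = response["data"]["data"]  # KV v2 nests the payload under data.data
    if extract_key is None:
        return json.dumps(secret, sort_keys=True)
    if extract_key not in secret:
        raise KeyError(f"secret has no key {extract_key!r}")
    return str(secret[extract_key])
```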
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add homelab.monitoring NixOS options (enable, scrapeTargets) following
the same pattern as homelab.dns. Prometheus scrape configs are now
auto-generated from flake host configurations and external targets,
replacing hardcoded target lists.
Also clean up alert rules: use snake_case naming, fix the zigbee2mqtt
typo, remove the duplicate pushgateway alert, add `for` clauses to
monitoring_rules, remove the hardcoded WireGuard public key, and add
new alerts for certificates, proxmox, caddy, smartctl temperature,
filesystem prediction, systemd state, file descriptors, and host
reboots.
Fix the grafana scrape target port (3100 -> 3000).
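The generation step amounts to grouping per-host targets by job and
appending external targets. A sketch in Python follows; the input shape
(enable plus a list of job/port entries) is an assumption modelled on the
homelab.monitoring options, and bare hostnames are used where the real
config may use FQDNs.

```python
from typing import Dict, List


def scrape_configs(hosts: Dict[str, dict],
                   external: Dict[str, List[str]]) -> List[dict]:
    """Build Prometheus scrape_configs from per-host monitoring options.

    hosts maps hostname -> {"enable": bool,
                            "scrapeTargets": [{"job": str, "port": int}]};
    this shape is assumed, not taken from the module.
    """
    jobs: Dict[str, List[str]] = {}
    for hostname, mon in hosts.items():
        if not mon.get("enable", False):
            continue  # hosts with monitoring disabled contribute nothing
        for target in mon.get("scrapeTargets", []):
            jobs.setdefault(target["job"], []).append(f"{hostname}:{target['port']}")
    for job, targets in external.items():
        jobs.setdefault(job, []).extend(targets)
    return [
        {"job_name": job, "static_configs": [{"targets": sorted(targets)}]}
        for job, targets in sorted(jobs.items())
    ]
```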
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace static zone file with dynamically generated records:
- Add homelab.dns module with enable/cnames options
- Extract IPs from systemd.network configs (filters VPN interfaces)
- Use git commit timestamp as zone serial number
- Move external hosts to separate external-hosts.nix
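The record generation can be sketched in Python as below. The VPN
interface prefixes, the SOA timer values, and the input shape are
assumptions. A Unix commit timestamp fits the 32-bit SOA serial field
until 2106; in a flake, a value such as self.lastModified is one possible
source for it.

```python
from typing import Dict, List

# Assumed naming of VPN links that must not leak into the zone.
VPN_PREFIXES = ("wg", "tun", "tailscale")


def host_addresses(interfaces: Dict[str, List[str]]) -> List[str]:
    """IPv4 addresses from systemd.network-style configs ({iface: [CIDR]}),
    skipping VPN interfaces. AAAA records are out of scope for this sketch."""
    addrs = []
    for name, cidrs in sorted(interfaces.items()):
        if name.startswith(VPN_PREFIXES):
            continue
        addrs.extend(c.split("/")[0] for c in cidrs if ":" not in c)
    return addrs


def render_zone(hosts: Dict[str, Dict[str, List[str]]],
                cnames: Dict[str, str], serial: int) -> str:
    """Render records; serial is the git commit timestamp (Unix seconds)."""
    if not 0 < serial < 2 ** 32:
        raise ValueError("SOA serial must fit in an unsigned 32-bit integer")
    # SOA fields: MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM;
    # the timer values here are illustrative only.
    lines = [f"@ IN SOA ns1 hostmaster {serial} 3600 600 604800 300"]
    for host, interfaces in sorted(hosts.items()):
        lines.extend(f"{host} IN A {addr}" for addr in host_addresses(interfaces))
    for alias, target in sorted(cnames.items()):
        lines.append(f"{alias} IN CNAME {target}")
    return "\n".join(lines)
```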
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace custom backup-helper flake input with NixOS native
services.restic.backups module for ha1, monitoring01, and nixos-test1.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement several improvements that allow efficient testing of pipeline
changes without polluting the master branch:
1. Add --force flag to create-host script
- Skip hostname/IP uniqueness validation
- Overwrite existing host configurations
- Update entries in flake.nix and terraform/vms.tf (no duplicates)
- Useful for iterating on configurations during testing
2. Add branch support to bootstrap mechanism
- Bootstrap service reads NIXOS_FLAKE_BRANCH environment variable
- Defaults to master if not set
- Uses branch in git URL via ?ref= parameter
- Service loads environment from /etc/environment
3. Add cloud-init disk support for branch configuration
- VMs can specify flake_branch field in terraform/vms.tf
- Automatically generates cloud-init snippet setting NIXOS_FLAKE_BRANCH
- Uploads snippet to Proxmox via SSH
- Production VMs omit flake_branch and use master
4. Update documentation
- Document --force flag usage in create-host README
- Add branch testing examples in terraform README
- Update TODO.md with testing workflow
- Add .generated/ to gitignore
Testing workflow: create a feature branch, set flake_branch in the VM
definition, deploy with terraform, iterate with the --force flag, and
clean up before merging.
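The branch handling reduces to building a git flake reference with a
?ref= query and a #hostname attribute for nixos-rebuild. A Python sketch
of that logic follows; the repository URL in the usage note is a
placeholder, and the real service does this in shell.

```python
import os
from typing import List, Mapping

DEFAULT_BRANCH = "master"


def flake_ref(repo_url: str, hostname: str,
              env: Mapping[str, str] = os.environ) -> str:
    """Flake reference for nixos-rebuild, honouring NIXOS_FLAKE_BRANCH."""
    # Unset or empty NIXOS_FLAKE_BRANCH falls back to master.
    branch = env.get("NIXOS_FLAKE_BRANCH") or DEFAULT_BRANCH
    return f"git+{repo_url}?ref={branch}#{hostname}"


def rebuild_command(repo_url: str, hostname: str,
                    env: Mapping[str, str] = os.environ) -> List[str]:
    """Argument list for the bootstrap's nixos-rebuild call."""
    return ["nixos-rebuild", "boot", "--flake", flake_ref(repo_url, hostname, env)]
```

For example, with NIXOS_FLAKE_BRANCH=deploy-test-hosts and a placeholder
repo https://git.example/repo.git, testvm01 would build
git+https://git.example/repo.git?ref=deploy-test-hosts#testvm01.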
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add filesystem configuration matching Proxmox image builder output
to allow template2 to build with both `nixos-rebuild build` and
`nixos-rebuild build-image --image-variant proxmox`.
Filesystem specs taken from a running VM:
- ext4 filesystem with label "nixos"
- x-systemd.growfs option for automatic partition growth
- No swap partition
Using lib.mkDefault ensures these definitions work for normal builds
while allowing the Proxmox image builder to override when needed.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add systemd service that automatically bootstraps freshly deployed VMs
with their host-specific NixOS configuration from the flake repository.
Changes:
- hosts/template2/bootstrap.nix: New systemd oneshot service that:
- Runs after cloud-init completes (ensures hostname is set)
- Reads hostname from hostnamectl (set by cloud-init from Terraform)
- Checks network connectivity via HTTPS (curl)
- Runs nixos-rebuild boot with flake URL
- Reboots on success, fails gracefully with clear errors on failure
- hosts/template2/configuration.nix: Configure cloud-init datasource
- Changed from NoCloud to ConfigDrive (used by Proxmox)
- Allows cloud-init to receive config from Proxmox
- hosts/template2/default.nix: Import bootstrap.nix module
- terraform/vms.tf: Add cloud-init disk to VMs
- Configure disks.ide.ide2.cloudinit block
- Removed invalid cloudinit_cdrom_storage parameter
- Enables Proxmox to inject cloud-init configuration
- TODO.md: Mark Phase 3 as completed
This eliminates the manual nixos-rebuild step from the deployment workflow.
VMs now automatically pull and apply their configuration on first boot.
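The ordering of the oneshot service's steps is sketched below in Python
with the system calls injected, so the control flow can be exercised
without a VM. The "nixos" hostname check, the probe URL, and the
BootstrapError type are assumptions for illustration; the real service is
a shell script that exits non-zero on failure.

```python
import subprocess
from typing import Callable, List


class BootstrapError(RuntimeError):
    """Raised for the failure cases; the real unit just exits non-zero."""


def https_reachable(url: str = "https://cache.nixos.org") -> bool:
    """Connectivity probe using curl; the probed URL is an assumption."""
    cmd = ["curl", "-fsS", "--max-time", "10", "-o", "/dev/null", url]
    return subprocess.run(cmd).returncode == 0


def bootstrap(get_hostname: Callable[[], str],
              network_ok: Callable[[], bool],
              run: Callable[[List[str]], int],
              flake_base: str) -> None:
    """Hostname -> network check -> nixos-rebuild boot -> reboot."""
    hostname = get_hostname()
    # "nixos" is the NixOS default hostname; seeing it suggests cloud-init
    # has not applied the Terraform-provided name (assumed check).
    if not hostname or hostname == "nixos":
        raise BootstrapError("hostname not set by cloud-init")
    if not network_ok():
        raise BootstrapError("no HTTPS connectivity")
    cmd = ["nixos-rebuild", "boot", "--flake", f"{flake_base}#{hostname}"]
    if run(cmd) != 0:
        raise BootstrapError("nixos-rebuild boot failed; not rebooting")
    run(["systemctl", "reboot"])
```

Using nixos-rebuild boot rather than switch means the new configuration
only takes effect after the reboot, which is why the reboot is skipped
when the build fails.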
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add automated workflow for building and deploying NixOS VMs on Proxmox,
including template2 host configuration, an Ansible playbook for image
building/deployment, and OpenTofu configuration for VM provisioning
with cloud-init.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>