- ns2 migrated to OpenTofu
- testvm02, testvm03 added to managed hosts
- Remove vaulttest01 (no longer exists)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Configure Unbound to query both ns1 and ns2 for the home.2rjus.net
zone, in addition to local NSD. This provides redundancy during
bootstrap or if local NSD is temporarily unavailable.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Bootstrap times can be improved by configuring the base template
to use the local nix cache during initial builds.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove hosts/template/ (legacy template1) and give each legacy host
its own hardware-configuration.nix copy
- Recreate ns2 using create-host with template2 base
- Add secondary DNS services (NSD + Unbound resolver)
- Configure Vault policy for shared DNS secrets
- Fix create-host IP uniqueness validator to check CIDR notation
(prevents false positives from DNS resolver entries)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove secrets/ directory (sops-nix no longer in use, all hosts use Vault)
- Move TODO.md to docs/plans/completed/automated-host-deployment-pipeline.md
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All secrets are now managed by OpenBao (Vault). Remove the legacy
sops-nix infrastructure that is no longer in use.
Removed:
- sops-nix flake input
- system/sops.nix module
- .sops.yaml configuration file
- Age key generation from template prepare-host scripts
Updated:
- flake.nix - removed sops-nix references from all hosts
- flake.lock - removed sops-nix input
- scripts/create-host/ - removed sops references
- CLAUDE.md - removed SOPS documentation
Note: secrets/ directory should be manually removed by the user.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove the step-ca host and labmon flake input now that ACME has been
migrated to OpenBao PKI.
Removed:
- hosts/ca/ - step-ca host configuration
- services/ca/ - step-ca service module
- labmon flake input and module (no longer used)
Updated:
- flake.nix - removed ca host and labmon references
- flake.lock - removed labmon input
- rebuild-all.sh - removed ca from host list
- CLAUDE.md - updated documentation
Note: secrets/ca/ should be manually removed by the user.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add note that gh pr create is not supported
- Remove labmon from Prometheus job names list
- Remove labmon from flake inputs list
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Set up a simple nginx server with an ACME certificate from the new
OpenBao PKI infrastructure. This allows testing the ACME migration
before deploying to production hosts.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Switch all ACME certificate issuance from step-ca (ca.home.2rjus.net)
to OpenBao PKI (vault.home.2rjus.net:8200/v1/pki_int/acme/directory).
- Update default ACME server in system/acme.nix
- Update Caddy acme_ca in http-proxy and nix-cache services
- Remove labmon service from monitoring01 (step-ca monitoring)
- Remove labmon scrape target and certificate_rules alerts
- Remove alloy.nix (only used for labmon profiling)
- Add docs/plans/cert-monitoring.md for future cert monitoring needs
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable vault.enable and homelab.deploy.enable on vault01 so it can
receive NATS-based remote deployments. Vault fetches secrets from
itself using AppRole after auto-unseal.
Add systemd ordering to ensure vault-secret services wait for openbao
to be unsealed before attempting to fetch secrets.
Also adds vault01 AppRole entry to Terraform.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable vault.enable and homelab.deploy.enable for these hosts to
allow NATS-based remote deployments and expose metrics on port 9972.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add instructions for deploying to prod hosts using the CLI directly,
since the MCP server only handles test-tier deployments.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Document the new hostname and host metadata labels available on all
Prometheus scrape targets:
- hostname: short hostname for easy filtering
- role: host role (dns, build-host, vault)
- tier: deployment tier (test for test VMs)
- dns_role: primary/secondary for DNS servers
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a `hostname` label to all Prometheus scrape targets, making it easy
to query all metrics for a host without wildcarding the instance label.
Example queries:
- {hostname="ns1"} - all metrics from ns1
- node_cpu_seconds_total{hostname="monitoring01"} - specific metric
For external targets (like gunter), the hostname is extracted from the
target string.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extract homelab.host metadata (tier, priority, role, labels) from host
configurations and propagate them to Prometheus scrape targets. This
enables semantic alert filtering using labels instead of hardcoded
instance names.
Changes:
- lib/monitoring.nix: Extract host metadata, group targets by labels
- prometheus.nix: Use structured static_configs with labels
- rules.yml: Replace instance filters with role-based filters
Example labels in Prometheus:
- ns1/ns2: role=dns, dns_role=primary/secondary
- nix-cache01: role=build-host
- testvm*: tier=test
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Move nats-deploy-service.md to completed/ folder
- Update prometheus-scrape-target-labels.md with implementation status
- Add status table showing which steps are complete/partial/not started
- Update cross-references to point to new location
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds log_to_loki function that pushes structured log entries to Loki
at key bootstrap stages (starting, network_ok, vault_*, building,
success, failed). Enables querying bootstrap state via LogQL without
console access.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
TTY output was causing nixos-rebuild to fail. Keep the custom
greeting line to indicate bootstrap image, but use journal+console
for reliable logging.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a third play to build-and-deploy-template.yml that updates
terraform/variables.tf with the new template name after deploying
to Proxmox. Only updates if the template name has changed.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Display bootstrap banner and live progress on tty1 instead of login prompt
- Add custom getty greeting on other ttys indicating this is a bootstrap image
- Disable getty on tty1 during bootstrap so output is visible
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes false "Some deployments failed" warning in MCP server when
deployments are still in progress.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add three permanent test hosts (testvm01, testvm02, testvm03) with:
- Static IPs: 10.69.13.20-22
- Vault AppRole integration with homelab-deploy policy
- Remote deployment via NATS (homelab.deploy.enable)
- Test tier configuration
Also updates create-host template to include vault.enable and
homelab.deploy.enable by default.
VMs are now bootstrapped and running. Remove temporary flake_branch
and vault_wrapped_token settings so they use master going forward.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The homelab-deploy listener requires access to shared/homelab-deploy/*
secrets. Update hosts-generated.tf and the generator script to include
this policy automatically.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add vault.enable = true to testvm01, testvm02, testvm03
- Add homelab.deploy.enable = true for remote deployment via NATS
- Update create-host template to include these by default
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Three permanent test hosts for validating deployment and bootstrapping
workflow. Each host configured with:
- Static IP (10.69.13.20-22/24)
- Vault AppRole integration
- Bootstrap from deploy-test-hosts branch
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove host entries from terraform/vault/approle.tf on --remove
- Detect and warn about secrets in terraform/vault/secrets.tf
- Include vault kv delete commands in removal instructions
- Update check_entries_exist to return approle status
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>