Instead of creating a long-lived Vault token in Terraform (which gets
invalidated when Terraform recreates it), monitoring01 now uses its
existing AppRole credentials to fetch a fresh token for Prometheus.
Changes:
- Add prometheus-metrics policy to monitoring01's AppRole
- Remove vault_token.prometheus_metrics resource from Terraform
- Remove openbao-token KV secret from Terraform
- Add systemd service to fetch AppRole token on boot (sketched below)
- Add systemd timer to refresh token every 30 minutes
This ensures Prometheus always has a valid token without depending on
Terraform state or manual intervention.
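A minimal sketch of what the two units could look like as a NixOS module; the unit names, credential paths, vault address, and output location are illustrative assumptions, not the actual configuration:

```nix
{ pkgs, ... }:
{
  systemd.services.prometheus-vault-token = {
    description = "Fetch a Prometheus metrics token via AppRole";
    wantedBy = [ "multi-user.target" ];
    serviceConfig.Type = "oneshot";
    # Hypothetical vault address; real config points at the OpenBao server.
    environment.VAULT_ADDR = "https://vault.example.internal:8200";
    script = ''
      # Log in with the host's existing AppRole credentials and write the
      # resulting client token where Prometheus can read it (permissions
      # handling omitted in this sketch).
      ${pkgs.openbao}/bin/bao write -field=token auth/approle/login \
        role_id="$(cat /var/lib/vault-secrets/role-id)" \
        secret_id="$(cat /var/lib/vault-secrets/secret-id)" \
        > /run/prometheus/vault-token
    '';
  };

  systemd.timers.prometheus-vault-token = {
    wantedBy = [ "timers.target" ];
    timerConfig = {
      OnBootSec = "2min";
      OnUnitActiveSec = "30min";  # refresh every 30 minutes
    };
  };
}
```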
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add exporters and scrape targets for services lacking monitoring:
- PostgreSQL: postgres-exporter on pgdb1
- Authelia: native telemetry metrics on auth01
- Unbound: unbound-exporter with remote-control on ns1/ns2
- NATS: HTTP monitoring endpoint on nats1
- OpenBao: telemetry config and Prometheus scrape with token auth
- Systemd: systemd-exporter on all hosts for per-service metrics
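As an illustration, the NixOS side of a couple of these exporters and their scrape jobs might look roughly like this (host names, ports, and option values are assumptions, not the actual config):

```nix
{
  # On pgdb1: expose PostgreSQL metrics (exporter default port 9187).
  services.prometheus.exporters.postgres = {
    enable = true;
    runAsLocalSuperUser = true;
  };

  # On every host: per-service metrics via the systemd exporter (port 9558).
  services.prometheus.exporters.systemd.enable = true;

  # On monitoring01: scrape the new targets.
  services.prometheus.scrapeConfigs = [
    {
      job_name = "postgres";
      static_configs = [ { targets = [ "pgdb1:9187" ]; } ];
    }
    {
      job_name = "systemd";
      static_configs = [ { targets = [ "pgdb1:9558" "auth01:9558" "ns1:9558" ]; } ];
    }
  ];
}
```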
Add alert rules for postgres, auth (authelia + lldap), jellyfin,
vault (openbao), plus extend existing nats and unbound rules.
Add Terraform config for Prometheus metrics policy and token. The
token is created via vault_token resource and stored in KV, so no
manual token creation is needed.
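The Terraform side could look roughly like this (mount names, the KV path, and the token lifetime are illustrative):

```hcl
resource "vault_policy" "prometheus_metrics" {
  name   = "prometheus-metrics"
  policy = <<-EOT
    path "sys/metrics" {
      capabilities = ["read"]
    }
  EOT
}

resource "vault_token" "prometheus_metrics" {
  policies  = [vault_policy.prometheus_metrics.name]
  no_parent = true
  renewable = true
  period    = "768h"
}

# Stash the token in KV so monitoring01 can pull it with its AppRole.
resource "vault_kv_secret_v2" "openbao_token" {
  mount     = "secret"
  name      = "monitoring01/openbao-token"
  data_json = jsonencode({
    token = vault_token.prometheus_metrics.client_token
  })
}
```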
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove backup_helper_secret variable and switch shared/backup/password
to auto_generate. New password will be added alongside existing restic
repository key.
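One possible shape for the auto-generated secret, sketched with generic resources; the auto_generate switch itself belongs to the project's own secrets module, so this only illustrates the underlying idea:

```hcl
resource "random_password" "backup" {
  length  = 32
  special = false
}

resource "vault_kv_secret_v2" "backup_password" {
  mount     = "secret"
  name      = "shared/backup/password"
  data_json = jsonencode({
    password = random_password.backup.result
  })
}
```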
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace sops-nix secrets with OpenBao vault secrets across all hosts.
Hardcode root password hash, add extractKey option to vault-secrets
module, update Terraform with secrets/policies for all hosts, and
create AppRole provisioning playbook.
Hosts migrated: ha1, monitoring01, ns1, ns2, http-proxy, nix-cache01
Wave 1 hosts (nats1, jelly01, pgdb1) get AppRole policies only.
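For a single migrated host, the Terraform policy and AppRole binding might look roughly like this (KV paths and naming are assumptions):

```hcl
resource "vault_policy" "ns1" {
  name   = "ns1"
  policy = <<-EOT
    path "secret/data/ns1/*" {
      capabilities = ["read"]
    }
  EOT
}

resource "vault_approle_auth_backend_role" "ns1" {
  backend        = "approle"
  role_name      = "ns1"
  token_policies = [vault_policy.ns1.name]
}
```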
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fix error "500 can't upload to storage type 'zfspool'" by using "local"
storage pool for cloud-init disks instead of the VM's storage pool.
Cloud-init disks require storage that supports ISO/snippet content types,
which zfspool does not. The "local" storage pool (directory-based) supports
these content types.
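Illustrative fragment of the change (resource name, node, and user-data payload are placeholder values):

```hcl
resource "proxmox_cloud_init_disk" "ci" {
  name     = "testvm-cloudinit"
  pve_node = "pve1"

  # Directory-backed storage that accepts ISO/snippet content; the VM's
  # zfspool-backed pool cannot, which triggered the 500 error.
  storage = "local"

  user_data = <<-EOT
    #cloud-config
    hostname: testvm
  EOT
}
```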
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix an OpenTofu type error caused by the static IP and DHCP branches having
different object structures in the subnets array.
Error was: "The true and false result expressions must have consistent types"
Solution: Make network_config itself conditional rather than the subnets
array, so that both branches return the same type (a string from yamlencode()).
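A sketch of the resulting expression (the each.value attribute names are illustrative):

```hcl
# Each arm builds its own complete yamlencode() string, so the conditional's
# two results have the same type.
network_config = (
  each.value.ip == "dhcp"
  ? yamlencode({
      version = 1
      config = [{
        type    = "physical"
        name    = "eth0"
        subnets = [{ type = "dhcp" }]
      }]
    })
  : yamlencode({
      version = 1
      config = [{
        type    = "physical"
        name    = "eth0"
        subnets = [{
          type    = "static"
          address = each.value.ip
          gateway = each.value.gateway
        }]
      }]
    })
)
```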
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove mention of .generated/ directory and clarify that cloud-init.tf
manages all cloud-init disks, not just branch-specific ones.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace SSH upload approach with native proxmox_cloud_init_disk resource
for cleaner, more maintainable cloud-init management.
Changes:
- Use proxmox_cloud_init_disk for all VMs (not just branch-specific ones)
- Include SSH keys, network config, and metadata in cloud-init disk
- Conditionally include NIXOS_FLAKE_BRANCH for VMs with flake_branch set
- Replace ide2 cloudinit disk with a cdrom reference to the cloud-init disk (see the sketch below)
- Remove built-in cloud-init parameters (ciuser, sshkeys, etc.)
- Remove cicustom parameter (no longer needed)
- Remove proxmox_host variable (no SSH uploads required)
- Remove .gitignore entry for .generated/ directory
Benefits:
- No SSH access to Proxmox required
- All cloud-init config managed in Terraform
- Consistent approach for all VMs
- Cleaner state management
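A sketch of the new resource and the cdrom reference; variable names (e.g. var.ssh_public_key), node names, and the user-data payload are illustrative:

```hcl
resource "proxmox_cloud_init_disk" "vm" {
  for_each = local.vms

  name     = each.key
  pve_node = each.value.node
  storage  = "local"

  meta_data = yamlencode({
    instance_id      = sha1(each.key)
    "local-hostname" = each.key
  })

  user_data = <<-EOT
    #cloud-config
    ssh_authorized_keys:
      - ${var.ssh_public_key}
  EOT
}

resource "proxmox_vm_qemu" "vm" {
  for_each = local.vms

  name        = each.key
  target_node = each.value.node
  # ... other VM settings ...

  disks {
    ide {
      # Former ide2 "cloudinit" disk, now a plain CD-ROM pointing at the
      # generated cloud-init ISO.
      ide2 {
        cdrom {
          iso = proxmox_cloud_init_disk.vm[each.key].id
        }
      }
    }
  }
}
```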
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implement several improvements that enable efficient testing of pipeline
changes without polluting the master branch:
1. Add --force flag to create-host script
   - Skip hostname/IP uniqueness validation
   - Overwrite existing host configurations
   - Update entries in flake.nix and terraform/vms.tf (no duplicates)
   - Useful for iterating on configurations during testing
2. Add branch support to bootstrap mechanism (sketched below)
   - Bootstrap service reads NIXOS_FLAKE_BRANCH environment variable
   - Defaults to master if not set
   - Uses branch in git URL via ?ref= parameter
   - Service loads environment from /etc/environment
3. Add cloud-init disk support for branch configuration
   - VMs can specify flake_branch field in terraform/vms.tf
   - Automatically generates cloud-init snippet setting NIXOS_FLAKE_BRANCH
   - Uploads snippet to Proxmox via SSH
   - Production VMs omit flake_branch and use master
4. Update documentation
   - Document --force flag usage in create-host README
   - Add branch testing examples in terraform README
   - Update TODO.md with testing workflow
   - Add .generated/ to gitignore
Testing workflow: Create feature branch, set flake_branch in VM definition,
deploy with terraform, iterate with --force flag, clean up before merging.
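The branch handling inside the bootstrap unit could look roughly like this fragment (repository URL and unit name are illustrative):

```nix
systemd.services.nixos-bootstrap = {
  # Pick up NIXOS_FLAKE_BRANCH written by cloud-init; "-" makes a missing
  # file non-fatal so production VMs simply fall back to master.
  serviceConfig.EnvironmentFile = "-/etc/environment";
  script = ''
    branch="''${NIXOS_FLAKE_BRANCH:-master}"
    hostname="$(hostnamectl hostname)"
    nixos-rebuild boot \
      --flake "git+https://git.example.com/infra.git?ref=$branch#$hostname"
  '';
};
```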
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add systemd service that automatically bootstraps freshly deployed VMs
with their host-specific NixOS configuration from the flake repository.
Changes:
- hosts/template2/bootstrap.nix: New systemd oneshot service (sketched below) that:
  - Runs after cloud-init completes (ensures hostname is set)
  - Reads hostname from hostnamectl (set by cloud-init from Terraform)
  - Checks network connectivity via HTTPS (curl)
  - Runs nixos-rebuild boot with flake URL
  - Reboots on success, fails gracefully with clear errors on failure
- hosts/template2/configuration.nix: Configure cloud-init datasource
  - Changed from NoCloud to ConfigDrive (used by Proxmox)
  - Allows cloud-init to receive config from Proxmox
- hosts/template2/default.nix: Import bootstrap.nix module
- terraform/vms.tf: Add cloud-init disk to VMs
  - Configure disks.ide.ide2.cloudinit block
  - Removed invalid cloudinit_cdrom_storage parameter
  - Enables Proxmox to inject cloud-init configuration
- TODO.md: Mark Phase 3 as completed
This eliminates the manual nixos-rebuild step from the deployment workflow.
VMs now automatically pull and apply their configuration on first boot.
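A condensed sketch of the unit (flake URL, connectivity-check target, and ordering details are illustrative; the real module has fuller error handling):

```nix
{ pkgs, ... }:
{
  systemd.services.nixos-bootstrap = {
    description = "Apply the host-specific flake configuration on first boot";
    after = [ "cloud-final.service" "network-online.target" ];
    wants = [ "network-online.target" ];
    wantedBy = [ "multi-user.target" ];
    path = [ pkgs.curl pkgs.git pkgs.nixos-rebuild pkgs.systemd ];
    serviceConfig.Type = "oneshot";
    script = ''
      # Hostname was set by cloud-init from the Terraform VM name.
      hostname="$(hostnamectl hostname)"

      # Fail early with a clear error if the network is not up yet.
      curl --fail --silent --head https://git.example.com > /dev/null

      nixos-rebuild boot --flake "git+https://git.example.com/infra.git#$hostname"
      systemctl reboot
    '';
  };
}
```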
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implements Phase 1 of the OpenTofu deployment plan:
- Replace single-VM configuration with locals-based for_each pattern (sketched below)
- Support multiple VMs in single deployment
- Automatic DHCP vs static IP detection
- Configurable defaults with per-VM overrides
- Dynamic outputs for VM IPs and specifications
New files:
- outputs.tf: Dynamic outputs for deployed VMs
- vms.tf: VM definitions using locals.vms map
Updated files:
- variables.tf: Added default variables for VM configuration
- README.md: Comprehensive documentation and examples
Removed files:
- vm.tf: Replaced by new vms.tf (archived as vm.tf.old, then removed)
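A sketch of the locals/for_each pattern with per-VM overrides (attribute names and values are examples, not the real vms.tf):

```hcl
locals {
  vm_defaults = {
    cores  = 2
    memory = 2048
  }

  vms = {
    ns1       = { node = "pve1", ip = "192.168.1.53/24", gateway = "192.168.1.1" }
    dhcp-test = { node = "pve1", ip = "dhcp", gateway = "" }
  }
}

resource "proxmox_vm_qemu" "vm" {
  for_each = local.vms

  name        = each.key
  target_node = each.value.node
  cores       = try(each.value.cores, local.vm_defaults.cores)
  memory      = try(each.value.memory, local.vm_defaults.memory)

  # Automatic DHCP vs static IP detection.
  ipconfig0 = (
    each.value.ip == "dhcp"
    ? "ip=dhcp"
    : "ip=${each.value.ip},gw=${each.value.gateway}"
  )
}
```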
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add an automated workflow for building and deploying NixOS VMs on Proxmox, including the template2 host configuration, an Ansible playbook for image building and deployment, and OpenTofu configuration for VM provisioning with cloud-init.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>