Commit Graph

82 Commits

Author SHA1 Message Date
eee3dde04f restic: add randomized delay to backup timers
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Backups to the shared restic repository were all scheduled at exactly
midnight, causing lock conflicts. Adding RandomizedDelaySec spreads
them out over a 2-hour window to prevent simultaneous access.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 01:09:38 +01:00
bbb22e588e system: replace writeShellScript with writeShellApplication
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m3s
Run nix flake check / flake-check (push) Failing after 5m57s
Convert remaining writeShellScript usages to writeShellApplication for
shellcheck validation and strict bash options.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 00:17:24 +01:00
879e7aba60 templates: use writeShellApplication for prepare-host script
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 00:14:05 +01:00
59e1962d75 auth01: decommission host and remove authelia/lldap services
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m5s
Run nix flake check / flake-check (push) Failing after 18m1s
Remove auth01 host configuration and associated services in preparation
for new auth stack with different provisioning system.

Removed:
- hosts/auth01/ - host configuration
- services/authelia/ - authelia service module
- services/lldap/ - lldap service module
- secrets/auth01/ - sops secrets
- Reverse proxy entries for auth and lldap
- Monitoring alert rules for authelia and lldap
- SOPS configuration for auth01

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 23:35:45 +01:00
0700033c0a secrets: migrate all hosts from sops to OpenBao vault
Replace sops-nix secrets with OpenBao vault secrets across all hosts.
Hardcode root password hash, add extractKey option to vault-secrets
module, update Terraform with secrets/policies for all hosts, and
create AppRole provisioning playbook.

Hosts migrated: ha1, monitoring01, ns1, ns2, http-proxy, nix-cache01
Wave 1 hosts (nats1, jelly01, pgdb1) get AppRole policies only.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 18:43:09 +01:00
0ef63ad874 hosts: remove decommissioned media1, ns3, ns4, nixos-test1
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m47s
Run nix flake check / flake-check (pull_request) Successful in 3m20s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 01:36:57 +01:00
dd1b64de27 monitoring: auto-generate Prometheus scrape targets from host configs
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m49s
Run nix flake check / flake-check (push) Has been cancelled
Add homelab.monitoring NixOS options (enable, scrapeTargets) following
the same pattern as homelab.dns. Prometheus scrape configs are now
auto-generated from flake host configurations and external targets,
replacing hardcoded target lists.

Also cleans up alert rules: snake_case naming, fix zigbee2mqtt typo,
remove duplicate pushgateway alert, add for clauses to monitoring_rules,
remove hardcoded WireGuard public key, and add new alerts for
certificates, proxmox, caddy, smartctl temperature, filesystem
prediction, systemd state, file descriptors, and host reboots.

Fixes grafana scrape target port from 3100 to 3000.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:49:07 +01:00
cee1b264cd dns: auto-generate zone entries from host configurations
Replace static zone file with dynamically generated records:
- Add homelab.dns module with enable/cnames options
- Extract IPs from systemd.network configs (filters VPN interfaces)
- Use git commit timestamp as zone serial number
- Move external hosts to separate external-hosts.nix

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 21:43:44 +01:00
d25fc99e1d backup: migrate to native services.restic.backups
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Run nix flake check / flake-check (pull_request) Successful in 4m0s
Replace custom backup-helper flake input with NixOS native
services.restic.backups module for ha1, monitoring01, and nixos-test1.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 00:41:40 +01:00
7ae474fd3e pki: add new vault root ca to pki 2026-02-03 06:53:59 +01:00
01d4812280 vault: implement bootstrap integration
Some checks failed
Run nix flake check / flake-check (push) Successful in 2m31s
Run nix flake check / flake-check (pull_request) Failing after 14m16s
2026-02-03 01:10:36 +01:00
4afb37d730 create-host: enable resolved in configuration.nix.j2
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m15s
2026-02-01 20:37:36 +01:00
a2c798bc30 vault: add minimal vault config
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2026-02-01 20:27:02 +01:00
6d64e53586 hosts: add vault01 host
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m20s
2026-02-01 20:08:48 +01:00
7fe0aa0f54 test: add testvm01 for pipeline testing 2026-02-01 17:41:04 +01:00
83de9a3ffb pipeline: add testing improvements for branch-based workflows
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Implement dual improvements to enable efficient testing of pipeline changes
without polluting master branch:

1. Add --force flag to create-host script
   - Skip hostname/IP uniqueness validation
   - Overwrite existing host configurations
   - Update entries in flake.nix and terraform/vms.tf (no duplicates)
   - Useful for iterating on configurations during testing

2. Add branch support to bootstrap mechanism
   - Bootstrap service reads NIXOS_FLAKE_BRANCH environment variable
   - Defaults to master if not set
   - Uses branch in git URL via ?ref= parameter
   - Service loads environment from /etc/environment

3. Add cloud-init disk support for branch configuration
   - VMs can specify flake_branch field in terraform/vms.tf
   - Automatically generates cloud-init snippet setting NIXOS_FLAKE_BRANCH
   - Uploads snippet to Proxmox via SSH
   - Production VMs omit flake_branch and use master

4. Update documentation
   - Document --force flag usage in create-host README
   - Add branch testing examples in terraform README
   - Update TODO.md with testing workflow
   - Add .generated/ to gitignore

Testing workflow: Create feature branch, set flake_branch in VM definition,
deploy with terraform, iterate with --force flag, clean up before merging.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 16:34:28 +01:00
2aeed8f231 template2: add filesystem definitions to support normal builds
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m17s
Run nix flake check / flake-check (push) Failing after 16m59s
Add filesystem configuration matching Proxmox image builder output
to allow template2 to build with both `nixos-rebuild build` and
`nixos-rebuild build-image --image-variant proxmox`.

Filesystem specs discovered from running VM:
- ext4 filesystem with label "nixos"
- x-systemd.growfs option for automatic partition growth
- No swap partition

Using lib.mkDefault ensures these definitions work for normal builds
while allowing the Proxmox image builder to override when needed.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 11:17:48 +01:00
6f7aee3444 bootstrap: implement automated VM bootstrap mechanism for Phase 3
Some checks failed
Run nix flake check / flake-check (pull_request) Failing after 1m20s
Run nix flake check / flake-check (push) Failing after 1m54s
Add systemd service that automatically bootstraps freshly deployed VMs
with their host-specific NixOS configuration from the flake repository.

Changes:
- hosts/template2/bootstrap.nix: New systemd oneshot service that:
  - Runs after cloud-init completes (ensures hostname is set)
  - Reads hostname from hostnamectl (set by cloud-init from Terraform)
  - Checks network connectivity via HTTPS (curl)
  - Runs nixos-rebuild boot with flake URL
  - Reboots on success, fails gracefully with clear errors on failure

- hosts/template2/configuration.nix: Configure cloud-init datasource
  - Changed from NoCloud to ConfigDrive (used by Proxmox)
  - Allows cloud-init to receive config from Proxmox

- hosts/template2/default.nix: Import bootstrap.nix module

- terraform/vms.tf: Add cloud-init disk to VMs
  - Configure disks.ide.ide2.cloudinit block
  - Removed invalid cloudinit_cdrom_storage parameter
  - Enables Proxmox to inject cloud-init configuration

- TODO.md: Mark Phase 3 as completed

This eliminates the manual nixos-rebuild step from the deployment workflow.
VMs now automatically pull and apply their configuration on first boot.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 10:38:35 +01:00
3a464bc323 proxmox: add VM automation with OpenTofu and Ansible
Add automated workflow for building and deploying NixOS VMs on Proxmox including template2 host configuration, Ansible playbook for image building/deployment, and OpenTofu configuration for VM provisioning with cloud-init.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-31 21:54:08 +01:00
04f89fbda2 media1: renamed vaapi driver
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m1s
Periodic flake update / flake-update (push) Successful in 1m10s
2025-12-06 15:24:14 +01:00
a0e94430b4 nix-cache01: add actions runner
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-08-21 20:56:04 +02:00
ccd9bbf4da Remove incus hosts
Some checks failed
Run nix flake check / flake-check (push) Failing after 14m57s
Periodic flake update / flake-update (push) Successful in 3m35s
2025-07-07 21:30:04 +02:00
b9102b5a44 Add zram for nix-cache
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m22s
Periodic flake update / flake-update (push) Successful in 2m4s
2025-05-27 21:28:09 +02:00
ebcdefd0ca Add alloy
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-05-24 12:40:39 +02:00
c32e288273 Add pyroscope to labmon cert monitoring
Some checks failed
Run nix flake check / flake-check (push) Failing after 10m30s
2025-05-24 12:05:14 +02:00
2a46da3761 Add labmon to scrape config
Some checks failed
Run nix flake check / flake-check (push) Failing after 14m32s
2025-05-24 03:37:52 +02:00
6fda081dc8 Add labmon to monitoring01
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-05-24 03:27:59 +02:00
38c2fbca2c Add useNetworkd to wireguard
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m43s
Periodic flake update / flake-update (push) Successful in 2m7s
2025-05-23 01:35:31 +02:00
e609fed855 Add zram to jelly01
Some checks failed
Run nix flake check / flake-check (push) Failing after 6m10s
Periodic flake update / flake-update (push) Successful in 4m13s
2025-05-19 20:05:12 +02:00
bd58d07001 Monitor wireguard
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m32s
2025-05-18 00:59:55 +02:00
6243ac3754 Fix wg ip
Some checks failed
Run nix flake check / flake-check (push) Failing after 14m15s
Periodic flake update / flake-update (push) Successful in 4m6s
2025-05-15 21:44:05 +02:00
c1cd25e865 Set wg mtu
Some checks failed
Run nix flake check / flake-check (push) Failing after 9m24s
2025-05-15 21:29:56 +02:00
3c52b81d99 Add name and endpoint to wg config
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m47s
2025-05-15 21:20:09 +02:00
6b85e87506 Add TODO not about wireguard networkd
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-05-15 21:16:08 +02:00
f15c318558 Add wireguard to http proxy
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-05-15 21:11:46 +02:00
6759653491 Add authelia to auth01
Some checks failed
Run nix flake check / flake-check (push) Failing after 29s
Periodic flake update / flake-update (push) Successful in 4m37s
2025-04-01 23:52:24 +02:00
cba1821f3b Add lldap to auth01 host 2025-04-01 22:23:59 +02:00
78c36c5384 Correctly load ptp_kvm
Some checks failed
Run nix flake check / flake-check (push) Failing after 12m5s
Periodic flake update / flake-update (push) Successful in 3m51s
2025-03-16 08:33:29 +01:00
e279e7d940 Add ptp_kvm for vms
Some checks failed
Run nix flake check / flake-check (push) Failing after 9m10s
2025-03-16 08:22:07 +01:00
41d5df4d1a Further change kernel config for media1
Some checks failed
Run nix flake check / flake-check (push) Failing after 0s
Periodic flake update / flake-update (push) Successful in 1m19s
2025-03-08 17:22:07 +01:00
529d5ae0d9 Change media1 kernel stuff
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-03-08 17:19:59 +01:00
3f05a965e2 Enable crash dump for media1
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-02-13 19:22:18 +01:00
07c422498e Configure media1 host
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-02-13 16:00:52 +01:00
5b64f40412 Add media1 host
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-02-12 22:44:30 +01:00
c43e2aa063 Add nats server
Some checks failed
Run nix flake check / flake-check (push) Failing after 17m6s
Periodic flake update / flake-update (push) Successful in 1m28s
2025-02-08 00:26:53 +01:00
4af1bded61 Add backups for monitoring01
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m30s
2025-01-27 23:03:45 +01:00
a9eeb8ada6 Add postgres host
Some checks failed
Run nix flake check / flake-check (push) Failing after 6m3s
2025-01-25 02:28:44 +01:00
83b2a4a2e8 Add initial media1 host 2025-01-24 23:31:52 +01:00
e70e892ab2 Add build-flakes script for nix-cache
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m20s
2025-01-24 01:12:18 +01:00
3960ec40b9 Move nix-cache01 /nix
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m42s
Periodic flake update / flake-update (push) Successful in 1m34s
2025-01-23 23:42:16 +01:00