Commit Graph

60 Commits

SHA1 Message Date
6e08ba9720 ansible: restructure with dynamic inventory from flake
- Move playbooks/ to ansible/playbooks/
- Add dynamic inventory script that extracts hosts from flake
  - Groups by tier (tier_test, tier_prod) and role (role_dns, etc.)
  - Reads homelab.host.* options for metadata
- Add static inventory for non-flake hosts (Proxmox)
- Add ansible.cfg with inventory path and SSH optimizations
- Add group_vars/all.yml for common variables
- Add restart-service.yml playbook for restarting systemd services
- Update provision-approle.yml with single-host safeguard
- Add ANSIBLE_CONFIG to devshell for automatic inventory discovery
- Add ansible = "false" label to template2 to exclude from inventory
- Update CLAUDE.md to reference ansible/README.md for details

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 21:41:29 +01:00
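The grouping described in 6e08ba9720 can be illustrated with the Python sketch below. It turns per-host metadata into Ansible's dynamic-inventory JSON shape (`{group: {"hosts": [...]}, "_meta": {"hostvars": {...}}}`). The `HOSTS` dictionary and its layout are assumptions standing in for whatever the real script extracts from the flake's homelab.host.* options. Rewriting dashes to underscores in group names is also an added choice, not something the commit states.

```python
import json

# Hypothetical input: per-host metadata as it might look after evaluating
# homelab.host.* for each nixosConfiguration (the shape is an assumption).
HOSTS = {
    "ns1": {"tier": "prod", "role": "dns", "labels": {}},
    "nix-cache01": {"tier": "prod", "role": "build-host", "labels": {}},
    "testvm01": {"tier": "test", "role": None, "labels": {}},
    "template2": {"tier": "test", "role": None, "labels": {"ansible": "false"}},
}


def group_name(prefix: str, value: str) -> str:
    # Ansible warns about dashes in group names, so normalise them.
    return f"{prefix}_{value}".replace("-", "_")


def build_inventory(hosts: dict) -> dict:
    """Return Ansible dynamic-inventory JSON grouped by tier and role."""
    inventory: dict = {"_meta": {"hostvars": {}}}
    for name, meta in sorted(hosts.items()):
        # Hosts labelled ansible = "false" are left out entirely.
        if meta.get("labels", {}).get("ansible") == "false":
            continue
        groups = [group_name("tier", meta["tier"])]
        if meta.get("role"):
            groups.append(group_name("role", meta["role"]))
        for group in groups:
            inventory.setdefault(group, {"hosts": []})["hosts"].append(name)
        inventory["_meta"]["hostvars"][name] = {
            "tier": meta["tier"],
            "role": meta.get("role"),
        }
    return inventory


if __name__ == "__main__":
    # An inventory script is called with --list and prints this JSON.
    print(json.dumps(build_inventory(HOSTS), indent=2))
```

Because `_meta.hostvars` is included, Ansible does not need to call the script once per host with `--host`.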
60c04a2052 nixos-exporter: enable NATS cache sharing
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m17s
Run nix flake check / flake-check (push) Failing after 5m16s
When one host fetches the latest flake revision, it publishes to NATS
and all other hosts receive the update immediately. This reduces
redundant nix flake metadata calls across the fleet.

- Add nkeys to devshell for key generation
- Add nixos-exporter user to NATS HOMELAB account
- Add Vault secret for NKey storage
- Configure all hosts to use NATS for revision sharing
- Update nixos-exporter input to version with NATS support

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 23:57:28 +01:00
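The idea in 60c04a2052 is that one host's fetch of the latest flake revision is broadcast so the others can skip their own fetch. The sketch below is a toy in-process model of that idea. `Bus` is a stand-in for a NATS subject and is not the NATS client API. The cache has no expiry, which a real implementation would presumably need. All class and method names are hypothetical.

```python
from typing import Callable, List, Optional


class Bus:
    """In-process stand-in for a NATS subject; NOT the real NATS client."""

    def __init__(self) -> None:
        self.subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, revision: str) -> None:
        # Deliver the message to every subscriber, including the publisher.
        for callback in self.subscribers:
            callback(revision)


class RevisionCache:
    """Per-host cache of the latest flake revision, shared via the bus."""

    def __init__(self, bus: Bus, fetch: Callable[[], str]) -> None:
        self.bus = bus
        self.fetch = fetch  # stands in for the expensive flake metadata call
        self.revision: Optional[str] = None
        bus.subscribe(self._on_message)

    def _on_message(self, revision: str) -> None:
        # A revision published by any host fills this host's cache as well.
        self.revision = revision

    def latest(self) -> str:
        if self.revision is None:
            self.revision = self.fetch()
            self.bus.publish(self.revision)
        return self.revision


if __name__ == "__main__":
    calls: List[int] = []

    def fetch() -> str:
        calls.append(1)
        return "abc123"  # placeholder revision

    bus = Bus()
    fleet = [RevisionCache(bus, fetch) for _ in range(3)]
    print([host.latest() for host in fleet], "fetches:", len(calls))
```

With three hosts on the bus, only the first call to `latest()` triggers a fetch. The other two hosts are served from the published value.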
0b977808ca hosts: add monitoring02 configuration
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
New test-tier host for monitoring stack expansion with:
- Static IP 10.69.13.24
- 4 CPU cores, 4GB RAM, 20GB disk
- Vault integration and NATS-based deployment enabled

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 19:19:38 +01:00
54b6e37420 flake: add kanidm to devshell
Add kanidm_1_8 CLI for administering the Kanidm server.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 15:12:19 +01:00
ca0e3fd629 kanidm01: add kanidm authentication server
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
- New test-tier VM at 10.69.13.23 with role=auth
- Kanidm 1.8 server with HTTPS (443) and LDAPS (636)
- ACME certificate from internal CA (auth.home.2rjus.net)
- Provisioned groups: admins, users, ssh-users
- Provisioned user: torjus
- Daily backups at 22:00 (7 versions)
- Prometheus monitoring scrape target

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 00:13:59 +01:00
94feae82a0 ns1: recreate with OpenTofu workflow
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
Old VM had incorrect hardware-configuration.nix with hardcoded UUIDs
that didn't match actual disk layout, causing boot failure (emergency mode).

Recreated using template2-based configuration for OpenTofu provisioning.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 23:18:08 +01:00
8ec2a083bd pgdb1: decommission postgresql host
Remove pgdb1 host configuration and postgres service module.
The only consumer (Open WebUI on gunter) has migrated to local PostgreSQL.

Removed:
- hosts/pgdb1/ - host configuration
- services/postgres/ - service module (only used by pgdb1)
- postgres_rules from monitoring rules
- rebuild-all.sh (obsolete script)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 22:54:50 +01:00
536daee4c7 ns2: migrate to OpenTofu management
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
- Remove hosts/template/ (legacy template1) and give each legacy host
  its own hardware-configuration.nix copy
- Recreate ns2 using create-host with template2 base
- Add secondary DNS services (NSD + Unbound resolver)
- Configure Vault policy for shared DNS secrets
- Fix create-host IP uniqueness validator to check CIDR notation
  (prevents false positives from DNS resolver entries)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 19:28:35 +01:00
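The validator fix in 536daee4c7 treats an address as "in use" only when it appears in CIDR form. A bare IP in a resolver list no longer counts. The sketch below shows one way to express that with a regular expression. `ip_in_use` is a hypothetical helper, not the create-host tool's actual code.

```python
import re
from typing import List


def ip_in_use(ip: str, config_texts: List[str]) -> bool:
    """True if `ip` is assigned as an address (CIDR form) in any config."""
    # Match "10.69.13.5/24" but not a bare "10.69.13.5" in a nameserver list,
    # and not a longer address such as "10.69.13.50/24".
    pattern = re.compile(r"(?<![\d.])" + re.escape(ip) + r"/\d{1,2}\b")
    return any(pattern.search(text) for text in config_texts)


if __name__ == "__main__":
    configs = [
        'networking.nameservers = [ "10.69.13.5" "10.69.13.6" ];',
        'address = [ "10.69.13.50/24" ];',
    ]
    print(ip_in_use("10.69.13.5", configs), ip_in_use("10.69.13.50", configs))
```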
aedccbd9a0 flake: remove sops-nix (no longer used)
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
All secrets are now managed by OpenBao (Vault). Remove the legacy
sops-nix infrastructure that is no longer in use.

Removed:
- sops-nix flake input
- system/sops.nix module
- .sops.yaml configuration file
- Age key generation from template prepare-host scripts

Updated:
- flake.nix - removed sops-nix references from all hosts
- flake.lock - removed sops-nix input
- scripts/create-host/ - removed sops references
- CLAUDE.md - removed SOPS documentation

Note: the secrets/ directory should be removed manually by the user.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 18:46:24 +01:00
bdc6057689 hosts: decommission ca host and remove labmon
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
Remove the step-ca host and labmon flake input now that ACME has been
migrated to OpenBao PKI.

Removed:
- hosts/ca/ - step-ca host configuration
- services/ca/ - step-ca service module
- labmon flake input and module (no longer used)

Updated:
- flake.nix - removed ca host and labmon references
- flake.lock - removed labmon input
- rebuild-all.sh - removed ca from host list
- CLAUDE.md - updated documentation

Note: secrets/ca/ should be manually removed by the user.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 18:41:49 +01:00
7bc465b414 hosts: add testvm01, testvm02, testvm03 test hosts
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
Three permanent test hosts for validating the deployment and bootstrapping
workflow. Each host is configured with:
- Static IP (10.69.13.20-22/24)
- Vault AppRole integration
- Bootstrap from deploy-test-hosts branch

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 13:34:16 +01:00
8d7bc50108 hosts: remove testvm01
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
Test host no longer needed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 12:58:24 +01:00
03e70ac094 hosts: remove vaulttest01
Test host no longer needed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 12:55:38 +01:00
13c3897e86 flake: update homelab-deploy, add to devShell
Update homelab-deploy to include bugfix. Add CLI to devShell for
easier testing and deployment operations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 06:54:42 +01:00
ad8570f8db homelab-deploy: add NATS-based deployment system
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m45s
Add homelab-deploy flake input and NixOS module for message-based
deployments across the fleet. Configure DEPLOY account in NATS with
tiered access control (listener, test-deployer, admin-deployer).
Enable listener on vaulttest01 as initial test host.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 05:22:06 +01:00
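Commit ad8570f8db names three DEPLOY-account roles but does not spell out their rights. The sketch below is one reading of that "tiered access control", expressed as a policy check. It assumes, and the commit does not confirm, that test-deployer may target only test-tier hosts, admin-deployer may target any tier, and listener only receives requests. Real NATS permissions are granted per publish/subscribe subject, so this models only the intent.

```python
# Hypothetical mapping from the DEPLOY-account roles named in the commit to
# the host tiers each one may target (an assumed policy, not the real config).
ALLOWED_TIERS = {
    "listener": frozenset(),  # assumed: receives deploy requests, sends none
    "test-deployer": frozenset({"test"}),
    "admin-deployer": frozenset({"test", "prod"}),
}


def may_deploy(role: str, target_tier: str) -> bool:
    """Return True if `role` may send a deployment to a host in `target_tier`."""
    return target_tier in ALLOWED_TIERS.get(role, frozenset())


if __name__ == "__main__":
    for role in ALLOWED_TIERS:
        print(role, {tier: may_deploy(role, tier) for tier in ("test", "prod")})
```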
12bf0683f5 modules: add homelab.host for host metadata
Add a shared `homelab.host` module that provides host metadata for
multiple consumers:
- tier: deployment tier (test/prod) for future homelab-deploy service
- priority: alerting priority (high/low) for Prometheus label filtering
- role: primary role of the host (dns, database, monitoring, etc.)
- labels: free-form labels for additional metadata

Host configurations updated with appropriate values:
- ns1, ns2: role=dns with dns_role labels
- nix-cache01: priority=low, role=build-host
- vault01: role=vault
- jump: role=bastion
- template, template2, testvm01, vaulttest01: tier=test, priority=low

The module is now imported via commonModules in flake.nix, making it
available to all hosts including minimal configurations like template2.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 02:49:58 +01:00
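The homelab.host module from 12bf0683f5 is a NixOS module. The Python sketch below mirrors only its shape (tier, priority, role, labels), with value checks and the labels a monitoring consumer might attach. The defaults `tier="prod"` and `priority="high"` are assumptions, inferred from test hosts setting tier=test and priority=low explicitly. `prometheus_labels` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

TIERS = {"test", "prod"}
PRIORITIES = {"high", "low"}


@dataclass
class HostMeta:
    tier: str = "prod"  # assumed default
    priority: str = "high"  # assumed default
    role: Optional[str] = None
    labels: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject values outside the enums, as a NixOS enum option type would.
        if self.tier not in TIERS:
            raise ValueError(f"unknown tier: {self.tier!r}")
        if self.priority not in PRIORITIES:
            raise ValueError(f"unknown priority: {self.priority!r}")

    def prometheus_labels(self) -> Dict[str, str]:
        """Target labels that alert rules could filter on (e.g. priority)."""
        out = {"tier": self.tier, "priority": self.priority}
        if self.role:
            out["role"] = self.role
        return out


if __name__ == "__main__":
    print(HostMeta(role="dns").prometheus_labels())
    print(HostMeta(tier="test", priority="low").prometheus_labels())
```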
2034004280 flake: update nixos-exporter and set configurationRevision
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m33s
- Update nixos-exporter to 0.2.3
- Set system.configurationRevision for all hosts so the exporter
  can report the flake's git revision

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 01:06:47 +01:00
97ff774d3f monitoring: add nixos-exporter to all hosts
All checks were successful
Run nix flake check / flake-check (push) Successful in 3m16s
Run nix flake check / flake-check (pull_request) Successful in 3m14s
Add the nixos-exporter Prometheus exporter to track NixOS generation metrics
and flake revision status across all hosts.

Changes:
- Add nixos-exporter flake input
- Add commonModules list in flake.nix for modules shared by all hosts
- Enable nixos-exporter in system/monitoring/metrics.nix
- Configure Prometheus to scrape nixos-exporter on all hosts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 23:55:29 +01:00
59e1962d75 auth01: decommission host and remove authelia/lldap services
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m5s
Run nix flake check / flake-check (push) Failing after 18m1s
Remove auth01 host configuration and associated services in preparation
for new auth stack with different provisioning system.

Removed:
- hosts/auth01/ - host configuration
- services/authelia/ - authelia service module
- services/lldap/ - lldap service module
- secrets/auth01/ - sops secrets
- Reverse proxy entries for auth and lldap
- Monitoring alert rules for authelia and lldap
- SOPS configuration for auth01

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 23:35:45 +01:00
0ef63ad874 hosts: remove decommissioned media1, ns3, ns4, nixos-test1
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m47s
Run nix flake check / flake-check (pull_request) Successful in 3m20s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 01:36:57 +01:00
d25fc99e1d backup: migrate to native services.restic.backups
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Run nix flake check / flake-check (pull_request) Successful in 4m0s
Replace custom backup-helper flake input with NixOS native
services.restic.backups module for ha1, monitoring01, and nixos-test1.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 00:41:40 +01:00
01d4812280 vault: implement bootstrap integration
Some checks failed
Run nix flake check / flake-check (push) Successful in 2m31s
Run nix flake check / flake-check (pull_request) Failing after 14m16s
2026-02-03 01:10:36 +01:00
4133eafc4e flake: add openbao to devshell
Some checks failed
Run nix flake check / flake-check (push) Failing after 18m52s
2026-02-01 22:16:52 +01:00
6d64e53586 hosts: add vault01 host
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m20s
2026-02-01 20:08:48 +01:00
9908286062 scripts: fix create-host flake.nix insertion point
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m12s
Run nix flake check / flake-check (push) Failing after 8m24s
Fix bug where new hosts were added outside of nixosConfigurations block
instead of inside it.

Issues fixed:
1. Pattern was looking for "packages =" but actual text is "packages = forAllSystems"
2. Replacement was putting new entry AFTER closing brace instead of BEFORE
3. testvm01 was at top-level flake output instead of in nixosConfigurations

Changes:
- Update pattern to match "packages = forAllSystems"
- Put new entry BEFORE the closing brace of nixosConfigurations
- Move testvm01 to correct location inside nixosConfigurations block

Result: nix flake show now correctly shows testvm01 as a NixOS configuration

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 17:41:04 +01:00
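The fix in 9908286062 comes down to placing a new entry inside the nixosConfigurations attribute set, before its closing brace. The minimal Python sketch below shows that idea using brace matching, a swapped-in technique rather than the "packages = forAllSystems" pattern the real tool relies on. `insert_host` is a hypothetical name. The naive counter ignores braces inside strings or comments, which a robust tool would have to handle.

```python
def insert_host(flake_text: str, entry: str) -> str:
    """Insert `entry` just before the closing brace of nixosConfigurations."""
    marker = "nixosConfigurations = {"
    start = flake_text.index(marker) + len(marker)
    depth = 1
    for i in range(start, len(flake_text)):
        ch = flake_text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # i is the block's own closing brace: insert BEFORE it.
                # If the brace sits on its own line, insert at that line's start
                # so the new entry keeps its indentation.
                line_start = flake_text.rfind("\n", 0, i) + 1
                pos = line_start if flake_text[line_start:i].strip() == "" else i
                return flake_text[:pos] + entry + flake_text[pos:]
    raise ValueError("unbalanced braces after nixosConfigurations")


if __name__ == "__main__":
    flake = (
        "outputs = { self }: {\n"
        "  nixosConfigurations = {\n"
        "    ns1 = mkHost { };\n"
        "  };\n"
        "  packages = forAllSystems (system: { });\n"
        "}\n"
    )
    print(insert_host(flake, "    testvm01 = mkHost { };\n"))
```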
7fe0aa0f54 test: add testvm01 for pipeline testing 2026-02-01 17:41:04 +01:00
408554b477 scripts: add create-host tool for automated host configuration generation
Some checks failed
Run nix flake check / flake-check (push) Failing after 1m50s
Run nix flake check / flake-check (pull_request) Failing after 1m49s
Implements Phase 2 of the automated deployment pipeline.

This commit adds a Python CLI tool that automates the creation of NixOS host
configurations, eliminating manual boilerplate and reducing errors.

Features:
- Python CLI using typer framework with rich terminal UI
- Comprehensive validation (hostname format/uniqueness, IP subnet/uniqueness)
- Jinja2 templates for NixOS configurations
- Automatic updates to flake.nix and terraform/vms.tf
- Support for both static IP and DHCP configurations
- Dry-run mode for safe previews
- Packaged as Nix derivation and added to devShell

Usage:
  create-host --hostname myhost --ip 10.69.13.50/24

The tool generates:
- hosts/<hostname>/default.nix
- hosts/<hostname>/configuration.nix
- Updates flake.nix with new nixosConfigurations entry
- Updates terraform/vms.tf with new VM definition

All generated configurations include full system imports (monitoring, SOPS,
autoupgrade, etc.) and are validated with nix flake check and tofu validate.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 02:27:57 +01:00
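The validation listed in 408554b477 covers hostname format and uniqueness and IP subnet and uniqueness. It can be sketched with the standard `ipaddress` module. The sketch assumes, from the addresses in this log, that hosts live in 10.69.13.0/24. The hostname rule used is a single lowercase RFC 1123 label. `validate` is a hypothetical function, not the tool's typer command. Passing `None` for the address stands in for a DHCP host.

```python
import ipaddress
import re
from typing import List, Optional, Set

# Assumption: hosts live in 10.69.13.0/24 (inferred from addresses in this log).
SUBNET = ipaddress.ip_network("10.69.13.0/24")
HOSTNAME_RE = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")


def validate(hostname: str, ip_cidr: Optional[str],
             existing_hosts: Set[str], used_ips: Set[str]) -> List[str]:
    """Return a list of validation errors (empty if the request is valid)."""
    errors = []
    if not HOSTNAME_RE.match(hostname):
        errors.append(f"invalid hostname: {hostname}")
    if hostname in existing_hosts:
        errors.append(f"hostname already exists: {hostname}")
    if ip_cidr is None:  # DHCP: no address checks
        return errors
    try:
        iface = ipaddress.ip_interface(ip_cidr)
    except ValueError:
        errors.append(f"invalid IP/CIDR: {ip_cidr}")
        return errors
    if iface.network != SUBNET:
        errors.append(f"{ip_cidr} is not in {SUBNET}")
    elif iface.ip in (SUBNET.network_address, SUBNET.broadcast_address):
        errors.append(f"{iface.ip} is a reserved address")
    if str(iface.ip) in used_ips:
        errors.append(f"IP already in use: {iface.ip}")
    return errors


if __name__ == "__main__":
    print(validate("myhost", "10.69.13.50/24", {"ns1", "ns2"}, {"10.69.13.5"}))
```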
3a464bc323 proxmox: add VM automation with OpenTofu and Ansible
Add an automated workflow for building and deploying NixOS VMs on Proxmox,
including the template2 host configuration, an Ansible playbook for image
building/deployment, and OpenTofu configuration for VM provisioning with
cloud-init.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-31 21:54:08 +01:00
7f72a72043 flake: add opentofu to devshell
Some checks failed
Run nix flake check / flake-check (push) Failing after 17m5s
2026-01-31 16:12:49 +01:00
f2963a150b flake: update stable to 25.11
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m44s
2025-12-06 10:45:14 +01:00
ccd9bbf4da Remove incus hosts
Some checks failed
Run nix flake check / flake-check (push) Failing after 14m57s
Periodic flake update / flake-update (push) Successful in 3m35s
2025-07-07 21:30:04 +02:00
6fda081dc8 Add labmon to monitoring01
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-05-24 03:27:59 +02:00
5e9aff0590 Update stable to 25.05 2025-05-23 00:54:13 +02:00
cba1821f3b Add lldap to auth01 host 2025-04-01 22:23:59 +02:00
abb4cf58ea Add alerttonotify to monitoring host
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-02-11 22:25:54 +01:00
c43e2aa063 Add nats server
Some checks failed
Run nix flake check / flake-check (push) Failing after 17m6s
Periodic flake update / flake-update (push) Successful in 1m28s
2025-02-08 00:26:53 +01:00
002f934c70 Add ansible and playbook to trigger upgrade
Some checks failed
Run nix flake check / flake-check (push) Failing after 27m26s
Periodic flake update / flake-update (push) Successful in 1m24s
2025-02-07 00:28:05 +01:00
4af1bded61 Add backups for monitoring01
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m30s
2025-01-27 23:03:45 +01:00
a9eeb8ada6 Add postgres host
Some checks failed
Run nix flake check / flake-check (push) Failing after 6m3s
2025-01-25 02:28:44 +01:00
83b2a4a2e8 Add initial media1 host 2025-01-24 23:31:52 +01:00
1eb100d4ba Add nix-cache01 2025-01-23 23:18:14 +01:00
79b6598d0d Add jellyfin
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m36s
Periodic flake update / flake-update (push) Successful in 1m29s
2024-12-22 04:33:00 +01:00
fcfafa03fa Switch nixpkgs to 24.11 2024-12-01 01:52:27 +01:00
3c3eaaa042 Add monitoring host 2024-12-01 01:51:34 +01:00
d16a35acb4 Remove unused flake input for sops
All checks were successful
Run nix flake check / flake-check (push) Successful in 3m25s
2024-11-30 14:28:26 +01:00
d7a6e09ce3 Add ca host 2024-10-21 11:01:57 +02:00
a19161ca69 Make backup-helper follow unstable
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m43s
2024-10-21 10:58:27 +02:00
3e35c1ac0c Make sops-nix use same nixpkgs/stable 2024-10-21 10:57:14 +02:00
504be31412 Add http-proxy host
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m49s
2024-10-20 22:09:23 +02:00
50bd8505ec Add incus servers 2024-06-27 21:10:20 +02:00