nixos-servers

Author	SHA1	Message	Date
Torjus Håkestad	7933127d77	system: enable homelab-deploy listener for all vault hosts Add system/homelab-deploy.nix module that automatically enables the listener on all hosts with vault.enable=true. Uses homelab.host.tier and homelab.host.role for NATS subject subscriptions. - Add homelab-deploy access to all host AppRole policies - Remove manual listener config from vaulttest01 (now handled by system module) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 06:54:42 +01:00
Torjus Håkestad	13c3897e86	flake: update homelab-deploy, add to devShell Update homelab-deploy to include bugfix. Add CLI to devShell for easier testing and deployment operations. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 06:54:42 +01:00
Torjus Håkestad	0643f23281	vaulttest01: add vault secret dependency to listener Ensure homelab-deploy-listener waits for the NKey secret to be fetched from Vault before starting. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 05:29:29 +01:00
Torjus Håkestad	ad8570f8db	homelab-deploy: add NATS-based deployment system Add homelab-deploy flake input and NixOS module for message-based deployments across the fleet. Configure DEPLOY account in NATS with tiered access control (listener, test-deployer, admin-deployer). Enable listener on vaulttest01 as initial test host. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 05:22:06 +01:00
Torjus Håkestad	2f195d26d3	Merge pull request 'homelab-host-module' (#27) from homelab-host-module into master Reviewed-on: #27	2026-02-07 01:56:38 +00:00
Torjus Håkestad	a926d34287	nix-cache01: set priority to high Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 02:54:32 +01:00
Torjus Håkestad	be2421746e	gitignore: add result-* for parallel nix builds Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 02:51:27 +01:00
Torjus Håkestad	12bf0683f5	modules: add homelab.host for host metadata Add a shared `homelab.host` module that provides host metadata for multiple consumers: - tier: deployment tier (test/prod) for future homelab-deploy service - priority: alerting priority (high/low) for Prometheus label filtering - role: primary role of the host (dns, database, monitoring, etc.) - labels: free-form labels for additional metadata Host configurations updated with appropriate values: - ns1, ns2: role=dns with dns_role labels - nix-cache01: priority=low, role=build-host - vault01: role=vault - jump: role=bastion - template, template2, testvm01, vaulttest01: tier=test, priority=low The module is now imported via commonModules in flake.nix, making it available to all hosts including minimal configurations like template2. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 02:49:58 +01:00
Torjus Håkestad	e8a43c6715	docs: add deploy_admin tool with opt-in flag to homelab-deploy plan MCP exposes two tools: - deploy: test-tier only, always available - deploy_admin: all tiers, requires --enable-admin flag Three security layers: CLI flag, NATS authz, Claude Code permissions. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 02:29:13 +01:00
Torjus Håkestad	eef52bb8c5	docs: add group deployment support to homelab-deploy plan Support deploying to all hosts in a tier or all hosts with a role: - deploy.<tier>.all - broadcast to all hosts in tier - deploy.<tier>.role.<role> - broadcast to hosts with matching role MCP can deploy to all test hosts at once, admin can deploy to any group. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 02:22:17 +01:00
Torjus Håkestad	c6cdbc6799	docs: move nixos-exporter plan to completed Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 02:13:14 +01:00
Torjus Håkestad	4d724329a6	docs: add homelab-deploy plan, unify host metadata Add plan for NATS-based deployment service (homelab-deploy) that enables on-demand NixOS configuration updates via messaging. Features tiered permissions (test/prod) enforced at NATS layer. Update prometheus-scrape-target-labels plan to share the homelab.host module for host metadata (tier, priority, role, labels) - single source of truth for both deployment tiers and prometheus labels. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 02:10:54 +01:00
Torjus Håkestad	881e70df27	monitoring: relax systemd_not_running alert threshold Increase duration from 5m to 10m and demote severity from critical to warning. Brief degraded states during nixos-rebuild are normal and were causing false positive alerts. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 01:22:29 +01:00
Torjus Håkestad	b9a269d280	chore: rename metrics skill to observability, add logs reference Merge Prometheus metrics and Loki logs into a unified troubleshooting skill. Adds LogQL query patterns, label reference, and common service units for log searching. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 01:17:41 +01:00
Torjus Håkestad	fcf1a66103	chore: add metrics troubleshooting skill Reference guide for exploring Prometheus metrics when troubleshooting homelab issues, including the new nixos_flake_info metrics. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 01:11:40 +01:00
Torjus Håkestad	2034004280	flake: update nixos-exporter and set configurationRevision - Update nixos-exporter to 0.2.3 - Set system.configurationRevision for all hosts so the exporter can report the flake's git revision Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 01:06:47 +01:00
torjus-bot	af43f88394	flake.lock: Update Flake lock file updates: • Updated input 'nixos-exporter': 'git+https://git.t-juice.club/torjus/nixos-exporter?ref=refs/heads/master&rev=9c29505814954352b2af99b97910ee12a736b8dd' (2026-02-06) → 'git+https://git.t-juice.club/torjus/nixos-exporter?ref=refs/heads/master&rev=04eba77ac028033b6dfed604eb1b5664b46acc77' (2026-02-06)	2026-02-07 00:01:02 +00:00
Torjus Håkestad	a834497fe8	flake: update nixos-exporter input Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 00:17:54 +01:00
Torjus Håkestad	d3de2a1511	Merge pull request 'monitoring: add nixos-exporter to all hosts' (#26) from nixos-exporter into master Reviewed-on: #26	2026-02-06 22:56:04 +00:00
Torjus Håkestad	97ff774d3f	monitoring: add nixos-exporter to all hosts Add nixos-exporter prometheus exporter to track NixOS generation metrics and flake revision status across all hosts. Changes: - Add nixos-exporter flake input - Add commonModules list in flake.nix for modules shared by all hosts - Enable nixos-exporter in system/monitoring/metrics.nix - Configure Prometheus to scrape nixos-exporter on all hosts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 23:55:29 +01:00
Torjus Håkestad	f2c30cc24f	chore: give claude the quick-plan skill	2026-02-06 21:58:30 +01:00
Torjus Håkestad	7e80d2e0bc	docs: add plans for nixos and homelab prometheus exporters Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 21:56:55 +01:00
Torjus Håkestad	1f5b7b13e2	monitoring: enable restart-count and ip-accounting collectors Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 21:30:47 +01:00
Torjus Håkestad	c53e36c3f3	Revert "monitoring: enable additional systemd-exporter collectors" This reverts commit `04a252b857`.	2026-02-06 21:30:05 +01:00
Torjus Håkestad	04a252b857	monitoring: enable additional systemd-exporter collectors Enables restart-count, file-descriptor-size, and ip-accounting collectors. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 21:28:44 +01:00
Torjus Håkestad	5d26f52e0d	Revert "monitoring: enable cpu, memory, io collectors for systemd-exporter" This reverts commit `506a692548`.	2026-02-06 21:26:20 +01:00
Torjus Håkestad	506a692548	monitoring: enable cpu, memory, io collectors for systemd-exporter Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 21:23:19 +01:00
Torjus Håkestad	fa8f4f0784	docs: add notes about lib.getExe and not amending master Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 19:41:45 +01:00
Torjus Håkestad	025570dea1	monitoring: fix openbao token refresh timer not triggering RemainAfterExit=true kept the service in "active" state, which prevented OnUnitActiveSec from scheduling new triggers since there was no new "activation" event. Removing it allows the service to properly go inactive, enabling the timer to reschedule correctly. Also fix ExecStart to use lib.getExe for proper path resolution with writeShellApplication. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 19:41:45 +01:00
Torjus Håkestad	15c00393f1	monitoring: increase zigbee_sensor_stale threshold to 2 hours Sensors report every ~45-50 minutes on average, so 1 hour was too tight. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 19:26:56 +01:00
Torjus Håkestad	787c14c7a6	docs: add dns_role label to scrape target labels plan Add proposed dns_role label to distinguish primary/secondary DNS resolvers. This addresses the unbound_low_cache_hit_ratio alert firing on ns2, which has a cold cache due to low traffic. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 01:23:34 +01:00
Torjus Håkestad	eee3dde04f	restic: add randomized delay to backup timers Backups to the shared restic repository were all scheduled at exactly midnight, causing lock conflicts. Adding RandomizedDelaySec spreads them out over a 2-hour window to prevent simultaneous access. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 01:09:38 +01:00
torjus-bot	682b07b977	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs-unstable': 'github:nixos/nixpkgs/bf922a59c5c9998a6584645f7d0de689512e444c?narHash=sha256-ksTL7P9QC1WfZasNlaAdLOzqD8x5EPyods69YBqxSfk%3D' (2026-02-04) → 'github:nixos/nixpkgs/00c21e4c93d963c50d4c0c89bfa84ed6e0694df2?narHash=sha256-AYqlWrX09%2BHvGs8zM6ebZ1pwUqjkfpnv8mewYwAo%2BiM%3D' (2026-02-04)	2026-02-06 00:01:04 +00:00
Torjus Håkestad	70661ac3d9	Merge pull request 'home-assistant: fix zigbee battery value_template override key' (#25) from fix-zigbee-battery-template into master Reviewed-on: #25	2026-02-05 23:56:45 +00:00
Torjus Håkestad	506e93a5e2	home-assistant: fix zigbee battery value_template override key The homeassistant override key should match the entity type in the MQTT discovery topic path. For battery sensors, the topic is homeassistant/sensor/<device>/battery/config, so the key should be "battery" not "sensor_battery". Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 00:48:30 +01:00
Torjus Håkestad	b6c41aa910	system: add UTC suffix to MOTD commit timestamp Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 00:34:24 +01:00
Torjus Håkestad	aa6e00a327	Merge pull request 'add-nixos-rebuild-test' (#24) from add-nixos-rebuild-test into master Reviewed-on: #24	2026-02-05 23:26:34 +00:00
Torjus Håkestad	258e350b89	system: add MOTD banner with hostname and commit info Displays FQDN and flake commit hash with timestamp on login. Templates can override with their own MOTD via mkDefault. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 00:26:01 +01:00
Torjus Håkestad	eba195c192	docs: add nixos-rebuild-test usage to CLAUDE.md Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 00:19:49 +01:00
Torjus Håkestad	bbb22e588e	system: replace writeShellScript with writeShellApplication Convert remaining writeShellScript usages to writeShellApplication for shellcheck validation and strict bash options. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 00:17:24 +01:00
Torjus Håkestad	879e7aba60	templates: use writeShellApplication for prepare-host script Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 00:14:05 +01:00
Torjus Håkestad	39a4ea98ab	system: add nixos-rebuild-test helper script Adds a helper script deployed to all hosts for testing feature branches. Usage: nixos-rebuild-test <action> <branch> Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 00:12:16 +01:00
Torjus Håkestad	1d90dc2181	Merge pull request 'monitoring: use AppRole token for OpenBao metrics scraping' (#23) from fix-prometheus-openbao-token into master Reviewed-on: #23	2026-02-05 22:52:42 +00:00
Torjus Håkestad	e9857afc11	monitoring: use AppRole token for OpenBao metrics scraping Instead of creating a long-lived Vault token in Terraform (which gets invalidated when Terraform recreates it), monitoring01 now uses its existing AppRole credentials to fetch a fresh token for Prometheus. Changes: - Add prometheus-metrics policy to monitoring01's AppRole - Remove vault_token.prometheus_metrics resource from Terraform - Remove openbao-token KV secret from Terraform - Add systemd service to fetch AppRole token on boot - Add systemd timer to refresh token every 30 minutes This ensures Prometheus always has a valid token without depending on Terraform state or manual intervention. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 23:51:11 +01:00
Torjus Håkestad	88e9036cb4	Merge pull request 'auth01: decommission host and remove authelia/lldap services' (#22) from decommission-auth01 into master Reviewed-on: #22	2026-02-05 22:37:38 +00:00
Torjus Håkestad	59e1962d75	auth01: decommission host and remove authelia/lldap services Remove auth01 host configuration and associated services in preparation for new auth stack with different provisioning system. Removed: - hosts/auth01/ - host configuration - services/authelia/ - authelia service module - services/lldap/ - lldap service module - secrets/auth01/ - sops secrets - Reverse proxy entries for auth and lldap - Monitoring alert rules for authelia and lldap - SOPS configuration for auth01 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 23:35:45 +01:00
Torjus Håkestad	3dc4422ba0	docs: add NAS integration notes to auth plan Document TrueNAS CORE LDAP integration approach (NFS-only) and future NixOS NAS migration path with native Kanidm PAM/NSS. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 23:24:37 +01:00
Torjus Håkestad	f0963624bc	docs: add auth system replacement plan Evaluate options for replacing LLDAP+Authelia with a unified auth solution. Recommends Kanidm for its native NixOS PAM/NSS integration and built-in OIDC. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 23:18:38 +01:00
Torjus Håkestad	7b46f94e48	Merge pull request 'zigbee-battery-fix' (#21) from zigbee-battery-fix into master Reviewed-on: #21	2026-02-05 21:51:41 +00:00
Torjus Håkestad	32968147b5	docs: move zigbee battery plan to completed Updated plan with: - Full device inventory from ha1 - Backup verification details - Branch and commit references Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 22:49:49 +01:00

755 Commits