nixos-servers

Author	SHA1	Message	Date
Torjus Håkestad	7fb8df69a4	migrate git URLs from git.t-juice.club to code.t-juice.club Update all flake URLs to use the new Forgejo instance. This includes auto-upgrade, nixos-rebuild-test, homelab-deploy listener, nixos-exporter, nix-cache02 builder, and the bootstrap script. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 23:34:22 +01:00
Torjus Håkestad	cf19ade34b	nix-cache02: add native nix forgejo runner instance Add a second runner instance (actions-native) that executes jobs directly on the host, giving workflows persistent nix store access and automatic binary cache population via Harmonia. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 23:15:51 +01:00
Torjus Håkestad	402fef8dc4	media1: add kitty terminal, Norwegian layout, HDMI audio priority - Add kitty on workspace 3 (Super+3) - Set Norwegian keyboard layout in Hyprland - WirePlumber rule to prefer HDMI audio over USB HID device Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 20:57:23 +01:00
Torjus Håkestad	a4426c50b9	media1: override ProtectHome for promtail to read kodi logs The NixOS promtail module sets ProtectHome=true which blocks access to /home entirely. Override to read-only so promtail can tail /home/kodi/.kodi/temp/kodi.log. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 20:20:07 +01:00
Torjus Håkestad	8abe7b1d07	media1: fix promtail permissions for kodi log scraping Add promtail to the kodi group and set kodi home to 750 so promtail can read ~/.kodi/temp/kodi.log. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 20:09:44 +01:00
Torjus Håkestad	672143806a	media1: ship kodi logs to loki Kodi logs to ~/.kodi/temp/kodi.log which isn't picked up by the journal or varlog scrape configs. Add a dedicated promtail scrape config for it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 20:06:32 +01:00
Torjus Håkestad	f87e004153	media1: use UWSM for Hyprland session management Matches the working pattern from gunter — UWSM properly sets up dbus and systemd targets, which is needed for PipeWire and xdg-desktop-portal. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 19:22:33 +01:00
Torjus Håkestad	35e62dafbc	media1: add NixOS media PC configuration GMKtec G3 (Intel N100) replacing the old Ubuntu media PC on VLAN 31. Hyprland compositor with Kodi on workspace 1 and Firefox on workspace 2, greetd auto-login, PipeWire audio, VA-API hardware decode, and NFS mount for media from NAS. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 19:09:23 +01:00
Torjus Håkestad	20875fb03f	pn02: disable sched_ext and document memtest results Memtest86 ran 38 passes (109 hours) with zero errors, ruling out RAM. Disable sched_ext scheduler to test whether kernel scheduler crashes stop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 12:16:55 +01:00
Torjus Håkestad	e9629c18b6	nrec-nixos01: mount Cinder volume for Forgejo packages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 21:11:37 +01:00
Torjus Håkestad	117e54a849	actions-runner: add Forgejo runner to nix-cache02 with Vault token Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 00:41:24 +01:00
Torjus Håkestad	93aa91f307	nrec-nixos02: add Forgejo Actions runner with Podman Adds a container-based Forgejo Actions runner on nrec-nixos02 connecting to code.t-juice.club, using Podman for sandboxed job execution with nix, node-bookworm, and alpine labels. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 23:17:27 +01:00
Torjus Håkestad	00f46af628	nrec-nixos01: use code.t-juice.club for Forgejo Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 18:50:54 +01:00
Torjus Håkestad	a27e2ec213	nrec-nixos02: add Pocket ID with Caddy reverse proxy Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 18:11:49 +01:00
Torjus Håkestad	cfc0c6f6cb	nrec-nixos01: add Forgejo with Caddy reverse proxy Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 14:49:48 +01:00
Torjus Håkestad	822380695e	nrec-nixos01: import qemu-guest profile for virtio modules The initrd was missing virtio drivers, preventing the root filesystem from being detected during boot. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 14:31:09 +01:00
Torjus Håkestad	0941bd52f5	nrec-nixos01: fix root filesystem device to use label The OpenStack image labels the root partition "nixos", so use /dev/disk/by-label/nixos instead of /dev/vda1. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 14:22:24 +01:00
Torjus Håkestad	adc267bd95	nrec-nixos01: add host configuration with Caddy web server Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 14:10:05 +01:00
Torjus Håkestad	7ffe2d71d6	openstack-template: add minimal NixOS image for OpenStack Adds a new host configuration for building qcow2 images targeting OpenStack (NREC). Uses a nixos user with SSH key and sudo instead of root login, firewall enabled, and no internal services. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 13:56:55 +01:00
Torjus Håkestad	73d804105b	pn01, pn02: enable memtest86 and update stability docs Enable memtest86 in systemd-boot menu on both PN51 units to allow extended memory testing. Update stability document with March crash data from pstore/Loki; crashes now traced to sched_ext scheduler kernel oops, suggesting possible memory corruption. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 23:02:28 +01:00
Torjus Håkestad	136116ab33	pn02: limit CPU to C1 power state for stability Known PN51 platform issue with deep C-states causing freezes. Limit to C1 to prevent deeper sleep states. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 18:58:41 +01:00
Torjus Håkestad	72acaa872b	pn02: add panic on lockup, NMI watchdog, and rasdaemon Enable kernel panic on soft/hard lockups with auto-reboot after 10s, and rasdaemon for hardware error logging. Should give us diagnostic data on the next freeze. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 18:48:21 +01:00
Torjus Håkestad	05e8556bda	pn02: blacklist amdgpu kernel module for stability testing pn02 continues to hard freeze with no log evidence. Blacklisting the GPU driver to eliminate GPU/PSP firmware interactions as a possible cause. Console output will be lost but the host is managed over SSH. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 18:27:05 +01:00
Torjus Håkestad	7e19f51dfa	nix: move experimental-features to system/nix.nix All hosts had identical nix-command/flakes settings in their configuration.nix. Centralize in system/nix.nix so new hosts (like pn01/pn02) get it automatically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 10:27:53 +01:00
Torjus Håkestad	72c3a938b0	hosts: enable vault on pn01 and pn02 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 23:56:05 +01:00
Torjus Håkestad	4a83363ee5	hosts: add pn01 and pn02 (ASUS PN51 mini PCs) Add two ASUS PN51 hosts on VLAN 12 for stability testing. pn01 at 10.69.12.60, pn02 at 10.69.12.61, both test-tier compute role. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 23:37:14 +01:00
Torjus Håkestad	7374d1ff7f	nix-cache02: increase builder timeout to 4 hours Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 23:53:33 +01:00
Torjus Håkestad	4f593126c0	monitoring01: remove host and migrate services to monitoring02 Remove monitoring01 host configuration and unused service modules (prometheus, grafana, loki, tempo, pyroscope). Migrate blackbox, exportarr, and pve exporters to monitoring02 with scrape configs moved to VictoriaMetrics. Update alert rules, terraform vault policies/secrets, http-proxy entries, and documentation to reflect the monitoring02 migration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 21:50:20 +01:00
Torjus Håkestad	a6013d3950	monitoring02: enable alerting and migrate CNAMEs from http-proxy - Switch vmalert from blackhole mode to sending alerts to local Alertmanager - Import alerttonotify service so alerts route to NATS notifications - Move alertmanager and grafana CNAMEs from http-proxy to monitoring02 - Add monitoring CNAME to monitoring02 - Add Caddy reverse proxy entries for alertmanager and grafana - Remove prometheus, alertmanager, and grafana Caddy entries from http-proxy (now served directly by monitoring02) - Move monitoring02 Vault AppRole to hosts-generated.tf with extra_policies support and prometheus-metrics policy - Update Promtail to use authenticated loki.home.2rjus.net endpoint only (remove unauthenticated monitoring01 client) - Update pipe-to-loki and bootstrap to use loki.home.2rjus.net with basic auth from Vault secret - Move migration plan to completed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 21:23:21 +01:00
Torjus Håkestad	2903873d52	monitoring02: add loki CNAME and Caddy reverse proxy Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 19:48:06 +01:00
Torjus Håkestad	74e7c9faa4	monitoring02: add Loki service Add standalone Loki service module (services/loki/) with same config as monitoring01 and import it on monitoring02. Update Grafana Loki datasource to localhost. Defer Tempo and Pyroscope migration (not actively used). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 19:42:19 +01:00
Torjus Håkestad	4cbaa33475	monitoring02: add Caddy reverse proxy for VictoriaMetrics and vmalert Add metrics.home.2rjus.net and vmalert.home.2rjus.net CNAMEs with Caddy TLS termination via internal ACME CA. Refactors Grafana's Caddy config from configFile to globalConfig + virtualHosts so both modules can contribute routes to the same Caddy instance. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 00:55:08 +01:00
Torjus Håkestad	e329f87b0b	monitoring02: add VictoriaMetrics, vmalert, and Alertmanager Set up the core metrics stack on monitoring02 as Phase 2 of the monitoring migration. VictoriaMetrics replaces Prometheus with identical scrape configs (22 jobs including auto-generated targets). - VictoriaMetrics with 3-month retention and all scrape configs - vmalert evaluating existing rules.yml (notifier disabled) - Alertmanager with same routing config (no alerts during parallel op) - Grafana datasources updated: local VictoriaMetrics as default - Static user override for credential file access (OpenBao, Apiary) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 00:55:08 +01:00
Torjus Håkestad	d485948df0	docs: update Loki queries from host to hostname label Update all LogQL examples, agent instructions, and scripts to use the hostname label instead of host, matching the Prometheus label naming convention. Also update pipe-to-loki and bootstrap scripts to push hostname instead of host. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 23:43:47 +01:00
Torjus Håkestad	b2b6ab4799	garage01: add Garage S3 service with Caddy HTTPS proxy Configure Garage object storage on garage01 with S3 API, Vault secrets for RPC secret and admin token, and Caddy reverse proxy for HTTPS access at s3.home.2rjus.net via internal ACME CA. Includes flake entry, VM definition, and Vault policy for the host. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 21:24:25 +01:00
Torjus Håkestad	fa8d65b612	nix-cache02: increase builder timeout to 2 hours Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 14:44:55 +01:00
Torjus Håkestad	ed1821b073	nix-cache02: add scheduled builds timer Add a systemd timer that triggers builds for all hosts every 2 hours via NATS, keeping the binary cache warm. - Add scheduler.nix with timer (every 2h) and oneshot service - Add scheduler NATS user to DEPLOY account - Add Vault secret and variable for scheduler NKey - Increase nix-cache02 memory from 16GB to 20GB Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-12 00:50:09 +01:00
Torjus Håkestad	fa4a418007	restic: add --retry-lock=5m to all backup jobs Prevents lock conflicts when multiple backup jobs targeting the same repository run concurrently. Jobs will now retry acquiring the lock every 10 seconds for up to 5 minutes before failing. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-11 01:22:00 +01:00
Torjus Håkestad	75210805d5	nix-cache01: decommission and remove all references Removed: - hosts/nix-cache01/ directory - services/nix-cache/build-flakes.{nix,sh} (replaced by NATS builder) - Vault secret and AppRole for nix-cache01 - Old signing key variable from terraform - Old trusted public key from system/nix.nix Updated: - flake.nix: removed nixosConfiguration - README.md: nix-cache01 -> nix-cache02 - Monitoring rules: removed build-flakes alerts, updated harmonia to nix-cache02 - Simplified proxy.nix (no longer needs hostname conditional) nix-cache02 is now the sole binary cache host. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-10 23:40:51 +01:00
Torjus Håkestad	83fce5f927	nix-cache: switch DNS to nix-cache02 - Move nix-cache CNAME from nix-cache01 to nix-cache02 - Remove actions1 CNAME (service removed) - Update proxy.nix to serve canonical domain on nix-cache02 - Promote nix-cache02 to prod tier with build-host role Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-10 23:22:23 +01:00
Torjus Håkestad	751edfc11d	nix-cache02: add Harmonia binary cache service - Parameterize harmonia.nix to use hostname-based Vault paths - Add nix-cache services to nix-cache02 - Add Vault secret and variable for nix-cache02 signing key - Add nix-cache02 public key to trusted-public-keys on all hosts - Update plan doc to remove actions runner references Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-10 23:08:48 +01:00
Torjus Håkestad	98a7301985	nix-cache: remove unused Gitea Actions runner The actions runner on nix-cache01 was never actively used. Removing it before migrating to nix-cache02. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-10 22:57:08 +01:00
Torjus Håkestad	47747329c4	nix-cache02: add homelab-deploy builder service - Configure builder to build nixos-servers and nixos (gunter) repos - Add builder NKey to Vault secrets - Update NATS permissions for builder, test-deployer, and admin-deployer - Grant nix-cache02 access to shared homelab-deploy secrets Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-10 22:26:40 +01:00
Torjus Håkestad	2d9ca2a73f	hosts: add nix-cache02 build host New build host to replace nix-cache01 with: - 8 CPU cores, 16GB RAM, 200GB disk - Static IP 10.69.13.25 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-10 21:53:29 +01:00
Torjus Håkestad	6e08ba9720	ansible: restructure with dynamic inventory from flake - Move playbooks/ to ansible/playbooks/ - Add dynamic inventory script that extracts hosts from flake - Groups by tier (tier_test, tier_prod) and role (role_dns, etc.) - Reads homelab.host.* options for metadata - Add static inventory for non-flake hosts (Proxmox) - Add ansible.cfg with inventory path and SSH optimizations - Add group_vars/all.yml for common variables - Add restart-service.yml playbook for restarting systemd services - Update provision-approle.yml with single-host safeguard - Add ANSIBLE_CONFIG to devshell for automatic inventory discovery - Add ansible = "false" label to template2 to exclude from inventory - Update CLAUDE.md to reference ansible/README.md for details Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 21:41:29 +01:00
Torjus Håkestad	287141c623	hosts: add role metadata to all hosts Assign roles to hosts for better organization and filtering: - ha1: home-automation - monitoring01, monitoring02: monitoring - jelly01: media - nats1: messaging - http-proxy: proxy - testvm01-03: test Also promote kanidm01 and monitoring02 from test to prod tier. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 16:21:08 +01:00
Torjus Håkestad	030e8518c5	grafana: add Grafana on monitoring02 with Kanidm OIDC Deploy Grafana test instance on monitoring02 with: - Kanidm OIDC authentication (admins -> Admin role, others -> Viewer) - PKCE enabled for secure OAuth2 flow (required by Kanidm) - Declarative datasources for Prometheus and Loki on monitoring01 - Local Caddy for TLS termination via internal ACME CA - DNS CNAME grafana-test.home.2rjus.net Terraform changes add OAuth2 client secret and AppRole policies for kanidm01 and monitoring02. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 20:23:26 +01:00
Torjus Håkestad	0b977808ca	hosts: add monitoring02 configuration New test-tier host for monitoring stack expansion with: - Static IP 10.69.13.24 - 4 CPU cores, 4GB RAM, 20GB disk - Vault integration and NATS-based deployment enabled Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 19:19:38 +01:00
Torjus Håkestad	b845a8bb8b	system: add kanidm PAM/NSS client module Add homelab.kanidm.enable option for central authentication via Kanidm. The module configures: - PAM/NSS integration with kanidm-unixd - Client connection to auth.home.2rjus.net - Login authorization for ssh-users group Enable on testvm01-03 for testing. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 15:12:19 +01:00
Torjus Håkestad	bfbf0cea68	template2: enable zram for bootstrap Prevents OOM during initial nixos-rebuild on 2GB VMs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 13:34:08 +01:00


161 Commits