nixos-servers

Author	SHA1	Message	Date
Torjus Håkestad	7e19f51dfa	nix: move experimental-features to system/nix.nix All hosts had identical nix-command/flakes settings in their configuration.nix. Centralize in system/nix.nix so new hosts (like pn01/pn02) get it automatically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 10:27:53 +01:00
Torjus Håkestad	9f7aab86a0	pn51: update stability notes, TSC/PSP issues affect both units Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 09:25:28 +01:00
Torjus Håkestad	bb53b922fa	plans: add NixOS hypervisor plan (Incus on PN51s) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 00:47:09 +01:00
Torjus Håkestad	75cd7c6c2d	docs: add PN51 stability testing notes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 00:24:28 +01:00
Torjus Håkestad	72c3a938b0	hosts: enable vault on pn01 and pn02 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 23:56:05 +01:00
Torjus Håkestad	2f89d564f7	vault: add approles for pn01/pn02, fix provision playbook Add pn01 and pn02 to hosts-generated.tf for Vault AppRole access. Fix provision-approle.yml: the localhost play was skipped when using -l filter, since localhost didn't match the target. Merged into a single play using delegate_to: localhost for the bao commands. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 23:51:56 +01:00
Torjus Håkestad	4a83363ee5	hosts: add pn01 and pn02 (ASUS PN51 mini PCs) Add two ASUS PN51 hosts on VLAN 12 for stability testing. pn01 at 10.69.12.60, pn02 at 10.69.12.61, both test-tier compute role. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 23:37:14 +01:00
Torjus Håkestad	b578520905	media-pc: add JellyCon, display server, and HDR decisions Decided on Kodi + JellyCon with NFS direct path for media playback, Sway/Hyprland for display server with workspace-based browser switching, and noted HDR status for future reference. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 00:08:19 +01:00
Torjus Håkestad	8a5aa1c4f5	plans: add media PC replacement plan, update router hardware candidates New plan for replacing the media PC (i7-4770K/Ubuntu) with a NixOS mini PC running Kodi. Router plan updated with specific AliExpress hardware options and IDS/IPS considerations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 23:54:29 +01:00
Torjus Håkestad	0f8c4783a8	truenas-migration: drive trays ordered, resolve open question Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 19:29:12 +01:00
Torjus Håkestad	2ca2509083	monitoring: increase filesystem_filling_up prediction window to 24h Reduces false positives from transient Nix store growth by basing the linear prediction on a 24h trend instead of 6h. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 09:36:27 +01:00
Torjus Håkestad	58702bd10b	truenas-migration: note subnet issue for 10GbE traffic NAS and Proxmox are on the same 10GbE switch but different subnets, forcing traffic through the router. Need to fix during migration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 01:34:46 +01:00
Torjus Håkestad	c9f47acb01	truenas-migration: mdadm boot mirror, clean zfs export step Use TrueNAS boot-pool SSDs as mdadm RAID1 for NixOS root to keep the boot path ZFS-independent. Added zfs export step before shutdown. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 01:34:46 +01:00
Torjus Håkestad	09ce018fb2	truenas-migration: switch from BTRFS to keeping ZFS, update plan BTRFS RAID5/6 write hole is still unresolved, and RAID1 wastes capacity with mixed disk sizes. Keep existing ZFS pool and import directly on NixOS instead. Updated migration strategy, disk purchase decision (2x 24TB ordered), SMART health notes, and vdev rebalancing guidance. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 01:34:46 +01:00
torjus-bot	3042803c4d	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs': 'github:nixos/nixpkgs/fa56d7d6de78f5a7f997b0ea2bc6efd5868ad9e8?narHash=sha256-X01Q3DgSpjeBpapoGA4rzKOn25qdKxbPnxHeMLNoHTU%3D' (2026-02-16) → 'github:nixos/nixpkgs/6d41bc27aaf7b6a3ba6b169db3bd5d6159cfaa47?narHash=sha256-bxAlQgre3pcQcaRUm/8A0v/X8d2nhfraWSFqVmMcBcU%3D' (2026-02-18)	2026-02-20 00:07:01 +00:00
Torjus Håkestad	1e7200b494	quick-plan: add mermaid diagram guideline Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 16:35:53 +01:00
Torjus Håkestad	eec1e374b2	docs: simplify mermaid diagram labels Use <br/> for line breaks and shorter node labels so the diagram renders cleanly in Gitea. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 16:29:52 +01:00
Torjus Håkestad	fcc410afad	docs: replace ASCII diagram with mermaid in remote-access plan Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 16:28:57 +01:00
Torjus Håkestad	59f0c7ceda	flake.lock: update homelab-deploy Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 09:04:03 +01:00
torjus-bot	d713f06c6e	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs-unstable': 'github:nixos/nixpkgs/a82ccc39b39b621151d6732718e3e250109076fa?narHash=sha256-gf2AmWVTs8lEq7z/3ZAsgnZDhWIckkb%2BZnAo5RzSxJg%3D' (2026-02-13) → 'github:nixos/nixpkgs/0182a361324364ae3f436a63005877674cf45efb?narHash=sha256-0NBlEBKkN3lufyvFegY4TYv5mCNHbi5OmBDrzihbBMQ%3D' (2026-02-17)	2026-02-19 00:01:44 +00:00
Torjus Håkestad	7374d1ff7f	nix-cache02: increase builder timeout to 4 hours Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 23:53:33 +01:00
torjus-bot	e912c75b6c	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs': 'github:nixos/nixpkgs/3aadb7ca9eac2891d52a9dec199d9580a6e2bf44?narHash=sha256-O1XDr7EWbRp%2BkHrNNgLWgIrB0/US5wvw9K6RERWAj6I%3D' (2026-02-14) → 'github:nixos/nixpkgs/fa56d7d6de78f5a7f997b0ea2bc6efd5868ad9e8?narHash=sha256-X01Q3DgSpjeBpapoGA4rzKOn25qdKxbPnxHeMLNoHTU%3D' (2026-02-16)	2026-02-18 00:01:34 +00:00
Torjus Håkestad	b218b4f8bc	docs: update migration plan for monitoring01 and pgdb1 completion Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 22:26:23 +01:00
Torjus Håkestad	65acf13e6f	grafana: fix datasource UIDs for VictoriaMetrics migration Update all dashboard datasource references from "prometheus" to "victoriametrics" to match the declared datasource UID. Enable prune and deleteDatasources to clean up the old Prometheus (monitoring01) datasource from Grafana's database. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 22:23:04 +01:00
Torjus Håkestad	95a96b2192	Merge pull request 'monitoring01: remove host and migrate services to monitoring02' (#43) from cleanup-monitoring01 into master Reviewed-on: #43	2026-02-17 21:08:00 +00:00
Torjus Håkestad	4f593126c0	monitoring01: remove host and migrate services to monitoring02 Remove monitoring01 host configuration and unused service modules (prometheus, grafana, loki, tempo, pyroscope). Migrate blackbox, exportarr, and pve exporters to monitoring02 with scrape configs moved to VictoriaMetrics. Update alert rules, terraform vault policies/secrets, http-proxy entries, and documentation to reflect the monitoring02 migration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 21:50:20 +01:00
Torjus Håkestad	1bba6f106a	Merge pull request 'monitoring02: enable alerting and migrate CNAMEs from http-proxy' (#42) from monitoring02-enable-alerting into master Reviewed-on: #42	2026-02-17 20:24:16 +00:00
Torjus Håkestad	a6013d3950	monitoring02: enable alerting and migrate CNAMEs from http-proxy - Switch vmalert from blackhole mode to sending alerts to local Alertmanager - Import alerttonotify service so alerts route to NATS notifications - Move alertmanager and grafana CNAMEs from http-proxy to monitoring02 - Add monitoring CNAME to monitoring02 - Add Caddy reverse proxy entries for alertmanager and grafana - Remove prometheus, alertmanager, and grafana Caddy entries from http-proxy (now served directly by monitoring02) - Move monitoring02 Vault AppRole to hosts-generated.tf with extra_policies support and prometheus-metrics policy - Update Promtail to use authenticated loki.home.2rjus.net endpoint only (remove unauthenticated monitoring01 client) - Update pipe-to-loki and bootstrap to use loki.home.2rjus.net with basic auth from Vault secret - Move migration plan to completed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 21:23:21 +01:00
Torjus Håkestad	7f69c0738a	Merge pull request 'loki-monitoring02' (#41) from loki-monitoring02 into master Reviewed-on: #41	2026-02-17 19:40:33 +00:00
Torjus Håkestad	35924c7b01	mcp: move config to .mcp.json.example, gitignore real config The real .mcp.json now contains Loki credentials for basic auth, so it should not be committed. The example file has placeholders. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:35:14 +01:00
Torjus Håkestad	87d8571d62	promtail: fix vault secret ownership for loki auth The secret file needs to be owned by promtail since Promtail runs as a dedicated user and can't read root-owned files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:17:02 +01:00
Torjus Håkestad	43c81f6688	terraform: fix loki-push policy for generated hosts Revert ns1/ns2 from approle.tf (they're in hosts-generated.tf) and add loki-push policy to generated AppRoles instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:13:22 +01:00
Torjus Håkestad	58f901ad3e	terraform: add ns1 and ns2 to AppRole policies They were missing from the host_policies map, so they didn't get shared policies like loki-push. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:10:37 +01:00
Torjus Håkestad	c13921d302	loki: add basic auth for log push and dual-ship promtail - Loki bound to localhost, Caddy reverse proxy with basic_auth - Vault secret (shared/loki/push-auth) for password, bcrypt hash generated at boot for Caddy environment - Promtail dual-ships to monitoring01 (direct) and loki.home.2rjus.net (with basic auth), conditional on vault.enable - Terraform: new shared loki-push policy added to all AppRoles Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:00:08 +01:00
Torjus Håkestad	2903873d52	monitoring02: add loki CNAME and Caddy reverse proxy Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 19:48:06 +01:00
Torjus Håkestad	74e7c9faa4	monitoring02: add Loki service Add standalone Loki service module (services/loki/) with same config as monitoring01 and import it on monitoring02. Update Grafana Loki datasource to localhost. Defer Tempo and Pyroscope migration (not actively used). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 19:42:19 +01:00
Torjus Håkestad	471f536f1f	Merge pull request 'victoriametrics-monitoring02' (#40) from victoriametrics-monitoring02 into master Reviewed-on: #40	2026-02-16 23:56:04 +00:00
Torjus Håkestad	a013e80f1a	terraform: grant monitoring02 access to apiary-token secret Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 00:55:08 +01:00
Torjus Håkestad	4cbaa33475	monitoring02: add Caddy reverse proxy for VictoriaMetrics and vmalert Add metrics.home.2rjus.net and vmalert.home.2rjus.net CNAMEs with Caddy TLS termination via internal ACME CA. Refactors Grafana's Caddy config from configFile to globalConfig + virtualHosts so both modules can contribute routes to the same Caddy instance. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 00:55:08 +01:00
Torjus Håkestad	e329f87b0b	monitoring02: add VictoriaMetrics, vmalert, and Alertmanager Set up the core metrics stack on monitoring02 as Phase 2 of the monitoring migration. VictoriaMetrics replaces Prometheus with identical scrape configs (22 jobs including auto-generated targets). - VictoriaMetrics with 3-month retention and all scrape configs - vmalert evaluating existing rules.yml (notifier disabled) - Alertmanager with same routing config (no alerts during parallel op) - Grafana datasources updated: local VictoriaMetrics as default - Static user override for credential file access (OpenBao, Apiary) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 00:55:08 +01:00
Torjus Håkestad	c151f31011	grafana: fix apiary dashboard panels empty on short time ranges Set interval=60s on rate() panels to match the actual Prometheus scrape interval, so Grafana calculates $__rate_interval correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 20:03:26 +01:00
torjus-bot	f5362d6936	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs': 'github:nixos/nixpkgs/6c5e707c6b5339359a9a9e215c5e66d6d802fd7a?narHash=sha256-iKZMkr6Cm9JzWlRYW/VPoL0A9jVKtZYiU4zSrVeetIs%3D' (2026-02-11) → 'github:nixos/nixpkgs/3aadb7ca9eac2891d52a9dec199d9580a6e2bf44?narHash=sha256-O1XDr7EWbRp%2BkHrNNgLWgIrB0/US5wvw9K6RERWAj6I%3D' (2026-02-14) • Updated input 'nixpkgs-unstable': 'github:nixos/nixpkgs/ec7c70d12ce2fc37cb92aff673dcdca89d187bae?narHash=sha256-9xejG0KoqsoKEGp2kVbXRlEYtFFcDTHjidiuX8hGO44%3D' (2026-02-11) → 'github:nixos/nixpkgs/a82ccc39b39b621151d6732718e3e250109076fa?narHash=sha256-gf2AmWVTs8lEq7z/3ZAsgnZDhWIckkb%2BZnAo5RzSxJg%3D' (2026-02-13)	2026-02-16 00:07:10 +00:00
Torjus Håkestad	3e7aabc73a	grafana: fix apiary geomap and make it full-width Add gazetteer reference for country code lookup resolution. Remove unnecessary reduce transformation. Make geomap panel full-width (24 cols) and taller (h=10) on its own row. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 21:36:24 +01:00
Torjus Håkestad	361e7f2a1b	grafana: add apiary honeypot dashboard Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 21:31:06 +01:00
Torjus Håkestad	1942591d2e	monitoring: add apiary metrics scraping with bearer token auth Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 16:36:26 +01:00
Torjus Håkestad	4d614d8716	docs: add new service candidates and NixOS router plans Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 13:21:34 +01:00
torjus-bot	fd7caf7f00	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs-unstable': 'github:nixos/nixpkgs/d6c71932130818840fc8fe9509cf50be8c64634f?narHash=sha256-ub1gpAONMFsT/GU2hV6ZWJjur8rJ6kKxdm9IlCT0j84%3D' (2026-02-08) → 'github:nixos/nixpkgs/ec7c70d12ce2fc37cb92aff673dcdca89d187bae?narHash=sha256-9xejG0KoqsoKEGp2kVbXRlEYtFFcDTHjidiuX8hGO44%3D' (2026-02-11)	2026-02-14 00:01:24 +00:00
Torjus Håkestad	af8e385b6e	docs: finalize remote access plan with WireGuard gateway design Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 00:31:52 +01:00
Torjus Håkestad	0db9fc6802	docs: update Loki improvements plan with implementation status Mark retention, limits, labels, and level mapping as done. Add JSON logging audit results with per-service details. Update current state and disk usage notes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 00:04:16 +01:00
Torjus Håkestad	5d68662035	loki: add 30-day retention policy and ingestion limits Enable compactor-based retention with 30-day period to prevent unbounded disk growth. Add basic rate limits and stream guards to protect against runaway log generators. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 23:55:27 +01:00

1031 Commits