nixos-servers

Author	SHA1	Message	Date
Torjus Håkestad	55da459108	docs: add plan for local NTP with chrony Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 19:33:28 +01:00
Torjus Håkestad	813c5c0f29	monitoring: separate node-exporter-only external targets Add nodeExporterOnly list to external-targets.nix for hosts that have node-exporter but not systemd-exporter (e.g. pve1). This prevents a down target in the systemd-exporter scrape job. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 19:17:39 +01:00
Torjus Håkestad	013ab8f621	monitoring: add pve1 node-exporter scrape target Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 19:10:54 +01:00
torjus-bot	f75b773485	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs-unstable': 'github:nixos/nixpkgs/dd9b079222d43e1943b6ebd802f04fd959dc8e61?narHash=sha256-I45esRSssFtJ8p/gLHUZ1OUaaTaVLluNkABkk6arQwE%3D' (2026-02-27) → 'github:nixos/nixpkgs/cf59864ef8aa2e178cccedbe2c178185b0365705?narHash=sha256-izhTDFKsg6KeVBxJS9EblGeQ8y%2BO8eCa6RcW874vxEc%3D' (2026-03-02)	2026-03-03 00:07:07 +00:00
torjus-bot	58c3844950	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs-unstable': 'github:nixos/nixpkgs/2fc6539b481e1d2569f25f8799236694180c0993?narHash=sha256-0MAd%2B0mun3K/Ns8JATeHT1sX28faLII5hVLq0L3BdZU%3D' (2026-02-23) → 'github:nixos/nixpkgs/dd9b079222d43e1943b6ebd802f04fd959dc8e61?narHash=sha256-I45esRSssFtJ8p/gLHUZ1OUaaTaVLluNkABkk6arQwE%3D' (2026-02-27)	2026-03-01 00:01:26 +00:00
torjus-bot	80e5fa08fa	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs': 'github:nixos/nixpkgs/e764fc9a405871f1f6ca3d1394fb422e0a0c3951?narHash=sha256-sdaqdnsQCv3iifzxwB22tUwN/fSHoN7j2myFW5EIkGk%3D' (2026-02-24) → 'github:nixos/nixpkgs/1267bb4920d0fc06ea916734c11b0bf004bbe17e?narHash=sha256-7DaQVv4R97cii/Qdfy4tmDZMB2xxtyIvNGSwXBBhSmo%3D' (2026-02-25)	2026-02-28 00:07:22 +00:00
Torjus Håkestad	cf55d07ce5	docs: update pn51 stability with third freeze and conclusion pn02 crashed again after ~2d21h uptime despite all mitigations (amdgpu blacklist, max_cstate=1, NMI watchdog, rasdaemon). NMI watchdog didn't fire and rasdaemon recorded nothing, confirming hard lockup below NMI level. Unit is unreliable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 18:25:52 +01:00
torjus-bot	4941e38dac	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs': 'github:nixos/nixpkgs/afbbf774e2087c3d734266c22f96fca2e78d3620?narHash=sha256-nhZJPnBavtu40/L2aqpljrfUNb2rxmWTmSjK2c9UKds%3D' (2026-02-21) → 'github:nixos/nixpkgs/e764fc9a405871f1f6ca3d1394fb422e0a0c3951?narHash=sha256-sdaqdnsQCv3iifzxwB22tUwN/fSHoN7j2myFW5EIkGk%3D' (2026-02-24) • Updated input 'nixpkgs-unstable': 'github:nixos/nixpkgs/0182a361324364ae3f436a63005877674cf45efb?narHash=sha256-0NBlEBKkN3lufyvFegY4TYv5mCNHbi5OmBDrzihbBMQ%3D' (2026-02-17) → 'github:nixos/nixpkgs/2fc6539b481e1d2569f25f8799236694180c0993?narHash=sha256-0MAd%2B0mun3K/Ns8JATeHT1sX28faLII5hVLq0L3BdZU%3D' (2026-02-23)	2026-02-25 00:07:00 +00:00
torjus-bot	03ffcc1ad0	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs': 'github:nixos/nixpkgs/c217913993d6c6f6805c3b1a3bda5e639adfde6d?narHash=sha256-D1PA3xQv/s4W3lnR9yJFSld8UOLr0a/cBWMQMXS%2B1Qg%3D' (2026-02-20) → 'github:nixos/nixpkgs/afbbf774e2087c3d734266c22f96fca2e78d3620?narHash=sha256-nhZJPnBavtu40/L2aqpljrfUNb2rxmWTmSjK2c9UKds%3D' (2026-02-21)	2026-02-24 00:01:35 +00:00
Torjus Håkestad	5e92eb3220	docs: add plan for NixOS OpenStack image Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 00:42:19 +01:00
torjus-bot	2321e191a2	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs': 'github:nixos/nixpkgs/6d41bc27aaf7b6a3ba6b169db3bd5d6159cfaa47?narHash=sha256-bxAlQgre3pcQcaRUm/8A0v/X8d2nhfraWSFqVmMcBcU%3D' (2026-02-18) → 'github:nixos/nixpkgs/c217913993d6c6f6805c3b1a3bda5e639adfde6d?narHash=sha256-D1PA3xQv/s4W3lnR9yJFSld8UOLr0a/cBWMQMXS%2B1Qg%3D' (2026-02-20)	2026-02-23 00:01:30 +00:00
Torjus Håkestad	136116ab33	pn02: limit CPU to C1 power state for stability Known PN51 platform issue with deep C-states causing freezes. Limit to C1 to prevent deeper sleep states. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 18:58:41 +01:00
Torjus Håkestad	c8cadd09c5	pn51: document diagnostic config (rasdaemon, NMI watchdog, panic) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 18:52:34 +01:00
Torjus Håkestad	72acaa872b	pn02: add panic on lockup, NMI watchdog, and rasdaemon Enable kernel panic on soft/hard lockups with auto-reboot after 10s, and rasdaemon for hardware error logging. Should give us diagnostic data on the next freeze. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 18:48:21 +01:00
Torjus Håkestad	a7c1ce932d	pn51: add remaining debug steps and auto-recovery fallback Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 18:38:17 +01:00
Torjus Håkestad	2b42145d94	pn51: document BIOS tweaks, second pn02 freeze, amdgpu blacklist Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 18:28:19 +01:00
Torjus Håkestad	05e8556bda	pn02: blacklist amdgpu kernel module for stability testing pn02 continues to hard freeze with no log evidence. Blacklisting the GPU driver to eliminate GPU/PSP firmware interactions as a possible cause. Console output will be lost but the host is managed over SSH. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 18:27:05 +01:00
Torjus Håkestad	75fdd7ae40	pn51: document stress test pass and TSC runtime test failure Both units survived 1h stress test at 80-85C. TSC clocksource is genuinely unstable at runtime (not just boot), HPET is the correct fallback for this platform. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 11:52:34 +01:00
Torjus Håkestad	5346889b73	pn51: add TSC runtime switch test to next steps Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 11:50:30 +01:00
Torjus Håkestad	7e19f51dfa	nix: move experimental-features to system/nix.nix All hosts had identical nix-command/flakes settings in their configuration.nix. Centralize in system/nix.nix so new hosts (like pn01/pn02) get it automatically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 10:27:53 +01:00
Torjus Håkestad	9f7aab86a0	pn51: update stability notes, TSC/PSP issues affect both units Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 09:25:28 +01:00
Torjus Håkestad	bb53b922fa	plans: add NixOS hypervisor plan (Incus on PN51s) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 00:47:09 +01:00
Torjus Håkestad	75cd7c6c2d	docs: add PN51 stability testing notes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 00:24:28 +01:00
Torjus Håkestad	72c3a938b0	hosts: enable vault on pn01 and pn02 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 23:56:05 +01:00
Torjus Håkestad	2f89d564f7	vault: add approles for pn01/pn02, fix provision playbook Add pn01 and pn02 to hosts-generated.tf for Vault AppRole access. Fix provision-approle.yml: the localhost play was skipped when using -l filter, since localhost didn't match the target. Merged into a single play using delegate_to: localhost for the bao commands. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 23:51:56 +01:00
Torjus Håkestad	4a83363ee5	hosts: add pn01 and pn02 (ASUS PN51 mini PCs) Add two ASUS PN51 hosts on VLAN 12 for stability testing. pn01 at 10.69.12.60, pn02 at 10.69.12.61, both test-tier compute role. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 23:37:14 +01:00
Torjus Håkestad	b578520905	media-pc: add JellyCon, display server, and HDR decisions Decided on Kodi + JellyCon with NFS direct path for media playback, Sway/Hyprland for display server with workspace-based browser switching, and noted HDR status for future reference. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 00:08:19 +01:00
Torjus Håkestad	8a5aa1c4f5	plans: add media PC replacement plan, update router hardware candidates New plan for replacing the media PC (i7-4770K/Ubuntu) with a NixOS mini PC running Kodi. Router plan updated with specific AliExpress hardware options and IDS/IPS considerations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 23:54:29 +01:00
Torjus Håkestad	0f8c4783a8	truenas-migration: drive trays ordered, resolve open question Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 19:29:12 +01:00
Torjus Håkestad	2ca2509083	monitoring: increase filesystem_filling_up prediction window to 24h Reduces false positives from transient Nix store growth by basing the linear prediction on a 24h trend instead of 6h. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 09:36:27 +01:00
Torjus Håkestad	58702bd10b	truenas-migration: note subnet issue for 10GbE traffic NAS and Proxmox are on the same 10GbE switch but different subnets, forcing traffic through the router. Need to fix during migration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 01:34:46 +01:00
Torjus Håkestad	c9f47acb01	truenas-migration: mdadm boot mirror, clean zfs export step Use TrueNAS boot-pool SSDs as mdadm RAID1 for NixOS root to keep the boot path ZFS-independent. Added zfs export step before shutdown. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 01:34:46 +01:00
Torjus Håkestad	09ce018fb2	truenas-migration: switch from BTRFS to keeping ZFS, update plan BTRFS RAID5/6 write hole is still unresolved, and RAID1 wastes capacity with mixed disk sizes. Keep existing ZFS pool and import directly on NixOS instead. Updated migration strategy, disk purchase decision (2x 24TB ordered), SMART health notes, and vdev rebalancing guidance. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 01:34:46 +01:00
torjus-bot	3042803c4d	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs': 'github:nixos/nixpkgs/fa56d7d6de78f5a7f997b0ea2bc6efd5868ad9e8?narHash=sha256-X01Q3DgSpjeBpapoGA4rzKOn25qdKxbPnxHeMLNoHTU%3D' (2026-02-16) → 'github:nixos/nixpkgs/6d41bc27aaf7b6a3ba6b169db3bd5d6159cfaa47?narHash=sha256-bxAlQgre3pcQcaRUm/8A0v/X8d2nhfraWSFqVmMcBcU%3D' (2026-02-18)	2026-02-20 00:07:01 +00:00
Torjus Håkestad	1e7200b494	quick-plan: add mermaid diagram guideline Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 16:35:53 +01:00
Torjus Håkestad	eec1e374b2	docs: simplify mermaid diagram labels Use <br/> for line breaks and shorter node labels so the diagram renders cleanly in Gitea. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 16:29:52 +01:00
Torjus Håkestad	fcc410afad	docs: replace ASCII diagram with mermaid in remote-access plan Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 16:28:57 +01:00
Torjus Håkestad	59f0c7ceda	flake.lock: update homelab-deploy Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 09:04:03 +01:00
torjus-bot	d713f06c6e	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs-unstable': 'github:nixos/nixpkgs/a82ccc39b39b621151d6732718e3e250109076fa?narHash=sha256-gf2AmWVTs8lEq7z/3ZAsgnZDhWIckkb%2BZnAo5RzSxJg%3D' (2026-02-13) → 'github:nixos/nixpkgs/0182a361324364ae3f436a63005877674cf45efb?narHash=sha256-0NBlEBKkN3lufyvFegY4TYv5mCNHbi5OmBDrzihbBMQ%3D' (2026-02-17)	2026-02-19 00:01:44 +00:00
Torjus Håkestad	7374d1ff7f	nix-cache02: increase builder timeout to 4 hours Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 23:53:33 +01:00
torjus-bot	e912c75b6c	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs': 'github:nixos/nixpkgs/3aadb7ca9eac2891d52a9dec199d9580a6e2bf44?narHash=sha256-O1XDr7EWbRp%2BkHrNNgLWgIrB0/US5wvw9K6RERWAj6I%3D' (2026-02-14) → 'github:nixos/nixpkgs/fa56d7d6de78f5a7f997b0ea2bc6efd5868ad9e8?narHash=sha256-X01Q3DgSpjeBpapoGA4rzKOn25qdKxbPnxHeMLNoHTU%3D' (2026-02-16)	2026-02-18 00:01:34 +00:00
Torjus Håkestad	b218b4f8bc	docs: update migration plan for monitoring01 and pgdb1 completion Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 22:26:23 +01:00
Torjus Håkestad	65acf13e6f	grafana: fix datasource UIDs for VictoriaMetrics migration Update all dashboard datasource references from "prometheus" to "victoriametrics" to match the declared datasource UID. Enable prune and deleteDatasources to clean up the old Prometheus (monitoring01) datasource from Grafana's database. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 22:23:04 +01:00
Torjus Håkestad	95a96b2192	Merge pull request 'monitoring01: remove host and migrate services to monitoring02' (#43) from cleanup-monitoring01 into master Reviewed-on: #43	2026-02-17 21:08:00 +00:00
Torjus Håkestad	4f593126c0	monitoring01: remove host and migrate services to monitoring02 Remove monitoring01 host configuration and unused service modules (prometheus, grafana, loki, tempo, pyroscope). Migrate blackbox, exportarr, and pve exporters to monitoring02 with scrape configs moved to VictoriaMetrics. Update alert rules, terraform vault policies/secrets, http-proxy entries, and documentation to reflect the monitoring02 migration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 21:50:20 +01:00
Torjus Håkestad	1bba6f106a	Merge pull request 'monitoring02: enable alerting and migrate CNAMEs from http-proxy' (#42) from monitoring02-enable-alerting into master Reviewed-on: #42	2026-02-17 20:24:16 +00:00
Torjus Håkestad	a6013d3950	monitoring02: enable alerting and migrate CNAMEs from http-proxy - Switch vmalert from blackhole mode to sending alerts to local Alertmanager - Import alerttonotify service so alerts route to NATS notifications - Move alertmanager and grafana CNAMEs from http-proxy to monitoring02 - Add monitoring CNAME to monitoring02 - Add Caddy reverse proxy entries for alertmanager and grafana - Remove prometheus, alertmanager, and grafana Caddy entries from http-proxy (now served directly by monitoring02) - Move monitoring02 Vault AppRole to hosts-generated.tf with extra_policies support and prometheus-metrics policy - Update Promtail to use authenticated loki.home.2rjus.net endpoint only (remove unauthenticated monitoring01 client) - Update pipe-to-loki and bootstrap to use loki.home.2rjus.net with basic auth from Vault secret - Move migration plan to completed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 21:23:21 +01:00
Torjus Håkestad	7f69c0738a	Merge pull request 'loki-monitoring02' (#41) from loki-monitoring02 into master Reviewed-on: #41	2026-02-17 19:40:33 +00:00
Torjus Håkestad	35924c7b01	mcp: move config to .mcp.json.example, gitignore real config The real .mcp.json now contains Loki credentials for basic auth, so it should not be committed. The example file has placeholders. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:35:14 +01:00
Torjus Håkestad	87d8571d62	promtail: fix vault secret ownership for loki auth The secret file needs to be owned by promtail since Promtail runs as a dedicated user and can't read root-owned files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:17:02 +01:00

1000 Commits