43 Commits

Author SHA1 Message Date
6a3e78a479 nrec-nixos01: enable Git LFS and hide explore page
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 15:10:59 +01:00
cfc0c6f6cb nrec-nixos01: add Forgejo with Caddy reverse proxy
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 14:49:48 +01:00
822380695e nrec-nixos01: import qemu-guest profile for virtio modules
The initrd was missing virtio drivers, preventing the root
filesystem from being detected during boot.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 14:31:09 +01:00
0941bd52f5 nrec-nixos01: fix root filesystem device to use label
The OpenStack image labels the root partition "nixos", so use
/dev/disk/by-label/nixos instead of /dev/vda1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 14:22:24 +01:00
9ebdd94773 Merge pull request 'nrec-nixos01' (#44) from nrec-nixos01 into master
Reviewed-on: #44
2026-03-08 13:12:24 +00:00
adc267bd95 nrec-nixos01: add host configuration with Caddy web server
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 14:10:05 +01:00
7ffe2d71d6 openstack-template: add minimal NixOS image for OpenStack
Adds a new host configuration for building qcow2 images targeting
OpenStack (NREC). Uses a nixos user with SSH key and sudo instead
of root login, firewall enabled, and no internal services.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 13:56:55 +01:00
dd9ba42eb5 devshell: add openstack cli client
2026-03-08 13:31:54 +01:00
3ee0433a6f flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/fabb8c9deee281e50b1065002c9828f2cf7b2239?narHash=sha256-YaHht/C35INEX3DeJQNWjNaTcPjYmBwwjFJ2jdtr+5U=' (2026-03-04)
  → 'github:nixos/nixpkgs/71caefce12ba78d84fe618cf61644dce01cf3a96?narHash=sha256-yf3iYLGbGVlIthlQIk5/4/EQDZNNEmuqKZkQssMljuw=' (2026-03-06)
• Updated input 'nixpkgs-unstable':
    'github:nixos/nixpkgs/80bdc1e5ce51f56b19791b52b2901187931f5353?narHash=sha256-QKyJ0QGWBn6r0invrMAK8dmJoBYWoOWy7lN+UHzW1jc=' (2026-03-04)
  → 'github:nixos/nixpkgs/aca4d95fce4914b3892661bcb80b8087293536c6?narHash=sha256-E1bxHxNKfDoQUuvriG71+f+s/NT0qWkImXsYZNFFfCs=' (2026-03-06)
2026-03-08 00:02:42 +00:00
73d804105b pn01, pn02: enable memtest86 and update stability docs
Enable memtest86 in systemd-boot menu on both PN51 units to allow
extended memory testing. Update stability document with March crash
data from pstore/Loki — crashes now traced to sched_ext scheduler
kernel oops, suggesting possible memory corruption.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 23:02:28 +01:00
d2a4e4a0a1 grafana: add storage query performance panels to apiary dashboard
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 22:47:30 +01:00
28eba49d68 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs-unstable':
    'github:nixos/nixpkgs/8c809a146a140c5c8806f13399592dbcb1bb5dc4?narHash=sha256-WGV2hy+VIeQsYXpsLjdr4GvHv5eECMISX1zKLTedhdg=' (2026-03-03)
  → 'github:nixos/nixpkgs/80bdc1e5ce51f56b19791b52b2901187931f5353?narHash=sha256-QKyJ0QGWBn6r0invrMAK8dmJoBYWoOWy7lN+UHzW1jc=' (2026-03-04)
2026-03-06 00:07:07 +00:00
4bf726a674 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/c581273b8d5bdf1c6ce7e0a54da9841e6a763913?narHash=sha256-ywy9troNEfpgh0Ee+zaV1UTgU8kYBVKtvPSxh6clYGU=' (2026-03-02)
  → 'github:nixos/nixpkgs/fabb8c9deee281e50b1065002c9828f2cf7b2239?narHash=sha256-YaHht/C35INEX3DeJQNWjNaTcPjYmBwwjFJ2jdtr+5U=' (2026-03-04)
2026-03-05 00:07:31 +00:00
774fd92524 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/1267bb4920d0fc06ea916734c11b0bf004bbe17e?narHash=sha256-7DaQVv4R97cii/Qdfy4tmDZMB2xxtyIvNGSwXBBhSmo=' (2026-02-25)
  → 'github:nixos/nixpkgs/c581273b8d5bdf1c6ce7e0a54da9841e6a763913?narHash=sha256-ywy9troNEfpgh0Ee+zaV1UTgU8kYBVKtvPSxh6clYGU=' (2026-03-02)
• Updated input 'nixpkgs-unstable':
    'github:nixos/nixpkgs/cf59864ef8aa2e178cccedbe2c178185b0365705?narHash=sha256-izhTDFKsg6KeVBxJS9EblGeQ8y+O8eCa6RcW874vxEc=' (2026-03-02)
  → 'github:nixos/nixpkgs/8c809a146a140c5c8806f13399592dbcb1bb5dc4?narHash=sha256-WGV2hy+VIeQsYXpsLjdr4GvHv5eECMISX1zKLTedhdg=' (2026-03-03)
2026-03-04 00:06:56 +00:00
55da459108 docs: add plan for local NTP with chrony
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 19:33:28 +01:00
813c5c0f29 monitoring: separate node-exporter-only external targets
Add nodeExporterOnly list to external-targets.nix for hosts that
have node-exporter but not systemd-exporter (e.g. pve1). This
prevents a down target in the systemd-exporter scrape job.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 19:17:39 +01:00
013ab8f621 monitoring: add pve1 node-exporter scrape target
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 19:10:54 +01:00
f75b773485 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs-unstable':
    'github:nixos/nixpkgs/dd9b079222d43e1943b6ebd802f04fd959dc8e61?narHash=sha256-I45esRSssFtJ8p/gLHUZ1OUaaTaVLluNkABkk6arQwE=' (2026-02-27)
  → 'github:nixos/nixpkgs/cf59864ef8aa2e178cccedbe2c178185b0365705?narHash=sha256-izhTDFKsg6KeVBxJS9EblGeQ8y+O8eCa6RcW874vxEc=' (2026-03-02)
2026-03-03 00:07:07 +00:00
58c3844950 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs-unstable':
    'github:nixos/nixpkgs/2fc6539b481e1d2569f25f8799236694180c0993?narHash=sha256-0MAd+0mun3K/Ns8JATeHT1sX28faLII5hVLq0L3BdZU=' (2026-02-23)
  → 'github:nixos/nixpkgs/dd9b079222d43e1943b6ebd802f04fd959dc8e61?narHash=sha256-I45esRSssFtJ8p/gLHUZ1OUaaTaVLluNkABkk6arQwE=' (2026-02-27)
2026-03-01 00:01:26 +00:00
80e5fa08fa flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/e764fc9a405871f1f6ca3d1394fb422e0a0c3951?narHash=sha256-sdaqdnsQCv3iifzxwB22tUwN/fSHoN7j2myFW5EIkGk=' (2026-02-24)
  → 'github:nixos/nixpkgs/1267bb4920d0fc06ea916734c11b0bf004bbe17e?narHash=sha256-7DaQVv4R97cii/Qdfy4tmDZMB2xxtyIvNGSwXBBhSmo=' (2026-02-25)
2026-02-28 00:07:22 +00:00
cf55d07ce5 docs: update pn51 stability with third freeze and conclusion
pn02 crashed again after ~2d21h uptime despite all mitigations
(amdgpu blacklist, max_cstate=1, NMI watchdog, rasdaemon).
NMI watchdog didn't fire and rasdaemon recorded nothing,
confirming hard lockup below NMI level. Unit is unreliable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 18:25:52 +01:00
4941e38dac flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/afbbf774e2087c3d734266c22f96fca2e78d3620?narHash=sha256-nhZJPnBavtu40/L2aqpljrfUNb2rxmWTmSjK2c9UKds=' (2026-02-21)
  → 'github:nixos/nixpkgs/e764fc9a405871f1f6ca3d1394fb422e0a0c3951?narHash=sha256-sdaqdnsQCv3iifzxwB22tUwN/fSHoN7j2myFW5EIkGk=' (2026-02-24)
• Updated input 'nixpkgs-unstable':
    'github:nixos/nixpkgs/0182a361324364ae3f436a63005877674cf45efb?narHash=sha256-0NBlEBKkN3lufyvFegY4TYv5mCNHbi5OmBDrzihbBMQ=' (2026-02-17)
  → 'github:nixos/nixpkgs/2fc6539b481e1d2569f25f8799236694180c0993?narHash=sha256-0MAd+0mun3K/Ns8JATeHT1sX28faLII5hVLq0L3BdZU=' (2026-02-23)
2026-02-25 00:07:00 +00:00
03ffcc1ad0 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/c217913993d6c6f6805c3b1a3bda5e639adfde6d?narHash=sha256-D1PA3xQv/s4W3lnR9yJFSld8UOLr0a/cBWMQMXS+1Qg=' (2026-02-20)
  → 'github:nixos/nixpkgs/afbbf774e2087c3d734266c22f96fca2e78d3620?narHash=sha256-nhZJPnBavtu40/L2aqpljrfUNb2rxmWTmSjK2c9UKds=' (2026-02-21)
2026-02-24 00:01:35 +00:00
5e92eb3220 docs: add plan for NixOS OpenStack image
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 00:42:19 +01:00
2321e191a2 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/6d41bc27aaf7b6a3ba6b169db3bd5d6159cfaa47?narHash=sha256-bxAlQgre3pcQcaRUm/8A0v/X8d2nhfraWSFqVmMcBcU=' (2026-02-18)
  → 'github:nixos/nixpkgs/c217913993d6c6f6805c3b1a3bda5e639adfde6d?narHash=sha256-D1PA3xQv/s4W3lnR9yJFSld8UOLr0a/cBWMQMXS+1Qg=' (2026-02-20)
2026-02-23 00:01:30 +00:00
136116ab33 pn02: limit CPU to C1 power state for stability
Known PN51 platform issue with deep C-states causing freezes.
Limit to C1 to prevent deeper sleep states.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:58:41 +01:00
c8cadd09c5 pn51: document diagnostic config (rasdaemon, NMI watchdog, panic)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:52:34 +01:00
72acaa872b pn02: add panic on lockup, NMI watchdog, and rasdaemon
Enable kernel panic on soft/hard lockups with auto-reboot after
10s, and rasdaemon for hardware error logging. Should give us
diagnostic data on the next freeze.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:48:21 +01:00
a7c1ce932d pn51: add remaining debug steps and auto-recovery fallback
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:38:17 +01:00
2b42145d94 pn51: document BIOS tweaks, second pn02 freeze, amdgpu blacklist
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:28:19 +01:00
05e8556bda pn02: blacklist amdgpu kernel module for stability testing
pn02 continues to hard freeze with no log evidence. Blacklisting
the GPU driver to eliminate GPU/PSP firmware interactions as a
possible cause. Console output will be lost but the host is
managed over SSH.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:27:05 +01:00
75fdd7ae40 pn51: document stress test pass and TSC runtime test failure
Both units survived 1h stress test at 80-85C. TSC clocksource
is genuinely unstable at runtime (not just boot), HPET is the
correct fallback for this platform.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 11:52:34 +01:00
5346889b73 pn51: add TSC runtime switch test to next steps
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 11:50:30 +01:00
7e19f51dfa nix: move experimental-features to system/nix.nix
All hosts had identical nix-command/flakes settings in their
configuration.nix. Centralize in system/nix.nix so new hosts
(like pn01/pn02) get it automatically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 10:27:53 +01:00
9f7aab86a0 pn51: update stability notes, TSC/PSP issues affect both units
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 09:25:28 +01:00
bb53b922fa plans: add NixOS hypervisor plan (Incus on PN51s)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 00:47:09 +01:00
75cd7c6c2d docs: add PN51 stability testing notes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 00:24:28 +01:00
72c3a938b0 hosts: enable vault on pn01 and pn02
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 23:56:05 +01:00
2f89d564f7 vault: add approles for pn01/pn02, fix provision playbook
Add pn01 and pn02 to hosts-generated.tf for Vault AppRole access.

Fix provision-approle.yml: the localhost play was skipped when using
-l filter, since localhost didn't match the target. Merged into a
single play using delegate_to: localhost for the bao commands.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 23:51:56 +01:00
4a83363ee5 hosts: add pn01 and pn02 (ASUS PN51 mini PCs)
Add two ASUS PN51 hosts on VLAN 12 for stability testing.
pn01 at 10.69.12.60, pn02 at 10.69.12.61, both test-tier compute role.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 23:37:14 +01:00
b578520905 media-pc: add JellyCon, display server, and HDR decisions
Decided on Kodi + JellyCon with NFS direct path for media playback,
Sway/Hyprland for display server with workspace-based browser switching,
and noted HDR status for future reference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 00:08:19 +01:00
8a5aa1c4f5 plans: add media PC replacement plan, update router hardware candidates
New plan for replacing the media PC (i7-4770K/Ubuntu) with a NixOS mini PC
running Kodi. Router plan updated with specific AliExpress hardware options
and IDS/IPS considerations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 23:54:29 +01:00
0f8c4783a8 truenas-migration: drive trays ordered, resolve open question
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 19:29:12 +01:00
43 changed files with 1500 additions and 104 deletions

View File

```diff
@@ -23,14 +23,12 @@
     when: ansible_play_hosts | length != 1
     run_once: true

-- name: Fetch AppRole credentials from OpenBao
-  hosts: localhost
-  connection: local
+- name: Provision AppRole credentials
+  hosts: all
   gather_facts: false
   vars:
-    target_host: "{{ groups['all'] | first }}"
-    target_hostname: "{{ hostvars[target_host]['short_hostname'] | default(target_host.split('.')[0]) }}"
+    target_hostname: "{{ inventory_hostname.split('.')[0] }}"
   tasks:
     - name: Display target host
@@ -45,6 +43,7 @@
         BAO_SKIP_VERIFY: "1"
       register: role_id_result
       changed_when: false
+      delegate_to: localhost

     - name: Generate secret-id for host
       ansible.builtin.command:
@@ -54,21 +53,8 @@
         BAO_SKIP_VERIFY: "1"
       register: secret_id_result
       changed_when: true
+      delegate_to: localhost

-    - name: Store credentials for next play
-      ansible.builtin.set_fact:
-        vault_role_id: "{{ role_id_result.stdout }}"
-        vault_secret_id: "{{ secret_id_result.stdout }}"
-
-- name: Deploy AppRole credentials to host
-  hosts: all
-  gather_facts: false
-  vars:
-    vault_role_id: "{{ hostvars['localhost']['vault_role_id'] }}"
-    vault_secret_id: "{{ hostvars['localhost']['vault_secret_id'] }}"
-  tasks:
     - name: Create AppRole directory
       ansible.builtin.file:
         path: /var/lib/vault/approle
@@ -79,7 +65,7 @@
     - name: Write role-id
       ansible.builtin.copy:
-        content: "{{ vault_role_id }}"
+        content: "{{ role_id_result.stdout }}"
        dest: /var/lib/vault/approle/role-id
        mode: "0600"
        owner: root
@@ -87,7 +73,7 @@
     - name: Write secret-id
       ansible.builtin.copy:
-        content: "{{ vault_secret_id }}"
+        content: "{{ secret_id_result.stdout }}"
        dest: /var/lib/vault/approle/secret-id
        mode: "0600"
        owner: root
```

View File

@@ -0,0 +1,79 @@
# Local NTP with Chrony
## Overview/Goal
Set up pve1 as a local NTP server and switch all NixOS VMs from systemd-timesyncd to chrony, pointing at pve1 as the sole time source. This eliminates clock drift issues that cause false `host_reboot` alerts.
## Current State
- All NixOS hosts use `systemd-timesyncd` with default NixOS pool servers (`0.nixos.pool.ntp.org` etc.)
- No NTP/timesyncd configuration exists in the repo — all defaults
- pve1 (Proxmox, bare metal) already runs chrony but only as a client
- VMs drift noticeably — ns1 (~19ms) and jelly01 (~39ms) are worst offenders
- Clock step corrections from timesyncd trigger false `host_reboot` alerts via `changes(node_boot_time_seconds[10m]) > 0`
- pve1 itself stays at 0ms offset thanks to chrony
## Why systemd-timesyncd is Insufficient
- Minimal SNTP client, no proper clock discipline or frequency tracking
- Backs off polling interval when it thinks clock is stable, missing drift
- Corrects via step adjustments rather than gradual slewing, causing metric jumps
- Each VM resolves to different pool servers with varying accuracy
## Implementation Steps
### 1. Configure pve1 as NTP Server
Add to pve1's `/etc/chrony/chrony.conf`:
```
# Allow NTP clients from the infrastructure subnet
allow 10.69.13.0/24
```
Restart chrony on pve1.
### 2. Add Chrony to NixOS System Config
Create `system/chrony.nix` (applied to all hosts via system imports):
```nix
{
  # Disable systemd-timesyncd (chrony takes over)
  services.timesyncd.enable = false;

  # Enable chrony pointing at pve1
  services.chrony = {
    enable = true;
    servers = [ "pve1.home.2rjus.net" ];
    serverOption = "iburst";
  };
}
```
### 3. Optional: Add Chrony Exporter
For better visibility into NTP sync quality:
```nix
services.prometheus.exporters.chrony.enable = true;
```
Add chrony exporter scrape targets via `homelab.monitoring.scrapeTargets` and create a Grafana dashboard for NTP offset across all hosts.
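As a sketch only — the real schema of `homelab.monitoring.scrapeTargets` is defined by this repo's own module and may differ; the host name and port (9123, the chrony exporter's usual default) are assumptions:

```nix
{
  # Hypothetical shape — check the homelab.monitoring module for the
  # actual option schema before using this.
  homelab.monitoring.scrapeTargets = [
    {
      job = "chrony";
      targets = [ "pn01.home.2rjus.net:9123" ];
    }
  ];
}
```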
### 4. Roll Out
- Deploy to a test-tier host first to verify
- Then deploy to all hosts via auto-upgrade
## Open Questions
- [ ] Does pve1's chrony config need `local stratum 10` as fallback if upstream is unreachable?
- [ ] Should we also enable `enableRTCTrimming` for the VMs?
- [ ] Worth adding a chrony exporter on pve1 as well (manual install like node-exporter)?
## Notes
- No fallback NTP servers needed on VMs — if pve1 is down, all VMs are down too
- The `host_reboot` alert rule (`changes(node_boot_time_seconds[10m]) > 0`) should stop false-firing once clock corrections are slewed instead of stepped
- pn01/pn02 are bare metal but still benefit from syncing to pve1 for consistency
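For reference, the alert mentioned above could be written as a standard Prometheus rule; only the `expr` comes from this document — the group name, alert name, and annotation are illustrative:

```yaml
groups:
  - name: host-alerts # group/alert names are assumptions
    rules:
      - alert: HostReboot
        expr: changes(node_boot_time_seconds[10m]) > 0
        annotations:
          summary: "{{ $labels.instance }} boot time changed (reboot or clock step)"
```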

View File

@@ -0,0 +1,244 @@
# Media PC Replacement
## Overview
Replace the aging Linux+Kodi media PC connected to the TV with a modern, compact solution. Primary use cases are Jellyfin/Kodi playback and watching Twitch/YouTube. The current machine (`media`, 10.69.31.50) is on VLAN 31.
## Current State
### Hardware
- **CPU**: Intel Core i7-4770K @ 3.50GHz (Haswell, 4C/8T, 2013)
- **GPU**: Nvidia GeForce GT 710 (Kepler, GK208B)
- **OS**: Ubuntu 22.04.5 LTS (Jammy)
- **Software**: Kodi
- **Network**: `media.home.2rjus.net` at `10.69.31.50` (VLAN 31)
### Control & Display
- **Input**: Wireless keyboard (works well, useful for browser)
- **TV**: 1080p (no 4K/HDR currently, but may upgrade TV later)
- **Audio**: Surround system connected via HDMI ARC from TV (PC → HDMI → TV → ARC → surround)
### Notes on Current Hardware
- The i7-4770K is massively overpowered for media playback — it's a full desktop CPU from 2013
- The GT 710 is a low-end passive GPU; supports NVDEC for H.264/H.265 hardware decode but limited to 4K@30Hz over HDMI 1.4
- Ubuntu 22.04 is approaching EOL (April 2027) and is not managed by this repo
- The whole system is likely in a full-size or mid-tower case — not ideal for a TV setup
### Integration
- **Media source**: Jellyfin on `jelly01` (10.69.13.14) serves media from NAS via NFS
- **DNS**: A record in `services/ns/external-hosts.nix`
- **Not managed**: Not a NixOS host in this repo, no monitoring/auto-updates
## Options
### Option 1: Dedicated Streaming Device (Apple TV / Nvidia Shield)
| Aspect | Apple TV 4K | Nvidia Shield Pro |
|--------|-------------|-------------------|
| **Price** | ~$130-180 | ~$200 |
| **Jellyfin** | Swiftfin app (good) | Jellyfin Android TV (good) |
| **Kodi** | Not available (tvOS) | Full Kodi support |
| **Twitch** | Native app | Native app |
| **YouTube** | Native app | Native app |
| **HDR/DV** | Dolby Vision + HDR10 | Dolby Vision + HDR10 |
| **4K** | Yes | Yes |
| **Form factor** | Tiny, silent | Small, silent |
| **Remote** | Excellent Siri remote | Decent, supports CEC |
| **Homelab integration** | None | Minimal (Plex/Kodi only) |
**Pros:**
- Zero maintenance - appliance experience
- Excellent app ecosystem (native Twitch, YouTube, streaming services)
- Silent, tiny form factor
- Great remote control / CEC support
- Hardware-accelerated codec support out of the box
**Cons:**
- No NixOS management, monitoring, or auto-updates
- Can't run arbitrary software
- Jellyfin clients are decent but not as mature as Kodi
- Vendor lock-in (Apple ecosystem / Google ecosystem)
- No SSH access for troubleshooting
### Option 2: NixOS Mini PC (Kodi Appliance)
A small form factor PC (Intel NUC, Beelink, MinisForum, etc.) running NixOS with Kodi as the desktop environment.
**NixOS has built-in support:**
- `services.xserver.desktopManager.kodi.enable` - boots directly into Kodi
- `kodi-gbm` package - Kodi with direct DRM/KMS rendering (no X11/Wayland needed)
- `kodiPackages.jellycon` - Jellyfin integration for Kodi
- `kodiPackages.sendtokodi` - plays streams via yt-dlp (Twitch, YouTube)
- `kodiPackages.inputstream-adaptive` - adaptive streaming support
**Example NixOS config sketch:**
```nix
{ pkgs, ... }:
{
  services.xserver.desktopManager.kodi = {
    enable = true;
    package = pkgs.kodi.withPackages (p: [
      p.jellycon
      p.sendtokodi
      p.inputstream-adaptive
    ]);
  };

  # Auto-login to Kodi session
  services.displayManager.autoLogin = {
    enable = true;
    user = "kodi";
  };
}
```
**Pros:**
- Full NixOS management (monitoring, auto-updates, vault, promtail)
- Kodi is a proven TV interface with excellent remote/CEC support
- JellyCon integrates Jellyfin library directly into Kodi
- Twitch/YouTube via sendtokodi + yt-dlp or Kodi browser addons
- Can run arbitrary services (e.g., Home Assistant dashboard)
- Declarative, reproducible config in this repo
**Cons:**
- More maintenance than an appliance
- NixOS + Kodi on bare metal needs GPU driver setup (Intel iGPU is usually fine)
- Kodi YouTube/Twitch addons are less polished than native apps
- Need to buy hardware (~$150-400 for a decent mini PC)
- Power consumption higher than a streaming device
### Option 3: NixOS Mini PC (Wayland Desktop)
A mini PC running NixOS with a lightweight Wayland compositor, launching Kodi for media and a browser for Twitch/YouTube.
**Pros:**
- Best of both worlds: Kodi for media, Firefox/Chromium for Twitch/YouTube
- Full NixOS management
- Can switch between Kodi and browser easily
- Native web experience for streaming sites
**Cons:**
- More complex setup (compositor + Kodi + browser)
- Harder to get a good "10-foot UI" experience
- Keyboard/mouse may be needed alongside remote
- Significantly more maintenance
## Comparison
| Criteria | Dedicated Device | NixOS Kodi | NixOS Desktop |
|----------|-----------------|------------|---------------|
| **Maintenance** | None | Low | Medium |
| **Media experience** | Excellent | Excellent | Good |
| **Twitch/YouTube** | Excellent (native apps) | Good (addons/yt-dlp) | Excellent (browser) |
| **Homelab integration** | None | Full | Full |
| **Form factor** | Tiny | Small | Small |
| **Cost** | $130-200 | $150-400 | $150-400 |
| **Silent operation** | Yes | Likely (fanless options) | Likely |
| **CEC remote** | Yes | Yes (Kodi) | Partial |
## Decision: NixOS Mini PC with Kodi (Option 2)
**Rationale:**
- Already comfortable with Kodi + wireless keyboard workflow
- Browser access for Twitch/YouTube is important — Kodi can launch a browser when needed
- Homelab integration comes for free (monitoring, auto-updates, vault)
- Natural fit alongside the other 16 NixOS hosts in this repo
- Dedicated devices lose the browser/keyboard workflow
### Display Server: Sway/Hyprland
Options evaluated:
| Approach | Pros | Cons |
|----------|------|------|
| Cage (kiosk) | Simplest, single-app | No browser without TTY switching |
| kodi-gbm (no compositor) | Best HDR support | No browser at all, ALSA-only audio |
| **Sway/Hyprland** | **Workspace switching, VA-API in browser** | **Slightly more config** |
| Full DE (GNOME/KDE) | Everything works | Overkill, heavy |
**Decision: Sway or Hyprland** (Hyprland preferred — same as desktop)
- Kodi fullscreen on workspace 1, Firefox on workspace 2
- Switch via keybinding on wireless keyboard
- Auto-start both on login via greetd
- Minimal config — no bar, no decorations, just workspaces
- VA-API hardware decode works in Firefox on Wayland (important for YouTube/Twitch)
- Can revisit kodi-gbm later if HDR becomes a priority (just a config change)
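A minimal sketch of the Sway side of this setup. The window identifiers and keybindings are assumptions (Kodi may appear via XWayland with class `Kodi`, Firefox natively with app_id `firefox`), not tested config:

```
# ~/.config/sway/config (sketch)
# Pin each app to its workspace; identifiers are assumptions
assign [class="Kodi"] workspace number 1
assign [app_id="firefox"] workspace number 2

# Start both on login
exec kodi
exec firefox

# Switch from the wireless keyboard
bindsym Mod4+1 workspace number 1
bindsym Mod4+2 workspace number 2
```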
### Twitch/YouTube
Firefox on workspace 2, switched to via keyboard. Kodi addons (sendtokodi, YouTube plugin) available as secondary options but a real browser is the primary approach.
### Media Playback: Kodi + JellyCon + NFS Direct Path
Three options were evaluated for media playback:
| Approach | Transcoding | Library management | Watch state sync |
|----------|-------------|-------------------|-----------------|
| Jellyfin only (browser) | Yes — browsers lack codec support for DTS, PGS subs, etc. | Jellyfin | Jellyfin |
| Kodi + NFS only | No — Kodi plays everything natively | Kodi local DB | None |
| **Kodi + JellyCon + NFS** | **No — Kodi's native player, direct path via NFS** | **Jellyfin** | **Jellyfin** |
**Decision: Kodi + JellyCon with NFS direct path**
- JellyCon presents the Jellyfin library inside Kodi's UI (browse, search, metadata, artwork)
- Playback uses Kodi's native player — direct play, no transcoding, full codec support including surround passthrough
- JellyCon's "direct path" mode maps Jellyfin paths to local NFS mounts, so playback goes straight over NFS without streaming through Jellyfin's HTTP layer
- Watch state, resume position, etc. sync back to Jellyfin — accessible from other devices too
- NFS mount follows the same pattern as jelly01 (`nas.home.2rjus.net:/mnt/hdd-pool/media`)
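The NFS mount could look like the sketch below; only the export path comes from this plan — the local mount point `/mnt/media` and the mount options are assumptions, so match whatever jelly01 actually uses:

```nix
{
  fileSystems."/mnt/media" = {
    device = "nas.home.2rjus.net:/mnt/hdd-pool/media";
    fsType = "nfs";
    # Options are illustrative
    options = [ "ro" "noatime" "x-systemd.automount" ];
  };
}
```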
### Audio Passthrough
Kodi on NixOS supports HDMI audio passthrough for surround formats (AC3, DTS, etc.). The ARC chain (PC → HDMI → TV → ARC → surround) works transparently — Kodi just needs to be configured for passthrough rather than decoding audio locally.
## Hardware
### Leading Candidate: GMKtec G3
- **CPU**: Intel N100 (Alder Lake-N, 4C/4T)
- **RAM**: 16GB
- **Storage**: 512GB NVMe
- **Price**: ~NOK 2800 (~$250 USD)
- **Source**: AliExpress
The N100 supports hardware decode for all relevant 4K codecs:
| Codec | Support | Used by |
|-------|---------|---------|
| H.264/AVC | Yes (Quick Sync) | Older media |
| H.265/HEVC 10-bit | Yes (Quick Sync) | Most 4K media, HDR |
| VP9 | Yes (Quick Sync) | YouTube 4K |
| AV1 | Yes (Quick Sync) | YouTube, Twitch, newer encodes |
16GB RAM is comfortable for Kodi + browser + NixOS system services (node-exporter, promtail, etc.) with plenty of headroom.
### Key Requirements
- HDMI 2.0+ for 4K future-proofing (current TV is 1080p)
- Hardware video decode via VA-API / Intel Quick Sync
- HDR support (for future TV upgrade)
- Fanless or near-silent operation
## Implementation Steps
1. **Choose and order hardware**
2. **Create host configuration** (`hosts/media1/`)
- Kodi desktop manager with Jellyfin + streaming addons
- Intel/AMD iGPU driver and VA-API hardware decode
- HDMI audio passthrough for surround
- NFS mount for media (same pattern as jelly01)
- Browser package (Firefox/Chromium) for Twitch/YouTube fallback
- Standard system modules (monitoring, promtail, vault, auto-upgrade)
3. **Install NixOS** on the mini PC
4. **Configure Kodi** (Jellyfin server, addons, audio passthrough)
5. **Update DNS** - point `media.home.2rjus.net` to new IP (or keep on VLAN 31)
6. **Retire old media PC**
## Open Questions
- [x] What are the current media PC specs? — i7-4770K, GT 710, Ubuntu 22.04. Overkill CPU, weak GPU, large form factor. Not worth reusing if goal is compact/silent.
- [x] VLAN? — Keep on VLAN 31 for now, same as current media PC. Can revisit later.
- [x] Is CEC needed? — No, not using it currently. Can add later if desired.
- [x] Is 4K HDR output needed? — TV is 1080p now, but want 4K/HDR capability for future TV upgrade
- [x] Audio setup? — Surround system via HDMI ARC from TV. Media PC outputs HDMI to TV, TV passes audio to surround via ARC. Kodi/any player just needs HDMI audio output with surround passthrough.
- [x] Are there streaming service apps needed? — No. Only Twitch/YouTube, which work fine in any browser.
- [x] Budget? — ~NOK 2800 for GMKtec G3 (N100, 16GB, 512GB NVMe)

View File

@@ -0,0 +1,232 @@
# NixOS Hypervisor
## Overview
Experiment with running a NixOS-based hypervisor as an alternative/complement to the current Proxmox setup. Goal is better homelab integration — declarative config, monitoring, auto-updates — while retaining the ability to run VMs with a Terraform-like workflow.
## Motivation
- Proxmox works but doesn't integrate with the NixOS-managed homelab (no monitoring, no auto-updates, no vault, no declarative config)
- The PN51 units (once stable) are good candidates for experimentation — test-tier, plenty of RAM (32-64GB), 8C/16T
- Long-term: could reduce reliance on Proxmox or provide a secondary hypervisor pool
- **VM migration**: Currently all VMs (including both nameservers) run on a single Proxmox host. Being able to migrate VMs between hypervisors would allow rebooting a host for kernel updates without downtime for critical services like DNS.
## Hardware Candidates
| | pn01 | pn02 |
|---|---|---|
| **CPU** | Ryzen 7 5700U (8C/16T) | Ryzen 7 5700U (8C/16T) |
| **RAM** | 64GB (2x32GB) | 32GB (1x32GB, second slot available) |
| **Storage** | 1TB NVMe | 1TB SATA SSD (NVMe planned) |
| **Status** | Stability testing | Stability testing |
## Options
### Option 1: Incus
Community fork of LXD (created after Canonical took LXD in-house and relicensed it). Supports both containers (LXC) and VMs (QEMU/KVM).
**NixOS integration:**
- `virtualisation.incus.enable` module in nixpkgs
- Manages storage pools, networks, and instances
- REST API for automation
- CLI tool (`incus`) for management
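For flavor, enabling it on a host in this flake could look roughly like this (a sketch; `virtualisation.incus.enable` and the `incus-admin` group come from the nixpkgs module, the user name is a placeholder):

```nix
{ config, pkgs, ... }:
{
  virtualisation.incus.enable = true;

  # Hypothetical admin user — members of incus-admin can manage
  # instances without sudo.
  users.users.admin.extraGroups = [ "incus-admin" ];

  # Incus manages its own firewall rules via nftables.
  networking.nftables.enable = true;
}
```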
**Terraform integration:**
- `lxd` provider works with Incus (API-compatible)
- Dedicated `incus` Terraform provider also exists
- Can define VMs/containers in OpenTofu, similar to current Proxmox workflow
**Migration:**
- Built-in live and offline migration via `incus move <instance> --target <host>`
- Clustering makes hosts aware of each other — migration is a first-class operation
- Shared storage (NFS, Ceph) or Incus can transfer storage during migration
- Stateful stop-and-move also supported for offline migration
**Pros:**
- Supports both containers and VMs
- REST API + CLI for automation
- Built-in clustering and migration — closest to Proxmox experience
- Good NixOS module support
- Image-based workflow (can build NixOS images and import)
- Active development and community
**Cons:**
- Another abstraction layer on top of QEMU/KVM
- Less mature Terraform provider than libvirt
- Container networking can be complex
- NixOS guests in Incus VMs need some setup
### Option 2: libvirt/QEMU
Standard Linux virtualization stack. Thin wrapper around QEMU/KVM.
**NixOS integration:**
- `virtualisation.libvirtd.enable` module in nixpkgs
- Mature and well-tested
- virsh CLI for management
**Terraform integration:**
- `dmacvicar/libvirt` provider — mature, well-maintained
- Supports cloud-init, volume management, network config
- Very similar workflow to current Proxmox+OpenTofu setup
- Can reuse cloud-init patterns from existing `terraform/` config
**Migration:**
- Supports live and offline migration via `virsh migrate`
- Requires shared storage (NFS, Ceph, or similar) for live migration
- Requires matching CPU models between hosts (or CPU model masking)
- Works but is manual — no cluster awareness, must specify target URI
- No built-in orchestration for multi-host scenarios
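As a sketch of that manual workflow, a live migration between two hosts looks roughly like this (hostnames are placeholders; assumes shared storage and SSH access between the nodes):

```shell
# On the source host: live-migrate "testvm" to pn02 over SSH.
# --persistent keeps the domain defined on the target;
# --undefinesource removes the definition from the source.
virsh migrate --live --persistent --undefinesource \
  testvm qemu+ssh://pn02/system
```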
**Pros:**
- Closest to current Proxmox+Terraform workflow
- Most mature Terraform provider
- Minimal abstraction — direct QEMU/KVM management
- Well-understood, massive community
- Cloud-init works identically to Proxmox workflow
- Can reuse existing template-building patterns
**Cons:**
- VMs only (no containers without adding LXC separately)
- No built-in REST API (would need to expose libvirt socket)
- No web UI without adding cockpit or virt-manager
- Migration works but requires manual setup — no clustering, no orchestration
- Less feature-rich than Incus for multi-host scenarios
### Option 3: microvm.nix
NixOS-native microVM framework. VMs defined as NixOS modules in the host's flake.
**NixOS integration:**
- VMs are NixOS configurations in the same flake
- Supports multiple backends: cloud-hypervisor, QEMU, firecracker, kvmtool
- Lightweight — shares host's nix store with guests via virtiofs
- Declarative network, storage, and resource allocation
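A guest definition might look roughly like this (a sketch based on microvm.nix's documented option shape; names and values should be verified against the project README):

```nix
# Hypothetical sketch: a microVM defined inside the host's flake config.
microvm.vms.test-vm = {
  config = {
    microvm = {
      hypervisor = "cloud-hypervisor";
      vcpu = 2;
      mem = 2048;  # MB
      # Share the host's /nix/store read-only via virtiofs.
      shares = [{
        proto = "virtiofs";
        tag = "ro-store";
        source = "/nix/store";
        mountPoint = "/nix/.ro-store";
      }];
    };
    networking.hostName = "test-vm";
  };
};
```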
**Terraform integration:**
- None — everything is defined in Nix
- Fundamentally different workflow from current Proxmox+Terraform approach
**Pros:**
- Most NixOS-native approach
- VMs defined right alongside host configs in this repo
- Very lightweight — fast boot, minimal overhead
- Shares nix store with host (no duplicate packages)
- No cloud-init needed — guest config is part of the flake
**Migration:**
- No migration support — VMs are tied to the host's NixOS config
- Moving a VM means rebuilding it on another host
**Cons:**
- Very niche, smaller community
- Different mental model from current workflow
- Only NixOS guests (no Ubuntu, FreeBSD, etc.)
- No Terraform integration
- No migration support
- Less isolation than full QEMU VMs
- Would need to learn a new deployment pattern
## Comparison
| Criteria | Incus | libvirt | microvm.nix |
|----------|-------|---------|-------------|
| **Workflow similarity** | Medium | High | Low |
| **Terraform support** | Yes (lxd/incus provider) | Yes (mature provider) | No |
| **NixOS module** | Yes | Yes | Yes |
| **Containers + VMs** | Both | VMs only | VMs only |
| **Non-NixOS guests** | Yes | Yes | No |
| **Live migration** | Built-in (first-class) | Yes (manual setup) | No |
| **Offline migration** | Built-in | Yes (manual setup) | No (rebuild) |
| **Clustering** | Built-in | Manual | No |
| **Learning curve** | Medium | Low | Medium |
| **Community/maturity** | Growing | Very mature | Niche |
| **Overhead** | Low | Minimal | Minimal |
## Recommendation
Start with **Incus**. Migration and clustering are key requirements:
- Built-in clustering makes two PN51s a proper hypervisor pool
- Live and offline migration are first-class operations, similar to Proxmox
- Can move VMs between hosts for maintenance (kernel updates, hardware work) without downtime
- Supports both containers and VMs — flexibility for future use
- Terraform provider exists (less mature than libvirt's, but functional)
- REST API enables automation beyond what Terraform covers
libvirt could achieve similar results but requires significantly more manual setup for migration and has no clustering awareness. For a two-node setup where migration is a priority, Incus provides much more out of the box.
**microvm.nix** is off the table given the migration requirement.
## Implementation Plan
### Phase 1: Single-Node Setup (on one PN51)
1. Enable `virtualisation.incus` on pn01 (or whichever is stable)
2. Initialize Incus (`incus admin init`) — configure storage pool (local NVMe) and network bridge
3. Configure bridge networking for VM traffic on VLAN 12
4. Build a NixOS VM image and import it into Incus
5. Create a test VM manually with `incus launch` to validate the setup
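Steps 2-5 above can be sketched on the CLI like this (the image import invocation in particular depends on what the NixOS image build produces and should be checked against the Incus docs; file names are placeholders):

```shell
# One-time initialization: storage pool on local NVMe, managed bridge.
incus admin init

# Import a locally built NixOS VM image under an alias
# (split metadata + disk image form).
incus image import ./nixos-metadata.tar.xz ./nixos.qcow2 --alias nixos-base

# Launch a test VM from it and check that it boots.
incus launch nixos-base test-vm --vm
incus list
```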
### Phase 2: Two-Node Cluster (PN51s only)
1. Enable Incus on the second PN51
2. Form a cluster between both nodes
3. Configure shared storage (NFS from NAS, or Ceph if warranted)
4. Test offline migration: `incus move <vm> --target <other-node>`
5. Test live migration with shared storage
6. CPU compatibility is not an issue here — both nodes have identical Ryzen 7 5700U CPUs
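The clustering and migration steps might look like this (a sketch; node names match the hardware table, but the exact join flow goes through `incus admin init` on the joining node):

```shell
# On pn01: convert the standalone server into a single-node cluster.
incus cluster enable pn01

# Generate a join token for the second node; pn02 consumes it
# during its own `incus admin init`.
incus cluster add pn02

# Offline migration: stop, then move to the other cluster member.
incus stop test-vm
incus move test-vm --target pn02

# Live migration of a running VM (shared storage or stateful
# migration support required).
incus move test-vm --target pn01
```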
### Phase 3: Terraform Integration
1. Add Incus Terraform provider to `terraform/`
2. Define a test VM in OpenTofu (cloud-init, static IP, vault provisioning)
3. Verify the full pipeline: tofu apply -> VM boots -> cloud-init -> vault credentials -> NixOS rebuild
4. Compare workflow with existing Proxmox pipeline
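Step 2 could be sketched in OpenTofu roughly as follows (resource and attribute names assumed from the `incus` Terraform provider's docs; the image alias and limits are placeholders):

```hcl
resource "incus_instance" "test" {
  name  = "tofu-test"
  image = "nixos-base"
  type  = "virtual-machine"

  config = {
    "limits.cpu"    = "2"
    "limits.memory" = "4GiB"
  }
}
```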
### Phase 4: Evaluate and Expand
- Is the workflow comparable to Proxmox?
- Migration reliability — does live migration work cleanly?
- Performance overhead acceptable on Ryzen 5700U?
- Worth migrating some test-tier VMs from Proxmox?
- Could ns1/ns2 run on separate Incus nodes instead of the single Proxmox host?
### Phase 5: Proxmox Replacement (optional)
If Incus works well on the PN51s, consider replacing Proxmox entirely for a three-node cluster.
**CPU compatibility for mixed cluster:**
| Node | CPU | Architecture | x86-64-v3 |
|------|-----|-------------|-----------|
| Proxmox host | AMD Ryzen 9 3900X (12C/24T) | Zen 2 | Yes |
| pn01 | AMD Ryzen 7 5700U (8C/16T) | Zen 3 | Yes |
| pn02 | AMD Ryzen 7 5700U (8C/16T) | Zen 3 | Yes |
All three CPUs are AMD and support `x86-64-v3`. The 3900X (Zen 2) is the oldest, so it defines the feature ceiling — but `x86-64-v3` is well within its capabilities. VMs configured with `x86-64-v3` can migrate freely between all three nodes.
Being all-AMD also avoids the trickier Intel/AMD cross-vendor migration edge cases (different CPUID layouts, virtualization extensions).
The 3900X (12C/24T) would be the most powerful node, making it the natural home for heavier workloads, with the PN51s (8C/16T each) handling lighter VMs or serving as migration targets during maintenance.
Steps:
1. Install NixOS + Incus on the Proxmox host (or a replacement machine)
2. Join it to the existing Incus cluster with `x86-64-v3` CPU baseline
3. Migrate VMs from Proxmox to the Incus cluster
4. Decommission Proxmox
## Prerequisites
- [ ] PN51 units pass stability testing (see `pn51-stability.md`)
- [ ] Decide which unit to use first (pn01 preferred — 64GB RAM, NVMe, currently more stable)
## Open Questions
- How to handle VM storage? Local NVMe, NFS from NAS, or Ceph between the two nodes?
- Network topology: bridge on VLAN 12, or trunk multiple VLANs to the PN51?
- Should VMs be on the same VLAN as the hypervisor host, or separate?
- Incus clustering with only two nodes — any quorum issues? Three nodes (with Proxmox replacement) would solve this
- How to handle NixOS guest images? Build with nixos-generators, or use Incus image builder?
- ~~What CPU does the current Proxmox host have?~~ AMD Ryzen 9 3900X (Zen 2) — `x86-64-v3` confirmed, all-AMD cluster
- If replacing Proxmox: migrate VMs first, or fresh start and rebuild?

View File

@@ -42,10 +42,24 @@ Needs a small x86 box with:
- 4-8 GB RAM (plenty for routing + DHCP + NetFlow accounting)
- Low power consumption, fanless preferred for always-on use
**Leading candidate:** [Topton Solid Mini PC](https://www.aliexpress.com/item/1005008981218625.html) with Intel i3-N300 (8 E-cores), 2x10GbE SFP+ + 3x2.5GbE (~NOK 3000 barebones). The N300 gives headroom for ntopng DPI and potential Suricata IDS without being overkill.
### Hardware Alternatives
Domestic availability for firewall mini PCs is limited — likely ordering from AliExpress.
Key things to verify:
- NIC chipset: Intel i225-V/i226-V preferred over Realtek for Linux driver support
- RAM/storage: some listings are barebones, check what's included
- Import duties: factor in ~25% on top of listing price
| Option | NICs | Notes | Price |
|--------|------|-------|-------|
| [Topton Solid Firewall Router](https://www.aliexpress.com/item/1005008059819023.html) | 2x10GbE SFP+, 4x2.5GbE | No RAM/SSD, only Intel N150 available currently | ~NOK 2500 |
| [Topton Solid Mini PC](https://www.aliexpress.com/item/1005008981218625.html) | 2x10GbE SFP+, 3x2.5GbE | No RAM/SSD, only Intel i3-N300 available currently | ~NOK 3000 |
| [MINISFORUM MS-01](https://www.aliexpress.com/item/1005007308262492.html) | 2x10GbE SFP+, 2x2.5GbE | No RAM/SSD, i5-12600H | ~NOK 4500 |
The LAN port would carry a VLAN trunk to the MikroTik switch, with sub-interfaces for each VLAN. WAN port connects to the ISP uplink.
@@ -89,6 +103,12 @@ The router is treated differently from the rest of the fleet:
- nftables flow accounting or softflowd for NetFlow export - nftables flow accounting or softflowd for NetFlow export
- Export to future ntopng instance (see new-services.md) - Export to future ntopng instance (see new-services.md)
**IDS/IPS (future consideration):**
- Suricata for inline intrusion detection/prevention on the WAN interface
- Signature-based threat detection, protocol anomaly detection
- CPU-intensive — feasible at typical home internet speeds (500Mbps-1Gbps) on the N300
- Not a day-one requirement, but the hardware should support it
### Monitoring Integration
Since this is a NixOS host in the flake, it gets the standard monitoring stack for free:

View File

@@ -0,0 +1,104 @@
# NixOS OpenStack Image
## Overview
Build and upload a NixOS base image to the OpenStack cluster at work, enabling NixOS-based VPS instances to replace the current Debian+Podman setup. This image will serve as the foundation for multiple external services:
- **Forgejo** (replacing Gitea on docker2)
- **WireGuard gateway** (replacing docker2's tunnel role, feeding into the remote-access plan)
- Any future externally-hosted services
## Current State
- VPS hosting runs on an OpenStack cluster with a personal quota
- Current VPS (`docker2.t-juice.club`) runs Debian with Podman containers
- Homelab already has a working Proxmox image pipeline: `template2` builds via `nixos-rebuild build-image --image-variant proxmox`, deployed via Ansible
- nixpkgs has a built-in `openstack` image variant in the same `image.modules` system used for Proxmox
## Decisions
- **No cloud-init dependency** - SSH key baked into the image, no need for metadata service
- **No bootstrap script** - VPS deployments are infrequent; manual `nixos-rebuild` after first boot is fine
- **No Vault access** - secrets handled manually until WireGuard access is set up (see remote-access plan)
- **Separate from homelab services** - no logging/metrics integration initially; revisit after remote-access WireGuard is in place
- **Repo placement TBD** - keep in this flake for now for convenience, but external hosts may move to a separate flake later since they can't use most shared `system/` modules (no Vault, no internal DNS, no Promtail)
- **OpenStack CLI in devshell** - add `openstackclient` package; credentials (`clouds.yaml`) stay outside the repo
- **Parallel deployment** - new Forgejo instance runs alongside docker2 initially, then CNAME moves over
## Approach
Follow the same pattern as the Proxmox template (`hosts/template2`), but targeting OpenStack's qcow2 format.
### What nixpkgs provides
The `image.modules.openstack` module produces a qcow2 image with:
- `openstack-config.nix`: EC2 metadata fetcher, SSH enabled, GRUB bootloader, serial console, auto-growing root partition
- `qemu-guest.nix` profile (virtio drivers)
- ext4 root filesystem with `autoResize`
### What we need to customize
The stock OpenStack image pulls SSH keys and hostname from EC2-style metadata. Since we're baking the SSH key into the image, we need a simpler configuration:
- SSH authorized keys baked into the image
- Base packages (age, vim, wget, git)
- Nix substituters (`cache.nixos.org` only - internal cache not reachable)
- systemd-networkd with DHCP
- GRUB bootloader
- Firewall enabled (public-facing host)
### Differences from template2
| Aspect | template2 (Proxmox) | openstack-template (OpenStack) |
|--------|---------------------|-------------------------------|
| Image format | VMA (`.vma.zst`) | qcow2 (`.qcow2`) |
| Image variant | `proxmox` | `openstack` |
| Cloud-init | ConfigDrive + NoCloud | Not used (SSH key baked in) |
| Nix cache | Internal + nixos.org | `cache.nixos.org` only |
| Vault | AppRole via wrapped token | None |
| Bootstrap | Automatic nixos-rebuild on first boot | Manual |
| Network | Internal DHCP | OpenStack DHCP |
| DNS | Internal ns1/ns2 | Public DNS |
| Firewall | Disabled (trusted network) | Enabled |
| System modules | Full `../../system` import | Minimal (sshd, packages only) |
## Implementation Steps
### Phase 1: Build the image
1. Create `hosts/openstack-template/` with minimal configuration
- `default.nix` - imports (only sshd and packages from `system/`, not the full set)
- `configuration.nix` - base config: SSH key, DHCP, GRUB, base packages, firewall on
- `hardware-configuration.nix` - qemu-guest profile with virtio drivers
- Exclude from DNS and monitoring (`homelab.dns.enable = false`, `homelab.monitoring.enable = false`)
- May need to override parts of `image.modules.openstack` to disable the EC2 metadata fetcher if it causes boot delays
2. Build with `nixos-rebuild build-image --image-variant openstack --flake .#openstack-template`
3. Verify the qcow2 image is produced in `result/`
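The minimal `configuration.nix` from step 1 might contain something like this (a sketch; the user name, SSH key, and option selection are placeholders to be refined against the actual `system/` modules):

```nix
{ config, pkgs, ... }:
{
  # SSH key baked into the image — no cloud-init / metadata service.
  users.users.nixos = {
    isNormalUser = true;
    extraGroups = [ "wheel" ];
    openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA... placeholder" ];
  };
  security.sudo.wheelNeedsPassword = false;

  services.openssh.enable = true;

  # systemd-networkd with DHCP.
  networking.useNetworkd = true;
  networking.useDHCP = true;

  # Public-facing host: firewall stays on.
  networking.firewall.enable = true;

  environment.systemPackages = with pkgs; [ age vim wget git ];
  nix.settings.substituters = [ "https://cache.nixos.org" ];
}
```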
### Phase 2: Upload and test
1. Add `openstackclient` to the devshell
2. Upload image: `openstack image create --disk-format qcow2 --file result/<image>.qcow2 nixos-template`
3. Boot a test instance from the image
4. Verify: SSH access works, DHCP networking, Nix builds work
5. Test manual `nixos-rebuild switch --flake` against the instance
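The upload-and-boot steps can be sketched with the OpenStack CLI like this (image file name, flavor, and network are placeholders depending on the quota):

```shell
# Upload the built qcow2 image.
openstack image create --disk-format qcow2 \
  --file result/nixos.qcow2 nixos-template

# Boot a test instance from it on an existing network.
openstack server create --image nixos-template \
  --flavor m1.small --network mynet nixos-test

# Find its address and confirm SSH with the baked-in key.
openstack server show nixos-test -f value -c addresses
```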
### Phase 3: Automation (optional, later)
Consider an Ansible playbook similar to `build-and-deploy-template.yml` for image builds + uploads. Low priority since this will be done rarely.
## Open Questions
- [ ] Should external VPS hosts eventually move to a separate flake? (Depends on how different they end up being from homelab hosts)
- [ ] Will the stock `openstack-config.nix` metadata fetcher cause boot delays/errors if the metadata service isn't reachable? May need to disable it.
- [ ] **Flavor selection** - investigate what flavors are available in the quota. The standard small flavors likely have insufficient root disk for a NixOS host (Nix store grows fast). Options:
- Use a larger flavor with adequate root disk
- Create a custom flavor (if permissions allow)
- Cinder block storage is an option in theory, but was very slow last time it was tested - avoid if possible
- [ ] Consolidation opportunity - currently running multiple smaller VMs on OpenStack. Could a single larger NixOS VM replace several of them?
## Notes
- `nixos-rebuild build-image --image-variant openstack` uses the same `image.modules` system as Proxmox
- nixpkgs also has an `openstack-zfs` variant if ZFS root is ever wanted
- The stock OpenStack module imports `ec2-data.nix` and `amazon-init.nix` - these may need to be disabled or overridden if they cause issues without a metadata service
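If those modules do need to be disabled, NixOS's `disabledModules` mechanism is the likely tool (a sketch; the module paths are assumed relative to nixpkgs' `nixos/modules` directory and should be confirmed there):

```nix
{
  # Drop the EC2-style metadata handling pulled in by the openstack
  # image variant.
  disabledModules = [
    "virtualisation/ec2-data.nix"
    "virtualisation/amazon-init.nix"
  ];
}
```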

View File

@@ -0,0 +1,231 @@
# ASUS PN51 Stability Testing
## Overview
Two ASUS PN51-E1 mini PCs (Ryzen 7 5700U) purchased years ago but shelved due to stability issues. Revisiting them to potentially add to the homelab.
## Hardware
| | pn01 (10.69.12.60) | pn02 (10.69.12.61) |
|---|---|---|
| **CPU** | AMD Ryzen 7 5700U (8C/16T) | AMD Ryzen 7 5700U (8C/16T) |
| **RAM** | 2x 32GB DDR4 SO-DIMM (64GB) | 1x 32GB DDR4 SO-DIMM (32GB) |
| **Storage** | 1TB NVMe | 1TB Samsung 870 EVO (SATA SSD) |
| **BIOS** | 0508 (2023-11-08) | Updated 2026-02-21 (latest from ASUS) |
## Original Issues
- **pn01**: Would boot but freeze randomly after some time. No console errors, completely unresponsive. memtest86 passed.
- **pn02**: Had trouble booting — would start loading kernel from installer USB then instantly reboot. When it did boot, would also freeze randomly.
## Debugging Steps
### 2026-02-21: Initial Setup
1. **Disabled fTPM** (labeled "Security Device" in ASUS BIOS) on both units
- AMD Ryzen 5000 series had a known fTPM bug causing random hard freezes with no console output
- Both units booted the NixOS installer successfully after this change
2. Installed NixOS on both, added to repo as `pn01` and `pn02` on VLAN 12
3. Configured monitoring (node-exporter, promtail, nixos-exporter)
### 2026-02-21: pn02 First Freeze
- pn02 froze approximately 1 hour after boot
- All three Prometheus targets went down simultaneously — hard freeze, not graceful shutdown
- Journal on next boot: `system.journal corrupted or uncleanly shut down`
- Kernel warnings from boot log before freeze:
- **TSC clocksource unstable**: `Marking clocksource 'tsc' as unstable because the skew is too large` — TSC skewing ~3.8ms over 500ms relative to HPET watchdog
- **AMD PSP error**: `psp gfx command LOAD_TA(0x1) failed and response status is (0x7)` — Platform Security Processor failing to load trusted application
- pn01 did not show these warnings on this particular boot, but has shown them historically (see below)
### 2026-02-21: pn02 BIOS Update
- Updated pn02 BIOS to latest version from ASUS website
- **TSC still unstable** after BIOS update — same ~3.8ms skew
- **PSP LOAD_TA still failing** after BIOS update
- Monitoring back up, letting it run to see if freeze recurs
### 2026-02-22: TSC/PSP Confirmed on Both Units
- Checked kernel logs after ~9 hours uptime — both units still running
- **pn01 now shows TSC unstable and PSP LOAD_TA failure** on this boot (same ~3.8ms TSC skew, same PSP error)
- pn01 had these same issues historically when tested years ago — the earlier clean boot was just lucky TSC calibration timing
- **Conclusion**: TSC instability and PSP LOAD_TA are platform-level quirks of the PN51-E1 / Ryzen 5700U, present on both units
- The kernel handles TSC instability gracefully (falls back to HPET), and PSP LOAD_TA is non-fatal
- Neither issue is likely the cause of the hard freezes — the fTPM bug remains the primary suspect
### 2026-02-22: Stress Test (1 hour)
- Ran `stress-ng --cpu 16 --vm 2 --vm-bytes 8G --timeout 1h` on both units
- CPU temps peaked at ~85°C, settled to ~80°C sustained (throttle limit is 105°C)
- Both survived the full hour with no freezes, no MCE errors, no kernel issues
- No concerning log entries during or after the test
### 2026-02-22: TSC Runtime Switch Test
- Attempted to switch clocksource back to TSC at runtime on pn01:
```
echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
```
- Kernel watchdog immediately reverted to HPET — TSC skew is ongoing, not just a boot-time issue
- **Conclusion**: TSC is genuinely unstable on the PN51-E1 platform. HPET is the correct clocksource.
- For virtualization (Incus), this means guest VMs will use HPET-backed timing. Performance impact is minimal for typical server workloads (DNS, monitoring, light services) but would matter for latency-sensitive applications.
### 2026-02-22: BIOS Tweaks (Both Units)
- Disabled ErP Ready on both (EU power efficiency mode — aggressively cuts power in idle)
- Disabled WiFi and Bluetooth in BIOS on both
- **TSC still unstable** after these changes — same ~3.8ms skew on both units
- ErP/power states are not the cause of the TSC issue
### 2026-02-22: pn02 Second Freeze
- pn02 froze again ~5.5 hours after boot (at idle, not under load)
- All Prometheus targets down simultaneously — same hard freeze pattern
- Last log entry was normal nix-daemon activity — zero warning/error logs before crash
- Survived the 1h stress test earlier but froze at idle later — not thermal
- pn01 remains stable throughout
- **Action**: Blacklisted the `amdgpu` kernel module on pn02 (`boot.blacklistedKernelModules = [ "amdgpu" ]`) to eliminate GPU/PSP firmware interactions as a cause. This sacrifices console output, but the host is managed over SSH anyway.
- **Action**: Added diagnostic/recovery config to pn02:
- `panic=10` + `nmi_watchdog=1` kernel params — auto-reboot after 10s on panic
- `softlockup_panic` + `hardlockup_panic` sysctls — convert lockups to panics with stack traces
- `hardware.rasdaemon` with recording — logs hardware errors (MCE, PCIe AER, memory) to sqlite database, survives reboots
- Check recorded errors: `ras-mc-ctl --summary`, `ras-mc-ctl --errors`
## Benign Kernel Errors (Both Units)
These appear on both units and can be ignored:
- `clocksource: Marking clocksource 'tsc' as unstable` — TSC skew vs HPET, kernel falls back gracefully. Platform-level quirk on PN51-E1, not always reproducible on every boot.
- `psp gfx command LOAD_TA(0x1) failed` — AMD PSP firmware error, non-fatal. Present on both units across all BIOS versions.
- `pcie_mp2_amd: amd_sfh_hid_client_init failed err -95` — AMD Sensor Fusion Hub, no sensors connected
- `Bluetooth: hci0: Reading supported features failed` — Bluetooth init quirk
- `Serial bus multi instantiate pseudo device driver INT3515:00: error -ENXIO` — unused serial bus device
- `snd_hda_intel: no codecs found` — no audio device connected, headless server
- `ata2.00: supports DRM functions and may not be fully accessible` — Samsung SSD DRM quirk (pn02 only)
### 2026-02-23: processor.max_cstate=1 and Proxmox Forums
- Found a thread on the Proxmox forums about PN51 units with similar freeze issues
- Many users reporting identical symptoms — random hard freezes, no log evidence
- No conclusive fix. Some have frequent freezes, others only a few times a month
- Some reported BIOS updates helped, but results inconsistent
- Added `processor.max_cstate=1` kernel parameter to pn02 — limits CPU to C1 halt state, preventing deep C-state sleep transitions that may trigger freezes on AMD mobile chips
- Also applied: amdgpu blacklist, panic=10, nmi_watchdog=1, softlockup/hardlockup panic, rasdaemon
### 2026-02-23: logind D-Bus Deadlock (pn02)
- node-exporter alert fired — but host was NOT frozen
- logind was running (PID 871) but deadlocked on D-Bus — not responding to `org.freedesktop.login1` requests
- Every node-exporter scrape blocked for 25s waiting for logind, causing scrape timeouts
- Likely related to amdgpu blacklist — no DRM device means no graphical seat, logind may have deadlocked during seat enumeration at boot
- Fix: `systemctl restart systemd-logind` + `systemctl restart prometheus-node-exporter`
- After restart, logind responded normally and reported seat0
### 2026-02-27: pn02 Third Freeze
- pn02 crashed again after ~2 days 21 hours uptime (longest run so far)
- Evidence of crash:
- Journal file corrupted: `system.journal corrupted or uncleanly shut down`
- Boot partition fsck: `Dirty bit is set. Fs was not properly unmounted`
- No orderly shutdown logs from previous boot
- No auto-upgrade triggered
- **NMI watchdog did NOT fire** — no kernel panic logged. This is a true hard lockup below NMI level
- **rasdaemon recorded nothing** — no MCE, AER, or memory errors in the sqlite database
- **Positive**: The system auto-rebooted this time (likely hardware watchdog), unlike previous freezes that required manual power cycle
- `processor.max_cstate=1` may have extended uptime (2d21h vs previous 1h and 5.5h) but did not prevent the freeze
### 2026-02-27 to 2026-03-03: Relative Stability
- pn02 ran without crashes for approximately one week after the third freeze
- pn01 continued to be completely stable throughout this period
- Auto-upgrade reboots continued daily (~4am) on both units — these are planned and healthy
### 2026-03-04: pn02 Fourth Crash — sched_ext Kernel Oops (pstore captured)
- pn02 crashed after ~5.8 days uptime (504566s)
- **First crash captured by pstore** — kernel oops and panic stack traces preserved across reboot
- Journal corruption confirmed: `system.journal corrupted or uncleanly shut down`
- **Crash location**: `RIP: 0010:set_next_task_scx+0x6e/0x210` — crash in the **sched_ext (SCX) scheduler** subsystem
- **Call trace**: `sysvec_apic_timer_interrupt` → `cpuidle_enter_state` — crashed during CPU idle, triggered by APIC timer interrupt
- **CR2**: `ffffffffffffff89` — dereferencing an obviously invalid kernel pointer
- **Kernel**: 6.12.74 (NixOS 25.11)
- **Significance**: This is the first crash with actual diagnostic output. Previous crashes were silent sub-NMI freezes. The sched_ext scheduler path is a new finding — earlier crashes were assumed to be hardware-level.
### 2026-03-06: pn02 Fifth Crash
- pn02 crashed again — journal corruption on next boot
- No pstore data captured for this crash
### 2026-03-07: pn02 Sixth and Seventh Crashes — Two in One Day
**First crash (~11:06 UTC):**
- ~26.6 hours uptime (95994s)
- **pstore captured both Oops and Panic**
- **Crash location**: Scheduler code path — `pick_next_task_fair` → `__pick_next_task`
- **CR2**: `000000c000726000` — invalid pointer dereference
- **Notable**: `dbus-daemon` segfaulted ~50 minutes before the kernel crash (`segfault at 0` in `libdbus-1.so.3.32.4` on CPU 0) — may indicate memory corruption preceding the kernel crash
**Second crash (~21:15 UTC):**
- Journal corruption confirmed on next boot
- No pstore data captured
### 2026-03-07: pn01 Status
- pn01 has had **zero crashes** since initial setup on Feb 21
- Zero journal corruptions, zero pstore dumps in 30 days
- Same BOOT_ID maintained between daily auto-upgrade reboots — consistently clean shutdown/reboot cycles
- All 8 reboots in 30 days are planned auto-upgrade reboots
- **pn01 is fully stable**
## Crash Summary
| Date | Uptime Before Crash | Crash Type | Diagnostic Data |
|------|---------------------|------------|-----------------|
| Feb 21 | ~1h | Silent freeze | None — sub-NMI |
| Feb 22 | ~5.5h | Silent freeze | None — sub-NMI |
| Feb 27 | ~2d 21h | Silent freeze | None — sub-NMI, rasdaemon empty |
| Mar 4 | ~5.8d | **Kernel oops** | pstore: `set_next_task_scx` (sched_ext) |
| Mar 6 | Unknown | Crash | Journal corruption only |
| Mar 7 | ~26.6h | **Kernel oops + panic** | pstore: `pick_next_task_fair` (scheduler) + dbus segfault |
| Mar 7 | Unknown | Crash | Journal corruption only |
## Conclusion
**pn02 is unreliable.** After exhausting mitigations (fTPM disabled, BIOS updated, WiFi/BT disabled, ErP disabled, amdgpu blacklisted, processor.max_cstate=1, NMI watchdog, rasdaemon), the unit still crashes every few days. 26 reboots in 30 days (7 unclean crashes + daily auto-upgrade reboots).
The pstore crash dumps from March reveal a new dimension: at least some crashes are **kernel scheduler bugs in sched_ext**, not just silent hardware-level freezes. The `set_next_task_scx` and `pick_next_task_fair` crash sites, combined with the dbus-daemon segfault before one crash, suggest possible memory corruption that manifests in the scheduler. It's unclear whether this is:
1. A sched_ext kernel bug exposed by the PN51's hardware quirks (unstable TSC, C-state behavior)
2. Hardware-induced memory corruption that happens to hit scheduler data structures
3. A pure software bug in the 6.12.74 kernel's sched_ext implementation
**pn01 is stable** — zero crashes in 30 days of continuous operation. Both units have identical kernel and NixOS configuration (minus pn02's diagnostic mitigations), so the difference points toward a hardware defect specific to the pn02 board.
## Next Steps
- **pn02 memtest**: Run memtest86 for 24h+ (available in systemd-boot menu). The crash signatures (userspace segfaults before kernel panics, corrupted pointers in scheduler structures) are consistent with intermittent RAM errors that a quick pass wouldn't catch. If memtest finds errors, swap the DIMM.
- **pn02**: Consider scrapping or repurposing for non-critical workloads that tolerate random reboots (auto-recovery via hardware watchdog is now working)
- **pn02 investigation**: Could try disabling sched_ext (`boot.kernelParams = [ "sched_ext.enabled=0" ]` or equivalent) to test whether the crashes stop — would help distinguish kernel bug from hardware defect
- **pn01**: Continue monitoring. If it remains stable long-term, it is viable for light workloads
- If pn01 eventually crashes, apply the same mitigations (amdgpu blacklist, max_cstate=1) to see if they help
- For the Incus hypervisor plan: likely need different hardware. Evaluating GMKtec G3 (Intel) as an alternative. Note: mixed Intel/AMD cluster complicates live migration
## Diagnostics and Auto-Recovery (pn02)
Currently deployed on pn02:
```nix
boot.blacklistedKernelModules = [ "amdgpu" ];
boot.kernelParams = [ "panic=10" "nmi_watchdog=1" "processor.max_cstate=1" ];
boot.kernel.sysctl."kernel.softlockup_panic" = 1;
boot.kernel.sysctl."kernel.hardlockup_panic" = 1;
hardware.rasdaemon.enable = true;
hardware.rasdaemon.record = true;
```
**Crash recovery is working**: pstore now captures kernel oops/panic data, and the system auto-reboots via `panic=10` or SP5100 TCO hardware watchdog.
**After reboot, check:**
- `ras-mc-ctl --summary` — overview of hardware errors
- `ras-mc-ctl --errors` — detailed error list
- `journalctl -b -1 -p err` — kernel logs from crashed boot (if panic was logged)
- pstore data is automatically archived by `systemd-pstore.service` and forwarded to Loki via promtail

View File

@@ -152,5 +152,5 @@ will function fine with uneven distribution, just slightly suboptimal for perfor
- [ ] IP address/subnet: NAS and Proxmox are both on 10GbE to the same switch but different subnets, forcing traffic through the router (bottleneck). Move to same subnet during migration.
- [x] Boot drive: Reuse TrueNAS boot-pool SSDs as mdadm RAID1 for NixOS root (no ZFS on boot path)
- [ ] Retire old 8TB drives? (SMART looks healthy, keep unless chassis space is needed)
- [x] Drive trays: ordered domestically (expected 2026-02-25 to 2026-03-03)
- [ ] Timeline/maintenance window for NixOS swap?

flake.lock generated

@@ -64,11 +64,11 @@
     },
     "nixpkgs": {
       "locked": {
-        "lastModified": 1771419570,
-        "narHash": "sha256-bxAlQgre3pcQcaRUm/8A0v/X8d2nhfraWSFqVmMcBcU=",
+        "lastModified": 1772822230,
+        "narHash": "sha256-yf3iYLGbGVlIthlQIk5/4/EQDZNNEmuqKZkQssMljuw=",
         "owner": "nixos",
         "repo": "nixpkgs",
-        "rev": "6d41bc27aaf7b6a3ba6b169db3bd5d6159cfaa47",
+        "rev": "71caefce12ba78d84fe618cf61644dce01cf3a96",
         "type": "github"
       },
       "original": {
@@ -80,11 +80,11 @@
     },
     "nixpkgs-unstable": {
       "locked": {
-        "lastModified": 1771369470,
-        "narHash": "sha256-0NBlEBKkN3lufyvFegY4TYv5mCNHbi5OmBDrzihbBMQ=",
+        "lastModified": 1772773019,
+        "narHash": "sha256-E1bxHxNKfDoQUuvriG71+f+s/NT0qWkImXsYZNFFfCs=",
         "owner": "nixos",
         "repo": "nixpkgs",
-        "rev": "0182a361324364ae3f436a63005877674cf45efb",
+        "rev": "aca4d95fce4914b3892661bcb80b8087293536c6",
         "type": "github"
       },
       "original": {


@@ -200,6 +200,42 @@
         ./hosts/garage01
       ];
     };
+    pn01 = nixpkgs.lib.nixosSystem {
+      inherit system;
+      specialArgs = {
+        inherit inputs self;
+      };
+      modules = commonModules ++ [
+        ./hosts/pn01
+      ];
+    };
+    pn02 = nixpkgs.lib.nixosSystem {
+      inherit system;
+      specialArgs = {
+        inherit inputs self;
+      };
+      modules = commonModules ++ [
+        ./hosts/pn02
+      ];
+    };
+    nrec-nixos01 = nixpkgs.lib.nixosSystem {
+      inherit system;
+      specialArgs = {
+        inherit inputs self;
+      };
+      modules = commonModules ++ [
+        ./hosts/nrec-nixos01
+      ];
+    };
+    openstack-template = nixpkgs.lib.nixosSystem {
+      inherit system;
+      specialArgs = {
+        inherit inputs self;
+      };
+      modules = commonModules ++ [
+        ./hosts/openstack-template
+      ];
+    };
   };

   packages = forAllSystems (
     { pkgs }:
@@ -218,6 +254,7 @@
       pkgs.openbao
       pkgs.kanidm_1_8
       pkgs.nkeys
+      pkgs.openstackclient
       (pkgs.callPackage ./scripts/create-host { })
       homelab-deploy.packages.${pkgs.system}.default
     ];


@@ -54,10 +54,7 @@
   };
   time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim


@@ -46,10 +46,7 @@
   };
   time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim


@@ -52,10 +52,7 @@
   };
   time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
   vault.enable = true;
   homelab.deploy.enable = true;


@@ -44,10 +44,7 @@
   };
   time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim


@@ -55,10 +55,7 @@
   };
   time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim


@@ -53,10 +53,7 @@
   };
   time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim


@@ -44,10 +44,7 @@
   };
   time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim


@@ -53,10 +53,7 @@
   };
   time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim


@@ -0,0 +1,78 @@
{
  lib,
  pkgs,
  ...
}:
{
  services.openssh = {
    enable = true;
    settings = {
      PermitRootLogin = lib.mkForce "no";
      PasswordAuthentication = false;
    };
  };

  users.users.nixos = {
    isNormalUser = true;
    extraGroups = [ "wheel" ];
    shell = pkgs.zsh;
    openssh.authorizedKeys.keys = [
      "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAwfb2jpKrBnCw28aevnH8HbE5YbcMXpdaVv2KmueDu6 torjus@gunter"
    ];
  };
  security.sudo.wheelNeedsPassword = false;
  programs.zsh.enable = true;

  homelab.dns.enable = false;
  homelab.monitoring.enable = false;
  homelab.host.labels.ansible = "false";

  fileSystems."/" = {
    device = "/dev/disk/by-label/nixos";
    fsType = "ext4";
    autoResize = true;
  };
  boot.loader.grub.enable = true;
  boot.loader.grub.device = "/dev/vda";

  networking.hostName = "nrec-nixos01";
  networking.useNetworkd = true;
  networking.useDHCP = false;
  services.resolved.enable = true;
  systemd.network.enable = true;
  systemd.network.networks."ens3" = {
    matchConfig.Name = "ens3";
    networkConfig.DHCP = "ipv4";
    linkConfig.RequiredForOnline = "routable";
  };

  time.timeZone = "Europe/Oslo";
  networking.firewall.enable = true;
  networking.firewall.allowedTCPPorts = [
    22
    80
    443
  ];

  nix.settings.substituters = [
    "https://cache.nixos.org"
  ];
  nix.settings.trusted-public-keys = [
    "cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY="
  ];

  services.caddy = {
    enable = true;
    virtualHosts."nrec-nixos01.t-juice.club" = {
      extraConfig = ''
        reverse_proxy 127.0.0.1:3000
      '';
    };
  };

  zramSwap.enable = true;
  system.stateVersion = "25.11";
}


@@ -0,0 +1,9 @@
{ modulesPath, ... }:
{
  imports = [
    ./configuration.nix
    ../../system/packages.nix
    ../../services/forgejo
    (modulesPath + "/profiles/qemu-guest.nix")
  ];
}


@@ -58,10 +58,7 @@
   };
   time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim


@@ -58,10 +58,7 @@
   };
   time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim


@@ -0,0 +1,72 @@
{
  lib,
  pkgs,
  ...
}:
{
  services.openssh = {
    enable = true;
    settings = {
      PermitRootLogin = lib.mkForce "no";
      PasswordAuthentication = false;
    };
  };

  users.users.nixos = {
    isNormalUser = true;
    extraGroups = [ "wheel" ];
    shell = pkgs.zsh;
    openssh.authorizedKeys.keys = [
      "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAwfb2jpKrBnCw28aevnH8HbE5YbcMXpdaVv2KmueDu6 torjus@gunter"
    ];
  };
  security.sudo.wheelNeedsPassword = false;
  programs.zsh.enable = true;

  homelab.dns.enable = false;
  homelab.monitoring.enable = false;
  homelab.host.labels.ansible = "false";

  # Minimal fileSystems for evaluation; openstack-config.nix overrides this at image build time
  fileSystems."/" = {
    device = lib.mkDefault "/dev/vda1";
    fsType = lib.mkDefault "ext4";
  };
  boot.loader.grub.enable = true;
  boot.loader.grub.device = "/dev/vda";

  networking.hostName = "nixos-openstack-template";
  networking.useNetworkd = true;
  networking.useDHCP = false;
  services.resolved.enable = true;
  systemd.network.enable = true;
  systemd.network.networks."ens3" = {
    matchConfig.Name = "ens3";
    networkConfig.DHCP = "ipv4";
    linkConfig.RequiredForOnline = "routable";
  };

  time.timeZone = "Europe/Oslo";
  networking.firewall.enable = true;
  networking.firewall.allowedTCPPorts = [ 22 ];

  nix.settings.substituters = [
    "https://cache.nixos.org"
  ];
  nix.settings.trusted-public-keys = [
    "cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY="
  ];

  environment.systemPackages = with pkgs; [
    age
    vim
    wget
    git
  ];

  zramSwap.enable = true;
  system.stateVersion = "25.11";
}


@@ -0,0 +1,7 @@
{ ... }:
{
  imports = [
    ./configuration.nix
    ../../system/packages.nix
  ];
}


@@ -0,0 +1,54 @@
{
  config,
  lib,
  pkgs,
  ...
}:
{
  imports = [
    ./hardware-configuration.nix
    ../../system
  ];

  boot.loader.systemd-boot.enable = true;
  boot.loader.systemd-boot.memtest86.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;

  networking.hostName = "pn01";
  networking.domain = "home.2rjus.net";
  networking.useNetworkd = true;
  networking.useDHCP = false;
  networking.firewall.enable = false;
  services.resolved.enable = true;
  networking.nameservers = [
    "10.69.13.5"
    "10.69.13.6"
  ];
  systemd.network.enable = true;
  systemd.network.networks."enp2s0" = {
    matchConfig.Name = "enp2s0";
    address = [
      "10.69.12.60/24"
    ];
    routes = [
      { Gateway = "10.69.12.1"; }
    ];
    linkConfig.RequiredForOnline = "routable";
  };

  time.timeZone = "Europe/Oslo";

  homelab.host = {
    tier = "test";
    priority = "low";
    role = "compute";
  };

  vault.enable = true;
  nixpkgs.config.allowUnfree = true;
  system.stateVersion = "25.11";
}

hosts/pn01/default.nix Normal file

@@ -0,0 +1,5 @@
{ ... }: {
  imports = [
    ./configuration.nix
  ];
}


@@ -0,0 +1,33 @@
# Do not modify this file! It was generated by nixos-generate-config
# and may be overwritten by future invocations. Please make changes
# to /etc/nixos/configuration.nix instead.
{ config, lib, pkgs, modulesPath, ... }:
{
imports =
[ (modulesPath + "/installer/scan/not-detected.nix")
];
boot.initrd.availableKernelModules = [ "xhci_pci" "nvme" "ahci" "usb_storage" "usbhid" "sd_mod" "rtsx_usb_sdmmc" ];
boot.initrd.kernelModules = [ ];
boot.kernelModules = [ "kvm-amd" ];
boot.extraModulePackages = [ ];
fileSystems."/" =
{ device = "/dev/disk/by-uuid/9444cf54-80e0-4315-adca-8ddd5037217c";
fsType = "ext4";
};
fileSystems."/boot" =
{ device = "/dev/disk/by-uuid/D897-146F";
fsType = "vfat";
options = [ "fmask=0022" "dmask=0022" ];
};
swapDevices =
[ { device = "/dev/disk/by-uuid/6c1e775f-342e-463a-a7f9-d7ce6593a482"; }
];
nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
hardware.cpu.amd.updateMicrocode = lib.mkDefault config.hardware.enableRedistributableFirmware;
}


@@ -0,0 +1,61 @@
{
  config,
  lib,
  pkgs,
  ...
}:
{
  imports = [
    ./hardware-configuration.nix
    ../../system
  ];

  boot.loader.systemd-boot.enable = true;
  boot.loader.systemd-boot.memtest86.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;

  boot.blacklistedKernelModules = [ "amdgpu" ];
  boot.kernelParams = [ "panic=10" "nmi_watchdog=1" "processor.max_cstate=1" ];
  boot.kernel.sysctl."kernel.softlockup_panic" = 1;
  boot.kernel.sysctl."kernel.hardlockup_panic" = 1;
  hardware.rasdaemon.enable = true;
  hardware.rasdaemon.record = true;

  networking.hostName = "pn02";
  networking.domain = "home.2rjus.net";
  networking.useNetworkd = true;
  networking.useDHCP = false;
  networking.firewall.enable = false;
  services.resolved.enable = true;
  networking.nameservers = [
    "10.69.13.5"
    "10.69.13.6"
  ];
  systemd.network.enable = true;
  systemd.network.networks."enp2s0" = {
    matchConfig.Name = "enp2s0";
    address = [
      "10.69.12.61/24"
    ];
    routes = [
      { Gateway = "10.69.12.1"; }
    ];
    linkConfig.RequiredForOnline = "routable";
  };

  time.timeZone = "Europe/Oslo";

  homelab.host = {
    tier = "test";
    priority = "low";
    role = "compute";
  };

  vault.enable = true;
  nixpkgs.config.allowUnfree = true;
  system.stateVersion = "25.11";
}

hosts/pn02/default.nix Normal file

@@ -0,0 +1,5 @@
{ ... }: {
  imports = [
    ./configuration.nix
  ];
}


@@ -0,0 +1,33 @@
# Do not modify this file! It was generated by nixos-generate-config
# and may be overwritten by future invocations. Please make changes
# to /etc/nixos/configuration.nix instead.
{ config, lib, pkgs, modulesPath, ... }:
{
imports =
[ (modulesPath + "/installer/scan/not-detected.nix")
];
boot.initrd.availableKernelModules = [ "xhci_pci" "ahci" "usb_storage" "usbhid" "sd_mod" "rtsx_usb_sdmmc" ];
boot.initrd.kernelModules = [ ];
boot.kernelModules = [ "kvm-amd" ];
boot.extraModulePackages = [ ];
fileSystems."/" =
{ device = "/dev/disk/by-uuid/1d28b629-51ae-4f0e-b440-9388c2e48413";
fsType = "ext4";
};
fileSystems."/boot" =
{ device = "/dev/disk/by-uuid/A5A7-C7B2";
fsType = "vfat";
options = [ "fmask=0022" "dmask=0022" ];
};
swapDevices =
[ { device = "/dev/disk/by-uuid/f2570894-0922-4746-84c7-2b2fe7601ea1"; }
];
nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
hardware.cpu.amd.updateMicrocode = lib.mkDefault config.hardware.enableRedistributableFirmware;
}


@@ -54,10 +54,7 @@
   };
   time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
   nix.settings.tarball-ttl = 0;
   nix.settings.substituters = [
     "https://nix-cache.home.2rjus.net"

@@ -55,10 +55,7 @@
   };
   time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim


@@ -55,10 +55,7 @@
   };
   time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim


@@ -55,10 +55,7 @@
   };
   time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim


@@ -45,10 +45,7 @@
   };
   time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim


@@ -94,7 +94,15 @@ let
       })
       (externalTargets.nodeExporter or [ ]);

-  allEntries = flakeEntries ++ externalEntries;
+  # Node-exporter-only external targets (no systemd-exporter)
+  externalOnlyEntries = map
+    (target: {
+      inherit target;
+      labels = { hostname = extractHostnameFromTarget target; };
+    })
+    (externalTargets.nodeExporterOnly or [ ]);
+
+  allEntries = flakeEntries ++ externalEntries ++ externalOnlyEntries;

   # Group entries by their label set for efficient static_configs
   # Convert labels attrset to a string key for grouping
@@ -203,7 +211,18 @@
     in
     flakeScrapeConfigs ++ externalScrapeConfigs;

+  # Generate systemd-exporter targets (excludes nodeExporterOnly hosts)
+  generateSystemdExporterTargets = self: externalTargets:
+    let
+      nodeTargets = generateNodeExporterTargets self (externalTargets // { nodeExporterOnly = [ ]; });
+    in
+    map
+      (cfg: cfg // {
+        targets = map (t: builtins.replaceStrings [ ":9100" ] [ ":9558" ] t) cfg.targets;
+      })
+      nodeTargets;
+
 in
 {
-  inherit extractHostMonitoring generateNodeExporterTargets generateScrapeConfigs;
+  inherit extractHostMonitoring generateNodeExporterTargets generateScrapeConfigs generateSystemdExporterTargets;
 }


@@ -56,10 +56,7 @@
   };
   time.timeZone = "Europe/Oslo";
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim


@@ -0,0 +1,19 @@
{ ... }:
{
  services.forgejo = {
    enable = true;
    database.type = "sqlite3";
    settings = {
      server = {
        DOMAIN = "nrec-nixos01.t-juice.club";
        ROOT_URL = "https://nrec-nixos01.t-juice.club/";
        HTTP_ADDR = "127.0.0.1";
        HTTP_PORT = 3000;
        LFS_START_SERVER = true;
      };
      service.DISABLE_REGISTRATION = true;
      "service.explore".REQUIRE_SIGNIN_VIEW = true;
      session.COOKIE_SECURE = true;
    };
  };
}


@@ -386,6 +386,107 @@
"tooltip": {"mode": "multi", "sort": "desc"}
},
"description": "Rate of commands executed in honeypot shells" "description": "Rate of commands executed in honeypot shells"
},
{
"id": 11,
"title": "Storage Query Duration by Method",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 38},
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
"interval": "60s",
"targets": [
{
"expr": "rate(oubliette_storage_query_duration_seconds_sum{job=\"apiary\"}[$__rate_interval]) / rate(oubliette_storage_query_duration_seconds_count{job=\"apiary\"}[$__rate_interval])",
"legendFormat": "{{method}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "s",
"custom": {
"drawStyle": "line",
"lineInterpolation": "smooth",
"fillOpacity": 10,
"pointSize": 5,
"showPoints": "auto",
"stacking": {"mode": "none"}
}
}
},
"options": {
"legend": {"displayMode": "list", "placement": "bottom"},
"tooltip": {"mode": "multi", "sort": "desc"}
},
"description": "Average query duration per storage method over time"
},
{
"id": 12,
"title": "Storage Query Rate by Method",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 38},
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
"interval": "60s",
"targets": [
{
"expr": "rate(oubliette_storage_query_duration_seconds_count{job=\"apiary\"}[$__rate_interval])",
"legendFormat": "{{method}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineInterpolation": "smooth",
"fillOpacity": 10,
"pointSize": 5,
"showPoints": "auto",
"stacking": {"mode": "none"}
}
}
},
"options": {
"legend": {"displayMode": "list", "placement": "bottom"},
"tooltip": {"mode": "multi", "sort": "desc"}
},
"description": "Query execution rate per storage method"
},
{
"id": 13,
"title": "Storage Query Errors",
"type": "stat",
"gridPos": {"h": 4, "w": 6, "x": 0, "y": 46},
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
"targets": [
{
"expr": "sum(oubliette_storage_query_errors_total{job=\"apiary\"})",
"legendFormat": "Errors",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 1},
{"color": "red", "value": 10}
]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none",
"textMode": "auto"
},
"description": "Total storage query errors"
}
]
}


@@ -4,6 +4,10 @@
   nodeExporter = [
     "gunter.home.2rjus.net:9100"
   ];
+  # Hosts with node-exporter but no systemd-exporter
+  nodeExporterOnly = [
+    "pve1.home.2rjus.net:9100"
+  ];
   scrapeConfigs = [
     { job_name = "smartctl"; targets = [ "gunter.home.2rjus.net:9633" ]; }
     { job_name = "ghettoptt"; targets = [ "gunter.home.2rjus.net:8989" ]; }


@@ -4,6 +4,7 @@ let
   externalTargets = import ../monitoring/external-targets.nix;
   nodeExporterTargets = monLib.generateNodeExporterTargets self externalTargets;
+  systemdExporterTargets = monLib.generateSystemdExporterTargets self externalTargets;
   autoScrapeConfigs = monLib.generateScrapeConfigs self externalTargets;

   # TLS endpoints to monitor for certificate expiration via blackbox exporter
@@ -70,14 +71,10 @@ let
       job_name = "node-exporter";
       static_configs = nodeExporterTargets;
     }
-    # Systemd exporter on all hosts (same targets, different port)
+    # Systemd exporter on hosts that have it (excludes nodeExporterOnly hosts)
     {
       job_name = "systemd-exporter";
-      static_configs = map
-        (cfg: cfg // {
-          targets = map (t: builtins.replaceStrings [ ":9100" ] [ ":9558" ] t) cfg.targets;
-        })
-        nodeExporterTargets;
+      static_configs = systemdExporterTargets;
     }
     # Local monitoring services
     {


@@ -31,6 +31,10 @@ in
   };

   settings = {
+    experimental-features = [
+      "nix-command"
+      "flakes"
+    ];
     trusted-substituters = [
       "https://nix-cache.home.2rjus.net"
       "https://cache.nixos.org"


@@ -53,6 +53,16 @@ locals {
     ]
     extra_policies = ["prometheus-metrics"]
   }
+  "pn01" = {
+    paths = [
+      "secret/data/hosts/pn01/*",
+    ]
+  }
+  "pn02" = {
+    paths = [
+      "secret/data/hosts/pn02/*",
+    ]
+  }
 }