The initrd was missing virtio drivers, preventing the root
filesystem from being detected during boot.
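
A minimal sketch of the kind of change involved, assuming the usual
virtio module set for KVM guests. The exact list is an assumption and
may differ from what this commit adds:

```nix
{ ... }:
{
  # Assumed module list: virtio drivers the initrd needs to see the root disk.
  boot.initrd.availableKernelModules = [
    "virtio_pci"
    "virtio_blk"
    "virtio_scsi"
    "virtio_net"
  ];
}
```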
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The OpenStack image labels the root partition "nixos", so use
/dev/disk/by-label/nixos instead of /dev/vda1.
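
Roughly, the root filesystem entry then looks like the sketch below.
The fsType value is an assumption, since the filesystem type is not
stated here:

```nix
{ ... }:
{
  fileSystems."/" = {
    # Resolve the root partition by label rather than by device node.
    device = "/dev/disk/by-label/nixos";
    fsType = "ext4"; # assumption: not stated in this commit
  };
}
```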
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a new host configuration for building qcow2 images targeting
OpenStack (NREC). Uses a nixos user with an SSH key and sudo instead
of root login, enables the firewall, and includes no internal services.
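
A hedged sketch of the access setup described above. The key string is
a hypothetical placeholder, and disabling root login through sshd is
one assumed way to express "instead of root login":

```nix
{ ... }:
{
  users.users.nixos = {
    isNormalUser = true;
    extraGroups = [ "wheel" ]; # wheel grants sudo
    openssh.authorizedKeys.keys = [
      "ssh-ed25519 AAAA...placeholder nixos@example" # hypothetical key
    ];
  };

  services.openssh.enable = true;
  services.openssh.settings.PermitRootLogin = "no"; # assumption

  networking.firewall.enable = true;
}
```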
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enable memtest86 in systemd-boot menu on both PN51 units to allow
extended memory testing. Update stability document with March crash
data from pstore/Loki: crashes are now traced to kernel oopses in the
sched_ext scheduler, suggesting possible memory corruption.
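
For reference, the boot menu entry can be enabled with the option
below. This is a sketch that assumes systemd-boot is already the
configured loader on these hosts:

```nix
{ ... }:
{
  boot.loader.systemd-boot.enable = true;
  # Adds a memtest86 entry to the systemd-boot menu.
  boot.loader.systemd-boot.memtest86.enable = true;
}
```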
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add nodeExporterOnly list to external-targets.nix for hosts that
have node-exporter but not systemd-exporter (e.g. pve1). This
prevents a down target in the systemd-exporter scrape job.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pn02 crashed again after ~2d21h uptime despite all mitigations
(amdgpu blacklist, max_cstate=1, NMI watchdog, rasdaemon).
NMI watchdog didn't fire and rasdaemon recorded nothing,
confirming hard lockup below NMI level. Unit is unreliable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Known PN51 platform issue with deep C-states causing freezes.
Limit to C1 to prevent deeper sleep states.
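
A minimal sketch, assuming the limit is applied through the
processor.max_cstate kernel parameter:

```nix
{ ... }:
{
  # Assumption: ACPI processor driver parameter; caps idle states at C1.
  boot.kernelParams = [ "processor.max_cstate=1" ];
}
```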
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enable kernel panic on soft/hard lockups with auto-reboot after
10s, and rasdaemon for hardware error logging. Should give us
diagnostic data on the next freeze.
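
A sketch of how this could be expressed, assuming the settings are
applied as sysctls. The exact keys used in this commit are an
assumption:

```nix
{ ... }:
{
  boot.kernel.sysctl = {
    "kernel.nmi_watchdog" = 1;      # enable the hard lockup detector
    "kernel.softlockup_panic" = 1;  # panic on soft lockup
    "kernel.hardlockup_panic" = 1;  # panic on hard lockup
    "kernel.panic" = 10;            # reboot 10s after a panic
  };

  # Log hardware errors (MCE, EDAC) for later inspection.
  hardware.rasdaemon.enable = true;
}
```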
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pn02 continues to hard freeze with no log evidence. Blacklisting
the GPU driver to eliminate GPU/PSP firmware interactions as a
possible cause. Console output will be lost but the host is
managed over SSH.
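
A minimal sketch, assuming the driver in question is amdgpu:

```nix
{ ... }:
{
  # Prevent the GPU driver from loading; local console output is lost.
  boot.blacklistedKernelModules = [ "amdgpu" ];
}
```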
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both units survived a 1h stress test at 80-85C. TSC clocksource
is genuinely unstable at runtime (not just at boot), so HPET is the
correct fallback for this platform.
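
One way to pin the fallback is sketched below, assuming it is set on
the kernel command line rather than switched at runtime:

```nix
{ ... }:
{
  # Assumption: select HPET explicitly instead of relying on TSC.
  boot.kernelParams = [ "clocksource=hpet" ];
}
```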
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All hosts had identical nix-command/flakes settings in their
configuration.nix. Centralize in system/nix.nix so new hosts
(like pn01/pn02) get it automatically.
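
A sketch of what the shared system/nix.nix module might contain. Only
the two feature names come from this commit; the rest of the file's
contents are not shown here:

```nix
{ ... }:
{
  # Shared by every host that imports this module.
  nix.settings.experimental-features = [
    "nix-command"
    "flakes"
  ];
}
```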
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add pn01 and pn02 to hosts-generated.tf for Vault AppRole access.
Fix provision-approle.yml: the localhost play was skipped when using
the -l filter, since localhost didn't match the target. Merged into a
single play using delegate_to: localhost for the bao commands.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add two ASUS PN51 hosts on VLAN 12 for stability testing.
pn01 at 10.69.12.60, pn02 at 10.69.12.61, both test-tier compute role.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Decided on Kodi + JellyCon with NFS direct path for media playback,
Sway/Hyprland for display server with workspace-based browser switching,
and noted HDR status for future reference.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New plan for replacing the media PC (i7-4770K/Ubuntu) with a NixOS mini PC
running Kodi. Router plan updated with specific AliExpress hardware options
and IDS/IPS considerations.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reduces false positives from transient Nix store growth by basing the
linear prediction on a 24h trend instead of 6h.
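
A hedged sketch of such a rule, assuming it is fed to Prometheus via
services.prometheus.rules. The group name, alert name, mountpoint,
prediction horizon, and hold time are all hypothetical. Only the 24h
lookback window comes from this commit:

```nix
{ ... }:
{
  services.prometheus.rules = [
    # JSON is valid YAML, so the rule file can be generated from Nix.
    (builtins.toJSON {
      groups = [
        {
          name = "disk"; # hypothetical group name
          rules = [
            {
              alert = "NixStoreFillingUp"; # hypothetical alert name
              # Fit a line over the last 24h (was 6h) and extrapolate
              # 24h ahead (assumed horizon).
              expr = ''predict_linear(node_filesystem_avail_bytes{mountpoint="/nix/store"}[24h], 24 * 3600) < 0'';
              "for" = "1h"; # assumed hold time
            }
          ];
        }
      ];
    })
  ];
}
```

A longer window smooths out short bursts of store growth, at the cost
of reacting more slowly to a genuine sustained fill.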
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NAS and Proxmox are on the same 10GbE switch but different subnets,
forcing traffic through the router. Need to fix during migration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use TrueNAS boot-pool SSDs as mdadm RAID1 for NixOS root to keep
the boot path ZFS-independent. Added zfs export step before shutdown.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BTRFS RAID5/6 write hole is still unresolved, and RAID1 wastes
capacity with mixed disk sizes. Keep existing ZFS pool and import
directly on NixOS instead. Updated migration strategy, disk purchase
decision (2x 24TB ordered), SMART health notes, and vdev rebalancing
guidance.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>