diff --git a/docs/plans/pn51-stability.md b/docs/plans/pn51-stability.md
index 427c9e3..132a4fb 100644
--- a/docs/plans/pn51-stability.md
+++ b/docs/plans/pn51-stability.md
@@ -71,6 +71,22 @@ Two ASUS PN51-E1 mini PCs (Ryzen 7 5700U) purchased years ago but shelved due to
 - **Conclusion**: TSC is genuinely unstable on the PN51-E1 platform. HPET is the correct clocksource.
 - For virtualization (Incus), this means guest VMs will use HPET-backed timing. Performance impact is minimal for typical server workloads (DNS, monitoring, light services) but would matter for latency-sensitive applications.
 
+### 2026-02-22: BIOS Tweaks (Both Units)
+
+- Disabled ErP Ready on both (EU power-efficiency mode that aggressively cuts power at idle)
+- Disabled WiFi and Bluetooth in BIOS on both
+- **TSC still unstable** after these changes: same ~3.8 ms skew on both units
+- ErP/power states are therefore not the cause of the TSC issue
+
+### 2026-02-22: pn02 Second Freeze
+
+- pn02 froze again ~5.5 hours after boot (at idle, not under load)
+- All Prometheus targets went down simultaneously: same hard-freeze pattern as before
+- Last log entry was normal nix-daemon activity; no warnings or errors were logged before the crash
+- Survived the 1-hour stress test earlier but froze later at idle, so the cause is likely not thermal
+- pn01 remains stable throughout
+- **Action**: Blacklisted the `amdgpu` kernel module on pn02 (`boot.blacklistedKernelModules = [ "amdgpu" ]`) to rule out GPU/PSP firmware interactions as the cause. Side effect: no console output, so pn02 is managed via SSH.
+
 ## Benign Kernel Errors (Both Units)
 
 These appear on both units and can be ignored:
@@ -84,8 +100,8 @@ These appear on both units and can be ignored:
 
 ## Next Steps
 
-- Monitor both units for stability over the next few days
-- If either freezes again, try disabling unused hardware in BIOS (GPU, WiFi, Bluetooth, audio)
-- If still freezing, may be a hardware defect
+- Monitor pn02 with amdgpu blacklisted; if stable, try the less drastic `amdgpu.runpm=0 amdgpu.dpm=0` kernel params instead
+- If pn02 still freezes without amdgpu, the likely cause is a hardware defect in this unit
+- pn01 continues to be stable; keep monitoring
 - Once stable: add second RAM stick back to pn02, reinstall with NVMe
 - Evaluate for Incus hypervisor use (see `nixos-hypervisor.md`)
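
The fallback step in Next Steps amounts to swapping the module blacklist for driver parameters in pn02's NixOS config. A minimal sketch, assuming pn02's host config is an ordinary NixOS module (the file it lives in and how it is imported are not shown in this diff, so the layout here is an assumption):

```nix
# Hypothetical sketch of pn02's host module for the "less drastic" step.
# File location and surrounding module structure are assumptions.
{ ... }:
{
  # Remove the earlier blacklist so amdgpu loads again:
  #   boot.blacklistedKernelModules = [ "amdgpu" ];

  # Driver parameters from the Next Steps list, passed on the kernel command line:
  #   amdgpu.runpm=0 -> disable amdgpu runtime power management
  #   amdgpu.dpm=0   -> disable amdgpu dynamic power management (DPM)
  boot.kernelParams = [ "amdgpu.runpm=0" "amdgpu.dpm=0" ];
}
```

Kernel parameters only take effect on the next boot, so after `nixos-rebuild switch` the unit needs a reboot; `cat /proc/cmdline` should then list both parameters. Keeping the driver loaded with power management disabled narrows the suspect to amdgpu's power-state handling rather than the driver as a whole.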