pn51: document BIOS tweaks, second pn02 freeze, amdgpu blacklist

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:28:19 +01:00
parent 05e8556bda
commit 2b42145d94


@@ -71,6 +71,22 @@ Two ASUS PN51-E1 mini PCs (Ryzen 7 5700U) purchased years ago but shelved due to
- **Conclusion**: TSC is genuinely unstable on the PN51-E1 platform. HPET is the correct clocksource.
- For virtualization (Incus), this means guest VMs will use HPET-backed timing. Performance impact is minimal for typical server workloads (DNS, monitoring, light services) but would matter for latency-sensitive applications.
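
As a rough sketch of how the HPET choice could be pinned on NixOS: the fragment below assumes the clocksource is forced with the kernel's `clocksource=` boot parameter rather than left to the kernel's own fallback away from TSC. These notes do not say which mechanism is in use, so treat this as an illustration only.

```nix
{ ... }:
{
  # Assumption: force HPET at boot instead of relying on the kernel
  # watchdog to demote an unstable TSC at runtime.
  boot.kernelParams = [ "clocksource=hpet" ];
}
```

After a reboot, the active clocksource can be read from `/sys/devices/system/clocksource/clocksource0/current_clocksource`.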

### 2026-02-22: BIOS Tweaks (Both Units)

- Disabled ErP Ready on both (EU power efficiency mode — aggressively cuts power in idle)
- Disabled WiFi and Bluetooth in BIOS on both
- **TSC still unstable** after these changes — same ~3.8ms skew on both units
- ErP/power states are not the cause of the TSC issue

### 2026-02-22: pn02 Second Freeze

- pn02 froze again ~5.5 hours after boot (at idle, not under load)
- All Prometheus targets down simultaneously — same hard freeze pattern
- Last log entry was normal nix-daemon activity — zero warning/error logs before crash
- Survived the 1h stress test earlier but froze at idle later — not thermal
- pn01 remains stable throughout
- **Action**: Blacklisted `amdgpu` kernel module on pn02 (`boot.blacklistedKernelModules = [ "amdgpu" ]`) to eliminate GPU/PSP firmware interactions as a cause. The unit now has no console output but is still managed via SSH.
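
For reference, a minimal sketch of where that option might live in pn02's host configuration. The module layout is an assumption; only the option itself comes from the notes above.

```nix
# Hypothetical host module for pn02; the file layout is an assumption.
{ ... }:
{
  # Stop amdgpu from being auto-loaded at boot. With no GPU driver
  # loaded, the unit is expected to be reachable over SSH only.
  boot.blacklistedKernelModules = [ "amdgpu" ];
}
```

If the blacklist took effect, `lsmod` on pn02 should no longer list `amdgpu` after the next reboot.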

## Benign Kernel Errors (Both Units)

These appear on both units and can be ignored:
@@ -84,8 +100,8 @@ These appear on both units and can be ignored:
## Next Steps

-- Monitor both units for stability over the next few days
-- If either freezes again, try disabling unused hardware in BIOS (GPU, WiFi, Bluetooth, audio)
-- If still freezing, may be a hardware defect
+- Monitor pn02 with amdgpu blacklisted; if stable, try the less impactful `amdgpu.runpm=0 amdgpu.dpm=0` kernel params instead
+- If pn02 still freezes without amdgpu, likely a hardware defect on this unit
+- pn01 continues to be stable; keep monitoring
- Once stable: add second RAM stick back to pn02, reinstall with NVMe
- Evaluate for Incus hypervisor use (see `nixos-hypervisor.md`)
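
The less impactful follow-up mentioned in the first step could look roughly like the fragment below. It is a sketch, assuming the `amdgpu` blacklist entry is removed from pn02's configuration at the same time so the driver loads again with its power management features turned off.

```nix
{ ... }:
{
  # Assumption: the "amdgpu" entry has been removed from
  # boot.blacklistedKernelModules so the driver is loaded again.
  boot.kernelParams = [
    "amdgpu.runpm=0" # disable runtime power management
    "amdgpu.dpm=0"   # disable dynamic power management
  ];
}
```

Both values are passed to the module on the kernel command line, so they can be confirmed in `/proc/cmdline` after a reboot.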