Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# ASUS PN51 Stability Testing

## Overview

Two ASUS PN51-E1 mini PCs (Ryzen 7 5700U) were purchased years ago but shelved due to stability issues. Revisiting them to see whether they can be added to the homelab.

## Hardware

| | pn01 (10.69.12.60) | pn02 (10.69.12.61) |
|---|---|---|
| **CPU** | AMD Ryzen 7 5700U (8C/16T) | AMD Ryzen 7 5700U (8C/16T) |
| **RAM** | 2x 32GB DDR4 SO-DIMM (64GB) | 1x 32GB DDR4 SO-DIMM (32GB) |
| **Storage** | 1TB NVMe | 1TB Samsung 870 EVO (SATA SSD) |
| **BIOS** | 0508 (2023-11-08) | Updated 2026-02-21 (latest from ASUS) |

## Original Issues

- **pn01**: Would boot but freeze randomly after some time. No console errors; the machine became completely unresponsive. memtest86 passed.
- **pn02**: Had trouble booting: it would start loading the kernel from the installer USB, then instantly reboot. When it did boot, it would also freeze randomly.

## Debugging Steps

### 2026-02-21: Initial Setup

1. **Disabled fTPM** (labeled "Security Device" in ASUS BIOS) on both units
   - AMD Ryzen 5000 series had a known fTPM bug causing random hard freezes with no console output
   - Both units booted the NixOS installer successfully after this change
2. Installed NixOS on both, added to repo as `pn01` and `pn02` on VLAN 12
3. Configured monitoring (node-exporter, promtail, nixos-exporter)

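Whether the fTPM change took effect can be cross-checked from the installed system by looking for a TPM device registered by the kernel. A minimal sketch, assuming the standard `/sys/class/tpm` sysfs class directory (a registered TPM normally appears there as `tpm0`); the function name `tpm_devices` is hypothetical and not part of this repo.

```python
from pathlib import Path


def tpm_devices(class_dir: Path = Path("/sys/class/tpm")) -> list[str]:
    """Return the TPM device names the kernel registered (e.g. "tpm0").

    A missing class directory is treated like an empty one, since it may
    not exist at all when no TPM driver is loaded.
    """
    if not class_dir.is_dir():
        return []
    return sorted(entry.name for entry in class_dir.iterdir())


if __name__ == "__main__":
    devices = tpm_devices()
    if devices:
        print("TPM still visible to the kernel:", ", ".join(devices))
    else:
        print("No TPM device registered; fTPM appears to be disabled")
```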
### 2026-02-21: pn02 First Freeze

- pn02 froze approximately 1 hour after boot
- All three Prometheus targets went down simultaneously, pointing to a hard freeze rather than a graceful shutdown
- Journal on next boot: `system.journal corrupted or uncleanly shut down`
- Kernel warnings in the boot log from before the freeze:
  - **TSC clocksource unstable**: `Marking clocksource 'tsc' as unstable because the skew is too large` (TSC skewing ~3.8ms over 500ms relative to the HPET watchdog)
  - **AMD PSP error**: `psp gfx command LOAD_TA(0x1) failed and response status is (0x7)` (the Platform Security Processor failing to load a trusted application)
- pn01 did not show these warnings on this particular boot, but has shown them historically (see below)

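The reasoning behind "all three targets went down simultaneously" (a whole-machine hang takes every exporter down together, while a crashing service drops out on its own) can be written as a small check. A minimal sketch, assuming you already have, per target, the Unix timestamp of its last successful scrape (for example from Prometheus' `up` series); the job names, timestamps and the 60-second scrape interval are hypothetical placeholders. A network outage between Prometheus and the host would produce the same pattern, so this is corroborating evidence rather than proof.

```python
def looks_like_host_freeze(
    last_up: dict[str, float], scrape_interval_s: float = 60.0
) -> bool:
    """Return True if every target stopped responding within one scrape interval.

    last_up maps a target/job name to the Unix timestamp of its last
    successful scrape. Independent service crashes tend to happen at
    different times; a hard freeze takes all targets down together.
    """
    if len(last_up) < 2:
        # A single target cannot distinguish a host hang from a service crash.
        return False
    times = last_up.values()
    return max(times) - min(times) <= scrape_interval_s


# Hypothetical last-seen timestamps for the three pn02 exporters.
pn02_last_up = {
    "node-exporter": 1_771_700_000.0,
    "promtail": 1_771_700_015.0,
    "nixos-exporter": 1_771_700_030.0,
}
print(looks_like_host_freeze(pn02_last_up))
```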
### 2026-02-21: pn02 BIOS Update

- Updated pn02 BIOS to the latest version from the ASUS website
- **TSC still unstable** after the BIOS update (same ~3.8ms skew)
- **PSP LOAD_TA still failing** after the BIOS update
- Monitoring back up; letting it run to see if the freeze recurs

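To put the recurring ~3.8ms figure in perspective, the skew can be converted into a rate. This is plain arithmetic on the numbers from the kernel message (3.8ms of disagreement over a 500ms watchdog interval), not a kernel API; the helper names are hypothetical.

```python
def skew_ppm(skew_ms: float, interval_ms: float) -> float:
    """Clock disagreement expressed in parts per million of elapsed time."""
    return skew_ms / interval_ms * 1_000_000


def drift_seconds_per_day(ppm: float) -> float:
    """How far two clocks would diverge in 24 hours at a constant ppm rate."""
    return ppm / 1_000_000 * 86_400


rate = skew_ppm(3.8, 500)            # roughly 7600 ppm, i.e. 0.76%
daily = drift_seconds_per_day(rate)  # roughly 657 s (about 11 minutes) per day
print(f"{rate:.0f} ppm, {daily:.0f} s/day")
```

The watchdog only observes that TSC and HPET disagree; the numbers alone do not say which of the two clocks is at fault.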
### 2026-02-22: TSC/PSP Confirmed on Both Units

- Checked kernel logs after ~9 hours of uptime; both units still running
- **pn01 now shows TSC unstable and PSP LOAD_TA failure** on this boot (same ~3.8ms TSC skew, same PSP error)
- pn01 had these same issues when tested years ago, so the earlier clean boot was just lucky TSC calibration timing
- **Conclusion**: TSC instability and the PSP LOAD_TA failure are platform-level quirks of the PN51-E1 / Ryzen 5700U, present on both units
- The kernel handles TSC instability gracefully (falls back to HPET), and the PSP LOAD_TA failure is non-fatal
- Neither issue is likely to be the cause of the hard freezes; the fTPM bug remains the primary suspect

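The clocksource the kernel settled on can be read from sysfs, which is a quick way to confirm the fallback on each unit. A minimal sketch, assuming the standard files `current_clocksource` and `available_clocksource` under `/sys/devices/system/clocksource/clocksource0/`; the helper names are hypothetical, and parsing is kept separate from file access so it can be tested without the hardware.

```python
from pathlib import Path

# Standard sysfs location for the system clocksource.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def parse_clocksource(current_raw: str, available_raw: str) -> tuple[str, list[str]]:
    """Turn raw sysfs contents into (current, [available, ...])."""
    return current_raw.strip(), available_raw.split()


def tsc_fell_back(current: str) -> bool:
    """True if the kernel is no longer using the TSC as its clocksource."""
    return current != "tsc"


def main() -> None:
    try:
        current_raw = (CLOCKSOURCE_DIR / "current_clocksource").read_text()
        available_raw = (CLOCKSOURCE_DIR / "available_clocksource").read_text()
    except OSError as exc:
        print("clocksource sysfs not readable:", exc)
        return
    current, available = parse_clocksource(current_raw, available_raw)
    print(f"current={current} available={' '.join(available)}")
    if tsc_fell_back(current):
        print(f"TSC not in use; kernel is running on {current}")


if __name__ == "__main__":
    main()
```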
## Benign Kernel Errors (Both Units)

These appear on both units (except where noted) and can be ignored:

- `clocksource: Marking clocksource 'tsc' as unstable`: TSC skew vs HPET; the kernel falls back gracefully. Platform-level quirk on PN51-E1, not reproducible on every boot.
- `psp gfx command LOAD_TA(0x1) failed`: AMD PSP firmware error, non-fatal. Present on both units across all BIOS versions tested.
- `pcie_mp2_amd: amd_sfh_hid_client_init failed err -95`: AMD Sensor Fusion Hub, no sensors connected
- `Bluetooth: hci0: Reading supported features failed`: Bluetooth init quirk
- `Serial bus multi instantiate pseudo device driver INT3515:00: error -ENXIO`: unused serial bus device
- `ata2.00: supports DRM functions and may not be fully accessible`: Samsung SSD DRM quirk (pn02 only)

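When reviewing logs after a freeze, it helps to drop the known-benign messages listed above and look only at what remains. A minimal sketch, assuming kernel log lines arrive as plain text (for example from `journalctl -k -b -1` for the previous boot); the patterns are substrings of the messages in the list, and the function name is hypothetical.

```python
from collections.abc import Iterable, Iterator

# Substrings of the kernel messages listed above as benign on the PN51s.
BENIGN_PATTERNS = (
    "Marking clocksource 'tsc' as unstable",
    "psp gfx command LOAD_TA(0x1) failed",
    "amd_sfh_hid_client_init failed err -95",
    "Reading supported features failed",
    "INT3515:00: error -ENXIO",
    "supports DRM functions and may not be fully accessible",
)


def unexplained_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines that match none of the benign patterns."""
    for line in lines:
        if not any(pattern in line for pattern in BENIGN_PATTERNS):
            yield line


# Illustrative input; the last line stands in for anything not yet explained.
sample = [
    "clocksource: Marking clocksource 'tsc' as unstable because the skew is too large",
    "ata2.00: supports DRM functions and may not be fully accessible",
    "some other kernel message",
]
for remaining in unexplained_lines(sample):
    print(remaining)
```

Passing `sys.stdin` as `lines` would let the same function sit at the end of a `journalctl -k -b -1 |` pipeline. Plain substring matching keeps the list easy to extend, at the cost of being looser than anchored regular expressions.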
## Next Steps

- Monitor both units for stability (fTPM disabled on both, BIOS updated on pn02)
- If either freezes again, try adding the `tsc=unstable` kernel parameter (unlikely to help, but easy to rule out)
- If freezes continue, try disabling unused hardware in BIOS (GPU, WiFi, Bluetooth, audio)
- If still freezing, it may be a hardware defect
- Once stable: add the second RAM stick back to pn02 and reinstall with NVMe
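If the `tsc=unstable` experiment is tried, it is worth confirming after the rebuild and reboot that the parameter actually reached the kernel (on NixOS it would normally be added through `boot.kernelParams`). A minimal sketch that checks `/proc/cmdline`; the helper name `has_kernel_param` is hypothetical.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """True if `param` appears as a whole whitespace-separated token."""
    return param in cmdline.split()


def main() -> None:
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError as exc:
        print("could not read /proc/cmdline:", exc)
        return
    state = "active" if has_kernel_param(cmdline, "tsc=unstable") else "not set"
    print("tsc=unstable is", state)


if __name__ == "__main__":
    main()
```

Matching whole tokens rather than substrings avoids false positives from longer parameters that merely contain the same text.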