diff --git a/docs/gunter-monitor-issues.md b/docs/gunter-monitor-issues.md
new file mode 100644
index 0000000..5477baa
--- /dev/null
+++ b/docs/gunter-monitor-issues.md
@@ -0,0 +1,104 @@
+# Gunter Monitor Boot Issues
+
+## Problem Description
+
+Two of the four monitors on gunter (desktop) intermittently fail to work on startup. The affected monitors are always the two Samsung LS27A600U displays, which are connected via DisplayPort daisy-chaining (MST - Multi-Stream Transport). Power cycling the monitors typically resolves the issue until the next reboot.
+
+## System Configuration
+
+- **GPU**: NVIDIA GeForce RTX 3080 Ti
+- **Driver**: NVIDIA open driver 590.48.01 (beta)
+- **Kernel**: 6.18.8
+- **Compositor**: Hyprland
+
+### Monitor Setup
+
+| Port | Monitor                   | Resolution      | Connection    |
+|------|---------------------------|-----------------|---------------|
+| DP-1 | Acer XB271HU (center)     | 2560x1440@120Hz | Direct        |
+| DP-3 | BenQ G2420HDBL (top)      | 1920x1080@60Hz  | Direct        |
+| DP-4 | Samsung LS27A600U (right) | 2560x1440@75Hz  | Daisy-chained |
+| DP-5 | Samsung LS27A600U (left)  | 2560x1440@75Hz  | Daisy-chained |
+
+The GPU has only three DisplayPort outputs, so one Samsung monitor is connected through the other via DP daisy-chaining (MST).
+
+## Diagnostic Findings
+
+### Kernel Errors
+
+The following errors appear in the kernel log during boot (17-27 seconds after boot start):
+
+```
+[drm:nv_drm_dev_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to add connector for NvKmsKapiDisplay 0x00000800
+[drm:nv_drm_dev_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to get dynamic displays
+```
+
+"Dynamic displays" in NVIDIA terminology refers to MST-connected monitors. These errors indicate that the driver is failing to enumerate the daisy-chained displays during initialization.
+
+### Root Cause Analysis
+
+1. **MST timing issues** - The downstream Samsung monitor isn't ready when the driver tries to enumerate the daisy chain during boot
+2. **NVIDIA open driver MST bugs** - The open-source driver (`hardware.nvidia.open = true`) has historically had more MST issues than the proprietary one
+3. **Power sequencing** - The monitors may need more time to negotiate the MST link during cold boot
+
+## Potential Solutions
+
+1. **Switch to proprietary driver** - Set `hardware.nvidia.open = false` in `hosts/gunter/configuration.nix`
+
+2. **Add boot delay for nvidia-drm** - Add kernel parameter `nvidia-drm.load_on_init=0` to defer initialization
+
+3. **Try different nvidia module options** - Add to `boot.extraModprobeConfig`:
+   ```nix
+   options nvidia-drm modeset=1 fbdev=1
+   options nvidia NVreg_PreserveVideoMemoryAllocations=1
+   ```
+
+4. **Check monitor firmware** - Samsung LS27A600U monitors have had MST firmware updates
+
+5. **Reduce link rate during boot** - Lower refresh rate to 60Hz initially to reduce bandwidth requirements
+
+## Useful Diagnostic Commands
+
+### Kernel logs for display/nvidia issues
+```bash
+journalctl -k --no-pager | grep -iE '(nvidia|drm|display|edid|dp|hdmi|monitor)'
+```
+
+### Kernel errors and warnings
+```bash
+journalctl -k --no-pager | grep -iE '(error|fail|warn)'
+```
+
+### Current monitor state (Hyprland)
+```bash
+hyprctl monitors all
+```
+
+### DRM connector status
+```bash
+cat /sys/class/drm/*/status
+ls -la /sys/class/drm/
+```
+
+### GPU and driver info
+```bash
+nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
+```
+
+### Check EDID data for each connector
+```bash
+for f in /sys/class/drm/card1-DP-*/; do
+  echo "=== $(basename "$f") ==="
+  cat "${f}enabled" 2>/dev/null
+  xxd "${f}edid" 2>/dev/null | head -5
+done
+```
+
+## Related Configuration Files
+
+- `hosts/gunter/configuration.nix` - NVIDIA driver settings
+- `home/hosts/gunter/default.nix` - Hyprland monitor configuration
+
+## Date Investigated
+
+2025-02-06
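
Because the failure is intermittent, it may help to capture connector state as one compact listing right after each boot. The sketch below wraps the `cat /sys/class/drm/*/status` check from the DRM connector status section in a small function that pairs each connector name with its status. The helper name `drm_connector_summary` and its optional root-directory argument are illustrative assumptions, not part of any existing tool.

```bash
# Print one line per DRM connector: "<connector-name> <status>".
# Assumption: every connector directory under the DRM root exposes a
# `status` file, as the `cat /sys/class/drm/*/status` command relies on.
# `drm_connector_summary` is a hypothetical helper name.
drm_connector_summary() {
  local root="${1:-/sys/class/drm}"
  local dir name status
  for dir in "$root"/card*-*/; do
    # Skip the unexpanded glob pattern when no connector directories exist.
    [ -r "${dir}status" ] || continue
    name="$(basename "$dir")"
    status="$(cat "${dir}status")"
    printf '%s %s\n' "$name" "$status"
  done
}
```

Calling `drm_connector_summary` with no argument reads `/sys/class/drm`. The optional argument exists only so the function can be pointed at a copy of the tree, for example one saved from a boot where the Samsung displays failed to come up.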