gunter: switch to proprietary nvidia driver and load in initrd
All checks were successful
Run nix flake check / flake-check (push) Successful in 1m26s
The open nvidia driver had significant issues with DP MST displays, including flip event timeouts and kernel warnings. The proprietary driver handles MST failures more gracefully. Loading nvidia modules in initrd eliminates the ~22 second black screen during boot.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
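
For orientation, the two settings this commit describes could look roughly like the following in a NixOS module. This is only a sketch assembled from the option names and module list quoted in the diff; the actual contents of `hosts/gunter/configuration.nix` are not shown on this page, so the surrounding module structure is an assumption.

```nix
# Hypothetical excerpt; the real file is not part of this diff.
{ ... }:
{
  # Use the proprietary kernel module instead of the open one.
  hardware.nvidia.open = false;

  # Load the nvidia modules from the initrd so the driver initializes early in boot.
  boot.initrd.kernelModules = [
    "nvidia"
    "nvidia_modeset"
    "nvidia_uvm"
    "nvidia_drm"
  ];
}
```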
@@ -7,9 +7,11 @@ Two of the four monitors on gunter (desktop) intermittently fail to work on star
 ## System Configuration

 - **GPU**: NVIDIA GeForce RTX 3080 Ti
-- **Driver**: NVIDIA open driver 590.48.01 (beta)
-- **Kernel**: 6.18.8
+- **Driver**: NVIDIA proprietary driver 590.48.01 (beta)
+- **Kernel**: 6.18.12
 - **Compositor**: Hyprland
+- **Open driver**: `false` (switched from open to proprietary 2026-02-21)
+- **Initrd nvidia modules**: `nvidia`, `nvidia_modeset`, `nvidia_uvm`, `nvidia_drm`

 ### Monitor Setup

@@ -26,7 +28,7 @@ The GPU only has 3 DisplayPort outputs, so one Samsung monitor is connected to t

 ### Kernel Errors

-The following errors appear in the kernel log during boot (17-27 seconds after boot start):
+The following errors appear in the kernel log during boot:

 ```
 [drm:nv_drm_dev_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to add connector for NvKmsKapiDisplay 0x00000800
@@ -37,25 +39,44 @@ The following errors appear in the kernel log during boot (17-27 seconds after b

 ### Root Cause Analysis

-1. **MST timing issues** - The downstream Samsung monitor isn't ready when the driver tries to enumerate the daisy chain during boot
-2. **NVIDIA open driver MST bugs** - The open-source driver (`hardware.nvidia.open = true`) has historically had more MST issues than the proprietary one
-3. **Power sequencing** - The monitors may need more time to negotiate the MST link during cold boot
+1. **MST timing issues** - The downstream Samsung monitor isn't ready when the driver tries to enumerate the daisy chain during boot. The MST topology hasn't been negotiated yet when the driver first probes, regardless of how early or late it loads.
+2. **Power sequencing** - The monitors may need more time to negotiate the MST link during cold boot

-## Potential Solutions
+## Changes Made

-1. **Switch to proprietary driver** - Change `hardware.nvidia.open = false` in `hosts/gunter/configuration.nix`
+### 2026-02-21: Switch to proprietary driver + initrd loading

-2. **Add boot delay for nvidia-drm** - Add kernel parameter `nvidia-drm.load_on_init=0` to defer initialization
+**Change 1: `hardware.nvidia.open = false`** (previously `true`)

-3. **Try different nvidia module options** - Add to `boot.extraModprobeConfig`:
-```nix
-options nvidia-drm modeset=1 fbdev=1
-options nvidia NVreg_PreserveVideoMemoryAllocations=1
-```
+With the open driver, boot produced 7 errors including flip event timeouts and kernel WARNING stack traces:
+```
+Failed to add connector for NvKmsKapiDisplay 0x00000800
+Failed to get dynamic displays
+Flip event timeout on head 0
+Flip event timeout on head 1
+Failed to add encoder for NvKmsKapiDisplay 0x00000001
+WARNING: CPU: 5 PID: 1169 at nvidia-drm/nvidia-drm-crtc.h:328 __nv_drm_handle_flip_event (x2)
+```

-4. **Check monitor firmware** - Samsung LS27A600U monitors have had MST firmware updates
+With the proprietary driver, only the 2 MST enumeration errors remain. The flip timeouts and kernel warnings are gone. The driver handles the MST failure much more gracefully.

-5. **Reduce link rate during boot** - Lower refresh rate to 60Hz initially to reduce bandwidth requirements
+**Change 2: Load nvidia modules in initrd** (`boot.initrd.kernelModules`)
+
+Without initrd loading, the nvidia driver took ~22 seconds to initialize (11s to first error, 10 more to give up on dynamic displays). During this time monitors lost signal and went to sleep.
+
+With initrd loading, the driver loads and initializes in under 1 second. However, the same two MST errors still occur - the MST topology simply isn't ready yet regardless of timing.
+
+**Result**: Subjectively improved - monitors now typically recover after a single power cycle instead of requiring multiple attempts. The boot process is also faster with no 20+ second black screen hang.
+
+## Remaining Solutions to Try
+
+1. **Display rescan service** - Create a systemd service that triggers the nvidia driver to re-enumerate displays a few seconds after boot. This could auto-detect MST monitors without manual power cycling.
+
+2. **Remove `quiet splash`** from kernel params - Keeps console output active during boot, which maintains an active DP signal through the UEFI-to-kernel transition and may help keep the MST link alive.
+
+3. **Check monitor firmware** - Samsung LS27A600U monitors have had MST firmware updates. Updating could improve MST link negotiation reliability.
+
+4. **Reduce initial link rate** - Lower refresh rate to 60Hz initially to reduce DP bandwidth requirements during MST negotiation, potentially making link training more reliable.

 ## Useful Diagnostic Commands

@@ -98,7 +119,3 @@ done

 - `hosts/gunter/configuration.nix` - NVIDIA driver settings
 - `home/hosts/gunter/default.nix` - Hyprland monitor configuration
-
-## Date Investigated
-
-2025-02-06