monitoring: relax systemd_not_running alert threshold

Increase the alert's `for` duration from 5m to 10m and demote its severity
from critical to warning. Brief degraded states during nixos-rebuild are
normal and were causing false-positive alerts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 01:22:29 +01:00
parent b9a269d280
commit 881e70df27

@@ -75,12 +75,12 @@ groups:
           description: "Based on the last 6h trend, the root filesystem on {{ $labels.instance }} is predicted to run out of space within 24 hours."
       - alert: systemd_not_running
         expr: node_systemd_system_running == 0
-        for: 5m
+        for: 10m
         labels:
-          severity: critical
+          severity: warning
         annotations:
           summary: "Systemd not in running state on {{ $labels.instance }}"
-          description: "Systemd is not in running state on {{ $labels.instance }}. The system may be in a degraded state."
+          description: "Systemd is not in running state on {{ $labels.instance }}. The system may be in a degraded state. Note: brief degraded states during nixos-rebuild are normal."
       - alert: high_file_descriptors
         expr: node_filefd_allocated / node_filefd_maximum > 0.8
         for: 5m
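
One way to check the relaxed rule is a `promtool test rules` unit test. The following is a minimal sketch, not part of this commit. It assumes the rules live in a file named `rules.yml` (a hypothetical name, since the real path is not shown here) and uses a made-up instance label `host1`. The first case models a degraded state lasting about 7 minutes, which would have fired under the old `for: 5m` but should stay silent now. The second models a sustained degraded state that should fire with severity `warning`.

```yaml
# Hypothetical test file, e.g. rules_test.yml; run with: promtool test rules rules_test.yml
rule_files:
  - rules.yml   # assumed file name; point this at the real rules file

evaluation_interval: 1m

tests:
  # Case 1: systemd is degraded from 1m through 7m, then recovers.
  # With `for: 10m` the alert is still pending at 7m, so nothing fires.
  - interval: 1m
    input_series:
      - series: 'node_systemd_system_running{instance="host1"}'
        values: '1 0x6 1x10'
    alert_rule_test:
      - eval_time: 7m
        alertname: systemd_not_running
        exp_alerts: []

  # Case 2: systemd stays degraded from 1m onward.
  # The alert becomes active at 1m and should be firing by 12m.
  - interval: 1m
    input_series:
      - series: 'node_systemd_system_running{instance="host1"}'
        values: '1 0x20'
    alert_rule_test:
      - eval_time: 12m
        alertname: systemd_not_running
        exp_alerts:
          - exp_labels:
              severity: warning
              instance: host1
            exp_annotations:
              summary: "Systemd not in running state on host1"
              description: "Systemd is not in running state on host1. The system may be in a degraded state. Note: brief degraded states during nixos-rebuild are normal."
```

If a typical nixos-rebuild in this environment leaves systemd degraded for longer than 10 minutes, the first case would need a longer window and the `for` value may need revisiting.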