monitoring: relax systemd_not_running alert threshold

Increase duration from 5m to 10m and demote severity from critical to
warning. Brief degraded states during nixos-rebuild are normal and were
causing false positive alerts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 01:22:29 +01:00
parent b9a269d280
commit 881e70df27
1 changed file with 3 additions and 3 deletions
--- a/services/monitoring/rules.yml
+++ b/services/monitoring/rules.yml
@@ -75,12 +75,12 @@ groups:
          description: "Based on the last 6h trend, the root filesystem on {{ $labels.instance }} is predicted to run out of space within 24 hours."
      - alert: systemd_not_running
        expr: node_systemd_system_running == 0
-        for: 5m
+        for: 10m
        labels:
-          severity: critical
+          severity: warning
        annotations:
          summary: "Systemd not in running state on {{ $labels.instance }}"
-          description: "Systemd is not in running state on {{ $labels.instance }}. The system may be in a degraded state."
+          description: "Systemd is not in running state on {{ $labels.instance }}. The system may be in a degraded state. Note: brief degraded states during nixos-rebuild are normal."
      - alert: high_file_descriptors
        expr: node_filefd_allocated / node_filefd_maximum > 0.8
        for: 5m