monitoring: relax systemd_not_running alert threshold
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m4s
Increase duration from 5m to 10m and demote severity from critical to warning. Brief degraded states during nixos-rebuild are normal and were causing false positive alerts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -75,12 +75,12 @@ groups:
           description: "Based on the last 6h trend, the root filesystem on {{ $labels.instance }} is predicted to run out of space within 24 hours."
       - alert: systemd_not_running
         expr: node_systemd_system_running == 0
-        for: 5m
+        for: 10m
         labels:
-          severity: critical
+          severity: warning
         annotations:
           summary: "Systemd not in running state on {{ $labels.instance }}"
-          description: "Systemd is not in running state on {{ $labels.instance }}. The system may be in a degraded state."
+          description: "Systemd is not in running state on {{ $labels.instance }}. The system may be in a degraded state. Note: brief degraded states during nixos-rebuild are normal."
       - alert: high_file_descriptors
         expr: node_filefd_allocated / node_filefd_maximum > 0.8
         for: 5m
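
The relaxed threshold could be checked with a promtool rule unit test. The following is a minimal sketch, not part of this commit. It assumes the rules live in a file named `rules.yml` (hypothetical path) and that the series carries only an `instance` label (`host1` is a made-up value). The first case models an 8-minute degraded window, such as a rebuild, which should no longer fire. The second models a persistent failure, which should fire as a warning once 10 minutes have passed.

```yaml
# Hypothetical promtool test file; run with: promtool test rules <this file>
rule_files:
  - rules.yml  # assumed path to the alert rules changed in this commit

evaluation_interval: 1m

tests:
  # Degraded from 1m to 8m, healthy again from 9m: shorter than "for: 10m".
  - interval: 1m
    input_series:
      - series: 'node_systemd_system_running{instance="host1"}'
        values: '1 0 0 0 0 0 0 0 0 1 1 1 1 1 1'
    alert_rule_test:
      - eval_time: 8m
        alertname: systemd_not_running
        exp_alerts: []  # still pending, not firing
      - eval_time: 12m
        alertname: systemd_not_running
        exp_alerts: []  # recovered before the 10m window elapsed

  # Degraded for the whole run (21 samples of 0): fires after 10m.
  - interval: 1m
    input_series:
      - series: 'node_systemd_system_running{instance="host1"}'
        values: '0+0x20'
    alert_rule_test:
      - eval_time: 8m
        alertname: systemd_not_running
        exp_alerts: []  # pending only
      - eval_time: 15m
        alertname: systemd_not_running
        exp_alerts:
          - exp_labels:
              severity: warning
              instance: host1
            exp_annotations:
              summary: "Systemd not in running state on host1"
              description: "Systemd is not in running state on host1. The system may be in a degraded state. Note: brief degraded states during nixos-rebuild are normal."
```

With the previous `for: 5m`, the first case would have fired at the 6m evaluation, which is the false positive this change is meant to remove.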