monitoring: increase filesystem_filling_up prediction window to 24h
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m55s
Reduces false positives from transient Nix store growth by basing the linear prediction on a 24h trend instead of 6h.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -67,13 +67,13 @@ groups:
           summary: "Promtail service not running on {{ $labels.instance }}"
           description: "The promtail service has not been active on {{ $labels.instance }} for 5 minutes."
       - alert: filesystem_filling_up
-        expr: predict_linear(node_filesystem_free_bytes{mountpoint="/"}[6h], 24*3600) < 0
+        expr: predict_linear(node_filesystem_free_bytes{mountpoint="/"}[24h], 24*3600) < 0
         for: 1h
         labels:
           severity: warning
         annotations:
           summary: "Filesystem predicted to fill within 24h on {{ $labels.instance }}"
-          description: "Based on the last 6h trend, the root filesystem on {{ $labels.instance }} is predicted to run out of space within 24 hours."
+          description: "Based on the last 24h trend, the root filesystem on {{ $labels.instance }} is predicted to run out of space within 24 hours."
       - alert: systemd_not_running
         expr: node_systemd_system_running == 0
         for: 10m
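The reasoning in the commit message (a longer window damps transient growth) can be illustrated with a small Python model. This is a sketch under stated assumptions, not Prometheus code: `predict_linear` below is a hypothetical re-implementation of the simple least-squares regression that Prometheus documents for this function (fitted over the range window, extrapolated from the evaluation time), and the disk series (flat at 40 GiB free, then a build consuming 4 GiB/h for the final 6 hours, sampled every 5 minutes) is invented for illustration, not taken from real metrics.

```python
"""Sketch: how the range window changes what predict_linear(...) < 0 reports."""

HOUR = 3600
STEP = 300  # hypothetical 5-minute scrape interval


def predict_linear(samples, eval_time, window, horizon):
    """Fit a least-squares line to samples in (eval_time - window, eval_time]
    and return its value `horizon` seconds after eval_time."""
    # x is measured relative to eval_time, so the intercept is the fitted "now" value.
    pts = [(t - eval_time, v) for t, v in samples if eval_time - window < t <= eval_time]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    var = sum((x - mean_x) ** 2 for x, _ in pts)
    slope = cov / var
    intercept = mean_y - slope * mean_x  # fitted value at x = 0 (eval_time)
    return intercept + slope * horizon


def free_gib(t):
    """Hypothetical free space in GiB: flat for 18h, then a build eats 4 GiB/h."""
    if t <= 18 * HOUR:
        return 40.0
    return 40.0 - 4.0 * (t - 18 * HOUR) / HOUR


samples = [(t, free_gib(t)) for t in range(0, 24 * HOUR + 1, STEP)]
now = 24 * HOUR

# Same shape as the rule: predict_linear(free[window], 24*3600) < 0
p6 = predict_linear(samples, now, 6 * HOUR, 24 * HOUR)
p24 = predict_linear(samples, now, 24 * HOUR, 24 * HOUR)
print(f"[6h] window:  {p6:.1f} GiB predicted free in 24h, alert condition: {p6 < 0}")
print(f"[24h] window: {p24:.1f} GiB predicted free in 24h, alert condition: {p24 < 0}")
```

With these made-up numbers the 6h fit sees only the steep build phase and extrapolates to about -80 GiB, so the `< 0` condition holds; the 24h fit dilutes the same burst with 18 hours of flat usage and stays at roughly +14 GiB. The trade-off is that a longer window also reacts more slowly when growth is genuinely sustained.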