Commit Graph

192 Commits

Author SHA1 Message Date
28b8d7c115 monitoring: increase high_cpu_load duration for nix-cache01 to 2h
nix-cache01 regularly hits high CPU during nix builds, causing flappy
alerts. Keep the 15m threshold for all other hosts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 02:28:48 +01:00
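
The effect of the longer duration in commit 28b8d7c115 can be illustrated with a small sketch. It assumes the alert only fires once the high-CPU condition has held without interruption for the whole duration. The function names `fires` and `hold_for` and the sample layout are hypothetical and are not taken from the repository.

```python
def hold_for(host):
    """Duration in seconds the condition must hold before alerting.

    2h for nix-cache01 and 15m for every other host, as stated in the
    commit message.
    """
    return 2 * 3600 if host == "nix-cache01" else 15 * 60


def fires(samples, hold_s):
    """Return True if the condition held continuously for at least hold_s.

    samples: list of (timestamp_seconds, condition_is_true), sorted by time.
    This layout is an assumption made for illustration only.
    """
    start = None
    for ts, high in samples:
        if not high:
            start = None  # any dip resets the pending window
            continue
        if start is None:
            start = ts
        if ts - start >= hold_s:
            return True
    return False


# A 40-minute burst of high CPU, sampled once per minute.
burst = [(i * 60, True) for i in range(41)]
print(fires(burst, hold_for("ha1")))          # 15m threshold
print(fires(burst, hold_for("nix-cache01")))  # 2h threshold
```

Under these assumptions a 40-minute build burst alerts on an ordinary host but stays silent on nix-cache01.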
3a9a47f1ad monitoring: exclude step-ca serving cert from general expiry alert
Some checks failed
Run nix flake check / flake-check (push) Failing after 6m23s
Run nix flake check / flake-check (pull_request) Failing after 4m46s
The step-ca serving certificate is auto-renewed with a 24h lifetime,
so it always triggers the general < 86400s threshold. Exclude it and
add a dedicated step_ca_serving_cert_expiring alert at < 1h instead.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 01:12:42 +01:00
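
The threshold split described in commit 3a9a47f1ad can be sketched as follows. The two thresholds (86400s and 1h) and the alert name step_ca_serving_cert_expiring come from the commit message. The general alert name `certificate_expiring`, the certificate label `step-ca-serving`, and the input layout are assumptions made for illustration.

```python
GENERAL_THRESHOLD_S = 86400  # general expiry threshold from the commit message
STEP_CA_THRESHOLD_S = 3600   # dedicated threshold (< 1h) from the commit message


def expiry_alerts(certs, step_ca_name="step-ca-serving"):
    """Return (alert_name, cert_name) pairs for certificates that would alert.

    certs maps a certificate name to its remaining lifetime in seconds.
    The step-ca serving certificate is excluded from the general rule and
    checked against its own, shorter threshold instead.
    """
    alerts = []
    for name, remaining_s in certs.items():
        if name == step_ca_name:
            if remaining_s < STEP_CA_THRESHOLD_S:
                alerts.append(("step_ca_serving_cert_expiring", name))
        elif remaining_s < GENERAL_THRESHOLD_S:
            # "certificate_expiring" is a hypothetical name for the general alert.
            alerts.append(("certificate_expiring", name))
    return alerts


certs = {
    "step-ca-serving": 20 * 3600,  # freshly renewed 24h certificate
    "web": 10 * 3600,              # ordinary certificate close to expiry
    "other": 30 * 86400,           # ordinary certificate with plenty of time
}
print(expiry_alerts(certs))
```

With a single shared threshold, the step-ca serving certificate would alert for its entire 24h lifetime. In this sketch it only alerts once renewal has evidently stalled.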
fa6380e767 monitoring: fix nix-cache_caddy scrape target TLS error
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m43s
Move nix-cache_caddy back to a manual config in prometheus.nix using the
service CNAME (nix-cache.home.2rjus.net) instead of the hostname. The
auto-generated target used nix-cache01.home.2rjus.net which doesn't
match the TLS certificate SAN.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 01:04:50 +01:00
dd1b64de27 monitoring: auto-generate Prometheus scrape targets from host configs
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m49s
Run nix flake check / flake-check (push) Has been cancelled
Add homelab.monitoring NixOS options (enable, scrapeTargets) following
the same pattern as homelab.dns. Prometheus scrape configs are now
auto-generated from flake host configurations and external targets,
replacing hardcoded target lists.

Also cleans up alert rules: snake_case naming, fix zigbee2mqtt typo,
remove duplicate pushgateway alert, add for clauses to monitoring_rules,
remove hardcoded WireGuard public key, and add new alerts for
certificates, proxmox, caddy, smartctl temperature, filesystem
prediction, systemd state, file descriptors, and host reboots.

Fixes grafana scrape target port from 3100 to 3000.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:49:07 +01:00
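
A rough sketch of the generation step described in commit dd1b64de27, written in Python rather than Nix. Only the option names enable and scrapeTargets and the domain home.2rjus.net come from this log. The per-target fields `job_name`, `port`, and `target`, the default of a disabled host, and the function name are assumptions, not the repository's actual implementation.

```python
def build_scrape_configs(hosts, external_targets=(), domain="home.2rjus.net"):
    """Group scrape targets by job name, in the shape Prometheus expects.

    hosts maps a hostname to its config; each config may carry a
    "monitoring" section with "enable" and "scrapeTargets".
    external_targets lists targets that are not flake hosts.
    """
    jobs = {}
    for hostname, cfg in sorted(hosts.items()):
        mon = cfg.get("monitoring", {})
        if not mon.get("enable", False):  # assumed default: not scraped
            continue
        for t in mon.get("scrapeTargets", []):
            target = f"{hostname}.{domain}:{t['port']}"
            jobs.setdefault(t["job_name"], []).append(target)
    for t in external_targets:
        jobs.setdefault(t["job_name"], []).append(t["target"])
    return [
        {"job_name": job, "static_configs": [{"targets": targets}]}
        for job, targets in sorted(jobs.items())
    ]


hosts = {
    "ha1": {"monitoring": {"enable": True,
                           "scrapeTargets": [{"job_name": "node", "port": 9100}]}},
    "vault01": {"monitoring": {"enable": False,
                               "scrapeTargets": [{"job_name": "node", "port": 9100}]}},
}
print(build_scrape_configs(hosts))
```

Deriving the list from host configs means a new host is scraped as soon as it enables the option, with no separate target list to keep in sync.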
83af00458b dns: remove defunct external hosts
Remove hosts that no longer respond to ping:
- kube-blue1-10 (entire k8s cluster)
- virt-mini1, mpnzb, inc2, testing
- CNAMEs: rook, git (pointed to removed kube-blue nodes)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 21:50:56 +01:00
cee1b264cd dns: auto-generate zone entries from host configurations
Replace static zone file with dynamically generated records:
- Add homelab.dns module with enable/cnames options
- Extract IPs from systemd.network configs (filters VPN interfaces)
- Use git commit timestamp as zone serial number
- Move external hosts to separate external-hosts.nix

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 21:43:44 +01:00
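
The zone generation described in commit cee1b264cd can be sketched in Python as follows. The enable/cnames options, the VPN interface filter, and the commit timestamp as serial come from the commit message. The `interfaces` layout, the `wg`/`tun` prefixes used to recognise VPN interfaces, and the IP addresses are made up for illustration.

```python
def zone_records(hosts, commit_timestamp, vpn_prefixes=("wg", "tun")):
    """Build zone records and a serial from host configs.

    hosts maps a hostname to a config with a "dns" section
    ("enable", "cnames") and an "interfaces" map of name -> CIDR address.
    commit_timestamp is the Unix time of the git commit being built.
    """
    records = []
    for name, cfg in sorted(hosts.items()):
        dns = cfg.get("dns", {})
        if not dns.get("enable", False):
            continue
        for iface, addr in sorted(cfg.get("interfaces", {}).items()):
            if iface.startswith(vpn_prefixes):  # skip VPN interfaces
                continue
            ip = addr.split("/")[0]  # drop the prefix length
            records.append(f"{name} IN A {ip}")
        for cname in dns.get("cnames", []):
            records.append(f"{cname} IN CNAME {name}")
    return {"serial": int(commit_timestamp), "records": records}


hosts = {
    "vault01": {
        "dns": {"enable": True, "cnames": ["vault"]},
        "interfaces": {"ens18": "10.0.0.5/24", "wg0": "10.9.0.5/24"},
    },
}
print(zone_records(hosts, 1770237824))
```

Because a later commit has a later timestamp, a serial taken from it increases with each zone change without anyone having to bump it by hand.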
7ae474fd3e pki: add new vault root ca to pki
2026-02-03 06:53:59 +01:00
f0525b5c74 ns: add vaulttest01 to zone
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m19s
2026-02-03 06:42:05 +01:00
42c391b355 ns: add vault cname to zone
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m7s
2026-02-03 06:00:59 +01:00
c694b9889a vault: add auto-unseal
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m16s
2026-02-02 00:28:24 +01:00
ace848b29c vault: replace vault with openbao
2026-02-01 22:16:52 +01:00
b012df9f34 ns: add vault01 host to zone
Some checks failed
Run nix flake check / flake-check (push) Failing after 15m40s
Periodic flake update / flake-update (push) Successful in 1m7s
2026-02-01 20:54:22 +01:00
a2c798bc30 vault: add minimal vault config
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2026-02-01 20:27:02 +01:00
bb9de5b4ca auth01: fix secret mode
Some checks failed
Run nix flake check / flake-check (push) Failing after 2m4s
2025-12-06 11:37:11 +01:00
8eefe38d5e auth01: fix secret group
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-12-06 11:34:34 +01:00
78efc4f592 auth01: fix secret path
Some checks failed
Run nix flake check / flake-check (push) Failing after 1m54s
2025-12-06 11:07:53 +01:00
25b786915c auth01: add lldap password to secrets
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-12-06 11:02:43 +01:00
3219b8da4b nix-cache01: re-add homelab label
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m15s
Periodic flake update / flake-update (push) Successful in 2m32s
2025-08-27 23:00:47 +02:00
e5d799ef68 nix-cache01: redo actions config
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-08-27 22:57:26 +02:00
2fc4623e8d nix-cache01: make more changes to runner
Some checks failed
Run nix flake check / flake-check (push) Failing after 23s
2025-08-27 22:47:27 +02:00
bd162f3743 nix-cache01: make some changes to runner
Some checks failed
Run nix flake check / flake-check (push) Failing after 12s
2025-08-27 22:42:42 +02:00
b86de01de8 nix-cache01: change runner log-level to debug
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-08-27 22:29:28 +02:00
09bd63169d nix-cache01: add podman to host
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m41s
Periodic flake update / flake-update (push) Successful in 2m0s
2025-08-21 21:36:49 +02:00
ef3d34d27f nix-cache01: change runner labels
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m50s
2025-08-21 21:28:14 +02:00
ad3f4e8094 nix-cache01: fix actions config secret name
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-08-21 21:00:20 +02:00
fa4e47a873 nix-cache01: fix instance name in runner
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-08-21 20:59:18 +02:00
f49711b1b3 nix-cache01: fix typo in actions config
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-08-21 20:57:02 +02:00
a0e94430b4 nix-cache01: add actions runner
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-08-21 20:56:04 +02:00
bcf01a0c11 ha1: add missing python package
Some checks failed
Run nix flake check / flake-check (push) Failing after 13m50s
Periodic flake update / flake-update (push) Successful in 3m53s
2025-08-05 17:36:11 +02:00
ccd9bbf4da Remove incus hosts
Some checks failed
Run nix flake check / flake-check (push) Failing after 14m57s
Periodic flake update / flake-update (push) Successful in 3m35s
2025-07-07 21:30:04 +02:00
adf70999b9 Fix scrape config
Some checks failed
Run nix flake check / flake-check (push) Failing after 6m7s
Periodic flake update / flake-update (push) Successful in 3m13s
2025-06-01 02:41:54 +02:00
acb9e59775 Scrape nix-cache caddy
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-06-01 02:40:41 +02:00
fa4782e43f Attempt to fix caddyfile again
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m5s
2025-06-01 02:35:31 +02:00
9236d6aef7 Fix caddyfile for nix-cache
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-06-01 02:34:31 +02:00
7f84780956 Enable metrics endpoint for caddy on nix-cache
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-06-01 02:32:22 +02:00
41aac24d52 Change caddy config on nix-cache
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-06-01 02:30:33 +02:00
3e943862ef Fix error in caddyfile
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m21s
2025-06-01 02:25:50 +02:00
4754fea0c2 Configure more metrics for caddy
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-06-01 02:24:34 +02:00
2747556674 Add --show-error to curl in build-flakes
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m0s
2025-06-01 02:08:50 +02:00
de8bcda3c1 Modify curl flags for build-flakes
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-06-01 02:07:20 +02:00
14aa3a9340 Remove non-working timer rule
Some checks failed
Run nix flake check / flake-check (push) Failing after 14m3s
Periodic flake update / flake-update (push) Successful in 3m9s
2025-05-29 10:15:40 +02:00
797f915939 Add monitoring rules for monitoring services
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-05-29 10:09:27 +02:00
3785b8047a Fix alert name for build-flakes alert
Some checks failed
Run nix flake check / flake-check (push) Failing after 10m34s
Periodic flake update / flake-update (push) Successful in 3m3s
2025-05-28 21:28:04 +02:00
fb1a36a846 Rework build-flakes alert rules
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-05-28 21:26:04 +02:00
87c98581c2 Move label to url in build-flakes
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m4s
2025-05-28 21:18:03 +02:00
2538f57312 Add curl requirement to build-flakes
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-05-28 21:08:12 +02:00
a790331d0f Remove extra shebang in build-flakes
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-05-28 21:06:37 +02:00
3588fa670e Fix shellcheck warnings in build-flakes
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-05-28 21:05:28 +02:00
dd255955ca Rework build-flakes script
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2025-05-28 21:03:25 +02:00
77d1782f36 Set honor_labels for pushgw scrape
Some checks failed
Run nix flake check / flake-check (push) Failing after 8m37s
2025-05-28 20:34:17 +02:00