3cccfc0487
monitoring: implement monitoring gaps coverage
Run nix flake check / flake-check (push) Failing after 7m36s
Add exporters and scrape targets for services lacking monitoring:
- PostgreSQL: postgres-exporter on pgdb1
- Authelia: native telemetry metrics on auth01
- Unbound: unbound-exporter with remote-control on ns1/ns2
- NATS: HTTP monitoring endpoint on nats1
- OpenBao: telemetry config and Prometheus scrape with token auth
- Systemd: systemd-exporter on all hosts for per-service metrics
Add alert rules for postgres, auth (authelia + lldap), jellyfin,
vault (openbao), plus extend existing nats and unbound rules.
Add Terraform config for Prometheus metrics policy and token. The
token is created via vault_token resource and stored in KV, so no
manual token creation is needed.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 21:44:13 +01:00
fa6380e767
monitoring: fix nix-cache_caddy scrape target TLS error
Run nix flake check / flake-check (push) Successful in 2m43s
Move nix-cache_caddy back to a manual config in prometheus.nix using the
service CNAME (nix-cache.home.2rjus.net) instead of the hostname. The
auto-generated target used nix-cache01.home.2rjus.net which doesn't
match the TLS certificate SAN.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 01:04:50 +01:00
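The TLS error in the nix-cache_caddy fix above comes down to hostname verification: the name used in the scrape target must appear in the certificate's Subject Alternative Name list. The following is a minimal Python sketch of that check, not the verification code Prometheus actually runs. The function name `hostname_matches_san` is hypothetical, the wildcard handling is simplified to a single leftmost label, and the SAN list in the usage note is an assumption inferred from the commit message rather than read from the real certificate.

```python
def hostname_matches_san(hostname, san_dns_names):
    """Return True if hostname matches any DNS name in the SAN list.

    Simplified rules: names compare case-insensitively, and a wildcard
    is only honoured as the entire leftmost label (e.g. "*.example.org").
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    for pattern in san_dns_names:
        pat_labels = pattern.lower().rstrip(".").split(".")
        # A wildcard never spans multiple labels, so label counts must agree.
        if len(pat_labels) != len(host_labels):
            continue
        if pat_labels[0] == "*":
            if pat_labels[1:] == host_labels[1:]:
                return True
        elif pat_labels == host_labels:
            return True
    return False


# Assumed SAN contents: only the service CNAME is listed on the certificate.
ASSUMED_SAN = ["nix-cache.home.2rjus.net"]
```

Under that assumption, `hostname_matches_san("nix-cache.home.2rjus.net", ASSUMED_SAN)` succeeds while the auto-generated `nix-cache01.home.2rjus.net` fails, which is consistent with moving the target back to a manual entry that uses the CNAME.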
dd1b64de27
monitoring: auto-generate Prometheus scrape targets from host configs
Run nix flake check / flake-check (pull_request) Successful in 2m49s
Run nix flake check / flake-check (push) Has been cancelled
Add homelab.monitoring NixOS options (enable, scrapeTargets) following
the same pattern as homelab.dns. Prometheus scrape configs are now
auto-generated from flake host configurations and external targets,
replacing hardcoded target lists.
Also cleans up alert rules: snake_case naming, fix zigbee2mqtt typo,
remove duplicate pushgateway alert, add for clauses to monitoring_rules,
remove hardcoded WireGuard public key, and add new alerts for
certificates, proxmox, caddy, smartctl temperature, filesystem
prediction, systemd state, file descriptors, and host reboots.
Fixes grafana scrape target port from 3100 to 3000.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:49:07 +01:00
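The auto-generation described in the commit above can be pictured as a fold over per-host options: each enabled host contributes its declared targets, which are grouped by job name into Prometheus scrape configs. This is a rough Python sketch of that idea only; the actual implementation lives in Nix and is not shown in this log. The host `monitoring01`, the NATS port 8222, the dictionary shapes, and the function name `build_scrape_configs` are all assumptions made for illustration. Only the option names `enable` and `scrapeTargets`, the domain, and the Grafana port 3000 come from the commit messages.

```python
from collections import defaultdict

# Hypothetical stand-in for the homelab.monitoring options of each flake host.
HOSTS = {
    "monitoring01": {
        "enable": True,
        "scrapeTargets": [{"job_name": "grafana", "port": 3000}],
    },
    "nats1": {
        "enable": True,
        "scrapeTargets": [{"job_name": "nats", "port": 8222}],
    },
    # Hosts that opt out contribute nothing.
    "scratch01": {"enable": False, "scrapeTargets": []},
}


def build_scrape_configs(hosts, domain="home.2rjus.net", external=()):
    """Group declared targets by job name into scrape config dicts."""
    jobs = defaultdict(list)
    for name, cfg in sorted(hosts.items()):
        if not cfg.get("enable", True):
            continue
        for target in cfg.get("scrapeTargets", []):
            jobs[target["job_name"]].append(f"{name}.{domain}:{target['port']}")
    # External targets carry a full address and are appended as given.
    for target in external:
        jobs[target["job_name"]].append(target["target"])
    return [
        {"job_name": job, "static_configs": [{"targets": targets}]}
        for job, targets in sorted(jobs.items())
    ]
```

Calling `build_scrape_configs(HOSTS)` yields one job per distinct `job_name`, so adding a host with a new target needs no edit to a central target list. That is the property the commit describes as replacing hardcoded lists.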
adf70999b9
Fix scrape config
Run nix flake check / flake-check (push) Failing after 6m7s
Periodic flake update / flake-update (push) Successful in 3m13s
2025-06-01 02:41:54 +02:00
acb9e59775
Scrape nix-cache caddy
Run nix flake check / flake-check (push) Has been cancelled
2025-06-01 02:40:41 +02:00
77d1782f36
Set honor_labels for pushgw scrape
Run nix flake check / flake-check (push) Failing after 8m37s
2025-05-28 20:34:17 +02:00
5b06a95222
Add prometheus pushgateway
Run nix flake check / flake-check (push) Failing after 12m59s
2025-05-28 17:10:50 +02:00
2a46da3761
Add labmon to scrape config
Run nix flake check / flake-check (push) Failing after 14m32s
2025-05-24 03:37:52 +02:00
4e870cda44
Scrape step-ca metrics
Run nix flake check / flake-check (push) Failing after 3m52s
Periodic flake update / flake-update (push) Successful in 2m42s
2025-05-23 09:28:52 +02:00
6e6d5098c5
Collect ghettoptt stats
Run nix flake check / flake-check (push) Failing after 11m48s
2025-05-22 14:55:32 +02:00
aa2cbcda60
Add home assistant to prometheus
Run nix flake check / flake-check (push) Failing after 15m18s
2025-05-19 11:21:46 +02:00
fe2e87658a
Move prometheus roles to external file
Run nix flake check / flake-check (push) Failing after 3m7s
2025-05-18 14:54:09 +02:00
c07d96bbab
Add alert for wireguard handshake
Run nix flake check / flake-check (push) Failing after 3m17s
Periodic flake update / flake-update (push) Successful in 2m15s
2025-05-18 01:12:04 +02:00
bd58d07001
Monitor wireguard
Run nix flake check / flake-check (push) Failing after 3m32s
2025-05-18 00:59:55 +02:00
3797526000
Add some alerting rules for smartctl
Run nix flake check / flake-check (push) Has been cancelled
2025-05-18 00:51:02 +02:00
afa3cc3a57
Collect smartctl metrics from gunter
Run nix flake check / flake-check (push) Failing after 4m53s
2025-05-18 00:43:15 +02:00
08a0ddaf30
Increase prometheus retention to 30d
Run nix flake check / flake-check (push) Failing after 5m58s
Periodic flake update / flake-update (push) Successful in 4m7s
2025-05-12 23:22:31 +02:00
518e3a3ded
Fix flapping build-flakes alarm
Run nix flake check / flake-check (push) Failing after 6m57s
Periodic flake update / flake-update (push) Successful in 3m59s
2025-04-07 10:41:35 +02:00
0dbdee65c5
Add harmonia alerting rule
Run nix flake check / flake-check (push) Has been cancelled
2025-02-24 18:29:41 +01:00
874e30fb28
Tune cpu alarm
Run nix flake check / flake-check (push) Failing after 4m18s
2025-02-23 20:46:25 +01:00
15e5ccb0ec
Change alertmanager repeat time
Run nix flake check / flake-check (push) Failing after 3m41s
2025-02-23 18:10:14 +01:00
b8d058d23e
Add alerting rules
Run nix flake check / flake-check (push) Failing after 8m51s
2025-02-12 20:34:22 +01:00
a5448c5fc1
Remove whitespace
Run nix flake check / flake-check (push) Failing after 24m42s
Periodic flake update / flake-update (push) Successful in 1m24s
2025-02-12 00:26:14 +01:00
f1ca20a387
Add some alerting rules
Run nix flake check / flake-check (push) Failing after 14m34s
2025-02-11 23:24:35 +01:00
f0bc29ac5e
Add nats host to monitoring
Run nix flake check / flake-check (push) Has been cancelled
2025-02-11 23:12:55 +01:00
539ff4eeac
Change cpu load alert
Run nix flake check / flake-check (push) Waiting to run
2025-02-11 23:07:56 +01:00
abb4cf58ea
Add alerttonotify to monitoring host
Run nix flake check / flake-check (push) Has been cancelled
2025-02-11 22:25:54 +01:00
6079852cc6
Add missing hosts to prometheus scrape job
Run nix flake check / flake-check (push) Failing after 6m22s
Periodic flake update / flake-update (push) Successful in 1m30s
2025-01-26 00:56:21 +01:00
26bf43bba5
Collect restic rest metrics
Run nix flake check / flake-check (push) Failing after 6m44s
Periodic flake update / flake-update (push) Successful in 1m29s
2025-01-24 23:43:02 +01:00
2824718e53
Collect alertmanager metrics
Run nix flake check / flake-check (push) Has been cancelled
2025-01-24 23:34:43 +01:00
25b2f1d1ee
Collect grafana metrics
2025-01-24 23:33:49 +01:00
f2b5bb6f2a
Collect loki metrics
2025-01-24 23:32:45 +01:00
e70e892ab2
Add build-flakes script for nix-cache
Run nix flake check / flake-check (push) Failing after 4m20s
2025-01-24 01:12:18 +01:00
43dfc0ec28
Add some alerting rules
Run nix flake check / flake-check (push) Failing after 3m23s
Periodic flake update / flake-update (push) Successful in 1m32s
2025-01-21 22:47:44 +01:00
79b6598d0d
Add jellyfin
Run nix flake check / flake-check (push) Failing after 4m36s
Periodic flake update / flake-update (push) Successful in 1m29s
2024-12-22 04:33:00 +01:00
b3ebe3a3b0
Monitor prometheus metrics
Run nix flake check / flake-check (push) Failing after 6m24s
Periodic flake update / flake-update (push) Successful in 1m59s
2024-12-05 19:36:55 +01:00
4c60f7b5c1
Fix caddy metrics endpoint
Run nix flake check / flake-check (push) Failing after 10m38s
2024-12-04 04:09:06 +01:00
5af18ca418
Gather caddy metrics
Run nix flake check / flake-check (push) Has been cancelled
2024-12-04 04:02:24 +01:00
4b38158780
Add pve monitoring
Run nix flake check / flake-check (push) Failing after 23m15s
Periodic flake update / flake-update (push) Successful in 1m47s
2024-12-03 18:01:48 +01:00
91a844fe4d
Fix alerting
Run nix flake check / flake-check (push) Failing after 13m25s
Periodic flake update / flake-update (push) Successful in 2m19s
2024-12-03 00:47:00 +01:00
f08ac69003
Improve monitoring stuff
Run nix flake check / flake-check (push) Failing after 5m5s
2024-12-02 23:41:46 +01:00
b62a5c3db9
Disable alertmanager
Run nix flake check / flake-check (push) Failing after 3m25s
Periodic flake update / flake-update (push) Successful in 1m45s
2024-12-01 23:45:45 +01:00
a4592ffda3
Improve monitoring stuff
Run nix flake check / flake-check (push) Failing after 23m19s
2024-12-01 20:51:14 +01:00
3c3eaaa042
Add monitoring host
2024-12-01 01:51:34 +01:00