Commit Graph

17 Commits

Author SHA1 Message Date
65acf13e6f grafana: fix datasource UIDs for VictoriaMetrics migration
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Update all dashboard datasource references from "prometheus" to
"victoriametrics" to match the declared datasource UID. Enable
prune and deleteDatasources to clean up the old Prometheus
(monitoring01) datasource from Grafana's database.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 22:23:04 +01:00
c151f31011 grafana: fix apiary dashboard panels empty on short time ranges
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m54s
Set interval=60s on rate() panels to match the actual Prometheus scrape
interval, so Grafana calculates $__rate_interval correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 20:03:26 +01:00
3e7aabc73a grafana: fix apiary geomap and make it full-width
Some checks failed
Run nix flake check / flake-check (push) Failing after 5m6s
Periodic flake update / flake-update (push) Successful in 5m25s
Add gazetteer reference for country code lookup resolution.
Remove unnecessary reduce transformation. Make geomap panel
full-width (24 cols) and taller (h=10) on its own row.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 21:36:24 +01:00
361e7f2a1b grafana: add apiary honeypot dashboard
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 21:31:06 +01:00
0bc10cb1fe grafana: add build service panels to nixos-fleet dashboard
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m48s
Periodic flake update / flake-update (push) Successful in 2m20s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 00:49:50 +01:00
1460eea700 grafana: fix probe status table join
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m9s
Use joinByField transformation instead of merge to properly align
rows by instance. Also exclude duplicate Time/job columns from join.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 22:38:02 +01:00
98c4f54f94 grafana: add TLS certificates dashboard
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Dashboard includes:
- Stat panels for endpoints monitored, probe failures, expiring certs
- Gauge showing minimum days until any cert expires
- Table of all endpoints sorted by expiry (color-coded)
- Probe status table with HTTP status and duration
- Time series graphs for expiry trends and probe success rate

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 22:35:44 +01:00
2f5a2a4bf1 grafana: use instant queries for fleet dashboard stat panels
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m6s
Prevents stat panels from being affected by dashboard time range selection.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 19:00:33 +01:00
ed7d2aa727 grafana: add deployment metrics to nixos-fleet dashboard
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 15:58:28 +01:00
f66dfc753c grafana: add NixOS operations dashboard
All checks were successful
Run nix flake check / flake-check (push) Successful in 3m24s
Run nix flake check / flake-check (pull_request) Successful in 4m5s
Loki-based dashboard for tracking NixOS operations including:
- Upgrade activity and success/failure stats
- Build activity during upgrades
- Bootstrap logs for new VM deployments
- ACME certificate renewal activity

Log panels use LogQL json parsing with | keep host to show
clean messages with host labels.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 22:03:28 +01:00
89d0a6f358 grafana: add systemd services dashboard
Some checks failed
Run nix flake check / flake-check (push) Failing after 8m30s
Run nix flake check / flake-check (pull_request) Failing after 16m49s
Dashboard for monitoring systemd across the fleet:
- Summary stats: failed/active/inactive units, restarts, timers
- Failed units table (shows any units in failed state)
- Service restarts table (top 15 services by restart count)
- Active units per host bar chart
- NixOS upgrade timer table with last trigger time
- Backup timers table (restic jobs)
- Service restarts over time chart
- Hostname filter to focus on specific hosts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 21:06:59 +01:00
03ebee4d82 grafana: fix proxmox table __name__ column
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m9s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 21:04:41 +01:00
05630eb4d4 grafana: add Proxmox dashboard
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Dashboard for monitoring Proxmox VMs:
- Summary stats: VMs running/stopped, node CPU/memory, uptime
- VM status table with name, status, CPU%, memory%, uptime
- VM CPU usage over time
- VM memory usage over time
- Network traffic (RX/TX) per VM
- Disk I/O (read/write) per VM
- Storage usage gauges and capacity table
- VM filter to focus on specific VMs

Filters out template VMs, shows only actual guests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 21:02:28 +01:00
d333aa0164 grafana: fix fleet table __name__ columns
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m5s
Exclude the __name__ columns that were leaking through the
table transformations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:52:39 +01:00
a5d5827dcc grafana: add NixOS fleet dashboard
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Dashboard for monitoring NixOS deployments across the homelab:
- Hosts behind remote / needing reboot stat panels
- Fleet status table with revision, behind status, reboot needed, age
- Generation age bar chart (shows stale configs)
- Generations per host bar chart
- Deployment activity time series (see when hosts were updated)
- Flake input ages table
- Pie charts for hosts by revision and tier
- Tier filter variable

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:50:08 +01:00
1c13ec12a4 grafana: add temperature dashboard
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m5s
Dashboard includes:
- Current temperatures per room (stat panel)
- Average home temperature (gauge)
- Current humidity (stat panel)
- 30-day temperature history with mean/min/max in legend
- Temperature trend (rate of change per hour)
- 24h min/max/avg table per room
- 30-day humidity history

Filters out device_temperature (internal sensor) metrics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:45:52 +01:00
4bf0eeeadb grafana: add dashboards and fix permissions
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m3s
- Change default OIDC role from Viewer to Editor for Explore access
- Add declarative dashboard provisioning
- Add node-exporter dashboard (CPU, memory, disk, load, network, I/O)
- Add Loki logs dashboard with host/job filters

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:39:21 +01:00