nixos-servers

Author	SHA1	Message	Date
Torjus Håkestad	65acf13e6f	grafana: fix datasource UIDs for VictoriaMetrics migration [Some checks failed: Run nix flake check / flake-check (push) has been cancelled] Update all dashboard datasource references from "prometheus" to "victoriametrics" to match the declared datasource UID. Enable prune and deleteDatasources to clean up the old Prometheus (monitoring01) datasource from Grafana's database. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 22:23:04 +01:00
Torjus Håkestad	c151f31011	grafana: fix apiary dashboard panels empty on short time ranges [Some checks failed: Run nix flake check / flake-check (push) failing after 3m54s] Set interval=60s on rate() panels to match the actual Prometheus scrape interval, so Grafana calculates $__rate_interval correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 20:03:26 +01:00
Torjus Håkestad	3e7aabc73a	grafana: fix apiary geomap and make it full-width [Some checks failed: Run nix flake check / flake-check (push) failing after 5m6s; Periodic flake update / flake-update (push) successful in 5m25s] Add gazetteer reference for country code lookup resolution. Remove unnecessary reduce transformation. Make geomap panel full-width (24 cols) and taller (h=10) on its own row. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 21:36:24 +01:00
Torjus Håkestad	361e7f2a1b	grafana: add apiary honeypot dashboard [Some checks failed: Run nix flake check / flake-check (push) has been cancelled] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 21:31:06 +01:00
Torjus Håkestad	0bc10cb1fe	grafana: add build service panels to nixos-fleet dashboard [Some checks failed: Run nix flake check / flake-check (push) failing after 4m48s; Periodic flake update / flake-update (push) successful in 2m20s] Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-11 00:49:50 +01:00
Torjus Håkestad	1460eea700	grafana: fix probe status table join [All checks were successful: Run nix flake check / flake-check (push) successful in 2m9s] Use joinByField transformation instead of merge to properly align rows by instance. Also exclude duplicate Time/job columns from join. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 22:38:02 +01:00
Torjus Håkestad	98c4f54f94	grafana: add TLS certificates dashboard [Some checks failed: Run nix flake check / flake-check (push) has been cancelled] Dashboard includes: stat panels for endpoints monitored, probe failures, expiring certs; gauge showing minimum days until any cert expires; table of all endpoints sorted by expiry (color-coded); probe status table with HTTP status and duration; time series graphs for expiry trends and probe success rate. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 22:35:44 +01:00
Torjus Håkestad	2f5a2a4bf1	grafana: use instant queries for fleet dashboard stat panels [All checks were successful: Run nix flake check / flake-check (push) successful in 2m6s] Prevents stat panels from being affected by dashboard time range selection. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 19:00:33 +01:00
Torjus Håkestad	ed7d2aa727	grafana: add deployment metrics to nixos-fleet dashboard [Some checks failed: Run nix flake check / flake-check (push) has been cancelled] Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 15:58:28 +01:00
Torjus Håkestad	f66dfc753c	grafana: add NixOS operations dashboard [All checks were successful: Run nix flake check / flake-check (push) successful in 3m24s; Run nix flake check / flake-check (pull_request) successful in 4m5s] Loki-based dashboard for tracking NixOS operations including: upgrade activity and success/failure stats; build activity during upgrades; bootstrap logs for new VM deployments; ACME certificate renewal activity. Log panels use LogQL json parsing with | keep host to show clean messages with host labels. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 22:03:28 +01:00
Torjus Håkestad	89d0a6f358	grafana: add systemd services dashboard [Some checks failed: Run nix flake check / flake-check (push) failing after 8m30s; Run nix flake check / flake-check (pull_request) failing after 16m49s] Dashboard for monitoring systemd across the fleet: summary stats (failed/active/inactive units, restarts, timers); failed units table (shows any units in failed state); service restarts table (top 15 services by restart count); active units per host bar chart; NixOS upgrade timer table with last trigger time; backup timers table (restic jobs); service restarts over time chart; hostname filter to focus on specific hosts. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 21:06:59 +01:00
Torjus Håkestad	03ebee4d82	grafana: fix proxmox table __name__ column [All checks were successful: Run nix flake check / flake-check (push) successful in 2m9s] Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 21:04:41 +01:00
Torjus Håkestad	05630eb4d4	grafana: add Proxmox dashboard [Some checks failed: Run nix flake check / flake-check (push) has been cancelled] Dashboard for monitoring Proxmox VMs: summary stats (VMs running/stopped, node CPU/memory, uptime); VM status table with name, status, CPU%, memory%, uptime; VM CPU usage over time; VM memory usage over time; network traffic (RX/TX) per VM; disk I/O (read/write) per VM; storage usage gauges and capacity table; VM filter to focus on specific VMs. Filters out template VMs, shows only actual guests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 21:02:28 +01:00
Torjus Håkestad	d333aa0164	grafana: fix fleet table __name__ columns [All checks were successful: Run nix flake check / flake-check (push) successful in 2m5s] Exclude the __name__ columns that were leaking through the table transformations. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 20:52:39 +01:00
Torjus Håkestad	a5d5827dcc	grafana: add NixOS fleet dashboard [Some checks failed: Run nix flake check / flake-check (push) has been cancelled] Dashboard for monitoring NixOS deployments across the homelab: hosts behind remote / needing reboot stat panels; fleet status table with revision, behind status, reboot needed, age; generation age bar chart (shows stale configs); generations per host bar chart; deployment activity time series (see when hosts were updated); flake input ages table; pie charts for hosts by revision and tier; tier filter variable. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 20:50:08 +01:00
Torjus Håkestad	1c13ec12a4	grafana: add temperature dashboard [All checks were successful: Run nix flake check / flake-check (push) successful in 2m5s] Dashboard includes: current temperatures per room (stat panel); average home temperature (gauge); current humidity (stat panel); 30-day temperature history with mean/min/max in legend; temperature trend (rate of change per hour); 24h min/max/avg table per room; 30-day humidity history. Filters out device_temperature (internal sensor) metrics. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 20:45:52 +01:00
Torjus Håkestad	4bf0eeeadb	grafana: add dashboards and fix permissions [All checks were successful: Run nix flake check / flake-check (push) successful in 2m3s] Change default OIDC role from Viewer to Editor for Explore access; add declarative dashboard provisioning; add node-exporter dashboard (CPU, memory, disk, load, network, I/O); add Loki logs dashboard with host/job filters. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 20:39:21 +01:00

17 Commits
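
Commit c151f31011 sets interval=60s so that Grafana calculates $__rate_interval correctly. The sketch below illustrates why that matters, assuming Grafana's documented formula, max($__interval + scrape interval, 4 * scrape interval). The function name, the 15-second values (Grafana's default when no interval is configured), and the sample-count estimate are illustrative assumptions, not taken from the repository.

```python
def rate_interval_seconds(interval_s: int, scrape_interval_s: int) -> int:
    """Approximate Grafana's $__rate_interval in seconds.

    Assumes the documented formula:
    max($__interval + scrape interval, 4 * scrape interval).
    """
    return max(interval_s + scrape_interval_s, 4 * scrape_interval_s)


ACTUAL_SCRAPE_S = 60  # the scrape interval named in the commit message

# Hypothetical "before" case: Grafana assumes a 15s scrape interval and,
# on a short time range, a 15s $__interval.
window_before = rate_interval_seconds(interval_s=15, scrape_interval_s=15)

# "After" case: interval=60s is set on the panel query, so both the step
# and the assumed scrape interval are 60s.
window_after = rate_interval_seconds(interval_s=60, scrape_interval_s=60)

for label, window in (("before", window_before), ("after", window_after)):
    # Rough lower bound on samples inside the rate() window.
    # rate() needs at least two samples to produce a value.
    guaranteed_samples = window // ACTUAL_SCRAPE_S
    print(f"{label}: window={window}s, guaranteed samples={guaranteed_samples}")
```

Under these assumptions the window grows from 60s to 240s. A 60s window over 60s scrapes can hold a single sample, which leaves rate() with nothing to compute and matches the empty panels the commit describes.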