grafana-dashboards-permissions #36

Merged
torjus merged 8 commits from grafana-dashboards-permissions into master 2026-02-08 20:18:23 +00:00

Summary

This PR sets up declarative dashboard provisioning for the new Grafana instance on monitoring02, fixes user permissions, and makes the Prometheus scrape configs always include the tier label.

Changes

Permissions fix:

  • Changed the default OIDC role from Viewer to Editor: Grafana 12.3.2 removed the viewers_can_edit setting, so the Editor role is now required for Explore access

Dashboard provisioning:

  • Added declarative dashboard provisioning via services.grafana.provision.dashboards
  • Dashboards are stored as JSON in services/grafana/dashboards/
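Because the dashboards are plain JSON files, a quick local sanity check before deploying can catch a broken file early. The sketch below is a hypothetical helper (`check_dashboards` and `load_dashboard_dir` are not part of this PR). It assumes each file holds a single dashboard object that should carry a `uid` and a `title`, and it flags duplicate `uid`s, since two provisioned dashboards sharing a `uid` would clash in Grafana.

```python
import json
from pathlib import Path


def load_dashboard_dir(directory: str) -> dict[str, str]:
    """Read every *.json file in `directory` into a {filename: text} mapping."""
    return {p.name: p.read_text() for p in Path(directory).glob("*.json")}


def check_dashboards(files: dict[str, str]) -> list[str]:
    """Return human-readable problems found in the dashboard files."""
    problems: list[str] = []
    seen_uids: dict[str, str] = {}  # uid -> first file that used it
    for name in sorted(files):
        try:
            dashboard = json.loads(files[name])
        except json.JSONDecodeError as exc:
            problems.append(f"{name}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(dashboard, dict):
            problems.append(f"{name}: top level is not a JSON object")
            continue
        for key in ("uid", "title"):
            if key not in dashboard:
                problems.append(f"{name}: missing {key!r}")
        uid = dashboard.get("uid")
        if isinstance(uid, str):
            if uid in seen_uids:
                problems.append(f"{name}: uid {uid!r} already used by {seen_uids[uid]}")
            else:
                seen_uids[uid] = name
    return problems
```

Usage would be something like `check_dashboards(load_dashboard_dir("services/grafana/dashboards"))`; an empty list means nothing obvious is wrong.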

New dashboards:

  • node-exporter - Host metrics (CPU, memory, disk, load, network I/O) with instance selector
  • logs - Loki log viewer with host/job filters and search
  • temperature - Home Assistant temperature sensors with 30-day history, trends, humidity, and min/max/avg table
  • nixos-fleet - Fleet overview showing hosts behind the remote, hosts needing a reboot, generation ages, and flake info, with a tier filter
  • proxmox - VM status, CPU/memory usage, network/disk I/O, and storage from pve-exporter
  • systemd - Failed units, service restarts, timer status, and active units per host

Monitoring fix:

  • Fixed lib/monitoring.nix to always include the tier label in Prometheus scrape configs (it was previously omitted for prod hosts, which broke tier-based filtering)
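The label change can be pictured as follows. This is a Python sketch of the logic only, not the actual Nix code in lib/monitoring.nix; the `static_config` helper, the `hostname` label and the example hosts are assumptions for illustration. The returned dict mirrors the shape of a Prometheus `static_configs` entry (`targets` plus `labels`).

```python
def static_config(target: str, hostname: str, tier: str, always_tier: bool = True) -> dict:
    """Build one Prometheus static_configs entry (targets plus labels).

    always_tier=False reproduces the old behaviour, where the tier label was
    skipped whenever it equalled the default "prod".
    """
    labels = {"hostname": hostname}
    if always_tier or tier != "prod":
        labels["tier"] = tier
    return {"targets": [target], "labels": labels}


# Old behaviour: the prod host ends up with no tier label at all.
print(static_config("ns1:9100", "ns1", "prod", always_tier=False))
# New behaviour: every host carries its tier.
print(static_config("ns1:9100", "ns1", "prod"))
```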

Test plan

  • [x] Deployed to monitoring02 and verified dashboards load correctly
  • [x] Verified OIDC login grants Editor role with Explore access
  • [ ] Deploy to monitoring01 to apply tier label fix (deferred)
torjus added 8 commits 2026-02-08 20:18:18 +00:00
grafana: add dashboards and fix permissions
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m3s
4bf0eeeadb
- Change default OIDC role from Viewer to Editor for Explore access
- Add declarative dashboard provisioning
- Add node-exporter dashboard (CPU, memory, disk, load, network, I/O)
- Add Loki logs dashboard with host/job filters

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
grafana: add temperature dashboard
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m5s
1c13ec12a4
Dashboard includes:
- Current temperatures per room (stat panel)
- Average home temperature (gauge)
- Current humidity (stat panel)
- 30-day temperature history with mean/min/max in legend
- Temperature trend (rate of change per hour)
- 24h min/max/avg table per room
- 30-day humidity history

Filters out device_temperature (internal sensor) metrics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
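One way to get a "rate of change per hour" trend is a least-squares slope over the window. PromQL's `deriv()` computes exactly that slope (per second), which is then scaled by 3600. The commit does not say whether the panel uses `deriv()`, `delta()` or something else, so treat this as an illustrative sketch; the `sensor.*` series names and the substring match on `device_temperature` are also assumptions.

```python
def per_hour_trend(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (unix_seconds, value) samples, scaled to units per hour."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one timestamp")
    return cov / var * 3600  # per-second slope -> per-hour


def room_sensors(series: dict[str, list[tuple[float, float]]]) -> dict[str, list[tuple[float, float]]]:
    """Drop internal device_temperature sensors, keeping only room readings."""
    return {name: s for name, s in series.items() if "device_temperature" not in name}


series = {
    "sensor.living_room_temperature": [(0, 20.0), (1800, 20.5), (3600, 21.0)],
    "sensor.smart_plug_device_temperature": [(0, 35.0), (3600, 36.0)],
}
for name, samples in room_sensors(series).items():
    print(name, round(per_hour_trend(samples), 2))
```

A regression slope is less sensitive to a single noisy reading than the difference between the first and last sample of the window.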
grafana: add NixOS fleet dashboard
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
a5d5827dcc
Dashboard for monitoring NixOS deployments across the homelab:
- Hosts behind remote / needing reboot stat panels
- Fleet status table with revision, behind status, reboot needed, age
- Generation age bar chart (shows stale configs)
- Generations per host bar chart
- Deployment activity time series (see when hosts were updated)
- Flake input ages table
- Pie charts for hosts by revision and tier
- Tier filter variable

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
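The stat panels and generation-age chart boil down to simple per-host comparisons. The sketch assumes a hypothetical `HostStatus` record per host (the real dashboard reads equivalent values from Prometheus). It treats any revision mismatch as "behind remote", which is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostStatus:
    # Hypothetical per-host record; field names are illustrative only.
    hostname: str
    tier: str
    deployed_rev: str
    remote_rev: str
    needs_reboot: bool
    generation_time: float  # unix seconds when the current generation was created


def fleet_summary(hosts: list[HostStatus], now: float, tier: Optional[str] = None) -> dict:
    """Compute the fleet stat panels, optionally restricted to one tier."""
    selected = [h for h in hosts if tier is None or h.tier == tier]
    return {
        # Simplification: any mismatch with the remote revision counts as "behind".
        "behind_remote": sorted(h.hostname for h in selected if h.deployed_rev != h.remote_rev),
        "needs_reboot": sorted(h.hostname for h in selected if h.needs_reboot),
        "generation_age_days": {h.hostname: (now - h.generation_time) / 86400 for h in selected},
    }
```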
grafana: fix fleet table __name__ columns
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m5s
d333aa0164
Exclude the __name__ columns that were leaking through the
table transformations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
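When a table panel combines several queries, each query's labels become columns, and for queries that return raw series that includes `__name__`. In Grafana such columns are typically hidden with an Organize fields transformation; the exact configuration used in this commit is not shown here. The sketch only illustrates the effect on the rows. It matches on the `__name__` prefix because merged tables can carry one such column per query, and the suffixed name `__name__ 2` below is a hypothetical example.

```python
def drop_name_columns(rows: list[dict[str, object]]) -> list[dict[str, object]]:
    """Remove every column whose name starts with "__name__"."""
    return [
        {col: val for col, val in row.items() if not col.startswith("__name__")}
        for row in rows
    ]


rows = [{"hostname": "ns1", "__name__": "up", "__name__ 2": "node_boot_time_seconds", "Value": 1}]
print(drop_name_columns(rows))
```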
monitoring: always include tier label in scrape configs
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m8s
1e52eec02a
Previously tier was only included if non-default (not "prod"), which
meant prod hosts had no tier label. This made the Grafana tier filter
only show "test" since "prod" never appeared in label_values().

Now tier is always included, so both "prod" and "test" appear in the
fleet dashboard tier selector.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
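Why the dropdown only listed "test": a `label_values()` variable query can only return values that actually appear on some series, so a label that is never set contributes nothing. A minimal in-memory model, using hypothetical series rather than real scrape data (sorted here only for deterministic output):

```python
def label_values(series: list[dict[str, str]], label: str) -> list[str]:
    """Distinct values of `label` across the series that carry it, sorted."""
    return sorted({s[label] for s in series if label in s})


before = [{"hostname": "ns1"}, {"hostname": "testvm01", "tier": "test"}]
after = [{"hostname": "ns1", "tier": "prod"}, {"hostname": "testvm01", "tier": "test"}]
print(label_values(before, "tier"))  # ['test']
print(label_values(after, "tier"))   # ['prod', 'test']
```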
grafana: add Proxmox dashboard
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
05630eb4d4
Dashboard for monitoring Proxmox VMs:
- Summary stats: VMs running/stopped, node CPU/memory, uptime
- VM status table with name, status, CPU%, memory%, uptime
- VM CPU usage over time
- VM memory usage over time
- Network traffic (RX/TX) per VM
- Disk I/O (read/write) per VM
- Storage usage gauges and capacity table
- VM filter to focus on specific VMs

Filters out template VMs, shows only actual guests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
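The VM table's derived columns and the template filter can be sketched like this. The `Guest` fields are assumptions standing in for what pve-exporter reports; the metric and label names the dashboard actually queries are not listed in this PR.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    # Hypothetical stand-in for per-guest pve-exporter data.
    name: str
    template: bool
    running: bool
    cpu_ratio: float  # fraction of allocated CPU in use, 0.0 to 1.0
    mem_used_bytes: int
    mem_max_bytes: int


def vm_overview(guests: list[Guest]) -> dict:
    """Summary counts plus table rows, skipping template VMs."""
    actual = [g for g in guests if not g.template]
    rows = [
        {
            "name": g.name,
            "status": "running" if g.running else "stopped",
            "cpu_pct": round(100 * g.cpu_ratio, 1),
            # Guard against a zero memory limit to avoid dividing by zero.
            "mem_pct": round(100 * g.mem_used_bytes / g.mem_max_bytes, 1) if g.mem_max_bytes else 0.0,
        }
        for g in actual
    ]
    return {
        "running": sum(1 for g in actual if g.running),
        "stopped": sum(1 for g in actual if not g.running),
        "rows": sorted(rows, key=lambda r: r["name"]),
    }
```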
grafana: fix proxmox table __name__ column
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m9s
03ebee4d82
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
grafana: add systemd services dashboard
Some checks failed
Run nix flake check / flake-check (push) Failing after 8m30s
Run nix flake check / flake-check (pull_request) Failing after 16m49s
89d0a6f358
Dashboard for monitoring systemd across the fleet:
- Summary stats: failed/active/inactive units, restarts, timers
- Failed units table (shows any units in failed state)
- Service restarts table (top 15 services by restart count)
- Active units per host bar chart
- NixOS upgrade timer table with last trigger time
- Backup timers table (restic jobs)
- Service restarts over time chart
- Hostname filter to focus on specific hosts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
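The failed-units and restart tables reduce to a filter and a ranking. In this sketch the inputs are keyed by hypothetical `(hostname, unit)` pairs, roughly the per-unit data node_exporter's systemd collector provides; the exact queries behind the dashboard are not part of this description. Ties in restart count are broken by host and unit name so the ordering is stable.

```python
def failed_units(states: dict[tuple[str, str], str]) -> list[tuple[str, str]]:
    """(hostname, unit) pairs whose current state is "failed"."""
    return sorted(key for key, state in states.items() if state == "failed")


def top_restarts(restarts: dict[tuple[str, str], int], limit: int = 15) -> list[tuple[str, str, int]]:
    """Services with at least one restart, highest count first, capped at `limit`."""
    ranked = sorted(
        ((host, unit, count) for (host, unit), count in restarts.items() if count > 0),
        key=lambda row: (-row[2], row[0], row[1]),
    )
    return ranked[:limit]
```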
torjus merged commit 79a6a72719 into master 2026-02-08 20:18:23 +00:00
torjus deleted branch grafana-dashboards-permissions 2026-02-08 20:18:23 +00:00

Reference: torjus/nixos-servers#36