grafana-dashboards-permissions #36

torjus · 2026-02-08T20:18:18Z

Summary

This PR sets up declarative dashboard provisioning for the new Grafana instance on monitoring02 and fixes user permissions.

Changes

Permissions fix:

Changed the default OIDC role from Viewer to Editor. Grafana 12.3.2 no longer has the viewers_can_edit setting, so the Editor role is now required for Explore access
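
For reference, a minimal sketch of how a default role can be set from NixOS. This page does not show which option the PR actually touches. The sketch assumes the fallback role comes from Grafana's `[users] auto_assign_org_role` setting, which is applied when no role is mapped from the identity provider:

```nix
{
  # Assumption: the default role is controlled by [users] auto_assign_org_role
  # rather than by a role_attribute_path expression on the OAuth provider.
  services.grafana.settings = {
    users.auto_assign_org_role = "Editor";
  };
}
```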

Dashboard provisioning:

Added declarative dashboard provisioning via services.grafana.provision.dashboards
Dashboards are stored as JSON in services/grafana/dashboards/
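
A rough sketch of what a provider definition under `services.grafana.provision.dashboards` can look like. The provider name and the relative path are assumptions for illustration; the actual module in this PR may differ:

```nix
{
  services.grafana.provision = {
    enable = true;
    dashboards.settings = {
      apiVersion = 1;
      providers = [
        {
          # Hypothetical provider name.
          name = "default";
          # Assumed to be relative to the module in services/grafana/.
          # Nix copies the directory into the store and Grafana loads
          # every JSON file it finds there.
          options.path = ./dashboards;
        }
      ];
    };
  };
}
```

With this layout, adding a dashboard means dropping another JSON file into the directory and redeploying.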

New dashboards:

node-exporter - Host metrics (CPU, memory, disk, load, network I/O) with instance selector
logs - Loki log viewer with host/job filters and search
temperature - Home Assistant temperature sensors with 30-day history, trends, humidity, and min/max/avg table
nixos-fleet - Fleet management showing hosts behind remote, needing reboot, generation ages, and flake info with tier filter
proxmox - VM status, CPU/memory usage, network/disk I/O, and storage from pve-exporter
systemd - Failed units, service restarts, timer status, and active units per host

Monitoring fix:

Fixed lib/monitoring.nix to always include the tier label in Prometheus scrape configs. The label was previously omitted for prod hosts, so "prod" never appeared in the tier filter.
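
The change can be illustrated roughly as follows. The function names are hypothetical and the real code in lib/monitoring.nix is not shown on this page; the sketch only contrasts the conditional label with the unconditional one:

```nix
# Hypothetical helpers, not the actual contents of lib/monitoring.nix.
{ lib }:
{
  # Before: tier is emitted only when it differs from the default,
  # so prod targets end up with no tier label at all.
  labelsBefore = tier: lib.optionalAttrs (tier != "prod") { inherit tier; };

  # After: tier is always emitted.
  labelsAfter = tier: { inherit tier; };
}
```

Here `labelsBefore "prod"` evaluates to `{ }`, while `labelsAfter "prod"` evaluates to `{ tier = "prod"; }`, which is what lets label_values() return both tiers.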

Test plan

[x] Deployed to monitoring02 and verified dashboards load correctly
[x] Verified OIDC login grants Editor role with Explore access
[ ] Deploy to monitoring01 to apply tier label fix (deferred)

torjus added 8 commits 2026-02-08 20:18:18 +00:00

grafana: add dashboards and fix permissions

Run nix flake check / flake-check (push) Successful in 2m3s

4bf0eeeadb

- Change default OIDC role from Viewer to Editor for Explore access
- Add declarative dashboard provisioning
- Add node-exporter dashboard (CPU, memory, disk, load, network, I/O)
- Add Loki logs dashboard with host/job filters

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

grafana: add temperature dashboard

Run nix flake check / flake-check (push) Successful in 2m5s

1c13ec12a4

Dashboard includes:
- Current temperatures per room (stat panel)
- Average home temperature (gauge)
- Current humidity (stat panel)
- 30-day temperature history with mean/min/max in legend
- Temperature trend (rate of change per hour)
- 24h min/max/avg table per room
- 30-day humidity history

Filters out device_temperature (internal sensor) metrics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

grafana: add NixOS fleet dashboard

Run nix flake check / flake-check (push) Has been cancelled

a5d5827dcc

Dashboard for monitoring NixOS deployments across the homelab:
- Hosts behind remote / needing reboot stat panels
- Fleet status table with revision, behind status, reboot needed, age
- Generation age bar chart (shows stale configs)
- Generations per host bar chart
- Deployment activity time series (see when hosts were updated)
- Flake input ages table
- Pie charts for hosts by revision and tier
- Tier filter variable

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

grafana: fix fleet table __name__ columns

Run nix flake check / flake-check (push) Successful in 2m5s

d333aa0164

Exclude the __name__ columns that were leaking through the
table transformations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

monitoring: always include tier label in scrape configs

Run nix flake check / flake-check (push) Successful in 2m8s

1e52eec02a

Previously tier was only included if non-default (not "prod"), which
meant prod hosts had no tier label. This made the Grafana tier filter
only show "test" since "prod" never appeared in label_values().

Now tier is always included, so both "prod" and "test" appear in the
fleet dashboard tier selector.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

grafana: add Proxmox dashboard

Run nix flake check / flake-check (push) Has been cancelled

05630eb4d4

Dashboard for monitoring Proxmox VMs:
- Summary stats: VMs running/stopped, node CPU/memory, uptime
- VM status table with name, status, CPU%, memory%, uptime
- VM CPU usage over time
- VM memory usage over time
- Network traffic (RX/TX) per VM
- Disk I/O (read/write) per VM
- Storage usage gauges and capacity table
- VM filter to focus on specific VMs

Filters out template VMs, shows only actual guests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

grafana: fix proxmox table __name__ column

Run nix flake check / flake-check (push) Successful in 2m9s

03ebee4d82

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

grafana: add systemd services dashboard

Run nix flake check / flake-check (push) Failing after 8m30s

Run nix flake check / flake-check (pull_request) Failing after 16m49s

89d0a6f358

Dashboard for monitoring systemd across the fleet:
- Summary stats: failed/active/inactive units, restarts, timers
- Failed units table (shows any units in failed state)
- Service restarts table (top 15 services by restart count)
- Active units per host bar chart
- NixOS upgrade timer table with last trigger time
- Backup timers table (restic jobs)
- Service restarts over time chart
- Hostname filter to focus on specific hosts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

torjus merged commit 79a6a72719 into master

2026-02-08 20:18:23 +00:00

torjus deleted branch grafana-dashboards-permissions

2026-02-08 20:18:23 +00:00

torjus referenced this issue from a commit

2026-02-08 20:18:24 +00:00

Merge pull request 'grafana-dashboards-permissions' (#36) from grafana-dashboards-permissions into master

Reference: torjus/nixos-servers#36