16 Commits

Author SHA1 Message Date
79a6a72719 Merge pull request 'grafana-dashboards-permissions' (#36) from grafana-dashboards-permissions into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m4s
Reviewed-on: #36
2026-02-08 20:18:22 +00:00
89d0a6f358 grafana: add systemd services dashboard
Some checks failed
Run nix flake check / flake-check (push) Failing after 8m30s
Run nix flake check / flake-check (pull_request) Failing after 16m49s
Dashboard for monitoring systemd across the fleet:
- Summary stats: failed/active/inactive units, restarts, timers
- Failed units table (shows any units in failed state)
- Service restarts table (top 15 services by restart count)
- Active units per host bar chart
- NixOS upgrade timer table with last trigger time
- Backup timers table (restic jobs)
- Service restarts over time chart
- Hostname filter to focus on specific hosts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 21:06:59 +01:00
03ebee4d82 grafana: fix proxmox table __name__ column
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m9s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 21:04:41 +01:00
05630eb4d4 grafana: add Proxmox dashboard
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Dashboard for monitoring Proxmox VMs:
- Summary stats: VMs running/stopped, node CPU/memory, uptime
- VM status table with name, status, CPU%, memory%, uptime
- VM CPU usage over time
- VM memory usage over time
- Network traffic (RX/TX) per VM
- Disk I/O (read/write) per VM
- Storage usage gauges and capacity table
- VM filter to focus on specific VMs

Filters out template VMs, shows only actual guests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 21:02:28 +01:00
1e52eec02a monitoring: always include tier label in scrape configs
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m8s
Previously tier was only included if non-default (not "prod"), which
meant prod hosts had no tier label. This made the Grafana tier filter
only show "test" since "prod" never appeared in label_values().

Now tier is always included, so both "prod" and "test" appear in the
fleet dashboard tier selector.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:58:52 +01:00
d333aa0164 grafana: fix fleet table __name__ columns
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m5s
Exclude the __name__ columns that were leaking through the
table transformations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:52:39 +01:00
a5d5827dcc grafana: add NixOS fleet dashboard
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Dashboard for monitoring NixOS deployments across the homelab:
- Hosts behind remote / needing reboot stat panels
- Fleet status table with revision, behind status, reboot needed, age
- Generation age bar chart (shows stale configs)
- Generations per host bar chart
- Deployment activity time series (see when hosts were updated)
- Flake input ages table
- Pie charts for hosts by revision and tier
- Tier filter variable

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:50:08 +01:00
1c13ec12a4 grafana: add temperature dashboard
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m5s
Dashboard includes:
- Current temperatures per room (stat panel)
- Average home temperature (gauge)
- Current humidity (stat panel)
- 30-day temperature history with mean/min/max in legend
- Temperature trend (rate of change per hour)
- 24h min/max/avg table per room
- 30-day humidity history

Filters out device_temperature (internal sensor) metrics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:45:52 +01:00
4bf0eeeadb grafana: add dashboards and fix permissions
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m3s
- Change default OIDC role from Viewer to Editor for Explore access
- Add declarative dashboard provisioning
- Add node-exporter dashboard (CPU, memory, disk, load, network, I/O)
- Add Loki logs dashboard with host/job filters

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:39:21 +01:00
304cb117ce Merge pull request 'grafana-kanidm-oidc' (#35) from grafana-kanidm-oidc into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m7s
Reviewed-on: #35
2026-02-08 19:30:20 +00:00
02270a0e4a docs: update plans with Grafana OIDC progress
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m7s
Run nix flake check / flake-check (push) Failing after 16m31s
- auth-system-replacement.md: Mark OAuth2 client (Grafana) as completed,
  document key findings (PKCE, attribute paths, user requirements)
- monitoring-migration-victoriametrics.md: Note Grafana deployment on
  monitoring02 with Kanidm OIDC as test instance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:28:10 +01:00
030e8518c5 grafana: add Grafana on monitoring02 with Kanidm OIDC
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m3s
Deploy Grafana test instance on monitoring02 with:
- Kanidm OIDC authentication (admins -> Admin role, others -> Viewer)
- PKCE enabled for secure OAuth2 flow (required by Kanidm)
- Declarative datasources for Prometheus and Loki on monitoring01
- Local Caddy for TLS termination via internal ACME CA
- DNS CNAME grafana-test.home.2rjus.net

Terraform changes add OAuth2 client secret and AppRole policies for
kanidm01 and monitoring02.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:23:26 +01:00
9ffdd4f862 terraform: increase monitoring02 disk to 60G
Some checks failed
Run nix flake check / flake-check (push) Failing after 11m8s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 19:23:40 +01:00
0b977808ca hosts: add monitoring02 configuration
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
New test-tier host for monitoring stack expansion with:
- Static IP 10.69.13.24
- 4 CPU cores, 4GB RAM, 20GB disk
- Vault integration and NATS-based deployment enabled

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 19:19:38 +01:00
8786113f8f docs: add OpenBao + Kanidm OIDC integration plan
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m10s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 15:45:44 +01:00
fdb2c31f84 docs: add pipe-to-loki documentation to CLAUDE.md
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m1s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 15:34:01 +01:00
21 changed files with 2890 additions and 5 deletions


@@ -39,6 +39,30 @@ Do not automatically deploy changes. Deployments are usually done by updating th
Do not run SSH commands directly. If a command needs to be run on a remote host, provide the command to the user and ask them to run it manually.
### Sharing Command Output via Loki
All hosts have the `pipe-to-loki` script for sending command output or terminal sessions to Loki, allowing users to share output with Claude without copy-pasting.
**Pipe mode** - send command output:
```bash
command | pipe-to-loki # Auto-generated ID
command | pipe-to-loki --id my-test # Custom ID
```
**Session mode** - record interactive terminal session:
```bash
pipe-to-loki --record # Start recording, exit to send
pipe-to-loki --record --id my-session # With custom ID
```
The script prints the session ID, which the user can share. Query results with:
```logql
{job="pipe-to-loki"} # All entries
{job="pipe-to-loki", id="my-test"} # Specific ID
{job="pipe-to-loki", host="testvm01"} # From specific host
{job="pipe-to-loki", type="session"} # Only sessions
```
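To pull the same entries from a terminal instead of Grafana, a minimal sketch against Loki's HTTP API could look like the following. `LOKI_URL` and both helper names are hypothetical. Only the label names come from the queries above, and the fetch helper assumes `curl` is installed and a Loki instance is reachable.

```bash
# Sketch: build a LogQL selector for pipe-to-loki entries and fetch matches
# from Loki's HTTP API. LOKI_URL and the helper names are hypothetical.

# Print a stream selector such as {job="pipe-to-loki", id="my-test"}.
# Each argument is a label=value pair appended to the selector.
pipe_to_loki_selector() {
  local selector pair
  selector='{job="pipe-to-loki"'
  for pair in "$@"; do
    # Split "label=value" at the first "=".
    selector="${selector}, ${pair%%=*}=\"${pair#*=}\""
  done
  printf '%s}\n' "$selector"
}

# Fetch the last hour of matching entries via /loki/api/v1/query_range.
# Requires curl and a reachable Loki instance at $LOKI_URL.
pipe_to_loki_fetch() {
  curl -fsS -G "${LOKI_URL:?set LOKI_URL to the Loki base URL}/loki/api/v1/query_range" \
    --data-urlencode "query=$(pipe_to_loki_selector "$@")" \
    --data-urlencode "since=1h" \
    --data-urlencode "limit=1000"
}
```

For example, `LOKI_URL=http://<loki-host>:3100 pipe_to_loki_fetch id=my-test` would return the last hour of entries for that ID as JSON. Port 3100 is Loki's default HTTP port, and the host is a placeholder.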
### Testing Feature Branches on Hosts
All hosts have the `nixos-rebuild-test` helper script for testing feature branches before merging:


@@ -151,11 +151,30 @@ Rationale:
- Well above NixOS system users (typically <1000)
- Avoids Podman/container issues with very high GIDs
### Completed (2026-02-08) - OAuth2/OIDC for Grafana
**OAuth2 client deployed for Grafana on monitoring02:**
- Client ID: `grafana`
- Redirect URL: `https://grafana-test.home.2rjus.net/login/generic_oauth`
- Scope maps: `openid`, `profile`, `email`, `groups` for `users` group
- Role mapping: `admins` group → Grafana Admin, others → Viewer
**Configuration locations:**
- Kanidm OAuth2 client: `services/kanidm/default.nix`
- Grafana OIDC config: `services/grafana/default.nix`
- Vault secret: `services/grafana/oauth2-client-secret`
**Key findings:**
- PKCE is required by Kanidm - enable `use_pkce = true` in Grafana
- Must set `email_attribute_path`, `login_attribute_path`, `name_attribute_path` to extract from userinfo
- Users need: primary credential (password + TOTP for MFA), membership in `users` group, email address set
- Unix password is separate from primary credential (web login requires primary credential)
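The PKCE requirement means Grafana (with `use_pkce = true`) must send a `code_challenge` derived from a secret `code_verifier`. The sketch below illustrates only the S256 method from RFC 7636. It does not show how Grafana implements PKCE internally, and it assumes `openssl` and coreutils are available. The helper names are hypothetical.

```bash
# Sketch of the PKCE S256 transform from RFC 7636: the client keeps a random
# code_verifier and sends code_challenge = BASE64URL(SHA256(verifier)).
# Requires openssl and coreutils (base64, tr).

# Generate a random verifier: 32 random bytes, base64url-encoded without
# padding (43 characters from the unreserved set A-Z a-z 0-9 - _).
pkce_verifier() {
  openssl rand 32 | base64 | tr '+/' '-_' | tr -d '=\n'
}

# Derive the S256 challenge for the verifier given as $1.
pkce_challenge() {
  printf '%s' "$1" | openssl dgst -sha256 -binary | base64 | tr '+/' '-_' | tr -d '=\n'
}
```

To sanity-check the pipeline, run `pkce_challenge` on the RFC 7636 example verifier `dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk`. It should print `E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM`.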
### Next Steps
1. Enable PAM/NSS on production hosts (after test tier validation)
2. Configure TrueNAS LDAP client for NAS integration testing
3. Add OAuth2 clients for other services as needed
## References


@@ -169,9 +169,30 @@ Once ready to cut over:
- Destroy VM in Proxmox
- Remove from terraform state
## Current Progress
### monitoring02 Host Created (2026-02-08)
Host deployed at 10.69.13.24 (test tier) with:
- 4 CPU cores, 8GB RAM, 60GB disk
- Vault integration enabled
- NATS-based remote deployment enabled
### Grafana with Kanidm OIDC (2026-02-08)
Grafana deployed on monitoring02 as a test instance (`grafana-test.home.2rjus.net`):
- Kanidm OIDC authentication (PKCE enabled)
- Role mapping: `admins` → Admin, others → Viewer
- Declarative datasources pointing to monitoring01 (Prometheus, Loki)
- Local Caddy for TLS termination via internal ACME CA
This validates the Grafana + OIDC pattern before the full VictoriaMetrics migration. The existing
`services/monitoring/grafana.nix` on monitoring01 can be replaced with the new `services/grafana/`
module once monitoring02 becomes the primary monitoring host.
## Open Questions
- [ ] What disk size for monitoring02? Current 60GB may need expansion for 3+ months with VictoriaMetrics
- [ ] Which dashboards to recreate declaratively? (Review monitoring01 Grafana for current set)
## VictoriaMetrics Service Configuration


@@ -0,0 +1,108 @@
# OpenBao + Kanidm OIDC Integration
## Overview
Enable Kanidm users to authenticate to OpenBao (Vault) using OIDC, allowing access to secrets based on Kanidm group membership.
## Current State
**Kanidm:**
- Server: `auth.home.2rjus.net` (kanidm01)
- Domain: `home.2rjus.net`
- Groups: `admins`, `users`, `ssh-users`
- OAuth2 client configured for Grafana; none yet for OpenBao
**OpenBao:**
- Server: `vault.home.2rjus.net` (vault01)
- Auth: AppRole only (machine-to-machine)
- No human user authentication configured
## OpenBao OIDC Auth Method
OpenBao ships the JWT/OIDC auth method as part of its open-source release, so no enterprise edition is required. Key points:
- Enable with: `bao auth enable oidc`
- Supports browser-based OIDC login flow
- Maps OIDC claims/groups to OpenBao policies
- Works with both CLI (`bao login`) and Web UI
### Required Configuration
```bash
bao write auth/oidc/config \
oidc_discovery_url="https://auth.home.2rjus.net/oauth2/openid/<client_id>" \
oidc_client_id="<client_id>" \
oidc_client_secret="<client_secret>" \
default_role="default"
```
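Before running the command above, it may help to confirm that the discovery URL resolves. OpenBao fetches `<oidc_discovery_url>/.well-known/openid-configuration` from that address. The sketch below assumes the client is named `openbao`, as proposed in the implementation plan, and that `curl` and `jq` are installed. The helper names are hypothetical.

```bash
# Sketch: derive and fetch the OIDC discovery document that OpenBao reads
# from oidc_discovery_url. Assumes the Kanidm client is named "openbao".

KANIDM_ORIGIN="https://auth.home.2rjus.net"

# Print the discovery document URL for the Kanidm OAuth2 client named in $1.
kanidm_discovery_url() {
  printf '%s/oauth2/openid/%s/.well-known/openid-configuration\n' \
    "$KANIDM_ORIGIN" "$1"
}

# Show the issuer and the advertised ID token signing algorithms.
# Requires curl and jq.
kanidm_discovery_check() {
  curl -fsS "$(kanidm_discovery_url "$1")" \
    | jq '{issuer, id_token_signing_alg_values_supported}'
}
```

You can compare `id_token_signing_alg_values_supported` against the algorithms OpenBao accepts, set by its `jwt_supported_algs` config parameter. This is a cheap way to rule out a signing-algorithm mismatch before debugging the login flow itself.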
### Callback URIs
OpenBao requires specific callback URIs registered in Kanidm:
- **CLI:** `http://localhost:8250/oidc/callback`
- **Web UI:** `https://vault.home.2rjus.net:8200/ui/vault/auth/oidc/oidc/callback`
## Kanidm OAuth2 Configuration
Kanidm supports declarative OAuth2 client provisioning via NixOS:
```nix
services.kanidm.provision.systems.oauth2.openbao = {
displayName = "OpenBao Secrets";
# originUrl - where the client lives
# originLanding - where to redirect after auth
# basicSecretFile - client secret
# scopeMaps - which scopes groups can request
# claimMaps - custom claims based on group membership
};
```
The `basicSecretFile` should contain the client secret, fetched from Vault.
## Implementation Approach
### 1. Create OAuth2 Client in Kanidm
Add to `services/kanidm/default.nix`:
- OAuth2 client `openbao` with callback URIs
- Scope maps for `admins` and `users` groups
- Claim maps to expose group membership
### 2. Enable OIDC Auth in OpenBao
Options:
- **Terraform:** Add `vault_jwt_auth_backend` resource in `terraform/vault/`
- **NixOS:** Configure in vault01 host config
Terraform is probably cleaner since we already manage OpenBao config there.
### 3. Create OpenBao Roles
Map Kanidm groups to policies:
| Kanidm Group | OpenBao Role | Policy |
|--------------|--------------|--------|
| `admins` | `admin` | Full read access to secrets |
| `users` | `user` | Limited read access |
### 4. Chicken-and-Egg Problem
The OAuth2 client secret needs to be stored in OpenBao, but OpenBao needs the secret to configure OIDC auth. Solutions:
1. **Bootstrap manually:** Create initial secret via `bao` CLI
2. **Two-phase Terraform:** First create the secret, then configure OIDC
3. **Static secret:** Use a static secret for the OAuth2 client (less ideal)
## Open Questions
1. **Web UI access:** Do we want users logging into the OpenBao web UI, or just CLI?
2. **Policy granularity:** What secrets should `admins` vs `users` access?
3. **Token TTL:** How long should OIDC-issued tokens last?
## References
- [OpenBao JWT/OIDC Auth Method](https://openbao.org/docs/auth/jwt/)
- [OpenBao OIDC Provider Configuration](https://openbao.org/docs/auth/jwt/oidc-providers/)
- [Kanidm OAuth2 Documentation](https://kanidm.github.io/kanidm/stable/integrations/oauth2.html)
- [NixOS Kanidm OAuth2 Options](https://search.nixos.org/options?query=services.kanidm.provision.systems.oauth2)


@@ -43,11 +43,21 @@ kanidm person posix set-password <username>
kanidm person posix set <username> --shell /bin/zsh
```
### Setting Email Address
Email is required for OAuth2/OIDC login (e.g., Grafana):
```bash
kanidm person update <username> --mail <email>
```
### Example: Full User Creation
```bash
kanidm person create testuser "Test User"
kanidm person update testuser --mail testuser@home.2rjus.net
kanidm group add-members ssh-users testuser
kanidm group add-members users testuser # Required for OAuth2 scopes
kanidm person posix set testuser
kanidm person posix set-password testuser
kanidm person get testuser
@@ -129,6 +139,40 @@ Kanidm auto-assigns UIDs/GIDs from its configured range. For manually assigned G
| 65,536+ | Users (auto-assigned) |
| 68,000 - 68,999 | Groups (manually assigned) |
## OAuth2/OIDC Login (Web Services)
For OAuth2/OIDC login to web services like Grafana, users need:
1. **Primary credential** - Password set via `credential update` (separate from unix password)
2. **MFA** - TOTP or passkey (Kanidm requires MFA for primary credentials)
3. **Group membership** - Member of `users` group (for OAuth2 scope mapping)
4. **Email address** - Set via `person update --mail`
### Setting Up Primary Credential (Web Login)
The primary credential is different from the unix/POSIX password:
```bash
# Interactive credential setup
kanidm person credential update <username>
# In the interactive prompt:
# 1. Type 'password' to set a password
# 2. Type 'totp' to add TOTP (scan QR with authenticator app)
# 3. Type 'commit' to save
```
### Verifying OAuth2 Readiness
```bash
kanidm person get <username>
```
Check for:
- `mail:` - Email address set
- `memberof:` - Includes `users@home.2rjus.net`
- Primary credential status (run `credential update`, then `status`)
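The `mail:` and `memberof:` checks can be scripted. The sketch below reads `kanidm person get <username>` output on stdin. It assumes one `attr: value` pair per line, which matches the fields listed above but is not a documented output contract. `oauth2_ready` is a hypothetical helper name, and it cannot check the primary credential.

```bash
# Sketch: check `kanidm person get <username>` output (read from stdin) for
# the attributes OAuth2 login depends on. Assumes "attr: value" lines.
# The primary credential cannot be checked this way; use `credential update`.
oauth2_ready() {
  local input status
  input=$(cat)
  status=0
  # A non-empty mail attribute must be present.
  if ! printf '%s\n' "$input" | grep -q '^mail: .'; then
    echo "missing: mail (set with: kanidm person update <username> --mail <email>)"
    status=1
  fi
  # The users group must appear in memberof; the [ ,] guard keeps
  # ssh-users@home.2rjus.net from matching.
  if ! printf '%s\n' "$input" | grep -q '^memberof:.*[ ,]users@home\.2rjus\.net'; then
    echo "missing: memberof users@home.2rjus.net"
    status=1
  fi
  return "$status"
}
```

For example, `kanidm person get testuser | oauth2_ready` prints anything that is missing and returns non-zero if either check fails.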
## PAM/NSS Client Configuration
Enable central authentication on a host:


@@ -191,6 +191,15 @@
./hosts/kanidm01
];
};
monitoring02 = nixpkgs.lib.nixosSystem {
inherit system;
specialArgs = {
inherit inputs self;
};
modules = commonModules ++ [
./hosts/monitoring02
];
};
};
packages = forAllSystems (
{ pkgs }:


@@ -0,0 +1,75 @@
{
config,
lib,
pkgs,
...
}:
{
imports = [
../template2/hardware-configuration.nix
../../system
../../common/vm
];
# Host metadata (adjust as needed)
homelab.host = {
tier = "test"; # Start in test tier, move to prod after validation
};
# DNS CNAME for Grafana test instance
homelab.dns.cnames = [ "grafana-test" ];
# Enable Vault integration
vault.enable = true;
# Enable remote deployment via NATS
homelab.deploy.enable = true;
nixpkgs.config.allowUnfree = true;
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/vda";
networking.hostName = "monitoring02";
networking.domain = "home.2rjus.net";
networking.useNetworkd = true;
networking.useDHCP = false;
services.resolved.enable = true;
networking.nameservers = [
"10.69.13.5"
"10.69.13.6"
];
systemd.network.enable = true;
systemd.network.networks."ens18" = {
matchConfig.Name = "ens18";
address = [
"10.69.13.24/24"
];
routes = [
{ Gateway = "10.69.13.1"; }
];
linkConfig.RequiredForOnline = "routable";
};
time.timeZone = "Europe/Oslo";
nix.settings.experimental-features = [
"nix-command"
"flakes"
];
nix.settings.tarball-ttl = 0;
environment.systemPackages = with pkgs; [
vim
wget
git
];
# Open ports in the firewall.
# networking.firewall.allowedTCPPorts = [ ... ];
# networking.firewall.allowedUDPPorts = [ ... ];
# Or disable the firewall altogether.
networking.firewall.enable = false;
system.stateVersion = "25.11"; # Did you read the comment?
}


@@ -0,0 +1,6 @@
{ ... }: {
imports = [
./configuration.nix
../../services/grafana
];
}


@@ -58,10 +58,9 @@ let
};
# Build effective labels for a host
# Always includes hostname and tier; only includes priority/role if non-default
buildEffectiveLabels = host:
{ hostname = host.hostname; tier = host.tier; }
// (lib.optionalAttrs (host.priority != "high") { priority = host.priority; })
// (lib.optionalAttrs (host.role != null) { role = host.role; })
// host.labels;
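Once the updated scrape config is deployed, Prometheus's label-values endpoint should list both `prod` and `test` for the `tier` label. The sketch below shows that check and assumes `curl` and `jq` are installed. `PROM_URL` (the Prometheus address on monitoring01) and the helper names are placeholders.

```bash
# Sketch: list the values of the "tier" label on nixos_flake_info via the
# Prometheus HTTP API. PROM_URL is a placeholder for the Prometheus address.

# Print the label-values endpoint for the tier label.
tier_values_url() {
  printf '%s/api/v1/label/tier/values\n' "${PROM_URL:?set PROM_URL}"
}

# Query the endpoint, restricted to the nixos_flake_info series.
# Requires curl and jq; prints one tier value per line.
tier_values() {
  curl -fsS -G "$(tier_values_url)" \
    --data-urlencode 'match[]=nixos_flake_info' \
    | jq -r '.data[]'
}
```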


@@ -0,0 +1,85 @@
{
"uid": "logs-homelab",
"title": "Logs - Homelab",
"tags": ["loki", "logs", "homelab"],
"timezone": "browser",
"schemaVersion": 39,
"version": 1,
"refresh": "30s",
"templating": {
"list": [
{
"name": "host",
"type": "query",
"datasource": {"type": "loki", "uid": "loki"},
"query": "label_values(host)",
"refresh": 2,
"includeAll": true,
"multi": false,
"current": {"text": "All", "value": "$__all"}
},
{
"name": "job",
"type": "query",
"datasource": {"type": "loki", "uid": "loki"},
"query": "label_values(job)",
"refresh": 2,
"includeAll": true,
"multi": false,
"current": {"text": "All", "value": "$__all"}
},
{
"name": "search",
"type": "textbox",
"current": {"text": "", "value": ""},
"label": "Search"
}
]
},
"panels": [
{
"id": 1,
"title": "Log Volume",
"type": "timeseries",
"gridPos": {"h": 6, "w": 24, "x": 0, "y": 0},
"datasource": {"type": "loki", "uid": "loki"},
"targets": [
{
"expr": "sum by (host) (count_over_time({host=~\"$host\", job=~\"$job\"} |~ \"$search\" [1m]))",
"legendFormat": "{{host}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short"
}
},
"options": {
"legend": {"displayMode": "list", "placement": "bottom"}
}
},
{
"id": 2,
"title": "Logs",
"type": "logs",
"gridPos": {"h": 18, "w": 24, "x": 0, "y": 6},
"datasource": {"type": "loki", "uid": "loki"},
"targets": [
{
"expr": "{host=~\"$host\", job=~\"$job\"} |~ \"$search\"",
"refId": "A"
}
],
"options": {
"showTime": true,
"showLabels": true,
"showCommonLabels": false,
"wrapLogMessage": true,
"prettifyLogMessage": false,
"enableLogDetails": true,
"sortOrder": "Descending"
}
}
]
}


@@ -0,0 +1,564 @@
{
"uid": "nixos-fleet-homelab",
"title": "NixOS Fleet - Homelab",
"tags": ["nixos", "fleet", "homelab"],
"timezone": "browser",
"schemaVersion": 39,
"version": 1,
"refresh": "1m",
"time": {
"from": "now-7d",
"to": "now"
},
"templating": {
"list": [
{
"name": "tier",
"type": "query",
"datasource": {"type": "prometheus", "uid": "prometheus"},
"query": "label_values(nixos_flake_info, tier)",
"refresh": 2,
"includeAll": true,
"multi": false,
"current": {"text": "All", "value": "$__all"}
}
]
},
"panels": [
{
"id": 1,
"title": "Hosts Behind Remote",
"type": "stat",
"gridPos": {"h": 4, "w": 6, "x": 0, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(nixos_flake_revision_behind{tier=~\"$tier\"} == 1)",
"legendFormat": "Behind",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 1},
{"color": "red", "value": 5}
]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none",
"textMode": "auto"
},
"description": "Number of hosts where current revision differs from remote master"
},
{
"id": 2,
"title": "Hosts Needing Reboot",
"type": "stat",
"gridPos": {"h": 4, "w": 6, "x": 6, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(nixos_config_mismatch{tier=~\"$tier\"} == 1)",
"legendFormat": "Need Reboot",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 1},
{"color": "orange", "value": 3},
{"color": "red", "value": 5}
]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
},
"description": "Hosts where booted generation differs from current (switched but not rebooted)"
},
{
"id": 3,
"title": "Total Hosts",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 12, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(nixos_flake_info{tier=~\"$tier\"})",
"legendFormat": "Hosts",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "blue", "value": null}]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 4,
"title": "Nixpkgs Age",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 16, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "max(nixos_flake_input_age_seconds{input=\"nixpkgs\", tier=~\"$tier\"})",
"legendFormat": "Nixpkgs",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "s",
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 604800},
{"color": "orange", "value": 1209600},
{"color": "red", "value": 2592000}
]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
},
"description": "Age of nixpkgs flake input (yellow >7d, orange >14d, red >30d)"
},
{
"id": 5,
"title": "Hosts Up-to-date",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 20, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(nixos_flake_revision_behind{tier=~\"$tier\"} == 0)",
"legendFormat": "Up-to-date",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "green", "value": null}]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 6,
"title": "Fleet Status",
"type": "table",
"gridPos": {"h": 10, "w": 24, "x": 0, "y": 4},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "nixos_flake_info{tier=~\"$tier\"}",
"format": "table",
"instant": true,
"refId": "info"
},
{
"expr": "nixos_flake_revision_behind{tier=~\"$tier\"}",
"format": "table",
"instant": true,
"refId": "behind"
},
{
"expr": "nixos_config_mismatch{tier=~\"$tier\"}",
"format": "table",
"instant": true,
"refId": "mismatch"
},
{
"expr": "nixos_generation_age_seconds{tier=~\"$tier\"}",
"format": "table",
"instant": true,
"refId": "age"
},
{
"expr": "nixos_generation_count{tier=~\"$tier\"}",
"format": "table",
"instant": true,
"refId": "count"
}
],
"fieldConfig": {
"defaults": {},
"overrides": [
{
"matcher": {"id": "byName", "options": "Hostname"},
"properties": [{"id": "custom.width", "value": 120}]
},
{
"matcher": {"id": "byName", "options": "Current Rev"},
"properties": [{"id": "custom.width", "value": 90}]
},
{
"matcher": {"id": "byName", "options": "Remote Rev"},
"properties": [{"id": "custom.width", "value": 90}]
},
{
"matcher": {"id": "byName", "options": "Behind"},
"properties": [
{"id": "custom.width", "value": 70},
{"id": "mappings", "value": [
{"type": "value", "options": {"0": {"text": "No", "color": "green"}}},
{"type": "value", "options": {"1": {"text": "Yes", "color": "red"}}}
]},
{"id": "custom.cellOptions", "value": {"type": "color-text"}}
]
},
{
"matcher": {"id": "byName", "options": "Need Reboot"},
"properties": [
{"id": "custom.width", "value": 100},
{"id": "mappings", "value": [
{"type": "value", "options": {"0": {"text": "No", "color": "green"}}},
{"type": "value", "options": {"1": {"text": "Yes", "color": "orange"}}}
]},
{"id": "custom.cellOptions", "value": {"type": "color-text"}}
]
},
{
"matcher": {"id": "byName", "options": "Config Age"},
"properties": [
{"id": "unit", "value": "s"},
{"id": "custom.width", "value": 100}
]
},
{
"matcher": {"id": "byName", "options": "Generations"},
"properties": [{"id": "custom.width", "value": 100}]
},
{
"matcher": {"id": "byName", "options": "Tier"},
"properties": [{"id": "custom.width", "value": 60}]
},
{
"matcher": {"id": "byName", "options": "Role"},
"properties": [{"id": "custom.width", "value": 80}]
}
]
},
"options": {
"showHeader": true,
"sortBy": [{"displayName": "Hostname", "desc": false}]
},
"transformations": [
{
"id": "joinByField",
"options": {"byField": "hostname", "mode": "outer"}
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"Time 1": true,
"Time 2": true,
"Time 3": true,
"Time 4": true,
"Time 5": true,
"Value #info": true,
"__name__": true,
"__name__ 1": true,
"__name__ 2": true,
"__name__ 3": true,
"__name__ 4": true,
"__name__ 5": true,
"dns_role": true,
"dns_role 1": true,
"dns_role 2": true,
"dns_role 3": true,
"dns_role 4": true,
"instance": true,
"instance 1": true,
"instance 2": true,
"instance 3": true,
"instance 4": true,
"job": true,
"job 1": true,
"job 2": true,
"job 3": true,
"job 4": true,
"nixos_version": true,
"nixpkgs_rev": true,
"role 1": true,
"role 2": true,
"role 3": true,
"role 4": true,
"tier 1": true,
"tier 2": true,
"tier 3": true,
"tier 4": true
},
"indexByName": {
"hostname": 0,
"tier": 1,
"role": 2,
"current_rev": 3,
"remote_rev": 4,
"Value #behind": 5,
"Value #mismatch": 6,
"Value #age": 7,
"Value #count": 8
},
"renameByName": {
"hostname": "Hostname",
"tier": "Tier",
"role": "Role",
"current_rev": "Current Rev",
"remote_rev": "Remote Rev",
"Value #behind": "Behind",
"Value #mismatch": "Need Reboot",
"Value #age": "Config Age",
"Value #count": "Generations"
}
}
}
]
},
{
"id": 7,
"title": "Generation Age by Host",
"type": "bargauge",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 14},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "sort_desc(nixos_generation_age_seconds{tier=~\"$tier\"})",
"legendFormat": "{{hostname}}",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {
"unit": "s",
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 86400},
{"color": "orange", "value": 259200},
{"color": "red", "value": 604800}
]
},
"min": 0
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"orientation": "horizontal",
"displayMode": "gradient",
"showUnfilled": true
},
"description": "How long ago each host's current config was deployed (yellow >1d, orange >3d, red >7d)"
},
{
"id": 8,
"title": "Generations per Host",
"type": "bargauge",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 14},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "sort_desc(nixos_generation_count{tier=~\"$tier\"})",
"legendFormat": "{{hostname}}",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "blue", "value": null},
{"color": "purple", "value": 50}
]
},
"min": 0
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"orientation": "horizontal",
"displayMode": "gradient",
"showUnfilled": true
},
"description": "Total number of NixOS generations on each host"
},
{
"id": 9,
"title": "Deployment Activity (Generation Age Over Time)",
"type": "timeseries",
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 22},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "nixos_generation_age_seconds{tier=~\"$tier\"}",
"legendFormat": "{{hostname}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "s",
"custom": {
"lineWidth": 1,
"fillOpacity": 0,
"showPoints": "never",
"stacking": {"mode": "none"}
}
}
},
"options": {
"legend": {
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {"mode": "multi", "sort": "desc"}
},
"description": "Generation age increases over time, drops to near-zero when deployed. Useful to see deployment patterns."
},
{
"id": 10,
"title": "Flake Input Ages",
"type": "table",
"gridPos": {"h": 6, "w": 12, "x": 0, "y": 30},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "max by (input) (nixos_flake_input_age_seconds)",
"format": "table",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "s"
},
"overrides": [
{
"matcher": {"id": "byName", "options": "input"},
"properties": [{"id": "custom.width", "value": 150}]
}
]
},
"options": {
"showHeader": true,
"sortBy": [{"displayName": "Value", "desc": true}]
},
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": {"Time": true},
"renameByName": {
"input": "Flake Input",
"Value": "Age"
}
}
}
],
"description": "Age of each flake input across the fleet"
},
{
"id": 11,
"title": "Hosts by Revision",
"type": "piechart",
"gridPos": {"h": 6, "w": 6, "x": 12, "y": 30},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count by (current_rev) (nixos_flake_info{tier=~\"$tier\"})",
"legendFormat": "{{current_rev}}",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"legend": {"displayMode": "table", "placement": "right", "values": ["value"]},
"pieType": "pie"
},
"description": "Distribution of hosts by their current flake revision"
},
{
"id": 12,
"title": "Hosts by Tier",
"type": "piechart",
"gridPos": {"h": 6, "w": 6, "x": 18, "y": 30},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count by (tier) (nixos_flake_info)",
"legendFormat": "{{tier}}",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"legend": {"displayMode": "table", "placement": "right", "values": ["value"]},
"pieType": "pie"
},
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "^$",
"renamePattern": "prod"
}
}
],
"description": "Distribution of hosts by tier (test vs prod)"
}
]
}
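
The "Deployment Activity" panel reads deployments off the drops in `nixos_generation_age_seconds`. Below is a minimal Python sketch of that reading. It assumes the samples for one host are available as `(timestamp, age)` pairs sorted by time, and `find_deployments` is a hypothetical helper name. Any decrease between consecutive samples is treated as a new generation being activated. PromQL's `resets()` applies the same any-decrease rule, although it is documented for counters.

```python
from typing import List, Tuple


def find_deployments(samples: List[Tuple[float, float]]) -> List[float]:
    """Return the timestamps at which a new NixOS generation appeared.

    samples: (unix_timestamp, nixos_generation_age_seconds) pairs for one
    host, sorted by timestamp (hypothetical input shape).
    """
    deployments: List[float] = []
    for (_, prev_age), (ts, age) in zip(samples, samples[1:]):
        # Between deployments the age only grows; any drop means the
        # current generation was replaced by a younger one.
        if age < prev_age:
            deployments.append(ts)
    return deployments


if __name__ == "__main__":
    series = [(0, 100), (60, 160), (120, 5), (180, 65), (240, 3)]
    print(find_deployments(series))  # [120, 240]
```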

@@ -0,0 +1,208 @@
{
"uid": "node-exporter-homelab",
"title": "Node Exporter - Homelab",
"tags": ["node-exporter", "prometheus", "homelab"],
"timezone": "browser",
"schemaVersion": 39,
"version": 1,
"refresh": "30s",
"templating": {
"list": [
{
"name": "instance",
"type": "query",
"datasource": {"type": "prometheus", "uid": "prometheus"},
"query": "label_values(node_uname_info, instance)",
"refresh": 2,
"includeAll": false,
"multi": false,
"current": {}
}
]
},
"panels": [
{
"id": 1,
"title": "CPU Usage",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\", instance=~\"$instance\"}[5m])) * 100)",
"legendFormat": "CPU %",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 90}
]
}
}
}
},
{
"id": 2,
"title": "Memory Usage",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "(1 - (node_memory_MemAvailable_bytes{instance=~\"$instance\"} / node_memory_MemTotal_bytes{instance=~\"$instance\"})) * 100",
"legendFormat": "Memory %",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 90}
]
}
}
}
},
{
"id": 3,
"title": "Disk Usage",
"type": "gauge",
"gridPos": {"h": 8, "w": 8, "x": 0, "y": 8},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "100 - ((node_filesystem_avail_bytes{instance=~\"$instance\",mountpoint=\"/\",fstype!=\"rootfs\"} / node_filesystem_size_bytes{instance=~\"$instance\",mountpoint=\"/\",fstype!=\"rootfs\"}) * 100)",
"legendFormat": "Root /",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 85}
]
}
}
}
},
{
"id": 4,
"title": "System Load",
"type": "timeseries",
"gridPos": {"h": 8, "w": 8, "x": 8, "y": 8},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "node_load1{instance=~\"$instance\"}",
"legendFormat": "1m",
"refId": "A"
},
{
"expr": "node_load5{instance=~\"$instance\"}",
"legendFormat": "5m",
"refId": "B"
},
{
"expr": "node_load15{instance=~\"$instance\"}",
"legendFormat": "15m",
"refId": "C"
}
],
"fieldConfig": {
"defaults": {
"unit": "short"
}
}
},
{
"id": 5,
"title": "Uptime",
"type": "stat",
"gridPos": {"h": 8, "w": 8, "x": 16, "y": 8},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "time() - node_boot_time_seconds{instance=~\"$instance\"}",
"legendFormat": "Uptime",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "s"
}
}
},
{
"id": 6,
"title": "Network Traffic",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 16},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "rate(node_network_receive_bytes_total{instance=~\"$instance\",device!~\"lo|veth.*|br.*|docker.*\"}[5m])",
"legendFormat": "Receive {{device}}",
"refId": "A"
},
{
"expr": "-rate(node_network_transmit_bytes_total{instance=~\"$instance\",device!~\"lo|veth.*|br.*|docker.*\"}[5m])",
"legendFormat": "Transmit {{device}}",
"refId": "B"
}
],
"fieldConfig": {
"defaults": {
"unit": "Bps"
}
}
},
{
"id": 7,
"title": "Disk I/O",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 16},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "rate(node_disk_read_bytes_total{instance=~\"$instance\",device!~\"dm-.*\"}[5m])",
"legendFormat": "Read {{device}}",
"refId": "A"
},
{
"expr": "-rate(node_disk_written_bytes_total{instance=~\"$instance\",device!~\"dm-.*\"}[5m])",
"legendFormat": "Write {{device}}",
"refId": "B"
}
],
"fieldConfig": {
"defaults": {
"unit": "Bps"
}
}
}
]
}
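
The "CPU Usage" panel derives busy time from the idle counter as `100 - avg(rate(idle)) * 100`. The sketch below reproduces that arithmetic in Python for one instance. It assumes two readings of `node_cpu_seconds_total{mode="idle"}` per CPU taken `window_seconds` apart, and `cpu_usage_percent` is a hypothetical name. Unlike Prometheus's `rate()`, it does no extrapolation and no counter-reset handling, so treat it as an approximation of the panel rather than an exact replica.

```python
from typing import Sequence


def cpu_usage_percent(idle_start: Sequence[float], idle_end: Sequence[float], window_seconds: float) -> float:
    """Approximate the panel's CPU % from per-CPU idle counters.

    idle_start / idle_end: node_cpu_seconds_total{mode="idle"} for each CPU
    at the start and end of the window (same CPU order in both).
    """
    # Per-CPU idle rate: seconds idle per second of wall time, 0.0 to 1.0.
    rates = [(end - start) / window_seconds for start, end in zip(idle_start, idle_end)]
    # avg by(instance): mean idle fraction across all CPUs of the host.
    avg_idle = sum(rates) / len(rates)
    # Busy percentage is whatever was not idle.
    return 100 - avg_idle * 100


if __name__ == "__main__":
    # Two CPUs over a 5m window: 270 s and 150 s idle, i.e. 90% and 50% idle.
    print(round(cpu_usage_percent([1000, 2000], [1270, 2150], 300), 1))  # 30.0
```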

@@ -0,0 +1,606 @@
{
"uid": "proxmox-homelab",
"title": "Proxmox - Homelab",
"tags": ["proxmox", "virtualization", "homelab"],
"timezone": "browser",
"schemaVersion": 39,
"version": 1,
"refresh": "30s",
"time": {
"from": "now-6h",
"to": "now"
},
"templating": {
"list": [
{
"name": "vm",
"type": "query",
"datasource": {"type": "prometheus", "uid": "prometheus"},
"query": "label_values(pve_guest_info{template=\"0\"}, name)",
"refresh": 2,
"includeAll": true,
"multi": true,
"current": {"text": "All", "value": "$__all"}
}
]
},
"panels": [
{
"id": 1,
"title": "VMs Running",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(pve_up{id=~\"qemu/.*\"} * on(id) pve_guest_info{template=\"0\"} == 1)",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "green", "value": null}]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 2,
"title": "VMs Stopped",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 4, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(pve_up{id=~\"qemu/.*\"} * on(id) pve_guest_info{template=\"0\"} == 0)",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 1},
{"color": "red", "value": 3}
]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 3,
"title": "Node CPU",
"type": "gauge",
"gridPos": {"h": 4, "w": 4, "x": 8, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "pve_cpu_usage_ratio{id=~\"node/.*\"} * 100",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 90}
]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"showThresholdLabels": false,
"showThresholdMarkers": true
}
},
{
"id": 4,
"title": "Node Memory",
"type": "gauge",
"gridPos": {"h": 4, "w": 4, "x": 12, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "pve_memory_usage_bytes{id=~\"node/.*\"} / pve_memory_size_bytes{id=~\"node/.*\"} * 100",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 90}
]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"showThresholdLabels": false,
"showThresholdMarkers": true
}
},
{
"id": 5,
"title": "Node Uptime",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 16, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "pve_uptime_seconds{id=~\"node/.*\"}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "s",
"thresholds": {
"mode": "absolute",
"steps": [{"color": "blue", "value": null}]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 6,
"title": "Templates",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 20, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(pve_guest_info{template=\"1\"})",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "purple", "value": null}]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 7,
"title": "VM Status",
"type": "table",
"gridPos": {"h": 10, "w": 24, "x": 0, "y": 4},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "pve_guest_info{template=\"0\", name=~\"$vm\"}",
"format": "table",
"instant": true,
"refId": "info"
},
{
"expr": "pve_up{id=~\"qemu/.*\"} * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
"format": "table",
"instant": true,
"refId": "status"
},
{
"expr": "pve_cpu_usage_ratio{id=~\"qemu/.*\"} * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"} * 100",
"format": "table",
"instant": true,
"refId": "cpu"
},
{
"expr": "pve_memory_usage_bytes{id=~\"qemu/.*\"} * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"} / on(id) pve_memory_size_bytes * 100",
"format": "table",
"instant": true,
"refId": "mem"
},
{
"expr": "pve_uptime_seconds{id=~\"qemu/.*\"} * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
"format": "table",
"instant": true,
"refId": "uptime"
}
],
"fieldConfig": {
"defaults": {},
"overrides": [
{
"matcher": {"id": "byName", "options": "Name"},
"properties": [{"id": "custom.width", "value": 150}]
},
{
"matcher": {"id": "byName", "options": "Status"},
"properties": [
{"id": "custom.width", "value": 80},
{"id": "mappings", "value": [
{"type": "value", "options": {"0": {"text": "Stopped", "color": "red"}}},
{"type": "value", "options": {"1": {"text": "Running", "color": "green"}}}
]},
{"id": "custom.cellOptions", "value": {"type": "color-text"}}
]
},
{
"matcher": {"id": "byName", "options": "CPU %"},
"properties": [
{"id": "unit", "value": "percent"},
{"id": "decimals", "value": 1},
{"id": "custom.width", "value": 80},
{"id": "custom.cellOptions", "value": {"type": "gauge", "mode": "basic"}},
{"id": "min", "value": 0},
{"id": "max", "value": 100},
{"id": "thresholds", "value": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "yellow", "value": 50}, {"color": "red", "value": 80}]}}
]
},
{
"matcher": {"id": "byName", "options": "Memory %"},
"properties": [
{"id": "unit", "value": "percent"},
{"id": "decimals", "value": 1},
{"id": "custom.width", "value": 100},
{"id": "custom.cellOptions", "value": {"type": "gauge", "mode": "basic"}},
{"id": "min", "value": 0},
{"id": "max", "value": 100},
{"id": "thresholds", "value": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "yellow", "value": 70}, {"color": "red", "value": 90}]}}
]
},
{
"matcher": {"id": "byName", "options": "Uptime"},
"properties": [
{"id": "unit", "value": "s"},
{"id": "custom.width", "value": 100}
]
},
{
"matcher": {"id": "byName", "options": "ID"},
"properties": [{"id": "custom.width", "value": 90}]
}
]
},
"options": {
"showHeader": true,
"sortBy": [{"displayName": "Name", "desc": false}]
},
"transformations": [
{
"id": "joinByField",
"options": {"byField": "name", "mode": "outer"}
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"Time 1": true,
"Time 2": true,
"Time 3": true,
"Time 4": true,
"Value #info": true,
"__name__": true,
"id 1": true,
"id 2": true,
"id 3": true,
"id 4": true,
"instance": true,
"instance 1": true,
"instance 2": true,
"instance 3": true,
"instance 4": true,
"job": true,
"job 1": true,
"job 2": true,
"job 3": true,
"job 4": true,
"name 1": true,
"name 2": true,
"name 3": true,
"name 4": true,
"node": true,
"tags": true,
"template": true,
"type": true
},
"indexByName": {
"name": 0,
"id": 1,
"Value #status": 2,
"Value #cpu": 3,
"Value #mem": 4,
"Value #uptime": 5
},
"renameByName": {
"name": "Name",
"id": "ID",
"Value #status": "Status",
"Value #cpu": "CPU %",
"Value #mem": "Memory %",
"Value #uptime": "Uptime"
}
}
}
]
},
{
"id": 8,
"title": "VM CPU Usage",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 14},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "pve_cpu_usage_ratio{id=~\"qemu/.*\"} * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"} * 100",
"legendFormat": "{{name}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"custom": {
"lineWidth": 1,
"fillOpacity": 10,
"showPoints": "never"
}
}
},
"options": {
"legend": {"displayMode": "list", "placement": "bottom"},
"tooltip": {"mode": "multi", "sort": "desc"}
}
},
{
"id": 9,
"title": "VM Memory Usage",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 14},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "pve_memory_usage_bytes{id=~\"qemu/.*\"} * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
"legendFormat": "{{name}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "bytes",
"min": 0,
"custom": {
"lineWidth": 1,
"fillOpacity": 10,
"showPoints": "never"
}
}
},
"options": {
"legend": {"displayMode": "list", "placement": "bottom"},
"tooltip": {"mode": "multi", "sort": "desc"}
}
},
{
"id": 10,
"title": "VM Network Traffic",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 22},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "rate(pve_network_receive_bytes{id=~\"qemu/.*\"}[5m]) * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
"legendFormat": "{{name}} RX",
"refId": "A"
},
{
"expr": "-rate(pve_network_transmit_bytes{id=~\"qemu/.*\"}[5m]) * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
"legendFormat": "{{name}} TX",
"refId": "B"
}
],
"fieldConfig": {
"defaults": {
"unit": "Bps",
"custom": {
"lineWidth": 1,
"fillOpacity": 10,
"showPoints": "never"
}
}
},
"options": {
"legend": {"displayMode": "list", "placement": "bottom"},
"tooltip": {"mode": "multi", "sort": "desc"}
}
},
{
"id": 11,
"title": "VM Disk I/O",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 22},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "rate(pve_disk_read_bytes{id=~\"qemu/.*\"}[5m]) * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
"legendFormat": "{{name}} Read",
"refId": "A"
},
{
"expr": "-rate(pve_disk_write_bytes{id=~\"qemu/.*\"}[5m]) * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
"legendFormat": "{{name}} Write",
"refId": "B"
}
],
"fieldConfig": {
"defaults": {
"unit": "Bps",
"custom": {
"lineWidth": 1,
"fillOpacity": 10,
"showPoints": "never"
}
}
},
"options": {
"legend": {"displayMode": "list", "placement": "bottom"},
"tooltip": {"mode": "multi", "sort": "desc"}
}
},
{
"id": 12,
"title": "Storage Usage",
"type": "bargauge",
"gridPos": {"h": 6, "w": 12, "x": 0, "y": 30},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "pve_disk_usage_bytes{id=~\"storage/.*\"} / pve_disk_size_bytes{id=~\"storage/.*\"} * 100",
"legendFormat": "{{id}}",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 85}
]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"orientation": "horizontal",
"displayMode": "gradient",
"showUnfilled": true
},
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "storage/pve1/(.*)",
"renamePattern": "$1"
}
}
]
},
{
"id": 13,
"title": "Storage Capacity",
"type": "table",
"gridPos": {"h": 6, "w": 12, "x": 12, "y": 30},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "pve_disk_size_bytes{id=~\"storage/.*\"}",
"format": "table",
"instant": true,
"refId": "size"
},
{
"expr": "pve_disk_usage_bytes{id=~\"storage/.*\"}",
"format": "table",
"instant": true,
"refId": "used"
},
{
"expr": "pve_disk_size_bytes{id=~\"storage/.*\"} - pve_disk_usage_bytes{id=~\"storage/.*\"}",
"format": "table",
"instant": true,
"refId": "free"
}
],
"fieldConfig": {
"defaults": {
"unit": "bytes"
},
"overrides": [
{
"matcher": {"id": "byName", "options": "Storage"},
"properties": [{"id": "unit", "value": "none"}]
}
]
},
"options": {
"showHeader": true
},
"transformations": [
{
"id": "joinByField",
"options": {"byField": "id", "mode": "outer"}
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"Time 1": true,
"Time 2": true,
"instance": true,
"instance 1": true,
"instance 2": true,
"job": true,
"job 1": true,
"job 2": true
},
"renameByName": {
"id": "Storage",
"Value #size": "Total",
"Value #used": "Used",
"Value #free": "Free"
}
}
},
{
"id": "renameByRegex",
"options": {
"regex": "storage/pve1/(.*)",
"renamePattern": "$1"
}
}
]
}
]
}
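
Most Proxmox panels attach the VM name with `* on(id) group_left(name) pve_guest_info{template="0"}`. As a rough model of what that vector match does (a simplified sketch, not Prometheus's implementation; the `Series` shape and `group_left_multiply` name are hypothetical), each metric series is matched to the single info series with the same `id`. Its value is multiplied by the info value (always 1), the `name` label is copied over, and series with no match are dropped. Templates fall out this way because `template="0"` removes their info series from the right-hand side.

```python
from typing import Dict, List, Tuple

# A series is (labels, value); a hypothetical stand-in for a PromQL sample.
Series = Tuple[Dict[str, str], float]


def group_left_multiply(left: List[Series], right: List[Series], on: str, copy: List[str]) -> List[Series]:
    """Model `left * on(<on>) group_left(<copy...>) right` for instant vectors."""
    # The "one" side must have at most one series per matching key.
    index: Dict[str, Series] = {}
    for labels, value in right:
        key = labels.get(on, "")
        if key in index:
            raise ValueError(f"many-to-many match on {on}={key!r}")
        index[key] = (labels, value)

    result: List[Series] = []
    for labels, value in left:
        match = index.get(labels.get(on, ""))
        if match is None:
            continue  # unmatched series disappear from the result
        right_labels, right_value = match
        # Arithmetic drops the metric name but keeps the "many" side's labels.
        out = {k: v for k, v in labels.items() if k != "__name__"}
        for name in copy:
            if name in right_labels:
                out[name] = right_labels[name]
            else:
                out.pop(name, None)
        result.append((out, value * right_value))
    return result


if __name__ == "__main__":
    cpu = [
        ({"__name__": "pve_cpu_usage_ratio", "id": "qemu/100"}, 0.25),
        ({"__name__": "pve_cpu_usage_ratio", "id": "qemu/900"}, 0.0),  # a template
    ]
    info = [({"__name__": "pve_guest_info", "id": "qemu/100", "name": "web", "template": "0"}, 1.0)]
    print(group_left_multiply(cpu, info, "id", ["name"]))
    # [({'id': 'qemu/100', 'name': 'web'}, 0.25)]
```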

@@ -0,0 +1,553 @@
{
"uid": "systemd-homelab",
"title": "Systemd Services - Homelab",
"tags": ["systemd", "services", "homelab"],
"timezone": "browser",
"schemaVersion": 39,
"version": 1,
"refresh": "1m",
"time": {
"from": "now-24h",
"to": "now"
},
"templating": {
"list": [
{
"name": "hostname",
"type": "query",
"datasource": {"type": "prometheus", "uid": "prometheus"},
"query": "label_values(systemd_unit_state, hostname)",
"refresh": 2,
"includeAll": true,
"multi": true,
"current": {"text": "All", "value": "$__all"}
}
]
},
"panels": [
{
"id": 1,
"title": "Failed Units",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(systemd_unit_state{state=\"failed\", hostname=~\"$hostname\"} == 1) or vector(0)",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "red", "value": 1}
]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 2,
"title": "Active Units",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 4, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(systemd_unit_state{state=\"active\", hostname=~\"$hostname\"} == 1)",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "green", "value": null}]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 3,
"title": "Hosts Monitored",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 8, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(count by (hostname) (systemd_unit_state{hostname=~\"$hostname\"}))",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "blue", "value": null}]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 4,
"title": "Total Service Restarts",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 12, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "sum(systemd_service_restart_total{hostname=~\"$hostname\"})",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 10},
{"color": "orange", "value": 50}
]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 5,
"title": "Inactive Units",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 16, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(systemd_unit_state{state=\"inactive\", hostname=~\"$hostname\"} == 1)",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "purple", "value": null}]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 6,
"title": "Timers",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 20, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(systemd_timer_last_trigger_seconds{hostname=~\"$hostname\"})",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "blue", "value": null}]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 7,
"title": "Failed Units",
"type": "table",
"gridPos": {"h": 6, "w": 12, "x": 0, "y": 4},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "systemd_unit_state{state=\"failed\", hostname=~\"$hostname\"} == 1",
"format": "table",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {},
"overrides": [
{
"matcher": {"id": "byName", "options": "Host"},
"properties": [{"id": "custom.width", "value": 120}]
},
{
"matcher": {"id": "byName", "options": "Unit"},
"properties": [{"id": "custom.width", "value": 300}]
}
]
},
"options": {
"showHeader": true,
"sortBy": [{"displayName": "Host", "desc": false}]
},
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"Value": true,
"__name__": true,
"dns_role": true,
"instance": true,
"job": true,
"role": true,
"state": true,
"tier": true,
"type": true
},
"renameByName": {
"hostname": "Host",
"name": "Unit"
}
}
}
],
"description": "Units currently in failed state"
},
{
"id": 8,
"title": "Service Restarts (Top 15)",
"type": "table",
"gridPos": {"h": 6, "w": 12, "x": 12, "y": 4},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "topk(15, systemd_service_restart_total{hostname=~\"$hostname\"} > 0)",
"format": "table",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {},
"overrides": [
{
"matcher": {"id": "byName", "options": "Host"},
"properties": [{"id": "custom.width", "value": 120}]
},
{
"matcher": {"id": "byName", "options": "Service"},
"properties": [{"id": "custom.width", "value": 280}]
},
{
"matcher": {"id": "byName", "options": "Restarts"},
"properties": [{"id": "custom.width", "value": 80}]
}
]
},
"options": {
"showHeader": true,
"sortBy": [{"displayName": "Restarts", "desc": true}]
},
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"__name__": true,
"dns_role": true,
"instance": true,
"job": true,
"role": true,
"tier": true
},
"renameByName": {
"hostname": "Host",
"name": "Service",
"Value": "Restarts"
}
}
}
],
"description": "Services that have been restarted (since host boot)"
},
{
"id": 9,
"title": "Active Units per Host",
"type": "bargauge",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 10},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "sort_desc(count by (hostname) (systemd_unit_state{state=\"active\", hostname=~\"$hostname\"} == 1))",
"legendFormat": "{{hostname}}",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "green", "value": null}]
},
"min": 0
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"orientation": "horizontal",
"displayMode": "gradient",
"showUnfilled": true
}
},
{
"id": 10,
"title": "NixOS Upgrade Timers",
"type": "table",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 10},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "systemd_timer_last_trigger_seconds{name=\"nixos-upgrade.timer\", hostname=~\"$hostname\"}",
"format": "table",
"instant": true,
"refId": "last"
},
{
"expr": "time() - systemd_timer_last_trigger_seconds{name=\"nixos-upgrade.timer\", hostname=~\"$hostname\"}",
"format": "table",
"instant": true,
"refId": "ago"
}
],
"fieldConfig": {
"defaults": {},
"overrides": [
{
"matcher": {"id": "byName", "options": "Host"},
"properties": [{"id": "custom.width", "value": 130}]
},
{
"matcher": {"id": "byName", "options": "Last Trigger"},
"properties": [
{"id": "unit", "value": "dateTimeAsLocalNoDateIfToday"},
{"id": "custom.width", "value": 180}
]
},
{
"matcher": {"id": "byName", "options": "Time Ago"},
"properties": [
{"id": "unit", "value": "s"},
{"id": "custom.width", "value": 120},
{"id": "thresholds", "value": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "yellow", "value": 86400}, {"color": "red", "value": 172800}]}},
{"id": "custom.cellOptions", "value": {"type": "color-text"}}
]
}
]
},
"options": {
"showHeader": true,
"sortBy": [{"displayName": "Time Ago", "desc": true}]
},
"transformations": [
{
"id": "joinByField",
"options": {"byField": "hostname", "mode": "outer"}
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"Time 1": true,
"__name__": true,
"__name__ 1": true,
"dns_role": true,
"dns_role 1": true,
"instance": true,
"instance 1": true,
"job": true,
"job 1": true,
"name": true,
"name 1": true,
"role": true,
"role 1": true,
"tier": true,
"tier 1": true
},
"indexByName": {
"hostname": 0,
"Value #last": 1,
"Value #ago": 2
},
"renameByName": {
"hostname": "Host",
"Value #last": "Last Trigger",
"Value #ago": "Time Ago"
}
}
}
],
"description": "When nixos-upgrade.timer last ran on each host. Yellow >24h, Red >48h."
},
{
"id": 11,
"title": "Backup Timers",
"type": "table",
"gridPos": {"h": 6, "w": 12, "x": 0, "y": 18},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "systemd_timer_last_trigger_seconds{name=~\"restic.*\", hostname=~\"$hostname\"}",
"format": "table",
"instant": true,
"refId": "last"
},
{
"expr": "time() - systemd_timer_last_trigger_seconds{name=~\"restic.*\", hostname=~\"$hostname\"}",
"format": "table",
"instant": true,
"refId": "ago"
}
],
"fieldConfig": {
"defaults": {},
"overrides": [
{
"matcher": {"id": "byName", "options": "Host"},
"properties": [{"id": "custom.width", "value": 120}]
},
{
"matcher": {"id": "byName", "options": "Timer"},
"properties": [{"id": "custom.width", "value": 220}]
},
{
"matcher": {"id": "byName", "options": "Last Trigger"},
"properties": [
{"id": "unit", "value": "dateTimeAsLocalNoDateIfToday"},
{"id": "custom.width", "value": 180}
]
},
{
"matcher": {"id": "byName", "options": "Time Ago"},
"properties": [
{"id": "unit", "value": "s"},
{"id": "custom.width", "value": 100},
{"id": "thresholds", "value": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "yellow", "value": 86400}, {"color": "red", "value": 172800}]}},
{"id": "custom.cellOptions", "value": {"type": "color-text"}}
]
}
]
},
"options": {
"showHeader": true,
"sortBy": [{"displayName": "Time Ago", "desc": true}]
},
"transformations": [
{
"id": "joinByField",
"options": {"byField": "name", "mode": "outer"}
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"Time 1": true,
"__name__": true,
"__name__ 1": true,
"dns_role": true,
"dns_role 1": true,
"instance": true,
"instance 1": true,
"job": true,
"job 1": true,
"role": true,
"role 1": true,
"tier": true,
"tier 1": true,
"hostname 1": true
},
"indexByName": {
"hostname": 0,
"name": 1,
"Value #last": 2,
"Value #ago": 3
},
"renameByName": {
"hostname": "Host",
"name": "Timer",
"Value #last": "Last Trigger",
"Value #ago": "Time Ago"
}
}
}
],
"description": "When each restic backup timer last ran. Yellow >24h, Red >48h."
},
{
"id": 12,
"title": "Service Restarts Over Time",
"type": "timeseries",
"gridPos": {"h": 6, "w": 12, "x": 12, "y": 18},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "sum by (hostname) (increase(systemd_service_restart_total{hostname=~\"$hostname\"}[1h]))",
"legendFormat": "{{hostname}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short",
"custom": {
"lineWidth": 1,
"fillOpacity": 20,
"showPoints": "never",
"stacking": {"mode": "normal"}
}
}
},
"options": {
"legend": {"displayMode": "list", "placement": "bottom"},
"tooltip": {"mode": "multi", "sort": "desc"}
},
"description": "Service restarts per host in the preceding hour (stacked)"
}
]
}
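
The "Time Ago" columns in the NixOS upgrade and backup tables color each row with absolute thresholds at 86400 s and 172800 s. Here is a small Python model of that lookup. It assumes Grafana's usual rule that the highest step whose value is less than or equal to the cell value wins, and `threshold_color` and `seconds_since` are hypothetical helper names.

```python
import time
from typing import List, Optional, Tuple

# Steps copied from the "Time Ago" override; the base step has no lower bound.
TIME_AGO_STEPS: List[Tuple[str, Optional[float]]] = [
    ("green", None),
    ("yellow", 86400),   # last trigger at least 24h ago
    ("red", 172800),     # last trigger at least 48h ago
]


def seconds_since(last_trigger_seconds: float, now: Optional[float] = None) -> float:
    """Mirror `time() - systemd_timer_last_trigger_seconds`."""
    return (time.time() if now is None else now) - last_trigger_seconds


def threshold_color(value: float, steps: List[Tuple[str, Optional[float]]] = TIME_AGO_STEPS) -> str:
    """Return the color of the highest step whose lower bound is <= value."""
    color = steps[0][0]
    for step_color, lower_bound in steps[1:]:
        if lower_bound is not None and value >= lower_bound:
            color = step_color
    return color


if __name__ == "__main__":
    now = 1_700_000_000
    for last in (now - 3_600, now - 90_000, now - 200_000):
        age = seconds_since(last, now)
        print(age, threshold_color(age))
```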

@@ -0,0 +1,399 @@
{
"uid": "temperature-homelab",
"title": "Temperature - Homelab",
"tags": ["home-assistant", "temperature", "homelab"],
"timezone": "browser",
"schemaVersion": 39,
"version": 1,
"refresh": "1m",
"time": {
"from": "now-30d",
"to": "now"
},
"templating": {
"list": []
},
"panels": [
{
"id": 1,
"title": "Current Temperatures",
"type": "stat",
"gridPos": {"h": 6, "w": 12, "x": 0, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}",
"legendFormat": "{{friendly_name}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "celsius",
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "blue", "value": null},
{"color": "green", "value": 18},
{"color": "yellow", "value": 24},
{"color": "orange", "value": 27},
{"color": "red", "value": 30}
]
},
"mappings": []
},
"overrides": []
},
"options": {
"reduceOptions": {
"calcs": ["lastNotNull"],
"fields": "",
"values": false
},
"orientation": "auto",
"textMode": "auto",
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto"
},
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "Temp (.*) Temperature",
"renamePattern": "$1"
}
}
]
},
{
"id": 2,
"title": "Average Home Temperature",
"type": "gauge",
"gridPos": {"h": 6, "w": 6, "x": 12, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "avg(hass_sensor_temperature_celsius{entity!~\".*device_temperature|.*server.*\"})",
"legendFormat": "Average",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "celsius",
"min": 15,
"max": 30,
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "blue", "value": null},
{"color": "green", "value": 18},
{"color": "yellow", "value": 24},
{"color": "red", "value": 28}
]
}
}
},
"options": {
"reduceOptions": {
"calcs": ["lastNotNull"]
},
"showThresholdLabels": false,
"showThresholdMarkers": true
}
},
{
"id": 3,
"title": "Current Humidity",
"type": "stat",
"gridPos": {"h": 6, "w": 6, "x": 18, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "hass_sensor_humidity_percent{entity!~\".*server.*\"}",
"legendFormat": "{{friendly_name}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "red", "value": null},
{"color": "yellow", "value": 30},
{"color": "green", "value": 40},
{"color": "yellow", "value": 60},
{"color": "red", "value": 70}
]
}
}
},
"options": {
"reduceOptions": {
"calcs": ["lastNotNull"]
},
"orientation": "horizontal",
"colorMode": "value",
"graphMode": "none"
},
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "Temp (.*) Humidity",
"renamePattern": "$1"
}
}
]
},
{
"id": 4,
"title": "Temperature History (30 Days)",
"type": "timeseries",
"gridPos": {"h": 10, "w": 24, "x": 0, "y": 6},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}",
"legendFormat": "{{friendly_name}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "celsius",
"custom": {
"lineWidth": 1,
"fillOpacity": 10,
"pointSize": 5,
"showPoints": "never",
"spanNulls": 3600000
}
}
},
"options": {
"legend": {
"displayMode": "list",
"placement": "bottom",
"calcs": ["mean", "min", "max"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "Temp (.*) Temperature",
"renamePattern": "$1"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "temp_server Temperature",
"renamePattern": "Server"
}
}
]
},
{
"id": 5,
"title": "Temperature Trend (1h rate of change)",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 16},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "deriv(hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}[1h]) * 3600",
"legendFormat": "{{friendly_name}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "celsius",
"custom": {
"lineWidth": 1,
"fillOpacity": 20,
"showPoints": "never",
"spanNulls": 3600000
},
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "blue", "value": null},
{"color": "green", "value": -0.5},
{"color": "green", "value": 0.5},
{"color": "red", "value": 1}
]
},
"displayName": "${__field.labels.friendly_name}"
}
},
"options": {
"legend": {
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "multi"
}
},
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "Temp (.*) Temperature",
"renamePattern": "$1"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "temp_server Temperature",
"renamePattern": "Server"
}
}
],
"description": "Rate of temperature change per hour. Positive = warming, Negative = cooling."
},
{
"id": 6,
"title": "24h Min / Max / Avg",
"type": "table",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 16},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "min_over_time(hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}[24h])",
"legendFormat": "{{friendly_name}}",
"refId": "min",
"instant": true,
"format": "table"
},
{
"expr": "max_over_time(hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}[24h])",
"legendFormat": "{{friendly_name}}",
"refId": "max",
"instant": true,
"format": "table"
},
{
"expr": "avg_over_time(hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}[24h])",
"legendFormat": "{{friendly_name}}",
"refId": "avg",
"instant": true,
"format": "table"
}
],
"fieldConfig": {
"defaults": {
"unit": "celsius",
"decimals": 1
},
"overrides": [
{
"matcher": {"id": "byName", "options": "Room"},
"properties": [{"id": "custom.width", "value": 150}]
}
]
},
"options": {
"showHeader": true,
"sortBy": [{"displayName": "Room", "desc": false}]
},
"transformations": [
{
"id": "joinByField",
"options": {
"byField": "friendly_name",
"mode": "outer"
}
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"domain": true,
"entity": true,
"hostname": true,
"instance": true,
"job": true
},
"renameByName": {
"friendly_name": "Room",
"Value #min": "Min (24h)",
"Value #max": "Max (24h)",
"Value #avg": "Avg (24h)"
}
}
},
{
"id": "renameByRegex",
"options": {
"regex": "Temp (.*) Temperature",
"renamePattern": "$1"
}
}
]
},
{
"id": 7,
"title": "Humidity History (30 Days)",
"type": "timeseries",
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 24},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "hass_sensor_humidity_percent",
"legendFormat": "{{friendly_name}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"custom": {
"lineWidth": 1,
"fillOpacity": 10,
"showPoints": "never",
"spanNulls": 3600000
}
}
},
"options": {
"legend": {
"displayMode": "list",
"placement": "bottom",
"calcs": ["mean", "min", "max"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "Temp (.*) Humidity",
"renamePattern": "$1"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "temp_server Humidity",
"renamePattern": "Server"
}
}
]
}
]
}
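The "Temperature Trend" panel multiplies `deriv(...[1h])` by 3600. PromQL's `deriv()` returns the per-second slope of a simple linear regression over all samples in the range, so scaling by 3600 gives degrees Celsius per hour, the scale the panel's thresholds (-0.5, 0.5, 1) are written in. The sketch below reproduces that arithmetic in plain Python on hypothetical sample data. It illustrates the calculation and is not Prometheus's own implementation.

```python
from typing import Sequence


def deriv_per_second(timestamps: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of values against timestamps (units per second).

    Mirrors the idea behind PromQL's deriv(): a simple linear regression
    over all samples in the range, not just the first and last points.
    """
    n = len(timestamps)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (timestamp, value) pairs")
    mean_t = sum(timestamps) / n
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(timestamps, values))
    var = sum((t - mean_t) ** 2 for t in timestamps)
    return cov / var


def deriv_per_hour(timestamps: Sequence[float], values: Sequence[float]) -> float:
    """Same slope scaled to degrees per hour, like `deriv(...) * 3600`."""
    return deriv_per_second(timestamps, values) * 3600


# Hypothetical samples: a room warming by 0.25 °C every 15 minutes.
ts = [0, 900, 1800, 2700, 3600]
temps = [20.0, 20.25, 20.5, 20.75, 21.0]
print(round(deriv_per_hour(ts, temps), 3))  # 1.0
```

Because the regression uses every sample in the 1h window, a single noisy reading moves the trend far less than a naive (last - first) difference would.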

@@ -0,0 +1,111 @@
{ config, pkgs, ... }:
{
services.grafana = {
enable = true;
settings = {
server = {
http_addr = "127.0.0.1";
http_port = 3000;
domain = "grafana-test.home.2rjus.net";
root_url = "https://grafana-test.home.2rjus.net/";
};
# Disable anonymous access
"auth.anonymous".enabled = false;
# OIDC authentication via Kanidm
"auth.generic_oauth" = {
enabled = true;
name = "Kanidm";
client_id = "grafana";
client_secret = "$__file{/run/secrets/grafana-oauth2}";
auth_url = "https://auth.home.2rjus.net/ui/oauth2";
token_url = "https://auth.home.2rjus.net/oauth2/token";
api_url = "https://auth.home.2rjus.net/oauth2/openid/grafana/userinfo";
scopes = "openid profile email groups";
use_pkce = true; # Required by Kanidm, more secure
# Extract user attributes from userinfo response
email_attribute_path = "email";
login_attribute_path = "preferred_username";
name_attribute_path = "name";
# Map admins group to Admin role, everyone else to Editor (for Explore access)
role_attribute_path = "contains(groups[*], 'admins') && 'Admin' || 'Editor'";
allow_sign_up = true;
};
};
# Declarative datasources pointing to monitoring01
provision.datasources.settings = {
apiVersion = 1;
datasources = [
{
name = "Prometheus";
type = "prometheus";
url = "http://monitoring01.home.2rjus.net:9090";
isDefault = true;
uid = "prometheus";
}
{
name = "Loki";
type = "loki";
url = "http://monitoring01.home.2rjus.net:3100";
uid = "loki";
}
];
};
# Declarative dashboards
provision.dashboards.settings = {
apiVersion = 1;
providers = [
{
name = "homelab";
type = "file";
options.path = ./dashboards;
disableDeletion = true;
}
];
};
};
# Vault secret for OAuth2 client secret
vault.secrets.grafana-oauth2 = {
secretPath = "services/grafana/oauth2-client-secret";
extractKey = "password";
services = [ "grafana" ];
owner = "grafana";
group = "grafana";
};
# Local Caddy for TLS termination
services.caddy = {
enable = true;
package = pkgs.unstable.caddy;
configFile = pkgs.writeText "Caddyfile" ''
{
acme_ca https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory
metrics
}
grafana-test.home.2rjus.net {
log {
output file /var/log/caddy/grafana.log {
mode 644
}
}
reverse_proxy http://127.0.0.1:3000
}
http://${config.networking.hostName}.home.2rjus.net/metrics {
metrics
}
'';
};
# Expose Caddy metrics for Prometheus
homelab.monitoring.scrapeTargets = [{
job_name = "caddy";
port = 80;
}];
}
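Grafana evaluates `role_attribute_path` as a JMESPath expression against the claims returned by the identity provider. In `contains(groups[*], 'admins') && 'Admin' || 'Editor'`, `&&` yields its right operand when the left side is truthy and `||` falls back to its right operand otherwise, so members of `admins` get `Admin` and everyone else gets `Editor`. The following is a plain-Python emulation of that single expression; `grafana_role` is a hypothetical helper, not Grafana code, and it treats a missing `groups` claim as an empty list, which is a simplification. Note that `contains` is an exact string match: if the provider emits qualified group names (for example `name@domain`), the expression would have to match that form instead.

```python
def grafana_role(claims: dict) -> str:
    """Emulate `contains(groups[*], 'admins') && 'Admin' || 'Editor'`.

    Hypothetical helper for illustration only. A missing or empty
    `groups` claim is treated as "no groups" (a simplification).
    """
    groups = claims.get("groups") or []
    is_admin = "admins" in groups  # exact string match, like JMESPath contains()
    # `a && b || c` -> b when a is truthy, otherwise c ('Admin' is truthy)
    return "Admin" if is_admin else "Editor"


print(grafana_role({"groups": ["admins", "users"]}))  # Admin
print(grafana_role({"groups": ["users"]}))  # Editor
```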

@@ -30,6 +30,16 @@
};
# Regular users (persons) are managed imperatively via kanidm CLI
# OAuth2/OIDC clients for service authentication
systems.oauth2.grafana = {
displayName = "Grafana";
originUrl = "https://grafana-test.home.2rjus.net/login/generic_oauth";
originLanding = "https://grafana-test.home.2rjus.net/";
basicSecretFile = config.vault.secrets.grafana-oauth2.outputDir;
preferShortUsername = true;
scopeMaps.users = [ "openid" "profile" "email" "groups" ];
};
};
};
@@ -53,6 +63,15 @@
group = "kanidm";
};
# Vault secret for Grafana OAuth2 client secret
vault.secrets.grafana-oauth2 = {
secretPath = "services/grafana/oauth2-client-secret";
extractKey = "password";
services = [ "kanidm" ];
owner = "kanidm";
group = "kanidm";
};
# Note: Kanidm does not expose Prometheus metrics
# If metrics support is added in the future, uncomment:
# homelab.monitoring.scrapeTargets = [

@@ -89,6 +89,23 @@ locals {
]
}
# kanidm01: Kanidm identity provider
"kanidm01" = {
paths = [
"secret/data/hosts/kanidm01/*",
"secret/data/kanidm/*",
"secret/data/services/grafana/*",
]
}
# monitoring02: Grafana test instance
"monitoring02" = {
paths = [
"secret/data/hosts/monitoring02/*",
"secret/data/services/grafana/*",
]
}
}
}
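Both `kanidm01` and `monitoring02` list `secret/data/services/grafana/*`, which is what lets each host read the shared `services/grafana/oauth2-client-secret` entry. In a Vault policy path, a trailing `*` acts as a prefix match. The sketch below models only that rule (it ignores the `+` segment wildcard and capabilities) and assumes these lists are rendered into per-host Vault policy paths and that the secret lives in a KV v2 engine mounted at `secret/`; the `secret/data/` prefix suggests this, but the excerpt does not show it. `policy_allows` is a hypothetical helper.

```python
def policy_allows(policy_paths: list[str], request_path: str) -> bool:
    """Return True if request_path matches any policy path.

    Models only Vault's trailing-'*' prefix glob; other paths must match exactly.
    """
    for pattern in policy_paths:
        if pattern.endswith("*"):
            if request_path.startswith(pattern[:-1]):
                return True
        elif request_path == pattern:
            return True
    return False


# Paths copied from the monitoring02 entry above.
monitoring02 = [
    "secret/data/hosts/monitoring02/*",
    "secret/data/services/grafana/*",
]
# KV v2 read path for the shared client secret (assumed "secret/" mount).
print(policy_allows(monitoring02, "secret/data/services/grafana/oauth2-client-secret"))  # True
# Hypothetical Kanidm-only path: not covered by monitoring02's policy.
print(policy_allows(monitoring02, "secret/data/kanidm/example"))  # False
```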

@@ -39,6 +39,11 @@ locals {
"secret/data/kanidm/*",
]
}
"monitoring02" = {
paths = [
"secret/data/hosts/monitoring02/*",
]
}
}

@@ -108,6 +108,12 @@ locals {
auto_generate = true
password_length = 32
}
# Grafana OAuth2 client secret (for Kanidm OIDC)
"services/grafana/oauth2-client-secret" = {
auto_generate = true
password_length = 64
}
}
}

@@ -79,6 +79,13 @@ locals {
disk_size = "20G"
vault_wrapped_token = "s.OOqjEECeIV7dNgCS6jNmyY3K"
}
"monitoring02" = {
ip = "10.69.13.24/24"
cpu_cores = 4
memory = 4096
disk_size = "60G"
vault_wrapped_token = "s.uXpdoGxHXpWvTsGbHkZuq1jF"
}
}
# Compute VM configurations with defaults applied