37 Commits

Author SHA1 Message Date
7ff3d2a09b docs: move openbao-kanidm-oidc plan to completed
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m7s
2026-02-09 19:44:06 +01:00
e85f15b73d vault: add OpenBao OIDC integration with Kanidm
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m9s
Enable Kanidm users to authenticate to OpenBao via OIDC for Web UI access.
Members of the admins group get full read/write access to secrets.

Changes:
- Add OIDC auth backend in Terraform (oidc.tf)
- Add oidc-admin and oidc-default policies
- Add openbao OAuth2 client to Kanidm
- Enable legacy crypto (RS256) for OpenBao compatibility
- Allow imperative group membership management in Kanidm

Limitations:
- CLI login not supported (Kanidm requires HTTPS for confidential client redirects)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 19:42:26 +01:00
2f5a2a4bf1 grafana: use instant queries for fleet dashboard stat panels
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m6s
Prevents stat panels from being affected by dashboard time range selection.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 19:00:33 +01:00
287141c623 hosts: add role metadata to all hosts
Some checks failed
Run nix flake check / flake-check (push) Failing after 13m51s
Assign roles to hosts for better organization and filtering:
- ha1: home-automation
- monitoring01, monitoring02: monitoring
- jelly01: media
- nats1: messaging
- http-proxy: proxy
- testvm01-03: test

Also promote kanidm01 and monitoring02 from test to prod tier.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 16:21:08 +01:00
9ed11b712f home-assistant: fix Jinja2 battery template syntax
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m13s
The template used | min(100) | max(0), which parses but misuses the Jinja2
min/max filters. These filters operate on an iterable (a list) and do not
accept a scalar bound as an argument. This caused TypeError warnings on
every MQTT message and left battery sensors unavailable.

Fixed by using proper list-based min/max:
  [[value, 100] | min, 0] | max

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 16:12:59 +01:00
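
The list-based min/max form in this fix clamps a reading to the 0-100 range. A minimal Python sketch of the same nesting, assuming a numeric `value`; the function name `clamp_percent` is hypothetical and does not appear in the repository:

```python
def clamp_percent(value):
    """Clamp a battery reading to the range 0..100.

    Mirrors the nesting of the template expression: the inner list caps
    the value at 100, the outer list floors the result at 0.
    """
    capped = min([value, 100])  # equivalent of "[value, 100] | min"
    return max([capped, 0])     # equivalent of "[capped, 0] | max"


if __name__ == "__main__":
    for reading in (150, -5, 42):
        print(reading, "->", clamp_percent(reading))
```

Both filters receive a two-element list, which is the iterable they expect.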
ffad2dd205 monitoring: increase zigbee_sensor_stale threshold to 4 hours
The 2-hour threshold was too aggressive for temperature sensors in
stable environments. Historical data shows gaps up to 2.75 hours when
temperature hasn't changed (Home Assistant only updates last_updated
when values change). Increasing to 4 hours avoids false positives
while still catching genuine failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 16:10:54 +01:00
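
The reasoning behind the new threshold can be checked with a small Python sketch. It only compares the longest gap quoted in the commit message against the old and new thresholds; the actual alert rule is not shown here, and the names below are illustrative assumptions:

```python
from datetime import timedelta


def is_stale(gap, threshold):
    """Return True when the time since the last update exceeds the threshold."""
    return gap > threshold


# Longest gap reported in the commit message: 2.75 hours.
longest_observed_gap = timedelta(hours=2, minutes=45)
old_threshold = timedelta(hours=2)
new_threshold = timedelta(hours=4)

if __name__ == "__main__":
    print("old threshold fires:", is_stale(longest_observed_gap, old_threshold))
    print("new threshold fires:", is_stale(longest_observed_gap, new_threshold))
```

A healthy sensor with a 2.75-hour gap would have triggered the 2-hour rule but stays below the 4-hour one.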
ed7d2aa727 grafana: add deployment metrics to nixos-fleet dashboard
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 15:58:28 +01:00
bf7a025364 flake: update homelab-deploy input
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m49s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 15:45:30 +01:00
4ae99dbc89 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/e576e3c9cf9bad747afcddd9e34f51d18c855b4e?narHash=sha256-tlFqNG/uzz2%2B%2BaAmn4v8J0vAkV3z7XngeIIB3rM3650%3D' (2026-02-03)
  → 'github:nixos/nixpkgs/23d72dabcb3b12469f57b37170fcbc1789bd7457?narHash=sha256-z5NJPSBwsLf/OfD8WTmh79tlSU8XgIbwmk6qB1/TFzY%3D' (2026-02-07)
• Updated input 'nixpkgs-unstable':
    'github:nixos/nixpkgs/00c21e4c93d963c50d4c0c89bfa84ed6e0694df2?narHash=sha256-AYqlWrX09%2BHvGs8zM6ebZ1pwUqjkfpnv8mewYwAo%2BiM%3D' (2026-02-04)
  → 'github:nixos/nixpkgs/d6c71932130818840fc8fe9509cf50be8c64634f?narHash=sha256-ub1gpAONMFsT/GU2hV6ZWJjur8rJ6kKxdm9IlCT0j84%3D' (2026-02-08)
2026-02-09 00:01:58 +00:00
5c142b1323 flake: update homelab-deploy input
Some checks failed
Run nix flake check / flake-check (push) Failing after 10m7s
Periodic flake update / flake-update (push) Successful in 2m51s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 00:42:51 +01:00
4091e51f41 nixos-exporter: use nkeySeedFile option
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m26s
Use the new nkeySeedFile option instead of credentialsFile for NATS
authentication.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 00:34:22 +01:00
a8e558a6b7 flake: update nixos-exporter input
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 00:32:56 +01:00
4efc798c38 nixos-exporter: fix nkey file permissions
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m6s
Set owner/group to nixos-exporter so the service can read the
NATS credentials file.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 00:18:10 +01:00
016f8c9119 terraform: add nixos-exporter shared policy
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
- Create shared policy granting all hosts access to nixos-exporter nkey
- Add policy to both manual and generated host AppRoles
- Remove duplicate kanidm01/monitoring02 entries from hosts-generated.tf

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 00:04:17 +01:00
fec2a261ab Merge pull request 'nixos-exporter: enable NATS cache sharing' (#38) from nixos-exporter-nats-cache into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m18s
Reviewed-on: #38
2026-02-08 22:58:24 +00:00
60c04a2052 nixos-exporter: enable NATS cache sharing
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m17s
Run nix flake check / flake-check (push) Failing after 5m16s
When one host fetches the latest flake revision, it publishes to NATS
and all other hosts receive the update immediately. This reduces
redundant nix flake metadata calls across the fleet.

- Add nkeys to devshell for key generation
- Add nixos-exporter user to NATS HOMELAB account
- Add Vault secret for NKey storage
- Configure all hosts to use NATS for revision sharing
- Update nixos-exporter input to version with NATS support

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 23:57:28 +01:00
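
The cache-sharing idea in this commit can be illustrated with a toy in-memory publish/subscribe model in Python. This is only a sketch of the pattern: `RevisionBus` and `Host` are hypothetical names standing in for NATS and the exporter, and none of this is the nixos-exporter implementation:

```python
class RevisionBus:
    """In-memory stand-in for a message bus subject carrying the latest revision."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, revision):
        # Deliver the revision to every subscriber, including the publisher.
        for callback in self._subscribers:
            callback(revision)


class Host:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.latest_revision = None
        self.fetch_count = 0
        bus.subscribe(self._on_revision)

    def _on_revision(self, revision):
        self.latest_revision = revision

    def refresh(self, fetch):
        """Fetch the revision once and share the result with all hosts."""
        self.fetch_count += 1
        self.bus.publish(fetch())


if __name__ == "__main__":
    bus = RevisionBus()
    hosts = [Host(name, bus) for name in ("ha1", "jelly01", "nats1")]
    hosts[0].refresh(lambda: "abc123")
    print([(h.name, h.latest_revision, h.fetch_count) for h in hosts])
```

Only the host that calls `refresh` performs a fetch; the others learn the revision from the published message.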
39e3f37263 flake: update homelab-deploy input
Some checks failed
Run nix flake check / flake-check (push) Failing after 15m17s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 22:49:44 +01:00
a2d93baba8 Merge pull request 'grafana: add NixOS operations dashboard' (#37) from grafana-nixos-operations-dashboard into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 3m54s
Reviewed-on: #37
2026-02-08 21:04:19 +00:00
f66dfc753c grafana: add NixOS operations dashboard
All checks were successful
Run nix flake check / flake-check (push) Successful in 3m24s
Run nix flake check / flake-check (pull_request) Successful in 4m5s
Loki-based dashboard for tracking NixOS operations including:
- Upgrade activity and success/failure stats
- Build activity during upgrades
- Bootstrap logs for new VM deployments
- ACME certificate renewal activity

Log panels use LogQL json parsing with | keep host to show
clean messages with host labels.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 22:03:28 +01:00
79a6a72719 Merge pull request 'grafana-dashboards-permissions' (#36) from grafana-dashboards-permissions into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m4s
Reviewed-on: #36
2026-02-08 20:18:22 +00:00
89d0a6f358 grafana: add systemd services dashboard
Some checks failed
Run nix flake check / flake-check (push) Failing after 8m30s
Run nix flake check / flake-check (pull_request) Failing after 16m49s
Dashboard for monitoring systemd across the fleet:
- Summary stats: failed/active/inactive units, restarts, timers
- Failed units table (shows any units in failed state)
- Service restarts table (top 15 services by restart count)
- Active units per host bar chart
- NixOS upgrade timer table with last trigger time
- Backup timers table (restic jobs)
- Service restarts over time chart
- Hostname filter to focus on specific hosts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 21:06:59 +01:00
03ebee4d82 grafana: fix proxmox table __name__ column
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m9s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 21:04:41 +01:00
05630eb4d4 grafana: add Proxmox dashboard
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Dashboard for monitoring Proxmox VMs:
- Summary stats: VMs running/stopped, node CPU/memory, uptime
- VM status table with name, status, CPU%, memory%, uptime
- VM CPU usage over time
- VM memory usage over time
- Network traffic (RX/TX) per VM
- Disk I/O (read/write) per VM
- Storage usage gauges and capacity table
- VM filter to focus on specific VMs

Filters out template VMs, shows only actual guests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 21:02:28 +01:00
1e52eec02a monitoring: always include tier label in scrape configs
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m8s
Previously tier was only included if non-default (not "prod"), which
meant prod hosts had no tier label. This made the Grafana tier filter
only show "test" since "prod" never appeared in label_values().

Now tier is always included, so both "prod" and "test" appear in the
fleet dashboard tier selector.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:58:52 +01:00
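
The effect described in this commit message can be reproduced with a Python port of the label-building logic. The dictionary shape used for hosts and the helper names are assumptions made for illustration; the real code is the Nix function `buildEffectiveLabels`:

```python
def _with_optional(labels, host):
    """Add priority/role only when non-default, then merge extra labels."""
    if host.get("priority", "high") != "high":
        labels["priority"] = host["priority"]
    if host.get("role") is not None:
        labels["role"] = host["role"]
    labels.update(host.get("labels", {}))
    return labels


def build_labels_old(host):
    """Previous behaviour: tier is omitted when it equals the default 'prod'."""
    labels = {"hostname": host["hostname"]}
    if host["tier"] != "prod":
        labels["tier"] = host["tier"]
    return _with_optional(labels, host)


def build_labels_new(host):
    """New behaviour: tier is always present."""
    labels = {"hostname": host["hostname"], "tier": host["tier"]}
    return _with_optional(labels, host)


def label_values(hosts, build, name):
    """Distinct values of one label across all hosts, like a label_values() query."""
    return sorted({build(h)[name] for h in hosts if name in build(h)})


HOSTS = [
    {"hostname": "monitoring01", "tier": "prod", "role": "monitoring"},
    {"hostname": "testvm01", "tier": "test", "role": "test"},
]

if __name__ == "__main__":
    print(label_values(HOSTS, build_labels_old, "tier"))
    print(label_values(HOSTS, build_labels_new, "tier"))
```

With the old logic the set of tier values contains only `test`, which is why the dashboard filter never offered `prod`.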
d333aa0164 grafana: fix fleet table __name__ columns
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m5s
Exclude the __name__ columns that were leaking through the
table transformations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:52:39 +01:00
a5d5827dcc grafana: add NixOS fleet dashboard
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Dashboard for monitoring NixOS deployments across the homelab:
- Hosts behind remote / needing reboot stat panels
- Fleet status table with revision, behind status, reboot needed, age
- Generation age bar chart (shows stale configs)
- Generations per host bar chart
- Deployment activity time series (see when hosts were updated)
- Flake input ages table
- Pie charts for hosts by revision and tier
- Tier filter variable

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:50:08 +01:00
1c13ec12a4 grafana: add temperature dashboard
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m5s
Dashboard includes:
- Current temperatures per room (stat panel)
- Average home temperature (gauge)
- Current humidity (stat panel)
- 30-day temperature history with mean/min/max in legend
- Temperature trend (rate of change per hour)
- 24h min/max/avg table per room
- 30-day humidity history

Filters out device_temperature (internal sensor) metrics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:45:52 +01:00
4bf0eeeadb grafana: add dashboards and fix permissions
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m3s
- Change default OIDC role from Viewer to Editor for Explore access
- Add declarative dashboard provisioning
- Add node-exporter dashboard (CPU, memory, disk, load, network, I/O)
- Add Loki logs dashboard with host/job filters

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:39:21 +01:00
304cb117ce Merge pull request 'grafana-kanidm-oidc' (#35) from grafana-kanidm-oidc into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m7s
Reviewed-on: #35
2026-02-08 19:30:20 +00:00
02270a0e4a docs: update plans with Grafana OIDC progress
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m7s
Run nix flake check / flake-check (push) Failing after 16m31s
- auth-system-replacement.md: Mark OAuth2 client (Grafana) as completed,
  document key findings (PKCE, attribute paths, user requirements)
- monitoring-migration-victoriametrics.md: Note Grafana deployment on
  monitoring02 with Kanidm OIDC as test instance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:28:10 +01:00
030e8518c5 grafana: add Grafana on monitoring02 with Kanidm OIDC
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m3s
Deploy Grafana test instance on monitoring02 with:
- Kanidm OIDC authentication (admins -> Admin role, others -> Viewer)
- PKCE enabled for secure OAuth2 flow (required by Kanidm)
- Declarative datasources for Prometheus and Loki on monitoring01
- Local Caddy for TLS termination via internal ACME CA
- DNS CNAME grafana-test.home.2rjus.net

Terraform changes add OAuth2 client secret and AppRole policies for
kanidm01 and monitoring02.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 20:23:26 +01:00
9ffdd4f862 terraform: increase monitoring02 disk to 60G
Some checks failed
Run nix flake check / flake-check (push) Failing after 11m8s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 19:23:40 +01:00
0b977808ca hosts: add monitoring02 configuration
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
New test-tier host for monitoring stack expansion with:
- Static IP 10.69.13.24
- 4 CPU cores, 4GB RAM, 20GB disk
- Vault integration and NATS-based deployment enabled

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 19:19:38 +01:00
8786113f8f docs: add OpenBao + Kanidm OIDC integration plan
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m10s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 15:45:44 +01:00
fdb2c31f84 docs: add pipe-to-loki documentation to CLAUDE.md
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m1s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 15:34:01 +01:00
78eb04205f system: add pipe-to-loki helper script
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Adds a system-wide script for sending command output or interactive
sessions to Loki for easy sharing with Claude.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 15:30:53 +01:00
19cb61ebbc Merge pull request 'kanidm-pam-client' (#34) from kanidm-pam-client into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 3m19s
Reviewed-on: #34
2026-02-08 14:14:53 +00:00
41 changed files with 3591 additions and 44 deletions

View File

@@ -39,6 +39,30 @@ Do not automatically deploy changes. Deployments are usually done by updating th
Do not run SSH commands directly. If a command needs to be run on a remote host, provide the command to the user and ask them to run it manually.
### Sharing Command Output via Loki
All hosts have the `pipe-to-loki` script for sending command output or terminal sessions to Loki, allowing users to share output with Claude without copy-pasting.
**Pipe mode** - send command output:
```bash
command | pipe-to-loki # Auto-generated ID
command | pipe-to-loki --id my-test # Custom ID
```
**Session mode** - record interactive terminal session:
```bash
pipe-to-loki --record # Start recording, exit to send
pipe-to-loki --record --id my-session # With custom ID
```
The script prints the session ID which the user can share. Query results with:
```logql
{job="pipe-to-loki"} # All entries
{job="pipe-to-loki", id="my-test"} # Specific ID
{job="pipe-to-loki", host="testvm01"} # From specific host
{job="pipe-to-loki", type="session"} # Only sessions
```
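
The queries above all follow the same label-selector shape. A small Python sketch that builds such a selector from keyword arguments, assuming only the label names shown above (`job`, `id`, `host`, `type`); the helper `logql_selector` is hypothetical and is not part of `pipe-to-loki`:

```python
def logql_selector(**labels):
    """Build a LogQL stream selector such as {job="pipe-to-loki", id="my-test"}."""
    parts = []
    for name, value in labels.items():
        # Escape backslashes and double quotes inside the label value.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{name}="{escaped}"')
    return "{" + ", ".join(parts) + "}"


if __name__ == "__main__":
    print(logql_selector(job="pipe-to-loki"))
    print(logql_selector(job="pipe-to-loki", id="my-test"))
    print(logql_selector(job="pipe-to-loki", host="testvm01"))
```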
### Testing Feature Branches on Hosts
All hosts have the `nixos-rebuild-test` helper script for testing feature branches before merging:

View File

@@ -151,11 +151,30 @@ Rationale:
- Well above NixOS system users (typically <1000)
- Avoids Podman/container issues with very high GIDs
### Completed (2026-02-08) - OAuth2/OIDC for Grafana
**OAuth2 client deployed for Grafana on monitoring02:**
- Client ID: `grafana`
- Redirect URL: `https://grafana-test.home.2rjus.net/login/generic_oauth`
- Scope maps: `openid`, `profile`, `email`, `groups` for `users` group
- Role mapping: `admins` group → Grafana Admin, others → Viewer
**Configuration locations:**
- Kanidm OAuth2 client: `services/kanidm/default.nix`
- Grafana OIDC config: `services/grafana/default.nix`
- Vault secret: `services/grafana/oauth2-client-secret`
**Key findings:**
- PKCE is required by Kanidm - enable `use_pkce = true` in Grafana
- Must set `email_attribute_path`, `login_attribute_path`, `name_attribute_path` to extract from userinfo
- Users need: primary credential (password + TOTP for MFA), membership in `users` group, email address set
- Unix password is separate from primary credential (web login requires primary credential)
### Next Steps
1. Enable PAM/NSS on production hosts (after test tier validation)
2. Configure TrueNAS LDAP client for NAS integration testing
-3. Add OAuth2 clients (Grafana first)
+3. Add OAuth2 clients for other services as needed
## References

View File

@@ -0,0 +1,87 @@
# OpenBao + Kanidm OIDC Integration
## Status: Completed
Implemented 2026-02-09.
## Overview
Enable Kanidm users to authenticate to OpenBao (Vault) using OIDC for Web UI access. Members of the `admins` group get full read/write access to secrets.
## Implementation
### Files Modified
| File | Changes |
|------|---------|
| `terraform/vault/oidc.tf` | New - OIDC auth backend and roles |
| `terraform/vault/policies.tf` | Added oidc-admin and oidc-default policies |
| `terraform/vault/secrets.tf` | Added OAuth2 client secret |
| `terraform/vault/approle.tf` | Granted kanidm01 access to openbao secrets |
| `services/kanidm/default.nix` | Added openbao OAuth2 client, enabled imperative group membership |
### Kanidm Configuration
OAuth2 client `openbao` with:
- Confidential client (uses client secret)
- Web UI callback only: `https://vault.home.2rjus.net:8200/ui/vault/auth/oidc/oidc/callback`
- Legacy crypto enabled (RS256 for OpenBao compatibility)
- Scope maps for `admins` and `users` groups
Group membership is now managed imperatively (`overwriteMembers = false`) to prevent provisioning from resetting group memberships on service restart.
### OpenBao Configuration
OIDC auth backend at `/oidc` with two roles:
| Role | Bound Claims | Policy | Access |
|------|--------------|--------|--------|
| `admin` | `groups = admins@home.2rjus.net` | `oidc-admin` | Full read/write to secrets, system health/metrics |
| `default` | (none) | `oidc-default` | Token lookup-self, system health |
Both roles request scopes: `openid`, `profile`, `email`, `groups`
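
The role table can be read as a simple claim check. The following Python sketch models it under stated assumptions: the `ROLES` structure and `policy_for` helper are hypothetical, and this is not how OpenBao implements bound-claim matching internally:

```python
ROLES = {
    "admin": {
        "bound_claims": {"groups": "admins@home.2rjus.net"},
        "policy": "oidc-admin",
    },
    "default": {
        "bound_claims": {},
        "policy": "oidc-default",
    },
}


def policy_for(role_name, claims):
    """Return the policy granted by a role, or None if a bound claim is not satisfied."""
    role = ROLES[role_name]
    for claim, required in role["bound_claims"].items():
        actual = claims.get(claim, [])
        values = actual if isinstance(actual, list) else [actual]
        if required not in values:
            return None
    return role["policy"]


if __name__ == "__main__":
    admin_claims = {"groups": ["admins@home.2rjus.net", "users@home.2rjus.net"]}
    user_claims = {"groups": ["users@home.2rjus.net"]}
    print(policy_for("admin", admin_claims))
    print(policy_for("admin", user_claims))
    print(policy_for("default", user_claims))
```

The sketch also shows why the short group name `admins` would not match: the claim carries the `groupname@domain` form.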
### Policies
**oidc-admin:**
- `secret/*` - create, read, update, delete, list
- `sys/health` - read
- `sys/metrics` - read
- `sys/auth` - read
- `sys/mounts` - read
**oidc-default:**
- `auth/token/lookup-self` - read
- `sys/health` - read
## Usage
### Web UI Login
1. Navigate to https://vault.home.2rjus.net:8200
2. Select "OIDC" authentication method
3. Enter role: `admin` (for admins) or `default` (for any user)
4. Click "Sign in with OIDC"
5. Authenticate with Kanidm
### Group Management
Add users to admins group for full access:
```bash
kanidm group add-members admins <username>
```
## Limitations
**CLI login not supported:** Kanidm requires HTTPS for all redirect URIs on confidential (non-public) OAuth2 clients. OpenBao CLI uses `http://localhost:8250/oidc/callback` which Kanidm rejects. Public clients would allow localhost redirects, but OpenBao requires a client secret for OIDC auth.
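
The limitation can be restated as a small rule over redirect URIs. This Python sketch is a toy model of the rule as described in this section, not Kanidm's validation code, and `redirect_allowed` is a hypothetical helper:

```python
from urllib.parse import urlsplit

WEB_UI_CALLBACK = "https://vault.home.2rjus.net:8200/ui/vault/auth/oidc/oidc/callback"
CLI_CALLBACK = "http://localhost:8250/oidc/callback"


def redirect_allowed(uri, confidential):
    """Model of the rule above: HTTPS is always fine, plain-HTTP localhost
    redirects are only acceptable for public (non-confidential) clients."""
    parts = urlsplit(uri)
    if parts.scheme == "https":
        return True
    is_localhost = parts.hostname in ("localhost", "127.0.0.1", "::1")
    return (not confidential) and parts.scheme == "http" and is_localhost


if __name__ == "__main__":
    print(redirect_allowed(WEB_UI_CALLBACK, confidential=True))
    print(redirect_allowed(CLI_CALLBACK, confidential=True))
    print(redirect_allowed(CLI_CALLBACK, confidential=False))
```

Since the `openbao` client is confidential, only the Web UI callback passes this check.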
## Lessons Learned
1. **Kanidm group names:** Groups are returned as `groupname@domain` (e.g., `admins@home.2rjus.net`), not just the short name
2. **RS256 required:** OpenBao only supports RS256 for JWT signing; Kanidm defaults to ES256, requiring `enableLegacyCrypto = true`
3. **Scope request:** OIDC roles must explicitly request the `groups` scope via `oidc_scopes`
4. **Provisioning resets:** Kanidm provisioning with default `overwriteMembers = true` resets group memberships on restart
5. **Two-phase Terraform:** Secret must exist before OIDC backend can validate discovery URL
## References
- [OpenBao JWT/OIDC Auth Method](https://openbao.org/docs/auth/jwt/)
- [Kanidm OAuth2 Documentation](https://kanidm.github.io/kanidm/stable/integrations/oauth2.html)

View File

@@ -169,9 +169,30 @@ Once ready to cut over:
- Destroy VM in Proxmox
- Remove from terraform state
## Current Progress
### monitoring02 Host Created (2026-02-08)
Host deployed at 10.69.13.24 (test tier) with:
- 4 CPU cores, 8GB RAM, 60GB disk
- Vault integration enabled
- NATS-based remote deployment enabled
### Grafana with Kanidm OIDC (2026-02-08)
Grafana deployed on monitoring02 as a test instance (`grafana-test.home.2rjus.net`):
- Kanidm OIDC authentication (PKCE enabled)
- Role mapping: `admins` → Admin, others → Viewer
- Declarative datasources pointing to monitoring01 (Prometheus, Loki)
- Local Caddy for TLS termination via internal ACME CA
This validates the Grafana + OIDC pattern before the full VictoriaMetrics migration. The existing
`services/monitoring/grafana.nix` on monitoring01 can be replaced with the new `services/grafana/`
module once monitoring02 becomes the primary monitoring host.
## Open Questions
-- [ ] What disk size for monitoring02? 100GB should allow 3+ months with VictoriaMetrics compression
+- [ ] What disk size for monitoring02? Current 60GB may need expansion for 3+ months with VictoriaMetrics
- [ ] Which dashboards to recreate declaratively? (Review monitoring01 Grafana for current set)
## VictoriaMetrics Service Configuration

View File

@@ -43,11 +43,21 @@ kanidm person posix set-password <username>
kanidm person posix set <username> --shell /bin/zsh
```
### Setting Email Address
Email is required for OAuth2/OIDC login (e.g., Grafana):
```bash
kanidm person update <username> --mail <email>
```
### Example: Full User Creation
```bash
kanidm person create testuser "Test User"
kanidm person update testuser --mail testuser@home.2rjus.net
kanidm group add-members ssh-users testuser
kanidm group add-members users testuser # Required for OAuth2 scopes
kanidm person posix set testuser
kanidm person posix set-password testuser
kanidm person get testuser
@@ -129,6 +139,40 @@ Kanidm auto-assigns UIDs/GIDs from its configured range. For manually assigned G
| 65,536+ | Users (auto-assigned) |
| 68,000 - 68,999 | Groups (manually assigned) |
## OAuth2/OIDC Login (Web Services)
For OAuth2/OIDC login to web services like Grafana, users need:
1. **Primary credential** - Password set via `credential update` (separate from unix password)
2. **MFA** - TOTP or passkey (Kanidm requires MFA for primary credentials)
3. **Group membership** - Member of `users` group (for OAuth2 scope mapping)
4. **Email address** - Set via `person update --mail`
### Setting Up Primary Credential (Web Login)
The primary credential is different from the unix/POSIX password:
```bash
# Interactive credential setup
kanidm person credential update <username>
# In the interactive prompt:
# 1. Type 'password' to set a password
# 2. Type 'totp' to add TOTP (scan QR with authenticator app)
# 3. Type 'commit' to save
```
### Verifying OAuth2 Readiness
```bash
kanidm person get <username>
```
Check for:
- `mail:` - Email address set
- `memberof:` - Includes `users@home.2rjus.net`
- Primary credential status (check via `credential update` → `status`)
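
The checklist above can be expressed as a small Python helper. This is a sketch only: it does not parse `kanidm person get` output, and the dictionary shape, including the `has_primary_credential` field, is a made-up assumption for illustration:

```python
REQUIRED_GROUP = "users@home.2rjus.net"


def oauth2_readiness(person):
    """Return a list of problems that would block OAuth2/OIDC login (empty if ready)."""
    problems = []
    if not person.get("mail"):
        problems.append("no email address set")
    if REQUIRED_GROUP not in person.get("memberof", []):
        problems.append(f"not a member of {REQUIRED_GROUP}")
    if not person.get("has_primary_credential", False):
        problems.append("no primary credential")
    return problems


if __name__ == "__main__":
    ready = {
        "mail": "testuser@home.2rjus.net",
        "memberof": ["users@home.2rjus.net", "ssh-users@home.2rjus.net"],
        "has_primary_credential": True,
    }
    print(oauth2_readiness(ready))
    print(oauth2_readiness({"memberof": ["ssh-users@home.2rjus.net"]}))
```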
## PAM/NSS Client Configuration
Enable central authentication on a host:

28
flake.lock generated
View File

@@ -28,11 +28,11 @@
]
},
"locked": {
-"lastModified": 1770481834,
+"lastModified": 1770648258,
-"narHash": "sha256-Xx9BYnI0C/qgPbwr9nj6NoAdQTbYLunrdbNSaUww9oY=",
+"narHash": "sha256-sExxD8N9Q0RrHIoppOV6qp4jcJirLVjpQd20C72V78I=",
"ref": "master",
-"rev": "fd0d63b103dfaf21d1c27363266590e723021c67",
+"rev": "277a49a666347e2e2ae67128cf732956a9c3be56",
-"revCount": 24,
+"revCount": 27,
"type": "git",
"url": "https://git.t-juice.club/torjus/homelab-deploy"
},
@@ -49,11 +49,11 @@
]
},
"locked": {
-"lastModified": 1770422522,
+"lastModified": 1770593543,
-"narHash": "sha256-WmIFnquu4u58v8S2bOVWmknRwHn4x88CRfBFTzJ1inQ=",
+"narHash": "sha256-hT8Rj6JAwGDFvcxWEcUzTCrWSiupCfBa57pBDnM2C5g=",
"ref": "refs/heads/master",
-"rev": "cf0ce858997af4d8dcc2ce10393ff393e17fc911",
+"rev": "5aa5f7275b7a08015816171ba06d2cbdc2e02d3e",
-"revCount": 11,
+"revCount": 15,
"type": "git",
"url": "https://git.t-juice.club/torjus/nixos-exporter"
},
@@ -64,11 +64,11 @@
},
"nixpkgs": {
"locked": {
-"lastModified": 1770136044,
+"lastModified": 1770464364,
-"narHash": "sha256-tlFqNG/uzz2++aAmn4v8J0vAkV3z7XngeIIB3rM3650=",
+"narHash": "sha256-z5NJPSBwsLf/OfD8WTmh79tlSU8XgIbwmk6qB1/TFzY=",
"owner": "nixos",
"repo": "nixpkgs",
-"rev": "e576e3c9cf9bad747afcddd9e34f51d18c855b4e",
+"rev": "23d72dabcb3b12469f57b37170fcbc1789bd7457",
"type": "github"
},
"original": {
@@ -80,11 +80,11 @@
},
"nixpkgs-unstable": {
"locked": {
-"lastModified": 1770197578,
+"lastModified": 1770562336,
-"narHash": "sha256-AYqlWrX09+HvGs8zM6ebZ1pwUqjkfpnv8mewYwAo+iM=",
+"narHash": "sha256-ub1gpAONMFsT/GU2hV6ZWJjur8rJ6kKxdm9IlCT0j84=",
"owner": "nixos",
"repo": "nixpkgs",
-"rev": "00c21e4c93d963c50d4c0c89bfa84ed6e0694df2",
+"rev": "d6c71932130818840fc8fe9509cf50be8c64634f",
"type": "github"
},
"original": {

View File

@@ -191,6 +191,15 @@
./hosts/kanidm01
];
};
monitoring02 = nixpkgs.lib.nixosSystem {
inherit system;
specialArgs = {
inherit inputs self;
};
modules = commonModules ++ [
./hosts/monitoring02
];
};
};
packages = forAllSystems (
{ pkgs }:
@@ -208,6 +217,7 @@
pkgs.opentofu
pkgs.openbao
pkgs.kanidm_1_8
pkgs.nkeys
(pkgs.callPackage ./scripts/create-host { })
homelab-deploy.packages.${pkgs.system}.default
];

View File

@@ -13,6 +13,8 @@
../../common/vm
];
homelab.host.role = "home-automation";
nixpkgs.config.allowUnfree = true;
# Use the systemd-boot EFI boot loader.
boot.loader.grub = {

View File

@@ -11,6 +11,7 @@
../../common/vm
];
homelab.host.role = "proxy";
homelab.dns.cnames = [
"nzbget"
"radarr"

View File

@@ -11,6 +11,8 @@
../../common/vm
];
homelab.host.role = "media";
nixpkgs.config.allowUnfree = true;
# Use the systemd-boot EFI boot loader.
boot.loader.grub = {

View File

@@ -14,9 +14,8 @@
../../services/kanidm
];
-# Host metadata
homelab.host = {
-tier = "test";
+tier = "prod";
role = "auth";
};

View File

@@ -11,6 +11,8 @@
../../common/vm
];
homelab.host.role = "monitoring";
nixpkgs.config.allowUnfree = true;
# Use the systemd-boot EFI boot loader.
boot.loader.grub = {

View File

@@ -0,0 +1,75 @@
{
config,
lib,
pkgs,
...
}:
{
imports = [
../template2/hardware-configuration.nix
../../system
../../common/vm
];
homelab.host = {
tier = "prod";
role = "monitoring";
};
# DNS CNAME for Grafana test instance
homelab.dns.cnames = [ "grafana-test" ];
# Enable Vault integration
vault.enable = true;
# Enable remote deployment via NATS
homelab.deploy.enable = true;
nixpkgs.config.allowUnfree = true;
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/vda";
networking.hostName = "monitoring02";
networking.domain = "home.2rjus.net";
networking.useNetworkd = true;
networking.useDHCP = false;
services.resolved.enable = true;
networking.nameservers = [
"10.69.13.5"
"10.69.13.6"
];
systemd.network.enable = true;
systemd.network.networks."ens18" = {
matchConfig.Name = "ens18";
address = [
"10.69.13.24/24"
];
routes = [
{ Gateway = "10.69.13.1"; }
];
linkConfig.RequiredForOnline = "routable";
};
time.timeZone = "Europe/Oslo";
nix.settings.experimental-features = [
"nix-command"
"flakes"
];
nix.settings.tarball-ttl = 0;
environment.systemPackages = with pkgs; [
vim
wget
git
];
# Open ports in the firewall.
# networking.firewall.allowedTCPPorts = [ ... ];
# networking.firewall.allowedUDPPorts = [ ... ];
# Or disable the firewall altogether.
networking.firewall.enable = false;
system.stateVersion = "25.11"; # Did you read the comment?
}

View File

@@ -0,0 +1,6 @@
{ ... }: {
imports = [
./configuration.nix
../../services/grafana
];
}

View File

@@ -11,6 +11,8 @@
../../common/vm
];
homelab.host.role = "messaging";
nixpkgs.config.allowUnfree = true;
# Use the systemd-boot EFI boot loader.
boot.loader.grub = {

View File

@@ -14,9 +14,9 @@
../../common/ssh-audit.nix
];
-# Host metadata (adjust as needed)
homelab.host = {
-tier = "test"; # Start in test tier, move to prod after validation
+tier = "test";
+role = "test";
};
# Enable Vault integration

View File

@@ -14,9 +14,9 @@
../../common/ssh-audit.nix
];
-# Host metadata (adjust as needed)
homelab.host = {
-tier = "test"; # Start in test tier, move to prod after validation
+tier = "test";
+role = "test";
};
# Enable Vault integration

View File

@@ -14,9 +14,9 @@
../../common/ssh-audit.nix
];
-# Host metadata (adjust as needed)
homelab.host = {
-tier = "test"; # Start in test tier, move to prod after validation
+tier = "test";
+role = "test";
};
# Enable Vault integration

View File

@@ -58,10 +58,9 @@ let
 };
 # Build effective labels for a host
-# Always includes hostname; only includes tier/priority/role if non-default
+# Always includes hostname and tier; only includes priority/role if non-default
 buildEffectiveLabels = host:
-{ hostname = host.hostname; }
-// (lib.optionalAttrs (host.tier != "prod") { tier = host.tier; })
+{ hostname = host.hostname; tier = host.tier; }
 // (lib.optionalAttrs (host.priority != "high") { priority = host.priority; })
 // (lib.optionalAttrs (host.role != null) { role = host.role; })
 // host.labels;
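
To make the effect of this hunk concrete, here is a minimal Python model of the revised `buildEffectiveLabels`. It is only a sketch: the real function is the Nix expression above, the dict keys mirror the `host` attributes it reads, and the sample host values are assumptions chosen for illustration.

```python
def build_effective_labels(host: dict) -> dict:
    """Python model of the revised Nix buildEffectiveLabels (illustrative only)."""
    # hostname and tier are now always present, even for prod hosts.
    labels = {"hostname": host["hostname"], "tier": host["tier"]}
    # priority is still omitted when it has the default value "high".
    if host["priority"] != "high":
        labels["priority"] = host["priority"]
    # role is still omitted when unset (null in Nix, None here).
    if host["role"] is not None:
        labels["role"] = host["role"]
    # Free-form labels are merged last, so they win on key conflicts,
    # matching the right-biased Nix `//` operator.
    labels.update(host["labels"])
    return labels


# Hypothetical host record; the values are assumed, not taken from the repo.
example = {
    "hostname": "nats1",
    "tier": "prod",
    "priority": "high",
    "role": "messaging",
    "labels": {},
}
print(build_effective_labels(example))
```

Under the old expression the same host would have carried no `tier` label at all, which is why a dashboard filter such as `tier=~"$tier"` could not select prod hosts by name.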

@@ -0,0 +1,85 @@
{
"uid": "logs-homelab",
"title": "Logs - Homelab",
"tags": ["loki", "logs", "homelab"],
"timezone": "browser",
"schemaVersion": 39,
"version": 1,
"refresh": "30s",
"templating": {
"list": [
{
"name": "host",
"type": "query",
"datasource": {"type": "loki", "uid": "loki"},
"query": "label_values(host)",
"refresh": 2,
"includeAll": true,
"multi": false,
"current": {"text": "All", "value": "$__all"}
},
{
"name": "job",
"type": "query",
"datasource": {"type": "loki", "uid": "loki"},
"query": "label_values(job)",
"refresh": 2,
"includeAll": true,
"multi": false,
"current": {"text": "All", "value": "$__all"}
},
{
"name": "search",
"type": "textbox",
"current": {"text": "", "value": ""},
"label": "Search"
}
]
},
"panels": [
{
"id": 1,
"title": "Log Volume",
"type": "timeseries",
"gridPos": {"h": 6, "w": 24, "x": 0, "y": 0},
"datasource": {"type": "loki", "uid": "loki"},
"targets": [
{
"expr": "sum by (host) (count_over_time({host=~\"$host\", job=~\"$job\"} |~ \"$search\" [1m]))",
"legendFormat": "{{host}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short"
}
},
"options": {
"legend": {"displayMode": "list", "placement": "bottom"}
}
},
{
"id": 2,
"title": "Logs",
"type": "logs",
"gridPos": {"h": 18, "w": 24, "x": 0, "y": 6},
"datasource": {"type": "loki", "uid": "loki"},
"targets": [
{
"expr": "{host=~\"$host\", job=~\"$job\"} |~ \"$search\"",
"refId": "A"
}
],
"options": {
"showTime": true,
"showLabels": true,
"showCommonLabels": false,
"wrapLogMessage": true,
"prettifyLogMessage": false,
"enableLogDetails": true,
"sortOrder": "Descending"
}
}
]
}
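
Both panels in this dashboard share the same selector and line filter, with `$host`, `$job`, and `$search` filled in from the template variables. The Python sketch below models that substitution in a deliberately naive way (Grafana's own interpolation handles multi-value and "All" selections differently), and the `job` value is an assumed example rather than a label known to exist in this Loki instance.

```python
import re

# Expression copied from the "Logs" panel in the dashboard above.
LOGS_EXPR = '{host=~"$host", job=~"$job"} |~ "$search"'


def render(expr: str, variables: dict) -> str:
    """Naive $name substitution; a simplification of Grafana's interpolation."""
    for name, value in variables.items():
        expr = expr.replace("$" + name, value)
    return expr


# Hypothetical selections: one host, one job, empty Search textbox.
query = render(LOGS_EXPR, {"host": "ha1", "job": "systemd-journal", "search": ""})
print(query)

# An empty regular expression matches any line, so leaving the Search
# textbox blank should not filter anything out.
matches_everything = re.search("", "any log line") is not None
print(matches_everything)
```

This is why the `search` textbox can default to an empty string without needing a separate unfiltered query.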

@@ -0,0 +1,633 @@
{
"uid": "nixos-fleet-homelab",
"title": "NixOS Fleet - Homelab",
"tags": ["nixos", "fleet", "homelab"],
"timezone": "browser",
"schemaVersion": 39,
"version": 1,
"refresh": "1m",
"time": {
"from": "now-7d",
"to": "now"
},
"templating": {
"list": [
{
"name": "tier",
"type": "query",
"datasource": {"type": "prometheus", "uid": "prometheus"},
"query": "label_values(nixos_flake_info, tier)",
"refresh": 2,
"includeAll": true,
"multi": false,
"current": {"text": "All", "value": "$__all"}
}
]
},
"panels": [
{
"id": 1,
"title": "Hosts Behind Remote",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(nixos_flake_revision_behind{tier=~\"$tier\"} == 1)",
"legendFormat": "Behind",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 1},
{"color": "red", "value": 5}
]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none",
"textMode": "auto"
},
"description": "Number of hosts where current revision differs from remote master"
},
{
"id": 2,
"title": "Hosts Needing Reboot",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 4, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(nixos_config_mismatch{tier=~\"$tier\"} == 1)",
"legendFormat": "Need Reboot",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 1},
{"color": "orange", "value": 3},
{"color": "red", "value": 5}
]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
},
"description": "Hosts where booted generation differs from current (switched but not rebooted)"
},
{
"id": 3,
"title": "Total Hosts",
"type": "stat",
"gridPos": {"h": 4, "w": 3, "x": 8, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(nixos_flake_info{tier=~\"$tier\"})",
"legendFormat": "Hosts",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "blue", "value": null}]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 4,
"title": "Nixpkgs Age",
"type": "stat",
"gridPos": {"h": 4, "w": 3, "x": 11, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "max(nixos_flake_input_age_seconds{input=\"nixpkgs\", tier=~\"$tier\"})",
"legendFormat": "Nixpkgs",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {
"unit": "s",
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 604800},
{"color": "orange", "value": 1209600},
{"color": "red", "value": 2592000}
]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
},
"description": "Age of nixpkgs flake input (yellow >7d, orange >14d, red >30d)"
},
{
"id": 5,
"title": "Hosts Up-to-date",
"type": "stat",
"gridPos": {"h": 4, "w": 3, "x": 14, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(nixos_flake_revision_behind{tier=~\"$tier\"} == 0)",
"legendFormat": "Up-to-date",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "green", "value": null}]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 13,
"title": "Deployments (24h)",
"type": "stat",
"gridPos": {"h": 4, "w": 3, "x": 17, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "sum(increase(homelab_deploy_deployments_total{status=\"completed\"}[24h]))",
"legendFormat": "Deployments",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "blue", "value": null}]
},
"noValue": "0",
"decimals": 0
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
},
"description": "Total successful deployments in the last 24 hours"
},
{
"id": 14,
"title": "Avg Deploy Time",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 20, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "sum(increase(homelab_deploy_deployment_duration_seconds_sum{success=\"true\"}[24h])) / sum(increase(homelab_deploy_deployment_duration_seconds_count{success=\"true\"}[24h]))",
"legendFormat": "Avg Time",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "s",
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 30},
{"color": "red", "value": 60}
]
},
"noValue": "-"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
},
"description": "Average deployment duration over the last 24 hours (yellow >30s, red >60s)"
},
{
"id": 6,
"title": "Fleet Status",
"type": "table",
"gridPos": {"h": 10, "w": 24, "x": 0, "y": 4},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "nixos_flake_info{tier=~\"$tier\"}",
"format": "table",
"instant": true,
"refId": "info"
},
{
"expr": "nixos_flake_revision_behind{tier=~\"$tier\"}",
"format": "table",
"instant": true,
"refId": "behind"
},
{
"expr": "nixos_config_mismatch{tier=~\"$tier\"}",
"format": "table",
"instant": true,
"refId": "mismatch"
},
{
"expr": "nixos_generation_age_seconds{tier=~\"$tier\"}",
"format": "table",
"instant": true,
"refId": "age"
},
{
"expr": "nixos_generation_count{tier=~\"$tier\"}",
"format": "table",
"instant": true,
"refId": "count"
}
],
"fieldConfig": {
"defaults": {},
"overrides": [
{
"matcher": {"id": "byName", "options": "Hostname"},
"properties": [{"id": "custom.width", "value": 120}]
},
{
"matcher": {"id": "byName", "options": "Current Rev"},
"properties": [{"id": "custom.width", "value": 90}]
},
{
"matcher": {"id": "byName", "options": "Remote Rev"},
"properties": [{"id": "custom.width", "value": 90}]
},
{
"matcher": {"id": "byName", "options": "Behind"},
"properties": [
{"id": "custom.width", "value": 70},
{"id": "mappings", "value": [
{"type": "value", "options": {"0": {"text": "No", "color": "green"}}},
{"type": "value", "options": {"1": {"text": "Yes", "color": "red"}}}
]},
{"id": "custom.cellOptions", "value": {"type": "color-text"}}
]
},
{
"matcher": {"id": "byName", "options": "Need Reboot"},
"properties": [
{"id": "custom.width", "value": 100},
{"id": "mappings", "value": [
{"type": "value", "options": {"0": {"text": "No", "color": "green"}}},
{"type": "value", "options": {"1": {"text": "Yes", "color": "orange"}}}
]},
{"id": "custom.cellOptions", "value": {"type": "color-text"}}
]
},
{
"matcher": {"id": "byName", "options": "Config Age"},
"properties": [
{"id": "unit", "value": "s"},
{"id": "custom.width", "value": 100}
]
},
{
"matcher": {"id": "byName", "options": "Generations"},
"properties": [{"id": "custom.width", "value": 100}]
},
{
"matcher": {"id": "byName", "options": "Tier"},
"properties": [{"id": "custom.width", "value": 60}]
},
{
"matcher": {"id": "byName", "options": "Role"},
"properties": [{"id": "custom.width", "value": 80}]
}
]
},
"options": {
"showHeader": true,
"sortBy": [{"displayName": "Hostname", "desc": false}]
},
"transformations": [
{
"id": "joinByField",
"options": {"byField": "hostname", "mode": "outer"}
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"Time 1": true,
"Time 2": true,
"Time 3": true,
"Time 4": true,
"Time 5": true,
"Value #info": true,
"__name__": true,
"__name__ 1": true,
"__name__ 2": true,
"__name__ 3": true,
"__name__ 4": true,
"__name__ 5": true,
"dns_role": true,
"dns_role 1": true,
"dns_role 2": true,
"dns_role 3": true,
"dns_role 4": true,
"instance": true,
"instance 1": true,
"instance 2": true,
"instance 3": true,
"instance 4": true,
"job": true,
"job 1": true,
"job 2": true,
"job 3": true,
"job 4": true,
"nixos_version": true,
"nixpkgs_rev": true,
"role 1": true,
"role 2": true,
"role 3": true,
"role 4": true,
"tier 1": true,
"tier 2": true,
"tier 3": true,
"tier 4": true
},
"indexByName": {
"hostname": 0,
"tier": 1,
"role": 2,
"current_rev": 3,
"remote_rev": 4,
"Value #behind": 5,
"Value #mismatch": 6,
"Value #age": 7,
"Value #count": 8
},
"renameByName": {
"hostname": "Hostname",
"tier": "Tier",
"role": "Role",
"current_rev": "Current Rev",
"remote_rev": "Remote Rev",
"Value #behind": "Behind",
"Value #mismatch": "Need Reboot",
"Value #age": "Config Age",
"Value #count": "Generations"
}
}
}
]
},
{
"id": 7,
"title": "Generation Age by Host",
"type": "bargauge",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 14},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "sort_desc(nixos_generation_age_seconds{tier=~\"$tier\"})",
"legendFormat": "{{hostname}}",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {
"unit": "s",
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 86400},
{"color": "orange", "value": 259200},
{"color": "red", "value": 604800}
]
},
"min": 0
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"orientation": "horizontal",
"displayMode": "gradient",
"showUnfilled": true
},
"description": "How long ago each host's current config was deployed (yellow >1d, orange >3d, red >7d)"
},
{
"id": 8,
"title": "Generations per Host",
"type": "bargauge",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 14},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "sort_desc(nixos_generation_count{tier=~\"$tier\"})",
"legendFormat": "{{hostname}}",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "blue", "value": null},
{"color": "purple", "value": 50}
]
},
"min": 0
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"orientation": "horizontal",
"displayMode": "gradient",
"showUnfilled": true
},
"description": "Total number of NixOS generations on each host"
},
{
"id": 9,
"title": "Deployment Activity (Generation Age Over Time)",
"type": "timeseries",
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 22},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "nixos_generation_age_seconds{tier=~\"$tier\"}",
"legendFormat": "{{hostname}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "s",
"custom": {
"lineWidth": 1,
"fillOpacity": 0,
"showPoints": "never",
"stacking": {"mode": "none"}
}
}
},
"options": {
"legend": {
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {"mode": "multi", "sort": "desc"}
},
"description": "Generation age increases over time, drops to near-zero when deployed. Useful to see deployment patterns."
},
{
"id": 10,
"title": "Flake Input Ages",
"type": "table",
"gridPos": {"h": 6, "w": 12, "x": 0, "y": 30},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "max by (input) (nixos_flake_input_age_seconds)",
"format": "table",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "s"
},
"overrides": [
{
"matcher": {"id": "byName", "options": "input"},
"properties": [{"id": "custom.width", "value": 150}]
}
]
},
"options": {
"showHeader": true,
"sortBy": [{"displayName": "Value", "desc": true}]
},
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": {"Time": true},
"renameByName": {
"input": "Flake Input",
"Value": "Age"
}
}
}
],
"description": "Age of each flake input across the fleet"
},
{
"id": 11,
"title": "Hosts by Revision",
"type": "piechart",
"gridPos": {"h": 6, "w": 6, "x": 12, "y": 30},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count by (current_rev) (nixos_flake_info{tier=~\"$tier\"})",
"legendFormat": "{{current_rev}}",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"legend": {"displayMode": "table", "placement": "right", "values": ["value"]},
"pieType": "pie"
},
"description": "Distribution of hosts by their current flake revision"
},
{
"id": 12,
"title": "Hosts by Tier",
"type": "piechart",
"gridPos": {"h": 6, "w": 6, "x": 18, "y": 30},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count by (tier) (nixos_flake_info)",
"legendFormat": "{{tier}}",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"legend": {"displayMode": "table", "placement": "right", "values": ["value"]},
"pieType": "pie"
},
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "^$",
"renamePattern": "prod"
}
}
],
"description": "Distribution of hosts by tier (test vs prod)"
}
]
}
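
Several stat panels in this dashboard rely on absolute threshold steps, where a value takes the colour of the highest step it has reached. As a rough illustration, the Python sketch below resolves a colour from the "Nixpkgs Age" steps; the function name is hypothetical and it assumes the steps are listed in ascending order, as they are in the JSON above.

```python
DAY = 86400  # seconds

# Threshold steps copied from the "Nixpkgs Age" panel.
NIXPKGS_AGE_STEPS = [
    {"color": "green", "value": None},
    {"color": "yellow", "value": 604800},
    {"color": "orange", "value": 1209600},
    {"color": "red", "value": 2592000},
]


def threshold_color(value: float, steps: list) -> str:
    """Return the colour of the last step whose lower bound the value has reached."""
    color = None
    for step in steps:
        lower_bound = step["value"]
        # A null bound is the base step and always applies.
        if lower_bound is None or value >= lower_bound:
            color = step["color"]
    return color


for days in (3, 10, 20, 45):
    print(days, threshold_color(days * DAY, NIXPKGS_AGE_STEPS))
```

The step values line up with the panel description: 604800 s is 7 days, 1209600 s is 14 days, and 2592000 s is 30 days.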

@@ -0,0 +1,296 @@
{
"uid": "nixos-operations",
"title": "NixOS Operations",
"tags": ["loki", "nixos", "operations", "homelab"],
"timezone": "browser",
"schemaVersion": 39,
"version": 1,
"refresh": "1m",
"time": {
"from": "now-24h",
"to": "now"
},
"templating": {
"list": [
{
"name": "host",
"type": "query",
"datasource": {"type": "loki", "uid": "loki"},
"query": "label_values(host)",
"refresh": 2,
"includeAll": true,
"multi": true,
"current": {"text": "All", "value": "$__all"}
}
]
},
"panels": [
{
"id": 1,
"title": "Upgrade Log Volume",
"type": "stat",
"gridPos": {"h": 4, "w": 6, "x": 0, "y": 0},
"datasource": {"type": "loki", "uid": "loki"},
"targets": [
{
"expr": "sum(count_over_time({systemd_unit=\"nixos-upgrade.service\", host=~\"$host\"} [$__range]))",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "blue", "value": null}]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
},
"description": "Total log entries from nixos-upgrade.service in selected time range"
},
{
"id": 2,
"title": "Successful Upgrades",
"type": "stat",
"gridPos": {"h": 4, "w": 6, "x": 6, "y": 0},
"datasource": {"type": "loki", "uid": "loki"},
"targets": [
{
"expr": "sum(count_over_time({systemd_unit=\"nixos-upgrade.service\", host=~\"$host\"} |= \"Done. The new configuration is\" [$__range]))",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "green", "value": null}]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
},
"description": "Upgrades that completed successfully"
},
{
"id": 3,
"title": "Upgrade Errors",
"type": "stat",
"gridPos": {"h": 4, "w": 6, "x": 12, "y": 0},
"datasource": {"type": "loki", "uid": "loki"},
"targets": [
{
"expr": "sum(count_over_time({systemd_unit=\"nixos-upgrade.service\", host=~\"$host\"} |~ \"(?i)error|failed\" [$__range]))",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "red", "value": 1}
]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
},
"description": "Upgrade log entries containing errors"
},
{
"id": 4,
"title": "Bootstrap Events",
"type": "stat",
"gridPos": {"h": 4, "w": 6, "x": 18, "y": 0},
"datasource": {"type": "loki", "uid": "loki"},
"targets": [
{
"expr": "sum(count_over_time({job=\"bootstrap\", host=~\"$host\"} [$__range]))",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "purple", "value": null}]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
},
"description": "Bootstrap log entries from new VM deployments"
},
{
"id": 5,
"title": "Upgrade Activity by Host",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 4},
"datasource": {"type": "loki", "uid": "loki"},
"targets": [
{
"expr": "sum by (host) (count_over_time({systemd_unit=\"nixos-upgrade.service\", host=~\"$host\"} [5m]))",
"legendFormat": "{{host}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short",
"custom": {
"lineWidth": 1,
"fillOpacity": 30,
"showPoints": "never",
"stacking": {"mode": "normal"}
}
}
},
"options": {
"legend": {"displayMode": "list", "placement": "bottom"},
"tooltip": {"mode": "multi", "sort": "desc"}
},
"description": "When upgrades ran on each host"
},
{
"id": 6,
"title": "ACME Certificate Activity",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 4},
"datasource": {"type": "loki", "uid": "loki"},
"targets": [
{
"expr": "sum by (host) (count_over_time({systemd_unit=~\"acme.*\", host=~\"$host\"} [5m]))",
"legendFormat": "{{host}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short",
"custom": {
"lineWidth": 1,
"fillOpacity": 30,
"showPoints": "never",
"stacking": {"mode": "normal"}
}
}
},
"options": {
"legend": {"displayMode": "list", "placement": "bottom"},
"tooltip": {"mode": "multi", "sort": "desc"}
},
"description": "ACME certificate renewal activity"
},
{
"id": 7,
"title": "Recent Upgrade Completions",
"type": "logs",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 12},
"datasource": {"type": "loki", "uid": "loki"},
"targets": [
{
"expr": "{systemd_unit=\"nixos-upgrade.service\", host=~\"$host\"} |= \"Done. The new configuration is\" | json | line_format \"{{.MESSAGE}}\" | keep host",
"refId": "A"
}
],
"options": {
"showTime": true,
"showLabels": true,
"showCommonLabels": false,
"wrapLogMessage": true,
"prettifyLogMessage": false,
"enableLogDetails": true,
"sortOrder": "Descending"
},
"description": "Successful upgrade completion messages showing the new system path"
},
{
"id": 8,
"title": "Build Activity",
"type": "logs",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 12},
"datasource": {"type": "loki", "uid": "loki"},
"targets": [
{
"expr": "{systemd_unit=\"nixos-upgrade.service\", host=~\"$host\"} |= \"building\" | json | line_format \"{{.MESSAGE}}\" | keep host",
"refId": "A"
}
],
"options": {
"showTime": true,
"showLabels": true,
"showCommonLabels": false,
"wrapLogMessage": true,
"prettifyLogMessage": false,
"enableLogDetails": true,
"sortOrder": "Descending"
},
"description": "Derivations being built during upgrades"
},
{
"id": 9,
"title": "Bootstrap Logs",
"type": "logs",
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 20},
"datasource": {"type": "loki", "uid": "loki"},
"targets": [
{
"expr": "{job=\"bootstrap\", host=~\"$host\"}",
"refId": "A"
}
],
"options": {
"showTime": true,
"showLabels": true,
"showCommonLabels": false,
"wrapLogMessage": true,
"prettifyLogMessage": false,
"enableLogDetails": true,
"sortOrder": "Descending"
},
"description": "Logs from VM bootstrap process (new deployments)"
},
{
"id": 10,
"title": "Upgrade Errors & Failures",
"type": "logs",
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 28},
"datasource": {"type": "loki", "uid": "loki"},
"targets": [
{
"expr": "{systemd_unit=\"nixos-upgrade.service\", host=~\"$host\"} |~ \"(?i)error|failed\" | json | line_format \"{{.MESSAGE}}\" | keep host",
"refId": "A"
}
],
"options": {
"showTime": true,
"showLabels": true,
"showCommonLabels": false,
"wrapLogMessage": true,
"prettifyLogMessage": false,
"enableLogDetails": true,
"sortOrder": "Descending"
},
"description": "Errors and failures during NixOS upgrades"
}
]
}
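
The "Upgrade Errors" stat and the "Upgrade Errors & Failures" log panel both filter with the pattern `(?i)error|failed`, which is a case-insensitive substring match. The sketch below uses Python's `re` module as a stand-in for Loki's regex engine to show what that pattern does and does not catch; the sample log lines are invented for illustration.

```python
import re

# Same pattern text as in the dashboard; Python's re is only a stand-in here.
ERROR_PATTERN = re.compile(r"(?i)error|failed")


def is_error_line(line: str) -> bool:
    """True if the line would pass the dashboard's error filter."""
    return ERROR_PATTERN.search(line) is not None


# Hypothetical log lines, not taken from a real nixos-upgrade run.
samples = [
    "error: builder for example.drv failed",
    "Unit entered FAILED state",
    "0 errors, 0 warnings",
    "Done. The new configuration is /nix/store/example-system",
]

for line in samples:
    print(is_error_line(line), line)
```

Because the match is on substrings, a harmless line such as "0 errors, 0 warnings" is counted too, so the stat is best read as an upper bound on real failures.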

@@ -0,0 +1,208 @@
{
"uid": "node-exporter-homelab",
"title": "Node Exporter - Homelab",
"tags": ["node-exporter", "prometheus", "homelab"],
"timezone": "browser",
"schemaVersion": 39,
"version": 1,
"refresh": "30s",
"templating": {
"list": [
{
"name": "instance",
"type": "query",
"datasource": {"type": "prometheus", "uid": "prometheus"},
"query": "label_values(node_uname_info, instance)",
"refresh": 2,
"includeAll": false,
"multi": false,
"current": {}
}
]
},
"panels": [
{
"id": 1,
"title": "CPU Usage",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\", instance=~\"$instance\"}[5m])) * 100)",
"legendFormat": "CPU %",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 90}
]
}
}
}
},
{
"id": 2,
"title": "Memory Usage",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "(1 - (node_memory_MemAvailable_bytes{instance=~\"$instance\"} / node_memory_MemTotal_bytes{instance=~\"$instance\"})) * 100",
"legendFormat": "Memory %",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 90}
]
}
}
}
},
{
"id": 3,
"title": "Disk Usage",
"type": "gauge",
"gridPos": {"h": 8, "w": 8, "x": 0, "y": 8},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "100 - ((node_filesystem_avail_bytes{instance=~\"$instance\",mountpoint=\"/\",fstype!=\"rootfs\"} / node_filesystem_size_bytes{instance=~\"$instance\",mountpoint=\"/\",fstype!=\"rootfs\"}) * 100)",
"legendFormat": "Root /",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 85}
]
}
}
}
},
{
"id": 4,
"title": "System Load",
"type": "timeseries",
"gridPos": {"h": 8, "w": 8, "x": 8, "y": 8},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "node_load1{instance=~\"$instance\"}",
"legendFormat": "1m",
"refId": "A"
},
{
"expr": "node_load5{instance=~\"$instance\"}",
"legendFormat": "5m",
"refId": "B"
},
{
"expr": "node_load15{instance=~\"$instance\"}",
"legendFormat": "15m",
"refId": "C"
}
],
"fieldConfig": {
"defaults": {
"unit": "short"
}
}
},
{
"id": 5,
"title": "Uptime",
"type": "stat",
"gridPos": {"h": 8, "w": 8, "x": 16, "y": 8},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "time() - node_boot_time_seconds{instance=~\"$instance\"}",
"legendFormat": "Uptime",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "s"
}
}
},
{
"id": 6,
"title": "Network Traffic",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 16},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "rate(node_network_receive_bytes_total{instance=~\"$instance\",device!~\"lo|veth.*|br.*|docker.*\"}[5m])",
"legendFormat": "Receive {{device}}",
"refId": "A"
},
{
"expr": "-rate(node_network_transmit_bytes_total{instance=~\"$instance\",device!~\"lo|veth.*|br.*|docker.*\"}[5m])",
"legendFormat": "Transmit {{device}}",
"refId": "B"
}
],
"fieldConfig": {
"defaults": {
"unit": "Bps"
}
}
},
{
"id": 7,
"title": "Disk I/O",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 16},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "rate(node_disk_read_bytes_total{instance=~\"$instance\",device!~\"dm-.*\"}[5m])",
"legendFormat": "Read {{device}}",
"refId": "A"
},
{
"expr": "-rate(node_disk_written_bytes_total{instance=~\"$instance\",device!~\"dm-.*\"}[5m])",
"legendFormat": "Write {{device}}",
"refId": "B"
}
],
"fieldConfig": {
"defaults": {
"unit": "Bps"
}
}
}
]
}
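
The "CPU Usage" panel derives busy time from the idle counter: it takes the per-CPU rate of `node_cpu_seconds_total{mode="idle"}`, averages across CPUs, and subtracts the result from 100. The Python sketch below reproduces that arithmetic on two counter samples per CPU. It is a simplification, since Prometheus `rate()` also handles counter resets and extrapolation, and the sample numbers are made up.

```python
def cpu_usage_percent(idle_start, idle_end, window_seconds):
    """Approximate 100 - avg(rate(idle)) * 100 from two samples per CPU."""
    # Per-CPU idle rate: idle seconds accumulated per second of wall time.
    rates = [
        (end - start) / window_seconds
        for start, end in zip(idle_start, idle_end)
    ]
    avg_idle = sum(rates) / len(rates)
    return 100 - avg_idle * 100


# Hypothetical 2-CPU host sampled 300 s apart (matching the [5m] window):
# CPU 0 was idle for 225 s, CPU 1 for 150 s.
usage = cpu_usage_percent([1000.0, 2000.0], [1225.0, 2150.0], 300)
print(usage)
```

Averaging over CPUs keeps the result within 0 to 100 regardless of core count, which matches the panel's fixed `min` and `max`.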

@@ -0,0 +1,606 @@
{
"uid": "proxmox-homelab",
"title": "Proxmox - Homelab",
"tags": ["proxmox", "virtualization", "homelab"],
"timezone": "browser",
"schemaVersion": 39,
"version": 1,
"refresh": "30s",
"time": {
"from": "now-6h",
"to": "now"
},
"templating": {
"list": [
{
"name": "vm",
"type": "query",
"datasource": {"type": "prometheus", "uid": "prometheus"},
"query": "label_values(pve_guest_info{template=\"0\"}, name)",
"refresh": 2,
"includeAll": true,
"multi": true,
"current": {"text": "All", "value": "$__all"}
}
]
},
"panels": [
{
"id": 1,
"title": "VMs Running",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(pve_up{id=~\"qemu/.*\"} * on(id) pve_guest_info{template=\"0\"} == 1)",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "green", "value": null}]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 2,
"title": "VMs Stopped",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 4, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(pve_up{id=~\"qemu/.*\"} * on(id) pve_guest_info{template=\"0\"} == 0)",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 1},
{"color": "red", "value": 3}
]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 3,
"title": "Node CPU",
"type": "gauge",
"gridPos": {"h": 4, "w": 4, "x": 8, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "pve_cpu_usage_ratio{id=~\"node/.*\"} * 100",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 90}
]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"showThresholdLabels": false,
"showThresholdMarkers": true
}
},
{
"id": 4,
"title": "Node Memory",
"type": "gauge",
"gridPos": {"h": 4, "w": 4, "x": 12, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "pve_memory_usage_bytes{id=~\"node/.*\"} / pve_memory_size_bytes{id=~\"node/.*\"} * 100",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 90}
]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"showThresholdLabels": false,
"showThresholdMarkers": true
}
},
{
"id": 5,
"title": "Node Uptime",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 16, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "pve_uptime_seconds{id=~\"node/.*\"}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "s",
"thresholds": {
"mode": "absolute",
"steps": [{"color": "blue", "value": null}]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 6,
"title": "Templates",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 20, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(pve_guest_info{template=\"1\"})",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "purple", "value": null}]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 7,
"title": "VM Status",
"type": "table",
"gridPos": {"h": 10, "w": 24, "x": 0, "y": 4},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "pve_guest_info{template=\"0\", name=~\"$vm\"}",
"format": "table",
"instant": true,
"refId": "info"
},
{
"expr": "pve_up{id=~\"qemu/.*\"} * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
"format": "table",
"instant": true,
"refId": "status"
},
{
"expr": "pve_cpu_usage_ratio{id=~\"qemu/.*\"} * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"} * 100",
"format": "table",
"instant": true,
"refId": "cpu"
},
{
"expr": "pve_memory_usage_bytes{id=~\"qemu/.*\"} * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"} / on(id) pve_memory_size_bytes * 100",
"format": "table",
"instant": true,
"refId": "mem"
},
{
"expr": "pve_uptime_seconds{id=~\"qemu/.*\"} * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
"format": "table",
"instant": true,
"refId": "uptime"
}
],
"fieldConfig": {
"defaults": {},
"overrides": [
{
"matcher": {"id": "byName", "options": "Name"},
"properties": [{"id": "custom.width", "value": 150}]
},
{
"matcher": {"id": "byName", "options": "Status"},
"properties": [
{"id": "custom.width", "value": 80},
{"id": "mappings", "value": [
{"type": "value", "options": {"0": {"text": "Stopped", "color": "red"}}},
{"type": "value", "options": {"1": {"text": "Running", "color": "green"}}}
]},
{"id": "custom.cellOptions", "value": {"type": "color-text"}}
]
},
{
"matcher": {"id": "byName", "options": "CPU %"},
"properties": [
{"id": "unit", "value": "percent"},
{"id": "decimals", "value": 1},
{"id": "custom.width", "value": 80},
{"id": "custom.cellOptions", "value": {"type": "gauge", "mode": "basic"}},
{"id": "min", "value": 0},
{"id": "max", "value": 100},
{"id": "thresholds", "value": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "yellow", "value": 50}, {"color": "red", "value": 80}]}}
]
},
{
"matcher": {"id": "byName", "options": "Memory %"},
"properties": [
{"id": "unit", "value": "percent"},
{"id": "decimals", "value": 1},
{"id": "custom.width", "value": 100},
{"id": "custom.cellOptions", "value": {"type": "gauge", "mode": "basic"}},
{"id": "min", "value": 0},
{"id": "max", "value": 100},
{"id": "thresholds", "value": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "yellow", "value": 70}, {"color": "red", "value": 90}]}}
]
},
{
"matcher": {"id": "byName", "options": "Uptime"},
"properties": [
{"id": "unit", "value": "s"},
{"id": "custom.width", "value": 100}
]
},
{
"matcher": {"id": "byName", "options": "ID"},
"properties": [{"id": "custom.width", "value": 90}]
}
]
},
"options": {
"showHeader": true,
"sortBy": [{"displayName": "Name", "desc": false}]
},
"transformations": [
{
"id": "joinByField",
"options": {"byField": "name", "mode": "outer"}
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"Time 1": true,
"Time 2": true,
"Time 3": true,
"Time 4": true,
"Value #info": true,
"__name__": true,
"id 1": true,
"id 2": true,
"id 3": true,
"id 4": true,
"instance": true,
"instance 1": true,
"instance 2": true,
"instance 3": true,
"instance 4": true,
"job": true,
"job 1": true,
"job 2": true,
"job 3": true,
"job 4": true,
"name 1": true,
"name 2": true,
"name 3": true,
"name 4": true,
"node": true,
"tags": true,
"template": true,
"type": true
},
"indexByName": {
"name": 0,
"id": 1,
"Value #status": 2,
"Value #cpu": 3,
"Value #mem": 4,
"Value #uptime": 5
},
"renameByName": {
"name": "Name",
"id": "ID",
"Value #status": "Status",
"Value #cpu": "CPU %",
"Value #mem": "Memory %",
"Value #uptime": "Uptime"
}
}
}
]
},
{
"id": 8,
"title": "VM CPU Usage",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 14},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "pve_cpu_usage_ratio{id=~\"qemu/.*\"} * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"} * 100",
"legendFormat": "{{name}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"custom": {
"lineWidth": 1,
"fillOpacity": 10,
"showPoints": "never"
}
}
},
"options": {
"legend": {"displayMode": "list", "placement": "bottom"},
"tooltip": {"mode": "multi", "sort": "desc"}
}
},
{
"id": 9,
"title": "VM Memory Usage",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 14},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "pve_memory_usage_bytes{id=~\"qemu/.*\"} * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
"legendFormat": "{{name}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "bytes",
"min": 0,
"custom": {
"lineWidth": 1,
"fillOpacity": 10,
"showPoints": "never"
}
}
},
"options": {
"legend": {"displayMode": "list", "placement": "bottom"},
"tooltip": {"mode": "multi", "sort": "desc"}
}
},
{
"id": 10,
"title": "VM Network Traffic",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 22},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "rate(pve_network_receive_bytes{id=~\"qemu/.*\"}[5m]) * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
"legendFormat": "{{name}} RX",
"refId": "A"
},
{
"expr": "-rate(pve_network_transmit_bytes{id=~\"qemu/.*\"}[5m]) * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
"legendFormat": "{{name}} TX",
"refId": "B"
}
],
"fieldConfig": {
"defaults": {
"unit": "Bps",
"custom": {
"lineWidth": 1,
"fillOpacity": 10,
"showPoints": "never"
}
}
},
"options": {
"legend": {"displayMode": "list", "placement": "bottom"},
"tooltip": {"mode": "multi", "sort": "desc"}
}
},
{
"id": 11,
"title": "VM Disk I/O",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 22},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "rate(pve_disk_read_bytes{id=~\"qemu/.*\"}[5m]) * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
"legendFormat": "{{name}} Read",
"refId": "A"
},
{
"expr": "-rate(pve_disk_write_bytes{id=~\"qemu/.*\"}[5m]) * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
"legendFormat": "{{name}} Write",
"refId": "B"
}
],
"fieldConfig": {
"defaults": {
"unit": "Bps",
"custom": {
"lineWidth": 1,
"fillOpacity": 10,
"showPoints": "never"
}
}
},
"options": {
"legend": {"displayMode": "list", "placement": "bottom"},
"tooltip": {"mode": "multi", "sort": "desc"}
}
},
{
"id": 12,
"title": "Storage Usage",
"type": "bargauge",
"gridPos": {"h": 6, "w": 12, "x": 0, "y": 30},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "pve_disk_usage_bytes{id=~\"storage/.*\"} / pve_disk_size_bytes{id=~\"storage/.*\"} * 100",
"legendFormat": "{{id}}",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 85}
]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"orientation": "horizontal",
"displayMode": "gradient",
"showUnfilled": true
},
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "storage/pve1/(.*)",
"renamePattern": "$1"
}
}
]
},
{
"id": 13,
"title": "Storage Capacity",
"type": "table",
"gridPos": {"h": 6, "w": 12, "x": 12, "y": 30},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "pve_disk_size_bytes{id=~\"storage/.*\"}",
"format": "table",
"instant": true,
"refId": "size"
},
{
"expr": "pve_disk_usage_bytes{id=~\"storage/.*\"}",
"format": "table",
"instant": true,
"refId": "used"
},
{
"expr": "pve_disk_size_bytes{id=~\"storage/.*\"} - pve_disk_usage_bytes{id=~\"storage/.*\"}",
"format": "table",
"instant": true,
"refId": "free"
}
],
"fieldConfig": {
"defaults": {
"unit": "bytes"
},
"overrides": [
{
"matcher": {"id": "byName", "options": "Storage"},
"properties": [{"id": "unit", "value": "none"}]
}
]
},
"options": {
"showHeader": true
},
"transformations": [
{
"id": "joinByField",
"options": {"byField": "id", "mode": "outer"}
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"Time 1": true,
"Time 2": true,
"instance": true,
"instance 1": true,
"instance 2": true,
"job": true,
"job 1": true,
"job 2": true
},
"renameByName": {
"id": "Storage",
"Value #size": "Total",
"Value #used": "Used",
"Value #free": "Free"
}
}
},
{
"id": "renameByRegex",
"options": {
"regex": "storage/pve1/(.*)",
"renamePattern": "$1"
}
}
]
}
]
}

@@ -0,0 +1,553 @@
{
"uid": "systemd-homelab",
"title": "Systemd Services - Homelab",
"tags": ["systemd", "services", "homelab"],
"timezone": "browser",
"schemaVersion": 39,
"version": 1,
"refresh": "1m",
"time": {
"from": "now-24h",
"to": "now"
},
"templating": {
"list": [
{
"name": "hostname",
"type": "query",
"datasource": {"type": "prometheus", "uid": "prometheus"},
"query": "label_values(systemd_unit_state, hostname)",
"refresh": 2,
"includeAll": true,
"multi": true,
"current": {"text": "All", "value": "$__all"}
}
]
},
"panels": [
{
"id": 1,
"title": "Failed Units",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(systemd_unit_state{state=\"failed\", hostname=~\"$hostname\"} == 1) or vector(0)",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "red", "value": 1}
]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 2,
"title": "Active Units",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 4, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(systemd_unit_state{state=\"active\", hostname=~\"$hostname\"} == 1)",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "green", "value": null}]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 3,
"title": "Hosts Monitored",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 8, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(count by (hostname) (systemd_unit_state{hostname=~\"$hostname\"}))",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "blue", "value": null}]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 4,
"title": "Total Service Restarts",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 12, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "sum(systemd_service_restart_total{hostname=~\"$hostname\"})",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 10},
{"color": "orange", "value": 50}
]
},
"noValue": "0"
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 5,
"title": "Inactive Units",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 16, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(systemd_unit_state{state=\"inactive\", hostname=~\"$hostname\"} == 1)",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "purple", "value": null}]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 6,
"title": "Timers",
"type": "stat",
"gridPos": {"h": 4, "w": 4, "x": 20, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "count(systemd_timer_last_trigger_seconds{hostname=~\"$hostname\"})",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "blue", "value": null}]
}
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"colorMode": "value",
"graphMode": "none"
}
},
{
"id": 7,
"title": "Failed Units",
"type": "table",
"gridPos": {"h": 6, "w": 12, "x": 0, "y": 4},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "systemd_unit_state{state=\"failed\", hostname=~\"$hostname\"} == 1",
"format": "table",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {},
"overrides": [
{
"matcher": {"id": "byName", "options": "Host"},
"properties": [{"id": "custom.width", "value": 120}]
},
{
"matcher": {"id": "byName", "options": "Unit"},
"properties": [{"id": "custom.width", "value": 300}]
}
]
},
"options": {
"showHeader": true,
"sortBy": [{"displayName": "Host", "desc": false}]
},
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"Value": true,
"__name__": true,
"dns_role": true,
"instance": true,
"job": true,
"role": true,
"state": true,
"tier": true,
"type": true
},
"renameByName": {
"hostname": "Host",
"name": "Unit"
}
}
}
],
"description": "Units currently in failed state"
},
{
"id": 8,
"title": "Service Restarts (Top 15)",
"type": "table",
"gridPos": {"h": 6, "w": 12, "x": 12, "y": 4},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "topk(15, systemd_service_restart_total{hostname=~\"$hostname\"} > 0)",
"format": "table",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {},
"overrides": [
{
"matcher": {"id": "byName", "options": "Host"},
"properties": [{"id": "custom.width", "value": 120}]
},
{
"matcher": {"id": "byName", "options": "Service"},
"properties": [{"id": "custom.width", "value": 280}]
},
{
"matcher": {"id": "byName", "options": "Restarts"},
"properties": [{"id": "custom.width", "value": 80}]
}
]
},
"options": {
"showHeader": true,
"sortBy": [{"displayName": "Restarts", "desc": true}]
},
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"__name__": true,
"dns_role": true,
"instance": true,
"job": true,
"role": true,
"tier": true
},
"renameByName": {
"hostname": "Host",
"name": "Service",
"Value": "Restarts"
}
}
}
],
"description": "Services that have been restarted (since host boot)"
},
{
"id": 9,
"title": "Active Units per Host",
"type": "bargauge",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 10},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "sort_desc(count by (hostname) (systemd_unit_state{state=\"active\", hostname=~\"$hostname\"} == 1))",
"legendFormat": "{{hostname}}",
"refId": "A",
"instant": true
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [{"color": "green", "value": null}]
},
"min": 0
}
},
"options": {
"reduceOptions": {"calcs": ["lastNotNull"]},
"orientation": "horizontal",
"displayMode": "gradient",
"showUnfilled": true
}
},
{
"id": 10,
"title": "NixOS Upgrade Timers",
"type": "table",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 10},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "systemd_timer_last_trigger_seconds{name=\"nixos-upgrade.timer\", hostname=~\"$hostname\"}",
"format": "table",
"instant": true,
"refId": "last"
},
{
"expr": "time() - systemd_timer_last_trigger_seconds{name=\"nixos-upgrade.timer\", hostname=~\"$hostname\"}",
"format": "table",
"instant": true,
"refId": "ago"
}
],
"fieldConfig": {
"defaults": {},
"overrides": [
{
"matcher": {"id": "byName", "options": "Host"},
"properties": [{"id": "custom.width", "value": 130}]
},
{
"matcher": {"id": "byName", "options": "Last Trigger"},
"properties": [
{"id": "unit", "value": "dateTimeAsLocalNoDateIfToday"},
{"id": "custom.width", "value": 180}
]
},
{
"matcher": {"id": "byName", "options": "Time Ago"},
"properties": [
{"id": "unit", "value": "s"},
{"id": "custom.width", "value": 120},
{"id": "thresholds", "value": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "yellow", "value": 86400}, {"color": "red", "value": 172800}]}},
{"id": "custom.cellOptions", "value": {"type": "color-text"}}
]
}
]
},
"options": {
"showHeader": true,
"sortBy": [{"displayName": "Time Ago", "desc": true}]
},
"transformations": [
{
"id": "joinByField",
"options": {"byField": "hostname", "mode": "outer"}
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"Time 1": true,
"__name__": true,
"__name__ 1": true,
"dns_role": true,
"dns_role 1": true,
"instance": true,
"instance 1": true,
"job": true,
"job 1": true,
"name": true,
"name 1": true,
"role": true,
"role 1": true,
"tier": true,
"tier 1": true
},
"indexByName": {
"hostname": 0,
"Value #last": 1,
"Value #ago": 2
},
"renameByName": {
"hostname": "Host",
"Value #last": "Last Trigger",
"Value #ago": "Time Ago"
}
}
}
],
"description": "When nixos-upgrade.timer last ran on each host. Yellow >24h, Red >48h."
},
{
"id": 11,
"title": "Backup Timers",
"type": "table",
"gridPos": {"h": 6, "w": 12, "x": 0, "y": 18},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "systemd_timer_last_trigger_seconds{name=~\"restic.*\", hostname=~\"$hostname\"}",
"format": "table",
"instant": true,
"refId": "last"
},
{
"expr": "time() - systemd_timer_last_trigger_seconds{name=~\"restic.*\", hostname=~\"$hostname\"}",
"format": "table",
"instant": true,
"refId": "ago"
}
],
"fieldConfig": {
"defaults": {},
"overrides": [
{
"matcher": {"id": "byName", "options": "Host"},
"properties": [{"id": "custom.width", "value": 120}]
},
{
"matcher": {"id": "byName", "options": "Timer"},
"properties": [{"id": "custom.width", "value": 220}]
},
{
"matcher": {"id": "byName", "options": "Last Trigger"},
"properties": [
{"id": "unit", "value": "dateTimeAsLocalNoDateIfToday"},
{"id": "custom.width", "value": 180}
]
},
{
"matcher": {"id": "byName", "options": "Time Ago"},
"properties": [
{"id": "unit", "value": "s"},
{"id": "custom.width", "value": 100},
{"id": "thresholds", "value": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "yellow", "value": 86400}, {"color": "red", "value": 172800}]}},
{"id": "custom.cellOptions", "value": {"type": "color-text"}}
]
}
]
},
"options": {
"showHeader": true,
"sortBy": [{"displayName": "Time Ago", "desc": true}]
},
"transformations": [
{
"id": "joinByField",
"options": {"byField": "name", "mode": "outer"}
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"Time 1": true,
"__name__": true,
"__name__ 1": true,
"dns_role": true,
"dns_role 1": true,
"instance": true,
"instance 1": true,
"job": true,
"job 1": true,
"role": true,
"role 1": true,
"tier": true,
"tier 1": true,
"hostname 1": true
},
"indexByName": {
"hostname": 0,
"name": 1,
"Value #last": 2,
"Value #ago": 3
},
"renameByName": {
"hostname": "Host",
"name": "Timer",
"Value #last": "Last Trigger",
"Value #ago": "Time Ago"
}
}
}
],
"description": "Restic backup timers"
},
{
"id": 12,
"title": "Service Restarts Over Time",
"type": "timeseries",
"gridPos": {"h": 6, "w": 12, "x": 12, "y": 18},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "sum by (hostname) (increase(systemd_service_restart_total{hostname=~\"$hostname\"}[1h]))",
"legendFormat": "{{hostname}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short",
"custom": {
"lineWidth": 1,
"fillOpacity": 20,
"showPoints": "never",
"stacking": {"mode": "normal"}
}
}
},
"options": {
"legend": {"displayMode": "list", "placement": "bottom"},
"tooltip": {"mode": "multi", "sort": "desc"}
},
"description": "Service restart rate per hour"
}
]
}
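The "NixOS Upgrade Timers" and "Backup Timers" panels above colour the Time Ago column with absolute thresholds at 86400 s and 172800 s. A minimal Python sketch of that mapping follows. The function name `timer_age_colour` is hypothetical, and the inclusive lower bound is an assumption about how Grafana evaluates threshold steps.

```python
# Hypothetical helper mirroring the "Time Ago" threshold steps in the panel JSON.
# Assumption: a step applies once the value is >= its bound.
def timer_age_colour(age_seconds: float) -> str:
    steps = [(86400, "yellow"), (172800, "red")]  # bounds taken from the dashboard
    colour = "green"  # base step ("value": null)
    for bound, name in steps:
        if age_seconds >= bound:
            colour = name
    return colour


if __name__ == "__main__":
    for age in (3600, 90000, 200000):
        print(age, timer_age_colour(age))
```

A timer that last fired one hour ago stays green, while one that has not fired for more than two days turns red.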

@@ -0,0 +1,399 @@
{
"uid": "temperature-homelab",
"title": "Temperature - Homelab",
"tags": ["home-assistant", "temperature", "homelab"],
"timezone": "browser",
"schemaVersion": 39,
"version": 1,
"refresh": "1m",
"time": {
"from": "now-30d",
"to": "now"
},
"templating": {
"list": []
},
"panels": [
{
"id": 1,
"title": "Current Temperatures",
"type": "stat",
"gridPos": {"h": 6, "w": 12, "x": 0, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}",
"legendFormat": "{{friendly_name}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "celsius",
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "blue", "value": null},
{"color": "green", "value": 18},
{"color": "yellow", "value": 24},
{"color": "orange", "value": 27},
{"color": "red", "value": 30}
]
},
"mappings": []
},
"overrides": []
},
"options": {
"reduceOptions": {
"calcs": ["lastNotNull"],
"fields": "",
"values": false
},
"orientation": "auto",
"textMode": "auto",
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto"
},
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "Temp (.*) Temperature",
"renamePattern": "$1"
}
}
]
},
{
"id": 2,
"title": "Average Home Temperature",
"type": "gauge",
"gridPos": {"h": 6, "w": 6, "x": 12, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "avg(hass_sensor_temperature_celsius{entity!~\".*device_temperature|.*server.*\"})",
"legendFormat": "Average",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "celsius",
"min": 15,
"max": 30,
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "blue", "value": null},
{"color": "green", "value": 18},
{"color": "yellow", "value": 24},
{"color": "red", "value": 28}
]
}
}
},
"options": {
"reduceOptions": {
"calcs": ["lastNotNull"]
},
"showThresholdLabels": false,
"showThresholdMarkers": true
}
},
{
"id": 3,
"title": "Current Humidity",
"type": "stat",
"gridPos": {"h": 6, "w": 6, "x": 18, "y": 0},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "hass_sensor_humidity_percent{entity!~\".*server.*\"}",
"legendFormat": "{{friendly_name}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "red", "value": null},
{"color": "yellow", "value": 30},
{"color": "green", "value": 40},
{"color": "yellow", "value": 60},
{"color": "red", "value": 70}
]
}
}
},
"options": {
"reduceOptions": {
"calcs": ["lastNotNull"]
},
"orientation": "horizontal",
"colorMode": "value",
"graphMode": "none"
},
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "Temp (.*) Humidity",
"renamePattern": "$1"
}
}
]
},
{
"id": 4,
"title": "Temperature History (30 Days)",
"type": "timeseries",
"gridPos": {"h": 10, "w": 24, "x": 0, "y": 6},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}",
"legendFormat": "{{friendly_name}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "celsius",
"custom": {
"lineWidth": 1,
"fillOpacity": 10,
"pointSize": 5,
"showPoints": "never",
"spanNulls": 3600000
}
}
},
"options": {
"legend": {
"displayMode": "list",
"placement": "bottom",
"calcs": ["mean", "min", "max"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "Temp (.*) Temperature",
"renamePattern": "$1"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "temp_server Temperature",
"renamePattern": "Server"
}
}
]
},
{
"id": 5,
"title": "Temperature Trend (1h rate of change)",
"type": "timeseries",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 16},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "deriv(hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}[1h]) * 3600",
"legendFormat": "{{friendly_name}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "celsius",
"custom": {
"lineWidth": 1,
"fillOpacity": 20,
"showPoints": "never",
"spanNulls": 3600000
},
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "blue", "value": null},
{"color": "green", "value": -0.5},
{"color": "green", "value": 0.5},
{"color": "red", "value": 1}
]
},
"displayName": "${__field.labels.friendly_name}"
}
},
"options": {
"legend": {
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "multi"
}
},
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "Temp (.*) Temperature",
"renamePattern": "$1"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "temp_server Temperature",
"renamePattern": "Server"
}
}
],
"description": "Rate of temperature change per hour. Positive = warming, Negative = cooling."
},
{
"id": 6,
"title": "24h Min / Max / Avg",
"type": "table",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 16},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "min_over_time(hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}[24h])",
"legendFormat": "{{friendly_name}}",
"refId": "min",
"instant": true
},
{
"expr": "max_over_time(hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}[24h])",
"legendFormat": "{{friendly_name}}",
"refId": "max",
"instant": true
},
{
"expr": "avg_over_time(hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}[24h])",
"legendFormat": "{{friendly_name}}",
"refId": "avg",
"instant": true
}
],
"fieldConfig": {
"defaults": {
"unit": "celsius",
"decimals": 1
},
"overrides": [
{
"matcher": {"id": "byName", "options": "Room"},
"properties": [{"id": "custom.width", "value": 150}]
}
]
},
"options": {
"showHeader": true,
"sortBy": [{"displayName": "Room", "desc": false}]
},
"transformations": [
{
"id": "joinByField",
"options": {
"byField": "friendly_name",
"mode": "outer"
}
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"domain": true,
"entity": true,
"hostname": true,
"instance": true,
"job": true
},
"renameByName": {
"friendly_name": "Room",
"Value #min": "Min (24h)",
"Value #max": "Max (24h)",
"Value #avg": "Avg (24h)"
}
}
},
{
"id": "renameByRegex",
"options": {
"regex": "Temp (.*) Temperature",
"renamePattern": "$1"
}
}
]
},
{
"id": 7,
"title": "Humidity History (30 Days)",
"type": "timeseries",
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 24},
"datasource": {"type": "prometheus", "uid": "prometheus"},
"targets": [
{
"expr": "hass_sensor_humidity_percent",
"legendFormat": "{{friendly_name}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"custom": {
"lineWidth": 1,
"fillOpacity": 10,
"showPoints": "never",
"spanNulls": 3600000
}
}
},
"options": {
"legend": {
"displayMode": "list",
"placement": "bottom",
"calcs": ["mean", "min", "max"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "Temp (.*) Humidity",
"renamePattern": "$1"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "temp_server Humidity",
"renamePattern": "Server"
}
}
]
}
]
}
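The "Temperature Trend" panel above multiplies `deriv(...[1h])` by 3600 to turn a per-second slope into degrees per hour. PromQL's `deriv` fits a simple linear regression over the samples in the range. The sketch below reproduces that idea in plain Python; the function name `hourly_trend` and the sample data are hypothetical, and this is not Prometheus's own implementation.

```python
# Least-squares slope of (timestamp_seconds, value) samples, scaled to per-hour.
# Hypothetical illustration of what deriv(...) * 3600 expresses.
def hourly_trend(samples: list[tuple[float, float]]) -> float:
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("timestamps must not all be equal")
    return cov / var * 3600  # per-second slope -> per-hour


if __name__ == "__main__":
    # Made-up readings: one every 10 minutes, warming by 1.5 degrees over the hour.
    readings = [(t, 20.0 + 1.5 * t / 3600) for t in range(0, 3601, 600)]
    print(round(hourly_trend(readings), 3))
```

A positive result means the room is warming, a negative one that it is cooling, matching the panel description.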

@@ -0,0 +1,111 @@
{ config, pkgs, ... }:
{
services.grafana = {
enable = true;
settings = {
server = {
http_addr = "127.0.0.1";
http_port = 3000;
domain = "grafana-test.home.2rjus.net";
root_url = "https://grafana-test.home.2rjus.net/";
};
# Disable anonymous access
"auth.anonymous".enabled = false;
# OIDC authentication via Kanidm
"auth.generic_oauth" = {
enabled = true;
name = "Kanidm";
client_id = "grafana";
client_secret = "$__file{/run/secrets/grafana-oauth2}";
auth_url = "https://auth.home.2rjus.net/ui/oauth2";
token_url = "https://auth.home.2rjus.net/oauth2/token";
api_url = "https://auth.home.2rjus.net/oauth2/openid/grafana/userinfo";
scopes = "openid profile email groups";
use_pkce = true; # Required by Kanidm, more secure
# Extract user attributes from userinfo response
email_attribute_path = "email";
login_attribute_path = "preferred_username";
name_attribute_path = "name";
# Map admins group to Admin role, everyone else to Editor (for Explore access)
role_attribute_path = "contains(groups[*], 'admins') && 'Admin' || 'Editor'";
allow_sign_up = true;
};
};
# Declarative datasources pointing to monitoring01
provision.datasources.settings = {
apiVersion = 1;
datasources = [
{
name = "Prometheus";
type = "prometheus";
url = "http://monitoring01.home.2rjus.net:9090";
isDefault = true;
uid = "prometheus";
}
{
name = "Loki";
type = "loki";
url = "http://monitoring01.home.2rjus.net:3100";
uid = "loki";
}
];
};
# Declarative dashboards
provision.dashboards.settings = {
apiVersion = 1;
providers = [
{
name = "homelab";
type = "file";
options.path = ./dashboards;
disableDeletion = true;
}
];
};
};
# Vault secret for OAuth2 client secret
vault.secrets.grafana-oauth2 = {
secretPath = "services/grafana/oauth2-client-secret";
extractKey = "password";
services = [ "grafana" ];
owner = "grafana";
group = "grafana";
};
# Local Caddy for TLS termination
services.caddy = {
enable = true;
package = pkgs.unstable.caddy;
configFile = pkgs.writeText "Caddyfile" ''
{
acme_ca https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory
metrics
}
grafana-test.home.2rjus.net {
log {
output file /var/log/caddy/grafana.log {
mode 644
}
}
reverse_proxy http://127.0.0.1:3000
}
http://${config.networking.hostName}.home.2rjus.net/metrics {
metrics
}
'';
};
# Expose Caddy metrics for Prometheus
homelab.monitoring.scrapeTargets = [{
job_name = "caddy";
port = 80;
}];
}
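The `role_attribute_path` expression in the Grafana config above maps members of the `admins` group to Admin and everyone else to Editor. A rough Python equivalent of that decision follows. The function name `grafana_role` and the example payloads are hypothetical, and the sketch assumes the userinfo response carries `groups` as a list containing the literal string `admins`.

```python
# Hypothetical equivalent of:
#   contains(groups[*], 'admins') && 'Admin' || 'Editor'
# Assumption: userinfo["groups"] is a list of plain group names.
def grafana_role(userinfo: dict) -> str:
    groups = userinfo.get("groups") or []
    return "Admin" if "admins" in groups else "Editor"


if __name__ == "__main__":
    print(grafana_role({"preferred_username": "alice", "groups": ["admins", "users"]}))
    print(grafana_role({"preferred_username": "bob", "groups": ["users"]}))
```

If the identity provider emits group names in another form (for example with a domain suffix), the match string would need to change accordingly.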

@@ -78,15 +78,15 @@
 # Override battery calculation using voltage (mV): (voltage - 2100) / 9
 "0x54ef441000a547bd" = {
 friendly_name = "0x54ef441000a547bd";
-homeassistant.battery.value_template = "{{ (((value_json.voltage | float) - 2100) / 9) | round(0) | int | min(100) | max(0) }}";
+homeassistant.battery.value_template = "{{ [[(((value_json.voltage | float) - 2100) / 9) | round(0) | int, 100] | min, 0] | max }}";
 };
 "0x54ef441000a54d3c" = {
 friendly_name = "0x54ef441000a54d3c";
-homeassistant.battery.value_template = "{{ (((value_json.voltage | float) - 2100) / 9) | round(0) | int | min(100) | max(0) }}";
+homeassistant.battery.value_template = "{{ [[(((value_json.voltage | float) - 2100) / 9) | round(0) | int, 100] | min, 0] | max }}";
 };
 "0x54ef441000a564b6" = {
 friendly_name = "temp_server";
-homeassistant.battery.value_template = "{{ (((value_json.voltage | float) - 2100) / 9) | round(0) | int | min(100) | max(0) }}";
+homeassistant.battery.value_template = "{{ [[(((value_json.voltage | float) - 2100) / 9) | round(0) | int, 100] | min, 0] | max }}";
 };
 # Other sensors
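The corrected template clamps the voltage-derived percentage into the 0 to 100 range by taking the `min` and `max` of two-element lists. The same arithmetic in plain Python, as a sketch only: the function name `battery_percent` and the sample voltages are hypothetical.

```python
# Hypothetical Python mirror of the template:
#   [[((voltage - 2100) / 9) | round(0) | int, 100] | min, 0] | max
def battery_percent(voltage_mv: float) -> int:
    raw = int(round((float(voltage_mv) - 2100) / 9, 0))
    return max(min(raw, 100), 0)  # clamp to 0..100


if __name__ == "__main__":
    for mv in (1900, 2100, 2550, 3000, 3200):
        print(mv, battery_percent(mv))
```

Voltages below 2100 mV clamp to 0 and voltages above 3000 mV clamp to 100, which is what the original `| min(100) | max(0)` chain was intended to do.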

@@ -24,12 +24,37 @@
 idmAdminPasswordFile = config.vault.secrets.kanidm-idm-admin.outputDir;
 groups = {
-admins = { };
-users = { };
-ssh-users = { };
+# overwriteMembers = false allows imperative member management via CLI
+admins = { overwriteMembers = false; };
+users = { overwriteMembers = false; };
+ssh-users = { overwriteMembers = false; };
 };
 # Regular users (persons) are managed imperatively via kanidm CLI
+# OAuth2/OIDC clients for service authentication
+systems.oauth2.grafana = {
+displayName = "Grafana";
+originUrl = "https://grafana-test.home.2rjus.net/login/generic_oauth";
+originLanding = "https://grafana-test.home.2rjus.net/";
+basicSecretFile = config.vault.secrets.grafana-oauth2.outputDir;
+preferShortUsername = true;
+scopeMaps.users = [ "openid" "profile" "email" "groups" ];
+};
+systems.oauth2.openbao = {
+displayName = "OpenBao Secrets";
+# Web UI callback only (CLI localhost not supported with confidential clients)
+originUrl = "https://vault.home.2rjus.net:8200/ui/vault/auth/oidc/oidc/callback";
+originLanding = "https://vault.home.2rjus.net:8200/";
+basicSecretFile = config.vault.secrets.openbao-oauth2.outputDir;
+preferShortUsername = true;
+# Enable RS256 signing algorithm (required by OpenBao)
+enableLegacyCrypto = true;
+# Allow groups scope for role binding
+scopeMaps.admins = [ "openid" "profile" "email" "groups" ];
+scopeMaps.users = [ "openid" "profile" "email" "groups" ];
+};
 };
 };
@@ -53,6 +78,24 @@
 group = "kanidm";
 };
+# Vault secret for Grafana OAuth2 client secret
+vault.secrets.grafana-oauth2 = {
+secretPath = "services/grafana/oauth2-client-secret";
+extractKey = "password";
+services = [ "kanidm" ];
+owner = "kanidm";
+group = "kanidm";
+};
+# Vault secret for OpenBao OAuth2 client secret
+vault.secrets.openbao-oauth2 = {
+secretPath = "services/openbao/oauth2-client-secret";
+extractKey = "password";
+services = [ "kanidm" ];
+owner = "kanidm";
+group = "kanidm";
+};
 # Note: Kanidm does not expose Prometheus metrics
 # If metrics support is added in the future, uncomment:
 # homelab.monitoring.scrapeTargets = [

@@ -229,13 +229,13 @@ groups:
 summary: "Mosquitto not running on {{ $labels.instance }}"
 description: "Mosquitto has been down on {{ $labels.instance }} more than 5 minutes."
 - alert: zigbee_sensor_stale
-expr: (time() - hass_last_updated_time_seconds{entity=~"sensor\\.(0x[0-9a-f]+|temp_server)_temperature"}) > 7200
+expr: (time() - hass_last_updated_time_seconds{entity=~"sensor\\.(0x[0-9a-f]+|temp_server)_temperature"}) > 14400
 for: 5m
 labels:
 severity: warning
 annotations:
 summary: "Zigbee sensor {{ $labels.friendly_name }} is stale"
-description: "Zigbee temperature sensor {{ $labels.entity }} has not reported data for over 2 hours. The sensor may have a dead battery or connectivity issues."
+description: "Zigbee temperature sensor {{ $labels.entity }} has not reported data for over 4 hours. The sensor may have a dead battery or connectivity issues."
 - name: smartctl_rules
 rules:
 - alert: smart_critical_warning
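The `zigbee_sensor_stale` rule in this hunk compares the age of each sensor's last update against a threshold that the change raises from 7200 s to 14400 s. The condition is sketched below in Python. The names `ENTITY_RE` and `is_stale` are hypothetical, and the regex is applied with `fullmatch` because Prometheus label matchers are fully anchored.

```python
import re

# Same pattern as the entity=~ matcher in the rule (fully anchored in Prometheus).
ENTITY_RE = re.compile(r"sensor\.(0x[0-9a-f]+|temp_server)_temperature")
STALE_AFTER_SECONDS = 14400  # new threshold from the hunk (previously 7200)


def is_stale(entity: str, now: float, last_updated: float) -> bool:
    """Hypothetical mirror of: (time() - hass_last_updated_time_seconds{...}) > 14400."""
    if ENTITY_RE.fullmatch(entity) is None:
        return False  # entity is not selected by the rule
    return (now - last_updated) > STALE_AFTER_SECONDS


if __name__ == "__main__":
    print(is_stale("sensor.temp_server_temperature", now=20000.0, last_updated=0.0))
```

In the actual rule the condition must additionally hold for the `for: 5m` window before the alert fires; the sketch covers only the instantaneous comparison.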


@@ -35,9 +35,18 @@
  HOMELAB = {
  jetstream = "enabled";
  users = [
+ # alerttonotify (full access to HOMELAB account)
  {
  nkey = "UASLNKLWGICRTZMIXVD3RXLQ57XRIMCKBHP5V3PYFFRNO3E3BIJBCYMZ";
  }
+ # nixos-exporter (restricted to nixos-exporter subjects)
+ {
+ nkey = "UBCL3ODHVERVZJNGUJ567YBBKHQZOV3LK3WO6TVVSGQOCTK2NQ3IJVRV"; # Replace with public key from: nix develop -c nk -gen user -pubout
+ permissions = {
+ publish = [ "nixos-exporter.>" ];
+ subscribe = [ "nixos-exporter.>" ];
+ };
+ }
  ];
  };
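
The nixos-exporter user is limited to the subject pattern `nixos-exporter.>`. In NATS, a trailing `>` matches one or more further tokens, so the bare subject `nixos-exporter` is not covered. The sketch below approximates that rule in shell; `subject_allowed` and the example subjects are hypothetical, and the single-token `*` wildcard is deliberately not handled.

```shell
#!/usr/bin/env bash
# Sketch only: approximates how a NATS permission ending in ">" matches
# subjects. subject_allowed is an illustrative helper, not a NATS tool.
subject_allowed() {
  local pattern="$1" subject="$2"
  case "$pattern" in
    *'.>')
      # ">" matches one or more trailing tokens, so the subject must
      # extend past the "prefix." part by at least one character.
      local prefix="${pattern%'>'}"
      [[ "$subject" == "$prefix"?* ]]
      ;;
    *)
      # No trailing wildcard: require an exact match.
      [[ "$subject" == "$pattern" ]]
      ;;
  esac
}

# Example subjects are made up for illustration.
for s in nixos-exporter.cache nixos-exporter.cache.rev nixos-exporter alerts.fire; do
  if subject_allowed "nixos-exporter.>" "$s"; then
    echo "allow $s"
  else
    echo "deny  $s"
  fi
done
```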


@@ -9,6 +9,7 @@
  ./motd.nix
  ./packages.nix
  ./nix.nix
+ ./pipe-to-loki.nix
  ./root-user.nix
  ./pki/root-ca.nix
  ./sshd.nix


@@ -19,14 +19,33 @@
  ];
  };
+ # Fetch NKey from Vault for NATS authentication
+ vault.secrets.nixos-exporter-nkey = {
+ secretPath = "shared/nixos-exporter/nkey";
+ extractKey = "nkey";
+ owner = "nixos-exporter";
+ group = "nixos-exporter";
+ };
  services.prometheus.exporters.nixos = {
  enable = true;
  # Default port: 9971
  flake = {
  enable = true;
  url = "git+https://git.t-juice.club/torjus/nixos-servers.git";
+ nats = {
+ enable = true;
+ url = "nats://nats1.home.2rjus.net:4222";
+ nkeySeedFile = "/run/secrets/nixos-exporter-nkey";
  };
  };
+ };
+ # Ensure exporter starts after Vault secret is available
+ systemd.services.prometheus-nixos-exporter = {
+ after = [ "vault-secret-nixos-exporter-nkey.service" ];
+ requires = [ "vault-secret-nixos-exporter-nkey.service" ];
+ };
  # Register nixos-exporter as a Prometheus scrape target
  homelab.monitoring.scrapeTargets = [

140
system/pipe-to-loki.nix Normal file

@@ -0,0 +1,140 @@
{
config,
pkgs,
lib,
...
}:
let
pipe-to-loki = pkgs.writeShellApplication {
name = "pipe-to-loki";
runtimeInputs = with pkgs; [
curl
jq
util-linux
coreutils
];
text = ''
set -euo pipefail
LOKI_URL="http://monitoring01.home.2rjus.net:3100/loki/api/v1/push"
HOSTNAME=$(hostname)
SESSION_ID=""
RECORD_MODE=false
usage() {
echo "Usage: pipe-to-loki [--id ID] [--record]"
echo ""
echo "Send command output or interactive sessions to Loki."
echo ""
echo "Options:"
echo " --id ID Set custom session ID (default: auto-generated)"
echo " --record Start interactive recording session"
echo ""
echo "Examples:"
echo " command | pipe-to-loki # Pipe command output"
echo " command | pipe-to-loki --id foo # Pipe with custom ID"
echo " pipe-to-loki --record # Start recording session"
exit 1
}
generate_id() {
local random_chars
random_chars=$(head -c 2 /dev/urandom | od -An -tx1 | tr -d ' \n')
echo "''${HOSTNAME}-$(date +%s)-''${random_chars}"
}
send_to_loki() {
local content="$1"
local type="$2"
local timestamp_ns
timestamp_ns=$(date +%s%N)
local payload
payload=$(jq -n \
--arg job "pipe-to-loki" \
--arg host "$HOSTNAME" \
--arg type "$type" \
--arg id "$SESSION_ID" \
--arg ts "$timestamp_ns" \
--arg content "$content" \
'{
streams: [{
stream: {
job: $job,
host: $host,
type: $type,
id: $id
},
values: [[$ts, $content]]
}]
}')
# --fail makes curl exit non-zero on HTTP 4xx/5xx responses, so rejected pushes are reported
if curl -sf -X POST "$LOKI_URL" \
-H "Content-Type: application/json" \
-d "$payload" > /dev/null; then
return 0
else
echo "Error: Failed to send to Loki" >&2
return 1
fi
}
# Parse arguments
while [[ $# -gt 0 ]]; do
case $1 in
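--id)
# Guard against a missing value; under "set -u" an absent $2 would abort with an unbound variable error
if [[ $# -lt 2 ]]; then
echo "Error: --id requires a value" >&2
usage
fi
SESSION_ID="$2"
shift 2
;;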
--record)
RECORD_MODE=true
shift
;;
--help|-h)
# usage always exits 1; run it in a subshell so an explicit help request exits 0
(usage) || true
exit 0
;;
*)
echo "Unknown option: $1" >&2
usage
;;
esac
done
# Generate ID if not provided
if [[ -z "$SESSION_ID" ]]; then
SESSION_ID=$(generate_id)
fi
if $RECORD_MODE; then
# Session recording mode
SCRIPT_FILE=$(mktemp)
trap 'rm -f "$SCRIPT_FILE"' EXIT
echo "Recording session $SESSION_ID... (exit to send)"
# Use script to record the session
script -q "$SCRIPT_FILE"
# Read the transcript and send to Loki
content=$(cat "$SCRIPT_FILE")
if send_to_loki "$content" "session"; then
echo "Session $SESSION_ID sent to Loki"
fi
else
# Pipe mode - read from stdin
if [[ -t 0 ]]; then
echo "Error: No input provided. Pipe a command or use --record for interactive mode." >&2
exit 1
fi
content=$(cat)
if send_to_loki "$content" "command"; then
echo "Sent to Loki with id: $SESSION_ID"
fi
fi
'';
};
in
{
environment.systemPackages = [ pipe-to-loki ];
}
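
For reference, the session ID produced by `generate_id` has the form `<hostname>-<epoch seconds>-<4 hex digits>`, and each push is labelled with `job`, `host`, `type`, and `id`. The sketch below reproduces the ID format outside the Nix string (so `${...}` is written without the `''` escape) and builds a LogQL stream selector for looking a session up again. `loki_selector` and the host name passed in are hypothetical additions for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: reproduces the ID format used by generate_id and builds a
# LogQL stream selector for it. loki_selector is a hypothetical helper.
set -euo pipefail

generate_id() {
  local host="$1" random_chars
  # Two random bytes rendered as four hex digits.
  random_chars=$(head -c 2 /dev/urandom | od -An -tx1 | tr -d ' \n')
  echo "${host}-$(date +%s)-${random_chars}"
}

loki_selector() {
  # Label names match the stream labels set in send_to_loki.
  printf '{job="pipe-to-loki", id="%s"}\n' "$1"
}

id=$(generate_id "testvm01")
echo "$id"
loki_selector "$id"
```

The printed selector can be pasted into a Loki query to retrieve everything sent under that session ID.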


@@ -15,6 +15,17 @@ path "secret/data/shared/homelab-deploy/*" {
  EOT
  }
+ # Shared policy for nixos-exporter NATS cache sharing
+ resource "vault_policy" "nixos_exporter" {
+ name = "nixos-exporter"
+ policy = <<EOT
+ path "secret/data/shared/nixos-exporter/*" {
+ capabilities = ["read", "list"]
+ }
+ EOT
+ }
  # Define host access policies
  locals {
  host_policies = {
@@ -89,6 +100,24 @@ locals {
  ]
  }
+ # kanidm01: Kanidm identity provider
+ "kanidm01" = {
+ paths = [
+ "secret/data/hosts/kanidm01/*",
+ "secret/data/kanidm/*",
+ "secret/data/services/grafana/*",
+ "secret/data/services/openbao/*",
+ ]
+ }
+ # monitoring02: Grafana test instance
+ "monitoring02" = {
+ paths = [
+ "secret/data/hosts/monitoring02/*",
+ "secret/data/services/grafana/*",
+ ]
+ }
  }
  }
@@ -114,7 +143,7 @@ resource "vault_approle_auth_backend_role" "hosts" {
  backend = vault_auth_backend.approle.path
  role_name = each.key
  token_policies = concat(
- ["${each.key}-policy", "homelab-deploy"],
+ ["${each.key}-policy", "homelab-deploy", "nixos-exporter"],
  lookup(each.value, "extra_policies", [])
  )
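
With this change every host role also receives the nixos-exporter policy, whose single path ends in `*`. In Vault-style policies a trailing `*` acts as a prefix match, so the policy covers `secret/data/shared/nixos-exporter/nkey` but nothing under other shared paths. The sketch below approximates only that trailing-glob rule; `path_matches` is a hypothetical helper, and `+` segment wildcards and policy precedence are not modelled.

```shell
#!/usr/bin/env bash
# Sketch only: approximates policy path matching for paths ending in "*",
# which behave as prefix matches. path_matches is an illustrative helper.
path_matches() {
  local policy_path="$1" request_path="$2"
  if [[ "$policy_path" == *'*' ]]; then
    # Strip the trailing glob and compare the remainder as a prefix.
    local prefix="${policy_path%'*'}"
    [[ "$request_path" == "$prefix"* ]]
  else
    # No glob: the request path must match exactly.
    [[ "$request_path" == "$policy_path" ]]
  fi
}

policy='secret/data/shared/nixos-exporter/*'
for p in secret/data/shared/nixos-exporter/nkey secret/data/shared/homelab-deploy/key; do
  if path_matches "$policy" "$p"; then
    echo "covered:     $p"
  else
    echo "not covered: $p"
  fi
done
```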


@@ -33,13 +33,6 @@ locals {
"secret/data/shared/homelab-deploy/*", "secret/data/shared/homelab-deploy/*",
  ]
  }
- "kanidm01" = {
- paths = [
- "secret/data/hosts/kanidm01/*",
- "secret/data/kanidm/*",
- ]
- }
  }
# Placeholder secrets - user should add actual secrets manually or via tofu # Placeholder secrets - user should add actual secrets manually or via tofu
@@ -69,7 +62,7 @@ resource "vault_approle_auth_backend_role" "generated_hosts" {
  backend = vault_auth_backend.approle.path
  role_name = each.key
- token_policies = ["host-${each.key}", "homelab-deploy"]
+ token_policies = ["host-${each.key}", "homelab-deploy", "nixos-exporter"]
  secret_id_ttl = 0 # Never expire (wrapped tokens provide time limit)
  token_ttl = 3600
  token_max_ttl = 3600

50
terraform/vault/oidc.tf Normal file

@@ -0,0 +1,50 @@
# OIDC authentication backend for Kanidm integration
# Web UI only - CLI localhost redirects not supported with confidential clients
resource "vault_jwt_auth_backend" "oidc" {
path = "oidc"
type = "oidc"
oidc_discovery_url = "https://auth.home.2rjus.net/oauth2/openid/openbao"
oidc_client_id = "openbao"
oidc_client_secret = random_password.auto_secrets["services/openbao/oauth2-client-secret"].result
default_role = "default"
tune {
listing_visibility = "unauth"
default_lease_ttl = "1h"
max_lease_ttl = "24h"
token_type = "default-service"
}
}
# Admin role - maps Kanidm admins group to admin policy
resource "vault_jwt_auth_backend_role" "admin" {
backend = vault_jwt_auth_backend.oidc.path
role_name = "admin"
token_policies = ["oidc-admin"]
user_claim = "preferred_username"
groups_claim = "groups"
bound_claims = { groups = "admins@home.2rjus.net" }
role_type = "oidc"
oidc_scopes = ["openid", "profile", "email", "groups"]
allowed_redirect_uris = [
"https://vault.home.2rjus.net:8200/ui/vault/auth/oidc/oidc/callback",
]
}
# Default role - any authenticated user (limited access)
resource "vault_jwt_auth_backend_role" "default" {
backend = vault_jwt_auth_backend.oidc.path
role_name = "default"
token_policies = ["oidc-default"]
user_claim = "preferred_username"
groups_claim = "groups"
role_type = "oidc"
oidc_scopes = ["openid", "profile", "email", "groups"]
allowed_redirect_uris = [
"https://vault.home.2rjus.net:8200/ui/vault/auth/oidc/oidc/callback",
]
}


@@ -8,3 +8,50 @@ path "sys/metrics" {
  }
  EOT
  }
+ # OIDC admin policy - full read/write to all secrets
+ resource "vault_policy" "oidc_admin" {
+ name = "oidc-admin"
+ policy = <<EOT
+ # Full access to KV secrets
+ path "secret/*" {
+ capabilities = ["create", "read", "update", "delete", "list"]
+ }
+ # Read system health and metrics
+ path "sys/health" {
+ capabilities = ["read"]
+ }
+ path "sys/metrics" {
+ capabilities = ["read"]
+ }
+ # List auth methods and mounts
+ path "sys/auth" {
+ capabilities = ["read"]
+ }
+ path "sys/mounts" {
+ capabilities = ["read"]
+ }
+ EOT
+ }
+ # OIDC default policy - minimal access for authenticated users
+ resource "vault_policy" "oidc_default" {
+ name = "oidc-default"
+ policy = <<EOT
+ # Read own token info
+ path "auth/token/lookup-self" {
+ capabilities = ["read"]
+ }
+ # Read system health
+ path "sys/health" {
+ capabilities = ["read"]
+ }
+ EOT
+ }


@@ -108,6 +108,24 @@ locals {
  auto_generate = true
  password_length = 32
  }
+ # Grafana OAuth2 client secret (for Kanidm OIDC)
+ "services/grafana/oauth2-client-secret" = {
+ auto_generate = true
+ password_length = 64
+ }
+ # OpenBao OAuth2 client secret (for Kanidm OIDC)
+ "services/openbao/oauth2-client-secret" = {
+ auto_generate = true
+ password_length = 64
+ }
+ # NKey for nixos-exporter NATS cache sharing
+ "shared/nixos-exporter/nkey" = {
+ auto_generate = false
+ data = { nkey = var.nixos_exporter_nkey }
+ }
  }
  }


@@ -73,3 +73,10 @@ variable "homelab_deploy_admin_deployer_nkey" {
  sensitive = true
  }
+ variable "nixos_exporter_nkey" {
+ description = "NKey seed for nixos-exporter NATS authentication"
+ type = string
+ default = "PLACEHOLDER"
+ sensitive = true
+ }


@@ -79,6 +79,13 @@ locals {
  disk_size = "20G"
  vault_wrapped_token = "s.OOqjEECeIV7dNgCS6jNmyY3K"
  }
+ "monitoring02" = {
+ ip = "10.69.13.24/24"
+ cpu_cores = 4
+ memory = 4096
+ disk_size = "60G"
+ vault_wrapped_token = "s.uXpdoGxHXpWvTsGbHkZuq1jF"
+ }
  }
  # Compute VM configurations with defaults applied