Compare commits
16 Commits
95a96b2192...jellyfin-m
| Author | SHA1 | Date | |
|---|---|---|---|
| | 16ef202530 | | |
| | 5f3508a6d4 | | |
| | 2ca2509083 | | |
| | 58702bd10b | | |
| | c9f47acb01 | | |
| | 09ce018fb2 | | |
| | 3042803c4d | | |
| | 1e7200b494 | | |
| | eec1e374b2 | | |
| | fcc410afad | | |
| | 59f0c7ceda | | |
| | d713f06c6e | | |
| | 7374d1ff7f | | |
| | e912c75b6c | | |
| | b218b4f8bc | | |
| | 65acf13e6f | | |
@@ -73,6 +73,7 @@ Additional context, caveats, or references.
|
|||||||
- **Reference existing patterns**: Mention how this fits with existing infrastructure
|
- **Reference existing patterns**: Mention how this fits with existing infrastructure
|
||||||
- **Tables for comparisons**: Use markdown tables when comparing options
|
- **Tables for comparisons**: Use markdown tables when comparing options
|
||||||
- **Practical focus**: Emphasize what needs to happen, not theory
|
- **Practical focus**: Emphasize what needs to happen, not theory
|
||||||
|
- **Mermaid diagrams**: Use mermaid code blocks for architecture diagrams, flow charts, or other graphs when relevant to the plan. Keep node labels short and use `<br/>` for line breaks
|
||||||
|
|
||||||
## Examples of Good Plans
|
## Examples of Good Plans
|
||||||
|
|
||||||
|
|||||||
@@ -20,9 +20,9 @@ Hosts to migrate:
|
|||||||
| http-proxy | Stateless | Reverse proxy, recreate |
|
| http-proxy | Stateless | Reverse proxy, recreate |
|
||||||
| nats1 | Stateless | Messaging, recreate |
|
| nats1 | Stateless | Messaging, recreate |
|
||||||
| ha1 | Stateful | Home Assistant + Zigbee2MQTT + Mosquitto |
|
| ha1 | Stateful | Home Assistant + Zigbee2MQTT + Mosquitto |
|
||||||
| monitoring01 | Stateful | Prometheus, Grafana, Loki |
|
| ~~monitoring01~~ | ~~Decommission~~ | ✓ Complete — replaced by monitoring02 (VictoriaMetrics) |
|
||||||
| jelly01 | Stateful | Jellyfin metadata, watch history, config |
|
| jelly01 | Stateful | Jellyfin metadata, watch history, config |
|
||||||
| pgdb1 | Decommission | Only used by Open WebUI on gunter, migrating to local postgres |
|
| ~~pgdb1~~ | ~~Decommission~~ | ✓ Complete |
|
||||||
| ~~jump~~ | ~~Decommission~~ | ✓ Complete |
|
| ~~jump~~ | ~~Decommission~~ | ✓ Complete |
|
||||||
| ~~auth01~~ | ~~Decommission~~ | ✓ Complete |
|
| ~~auth01~~ | ~~Decommission~~ | ✓ Complete |
|
||||||
| ~~ca~~ | ~~Deferred~~ | ✓ Complete |
|
| ~~ca~~ | ~~Deferred~~ | ✓ Complete |
|
||||||
@@ -31,10 +31,12 @@ Hosts to migrate:
|
|||||||
|
|
||||||
Before migrating any stateful host, ensure restic backups are in place and verified.
|
Before migrating any stateful host, ensure restic backups are in place and verified.
|
||||||
|
|
||||||
### 1a. Expand monitoring01 Grafana Backup
|
### ~~1a. Expand monitoring01 Grafana Backup~~ ✓ N/A
|
||||||
|
|
||||||
The existing backup only covers `/var/lib/grafana/plugins` and a sqlite dump of `grafana.db`.
|
~~The existing backup only covers `/var/lib/grafana/plugins` and a sqlite dump of `grafana.db`.
|
||||||
Expand to back up all of `/var/lib/grafana/` to capture config directory and any other state.
|
Expand to back up all of `/var/lib/grafana/` to capture config directory and any other state.~~
|
||||||
|
|
||||||
|
No longer needed — monitoring01 decommissioned, replaced by monitoring02 with declarative Grafana dashboards.
|
||||||
|
|
||||||
### 1b. Add Jellyfin Backup to jelly01
|
### 1b. Add Jellyfin Backup to jelly01
|
||||||
|
|
||||||
@@ -94,15 +96,17 @@ For each stateful host, the procedure is:
|
|||||||
7. Start services and verify functionality
|
7. Start services and verify functionality
|
||||||
8. Decommission the old VM
|
8. Decommission the old VM
|
||||||
|
|
||||||
### 3a. monitoring01
|
### 3a. monitoring01 ✓ COMPLETE
|
||||||
|
|
||||||
1. Run final Grafana backup
|
~~1. Run final Grafana backup~~
|
||||||
2. Provision new monitoring01 via OpenTofu
|
~~2. Provision new monitoring01 via OpenTofu~~
|
||||||
3. After bootstrap, restore `/var/lib/grafana/` from restic
|
~~3. After bootstrap, restore `/var/lib/grafana/` from restic~~
|
||||||
4. Restart Grafana, verify dashboards and datasources are intact
|
~~4. Restart Grafana, verify dashboards and datasources are intact~~
|
||||||
5. Prometheus and Loki start fresh with empty data (acceptable)
|
~~5. Prometheus and Loki start fresh with empty data (acceptable)~~
|
||||||
6. Verify all scrape targets are being collected
|
~~6. Verify all scrape targets are being collected~~
|
||||||
7. Decommission old VM
|
~~7. Decommission old VM~~
|
||||||
|
|
||||||
|
Replaced by monitoring02 with VictoriaMetrics, standalone Loki and Grafana modules. Host configuration, old service modules, and terraform resources removed.
|
||||||
|
|
||||||
### 3b. jelly01
|
### 3b. jelly01
|
||||||
|
|
||||||
@@ -163,19 +167,19 @@ Host was already removed from flake.nix and VM destroyed. Configuration cleaned
|
|||||||
|
|
||||||
Host configuration, services, and VM already removed.
|
Host configuration, services, and VM already removed.
|
||||||
|
|
||||||
### pgdb1 (in progress)
|
### pgdb1 ✓ COMPLETE
|
||||||
|
|
||||||
Only consumer was Open WebUI on gunter, which has been migrated to use local PostgreSQL.
|
~~Only consumer was Open WebUI on gunter, which has been migrated to use local PostgreSQL.~~
|
||||||
|
|
||||||
1. ~~Verify Open WebUI on gunter is using local PostgreSQL (not pgdb1)~~ ✓
|
~~1. Verify Open WebUI on gunter is using local PostgreSQL (not pgdb1)~~
|
||||||
2. ~~Remove host configuration from `hosts/pgdb1/`~~ ✓
|
~~2. Remove host configuration from `hosts/pgdb1/`~~
|
||||||
3. ~~Remove `services/postgres/` (only used by pgdb1)~~ ✓
|
~~3. Remove `services/postgres/` (only used by pgdb1)~~
|
||||||
4. ~~Remove from `flake.nix`~~ ✓
|
~~4. Remove from `flake.nix`~~
|
||||||
5. ~~Remove Vault AppRole from `terraform/vault/approle.tf`~~ ✓
|
~~5. Remove Vault AppRole from `terraform/vault/approle.tf`~~
|
||||||
6. Destroy the VM in Proxmox
|
~~6. Destroy the VM in Proxmox~~
|
||||||
7. ~~Commit cleanup~~ ✓
|
~~7. Commit cleanup~~
|
||||||
|
|
||||||
See `docs/plans/pgdb1-decommission.md` for detailed plan.
|
Host configuration, services, terraform resources, and VM removed. See `docs/plans/pgdb1-decommission.md` for detailed plan.
|
||||||
|
|
||||||
## Phase 5: Decommission ca Host ✓ COMPLETE
|
## Phase 5: Decommission ca Host ✓ COMPLETE
|
||||||
|
|
||||||
|
|||||||
@@ -24,29 +24,20 @@ After evaluating WireGuard gateway vs Headscale (self-hosted Tailscale), the **W
|
|||||||
|
|
||||||
## Architecture
|
## Architecture
|
||||||
|
|
||||||
```
|
```mermaid
|
||||||
┌─────────────────────────────────┐
|
graph TD
|
||||||
│ VPS (OpenStack) │
|
clients["Laptop / Phone"]
|
||||||
Laptop/Phone ──→ │ WireGuard endpoint │
|
vps["VPS<br/>(WireGuard endpoint)"]
|
||||||
(WireGuard) │ Client peers: laptop, phone │
|
extgw["extgw01<br/>(gateway + bastion)"]
|
||||||
│ Routes 10.69.13.0/24 via tunnel│
|
grafana["Grafana<br/>monitoring01:3000"]
|
||||||
└──────────┬──────────────────────┘
|
jellyfin["Jellyfin<br/>jelly01:8096"]
|
||||||
│ WireGuard tunnel
|
arr["arr stack<br/>*-jail hosts"]
|
||||||
▼
|
|
||||||
┌─────────────────────────────────┐
|
clients -->|WireGuard| vps
|
||||||
│ extgw01 (gateway + bastion) │
|
vps -->|WireGuard tunnel| extgw
|
||||||
│ - WireGuard tunnel to VPS │
|
extgw -->|allowed traffic| grafana
|
||||||
│ - Firewall (allowlist only) │
|
extgw -->|allowed traffic| jellyfin
|
||||||
│ - SSH + 2FA (full access) │
|
extgw -->|allowed traffic| arr
|
||||||
└──────────┬──────────────────────┘
|
|
||||||
│ allowed traffic only
|
|
||||||
▼
|
|
||||||
┌─────────────────────────────────┐
|
|
||||||
│ Internal network 10.69.13.0/24 │
|
|
||||||
│ - monitoring01:3000 (Grafana) │
|
|
||||||
│ - jelly01:8096 (Jellyfin) │
|
|
||||||
│ - *-jail hosts (arr stack) │
|
|
||||||
└─────────────────────────────────┘
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### Existing path (unchanged)
|
### Existing path (unchanged)
|
||||||
|
|||||||
@@ -39,23 +39,17 @@ Expand storage capacity for the main hdd-pool. Since we need to add disks anyway
|
|||||||
- nzbget: NixOS service or OCI container
|
- nzbget: NixOS service or OCI container
|
||||||
- NFS exports: `services.nfs.server`
|
- NFS exports: `services.nfs.server`
|
||||||
|
|
||||||
### Filesystem: BTRFS RAID1
|
### Filesystem: Keep ZFS
|
||||||
|
|
||||||
**Decision**: Migrate from ZFS to BTRFS with RAID1
|
**Decision**: Keep existing ZFS pool, import on NixOS
|
||||||
|
|
||||||
**Rationale**:
|
**Rationale**:
|
||||||
- **In-kernel**: No out-of-tree module issues like ZFS
|
- **No data migration needed**: Existing ZFS pool can be imported directly on NixOS
|
||||||
- **Flexible expansion**: Add individual disks, not required to buy pairs
|
- **Proven reliability**: Pool has been running reliably on TrueNAS
|
||||||
- **Mixed disk sizes**: Better handling than ZFS multi-vdev approach
|
- **NixOS ZFS support**: Well-supported, declarative configuration via `boot.zfs` and `services.zfs`
|
||||||
- **RAID level conversion**: Can convert between RAID levels in place
|
- **BTRFS RAID5/6 unreliable**: Research showed BTRFS RAID5/6 write hole is still unresolved
|
||||||
- Built-in checksumming, snapshots, compression (zstd)
|
- **BTRFS RAID1 wasteful**: With mixed disk sizes, RAID1 wastes significant capacity vs ZFS mirrors
|
||||||
- NixOS has good BTRFS support
|
- Checksumming, snapshots, compression (lz4/zstd) all available
|
||||||
|
|
||||||
**BTRFS RAID1 notes**:
|
|
||||||
- "RAID1" means 2 copies of all data
|
|
||||||
- Distributes across all available devices
|
|
||||||
- With 6+ disks, provides redundancy + capacity scaling
|
|
||||||
- RAID5/6 avoided (known issues), RAID1/10 are stable
|
|
||||||
|
|
||||||
### Hardware: Keep Existing + Add Disks
|
### Hardware: Keep Existing + Add Disks
|
||||||
|
|
||||||
@@ -69,83 +63,94 @@ Expand storage capacity for the main hdd-pool. Since we need to add disks anyway
|
|||||||
|
|
||||||
**Storage architecture**:
|
**Storage architecture**:
|
||||||
|
|
||||||
**Bulk storage** (BTRFS RAID1 on HDDs):
|
**hdd-pool** (ZFS mirrors):
|
||||||
- Current: 6x HDDs (2x16TB + 2x8TB + 2x8TB)
|
- Current: 3 mirror vdevs (2x16TB + 2x8TB + 2x8TB) = 32TB usable
|
||||||
- Add: 2x new HDDs (size TBD)
|
- Add: mirror-3 with 2x 24TB = +24TB usable
|
||||||
|
- Total after expansion: ~56TB usable
|
||||||
- Use: Media, downloads, backups, non-critical data
|
- Use: Media, downloads, backups, non-critical data
|
||||||
- Risk tolerance: High (data mostly replaceable)
|
|
||||||
|
|
||||||
**Critical data** (small volume):
|
|
||||||
- Use 2x 240GB SSDs in mirror (BTRFS or ZFS)
|
|
||||||
- Or use 2TB NVMe for critical data
|
|
||||||
- Risk tolerance: Low (data important but small)
|
|
||||||
|
|
||||||
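The usable-capacity figures for hdd-pool can be checked with a little arithmetic: in a pool of two-way mirrors, each mirror vdev contributes roughly the size of its smaller disk. The sketch below is illustrative only; the function name `mirror_usable_tb` and the `AxB` argument format are made up for this example, and it works in nominal TB without accounting for ZFS overhead or TB/TiB differences.

```sh
#!/bin/sh
# Sketch only: nominal usable capacity of a pool built from two-way mirrors.
# Each argument is a hypothetical "AxB" pair giving both disk sizes in TB.
mirror_usable_tb() {
  total=0
  for pair in "$@"; do
    a=${pair%x*}   # size of the first disk
    b=${pair#*x}   # size of the second disk
    # A mirror can only hold as much as its smaller member
    if [ "$a" -lt "$b" ]; then small=$a; else small=$b; fi
    total=$((total + small))
  done
  echo "$total"
}

mirror_usable_tb 16x16 8x8 8x8          # current pool: prints 32
mirror_usable_tb 16x16 8x8 8x8 24x24    # with mirror-3 added: prints 56
```

This matches the 32TB current and ~56TB post-expansion figures in the plan.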
### Disk Purchase Decision
|
### Disk Purchase Decision
|
||||||
|
|
||||||
**Options under consideration**:
|
**Decision**: 2x 24TB drives (ordered, arriving 2026-02-21)
|
||||||
|
|
||||||
**Option A: 2x 16TB drives**
|
|
||||||
- Matches largest current drives
|
|
||||||
- Enables potential future RAID5 if desired (6x 16TB array)
|
|
||||||
- More conservative capacity increase
|
|
||||||
|
|
||||||
**Option B: 2x 20-24TB drives**
|
|
||||||
- Larger capacity headroom
|
|
||||||
- Better $/TB ratio typically
|
|
||||||
- Future-proofs better
|
|
||||||
|
|
||||||
**Initial purchase**: 2 drives (chassis has space for 2 more without modifications)
|
|
||||||
|
|
||||||
## Migration Strategy
|
## Migration Strategy
|
||||||
|
|
||||||
### High-Level Plan
|
### High-Level Plan
|
||||||
|
|
||||||
1. **Preparation**:
|
1. **Expand ZFS pool** (on TrueNAS):
|
||||||
- Purchase 2x new HDDs (16TB or 20-24TB)
|
- Install 2x 24TB drives (may need new drive trays - order from abroad if needed)
|
||||||
- Create NixOS configuration for new storage host
|
- If chassis space is limited, temporarily replace the two oldest 8TB drives (da0/ada4)
|
||||||
- Set up bare metal NixOS installation
|
- Add as mirror-3 vdev to hdd-pool
|
||||||
|
- Verify pool health and resilver completes
|
||||||
|
- Check SMART data on old 8TB drives (all healthy as of 2026-02-20, no reallocated sectors)
|
||||||
|
- Burn-in: at minimum short + long SMART test before adding to pool
|
||||||
|
|
||||||
2. **Initial BTRFS pool**:
|
2. **Prepare NixOS configuration**:
|
||||||
- Install 2 new disks
|
- Create host configuration (`hosts/nas1/` or similar)
|
||||||
- Create BTRFS filesystem in RAID1
|
- Configure ZFS pool import (`boot.zfs.extraPools`)
|
||||||
- Mount and test NFS exports
|
- Set up services: radarr, sonarr, nzbget, restic-rest, NFS
|
||||||
|
- Configure monitoring (node-exporter, promtail, smartctl-exporter)
|
||||||
|
|
||||||
3. **Data migration**:
|
3. **Install NixOS**:
|
||||||
- Copy data from TrueNAS ZFS pool to new BTRFS pool over 10GbE
|
- `zpool export hdd-pool` on TrueNAS before shutdown (clean export)
|
||||||
- Verify data integrity
|
- Wipe TrueNAS boot-pool SSDs, set up as mdadm RAID1 for NixOS root
|
||||||
|
- Install NixOS on mdadm mirror (keeps boot path ZFS-independent)
|
||||||
|
- Import hdd-pool via `boot.zfs.extraPools`
|
||||||
|
- Verify all datasets mount correctly
|
||||||
|
|
||||||
4. **Expand pool**:
|
4. **Service migration**:
|
||||||
- As old ZFS pool is emptied, wipe drives and add to BTRFS pool
|
- Configure NixOS services to use ZFS dataset paths
|
||||||
- Pool grows incrementally: 2 → 4 → 6 → 8 disks
|
- Update NFS exports
|
||||||
- BTRFS rebalances data across new devices
|
- Test from consuming hosts
|
||||||
|
|
||||||
5. **Service migration**:
|
5. **Cutover**:
|
||||||
- Set up radarr/sonarr/nzbget/restic as NixOS services
|
- Update DNS/client mounts if IP changes
|
||||||
- Update NFS client mounts on consuming hosts
|
- Verify monitoring integration
|
||||||
|
|
||||||
6. **Cutover**:
|
|
||||||
- Point consumers to new NAS host
|
|
||||||
- Decommission TrueNAS
|
- Decommission TrueNAS
|
||||||
- Repurpose hardware or keep as spare
|
|
||||||
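The pool handover in the "Install NixOS" step could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: the `run` wrapper, the `DRY_RUN` and `POOL` variables, and the function names are hypothetical, and it defaults to printing commands rather than executing them. Exporting is a pool-level operation, so the sketch uses `zpool export`; the verification half assumes the pool was already imported at boot via `boot.zfs.extraPools`.

```sh
#!/bin/sh
# Sketch only. POOL and DRY_RUN are illustrative; dry-run is the default.
POOL=${POOL:-hdd-pool}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# On TrueNAS, before shutdown: cleanly export the pool
export_pool() {
  run zpool export "$POOL"
}

# On NixOS, after first boot: check pool health and dataset mounts
verify_pool() {
  run zpool status -x "$POOL"
  run zfs list -r -o name,mounted,mountpoint "$POOL"
}
```

Calling `export_pool` or `verify_pool` with the defaults only prints the commands; setting `DRY_RUN=0` would run them on the host.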
|
### Post-Expansion: Vdev Rebalancing
|
||||||
|
|
||||||
|
ZFS has no built-in rebalance command. After adding the new 24TB vdev, ZFS will
|
||||||
|
write new data preferentially to it (most free space), leaving old vdevs packed
|
||||||
|
at ~97%. This is suboptimal but not urgent once overall pool usage drops to ~50%.
|
||||||
|
|
||||||
|
To gradually rebalance, rewrite files in place so ZFS redistributes blocks across
|
||||||
|
all vdevs proportional to free space:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Rewrite files individually (spreads blocks across all vdevs)
|
||||||
|
find /pool/dataset -type f -exec sh -c '
|
||||||
|
for f; do cp "$f" "$f.rebal" && mv "$f.rebal" "$f"; done
|
||||||
|
' _ {} +
|
||||||
|
```
|
||||||
|
|
||||||
|
Avoid `zfs send/recv` for large datasets (e.g. 20TB) as this would concentrate
|
||||||
|
data on the emptiest vdev rather than spreading it evenly.
|
||||||
|
|
||||||
|
**Recommendation**: Do this after NixOS migration is stable. Not urgent - the pool
|
||||||
|
will function fine with uneven distribution, just slightly suboptimal for performance.
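A slightly more defensive variant of the rewrite loop in this section is sketched below. It compares each copy against the original with `cmp` before replacing it, and leaves the original untouched if the copy fails. The function name `rebalance_files` and the `.rebal` suffix are illustrative. Two caveats apply as assumptions rather than verified behaviour on this pool: rewriting breaks hard links, and if block cloning is enabled, `cp` may clone blocks instead of rewriting them, which would defeat the purpose.

```sh
#!/bin/sh
# Sketch only: rewrite files in place, verifying each copy before it
# replaces the original. Names here are illustrative, not from the repo.
rebalance_files() {
  rc=0
  for f in "$@"; do
    tmp="$f.rebal"
    # -p keeps mode and timestamps (and ownership where permitted)
    if cp -p -- "$f" "$tmp" && cmp -s -- "$f" "$tmp"; then
      mv -- "$tmp" "$f" || rc=1
    else
      # Copy failed or differs: drop the temp file, keep the original
      rm -f -- "$tmp"
      echo "skipped: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# When saved as a script, process any paths given on the command line
rebalance_files "$@"
```

Saved as a script (say `rebalance.sh`, a hypothetical name), it could be driven the same way as the original loop: `find /pool/dataset -type f -exec sh rebalance.sh {} +`.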
|
||||||
|
|
||||||
### Migration Advantages
|
### Migration Advantages
|
||||||
|
|
||||||
- **Low risk**: New pool created independently, old data remains intact during migration
|
- **No data migration**: ZFS pool imported directly, no copying terabytes of data
|
||||||
- **Incremental**: Can add old disks one at a time as space allows
|
- **Low risk**: Pool expansion done on stable TrueNAS before OS swap
|
||||||
- **Flexible**: BTRFS handles mixed disk sizes gracefully
|
- **Reversible**: Can boot back to TrueNAS if NixOS has issues (ZFS pool is OS-independent)
|
||||||
- **Reversible**: Keep TrueNAS running until fully validated
|
- **Quick cutover**: Once NixOS config is ready, the OS swap is fast
|
||||||
|
|
||||||
## Next Steps
|
## Next Steps
|
||||||
|
|
||||||
1. Decide on disk size (16TB vs 20-24TB)
|
1. ~~Decide on disk size~~ - 2x 24TB ordered
|
||||||
2. Purchase disks
|
2. Install drives and add mirror vdev to ZFS pool
|
||||||
3. Design NixOS host configuration (`hosts/nas1/`)
|
3. Check SMART data on 8TB drives - decide whether to keep or retire
|
||||||
4. Plan detailed migration timeline
|
4. Design NixOS host configuration (`hosts/nas1/`)
|
||||||
5. Document NFS export mapping (current → new)
|
5. Document NFS export mapping (current -> new)
|
||||||
|
6. Plan NixOS installation and cutover
|
||||||
|
|
||||||
## Open Questions
|
## Open Questions
|
||||||
|
|
||||||
- [ ] Final decision on disk size?
|
|
||||||
- [ ] Hostname for new NAS host? (nas1? storage1?)
|
- [ ] Hostname for new NAS host? (nas1? storage1?)
|
||||||
- [ ] IP address allocation (keep 10.69.12.50 or new IP?)
|
- [ ] IP address/subnet: NAS and Proxmox are both on 10GbE to the same switch but different subnets, forcing traffic through the router (bottleneck). Move to same subnet during migration.
|
||||||
- [ ] Timeline/maintenance window for migration?
|
- [x] Boot drive: Reuse TrueNAS boot-pool SSDs as mdadm RAID1 for NixOS root (no ZFS on boot path)
|
||||||
|
- [ ] Retire old 8TB drives? (SMART looks healthy, keep unless chassis space is needed)
|
||||||
|
- [ ] Drive trays: do new 24TB drives fit, or order trays from abroad?
|
||||||
|
- [ ] Timeline/maintenance window for NixOS swap?
|
||||||
|
|||||||
20
flake.lock
generated
@@ -28,11 +28,11 @@
|
|||||||
]
|
]
|
||||||
},
|
},
|
||||||
"locked": {
|
"locked": {
|
||||||
"lastModified": 1771004123,
|
"lastModified": 1771488195,
|
||||||
"narHash": "sha256-Jw36EzL4IGIc2TmeZGphAAUrJXoWqfvCbybF8bTHgMA=",
|
"narHash": "sha256-2kMxqdDyPluRQRoES22Y0oSjp7pc5fj2nRterfmSIyc=",
|
||||||
"ref": "master",
|
"ref": "master",
|
||||||
"rev": "e5e8be86ecdcae8a5962ba3bddddfe91b574792b",
|
"rev": "2d26de50559d8acb82ea803764e138325d95572c",
|
||||||
"revCount": 36,
|
"revCount": 37,
|
||||||
"type": "git",
|
"type": "git",
|
||||||
"url": "https://git.t-juice.club/torjus/homelab-deploy"
|
"url": "https://git.t-juice.club/torjus/homelab-deploy"
|
||||||
},
|
},
|
||||||
@@ -64,11 +64,11 @@
|
|||||||
},
|
},
|
||||||
"nixpkgs": {
|
"nixpkgs": {
|
||||||
"locked": {
|
"locked": {
|
||||||
"lastModified": 1771043024,
|
"lastModified": 1771419570,
|
||||||
"narHash": "sha256-O1XDr7EWbRp+kHrNNgLWgIrB0/US5wvw9K6RERWAj6I=",
|
"narHash": "sha256-bxAlQgre3pcQcaRUm/8A0v/X8d2nhfraWSFqVmMcBcU=",
|
||||||
"owner": "nixos",
|
"owner": "nixos",
|
||||||
"repo": "nixpkgs",
|
"repo": "nixpkgs",
|
||||||
"rev": "3aadb7ca9eac2891d52a9dec199d9580a6e2bf44",
|
"rev": "6d41bc27aaf7b6a3ba6b169db3bd5d6159cfaa47",
|
||||||
"type": "github"
|
"type": "github"
|
||||||
},
|
},
|
||||||
"original": {
|
"original": {
|
||||||
@@ -80,11 +80,11 @@
|
|||||||
},
|
},
|
||||||
"nixpkgs-unstable": {
|
"nixpkgs-unstable": {
|
||||||
"locked": {
|
"locked": {
|
||||||
"lastModified": 1771008912,
|
"lastModified": 1771369470,
|
||||||
"narHash": "sha256-gf2AmWVTs8lEq7z/3ZAsgnZDhWIckkb+ZnAo5RzSxJg=",
|
"narHash": "sha256-0NBlEBKkN3lufyvFegY4TYv5mCNHbi5OmBDrzihbBMQ=",
|
||||||
"owner": "nixos",
|
"owner": "nixos",
|
||||||
"repo": "nixpkgs",
|
"repo": "nixpkgs",
|
||||||
"rev": "a82ccc39b39b621151d6732718e3e250109076fa",
|
"rev": "0182a361324364ae3f436a63005877674cf45efb",
|
||||||
"type": "github"
|
"type": "github"
|
||||||
},
|
},
|
||||||
"original": {
|
"original": {
|
||||||
|
|||||||
@@ -25,7 +25,7 @@
|
|||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
timeout = 7200;
|
timeout = 14400;
|
||||||
metrics.enable = true;
|
metrics.enable = true;
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|||||||
@@ -19,7 +19,7 @@
|
|||||||
"title": "SSH Connections",
|
"title": "SSH Connections",
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"gridPos": {"h": 4, "w": 6, "x": 0, "y": 0},
|
"gridPos": {"h": 4, "w": 6, "x": 0, "y": 0},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sum(oubliette_ssh_connections_total{job=\"apiary\"})",
|
"expr": "sum(oubliette_ssh_connections_total{job=\"apiary\"})",
|
||||||
@@ -51,7 +51,7 @@
|
|||||||
"title": "Active Sessions",
|
"title": "Active Sessions",
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"gridPos": {"h": 4, "w": 6, "x": 6, "y": 0},
|
"gridPos": {"h": 4, "w": 6, "x": 6, "y": 0},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "oubliette_sessions_active{job=\"apiary\"}",
|
"expr": "oubliette_sessions_active{job=\"apiary\"}",
|
||||||
@@ -86,7 +86,7 @@
|
|||||||
"title": "Unique IPs",
|
"title": "Unique IPs",
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"gridPos": {"h": 4, "w": 6, "x": 12, "y": 0},
|
"gridPos": {"h": 4, "w": 6, "x": 12, "y": 0},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "oubliette_storage_unique_ips{job=\"apiary\"}",
|
"expr": "oubliette_storage_unique_ips{job=\"apiary\"}",
|
||||||
@@ -118,7 +118,7 @@
|
|||||||
"title": "Total Login Attempts",
|
"title": "Total Login Attempts",
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"gridPos": {"h": 4, "w": 6, "x": 18, "y": 0},
|
"gridPos": {"h": 4, "w": 6, "x": 18, "y": 0},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "oubliette_storage_login_attempts_total{job=\"apiary\"}",
|
"expr": "oubliette_storage_login_attempts_total{job=\"apiary\"}",
|
||||||
@@ -150,7 +150,7 @@
|
|||||||
"title": "SSH Connections Over Time",
|
"title": "SSH Connections Over Time",
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 4},
|
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 4},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"interval": "60s",
|
"interval": "60s",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
@@ -183,7 +183,7 @@
|
|||||||
"title": "Auth Attempts Over Time",
|
"title": "Auth Attempts Over Time",
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 4},
|
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 4},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"interval": "60s",
|
"interval": "60s",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
@@ -216,7 +216,7 @@
|
|||||||
"title": "Sessions by Shell",
|
"title": "Sessions by Shell",
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 22},
|
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 22},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"interval": "60s",
|
"interval": "60s",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
@@ -249,7 +249,7 @@
|
|||||||
"title": "Attempts by Country",
|
"title": "Attempts by Country",
|
||||||
"type": "geomap",
|
"type": "geomap",
|
||||||
"gridPos": {"h": 10, "w": 24, "x": 0, "y": 12},
|
"gridPos": {"h": 10, "w": 24, "x": 0, "y": 12},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "oubliette_auth_attempts_by_country_total{job=\"apiary\"}",
|
"expr": "oubliette_auth_attempts_by_country_total{job=\"apiary\"}",
|
||||||
@@ -318,7 +318,7 @@
|
|||||||
"title": "Session Duration Distribution",
|
"title": "Session Duration Distribution",
|
||||||
"type": "heatmap",
|
"type": "heatmap",
|
||||||
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 30},
|
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 30},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"interval": "60s",
|
"interval": "60s",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
@@ -359,7 +359,7 @@
|
|||||||
"title": "Commands Executed by Shell",
|
"title": "Commands Executed by Shell",
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 22},
|
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 22},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"interval": "60s",
|
"interval": "60s",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
|
|||||||
@@ -16,7 +16,7 @@
|
|||||||
"title": "Endpoints Monitored",
|
"title": "Endpoints Monitored",
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
|
"gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "count(probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"})",
|
"expr": "count(probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"})",
|
||||||
@@ -48,7 +48,7 @@
|
|||||||
"title": "Probe Failures",
|
"title": "Probe Failures",
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"gridPos": {"h": 4, "w": 4, "x": 4, "y": 0},
|
"gridPos": {"h": 4, "w": 4, "x": 4, "y": 0},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "count(probe_success{job=\"blackbox_tls\"} == 0) or vector(0)",
|
"expr": "count(probe_success{job=\"blackbox_tls\"} == 0) or vector(0)",
|
||||||
@@ -82,7 +82,7 @@
|
|||||||
"title": "Expiring Soon (< 7d)",
|
"title": "Expiring Soon (< 7d)",
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"gridPos": {"h": 4, "w": 4, "x": 8, "y": 0},
|
"gridPos": {"h": 4, "w": 4, "x": 8, "y": 0},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "count((probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"} - time()) < 86400 * 7) or vector(0)",
|
"expr": "count((probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"} - time()) < 86400 * 7) or vector(0)",
|
||||||
@@ -116,7 +116,7 @@
|
|||||||
"title": "Expiring Critical (< 24h)",
|
"title": "Expiring Critical (< 24h)",
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"gridPos": {"h": 4, "w": 4, "x": 12, "y": 0},
|
"gridPos": {"h": 4, "w": 4, "x": 12, "y": 0},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "count((probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"} - time()) < 86400) or vector(0)",
|
"expr": "count((probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"} - time()) < 86400) or vector(0)",
|
||||||
@@ -150,7 +150,7 @@
|
|||||||
"title": "Minimum Days Remaining",
|
"title": "Minimum Days Remaining",
|
||||||
"type": "gauge",
|
"type": "gauge",
|
||||||
"gridPos": {"h": 4, "w": 8, "x": 16, "y": 0},
|
"gridPos": {"h": 4, "w": 8, "x": 16, "y": 0},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "min((probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"} - time()) / 86400)",
|
"expr": "min((probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"} - time()) / 86400)",
|
||||||
@@ -187,7 +187,7 @@
|
|||||||
"title": "Certificate Expiry by Endpoint",
|
"title": "Certificate Expiry by Endpoint",
|
||||||
"type": "table",
|
"type": "table",
|
||||||
"gridPos": {"h": 12, "w": 12, "x": 0, "y": 4},
|
"gridPos": {"h": 12, "w": 12, "x": 0, "y": 4},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "(probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"} - time()) / 86400",
|
"expr": "(probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"} - time()) / 86400",
|
||||||
@@ -253,7 +253,7 @@
|
|||||||
"title": "Probe Status",
|
"title": "Probe Status",
|
||||||
"type": "table",
|
"type": "table",
|
||||||
"gridPos": {"h": 12, "w": 12, "x": 12, "y": 4},
|
"gridPos": {"h": 12, "w": 12, "x": 12, "y": 4},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "probe_success{job=\"blackbox_tls\"}",
|
"expr": "probe_success{job=\"blackbox_tls\"}",
|
||||||
@@ -340,7 +340,7 @@
|
|||||||
"title": "Certificate Expiry Over Time",
|
"title": "Certificate Expiry Over Time",
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 16},
|
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 16},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "(probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"} - time()) / 86400",
|
"expr": "(probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"} - time()) / 86400",
|
||||||
@@ -378,7 +378,7 @@
|
|||||||
"title": "Probe Success Rate",
|
"title": "Probe Success Rate",
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 24},
|
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 24},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "avg(probe_success{job=\"blackbox_tls\"}) * 100",
|
"expr": "avg(probe_success{job=\"blackbox_tls\"}) * 100",
|
||||||
@@ -418,7 +418,7 @@
|
|||||||
"title": "Probe Duration",
|
"title": "Probe Duration",
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 24},
|
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 24},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "probe_duration_seconds{job=\"blackbox_tls\"}",
|
"expr": "probe_duration_seconds{job=\"blackbox_tls\"}",
|
||||||
|
|||||||
@@ -15,7 +15,7 @@
|
|||||||
{
|
{
|
||||||
"name": "tier",
|
"name": "tier",
|
||||||
"type": "query",
|
"type": "query",
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"query": "label_values(nixos_flake_info, tier)",
|
"query": "label_values(nixos_flake_info, tier)",
|
||||||
"refresh": 2,
|
"refresh": 2,
|
||||||
"includeAll": true,
|
"includeAll": true,
|
||||||
@@ -30,7 +30,7 @@
 "title": "Hosts Behind Remote",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(nixos_flake_revision_behind{tier=~\"$tier\"} == 1)",
@@ -65,7 +65,7 @@
 "title": "Hosts Needing Reboot",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 4, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(nixos_config_mismatch{tier=~\"$tier\"} == 1)",
@@ -100,7 +100,7 @@
 "title": "Total Hosts",
 "type": "stat",
 "gridPos": {"h": 4, "w": 3, "x": 8, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(nixos_flake_info{tier=~\"$tier\"})",
@@ -128,7 +128,7 @@
 "title": "Nixpkgs Age",
 "type": "stat",
 "gridPos": {"h": 4, "w": 3, "x": 11, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "max(nixos_flake_input_age_seconds{input=\"nixpkgs\", tier=~\"$tier\"})",
@@ -163,7 +163,7 @@
 "title": "Hosts Up-to-date",
 "type": "stat",
 "gridPos": {"h": 4, "w": 3, "x": 14, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(nixos_flake_revision_behind{tier=~\"$tier\"} == 0)",
@@ -192,7 +192,7 @@
 "title": "Deployments (24h)",
 "type": "stat",
 "gridPos": {"h": 4, "w": 3, "x": 17, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "sum(increase(homelab_deploy_deployments_total{status=\"completed\"}[24h]))",
@@ -222,7 +222,7 @@
 "title": "Avg Deploy Time",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 20, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "sum(increase(homelab_deploy_deployment_duration_seconds_sum{success=\"true\"}[24h])) / sum(increase(homelab_deploy_deployment_duration_seconds_count{success=\"true\"}[24h]))",
@@ -256,7 +256,7 @@
 "title": "Fleet Status",
 "type": "table",
 "gridPos": {"h": 10, "w": 24, "x": 0, "y": 4},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "nixos_flake_info{tier=~\"$tier\"}",
@@ -430,7 +430,7 @@
 "title": "Generation Age by Host",
 "type": "bargauge",
 "gridPos": {"h": 8, "w": 12, "x": 0, "y": 14},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "sort_desc(nixos_generation_age_seconds{tier=~\"$tier\"})",
@@ -467,7 +467,7 @@
 "title": "Generations per Host",
 "type": "bargauge",
 "gridPos": {"h": 8, "w": 12, "x": 12, "y": 14},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "sort_desc(nixos_generation_count{tier=~\"$tier\"})",
@@ -501,7 +501,7 @@
 "title": "Deployment Activity (Generation Age Over Time)",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 24, "x": 0, "y": 22},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "nixos_generation_age_seconds{tier=~\"$tier\"}",
@@ -534,7 +534,7 @@
 "title": "Flake Input Ages",
 "type": "table",
 "gridPos": {"h": 6, "w": 12, "x": 0, "y": 30},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "max by (input) (nixos_flake_input_age_seconds)",
@@ -577,7 +577,7 @@
 "title": "Hosts by Revision",
 "type": "piechart",
 "gridPos": {"h": 6, "w": 6, "x": 12, "y": 30},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count by (current_rev) (nixos_flake_info{tier=~\"$tier\"})",
@@ -601,7 +601,7 @@
 "title": "Hosts by Tier",
 "type": "piechart",
 "gridPos": {"h": 6, "w": 6, "x": 18, "y": 30},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count by (tier) (nixos_flake_info)",
@@ -641,7 +641,7 @@
 "title": "Builds (24h)",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 0, "y": 37},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "sum(increase(homelab_deploy_build_host_total{status=\"success\"}[24h]))",
@@ -671,7 +671,7 @@
 "title": "Failed Builds (24h)",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 4, "y": 37},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "sum(increase(homelab_deploy_build_host_total{status=\"failure\"}[24h])) or vector(0)",
@@ -705,7 +705,7 @@
 "title": "Last Build",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 8, "y": 37},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "time() - max(homelab_deploy_build_last_timestamp)",
@@ -739,7 +739,7 @@
 "title": "Avg Build Time",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 12, "y": 37},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "sum(increase(homelab_deploy_build_duration_seconds_sum[24h])) / sum(increase(homelab_deploy_build_duration_seconds_count[24h]))",
@@ -773,7 +773,7 @@
 "title": "Total Hosts Built",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 16, "y": 37},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(homelab_deploy_build_duration_seconds_count)",
@@ -802,7 +802,7 @@
 "title": "Build Jobs (24h)",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 20, "y": 37},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "sum(increase(homelab_deploy_builds_total[24h]))",
@@ -832,7 +832,7 @@
 "title": "Build Time by Host",
 "type": "bargauge",
 "gridPos": {"h": 8, "w": 12, "x": 0, "y": 41},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "sort_desc(homelab_deploy_build_duration_seconds_sum / homelab_deploy_build_duration_seconds_count)",
@@ -869,7 +869,7 @@
 "title": "Build Count by Host",
 "type": "bargauge",
 "gridPos": {"h": 8, "w": 12, "x": 12, "y": 41},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "sort_desc(sum by (host) (homelab_deploy_build_host_total))",
@@ -903,7 +903,7 @@
 "title": "Build Activity",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 24, "x": 0, "y": 49},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "sum(increase(homelab_deploy_build_host_total{status=\"success\"}[1h]))",
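Note: every hunk in these dashboard files makes the same one-line change, replacing the datasource uid `prometheus` with `victoriametrics` while leaving the datasource type as `prometheus`. A minimal Python sketch of how such a bulk rewrite could be scripted follows. It is not taken from this change set: the function name `retarget_datasources` and the sample dashboard are hypothetical, and the sketch assumes each reference is a JSON object of the form `{"type": ..., "uid": ...}` under a `datasource` key, as in the hunks above.

```python
import json

OLD_UID = "prometheus"        # uid being retired, as seen in the hunks
NEW_UID = "victoriametrics"   # replacement uid, as seen in the hunks


def retarget_datasources(node, old_uid=OLD_UID, new_uid=NEW_UID):
    """Rewrite matching datasource references in place and return how many changed.

    Hypothetical helper: walks any nested dict/list structure loaded from a
    dashboard JSON file and only touches objects stored under a "datasource"
    key whose type is "prometheus" and whose uid equals old_uid.
    """
    changed = 0
    if isinstance(node, dict):
        ds = node.get("datasource")
        if (
            isinstance(ds, dict)
            and ds.get("type") == "prometheus"
            and ds.get("uid") == old_uid
        ):
            ds["uid"] = new_uid
            changed += 1
        for value in node.values():
            changed += retarget_datasources(value, old_uid, new_uid)
    elif isinstance(node, list):
        for item in node:
            changed += retarget_datasources(item, old_uid, new_uid)
    return changed


if __name__ == "__main__":
    # Small made-up dashboard: one template variable and two panels.
    dashboard = {
        "templating": {"list": [
            {"name": "tier", "type": "query",
             "datasource": {"type": "prometheus", "uid": "prometheus"}},
        ]},
        "panels": [
            {"title": "Total Hosts", "type": "stat",
             "datasource": {"type": "prometheus", "uid": "prometheus"}},
            {"title": "Logs", "type": "logs",
             "datasource": {"type": "loki", "uid": "loki"}},
        ],
    }
    count = retarget_datasources(dashboard)
    print(f"rewrote {count} datasource reference(s)")
    print(json.dumps(dashboard, indent=2))
```

Matching on both type and uid leaves non-Prometheus datasources untouched, and running the function a second time changes nothing.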
|
|||||||
@@ -11,7 +11,7 @@
 {
 "name": "instance",
 "type": "query",
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "query": "label_values(node_uname_info, instance)",
 "refresh": 2,
 "includeAll": false,
@@ -26,7 +26,7 @@
 "title": "CPU Usage",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\", instance=~\"$instance\"}[5m])) * 100)",
@@ -55,7 +55,7 @@
 "title": "Memory Usage",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "(1 - (node_memory_MemAvailable_bytes{instance=~\"$instance\"} / node_memory_MemTotal_bytes{instance=~\"$instance\"})) * 100",
@@ -84,7 +84,7 @@
 "title": "Disk Usage",
 "type": "gauge",
 "gridPos": {"h": 8, "w": 8, "x": 0, "y": 8},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "100 - ((node_filesystem_avail_bytes{instance=~\"$instance\",mountpoint=\"/\",fstype!=\"rootfs\"} / node_filesystem_size_bytes{instance=~\"$instance\",mountpoint=\"/\",fstype!=\"rootfs\"}) * 100)",
@@ -113,7 +113,7 @@
 "title": "System Load",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 8, "x": 8, "y": 8},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "node_load1{instance=~\"$instance\"}",
@@ -142,7 +142,7 @@
 "title": "Uptime",
 "type": "stat",
 "gridPos": {"h": 8, "w": 8, "x": 16, "y": 8},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "time() - node_boot_time_seconds{instance=~\"$instance\"}",
@@ -161,7 +161,7 @@
 "title": "Network Traffic",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 0, "y": 16},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "rate(node_network_receive_bytes_total{instance=~\"$instance\",device!~\"lo|veth.*|br.*|docker.*\"}[5m])",
@@ -185,7 +185,7 @@
 "title": "Disk I/O",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 12, "y": 16},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "rate(node_disk_read_bytes_total{instance=~\"$instance\",device!~\"dm-.*\"}[5m])",
|
|||||||
@@ -15,7 +15,7 @@
 {
 "name": "vm",
 "type": "query",
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "query": "label_values(pve_guest_info{template=\"0\"}, name)",
 "refresh": 2,
 "includeAll": true,
@@ -30,7 +30,7 @@
 "title": "VMs Running",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(pve_up{id=~\"qemu/.*\"} * on(id) pve_guest_info{template=\"0\"} == 1)",
@@ -56,7 +56,7 @@
 "title": "VMs Stopped",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 4, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(pve_up{id=~\"qemu/.*\"} * on(id) pve_guest_info{template=\"0\"} == 0)",
@@ -87,7 +87,7 @@
 "title": "Node CPU",
 "type": "gauge",
 "gridPos": {"h": 4, "w": 4, "x": 8, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "pve_cpu_usage_ratio{id=~\"node/.*\"} * 100",
@@ -120,7 +120,7 @@
 "title": "Node Memory",
 "type": "gauge",
 "gridPos": {"h": 4, "w": 4, "x": 12, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "pve_memory_usage_bytes{id=~\"node/.*\"} / pve_memory_size_bytes{id=~\"node/.*\"} * 100",
@@ -153,7 +153,7 @@
 "title": "Node Uptime",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 16, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "pve_uptime_seconds{id=~\"node/.*\"}",
@@ -180,7 +180,7 @@
 "title": "Templates",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 20, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(pve_guest_info{template=\"1\"})",
@@ -206,7 +206,7 @@
 "title": "VM Status",
 "type": "table",
 "gridPos": {"h": 10, "w": 24, "x": 0, "y": 4},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "pve_guest_info{template=\"0\", name=~\"$vm\"}",
@@ -362,7 +362,7 @@
 "title": "VM CPU Usage",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 0, "y": 14},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "pve_cpu_usage_ratio{id=~\"qemu/.*\"} * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"} * 100",
@@ -391,7 +391,7 @@
 "title": "VM Memory Usage",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 12, "y": 14},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "pve_memory_usage_bytes{id=~\"qemu/.*\"} * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
@@ -420,7 +420,7 @@
 "title": "VM Network Traffic",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 0, "y": 22},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "rate(pve_network_receive_bytes{id=~\"qemu/.*\"}[5m]) * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
@@ -453,7 +453,7 @@
 "title": "VM Disk I/O",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 12, "y": 22},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "rate(pve_disk_read_bytes{id=~\"qemu/.*\"}[5m]) * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
@@ -486,7 +486,7 @@
 "title": "Storage Usage",
 "type": "bargauge",
 "gridPos": {"h": 6, "w": 12, "x": 0, "y": 30},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "pve_disk_usage_bytes{id=~\"storage/.*\"} / pve_disk_size_bytes{id=~\"storage/.*\"} * 100",
@@ -531,7 +531,7 @@
 "title": "Storage Capacity",
 "type": "table",
 "gridPos": {"h": 6, "w": 12, "x": 12, "y": 30},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "pve_disk_size_bytes{id=~\"storage/.*\"}",
|
|||||||
@@ -15,7 +15,7 @@
 {
 "name": "hostname",
 "type": "query",
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "query": "label_values(systemd_unit_state, hostname)",
 "refresh": 2,
 "includeAll": true,
@@ -30,7 +30,7 @@
 "title": "Failed Units",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(systemd_unit_state{state=\"failed\", hostname=~\"$hostname\"} == 1) or vector(0)",
@@ -60,7 +60,7 @@
 "title": "Active Units",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 4, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(systemd_unit_state{state=\"active\", hostname=~\"$hostname\"} == 1)",
@@ -86,7 +86,7 @@
 "title": "Hosts Monitored",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 8, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(count by (hostname) (systemd_unit_state{hostname=~\"$hostname\"}))",
@@ -112,7 +112,7 @@
 "title": "Total Service Restarts",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 12, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "sum(systemd_service_restart_total{hostname=~\"$hostname\"})",
@@ -143,7 +143,7 @@
 "title": "Inactive Units",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 16, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(systemd_unit_state{state=\"inactive\", hostname=~\"$hostname\"} == 1)",
@@ -169,7 +169,7 @@
 "title": "Timers",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 20, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(systemd_timer_last_trigger_seconds{hostname=~\"$hostname\"})",
@@ -195,7 +195,7 @@
 "title": "Failed Units",
 "type": "table",
 "gridPos": {"h": 6, "w": 12, "x": 0, "y": 4},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "systemd_unit_state{state=\"failed\", hostname=~\"$hostname\"} == 1",
@@ -251,7 +251,7 @@
 "title": "Service Restarts (Top 15)",
 "type": "table",
 "gridPos": {"h": 6, "w": 12, "x": 12, "y": 4},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "topk(15, systemd_service_restart_total{hostname=~\"$hostname\"} > 0)",
@@ -309,7 +309,7 @@
|
|||||||
"title": "Active Units per Host",
|
"title": "Active Units per Host",
|
||||||
"type": "bargauge",
|
"type": "bargauge",
|
||||||
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 10},
|
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 10},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sort_desc(count by (hostname) (systemd_unit_state{state=\"active\", hostname=~\"$hostname\"} == 1))",
|
"expr": "sort_desc(count by (hostname) (systemd_unit_state{state=\"active\", hostname=~\"$hostname\"} == 1))",
|
||||||
@@ -339,7 +339,7 @@
|
|||||||
"title": "NixOS Upgrade Timers",
|
"title": "NixOS Upgrade Timers",
|
||||||
"type": "table",
|
"type": "table",
|
||||||
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 10},
|
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 10},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "systemd_timer_last_trigger_seconds{name=\"nixos-upgrade.timer\", hostname=~\"$hostname\"}",
|
"expr": "systemd_timer_last_trigger_seconds{name=\"nixos-upgrade.timer\", hostname=~\"$hostname\"}",
|
||||||
@@ -429,7 +429,7 @@
|
|||||||
"title": "Backup Timers",
|
"title": "Backup Timers",
|
||||||
"type": "table",
|
"type": "table",
|
||||||
"gridPos": {"h": 6, "w": 12, "x": 0, "y": 18},
|
"gridPos": {"h": 6, "w": 12, "x": 0, "y": 18},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "systemd_timer_last_trigger_seconds{name=~\"restic.*\", hostname=~\"$hostname\"}",
|
"expr": "systemd_timer_last_trigger_seconds{name=~\"restic.*\", hostname=~\"$hostname\"}",
|
||||||
@@ -524,7 +524,7 @@
|
|||||||
"title": "Service Restarts Over Time",
|
"title": "Service Restarts Over Time",
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"gridPos": {"h": 6, "w": 12, "x": 12, "y": 18},
|
"gridPos": {"h": 6, "w": 12, "x": 12, "y": 18},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sum by (hostname) (increase(systemd_service_restart_total{hostname=~\"$hostname\"}[1h]))",
|
"expr": "sum by (hostname) (increase(systemd_service_restart_total{hostname=~\"$hostname\"}[1h]))",
|
||||||
|
|||||||
@@ -19,7 +19,7 @@
|
|||||||
"title": "Current Temperatures",
|
"title": "Current Temperatures",
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"gridPos": {"h": 6, "w": 12, "x": 0, "y": 0},
|
"gridPos": {"h": 6, "w": 12, "x": 0, "y": 0},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}",
|
"expr": "hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}",
|
||||||
@@ -71,7 +71,7 @@
|
|||||||
"title": "Average Home Temperature",
|
"title": "Average Home Temperature",
|
||||||
"type": "gauge",
|
"type": "gauge",
|
||||||
"gridPos": {"h": 6, "w": 6, "x": 12, "y": 0},
|
"gridPos": {"h": 6, "w": 6, "x": 12, "y": 0},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "avg(hass_sensor_temperature_celsius{entity!~\".*device_temperature|.*server.*\"})",
|
"expr": "avg(hass_sensor_temperature_celsius{entity!~\".*device_temperature|.*server.*\"})",
|
||||||
@@ -108,7 +108,7 @@
|
|||||||
"title": "Current Humidity",
|
"title": "Current Humidity",
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"gridPos": {"h": 6, "w": 6, "x": 18, "y": 0},
|
"gridPos": {"h": 6, "w": 6, "x": 18, "y": 0},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "hass_sensor_humidity_percent{entity!~\".*server.*\"}",
|
"expr": "hass_sensor_humidity_percent{entity!~\".*server.*\"}",
|
||||||
@@ -154,7 +154,7 @@
|
|||||||
"title": "Temperature History (30 Days)",
|
"title": "Temperature History (30 Days)",
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"gridPos": {"h": 10, "w": 24, "x": 0, "y": 6},
|
"gridPos": {"h": 10, "w": 24, "x": 0, "y": 6},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}",
|
"expr": "hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}",
|
||||||
@@ -207,7 +207,7 @@
|
|||||||
"title": "Temperature Trend (1h rate of change)",
|
"title": "Temperature Trend (1h rate of change)",
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 16},
|
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 16},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "deriv(hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}[1h]) * 3600",
|
"expr": "deriv(hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}[1h]) * 3600",
|
||||||
@@ -268,7 +268,7 @@
|
|||||||
"title": "24h Min / Max / Avg",
|
"title": "24h Min / Max / Avg",
|
||||||
"type": "table",
|
"type": "table",
|
||||||
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 16},
|
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 16},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "min_over_time(hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}[24h])",
|
"expr": "min_over_time(hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}[24h])",
|
||||||
@@ -346,7 +346,7 @@
|
|||||||
"title": "Humidity History (30 Days)",
|
"title": "Humidity History (30 Days)",
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 24},
|
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 24},
|
||||||
"datasource": {"type": "prometheus", "uid": "prometheus"},
|
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "hass_sensor_humidity_percent",
|
"expr": "hass_sensor_humidity_percent",
|
||||||
|
|||||||
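The dashboard hunks above all make the same one-line change: the panel `datasource` keeps `"type": "prometheus"` and only the `uid` moves from `prometheus` to `victoriametrics`. A minimal sketch of how that rewrite could be scripted over a parsed dashboard is shown here. The function name `retarget_datasources` and the sample panel are hypothetical and not part of this change set.

```python
import json


def retarget_datasources(node, old_uid="prometheus", new_uid="victoriametrics"):
    """Recursively rewrite datasource uids in a parsed dashboard.

    Returns the number of datasource objects that were changed.
    The "type" field is left untouched, matching the diff above.
    """
    changed = 0
    if isinstance(node, dict):
        ds = node.get("datasource")
        if isinstance(ds, dict) and ds.get("uid") == old_uid:
            ds["uid"] = new_uid
            changed += 1
        for value in node.values():
            changed += retarget_datasources(value, old_uid, new_uid)
    elif isinstance(node, list):
        for item in node:
            changed += retarget_datasources(item, old_uid, new_uid)
    return changed


# Hypothetical single-panel dashboard, shaped like the panels in the diff.
dashboard = json.loads(
    '{"panels": [{"title": "Timers", "type": "stat",'
    ' "datasource": {"type": "prometheus", "uid": "prometheus"},'
    ' "targets": []}]}'
)
changed_count = retarget_datasources(dashboard)
```

Walking the parsed JSON instead of doing a text substitution avoids touching the `"type": "prometheus"` field or any unrelated string that happens to contain `prometheus`.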
@@ -37,6 +37,10 @@
|
|||||||
# Declarative datasources
|
# Declarative datasources
|
||||||
provision.datasources.settings = {
|
provision.datasources.settings = {
|
||||||
apiVersion = 1;
|
apiVersion = 1;
|
||||||
|
prune = true;
|
||||||
|
deleteDatasources = [
|
||||||
|
{ name = "Prometheus (monitoring01)"; orgId = 1; }
|
||||||
|
];
|
||||||
datasources = [
|
datasources = [
|
||||||
{
|
{
|
||||||
name = "VictoriaMetrics";
|
name = "VictoriaMetrics";
|
||||||
|
|||||||
@@ -61,7 +61,42 @@
|
|||||||
mode 644
|
mode 644
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
reverse_proxy http://jelly01.home.2rjus.net:8096
|
header Content-Type text/html
|
||||||
|
respond <<HTML
|
||||||
|
<!DOCTYPE html>
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
<title>Jellyfin - Maintenance</title>
|
||||||
|
<style>
|
||||||
|
body {
|
||||||
|
background: #101020;
|
||||||
|
color: #ddd;
|
||||||
|
font-family: sans-serif;
|
||||||
|
display: flex;
|
||||||
|
justify-content: center;
|
||||||
|
align-items: center;
|
||||||
|
min-height: 100vh;
|
||||||
|
margin: 0;
|
||||||
|
text-align: center;
|
||||||
|
}
|
||||||
|
.container { max-width: 500px; }
|
||||||
|
.disk { font-size: 80px; animation: spin 3s linear infinite; display: inline-block; }
|
||||||
|
@keyframes spin { from { transform: rotate(0deg); } to { transform: rotate(360deg); } }
|
||||||
|
h1 { color: #00a4dc; }
|
||||||
|
p { font-size: 1.2em; line-height: 1.6; }
|
||||||
|
</style>
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
<div class="container">
|
||||||
|
<div class="disk">💿</div>
|
||||||
|
<h1>Jellyfin is taking a nap</h1>
|
||||||
|
<p>The NAS is getting shiny new hard drives.<br>
|
||||||
|
Jellyfin will be back once the disks stop spinning up.</p>
|
||||||
|
<p style="color:#666;font-size:0.9em;">In the meantime, maybe go outside?</p>
|
||||||
|
</div>
|
||||||
|
</body>
|
||||||
|
</html>
|
||||||
|
HTML 200
|
||||||
}
|
}
|
||||||
http://http-proxy.home.2rjus.net/metrics {
|
http://http-proxy.home.2rjus.net/metrics {
|
||||||
log {
|
log {
|
||||||
|
|||||||
@@ -67,13 +67,13 @@ groups:
|
|||||||
summary: "Promtail service not running on {{ $labels.instance }}"
|
summary: "Promtail service not running on {{ $labels.instance }}"
|
||||||
description: "The promtail service has not been active on {{ $labels.instance }} for 5 minutes."
|
description: "The promtail service has not been active on {{ $labels.instance }} for 5 minutes."
|
||||||
- alert: filesystem_filling_up
|
- alert: filesystem_filling_up
|
||||||
expr: predict_linear(node_filesystem_free_bytes{mountpoint="/"}[6h], 24*3600) < 0
|
expr: predict_linear(node_filesystem_free_bytes{mountpoint="/"}[24h], 24*3600) < 0
|
||||||
for: 1h
|
for: 1h
|
||||||
labels:
|
labels:
|
||||||
severity: warning
|
severity: warning
|
||||||
annotations:
|
annotations:
|
||||||
summary: "Filesystem predicted to fill within 24h on {{ $labels.instance }}"
|
summary: "Filesystem predicted to fill within 24h on {{ $labels.instance }}"
|
||||||
description: "Based on the last 6h trend, the root filesystem on {{ $labels.instance }} is predicted to run out of space within 24 hours."
|
description: "Based on the last 24h trend, the root filesystem on {{ $labels.instance }} is predicted to run out of space within 24 hours."
|
||||||
- alert: systemd_not_running
|
- alert: systemd_not_running
|
||||||
expr: node_systemd_system_running == 0
|
expr: node_systemd_system_running == 0
|
||||||
for: 10m
|
for: 10m
|
||||||
|
|||||||
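The `filesystem_filling_up` hunk above widens the `predict_linear` window from `[6h]` to `[24h]`. The sketch below illustrates why a longer window is less sensitive to a short burst of writes. It is a simplified stand-in, not PromQL itself: it fits a least-squares line to the samples and extrapolates from the last sample rather than from the query evaluation time. The helper names and the synthetic free-space series are assumptions made for illustration.

```python
def predict_linear(samples, horizon_s):
    """Least-squares extrapolation, horizon_s seconds past the last sample.

    samples: list of (timestamp_seconds, value) pairs, oldest first.
    """
    n = len(samples)
    t_last = samples[-1][0]
    xs = [t - t_last for t, _ in samples]  # times relative to the last sample
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    intercept = mean_y - slope * mean_x  # fitted value at the last sample
    return intercept + slope * horizon_s


def window(samples, seconds):
    """Keep only samples within `seconds` of the last timestamp."""
    t_last = samples[-1][0]
    return [(t, v) for t, v in samples if t >= t_last - seconds]


HOUR = 3600
GB = 1e9

# Synthetic hourly series: 20 GB free and flat for 18 hours,
# then a burst that consumes 1 GB per hour for the final 6 hours.
series = []
for h in range(-24, 1):
    free = 20 * GB if h <= -6 else (20 - (h + 6)) * GB
    series.append((h * HOUR, free))

pred_6h = predict_linear(window(series, 6 * HOUR), 24 * HOUR)
pred_24h = predict_linear(window(series, 24 * HOUR), 24 * HOUR)

fires_with_6h_window = pred_6h < 0
fires_with_24h_window = pred_24h < 0
```

On this synthetic series the 6-hour fit sees only the burst and predicts negative free space within 24 hours, while the 24-hour fit averages the burst against the preceding flat period and stays positive. The trade-off is that a longer window also reacts more slowly to a genuine, sustained fill.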