Compare commits: 95a96b2192...jellyfin-m (16 commits)

| SHA1 |
|---|
| 16ef202530 |
| 5f3508a6d4 |
| 2ca2509083 |
| 58702bd10b |
| c9f47acb01 |
| 09ce018fb2 |
| 3042803c4d |
| 1e7200b494 |
| eec1e374b2 |
| fcc410afad |
| 59f0c7ceda |
| d713f06c6e |
| 7374d1ff7f |
| e912c75b6c |
| b218b4f8bc |
| 65acf13e6f |
@@ -73,6 +73,7 @@ Additional context, caveats, or references.
 - **Reference existing patterns**: Mention how this fits with existing infrastructure
 - **Tables for comparisons**: Use markdown tables when comparing options
 - **Practical focus**: Emphasize what needs to happen, not theory
+- **Mermaid diagrams**: Use mermaid code blocks for architecture diagrams, flow charts, or other graphs when relevant to the plan. Keep node labels short and use `<br/>` for line breaks
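A minimal illustration of that convention, with hypothetical nodes:

```mermaid
graph LR
    old["Old VM<br/>(stateful)"] -->|restic restore| new["New host<br/>(NixOS)"]
    new --> check["Verify<br/>services"]
```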
 
 ## Examples of Good Plans
 
@@ -20,9 +20,9 @@ Hosts to migrate:
 | http-proxy | Stateless | Reverse proxy, recreate |
 | nats1 | Stateless | Messaging, recreate |
 | ha1 | Stateful | Home Assistant + Zigbee2MQTT + Mosquitto |
-| monitoring01 | Stateful | Prometheus, Grafana, Loki |
+| ~~monitoring01~~ | ~~Decommission~~ | ✓ Complete — replaced by monitoring02 (VictoriaMetrics) |
 | jelly01 | Stateful | Jellyfin metadata, watch history, config |
-| pgdb1 | Decommission | Only used by Open WebUI on gunter, migrating to local postgres |
+| ~~pgdb1~~ | ~~Decommission~~ | ✓ Complete |
 | ~~jump~~ | ~~Decommission~~ | ✓ Complete |
 | ~~auth01~~ | ~~Decommission~~ | ✓ Complete |
 | ~~ca~~ | ~~Deferred~~ | ✓ Complete |
@@ -31,10 +31,12 @@ Hosts to migrate:
 
 Before migrating any stateful host, ensure restic backups are in place and verified.
 
-### 1a. Expand monitoring01 Grafana Backup
+### ~~1a. Expand monitoring01 Grafana Backup~~ ✓ N/A
 
-The existing backup only covers `/var/lib/grafana/plugins` and a sqlite dump of `grafana.db`.
-Expand to back up all of `/var/lib/grafana/` to capture config directory and any other state.
+~~The existing backup only covers `/var/lib/grafana/plugins` and a sqlite dump of `grafana.db`.
+Expand to back up all of `/var/lib/grafana/` to capture config directory and any other state.~~
+
+No longer needed — monitoring01 decommissioned, replaced by monitoring02 with declarative Grafana dashboards.
 
 ### 1b. Add Jellyfin Backup to jelly01
 
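Given the requirement above that backups be in place and verified, a verification pass for the jelly01 backup might look like the following sketch; the repository URL is illustrative, not from the plan:

```shell
# List the most recent snapshot (repo URL is a placeholder)
restic -r rest:https://restic.internal:8000/jelly01 snapshots --latest 1

# Structural check of the repository, plus a spot-check of pack data
restic -r rest:https://restic.internal:8000/jelly01 check --read-data-subset=5%
```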
@@ -94,15 +96,17 @@ For each stateful host, the procedure is:
 7. Start services and verify functionality
 8. Decommission the old VM
 
-### 3a. monitoring01
+### 3a. monitoring01 ✓ COMPLETE
 
-1. Run final Grafana backup
-2. Provision new monitoring01 via OpenTofu
-3. After bootstrap, restore `/var/lib/grafana/` from restic
-4. Restart Grafana, verify dashboards and datasources are intact
-5. Prometheus and Loki start fresh with empty data (acceptable)
-6. Verify all scrape targets are being collected
-7. Decommission old VM
+~~1. Run final Grafana backup~~
+~~2. Provision new monitoring01 via OpenTofu~~
+~~3. After bootstrap, restore `/var/lib/grafana/` from restic~~
+~~4. Restart Grafana, verify dashboards and datasources are intact~~
+~~5. Prometheus and Loki start fresh with empty data (acceptable)~~
+~~6. Verify all scrape targets are being collected~~
+~~7. Decommission old VM~~
+
+Replaced by monitoring02 with VictoriaMetrics, standalone Loki and Grafana modules. Host configuration, old service modules, and terraform resources removed.
 
 ### 3b. jelly01
 
@@ -163,19 +167,19 @@ Host was already removed from flake.nix and VM destroyed. Configuration cleaned
 
 Host configuration, services, and VM already removed.
 
-### pgdb1 (in progress)
+### pgdb1 ✓ COMPLETE
 
-Only consumer was Open WebUI on gunter, which has been migrated to use local PostgreSQL.
+~~Only consumer was Open WebUI on gunter, which has been migrated to use local PostgreSQL.~~
 
-1. ~~Verify Open WebUI on gunter is using local PostgreSQL (not pgdb1)~~ ✓
-2. ~~Remove host configuration from `hosts/pgdb1/`~~ ✓
-3. ~~Remove `services/postgres/` (only used by pgdb1)~~ ✓
-4. ~~Remove from `flake.nix`~~ ✓
-5. ~~Remove Vault AppRole from `terraform/vault/approle.tf`~~ ✓
-6. Destroy the VM in Proxmox
-7. ~~Commit cleanup~~ ✓
+~~1. Verify Open WebUI on gunter is using local PostgreSQL (not pgdb1)~~
+~~2. Remove host configuration from `hosts/pgdb1/`~~
+~~3. Remove `services/postgres/` (only used by pgdb1)~~
+~~4. Remove from `flake.nix`~~
+~~5. Remove Vault AppRole from `terraform/vault/approle.tf`~~
+~~6. Destroy the VM in Proxmox~~
+~~7. Commit cleanup~~
 
-See `docs/plans/pgdb1-decommission.md` for detailed plan.
+Host configuration, services, terraform resources, and VM removed. See `docs/plans/pgdb1-decommission.md` for detailed plan.
 
 ## Phase 5: Decommission ca Host ✓ COMPLETE
 
@@ -24,29 +24,20 @@ After evaluating WireGuard gateway vs Headscale (self-hosted Tailscale), the **W
 
 ## Architecture
 
-```
-┌─────────────────────────────────┐
-│ VPS (OpenStack)                 │
-Laptop/Phone ──→ │ WireGuard endpoint │
-(WireGuard)      │ Client peers: laptop, phone │
-│ Routes 10.69.13.0/24 via tunnel │
-└──────────┬──────────────────────┘
-           │ WireGuard tunnel
-           ▼
-┌─────────────────────────────────┐
-│ extgw01 (gateway + bastion)     │
-│ - WireGuard tunnel to VPS       │
-│ - Firewall (allowlist only)     │
-│ - SSH + 2FA (full access)       │
-└──────────┬──────────────────────┘
-           │ allowed traffic only
-           ▼
-┌─────────────────────────────────┐
-│ Internal network 10.69.13.0/24  │
-│ - monitoring01:3000 (Grafana)   │
-│ - jelly01:8096 (Jellyfin)       │
-│ - *-jail hosts (arr stack)      │
+```mermaid
+graph TD
+    clients["Laptop / Phone"]
+    vps["VPS<br/>(WireGuard endpoint)"]
+    extgw["extgw01<br/>(gateway + bastion)"]
+    grafana["Grafana<br/>monitoring01:3000"]
+    jellyfin["Jellyfin<br/>jelly01:8096"]
+    arr["arr stack<br/>*-jail hosts"]
+
+    clients -->|WireGuard| vps
+    vps -->|WireGuard tunnel| extgw
+    extgw -->|allowed traffic| grafana
+    extgw -->|allowed traffic| jellyfin
+    extgw -->|allowed traffic| arr
 ```
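A sketch of the VPS-side WireGuard configuration the diagram implies; keys, tunnel addresses, and the port are placeholders, and only the routed subnet (10.69.13.0/24) comes from the plan:

```ini
# /etc/wireguard/wg0.conf on the VPS (illustrative)
[Interface]
PrivateKey = <vps-private-key>
ListenPort = 51820

# extgw01: the internal network is routed through this peer
[Peer]
PublicKey = <extgw01-public-key>
AllowedIPs = 10.100.0.2/32, 10.69.13.0/24

# Client peers (laptop, phone) each get a single tunnel address
[Peer]
PublicKey = <laptop-public-key>
AllowedIPs = 10.100.0.10/32
```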
 
 ### Existing path (unchanged)
 
@@ -39,23 +39,17 @@ Expand storage capacity for the main hdd-pool. Since we need to add disks anyway
 - nzbget: NixOS service or OCI container
 - NFS exports: `services.nfs.server`
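A hedged sketch of what the `services.nfs.server` wiring could look like; the export paths and client subnet are assumptions, not from the plan:

```nix
{
  services.nfs.server = {
    enable = true;
    # Dataset paths and subnet are illustrative
    exports = ''
      /hdd-pool/media 10.69.13.0/24(ro,no_subtree_check)
      /hdd-pool/downloads 10.69.13.0/24(rw,no_subtree_check)
    '';
  };
}
```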
 
-### Filesystem: BTRFS RAID1
+### Filesystem: Keep ZFS
 
-**Decision**: Migrate from ZFS to BTRFS with RAID1
+**Decision**: Keep existing ZFS pool, import on NixOS
 
 **Rationale**:
-- **In-kernel**: No out-of-tree module issues like ZFS
-- **Flexible expansion**: Add individual disks, not required to buy pairs
-- **Mixed disk sizes**: Better handling than ZFS multi-vdev approach
-- **RAID level conversion**: Can convert between RAID levels in place
-- Built-in checksumming, snapshots, compression (zstd)
-- NixOS has good BTRFS support
-
-**BTRFS RAID1 notes**:
-- "RAID1" means 2 copies of all data
-- Distributes across all available devices
-- With 6+ disks, provides redundancy + capacity scaling
-- RAID5/6 avoided (known issues), RAID1/10 are stable
+- **No data migration needed**: Existing ZFS pool can be imported directly on NixOS
+- **Proven reliability**: Pool has been running reliably on TrueNAS
+- **NixOS ZFS support**: Well-supported, declarative configuration via `boot.zfs` and `services.zfs`
+- **BTRFS RAID5/6 unreliable**: Research showed BTRFS RAID5/6 write hole is still unresolved
+- **BTRFS RAID1 wasteful**: With mixed disk sizes, RAID1 wastes significant capacity vs ZFS mirrors
+- Checksumming, snapshots, compression (lz4/zstd) all available
 
 ### Hardware: Keep Existing + Add Disks
 
@@ -69,83 +63,94 @@ Expand storage capacity for the main hdd-pool. Since we need to add disks anyway
 
 **Storage architecture**:
 
-**Bulk storage** (BTRFS RAID1 on HDDs):
-- Current: 6x HDDs (2x16TB + 2x8TB + 2x8TB)
-- Add: 2x new HDDs (size TBD)
+**hdd-pool** (ZFS mirrors):
+- Current: 3 mirror vdevs (2x16TB + 2x8TB + 2x8TB) = 32TB usable
+- Add: mirror-3 with 2x 24TB = +24TB usable
+- Total after expansion: ~56TB usable
 - Use: Media, downloads, backups, non-critical data
 - Risk tolerance: High (data mostly replaceable)
 
 **Critical data** (small volume):
 - Use 2x 240GB SSDs in mirror (BTRFS or ZFS)
 - Or use 2TB NVMe for critical data
 - Risk tolerance: Low (data important but small)
 
 ### Disk Purchase Decision
 
-**Options under consideration**:
-
-**Option A: 2x 16TB drives**
-- Matches largest current drives
-- Enables potential future RAID5 if desired (6x 16TB array)
-- More conservative capacity increase
-
-**Option B: 2x 20-24TB drives**
-- Larger capacity headroom
-- Better $/TB ratio typically
-- Future-proofs better
-
-**Initial purchase**: 2 drives (chassis has space for 2 more without modifications)
+**Decision**: 2x 24TB drives (ordered, arriving 2026-02-21)
 
 ## Migration Strategy
 
 ### High-Level Plan
 
-1. **Preparation**:
-   - Purchase 2x new HDDs (16TB or 20-24TB)
-   - Create NixOS configuration for new storage host
-   - Set up bare metal NixOS installation
+1. **Expand ZFS pool** (on TrueNAS):
+   - Install 2x 24TB drives (may need new drive trays - order from abroad if needed)
+   - If chassis space is limited, temporarily replace the two oldest 8TB drives (da0/ada4)
+   - Add as mirror-3 vdev to hdd-pool
+   - Verify pool health and resilver completes
+   - Check SMART data on old 8TB drives (all healthy as of 2026-02-20, no reallocated sectors)
+   - Burn-in: at minimum short + long SMART test before adding to pool
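The burn-in step could be sketched as below; the device path is a placeholder:

```shell
dev=/dev/disk/by-id/ata-NEW24TB   # placeholder device id
smartctl -t short "$dev"          # quick electrical/mechanical check (~2 min)
smartctl -t long "$dev"           # full surface scan; many hours on a 24TB drive
# When the self-tests finish, review the test log and error counters
smartctl -a "$dev"
```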
 
-2. **Initial BTRFS pool**:
-   - Install 2 new disks
-   - Create BTRFS filesystem in RAID1
-   - Mount and test NFS exports
+2. **Prepare NixOS configuration**:
+   - Create host configuration (`hosts/nas1/` or similar)
+   - Configure ZFS pool import (`boot.zfs.extraPools`)
+   - Set up services: radarr, sonarr, nzbget, restic-rest, NFS
+   - Configure monitoring (node-exporter, promtail, smartctl-exporter)
 
-3. **Data migration**:
-   - Copy data from TrueNAS ZFS pool to new BTRFS pool over 10GbE
-   - Verify data integrity
+3. **Install NixOS**:
+   - `zpool export hdd-pool` on TrueNAS before shutdown (clean export)
+   - Wipe TrueNAS boot-pool SSDs, set up as mdadm RAID1 for NixOS root
+   - Install NixOS on mdadm mirror (keeps boot path ZFS-independent)
+   - Import hdd-pool via `boot.zfs.extraPools`
+   - Verify all datasets mount correctly
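The import step could be sketched as the following NixOS fragment; the `hostId` value is a placeholder (ZFS requires a stable one to be set):

```nix
{
  boot.supportedFilesystems = [ "zfs" ];
  # Import the existing TrueNAS pool at boot; root stays on the mdadm mirror
  boot.zfs.extraPools = [ "hdd-pool" ];
  # Placeholder value; must be a stable 8-hex-digit host id
  networking.hostId = "deadbeef";
}
```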
 
-4. **Expand pool**:
-   - As old ZFS pool is emptied, wipe drives and add to BTRFS pool
-   - Pool grows incrementally: 2 → 4 → 6 → 8 disks
-   - BTRFS rebalances data across new devices
+4. **Service migration**:
+   - Configure NixOS services to use ZFS dataset paths
+   - Update NFS exports
+   - Test from consuming hosts
 
-5. **Service migration**:
-   - Set up radarr/sonarr/nzbget/restic as NixOS services
-   - Update NFS client mounts on consuming hosts
-
-6. **Cutover**:
-   - Point consumers to new NAS host
+5. **Cutover**:
+   - Update DNS/client mounts if IP changes
+   - Verify monitoring integration
    - Decommission TrueNAS
    - Repurpose hardware or keep as spare
 
+### Post-Expansion: Vdev Rebalancing
+
+ZFS has no built-in rebalance command. After adding the new 24TB vdev, ZFS will
+write new data preferentially to it (most free space), leaving old vdevs packed
+at ~97%. This is suboptimal but not urgent once overall pool usage drops to ~50%.
+
+To gradually rebalance, rewrite files in place so ZFS redistributes blocks across
+all vdevs proportional to free space:
+
+```bash
+# Rewrite files individually (spreads blocks across all vdevs).
+# cp -a preserves mode/ownership/timestamps; --reflink=never forces a real
+# copy so block cloning cannot short-circuit the rewrite.
+find /pool/dataset -type f -exec sh -c '
+  for f; do cp -a --reflink=never "$f" "$f.rebal" && mv "$f.rebal" "$f"; done
+' _ {} +
+```
 
+Avoid `zfs send/recv` for large datasets (e.g. 20TB) as this would concentrate
+data on the emptiest vdev rather than spreading it evenly.
+
+**Recommendation**: Do this after NixOS migration is stable. Not urgent - the pool
+will function fine with uneven distribution, just slightly suboptimal for performance.
+
 ### Migration Advantages
 
-- **Low risk**: New pool created independently, old data remains intact during migration
-- **Incremental**: Can add old disks one at a time as space allows
-- **Flexible**: BTRFS handles mixed disk sizes gracefully
-- **Reversible**: Keep TrueNAS running until fully validated
+- **No data migration**: ZFS pool imported directly, no copying terabytes of data
+- **Low risk**: Pool expansion done on stable TrueNAS before OS swap
+- **Reversible**: Can boot back to TrueNAS if NixOS has issues (ZFS pool is OS-independent)
+- **Quick cutover**: Once NixOS config is ready, the OS swap is fast
 
 ## Next Steps
 
-1. Decide on disk size (16TB vs 20-24TB)
-2. Purchase disks
-3. Design NixOS host configuration (`hosts/nas1/`)
-4. Plan detailed migration timeline
-5. Document NFS export mapping (current → new)
+1. ~~Decide on disk size~~ - 2x 24TB ordered
+2. Install drives and add mirror vdev to ZFS pool
+3. Check SMART data on 8TB drives - decide whether to keep or retire
+4. Design NixOS host configuration (`hosts/nas1/`)
+5. Document NFS export mapping (current -> new)
+6. Plan NixOS installation and cutover
 
 ## Open Questions
 
-- [ ] Final decision on disk size?
 - [ ] Hostname for new NAS host? (nas1? storage1?)
-- [ ] IP address allocation (keep 10.69.12.50 or new IP?)
-- [ ] Timeline/maintenance window for migration?
+- [ ] IP address/subnet: NAS and Proxmox are both on 10GbE to the same switch but different subnets, forcing traffic through the router (bottleneck). Move to same subnet during migration.
+- [x] Boot drive: Reuse TrueNAS boot-pool SSDs as mdadm RAID1 for NixOS root (no ZFS on boot path)
+- [ ] Retire old 8TB drives? (SMART looks healthy, keep unless chassis space is needed)
+- [ ] Drive trays: do new 24TB drives fit, or order trays from abroad?
+- [ ] Timeline/maintenance window for NixOS swap?
 
flake.lock (generated, 20 additions, 20 deletions)
@@ -28,11 +28,11 @@
         ]
       },
       "locked": {
-        "lastModified": 1771004123,
-        "narHash": "sha256-Jw36EzL4IGIc2TmeZGphAAUrJXoWqfvCbybF8bTHgMA=",
+        "lastModified": 1771488195,
+        "narHash": "sha256-2kMxqdDyPluRQRoES22Y0oSjp7pc5fj2nRterfmSIyc=",
         "ref": "master",
-        "rev": "e5e8be86ecdcae8a5962ba3bddddfe91b574792b",
-        "revCount": 36,
+        "rev": "2d26de50559d8acb82ea803764e138325d95572c",
+        "revCount": 37,
         "type": "git",
         "url": "https://git.t-juice.club/torjus/homelab-deploy"
       },
@@ -64,11 +64,11 @@
     },
     "nixpkgs": {
       "locked": {
-        "lastModified": 1771043024,
-        "narHash": "sha256-O1XDr7EWbRp+kHrNNgLWgIrB0/US5wvw9K6RERWAj6I=",
+        "lastModified": 1771419570,
+        "narHash": "sha256-bxAlQgre3pcQcaRUm/8A0v/X8d2nhfraWSFqVmMcBcU=",
         "owner": "nixos",
        "repo": "nixpkgs",
-        "rev": "3aadb7ca9eac2891d52a9dec199d9580a6e2bf44",
+        "rev": "6d41bc27aaf7b6a3ba6b169db3bd5d6159cfaa47",
         "type": "github"
       },
      "original": {
@@ -80,11 +80,11 @@
     },
     "nixpkgs-unstable": {
       "locked": {
-        "lastModified": 1771008912,
-        "narHash": "sha256-gf2AmWVTs8lEq7z/3ZAsgnZDhWIckkb+ZnAo5RzSxJg=",
+        "lastModified": 1771369470,
+        "narHash": "sha256-0NBlEBKkN3lufyvFegY4TYv5mCNHbi5OmBDrzihbBMQ=",
         "owner": "nixos",
         "repo": "nixpkgs",
-        "rev": "a82ccc39b39b621151d6732718e3e250109076fa",
+        "rev": "0182a361324364ae3f436a63005877674cf45efb",
         "type": "github"
       },
       "original": {
@@ -25,7 +25,7 @@
     };
   };
 
-  timeout = 7200;
+  timeout = 14400;
   metrics.enable = true;
 };
 
@@ -19,7 +19,7 @@
       "title": "SSH Connections",
       "type": "stat",
       "gridPos": {"h": 4, "w": 6, "x": 0, "y": 0},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "sum(oubliette_ssh_connections_total{job=\"apiary\"})",
@@ -51,7 +51,7 @@
       "title": "Active Sessions",
       "type": "stat",
       "gridPos": {"h": 4, "w": 6, "x": 6, "y": 0},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "oubliette_sessions_active{job=\"apiary\"}",
@@ -86,7 +86,7 @@
       "title": "Unique IPs",
       "type": "stat",
       "gridPos": {"h": 4, "w": 6, "x": 12, "y": 0},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "oubliette_storage_unique_ips{job=\"apiary\"}",
@@ -118,7 +118,7 @@
       "title": "Total Login Attempts",
       "type": "stat",
       "gridPos": {"h": 4, "w": 6, "x": 18, "y": 0},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "oubliette_storage_login_attempts_total{job=\"apiary\"}",
@@ -150,7 +150,7 @@
       "title": "SSH Connections Over Time",
       "type": "timeseries",
       "gridPos": {"h": 8, "w": 12, "x": 0, "y": 4},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "interval": "60s",
       "targets": [
         {
@@ -183,7 +183,7 @@
       "title": "Auth Attempts Over Time",
       "type": "timeseries",
       "gridPos": {"h": 8, "w": 12, "x": 12, "y": 4},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "interval": "60s",
       "targets": [
         {
@@ -216,7 +216,7 @@
       "title": "Sessions by Shell",
       "type": "timeseries",
       "gridPos": {"h": 8, "w": 12, "x": 0, "y": 22},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "interval": "60s",
       "targets": [
         {
@@ -249,7 +249,7 @@
       "title": "Attempts by Country",
       "type": "geomap",
       "gridPos": {"h": 10, "w": 24, "x": 0, "y": 12},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "oubliette_auth_attempts_by_country_total{job=\"apiary\"}",
@@ -318,7 +318,7 @@
       "title": "Session Duration Distribution",
       "type": "heatmap",
       "gridPos": {"h": 8, "w": 24, "x": 0, "y": 30},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "interval": "60s",
       "targets": [
         {
@@ -359,7 +359,7 @@
       "title": "Commands Executed by Shell",
       "type": "timeseries",
       "gridPos": {"h": 8, "w": 12, "x": 12, "y": 22},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "interval": "60s",
       "targets": [
         {
@@ -16,7 +16,7 @@
       "title": "Endpoints Monitored",
       "type": "stat",
       "gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "count(probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"})",
@@ -48,7 +48,7 @@
       "title": "Probe Failures",
       "type": "stat",
       "gridPos": {"h": 4, "w": 4, "x": 4, "y": 0},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "count(probe_success{job=\"blackbox_tls\"} == 0) or vector(0)",
@@ -82,7 +82,7 @@
       "title": "Expiring Soon (< 7d)",
       "type": "stat",
       "gridPos": {"h": 4, "w": 4, "x": 8, "y": 0},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "count((probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"} - time()) < 86400 * 7) or vector(0)",
@@ -116,7 +116,7 @@
       "title": "Expiring Critical (< 24h)",
       "type": "stat",
       "gridPos": {"h": 4, "w": 4, "x": 12, "y": 0},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "count((probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"} - time()) < 86400) or vector(0)",
@@ -150,7 +150,7 @@
       "title": "Minimum Days Remaining",
       "type": "gauge",
       "gridPos": {"h": 4, "w": 8, "x": 16, "y": 0},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "min((probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"} - time()) / 86400)",
@@ -187,7 +187,7 @@
       "title": "Certificate Expiry by Endpoint",
       "type": "table",
       "gridPos": {"h": 12, "w": 12, "x": 0, "y": 4},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "(probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"} - time()) / 86400",
@@ -253,7 +253,7 @@
       "title": "Probe Status",
       "type": "table",
       "gridPos": {"h": 12, "w": 12, "x": 12, "y": 4},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "probe_success{job=\"blackbox_tls\"}",
@@ -340,7 +340,7 @@
       "title": "Certificate Expiry Over Time",
       "type": "timeseries",
       "gridPos": {"h": 8, "w": 24, "x": 0, "y": 16},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "(probe_ssl_earliest_cert_expiry{job=\"blackbox_tls\"} - time()) / 86400",
@@ -378,7 +378,7 @@
       "title": "Probe Success Rate",
       "type": "timeseries",
       "gridPos": {"h": 8, "w": 12, "x": 0, "y": 24},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "avg(probe_success{job=\"blackbox_tls\"}) * 100",
@@ -418,7 +418,7 @@
       "title": "Probe Duration",
       "type": "timeseries",
       "gridPos": {"h": 8, "w": 12, "x": 12, "y": 24},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "probe_duration_seconds{job=\"blackbox_tls\"}",
@@ -15,7 +15,7 @@
     {
       "name": "tier",
       "type": "query",
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "query": "label_values(nixos_flake_info, tier)",
       "refresh": 2,
       "includeAll": true,
@@ -30,7 +30,7 @@
       "title": "Hosts Behind Remote",
       "type": "stat",
       "gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "count(nixos_flake_revision_behind{tier=~\"$tier\"} == 1)",
@@ -65,7 +65,7 @@
       "title": "Hosts Needing Reboot",
       "type": "stat",
       "gridPos": {"h": 4, "w": 4, "x": 4, "y": 0},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "count(nixos_config_mismatch{tier=~\"$tier\"} == 1)",
@@ -100,7 +100,7 @@
       "title": "Total Hosts",
       "type": "stat",
       "gridPos": {"h": 4, "w": 3, "x": 8, "y": 0},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "count(nixos_flake_info{tier=~\"$tier\"})",
@@ -128,7 +128,7 @@
       "title": "Nixpkgs Age",
       "type": "stat",
       "gridPos": {"h": 4, "w": 3, "x": 11, "y": 0},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "max(nixos_flake_input_age_seconds{input=\"nixpkgs\", tier=~\"$tier\"})",
@@ -163,7 +163,7 @@
       "title": "Hosts Up-to-date",
       "type": "stat",
       "gridPos": {"h": 4, "w": 3, "x": 14, "y": 0},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "count(nixos_flake_revision_behind{tier=~\"$tier\"} == 0)",
@@ -192,7 +192,7 @@
       "title": "Deployments (24h)",
       "type": "stat",
       "gridPos": {"h": 4, "w": 3, "x": 17, "y": 0},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "sum(increase(homelab_deploy_deployments_total{status=\"completed\"}[24h]))",
@@ -222,7 +222,7 @@
       "title": "Avg Deploy Time",
       "type": "stat",
       "gridPos": {"h": 4, "w": 4, "x": 20, "y": 0},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "sum(increase(homelab_deploy_deployment_duration_seconds_sum{success=\"true\"}[24h])) / sum(increase(homelab_deploy_deployment_duration_seconds_count{success=\"true\"}[24h]))",
@@ -256,7 +256,7 @@
       "title": "Fleet Status",
       "type": "table",
       "gridPos": {"h": 10, "w": 24, "x": 0, "y": 4},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "nixos_flake_info{tier=~\"$tier\"}",
@@ -430,7 +430,7 @@
       "title": "Generation Age by Host",
       "type": "bargauge",
       "gridPos": {"h": 8, "w": 12, "x": 0, "y": 14},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "sort_desc(nixos_generation_age_seconds{tier=~\"$tier\"})",
@@ -467,7 +467,7 @@
       "title": "Generations per Host",
       "type": "bargauge",
       "gridPos": {"h": 8, "w": 12, "x": 12, "y": 14},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "sort_desc(nixos_generation_count{tier=~\"$tier\"})",
@@ -501,7 +501,7 @@
       "title": "Deployment Activity (Generation Age Over Time)",
       "type": "timeseries",
       "gridPos": {"h": 8, "w": 24, "x": 0, "y": 22},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "nixos_generation_age_seconds{tier=~\"$tier\"}",
@@ -534,7 +534,7 @@
       "title": "Flake Input Ages",
       "type": "table",
       "gridPos": {"h": 6, "w": 12, "x": 0, "y": 30},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "max by (input) (nixos_flake_input_age_seconds)",
@@ -577,7 +577,7 @@
       "title": "Hosts by Revision",
       "type": "piechart",
       "gridPos": {"h": 6, "w": 6, "x": 12, "y": 30},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "count by (current_rev) (nixos_flake_info{tier=~\"$tier\"})",
@@ -601,7 +601,7 @@
       "title": "Hosts by Tier",
       "type": "piechart",
       "gridPos": {"h": 6, "w": 6, "x": 18, "y": 30},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "count by (tier) (nixos_flake_info)",
@@ -641,7 +641,7 @@
       "title": "Builds (24h)",
       "type": "stat",
       "gridPos": {"h": 4, "w": 4, "x": 0, "y": 37},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "sum(increase(homelab_deploy_build_host_total{status=\"success\"}[24h]))",
@@ -671,7 +671,7 @@
       "title": "Failed Builds (24h)",
       "type": "stat",
       "gridPos": {"h": 4, "w": 4, "x": 4, "y": 37},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "sum(increase(homelab_deploy_build_host_total{status=\"failure\"}[24h])) or vector(0)",
@@ -705,7 +705,7 @@
       "title": "Last Build",
       "type": "stat",
       "gridPos": {"h": 4, "w": 4, "x": 8, "y": 37},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "time() - max(homelab_deploy_build_last_timestamp)",
@@ -739,7 +739,7 @@
       "title": "Avg Build Time",
       "type": "stat",
       "gridPos": {"h": 4, "w": 4, "x": 12, "y": 37},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "sum(increase(homelab_deploy_build_duration_seconds_sum[24h])) / sum(increase(homelab_deploy_build_duration_seconds_count[24h]))",
@@ -773,7 +773,7 @@
       "title": "Total Hosts Built",
       "type": "stat",
       "gridPos": {"h": 4, "w": 4, "x": 16, "y": 37},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "count(homelab_deploy_build_duration_seconds_count)",
@@ -802,7 +802,7 @@
       "title": "Build Jobs (24h)",
       "type": "stat",
       "gridPos": {"h": 4, "w": 4, "x": 20, "y": 37},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "sum(increase(homelab_deploy_builds_total[24h]))",
@@ -832,7 +832,7 @@
       "title": "Build Time by Host",
       "type": "bargauge",
       "gridPos": {"h": 8, "w": 12, "x": 0, "y": 41},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "sort_desc(homelab_deploy_build_duration_seconds_sum / homelab_deploy_build_duration_seconds_count)",
@@ -869,7 +869,7 @@
       "title": "Build Count by Host",
       "type": "bargauge",
       "gridPos": {"h": 8, "w": 12, "x": 12, "y": 41},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
       "targets": [
         {
           "expr": "sort_desc(sum by (host) (homelab_deploy_build_host_total))",
@@ -903,7 +903,7 @@
       "title": "Build Activity",
       "type": "timeseries",
       "gridPos": {"h": 8, "w": 24, "x": 0, "y": 49},
-      "datasource": {"type": "prometheus", "uid": "prometheus"},
+      "datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||
"datasource": {"type": "prometheus", "uid": "victoriametrics"},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(increase(homelab_deploy_build_host_total{status=\"success\"}[1h]))",
|
||||
|
||||
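The datasource swap repeated across every panel in these dashboards is mechanical, so it lends itself to a one-off script rather than hand-editing each JSON file. A minimal sketch of that idea (the `prometheus` and `victoriametrics` uids come from the diff; the helper function and sample panel are illustrative, not part of the repo):

```python
import json

def retarget_datasources(node, old_uid="prometheus", new_uid="victoriametrics"):
    """Recursively rewrite Grafana datasource refs in a dashboard JSON tree."""
    if isinstance(node, dict):
        ds = node.get("datasource")
        if isinstance(ds, dict) and ds.get("uid") == old_uid:
            ds["uid"] = new_uid
        for value in node.values():
            retarget_datasources(value, old_uid, new_uid)
    elif isinstance(node, list):
        for item in node:
            retarget_datasources(item, old_uid, new_uid)
    return node

# A cut-down panel shaped like the ones in the diff; real dashboards would be
# loaded with json.load() and written back out after rewriting.
panel = {
    "title": "CPU Usage",
    "datasource": {"type": "prometheus", "uid": "prometheus"},
    "targets": [{"datasource": {"type": "prometheus", "uid": "prometheus"}}],
}
retarget_datasources(panel)
```

Walking the whole tree (rather than only top-level panels) matters because Grafana stores datasource refs on panels, targets, and template variables alike, as the hunks above show.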
@@ -11,7 +11,7 @@
 {
 "name": "instance",
 "type": "query",
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "query": "label_values(node_uname_info, instance)",
 "refresh": 2,
 "includeAll": false,
@@ -26,7 +26,7 @@
 "title": "CPU Usage",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\", instance=~\"$instance\"}[5m])) * 100)",
@@ -55,7 +55,7 @@
 "title": "Memory Usage",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "(1 - (node_memory_MemAvailable_bytes{instance=~\"$instance\"} / node_memory_MemTotal_bytes{instance=~\"$instance\"})) * 100",
@@ -84,7 +84,7 @@
 "title": "Disk Usage",
 "type": "gauge",
 "gridPos": {"h": 8, "w": 8, "x": 0, "y": 8},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "100 - ((node_filesystem_avail_bytes{instance=~\"$instance\",mountpoint=\"/\",fstype!=\"rootfs\"} / node_filesystem_size_bytes{instance=~\"$instance\",mountpoint=\"/\",fstype!=\"rootfs\"}) * 100)",
@@ -113,7 +113,7 @@
 "title": "System Load",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 8, "x": 8, "y": 8},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "node_load1{instance=~\"$instance\"}",
@@ -142,7 +142,7 @@
 "title": "Uptime",
 "type": "stat",
 "gridPos": {"h": 8, "w": 8, "x": 16, "y": 8},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "time() - node_boot_time_seconds{instance=~\"$instance\"}",
@@ -161,7 +161,7 @@
 "title": "Network Traffic",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 0, "y": 16},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "rate(node_network_receive_bytes_total{instance=~\"$instance\",device!~\"lo|veth.*|br.*|docker.*\"}[5m])",
@@ -185,7 +185,7 @@
 "title": "Disk I/O",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 12, "y": 16},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "rate(node_disk_read_bytes_total{instance=~\"$instance\",device!~\"dm-.*\"}[5m])",
@@ -15,7 +15,7 @@
 {
 "name": "vm",
 "type": "query",
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "query": "label_values(pve_guest_info{template=\"0\"}, name)",
 "refresh": 2,
 "includeAll": true,
@@ -30,7 +30,7 @@
 "title": "VMs Running",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(pve_up{id=~\"qemu/.*\"} * on(id) pve_guest_info{template=\"0\"} == 1)",
@@ -56,7 +56,7 @@
 "title": "VMs Stopped",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 4, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(pve_up{id=~\"qemu/.*\"} * on(id) pve_guest_info{template=\"0\"} == 0)",
@@ -87,7 +87,7 @@
 "title": "Node CPU",
 "type": "gauge",
 "gridPos": {"h": 4, "w": 4, "x": 8, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "pve_cpu_usage_ratio{id=~\"node/.*\"} * 100",
@@ -120,7 +120,7 @@
 "title": "Node Memory",
 "type": "gauge",
 "gridPos": {"h": 4, "w": 4, "x": 12, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "pve_memory_usage_bytes{id=~\"node/.*\"} / pve_memory_size_bytes{id=~\"node/.*\"} * 100",
@@ -153,7 +153,7 @@
 "title": "Node Uptime",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 16, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "pve_uptime_seconds{id=~\"node/.*\"}",
@@ -180,7 +180,7 @@
 "title": "Templates",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 20, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(pve_guest_info{template=\"1\"})",
@@ -206,7 +206,7 @@
 "title": "VM Status",
 "type": "table",
 "gridPos": {"h": 10, "w": 24, "x": 0, "y": 4},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "pve_guest_info{template=\"0\", name=~\"$vm\"}",
@@ -362,7 +362,7 @@
 "title": "VM CPU Usage",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 0, "y": 14},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "pve_cpu_usage_ratio{id=~\"qemu/.*\"} * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"} * 100",
@@ -391,7 +391,7 @@
 "title": "VM Memory Usage",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 12, "y": 14},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "pve_memory_usage_bytes{id=~\"qemu/.*\"} * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
@@ -420,7 +420,7 @@
 "title": "VM Network Traffic",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 0, "y": 22},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "rate(pve_network_receive_bytes{id=~\"qemu/.*\"}[5m]) * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
@@ -453,7 +453,7 @@
 "title": "VM Disk I/O",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 12, "y": 22},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "rate(pve_disk_read_bytes{id=~\"qemu/.*\"}[5m]) * on(id) group_left(name) pve_guest_info{template=\"0\", name=~\"$vm\"}",
@@ -486,7 +486,7 @@
 "title": "Storage Usage",
 "type": "bargauge",
 "gridPos": {"h": 6, "w": 12, "x": 0, "y": 30},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "pve_disk_usage_bytes{id=~\"storage/.*\"} / pve_disk_size_bytes{id=~\"storage/.*\"} * 100",
@@ -531,7 +531,7 @@
 "title": "Storage Capacity",
 "type": "table",
 "gridPos": {"h": 6, "w": 12, "x": 12, "y": 30},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "pve_disk_size_bytes{id=~\"storage/.*\"}",
@@ -15,7 +15,7 @@
 {
 "name": "hostname",
 "type": "query",
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "query": "label_values(systemd_unit_state, hostname)",
 "refresh": 2,
 "includeAll": true,
@@ -30,7 +30,7 @@
 "title": "Failed Units",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(systemd_unit_state{state=\"failed\", hostname=~\"$hostname\"} == 1) or vector(0)",
@@ -60,7 +60,7 @@
 "title": "Active Units",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 4, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(systemd_unit_state{state=\"active\", hostname=~\"$hostname\"} == 1)",
@@ -86,7 +86,7 @@
 "title": "Hosts Monitored",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 8, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(count by (hostname) (systemd_unit_state{hostname=~\"$hostname\"}))",
@@ -112,7 +112,7 @@
 "title": "Total Service Restarts",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 12, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "sum(systemd_service_restart_total{hostname=~\"$hostname\"})",
@@ -143,7 +143,7 @@
 "title": "Inactive Units",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 16, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(systemd_unit_state{state=\"inactive\", hostname=~\"$hostname\"} == 1)",
@@ -169,7 +169,7 @@
 "title": "Timers",
 "type": "stat",
 "gridPos": {"h": 4, "w": 4, "x": 20, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "count(systemd_timer_last_trigger_seconds{hostname=~\"$hostname\"})",
@@ -195,7 +195,7 @@
 "title": "Failed Units",
 "type": "table",
 "gridPos": {"h": 6, "w": 12, "x": 0, "y": 4},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "systemd_unit_state{state=\"failed\", hostname=~\"$hostname\"} == 1",
@@ -251,7 +251,7 @@
 "title": "Service Restarts (Top 15)",
 "type": "table",
 "gridPos": {"h": 6, "w": 12, "x": 12, "y": 4},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "topk(15, systemd_service_restart_total{hostname=~\"$hostname\"} > 0)",
@@ -309,7 +309,7 @@
 "title": "Active Units per Host",
 "type": "bargauge",
 "gridPos": {"h": 8, "w": 12, "x": 0, "y": 10},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "sort_desc(count by (hostname) (systemd_unit_state{state=\"active\", hostname=~\"$hostname\"} == 1))",
@@ -339,7 +339,7 @@
 "title": "NixOS Upgrade Timers",
 "type": "table",
 "gridPos": {"h": 8, "w": 12, "x": 12, "y": 10},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "systemd_timer_last_trigger_seconds{name=\"nixos-upgrade.timer\", hostname=~\"$hostname\"}",
@@ -429,7 +429,7 @@
 "title": "Backup Timers",
 "type": "table",
 "gridPos": {"h": 6, "w": 12, "x": 0, "y": 18},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "systemd_timer_last_trigger_seconds{name=~\"restic.*\", hostname=~\"$hostname\"}",
@@ -524,7 +524,7 @@
 "title": "Service Restarts Over Time",
 "type": "timeseries",
 "gridPos": {"h": 6, "w": 12, "x": 12, "y": 18},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "sum by (hostname) (increase(systemd_service_restart_total{hostname=~\"$hostname\"}[1h]))",
@@ -19,7 +19,7 @@
 "title": "Current Temperatures",
 "type": "stat",
 "gridPos": {"h": 6, "w": 12, "x": 0, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}",
@@ -71,7 +71,7 @@
 "title": "Average Home Temperature",
 "type": "gauge",
 "gridPos": {"h": 6, "w": 6, "x": 12, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "avg(hass_sensor_temperature_celsius{entity!~\".*device_temperature|.*server.*\"})",
@@ -108,7 +108,7 @@
 "title": "Current Humidity",
 "type": "stat",
 "gridPos": {"h": 6, "w": 6, "x": 18, "y": 0},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "hass_sensor_humidity_percent{entity!~\".*server.*\"}",
@@ -154,7 +154,7 @@
 "title": "Temperature History (30 Days)",
 "type": "timeseries",
 "gridPos": {"h": 10, "w": 24, "x": 0, "y": 6},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}",
@@ -207,7 +207,7 @@
 "title": "Temperature Trend (1h rate of change)",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 12, "x": 0, "y": 16},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "deriv(hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}[1h]) * 3600",
@@ -268,7 +268,7 @@
 "title": "24h Min / Max / Avg",
 "type": "table",
 "gridPos": {"h": 8, "w": 12, "x": 12, "y": 16},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "min_over_time(hass_sensor_temperature_celsius{entity!~\".*device_temperature\"}[24h])",
@@ -346,7 +346,7 @@
 "title": "Humidity History (30 Days)",
 "type": "timeseries",
 "gridPos": {"h": 8, "w": 24, "x": 0, "y": 24},
-"datasource": {"type": "prometheus", "uid": "prometheus"},
+"datasource": {"type": "prometheus", "uid": "victoriametrics"},
 "targets": [
 {
 "expr": "hass_sensor_humidity_percent",
@@ -37,6 +37,10 @@
 # Declarative datasources
 provision.datasources.settings = {
 apiVersion = 1;
+prune = true;
+deleteDatasources = [
+{ name = "Prometheus (monitoring01)"; orgId = 1; }
+];
 datasources = [
 {
 name = "VictoriaMetrics";
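Grafana's datasource provisioning file is YAML, and YAML is a superset of JSON, so the structure the Nix settings above describe can be sketched as a hand-built JSON equivalent. This is illustrative only (the real module renders more fields, e.g. the datasource URL, which are omitted here):

```python
import json

# Hand-built equivalent of the provisioning settings in the diff above:
# prune plus deleteDatasources removes the old monitoring01 Prometheus entry,
# while the new VictoriaMetrics datasource is declared alongside it.
provisioning = {
    "apiVersion": 1,
    "prune": True,
    "deleteDatasources": [
        {"name": "Prometheus (monitoring01)", "orgId": 1},
    ],
    "datasources": [
        # uid matches the "victoriametrics" uid the dashboards now reference.
        {"name": "VictoriaMetrics", "type": "prometheus", "uid": "victoriametrics"},
    ],
}
rendered = json.dumps(provisioning, indent=2)
```

Keeping the dashboard uids and the provisioned datasource uid in lockstep is the whole point: a panel whose `datasource.uid` has no provisioned counterpart renders as "datasource not found".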
@@ -61,7 +61,42 @@
 mode 644
 }
 }
-reverse_proxy http://jelly01.home.2rjus.net:8096
+header Content-Type text/html
+respond <<HTML
+<!DOCTYPE html>
+<html>
+<head>
+<title>Jellyfin - Maintenance</title>
+<style>
+body {
+background: #101020;
+color: #ddd;
+font-family: sans-serif;
+display: flex;
+justify-content: center;
+align-items: center;
+min-height: 100vh;
+margin: 0;
+text-align: center;
+}
+.container { max-width: 500px; }
+.disk { font-size: 80px; animation: spin 3s linear infinite; display: inline-block; }
+@keyframes spin { from { transform: rotate(0deg); } to { transform: rotate(360deg); } }
+h1 { color: #00a4dc; }
+p { font-size: 1.2em; line-height: 1.6; }
+</style>
+</head>
+<body>
+<div class="container">
+<div class="disk">💿</div>
+<h1>Jellyfin is taking a nap</h1>
+<p>The NAS is getting shiny new hard drives.<br>
+Jellyfin will be back once the disks stop spinning up.</p>
+<p style="color:#666;font-size:0.9em;">In the meantime, maybe go outside?</p>
+</div>
+</body>
+</html>
+HTML 200
 }
 http://http-proxy.home.2rjus.net/metrics {
 log {
@@ -67,13 +67,13 @@ groups:
 summary: "Promtail service not running on {{ $labels.instance }}"
 description: "The promtail service has not been active on {{ $labels.instance }} for 5 minutes."
 - alert: filesystem_filling_up
-expr: predict_linear(node_filesystem_free_bytes{mountpoint="/"}[6h], 24*3600) < 0
+expr: predict_linear(node_filesystem_free_bytes{mountpoint="/"}[24h], 24*3600) < 0
 for: 1h
 labels:
 severity: warning
 annotations:
 summary: "Filesystem predicted to fill within 24h on {{ $labels.instance }}"
-description: "Based on the last 6h trend, the root filesystem on {{ $labels.instance }} is predicted to run out of space within 24 hours."
+description: "Based on the last 24h trend, the root filesystem on {{ $labels.instance }} is predicted to run out of space within 24 hours."
 - alert: systemd_not_running
 expr: node_systemd_system_running == 0
 for: 10m