6 Commits

Author SHA1 Message Date
16ef202530 http-proxy: set content-type header on maintenance page
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m23s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 12:43:12 +01:00
5f3508a6d4 http-proxy: temporary jellyfin maintenance page
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 12:39:18 +01:00
2ca2509083 monitoring: increase filesystem_filling_up prediction window to 24h
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m55s
Reduces false positives from transient Nix store growth by basing the
linear prediction on a 24h trend instead of 6h.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 09:36:27 +01:00
58702bd10b truenas-migration: note subnet issue for 10GbE traffic
Some checks failed
Run nix flake check / flake-check (push) Failing after 7m10s
NAS and Proxmox are on the same 10GbE switch but different subnets,
forcing traffic through the router. Need to fix during migration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 01:34:46 +01:00
c9f47acb01 truenas-migration: mdadm boot mirror, clean zfs export step
Use TrueNAS boot-pool SSDs as mdadm RAID1 for NixOS root to keep
the boot path ZFS-independent. Added zfs export step before shutdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 01:34:46 +01:00
09ce018fb2 truenas-migration: switch from BTRFS to keeping ZFS, update plan
BTRFS RAID5/6 write hole is still unresolved, and RAID1 wastes
capacity with mixed disk sizes. Keep existing ZFS pool and import
directly on NixOS instead. Updated migration strategy, disk purchase
decision (2x 24TB ordered), SMART health notes, and vdev rebalancing
guidance.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 01:34:46 +01:00
3 changed files with 113 additions and 73 deletions

View File

@@ -39,23 +39,17 @@ Expand storage capacity for the main hdd-pool. Since we need to add disks anyway
- nzbget: NixOS service or OCI container
- NFS exports: `services.nfs.server`
### Filesystem: BTRFS RAID1
### Filesystem: Keep ZFS
**Decision**: Migrate from ZFS to BTRFS with RAID1
**Decision**: Keep existing ZFS pool, import on NixOS
**Rationale**:
- **In-kernel**: No out-of-tree module issues like ZFS
- **Flexible expansion**: Add individual disks, not required to buy pairs
- **Mixed disk sizes**: Better handling than ZFS multi-vdev approach
- **RAID level conversion**: Can convert between RAID levels in place
- Built-in checksumming, snapshots, compression (zstd)
- NixOS has good BTRFS support
**BTRFS RAID1 notes**:
- "RAID1" means 2 copies of all data
- Distributes across all available devices
- With 6+ disks, provides redundancy + capacity scaling
- RAID5/6 avoided (known issues), RAID1/10 are stable
- **No data migration needed**: Existing ZFS pool can be imported directly on NixOS
- **Proven reliability**: Pool has been running reliably on TrueNAS
- **NixOS ZFS support**: Well-supported, declarative configuration via `boot.zfs` and `services.zfs`
- **BTRFS RAID5/6 unreliable**: Research showed BTRFS RAID5/6 write hole is still unresolved
- **BTRFS RAID1 wasteful**: With mixed disk sizes, RAID1 wastes significant capacity vs ZFS mirrors
- Checksumming, snapshots, compression (lz4/zstd) all available
### Hardware: Keep Existing + Add Disks
@@ -69,83 +63,94 @@ Expand storage capacity for the main hdd-pool. Since we need to add disks anyway
**Storage architecture**:
**Bulk storage** (BTRFS RAID1 on HDDs):
- Current: 6x HDDs (2x16TB + 2x8TB + 2x8TB)
- Add: 2x new HDDs (size TBD)
**hdd-pool** (ZFS mirrors):
- Current: 3 mirror vdevs (2x16TB + 2x8TB + 2x8TB) = 32TB usable
- Add: mirror-3 with 2x 24TB = +24TB usable
- Total after expansion: ~56TB usable
- Use: Media, downloads, backups, non-critical data
- Risk tolerance: High (data mostly replaceable)
**Critical data** (small volume):
- Use 2x 240GB SSDs in mirror (BTRFS or ZFS)
- Or use 2TB NVMe for critical data
- Risk tolerance: Low (data important but small)
### Disk Purchase Decision
**Options under consideration**:
**Option A: 2x 16TB drives**
- Matches largest current drives
- Enables potential future RAID5 if desired (6x 16TB array)
- More conservative capacity increase
**Option B: 2x 20-24TB drives**
- Larger capacity headroom
- Better $/TB ratio typically
- Future-proofs better
**Initial purchase**: 2 drives (chassis has space for 2 more without modifications)
**Decision**: 2x 24TB drives (ordered, arriving 2026-02-21)
## Migration Strategy
### High-Level Plan
1. **Preparation**:
- Purchase 2x new HDDs (16TB or 20-24TB)
- Create NixOS configuration for new storage host
- Set up bare metal NixOS installation
1. **Expand ZFS pool** (on TrueNAS):
- Install 2x 24TB drives (may need new drive trays - order from abroad if needed)
- If chassis space is limited, temporarily replace the two oldest 8TB drives (da0/ada4)
- Add as mirror-3 vdev to hdd-pool
- Verify pool health and resilver completes
- Check SMART data on old 8TB drives (all healthy as of 2026-02-20, no reallocated sectors)
- Burn-in: at minimum short + long SMART test before adding to pool
2. **Initial BTRFS pool**:
- Install 2 new disks
- Create BTRFS filesystem in RAID1
- Mount and test NFS exports
2. **Prepare NixOS configuration**:
- Create host configuration (`hosts/nas1/` or similar)
- Configure ZFS pool import (`boot.zfs.extraPools`)
- Set up services: radarr, sonarr, nzbget, restic-rest, NFS
- Configure monitoring (node-exporter, promtail, smartctl-exporter)
3. **Data migration**:
- Copy data from TrueNAS ZFS pool to new BTRFS pool over 10GbE
- Verify data integrity
3. **Install NixOS**:
- `zfs export hdd-pool` on TrueNAS before shutdown (clean export)
- Wipe TrueNAS boot-pool SSDs, set up as mdadm RAID1 for NixOS root
- Install NixOS on mdadm mirror (keeps boot path ZFS-independent)
- Import hdd-pool via `boot.zfs.extraPools`
- Verify all datasets mount correctly
4. **Expand pool**:
- As old ZFS pool is emptied, wipe drives and add to BTRFS pool
- Pool grows incrementally: 2 → 4 → 6 → 8 disks
- BTRFS rebalances data across new devices
4. **Service migration**:
- Configure NixOS services to use ZFS dataset paths
- Update NFS exports
- Test from consuming hosts
5. **Service migration**:
- Set up radarr/sonarr/nzbget/restic as NixOS services
- Update NFS client mounts on consuming hosts
6. **Cutover**:
- Point consumers to new NAS host
5. **Cutover**:
- Update DNS/client mounts if IP changes
- Verify monitoring integration
- Decommission TrueNAS
- Repurpose hardware or keep as spare
### Post-Expansion: Vdev Rebalancing
ZFS has no built-in rebalance command. After adding the new 24TB vdev, ZFS will
write new data preferentially to it (most free space), leaving old vdevs packed
at ~97%. This is suboptimal but not urgent once overall pool usage drops to ~50%.
To gradually rebalance, rewrite files in place so ZFS redistributes blocks across
all vdevs proportional to free space:
```bash
# Rewrite files individually (spreads blocks across all vdevs)
find /pool/dataset -type f -exec sh -c '
for f; do cp "$f" "$f.rebal" && mv "$f.rebal" "$f"; done
' _ {} +
```
Avoid `zfs send/recv` for large datasets (e.g. 20TB) as this would concentrate
data on the emptiest vdev rather than spreading it evenly.
**Recommendation**: Do this after NixOS migration is stable. Not urgent - the pool
will function fine with uneven distribution, just slightly suboptimal for performance.
### Migration Advantages
- **Low risk**: New pool created independently, old data remains intact during migration
- **Incremental**: Can add old disks one at a time as space allows
- **Flexible**: BTRFS handles mixed disk sizes gracefully
- **Reversible**: Keep TrueNAS running until fully validated
- **No data migration**: ZFS pool imported directly, no copying terabytes of data
- **Low risk**: Pool expansion done on stable TrueNAS before OS swap
- **Reversible**: Can boot back to TrueNAS if NixOS has issues (ZFS pool is OS-independent)
- **Quick cutover**: Once NixOS config is ready, the OS swap is fast
## Next Steps
1. Decide on disk size (16TB vs 20-24TB)
2. Purchase disks
3. Design NixOS host configuration (`hosts/nas1/`)
4. Plan detailed migration timeline
5. Document NFS export mapping (current new)
1. ~~Decide on disk size~~ - 2x 24TB ordered
2. Install drives and add mirror vdev to ZFS pool
3. Check SMART data on 8TB drives - decide whether to keep or retire
4. Design NixOS host configuration (`hosts/nas1/`)
5. Document NFS export mapping (current -> new)
6. Plan NixOS installation and cutover
## Open Questions
- [ ] Final decision on disk size?
- [ ] Hostname for new NAS host? (nas1? storage1?)
- [ ] IP address allocation (keep 10.69.12.50 or new IP?)
- [ ] Timeline/maintenance window for migration?
- [ ] IP address/subnet: NAS and Proxmox are both on 10GbE to the same switch but different subnets, forcing traffic through the router (bottleneck). Move to same subnet during migration.
- [x] Boot drive: Reuse TrueNAS boot-pool SSDs as mdadm RAID1 for NixOS root (no ZFS on boot path)
- [ ] Retire old 8TB drives? (SMART looks healthy, keep unless chassis space is needed)
- [ ] Drive trays: do new 24TB drives fit, or order trays from abroad?
- [ ] Timeline/maintenance window for NixOS swap?

View File

@@ -61,7 +61,42 @@
mode 644
}
}
reverse_proxy http://jelly01.home.2rjus.net:8096
header Content-Type text/html
respond <<HTML
<!DOCTYPE html>
<html>
<head>
<title>Jellyfin - Maintenance</title>
<style>
body {
background: #101020;
color: #ddd;
font-family: sans-serif;
display: flex;
justify-content: center;
align-items: center;
min-height: 100vh;
margin: 0;
text-align: center;
}
.container { max-width: 500px; }
.disk { font-size: 80px; animation: spin 3s linear infinite; display: inline-block; }
@keyframes spin { from { transform: rotate(0deg); } to { transform: rotate(360deg); } }
h1 { color: #00a4dc; }
p { font-size: 1.2em; line-height: 1.6; }
</style>
</head>
<body>
<div class="container">
<div class="disk">&#x1F4BF;</div>
<h1>Jellyfin is taking a nap</h1>
<p>The NAS is getting shiny new hard drives.<br>
Jellyfin will be back once the disks stop spinning up.</p>
<p style="color:#666;font-size:0.9em;">In the meantime, maybe go outside?</p>
</div>
</body>
</html>
HTML 200
}
http://http-proxy.home.2rjus.net/metrics {
log {

View File

@@ -67,13 +67,13 @@ groups:
summary: "Promtail service not running on {{ $labels.instance }}"
description: "The promtail service has not been active on {{ $labels.instance }} for 5 minutes."
- alert: filesystem_filling_up
expr: predict_linear(node_filesystem_free_bytes{mountpoint="/"}[6h], 24*3600) < 0
expr: predict_linear(node_filesystem_free_bytes{mountpoint="/"}[24h], 24*3600) < 0
for: 1h
labels:
severity: warning
annotations:
summary: "Filesystem predicted to fill within 24h on {{ $labels.instance }}"
description: "Based on the last 6h trend, the root filesystem on {{ $labels.instance }} is predicted to run out of space within 24 hours."
description: "Based on the last 24h trend, the root filesystem on {{ $labels.instance }} is predicted to run out of space within 24 hours."
- alert: systemd_not_running
expr: node_systemd_system_running == 0
for: 10m