From 09ce018fb27a8ed3dd730f131f4f7eb38770c296 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Torjus=20H=C3=A5kestad?=
Date: Fri, 20 Feb 2026 01:21:40 +0100
Subject: [PATCH] truenas-migration: switch from BTRFS to keeping ZFS, update plan

BTRFS RAID5/6 write hole is still unresolved, and RAID1 wastes capacity
with mixed disk sizes. Keep existing ZFS pool and import directly on
NixOS instead. Updated migration strategy, disk purchase decision
(2x 24TB ordered), SMART health notes, and vdev rebalancing guidance.

Co-Authored-By: Claude Opus 4.6
---
 docs/plans/truenas-migration.md | 141 ++++++++++++++++----------------
 1 file changed, 72 insertions(+), 69 deletions(-)

diff --git a/docs/plans/truenas-migration.md b/docs/plans/truenas-migration.md
index 0cb3ad4..db2fb60 100644
--- a/docs/plans/truenas-migration.md
+++ b/docs/plans/truenas-migration.md
@@ -39,23 +39,17 @@ Expand storage capacity for the main hdd-pool. Since we need to add disks anyway
 - nzbget: NixOS service or OCI container
 - NFS exports: `services.nfs.server`
 
-### Filesystem: BTRFS RAID1
+### Filesystem: Keep ZFS
 
-**Decision**: Migrate from ZFS to BTRFS with RAID1
+**Decision**: Keep existing ZFS pool, import on NixOS
 
 **Rationale**:
-- **In-kernel**: No out-of-tree module issues like ZFS
-- **Flexible expansion**: Add individual disks, not required to buy pairs
-- **Mixed disk sizes**: Better handling than ZFS multi-vdev approach
-- **RAID level conversion**: Can convert between RAID levels in place
-- Built-in checksumming, snapshots, compression (zstd)
-- NixOS has good BTRFS support
-
-**BTRFS RAID1 notes**:
-- "RAID1" means 2 copies of all data
-- Distributes across all available devices
-- With 6+ disks, provides redundancy + capacity scaling
-- RAID5/6 avoided (known issues), RAID1/10 are stable
+- **No data migration needed**: Existing ZFS pool can be imported directly on NixOS
+- **Proven reliability**: Pool has been running reliably on TrueNAS
+- **NixOS ZFS support**: Well-supported, declarative configuration via `boot.zfs` and `services.zfs`
+- **BTRFS RAID5/6 unreliable**: Research showed BTRFS RAID5/6 write hole is still unresolved
+- **BTRFS RAID1 wasteful**: With mixed disk sizes, RAID1 wastes significant capacity vs ZFS mirrors
+- Checksumming, snapshots, compression (lz4/zstd) all available
 
 ### Hardware: Keep Existing + Add Disks
 
@@ -69,83 +63,92 @@ Expand storage capacity for the main hdd-pool. Since we need to add disks anyway
 
 **Storage architecture**:
 
-**Bulk storage** (BTRFS RAID1 on HDDs):
-- Current: 6x HDDs (2x16TB + 2x8TB + 2x8TB)
-- Add: 2x new HDDs (size TBD)
+**hdd-pool** (ZFS mirrors):
+- Current: 3 mirror vdevs (2x16TB + 2x8TB + 2x8TB) = 32TB usable
+- Add: mirror-3 with 2x 24TB = +24TB usable
+- Total after expansion: ~56TB usable
 - Use: Media, downloads, backups, non-critical data
-- Risk tolerance: High (data mostly replaceable)
-
-**Critical data** (small volume):
-- Use 2x 240GB SSDs in mirror (BTRFS or ZFS)
-- Or use 2TB NVMe for critical data
-- Risk tolerance: Low (data important but small)
 
 ### Disk Purchase Decision
 
-**Options under consideration**:
-
-**Option A: 2x 16TB drives**
-- Matches largest current drives
-- Enables potential future RAID5 if desired (6x 16TB array)
-- More conservative capacity increase
-
-**Option B: 2x 20-24TB drives**
-- Larger capacity headroom
-- Better $/TB ratio typically
-- Future-proofs better
-
-**Initial purchase**: 2 drives (chassis has space for 2 more without modifications)
+**Decision**: 2x 24TB drives (ordered, arriving 2026-02-21)
 
 ## Migration Strategy
 
 ### High-Level Plan
 
-1. **Preparation**:
-   - Purchase 2x new HDDs (16TB or 20-24TB)
-   - Create NixOS configuration for new storage host
-   - Set up bare metal NixOS installation
+1. **Expand ZFS pool** (on TrueNAS):
+   - Install 2x 24TB drives (may need new drive trays - order from abroad if needed)
+   - If chassis space is limited, temporarily replace the two oldest 8TB drives (da0/ada4)
+   - Add as mirror-3 vdev to hdd-pool
+   - Verify pool health and resilver completes
+   - Check SMART data on old 8TB drives (all healthy as of 2026-02-20, no reallocated sectors)
+   - Burn-in: at minimum short + long SMART test before adding to pool
 
-2. **Initial BTRFS pool**:
-   - Install 2 new disks
-   - Create BTRFS filesystem in RAID1
-   - Mount and test NFS exports
+2. **Prepare NixOS configuration**:
+   - Create host configuration (`hosts/nas1/` or similar)
+   - Configure ZFS pool import (`boot.zfs.extraPools`)
+   - Set up services: radarr, sonarr, nzbget, restic-rest, NFS
+   - Configure monitoring (node-exporter, promtail, smartctl-exporter)
 
-3. **Data migration**:
-   - Copy data from TrueNAS ZFS pool to new BTRFS pool over 10GbE
-   - Verify data integrity
+3. **Install NixOS**:
+   - Install NixOS on boot drive (SSD/NVMe, separate from ZFS pool)
+   - Import existing ZFS pool
+   - Verify all datasets mount correctly
 
-4. **Expand pool**:
-   - As old ZFS pool is emptied, wipe drives and add to BTRFS pool
-   - Pool grows incrementally: 2 → 4 → 6 → 8 disks
-   - BTRFS rebalances data across new devices
+4. **Service migration**:
+   - Configure NixOS services to use ZFS dataset paths
+   - Update NFS exports
+   - Test from consuming hosts
 
-5. **Service migration**:
-   - Set up radarr/sonarr/nzbget/restic as NixOS services
-   - Update NFS client mounts on consuming hosts
-
-6. **Cutover**:
-   - Point consumers to new NAS host
+5. **Cutover**:
+   - Update DNS/client mounts if IP changes
+   - Verify monitoring integration
    - Decommission TrueNAS
-   - Repurpose hardware or keep as spare
+
+### Post-Expansion: Vdev Rebalancing
+
+ZFS has no built-in rebalance command. After adding the new 24TB vdev, ZFS will
+write new data preferentially to it (most free space), leaving old vdevs packed
+at ~97%. This is suboptimal but not urgent once overall pool usage drops to ~50%.
+
+To gradually rebalance, rewrite files in place so ZFS redistributes blocks across
+all vdevs proportional to free space:
+
+```bash
+# Rewrite files individually (spreads blocks across all vdevs)
+find /pool/dataset -type f -exec sh -c '
+  for f; do cp "$f" "$f.rebal" && mv "$f.rebal" "$f"; done
+' _ {} +
+```
+
+Avoid `zfs send/recv` for large datasets (e.g. 20TB) as this would concentrate
+data on the emptiest vdev rather than spreading it evenly.
+
+**Recommendation**: Do this after NixOS migration is stable. Not urgent - the pool
+will function fine with uneven distribution, just slightly suboptimal for performance.
 
 ### Migration Advantages
 
-- **Low risk**: New pool created independently, old data remains intact during migration
-- **Incremental**: Can add old disks one at a time as space allows
-- **Flexible**: BTRFS handles mixed disk sizes gracefully
-- **Reversible**: Keep TrueNAS running until fully validated
+- **No data migration**: ZFS pool imported directly, no copying terabytes of data
+- **Low risk**: Pool expansion done on stable TrueNAS before OS swap
+- **Reversible**: Can boot back to TrueNAS if NixOS has issues (ZFS pool is OS-independent)
+- **Quick cutover**: Once NixOS config is ready, the OS swap is fast
 
 ## Next Steps
 
-1. Decide on disk size (16TB vs 20-24TB)
-2. Purchase disks
-3. Design NixOS host configuration (`hosts/nas1/`)
-4. Plan detailed migration timeline
-5. Document NFS export mapping (current → new)
+1. ~~Decide on disk size~~ - 2x 24TB ordered
+2. Install drives and add mirror vdev to ZFS pool
+3. Check SMART data on 8TB drives - decide whether to keep or retire
+4. Design NixOS host configuration (`hosts/nas1/`)
+5. Document NFS export mapping (current -> new)
+6. Plan NixOS installation and cutover
 
 ## Open Questions
 
-- [ ] Final decision on disk size?
 - [ ] Hostname for new NAS host? (nas1? storage1?)
 - [ ] IP address allocation (keep 10.69.12.50 or new IP?)
-- [ ] Timeline/maintenance window for migration?
+- [ ] Boot drive: which SSD/NVMe for NixOS root?
+- [ ] Retire old 8TB drives? (SMART looks healthy, keep unless chassis space is needed)
+- [ ] Drive trays: do new 24TB drives fit, or order trays from abroad?
+- [ ] Timeline/maintenance window for NixOS swap?
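
Review note on the "Post-Expansion: Vdev Rebalancing" loop in this patch: a plain `cp` followed by `mv` resets timestamps (and ownership, when run as a different user), and it splits hard-linked files into separate copies. The sketch below is one possible variant, not the author's method. The function name `rebalance_dir` and the `.rebal` suffix handling are illustrative assumptions, and it assumes the files are not being written to while it runs. Blocks still referenced by snapshots are not freed by a rewrite, so old vdevs only drain once those snapshots expire.

```bash
#!/bin/sh
# Sketch only: rewrite each regular file in place so its blocks are
# allocated again. rebalance_dir is a hypothetical helper name.
rebalance_dir() {
  dir=$1
  # -links 1 skips hard-linked files, which cp+mv would split apart.
  # The -name test skips leftovers from an interrupted earlier run.
  find "$dir" -type f -links 1 ! -name '*.rebal' -exec sh -c '
    for f; do
      # -p keeps mode, ownership (when permitted) and timestamps
      cp -p "$f" "$f.rebal" && mv "$f.rebal" "$f" || rm -f "$f.rebal"
    done
  ' _ {} +
}
```

It would be called as `rebalance_dir /pool/dataset`, using the same placeholder path as the patch. Running it on a small dataset first and comparing checksums before and after is a cheap sanity check.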