diff --git a/docs/plans/truenas-migration.md b/docs/plans/truenas-migration.md
index 0cb3ad4..db2fb60 100644
--- a/docs/plans/truenas-migration.md
+++ b/docs/plans/truenas-migration.md
@@ -39,23 +39,17 @@ Expand storage capacity for the main hdd-pool. Since we need to add disks anyway
 - nzbget: NixOS service or OCI container
 - NFS exports: `services.nfs.server`
 
-### Filesystem: BTRFS RAID1
+### Filesystem: Keep ZFS
 
-**Decision**: Migrate from ZFS to BTRFS with RAID1
+**Decision**: Keep existing ZFS pool, import on NixOS
 
 **Rationale**:
-- **In-kernel**: No out-of-tree module issues like ZFS
-- **Flexible expansion**: Add individual disks, not required to buy pairs
-- **Mixed disk sizes**: Better handling than ZFS multi-vdev approach
-- **RAID level conversion**: Can convert between RAID levels in place
-- Built-in checksumming, snapshots, compression (zstd)
-- NixOS has good BTRFS support
-
-**BTRFS RAID1 notes**:
-- "RAID1" means 2 copies of all data
-- Distributes across all available devices
-- With 6+ disks, provides redundancy + capacity scaling
-- RAID5/6 avoided (known issues), RAID1/10 are stable
+- **No data migration needed**: Existing ZFS pool can be imported directly on NixOS
+- **Proven reliability**: Pool has been running reliably on TrueNAS
+- **NixOS ZFS support**: Well-supported, declarative configuration via `boot.zfs` and `services.zfs`
+- **BTRFS RAID5/6 unreliable**: Research showed BTRFS RAID5/6 write hole is still unresolved
+- **BTRFS RAID1 wasteful**: With mixed disk sizes, RAID1 wastes significant capacity vs ZFS mirrors
+- Checksumming, snapshots, compression (lz4/zstd) all available
 
 ### Hardware: Keep Existing + Add Disks
 
@@ -69,83 +63,92 @@ Expand storage capacity for the main hdd-pool. Since we need to add disks anyway
 
 **Storage architecture**:
 
-**Bulk storage** (BTRFS RAID1 on HDDs):
-- Current: 6x HDDs (2x16TB + 2x8TB + 2x8TB)
-- Add: 2x new HDDs (size TBD)
+**hdd-pool** (ZFS mirrors):
+- Current: 3 mirror vdevs (2x16TB + 2x8TB + 2x8TB) = 32TB usable
+- Add: mirror-3 with 2x 24TB = +24TB usable
+- Total after expansion: ~56TB usable
 - Use: Media, downloads, backups, non-critical data
-- Risk tolerance: High (data mostly replaceable)
-
-**Critical data** (small volume):
-- Use 2x 240GB SSDs in mirror (BTRFS or ZFS)
-- Or use 2TB NVMe for critical data
-- Risk tolerance: Low (data important but small)
 
 ### Disk Purchase Decision
 
-**Options under consideration**:
-
-**Option A: 2x 16TB drives**
-- Matches largest current drives
-- Enables potential future RAID5 if desired (6x 16TB array)
-- More conservative capacity increase
-
-**Option B: 2x 20-24TB drives**
-- Larger capacity headroom
-- Better $/TB ratio typically
-- Future-proofs better
-
-**Initial purchase**: 2 drives (chassis has space for 2 more without modifications)
+**Decision**: 2x 24TB drives (ordered, arriving 2026-02-21)
 
 ## Migration Strategy
 
 ### High-Level Plan
 
-1. **Preparation**:
-   - Purchase 2x new HDDs (16TB or 20-24TB)
-   - Create NixOS configuration for new storage host
-   - Set up bare metal NixOS installation
+1. **Expand ZFS pool** (on TrueNAS):
+   - Install 2x 24TB drives (may need new drive trays - order from abroad if needed)
+   - If chassis space is limited, temporarily replace the two oldest 8TB drives (da0/ada4)
+   - Add as mirror-3 vdev to hdd-pool
+   - Verify pool health and resilver completes
+   - Check SMART data on old 8TB drives (all healthy as of 2026-02-20, no reallocated sectors)
+   - Burn-in: at minimum short + long SMART test before adding to pool
 
-2. **Initial BTRFS pool**:
-   - Install 2 new disks
-   - Create BTRFS filesystem in RAID1
-   - Mount and test NFS exports
+2. **Prepare NixOS configuration**:
+   - Create host configuration (`hosts/nas1/` or similar)
+   - Configure ZFS pool import (`boot.zfs.extraPools`)
+   - Set up services: radarr, sonarr, nzbget, restic-rest, NFS
+   - Configure monitoring (node-exporter, promtail, smartctl-exporter)
 
-3. **Data migration**:
-   - Copy data from TrueNAS ZFS pool to new BTRFS pool over 10GbE
-   - Verify data integrity
+3. **Install NixOS**:
+   - Install NixOS on boot drive (SSD/NVMe, separate from ZFS pool)
+   - Import existing ZFS pool
+   - Verify all datasets mount correctly
 
-4. **Expand pool**:
-   - As old ZFS pool is emptied, wipe drives and add to BTRFS pool
-   - Pool grows incrementally: 2 → 4 → 6 → 8 disks
-   - BTRFS rebalances data across new devices
+4. **Service migration**:
+   - Configure NixOS services to use ZFS dataset paths
+   - Update NFS exports
+   - Test from consuming hosts
 
-5. **Service migration**:
-   - Set up radarr/sonarr/nzbget/restic as NixOS services
-   - Update NFS client mounts on consuming hosts
-
-6. **Cutover**:
-   - Point consumers to new NAS host
+5. **Cutover**:
+   - Update DNS/client mounts if IP changes
+   - Verify monitoring integration
    - Decommission TrueNAS
-   - Repurpose hardware or keep as spare
+
+### Post-Expansion: Vdev Rebalancing
+
+ZFS has no built-in rebalance command. After adding the new 24TB vdev, ZFS will
+write new data preferentially to it (most free space), leaving old vdevs packed
+at ~97%. This is suboptimal but not urgent once overall pool usage drops to ~50%.
+
+To gradually rebalance, rewrite files in place so ZFS redistributes blocks across
+all vdevs proportional to free space:
+
+```bash
+# Rewrite files individually (spreads blocks across all vdevs)
+find /pool/dataset -type f -exec sh -c '
+  for f; do cp "$f" "$f.rebal" && mv "$f.rebal" "$f"; done
+' _ {} +
+```
+
+Avoid `zfs send/recv` for large datasets (e.g. 20TB) as this would concentrate
+data on the emptiest vdev rather than spreading it evenly.
+
+**Recommendation**: Do this after NixOS migration is stable. Not urgent - the pool
+will function fine with uneven distribution, just slightly suboptimal for performance.
 
 ### Migration Advantages
 
-- **Low risk**: New pool created independently, old data remains intact during migration
-- **Incremental**: Can add old disks one at a time as space allows
-- **Flexible**: BTRFS handles mixed disk sizes gracefully
-- **Reversible**: Keep TrueNAS running until fully validated
+- **No data migration**: ZFS pool imported directly, no copying terabytes of data
+- **Low risk**: Pool expansion done on stable TrueNAS before OS swap
+- **Reversible**: Can boot back to TrueNAS if NixOS has issues (ZFS pool is OS-independent)
+- **Quick cutover**: Once NixOS config is ready, the OS swap is fast
 
 ## Next Steps
 
-1. Decide on disk size (16TB vs 20-24TB)
-2. Purchase disks
-3. Design NixOS host configuration (`hosts/nas1/`)
-4. Plan detailed migration timeline
-5. Document NFS export mapping (current → new)
+1. ~~Decide on disk size~~ - 2x 24TB ordered
+2. Install drives and add mirror vdev to ZFS pool
+3. Check SMART data on 8TB drives - decide whether to keep or retire
+4. Design NixOS host configuration (`hosts/nas1/`)
+5. Document NFS export mapping (current -> new)
+6. Plan NixOS installation and cutover
 
 ## Open Questions
 
-- [ ] Final decision on disk size?
 - [ ] Hostname for new NAS host? (nas1? storage1?)
 - [ ] IP address allocation (keep 10.69.12.50 or new IP?)
-- [ ] Timeline/maintenance window for migration?
+- [ ] Boot drive: which SSD/NVMe for NixOS root?
+- [ ] Retire old 8TB drives? (SMART looks healthy, keep unless chassis space is needed)
+- [ ] Drive trays: do new 24TB drives fit, or order trays from abroad?
+- [ ] Timeline/maintenance window for NixOS swap?
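
Note on the rebalancing loop in the Post-Expansion section: the one-liner in the plan copies with a bare `cp`, which would likely reset ownership, mode and timestamps on every rewritten file. Below is a minimal sketch of a more defensive variant, not a tested procedure for this pool. The function name `rebalance_dir` and the `.rebal` suffix handling are assumptions made for illustration; it relies only on POSIX `find`, `cp -p`, `cmp` and `mv`. It does not handle hard links, and data still referenced by snapshots would stay where it is. If block cloning is enabled on the pool, a plain copy may be satisfied by cloning rather than rewriting blocks, so that is worth checking first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the plan): rewrite every regular file
# under a directory in place so its blocks are freshly allocated.
rebalance_dir() {
  local dir=$1
  # Skip leftover temp copies; note that a real file named *.rebal would
  # also be skipped by this filter.
  find "$dir" -type f ! -name '*.rebal' -exec sh -c '
    for f; do
      # -p keeps mode, ownership and timestamps on the copy
      if cp -p "$f" "$f.rebal" && cmp -s "$f" "$f.rebal"; then
        # Only replace the original once the copy is byte-identical
        mv "$f.rebal" "$f"
      else
        rm -f "$f.rebal"
        echo "skipped: $f" >&2
      fi
    done
  ' _ {} +
}
```

Something like `rebalance_dir /pool/dataset` would then be run one dataset at a time, ideally on a small, unimportant dataset first and with enough free space for the largest file.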