truenas-migration: switch from BTRFS to keeping ZFS, update plan

BTRFS RAID5/6 write hole is still unresolved, and RAID1 wastes
capacity with mixed disk sizes. Keep existing ZFS pool and import
directly on NixOS instead. Updated migration strategy, disk purchase
decision (2x 24TB ordered), SMART health notes, and vdev rebalancing
guidance.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 01:21:40 +01:00
parent 3042803c4d
commit 09ce018fb2

@@ -39,23 +39,17 @@ Expand storage capacity for the main hdd-pool. Since we need to add disks anyway
- nzbget: NixOS service or OCI container
- NFS exports: `services.nfs.server`
### Filesystem: BTRFS RAID1 ### Filesystem: Keep ZFS
**Decision**: Migrate from ZFS to BTRFS with RAID1 **Decision**: Keep existing ZFS pool, import on NixOS
**Rationale**:
- **In-kernel**: No out-of-tree module issues like ZFS - **No data migration needed**: Existing ZFS pool can be imported directly on NixOS
- **Flexible expansion**: Add individual disks, not required to buy pairs - **Proven reliability**: Pool has been running reliably on TrueNAS
- **Mixed disk sizes**: Better handling than ZFS multi-vdev approach - **NixOS ZFS support**: Well-supported, declarative configuration via `boot.zfs` and `services.zfs`
- **RAID level conversion**: Can convert between RAID levels in place - **BTRFS RAID5/6 unreliable**: Research showed BTRFS RAID5/6 write hole is still unresolved
- Built-in checksumming, snapshots, compression (zstd) - **BTRFS RAID1 wasteful**: With mixed disk sizes, RAID1 wastes significant capacity vs ZFS mirrors
- NixOS has good BTRFS support - Checksumming, snapshots, compression (lz4/zstd) all available
**BTRFS RAID1 notes**:
- "RAID1" means 2 copies of all data
- Distributes across all available devices
- With 6+ disks, provides redundancy + capacity scaling
- RAID5/6 avoided (known issues), RAID1/10 are stable
### Hardware: Keep Existing + Add Disks
@@ -69,83 +63,92 @@ Expand storage capacity for the main hdd-pool. Since we need to add disks anyway
**Storage architecture**:
**Bulk storage** (BTRFS RAID1 on HDDs): **hdd-pool** (ZFS mirrors):
- Current: 6x HDDs (2x16TB + 2x8TB + 2x8TB) - Current: 3 mirror vdevs (2x16TB + 2x8TB + 2x8TB) = 32TB usable
- Add: 2x new HDDs (size TBD) - Add: mirror-3 with 2x 24TB = +24TB usable
- Total after expansion: ~56TB usable
- Use: Media, downloads, backups, non-critical data
- Risk tolerance: High (data mostly replaceable)
**Critical data** (small volume):
- Use 2x 240GB SSDs in mirror (BTRFS or ZFS)
- Or use 2TB NVMe for critical data
- Risk tolerance: Low (data important but small)
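As a quick check of the capacity figures above, here is a minimal sketch of the arithmetic. It assumes each two-way mirror vdev contributes roughly the nominal size of one of its disks and ignores ZFS metadata and reserved space. The `usable_tb` helper is hypothetical, not part of any tool.

```bash
# usable_tb SIZE...: sum the usable size (in TB) of each mirror vdev.
# For a two-way mirror, pass the size of one disk (the smaller one if
# the two disks differ).
usable_tb() {
    total=0
    for size in "$@"; do
        total=$((total + size))
    done
    echo "$total"
}

usable_tb 16 8 8       # current pool: prints 32
usable_tb 16 8 8 24    # after adding mirror-3 (2x 24TB): prints 56
```

Real usable space will be somewhat lower than these nominal figures.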
### Disk Purchase Decision
**Options under consideration**: **Decision**: 2x 24TB drives (ordered, arriving 2026-02-21)
**Option A: 2x 16TB drives**
- Matches largest current drives
- Enables potential future RAID5 if desired (6x 16TB array)
- More conservative capacity increase
**Option B: 2x 20-24TB drives**
- Larger capacity headroom
- Better $/TB ratio typically
- Future-proofs better
**Initial purchase**: 2 drives (chassis has space for 2 more without modifications)
## Migration Strategy
### High-Level Plan
1. **Preparation**: 1. **Expand ZFS pool** (on TrueNAS):
- Purchase 2x new HDDs (16TB or 20-24TB) - Install 2x 24TB drives (may need new drive trays - order from abroad if needed)
- Create NixOS configuration for new storage host - If chassis space is limited, temporarily replace the two oldest 8TB drives (da0/ada4)
- Set up bare metal NixOS installation - Add as mirror-3 vdev to hdd-pool
- Verify pool health and resilver completes
- Check SMART data on old 8TB drives (all healthy as of 2026-02-20, no reallocated sectors)
- Burn-in: at minimum short + long SMART test before adding to pool
2. **Initial BTRFS pool**: 2. **Prepare NixOS configuration**:
- Install 2 new disks - Create host configuration (`hosts/nas1/` or similar)
- Create BTRFS filesystem in RAID1 - Configure ZFS pool import (`boot.zfs.extraPools`)
- Mount and test NFS exports - Set up services: radarr, sonarr, nzbget, restic-rest, NFS
- Configure monitoring (node-exporter, promtail, smartctl-exporter)
3. **Data migration**: 3. **Install NixOS**:
- Copy data from TrueNAS ZFS pool to new BTRFS pool over 10GbE - Install NixOS on boot drive (SSD/NVMe, separate from ZFS pool)
- Verify data integrity - Import existing ZFS pool
- Verify all datasets mount correctly
4. **Expand pool**: 4. **Service migration**:
- As old ZFS pool is emptied, wipe drives and add to BTRFS pool - Configure NixOS services to use ZFS dataset paths
- Pool grows incrementally: 2 → 4 → 6 → 8 disks - Update NFS exports
- BTRFS rebalances data across new devices - Test from consuming hosts
5. **Service migration**: 5. **Cutover**:
- Set up radarr/sonarr/nzbget/restic as NixOS services - Update DNS/client mounts if IP changes
- Update NFS client mounts on consuming hosts - Verify monitoring integration
6. **Cutover**:
- Point consumers to new NAS host
- Decommission TrueNAS - Decommission TrueNAS
- Repurpose hardware or keep as spare
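Step 1 and the pool import in step 3 could be sketched roughly as follows. This is a dry-run wrapper under stated assumptions, not a tested procedure: the `run`, `start_selftest`, `preview_mirror`, `add_mirror` and `import_and_check` helpers and the example device names are hypothetical, while the `smartctl`, `zpool` and `zfs` invocations use standard options. By default nothing is executed; commands are only printed.

```bash
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to really run them (as root, on the NAS itself).
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# start_selftest short|long DISK: start a SMART self-test on a new disk.
# Wait for one test to finish before starting the next, then inspect
# the results with: smartctl -a DISK
start_selftest() {
    run smartctl -t "$1" "$2"
}

# preview_mirror POOL DISK_A DISK_B: show the resulting layout without
# changing the pool (zpool add -n).
preview_mirror() {
    run zpool add -n "$1" mirror "$2" "$3"
}

# add_mirror POOL DISK_A DISK_B: add the two disks as a new mirror vdev,
# then print pool status.
add_mirror() {
    run zpool add "$1" mirror "$2" "$3"
    run zpool status "$1"
}

# import_and_check POOL: import the pool on the new OS and list which
# datasets are mounted and where.
import_and_check() {
    run zpool import "$1"
    run zfs list -r -o name,mounted,mountpoint "$1"
}

# Example (device names are placeholders; check yours before running):
#   start_selftest long /dev/da6
#   preview_mirror hdd-pool /dev/da6 /dev/da7
#   DRY_RUN=0 add_mirror hdd-pool /dev/da6 /dev/da7
```

Keeping the preview and the real `zpool add` as separate calls is deliberate: adding a vdev to the wrong pool or with the wrong layout is hard to undo, so it is worth reading the `-n` output first.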
### Post-Expansion: Vdev Rebalancing
ZFS has no built-in rebalance command. After adding the new 24TB vdev, ZFS will
write new data preferentially to it (most free space), leaving old vdevs packed
at ~97%. This is suboptimal but not urgent once overall pool usage drops to ~50%.
To rebalance gradually, rewrite files (copy each one, then rename the copy over
the original) so that ZFS allocates the new blocks across all vdevs in
proportion to free space:
```bash
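# Rewrite files individually (spreads blocks across all vdevs).
# cp -a preserves ownership, permissions and timestamps. Blocks still
# referenced by snapshots are not freed until those snapshots are destroyed.
find /pool/dataset -type f -exec sh -c '
  for f; do cp -a "$f" "$f.rebal" && mv "$f.rebal" "$f"; done
' _ {} +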
```
Avoid `zfs send/recv` for large datasets (e.g. 20TB) as this would concentrate
data on the emptiest vdev rather than spreading it evenly.
**Recommendation**: Do this after NixOS migration is stable. Not urgent - the pool
will function fine with uneven distribution, just slightly suboptimal for performance.
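A slightly more defensive variant of the rewrite loop, as a sketch: it verifies each copy with `cmp` before replacing the original, and skips hard-linked files, since copy-and-rename would break the link. The `rebalance_file` and `rebalance_tree` names are hypothetical, and filenames containing newlines are not handled.

```bash
# rebalance_file FILE: copy FILE, verify the copy byte-for-byte, then
# rename the copy over the original. On mismatch the original is kept.
rebalance_file() {
    f="$1"
    tmp="$f.rebal"
    cp -a "$f" "$tmp" || return 1
    if cmp -s "$f" "$tmp"; then
        mv "$tmp" "$f"
    else
        rm -f "$tmp"
        echo "verify failed, left untouched: $f" >&2
        return 1
    fi
}

# rebalance_tree DIR: rewrite every regular file under DIR that has a
# single link, skipping leftover *.rebal temp files from earlier runs.
rebalance_tree() {
    find "$1" -type f -links 1 ! -name '*.rebal' -print |
    while IFS= read -r f; do
        rebalance_file "$f"
    done
}

# Example (path is a placeholder): rebalance_tree /pool/dataset/some-dir
```

Running it on one directory at a time keeps the temporary space requirement small, which matters while the old vdevs are still nearly full.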
### Migration Advantages
- **Low risk**: New pool created independently, old data remains intact during migration - **No data migration**: ZFS pool imported directly, no copying terabytes of data
- **Incremental**: Can add old disks one at a time as space allows - **Low risk**: Pool expansion done on stable TrueNAS before OS swap
- **Flexible**: BTRFS handles mixed disk sizes gracefully - **Reversible**: Can boot back to TrueNAS if NixOS has issues (ZFS pool is OS-independent)
- **Reversible**: Keep TrueNAS running until fully validated - **Quick cutover**: Once NixOS config is ready, the OS swap is fast
## Next Steps
1. Decide on disk size (16TB vs 20-24TB) 1. ~~Decide on disk size~~ - 2x 24TB ordered
2. Purchase disks 2. Install drives and add mirror vdev to ZFS pool
3. Design NixOS host configuration (`hosts/nas1/`) 3. Check SMART data on 8TB drives - decide whether to keep or retire
4. Plan detailed migration timeline 4. Design NixOS host configuration (`hosts/nas1/`)
5. Document NFS export mapping (current → new) 5. Document NFS export mapping (current -> new)
6. Plan NixOS installation and cutover
## Open Questions ## Open Questions
- [ ] Final decision on disk size?
- [ ] Hostname for new NAS host? (nas1? storage1?)
- [ ] IP address allocation (keep 10.69.12.50 or new IP?)
- [ ] Timeline/maintenance window for migration? - [ ] Boot drive: which SSD/NVMe for NixOS root?
- [ ] Retire old 8TB drives? (SMART looks healthy, keep unless chassis space is needed)
- [ ] Drive trays: do new 24TB drives fit, or order trays from abroad?
- [ ] Timeline/maintenance window for NixOS swap?