diff --git a/docs/plans/nix-cache-reprovision.md b/docs/plans/nix-cache-reprovision.md
index db4df07..160f3b5 100644
--- a/docs/plans/nix-cache-reprovision.md
+++ b/docs/plans/nix-cache-reprovision.md
@@ -11,7 +11,7 @@ Reprovision `nix-cache01` using the OpenTofu workflow, and improve the build/cac
 **Phase 1: New Build Host** - COMPLETE
 **Phase 2: NATS Build Triggering** - COMPLETE
 **Phase 3: Safe Flake Update Workflow** - NOT STARTED
-**Phase 4: Complete Migration** - IN PROGRESS (Harmonia configured, DNS cutover pending)
+**Phase 4: Complete Migration** - COMPLETE (cleanup pending)
 
 ## Completed Work
 
@@ -64,20 +64,21 @@ The `homelab-deploy` tool was extended with a builder mode:
 
 ## Current State
 
-### Old System (nix-cache01)
-- Still running at 10.69.13.15
-- Serves binary cache via Harmonia
-- Has the old `build-flakes.sh` timer (every 30 min)
-- Will be decommissioned after nix-cache02 is fully validated
+### Old System (nix-cache01) - PENDING DECOMMISSION
+- Running at 10.69.13.15
+- No longer serving the canonical `nix-cache.home.2rjus.net` (now serves `nix-cache01.home.2rjus.net`)
+- Still has the old `build-flakes.sh` timer (every 30 min) - to be removed
+- Ready for decommission
 
-### New System (nix-cache02)
+### New System (nix-cache02) - NOW ACTIVE
 - Running at 10.69.13.25
+- **Now serving `https://nix-cache.home.2rjus.net`** (canonical URL)
 - Builder service active, responding to NATS build requests
 - Metrics exposed on port 9973 (`homelab-deploy-builder` job)
 - Harmonia binary cache server running
-- New signing key configured (`nix-cache02.home.2rjus.net-1`)
-- Currently serving at `https://nix-cache02.home.2rjus.net` (for testing)
+- New signing key: `nix-cache02.home.2rjus.net-1`
 - Trusted public key deployed to all hosts
+- Promoted to prod tier with `build-host` role
 
 ## Remaining Work
 
@@ -93,9 +94,10 @@ The `homelab-deploy` tool was extended with a builder mode:
 1. ~~**Add Harmonia to nix-cache02**~~ ✅ Done - new signing key, parameterized service
 2. ~~**Add trusted public key to all hosts**~~ ✅ Done - `system/nix.nix` updated
 3. ~~**Test cache from other hosts**~~ ✅ Done - verified from testvm01
-4. **Update proxy** - Add `nix-cache.home.2rjus.net` to nix-cache02's Caddy config
-5. **Increase RAM** - Bump to 24GB after nix-cache01 is gone
-6. **Decommission nix-cache01**:
+4. ~~**Update proxy and DNS**~~ ✅ Done - `nix-cache.home.2rjus.net` CNAME now points to nix-cache02
+5. ~~**Deploy to all hosts**~~ ✅ Done - all hosts have new trusted key
+6. **Increase RAM** - Bump to 24GB after nix-cache01 is gone
+7. **Decommission nix-cache01**:
    - Remove from `terraform/vms.tf`
    - Remove old build script (`services/nix-cache/build-flakes.nix`, `build-flakes.sh`)
    - Archive or delete host config
@@ -153,5 +155,5 @@ Available metrics:
 
 ## Open Questions
 
-- [ ] When to cut over DNS from nix-cache01 to nix-cache02?
+- [x] ~~When to cut over DNS from nix-cache01 to nix-cache02?~~ Done - 2026-02-10
 - [ ] Implement safe flake update workflow before or after full migration?