From ec4ac1477e592e13d478dc6d43abf7bfd2b0ded2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Torjus=20H=C3=A5kestad?=
Date: Sat, 7 Feb 2026 22:49:53 +0100
Subject: [PATCH] docs: mark pgdb1 for decommissioning instead of migration

Only consumer was Open WebUI on gunter, which will migrate to local
PostgreSQL. Removed pgdb1 backup/migration phases and added it to the
decommission list.

Co-Authored-By: Claude Opus 4.5
---
 docs/plans/host-migration-to-opentofu.md | 65 +++++++++---------------
 1 file changed, 23 insertions(+), 42 deletions(-)

diff --git a/docs/plans/host-migration-to-opentofu.md b/docs/plans/host-migration-to-opentofu.md
index 5e7e9f8..5efe23d 100644
--- a/docs/plans/host-migration-to-opentofu.md
+++ b/docs/plans/host-migration-to-opentofu.md
@@ -19,11 +19,10 @@ Hosts to migrate:
 | nix-cache01 | Stateless | Binary cache, recreate |
 | http-proxy | Stateless | Reverse proxy, recreate |
 | nats1 | Stateless | Messaging, recreate |
-| auth01 | Decommission | No longer in use |
 | ha1 | Stateful | Home Assistant + Zigbee2MQTT + Mosquitto |
 | monitoring01 | Stateful | Prometheus, Grafana, Loki |
 | jelly01 | Stateful | Jellyfin metadata, watch history, config |
-| pgdb1 | Stateful | PostgreSQL databases |
+| pgdb1 | Decommission | Only used by Open WebUI on gunter, migrating to local PostgreSQL |
 | ~~jump~~ | ~~Decommission~~ | ✓ Complete |
 | ~~auth01~~ | ~~Decommission~~ | ✓ Complete |
 | ~~ca~~ | ~~Deferred~~ | ✓ Complete |
@@ -46,39 +45,19 @@ No backup currently exists. Add a restic backup job for `/var/lib/jellyfin/` whi
 Media files are on the NAS (`nas.home.2rjus.net:/mnt/hdd-pool/media`) and do not need backup.
 The cache directory (`/var/cache/jellyfin/`) does not need backup — it regenerates.
 
-### 1c. Add PostgreSQL Backup to pgdb1
-
-No backup currently exists. Add a restic backup job with a `pg_dumpall` pre-hook to capture
-all databases and roles. The dump should be piped through restic's stdin backup (similar to
-the Grafana DB dump pattern on monitoring01).
-
-### 1d. Verify Existing ha1 Backup
+### 1c. Verify Existing ha1 Backup
 
 ha1 already backs up `/var/lib/hass`, `/var/lib/zigbee2mqtt`, `/var/lib/mosquitto`. Verify
 these backups are current and restorable before proceeding with migration.
 
-### 1e. Verify All Backups
+### 1d. Verify All Backups
 
 After adding/expanding backup jobs:
 
 1. Trigger a manual backup run on each host
 2. Verify backup integrity with `restic check`
 3. Test a restore to a temporary location to confirm data is recoverable
 
-## Phase 2: Declare pgdb1 Databases in Nix
-
-Before migrating pgdb1, audit the manually-created databases and users on the running
-instance, then declare them in the Nix configuration using `ensureDatabases` and
-`ensureUsers`. This makes the PostgreSQL setup reproducible on the new host.
-
-Steps:
-1. SSH to pgdb1, run `\l` and `\du` in psql to list databases and roles
-2. Add `ensureDatabases` and `ensureUsers` to `services/postgres/postgres.nix`
-3. Document any non-default PostgreSQL settings or extensions per database
-
-After reprovisioning, the databases will be created by NixOS, and data restored from the
-`pg_dumpall` backup.
-
-## Phase 3: Stateless Host Migration
+## Phase 2: Stateless Host Migration
 
 These hosts have no meaningful state and can be recreated fresh. For each host:
@@ -102,7 +81,7 @@ Migrate stateless hosts in an order that minimizes disruption:
    migration complete. All hosts use both ns1 and ns2 as resolvers, so ns1 being down
    briefly during migration is tolerable.
 
-## Phase 4: Stateful Host Migration
+## Phase 3: Stateful Host Migration
 
 For each stateful host, the procedure is:
 
@@ -115,17 +94,7 @@ For each stateful host, the procedure is:
 7. Start services and verify functionality
 8. Decommission the old VM
 
-### 4a. pgdb1
-
-1. Run final `pg_dumpall` backup via restic
-2. Stop PostgreSQL on the old host
-3. Provision new pgdb1 via OpenTofu
-4. After bootstrap, NixOS creates the declared databases/users
-5. Restore data with `pg_restore` or `psql < dumpall.sql`
-6. Verify database connectivity from gunter (`10.69.30.105`)
-7. Decommission old VM
-
-### 4b. monitoring01
+### 3a. monitoring01
 
 1. Run final Grafana backup
 2. Provision new monitoring01 via OpenTofu
@@ -135,7 +104,7 @@ For each stateful host, the procedure is:
 6. Verify all scrape targets are being collected
 7. Decommission old VM
 
-### 4c. jelly01
+### 3b. jelly01
 
 1. Run final Jellyfin backup
 2. Provision new jelly01 via OpenTofu
@@ -144,7 +113,7 @@ For each stateful host, the procedure is:
 5. Start Jellyfin, verify watch history and library metadata are present
 6. Decommission old VM
 
-### 4d. ha1
+### 3c. ha1
 
 1. Verify latest restic backup is current
 2. Stop Home Assistant, Zigbee2MQTT, and Mosquitto on old host
@@ -168,7 +137,7 @@ OpenTofu/Proxmox. Verify the USB device ID on the hypervisor and add the appropr
 `usb` block to the VM definition in `terraform/vms.tf`. The USB device must be passed
 through before starting Zigbee2MQTT on the new host.
 
-## Phase 5: Decommission jump and auth01 Hosts
+## Phase 4: Decommission Hosts
 
 ### jump ✓ COMPLETE
 
@@ -194,7 +163,19 @@ Host was already removed from flake.nix and VM destroyed. Configuration cleaned
 Host configuration, services, and VM already removed.
 
-## Phase 6: Decommission ca Host ✓ COMPLETE
+### pgdb1
+
+Only consumer was Open WebUI on gunter, which is being migrated to use local PostgreSQL.
+
+1. Verify Open WebUI on gunter is using local PostgreSQL (not pgdb1)
+2. Remove host configuration from `hosts/pgdb1/`
+3. Remove `services/postgres/` (only used by pgdb1)
+4. Remove from `flake.nix`
+5. Remove Vault AppRole from `terraform/vault/approle.tf`
+6. Destroy the VM in Proxmox
+7. Commit cleanup
+
+## Phase 5: Decommission ca Host ✓ COMPLETE
 
 ~~Deferred until Phase 4c (PKI migration to OpenBao) is complete.
 Once all hosts use the OpenBao ACME endpoint for certificates, the step-ca host can be decommissioned following
 the same cleanup steps as the jump host.~~
@@ -202,7 +183,7 @@ the same cleanup steps as the jump host.~~
 PKI migration to OpenBao complete. Host configuration, `services/ca/`, and VM removed.
 
-## Phase 7: Remove sops-nix ✓ COMPLETE
+## Phase 6: Remove sops-nix ✓ COMPLETE
 
 ~~Once `ca` is decommissioned (Phase 6), `sops-nix` is no longer used by any host. Remove
 all remnants:~~
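The ha1 section above requires adding a `usb` block to the VM definition in `terraform/vms.tf` for the Zigbee coordinator. A minimal sketch of what that might look like, assuming the `bpg/proxmox` provider's `proxmox_virtual_environment_vm` resource is in use (the resource name `ha1` and the vendor:product ID `1a86:7523` are placeholders — read the real ID from `lsusb` on the hypervisor, as the plan says):

```hcl
resource "proxmox_virtual_environment_vm" "ha1" {
  # ... existing VM definition (node, clone/image, cpu, memory, disks) ...

  # Pass the Zigbee coordinator through to the guest so Zigbee2MQTT
  # can open it. The ID below is a placeholder; confirm with `lsusb`.
  usb {
    host = "1a86:7523"
    usb3 = false
  }
}
```

Run `tofu plan` after adding the block to confirm it only attaches the USB device and does not force a VM replacement.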