docs: mark pgdb1 for decommissioning instead of migration

The only consumer was Open WebUI on gunter, which will migrate to local
PostgreSQL. Removed the pgdb1 backup/migration phases and added pgdb1 to
the decommission list.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 22:49:53 +01:00
parent e937c68965
commit ec4ac1477e

@@ -19,11 +19,10 @@ Hosts to migrate:
| nix-cache01 | Stateless | Binary cache, recreate |
| http-proxy | Stateless | Reverse proxy, recreate |
| nats1 | Stateless | Messaging, recreate |
| auth01 | Decommission | No longer in use |
| ha1 | Stateful | Home Assistant + Zigbee2MQTT + Mosquitto |
| monitoring01 | Stateful | Prometheus, Grafana, Loki |
| jelly01 | Stateful | Jellyfin metadata, watch history, config |
| pgdb1 | Stateful | PostgreSQL databases |
| pgdb1 | Decommission | Only used by Open WebUI on gunter, migrating to local postgres |
| ~~jump~~ | ~~Decommission~~ | ✓ Complete |
| ~~auth01~~ | ~~Decommission~~ | ✓ Complete |
| ~~ca~~ | ~~Deferred~~ | ✓ Complete |
@@ -46,39 +45,19 @@ No backup currently exists. Add a restic backup job for `/var/lib/jellyfin/` whi
Media files are on the NAS (`nas.home.2rjus.net:/mnt/hdd-pool/media`) and do not need backup.
The cache directory (`/var/cache/jellyfin/`) does not need backup — it regenerates.
### 1c. Add PostgreSQL Backup to pgdb1
No backup currently exists. Add a restic backup job with a `pg_dumpall` pre-hook to capture
all databases and roles. The dump should be piped through restic's stdin backup (similar to
the Grafana DB dump pattern on monitoring01).
### 1d. Verify Existing ha1 Backup
### 1c. Verify Existing ha1 Backup
ha1 already backs up `/var/lib/hass`, `/var/lib/zigbee2mqtt`, `/var/lib/mosquitto`. Verify
these backups are current and restorable before proceeding with migration.
### 1e. Verify All Backups
### 1d. Verify All Backups
After adding/expanding backup jobs:
1. Trigger a manual backup run on each host
2. Verify backup integrity with `restic check`
3. Test a restore to a temporary location to confirm data is recoverable
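Steps 2 and 3 of the list above could be scripted roughly as follows. This is a minimal sketch, not the repository's actual tooling: the helper names `restic_verify_commands` and `verify_backup` are hypothetical, the repository location and restore directory are placeholders, and it assumes the restic password is supplied through the environment (for example via `RESTIC_PASSWORD_FILE`).

```python
import subprocess


def restic_verify_commands(repo: str, restore_dir: str) -> list[list[str]]:
    """Build the restic invocations for an integrity check and a test restore.

    `repo` and `restore_dir` are placeholders supplied by the caller.
    """
    return [
        # Step 2: check the repository for structural errors.
        ["restic", "--repo", repo, "check"],
        # Step 3: restore the latest snapshot into a scratch directory.
        ["restic", "--repo", repo, "restore", "latest", "--target", restore_dir],
    ]


def verify_backup(repo: str, restore_dir: str) -> None:
    """Run each command in order and stop at the first failure."""
    for argv in restic_verify_commands(repo, restore_dir):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(argv, check=True)
```

Keeping the command construction separate from execution makes it possible to print or review the commands before running them against a real repository. Triggering the backup itself (step 1) is left out because it depends on how each host's backup job is defined.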
## Phase 2: Declare pgdb1 Databases in Nix
Before migrating pgdb1, audit the manually-created databases and users on the running
instance, then declare them in the Nix configuration using `ensureDatabases` and
`ensureUsers`. This makes the PostgreSQL setup reproducible on the new host.
Steps:
1. SSH to pgdb1, run `\l` and `\du` in psql to list databases and roles
2. Add `ensureDatabases` and `ensureUsers` to `services/postgres/postgres.nix`
3. Document any non-default PostgreSQL settings or extensions per database
After reprovisioning, the databases will be created by NixOS, and data restored from the
`pg_dumpall` backup.
## Phase 3: Stateless Host Migration
## Phase 2: Stateless Host Migration
These hosts have no meaningful state and can be recreated fresh. For each host:
@@ -102,7 +81,7 @@ Migrate stateless hosts in an order that minimizes disruption:
migration complete. All hosts use both ns1 and ns2 as resolvers, so ns1 being down briefly
during migration is tolerable.
## Phase 4: Stateful Host Migration
## Phase 3: Stateful Host Migration
For each stateful host, the procedure is:
@@ -115,17 +94,7 @@ For each stateful host, the procedure is:
7. Start services and verify functionality
8. Decommission the old VM
### 4a. pgdb1
1. Run final `pg_dumpall` backup via restic
2. Stop PostgreSQL on the old host
3. Provision new pgdb1 via OpenTofu
4. After bootstrap, NixOS creates the declared databases/users
5. Restore data with `pg_restore` or `psql < dumpall.sql`
6. Verify database connectivity from gunter (`10.69.30.105`)
7. Decommission old VM
### 4b. monitoring01
### 3a. monitoring01
1. Run final Grafana backup
2. Provision new monitoring01 via OpenTofu
@@ -135,7 +104,7 @@ For each stateful host, the procedure is:
6. Verify all scrape targets are being collected
7. Decommission old VM
### 4c. jelly01
### 3b. jelly01
1. Run final Jellyfin backup
2. Provision new jelly01 via OpenTofu
@@ -144,7 +113,7 @@ For each stateful host, the procedure is:
5. Start Jellyfin, verify watch history and library metadata are present
6. Decommission old VM
### 4d. ha1
### 3c. ha1
1. Verify latest restic backup is current
2. Stop Home Assistant, Zigbee2MQTT, and Mosquitto on old host
@@ -168,7 +137,7 @@ OpenTofu/Proxmox. Verify the USB device ID on the hypervisor and add the appropr
`usb` block to the VM definition in `terraform/vms.tf`. The USB device must be passed
through before starting Zigbee2MQTT on the new host.
## Phase 5: Decommission jump and auth01 Hosts
## Phase 4: Decommission Hosts
### jump ✓ COMPLETE
@@ -194,7 +163,19 @@ Host was already removed from flake.nix and VM destroyed. Configuration cleaned
Host configuration, services, and VM already removed.
## Phase 6: Decommission ca Host ✓ COMPLETE
### pgdb1
Only consumer was Open WebUI on gunter, which is being migrated to use local PostgreSQL.
1. Verify Open WebUI on gunter is using local PostgreSQL (not pgdb1)
2. Remove host configuration from `hosts/pgdb1/`
3. Remove `services/postgres/` (only used by pgdb1)
4. Remove from `flake.nix`
5. Remove Vault AppRole from `terraform/vault/approle.tf`
6. Destroy the VM in Proxmox
7. Commit cleanup
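Step 1 of the pgdb1 list is the gate for everything that follows, so it may be worth checking mechanically. The sketch below assumes Open WebUI on gunter takes its database location from a `DATABASE_URL`-style connection string; the function name `uses_local_postgres` and the example URLs are hypothetical and only illustrate the check.

```python
from urllib.parse import parse_qs, urlsplit

# Hostnames treated as "this machine" (an assumption for this sketch).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def uses_local_postgres(database_url: str) -> bool:
    """Return True if the URL points at a PostgreSQL server on the local host."""
    parts = urlsplit(database_url)
    if not parts.scheme.startswith("postgres"):
        # Not a PostgreSQL URL at all (e.g. SQLite).
        return False
    if parts.hostname is None:
        # No network host given: the client falls back to a local socket,
        # optionally named via ?host=/path/to/socket/dir.
        socket_hosts = parse_qs(parts.query).get("host", [])
        return all(h.startswith("/") for h in socket_hosts)
    return parts.hostname in LOCAL_HOSTS
```

For example, `uses_local_postgres("postgresql://openwebui@pgdb1:5432/openwebui")` returns `False`, while a URL pointing at `localhost` or at a Unix socket directory returns `True`. A passing check only confirms the configuration; it does not replace confirming that the application is actually running against the local database before the VM is destroyed.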
## Phase 5: Decommission ca Host ✓ COMPLETE
~~Deferred until Phase 4c (PKI migration to OpenBao) is complete. Once all hosts use the
OpenBao ACME endpoint for certificates, the step-ca host can be decommissioned following
@@ -202,7 +183,7 @@ the same cleanup steps as the jump host.~~
PKI migration to OpenBao complete. Host configuration, `services/ca/`, and VM removed.
## Phase 7: Remove sops-nix ✓ COMPLETE
## Phase 6: Remove sops-nix ✓ COMPLETE
~~Once `ca` is decommissioned (Phase 6), `sops-nix` is no longer used by any host. Remove
all remnants:~~