docs: mark pgdb1 for decommissioning instead of migration
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
Only consumer was Open WebUI on gunter, which will migrate to local PostgreSQL.
Removed the pgdb1 backup/migration phases and added pgdb1 to the decommission list.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -19,11 +19,10 @@ Hosts to migrate:
| nix-cache01 | Stateless | Binary cache, recreate |
| http-proxy | Stateless | Reverse proxy, recreate |
| nats1 | Stateless | Messaging, recreate |
| auth01 | Decommission | No longer in use |
| ha1 | Stateful | Home Assistant + Zigbee2MQTT + Mosquitto |
| monitoring01 | Stateful | Prometheus, Grafana, Loki |
| jelly01 | Stateful | Jellyfin metadata, watch history, config |
| pgdb1 | Stateful | PostgreSQL databases |
| pgdb1 | Decommission | Only used by Open WebUI on gunter, migrating to local postgres |
| ~~jump~~ | ~~Decommission~~ | ✓ Complete |
| ~~auth01~~ | ~~Decommission~~ | ✓ Complete |
| ~~ca~~ | ~~Deferred~~ | ✓ Complete |
@@ -46,39 +45,19 @@ No backup currently exists. Add a restic backup job for `/var/lib/jellyfin/` whi
Media files are on the NAS (`nas.home.2rjus.net:/mnt/hdd-pool/media`) and do not need backup.
The cache directory (`/var/cache/jellyfin/`) does not need backup — it regenerates.

### 1c. Add PostgreSQL Backup to pgdb1

No backup currently exists. Add a restic backup job with a `pg_dumpall` pre-hook to capture
all databases and roles. The dump should be piped through restic's stdin backup (similar to
the Grafana DB dump pattern on monitoring01).

### 1d. Verify Existing ha1 Backup
### 1c. Verify Existing ha1 Backup

ha1 already backs up `/var/lib/hass`, `/var/lib/zigbee2mqtt`, `/var/lib/mosquitto`. Verify
these backups are current and restorable before proceeding with migration.

### 1e. Verify All Backups
### 1d. Verify All Backups

After adding/expanding backup jobs:
1. Trigger a manual backup run on each host
2. Verify backup integrity with `restic check`
3. Test a restore to a temporary location to confirm data is recoverable
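
The three verification steps above could be scripted roughly as follows. This is a minimal sketch and not part of the commit: it assumes `RESTIC_REPOSITORY` and `RESTIC_PASSWORD_FILE` are already exported on the host, that each backup job runs as a systemd unit whose name the caller supplies, and that `run` and `verify_backup` are illustrative helper names.

```shell
# Sketch: trigger a backup, check repository integrity, and test-restore.
# Assumes RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE are exported.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

verify_backup() {
  unit="$1"     # hypothetical systemd unit that runs the backup job
  target="$2"   # scratch directory that receives the test restore

  # Stop at the first failing step so a broken backup is never restored.
  run systemctl start "$unit" &&
    run restic check &&
    run restic restore latest --target "$target"
}
```

A dry run such as `DRY_RUN=1 verify_backup <unit> /tmp/restore-test` prints the three commands without touching the repository, which is a cheap way to confirm the unit name and target path before running it for real.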

## Phase 2: Declare pgdb1 Databases in Nix

Before migrating pgdb1, audit the manually-created databases and users on the running
instance, then declare them in the Nix configuration using `ensureDatabases` and
`ensureUsers`. This makes the PostgreSQL setup reproducible on the new host.

Steps:
1. SSH to pgdb1, run `\l` and `\du` in psql to list databases and roles
2. Add `ensureDatabases` and `ensureUsers` to `services/postgres/postgres.nix`
3. Document any non-default PostgreSQL settings or extensions per database

After reprovisioning, the databases will be created by NixOS, and data restored from the
`pg_dumpall` backup.

## Phase 3: Stateless Host Migration
## Phase 2: Stateless Host Migration

These hosts have no meaningful state and can be recreated fresh. For each host:

@@ -102,7 +81,7 @@ Migrate stateless hosts in an order that minimizes disruption:
migration complete. All hosts use both ns1 and ns2 as resolvers, so ns1 being down briefly
during migration is tolerable.

## Phase 4: Stateful Host Migration
## Phase 3: Stateful Host Migration

For each stateful host, the procedure is:

@@ -115,17 +94,7 @@ For each stateful host, the procedure is:
7. Start services and verify functionality
8. Decommission the old VM

### 4a. pgdb1

1. Run final `pg_dumpall` backup via restic
2. Stop PostgreSQL on the old host
3. Provision new pgdb1 via OpenTofu
4. After bootstrap, NixOS creates the declared databases/users
5. Restore data with `pg_restore` or `psql < dumpall.sql`
6. Verify database connectivity from gunter (`10.69.30.105`)
7. Decommission old VM

### 4b. monitoring01
### 3a. monitoring01

1. Run final Grafana backup
2. Provision new monitoring01 via OpenTofu
@@ -135,7 +104,7 @@ For each stateful host, the procedure is:
6. Verify all scrape targets are being collected
7. Decommission old VM

### 4c. jelly01
### 3b. jelly01

1. Run final Jellyfin backup
2. Provision new jelly01 via OpenTofu
@@ -144,7 +113,7 @@ For each stateful host, the procedure is:
5. Start Jellyfin, verify watch history and library metadata are present
6. Decommission old VM

### 4d. ha1
### 3c. ha1

1. Verify latest restic backup is current
2. Stop Home Assistant, Zigbee2MQTT, and Mosquitto on old host
@@ -168,7 +137,7 @@ OpenTofu/Proxmox. Verify the USB device ID on the hypervisor and add the appropr
`usb` block to the VM definition in `terraform/vms.tf`. The USB device must be passed
through before starting Zigbee2MQTT on the new host.

## Phase 5: Decommission jump and auth01 Hosts
## Phase 4: Decommission Hosts

### jump ✓ COMPLETE

@@ -194,7 +163,19 @@ Host was already removed from flake.nix and VM destroyed. Configuration cleaned

Host configuration, services, and VM already removed.

## Phase 6: Decommission ca Host ✓ COMPLETE
### pgdb1

Only consumer was Open WebUI on gunter, which is being migrated to use local PostgreSQL.

1. Verify Open WebUI on gunter is using local PostgreSQL (not pgdb1)
2. Remove host configuration from `hosts/pgdb1/`
3. Remove `services/postgres/` (only used by pgdb1)
4. Remove from `flake.nix`
5. Remove Vault AppRole from `terraform/vault/approle.tf`
6. Destroy the VM in Proxmox
7. Commit cleanup
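
One way to approach step 1 is to confirm that pgdb1 no longer sees connections from gunter before removing anything. The sketch below is an assumption-laden illustration and not part of the commit: it assumes it runs on pgdb1 as a user allowed to read `pg_stat_activity`, that gunter's address is the `10.69.30.105` mentioned in the earlier migration steps, and that the function names are hypothetical.

```shell
# Sketch: check whether a client still holds connections to this PostgreSQL
# instance. Assumes psql can connect locally with sufficient privileges.

# Print the number of sessions whose client address matches the argument.
count_client_connections() {
  client="$1"
  psql -At -c "SELECT count(*) FROM pg_stat_activity WHERE client_addr = '${client}';"
}

# Succeed only when the supplied connection count is zero.
safe_to_decommission() {
  [ "$1" -eq 0 ]
}
```

Usage on pgdb1 might look like `safe_to_decommission "$(count_client_connections 10.69.30.105)" && echo "no sessions from gunter"`. A zero count at one moment is not proof that the client has been reconfigured, so sampling over a longer window (or checking the Open WebUI configuration on gunter directly) remains advisable.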

## Phase 5: Decommission ca Host ✓ COMPLETE

~~Deferred until Phase 4c (PKI migration to OpenBao) is complete. Once all hosts use the
OpenBao ACME endpoint for certificates, the step-ca host can be decommissioned following
@@ -202,7 +183,7 @@ the same cleanup steps as the jump host.~~

PKI migration to OpenBao complete. Host configuration, `services/ca/`, and VM removed.

## Phase 7: Remove sops-nix ✓ COMPLETE
## Phase 6: Remove sops-nix ✓ COMPLETE

~~Once `ca` is decommissioned (Phase 6), `sops-nix` is no longer used by any host. Remove
all remnants:~~