docs: mark pgdb1 for decommissioning instead of migration
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
Only consumer was Open WebUI on gunter, which will migrate to local PostgreSQL. Removed pgdb1 backup/migration phases and added to decommission list.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
```diff
@@ -19,11 +19,10 @@ Hosts to migrate:
 | nix-cache01 | Stateless | Binary cache, recreate |
 | http-proxy | Stateless | Reverse proxy, recreate |
 | nats1 | Stateless | Messaging, recreate |
-| auth01 | Decommission | No longer in use |
 | ha1 | Stateful | Home Assistant + Zigbee2MQTT + Mosquitto |
 | monitoring01 | Stateful | Prometheus, Grafana, Loki |
 | jelly01 | Stateful | Jellyfin metadata, watch history, config |
-| pgdb1 | Stateful | PostgreSQL databases |
+| pgdb1 | Decommission | Only used by Open WebUI on gunter, migrating to local postgres |
 | ~~jump~~ | ~~Decommission~~ | ✓ Complete |
 | ~~auth01~~ | ~~Decommission~~ | ✓ Complete |
 | ~~ca~~ | ~~Deferred~~ | ✓ Complete |
```
```diff
@@ -46,39 +45,19 @@ No backup currently exists. Add a restic backup job for `/var/lib/jellyfin/` whi
 Media files are on the NAS (`nas.home.2rjus.net:/mnt/hdd-pool/media`) and do not need backup.
 The cache directory (`/var/cache/jellyfin/`) does not need backup — it regenerates.
 
-### 1c. Add PostgreSQL Backup to pgdb1
+### 1c. Verify Existing ha1 Backup
 
-No backup currently exists. Add a restic backup job with a `pg_dumpall` pre-hook to capture
-all databases and roles. The dump should be piped through restic's stdin backup (similar to
-the Grafana DB dump pattern on monitoring01).
-
-### 1d. Verify Existing ha1 Backup
-
 ha1 already backs up `/var/lib/hass`, `/var/lib/zigbee2mqtt`, `/var/lib/mosquitto`. Verify
 these backups are current and restorable before proceeding with migration.
 
-### 1e. Verify All Backups
+### 1d. Verify All Backups
 
 After adding/expanding backup jobs:
 1. Trigger a manual backup run on each host
 2. Verify backup integrity with `restic check`
 3. Test a restore to a temporary location to confirm data is recoverable
 
-## Phase 2: Declare pgdb1 Databases in Nix
+## Phase 2: Stateless Host Migration
 
-Before migrating pgdb1, audit the manually-created databases and users on the running
-instance, then declare them in the Nix configuration using `ensureDatabases` and
-`ensureUsers`. This makes the PostgreSQL setup reproducible on the new host.
-
-Steps:
-1. SSH to pgdb1, run `\l` and `\du` in psql to list databases and roles
-2. Add `ensureDatabases` and `ensureUsers` to `services/postgres/postgres.nix`
-3. Document any non-default PostgreSQL settings or extensions per database
-
-After reprovisioning, the databases will be created by NixOS, and data restored from the
-`pg_dumpall` backup.
-
-## Phase 3: Stateless Host Migration
-
 These hosts have no meaningful state and can be recreated fresh. For each host:
 
```
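The "Verify All Backups" steps that this hunk keeps could be scripted for each host. The sketch below builds the commands and prints them. It runs them only when `dry_run=False`. It assumes each job is declared with the NixOS `services.restic.backups.<name>` module, which names its unit `restic-backups-<name>.service`. It also assumes `RESTIC_REPOSITORY` and `RESTIC_PASSWORD_FILE` are exported so the `restic` CLI can open the repository. The job name `jellyfin` and the `/tmp/restore-test` target are hypothetical.

```python
import shlex
import subprocess


def verification_commands(job, restore_target="/tmp/restore-test"):
    """Build the three verification steps for one backup job as argv lists.

    Assumes a NixOS services.restic.backups.<job> definition (unit name
    restic-backups-<job>.service) and RESTIC_REPOSITORY / RESTIC_PASSWORD_FILE
    exported in the environment for the plain restic calls.
    """
    return [
        # 1. Manual run; for a Type=oneshot unit, `systemctl start` returns
        #    only after the backup process has exited.
        ["systemctl", "start", f"restic-backups-{job}.service"],
        # 2. Repository integrity check (add --read-data-subset to also read pack data).
        ["restic", "check"],
        # 3. Restore the newest snapshot into a scratch directory for inspection.
        ["restic", "restore", "latest", "--target", restore_target],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("+", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(verification_commands("jellyfin"))  # hypothetical job name, dry run
```

A dry run by default makes it easy to review the exact commands before they touch a production repository.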
```diff
@@ -102,7 +81,7 @@ Migrate stateless hosts in an order that minimizes disruption:
 migration complete. All hosts use both ns1 and ns2 as resolvers, so ns1 being down briefly
 during migration is tolerable.
 
-## Phase 4: Stateful Host Migration
+## Phase 3: Stateful Host Migration
 
 For each stateful host, the procedure is:
 
```
```diff
@@ -115,17 +94,7 @@ For each stateful host, the procedure is:
 7. Start services and verify functionality
 8. Decommission the old VM
 
-### 4a. pgdb1
+### 3a. monitoring01
 
-1. Run final `pg_dumpall` backup via restic
-2. Stop PostgreSQL on the old host
-3. Provision new pgdb1 via OpenTofu
-4. After bootstrap, NixOS creates the declared databases/users
-5. Restore data with `pg_restore` or `psql < dumpall.sql`
-6. Verify database connectivity from gunter (`10.69.30.105`)
-7. Decommission old VM
-
-### 4b. monitoring01
-
 1. Run final Grafana backup
 2. Provision new monitoring01 via OpenTofu
```
```diff
@@ -135,7 +104,7 @@ For each stateful host, the procedure is:
 6. Verify all scrape targets are being collected
 7. Decommission old VM
 
-### 4c. jelly01
+### 3b. jelly01
 
 1. Run final Jellyfin backup
 2. Provision new jelly01 via OpenTofu
```
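monitoring01's step 6, "Verify all scrape targets are being collected", can be checked against Prometheus's HTTP API (`GET /api/v1/targets`). This is a minimal sketch that lists every active target not reporting `up`. The base URL you pass in is your own assumption. Nothing here reflects how this repository actually exposes Prometheus.

```python
import json
import urllib.request


def fetch_targets(base_url):
    """Query Prometheus's /api/v1/targets endpoint for active targets only."""
    url = f"{base_url.rstrip('/')}/api/v1/targets?state=active"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def unhealthy_targets(payload):
    """Return (job, scrapeUrl, lastError) for active targets whose health is not 'up'.

    Right after cutover, targets may show 'unknown' until their first scrape
    completes, so re-check after one scrape interval before treating them as broken.
    """
    return [
        (t["labels"].get("job", "?"), t["scrapeUrl"], t.get("lastError", ""))
        for t in payload["data"]["activeTargets"]
        if t.get("health") != "up"
    ]
```

Usage, with a hypothetical address: `unhealthy_targets(fetch_targets("http://monitoring01:9090"))` should return an empty list once every exporter is being scraped by the new host.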
```diff
@@ -144,7 +113,7 @@ For each stateful host, the procedure is:
 5. Start Jellyfin, verify watch history and library metadata are present
 6. Decommission old VM
 
-### 4d. ha1
+### 3c. ha1
 
 1. Verify latest restic backup is current
 2. Stop Home Assistant, Zigbee2MQTT, and Mosquitto on old host
```
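ha1's first step, "Verify latest restic backup is current", comes down to checking the age of the newest snapshot. This is a minimal sketch over the output of `restic snapshots --json`. restic emits RFC 3339 timestamps with up to nine fractional digits, so the fraction is normalised to the six digits Python's `datetime` accepts. The 24-hour threshold is an assumption, not a value taken from the plan.

```python
from __future__ import annotations

import json
import re
import subprocess
from datetime import datetime, timedelta, timezone


def parse_restic_time(value: str) -> datetime:
    """Parse a restic snapshot timestamp (RFC 3339, 0-9 fractional digits)."""
    # datetime handles microseconds only: pad or trim the fraction to 6 digits.
    value = re.sub(r"\.(\d+)", lambda m: "." + (m.group(1) + "000000")[:6], value, count=1)
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def load_snapshots(host: str) -> list[dict]:
    """Read snapshot metadata for one host via `restic snapshots --json`."""
    out = subprocess.run(
        ["restic", "snapshots", "--json", "--host", host],
        capture_output=True, check=True, text=True,
    ).stdout
    return json.loads(out) or []


def is_current(snapshots: list[dict], max_age: timedelta = timedelta(hours=24),
               now: datetime | None = None) -> bool:
    """True if the newest snapshot is no older than max_age (assumed 24 h)."""
    if not snapshots:
        return False
    now = now or datetime.now(timezone.utc)
    newest = max(parse_restic_time(s["time"]) for s in snapshots)
    return now - newest <= max_age
```

On the old host, `is_current(load_snapshots("ha1"))` returning `True` is a reasonable gate before stopping Home Assistant, Zigbee2MQTT, and Mosquitto.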
```diff
@@ -168,7 +137,7 @@ OpenTofu/Proxmox. Verify the USB device ID on the hypervisor and add the appropr
 `usb` block to the VM definition in `terraform/vms.tf`. The USB device must be passed
 through before starting Zigbee2MQTT on the new host.
 
-## Phase 5: Decommission jump and auth01 Hosts
+## Phase 4: Decommission Hosts
 
 ### jump ✓ COMPLETE
 
```
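The USB device ID for the passthrough is usually read from `lsusb` on the Proxmox host. This is a minimal sketch that extracts `vendor:product` IDs from `lsusb` output. That is the hexadecimal form Proxmox accepts for a host USB mapping (for example `qm set <vmid> -usb0 host=<vendor>:<product>`). The sample device and the `cp210x` keyword are illustrative assumptions, not the actual Zigbee coordinator on this hypervisor. The matching attribute in `terraform/vms.tf` depends on the provider version and is not shown.

```python
from __future__ import annotations

import re

# Matches lines like: "Bus 001 Device 005: ID 10c4:ea60 Silicon Labs CP210x UART Bridge"
LSUSB_LINE = re.compile(
    r"^Bus (?P<bus>\d{3}) Device (?P<dev>\d{3}): "
    r"ID (?P<vid>[0-9a-fA-F]{4}):(?P<pid>[0-9a-fA-F]{4})\s*(?P<name>.*)$"
)


def find_usb_ids(lsusb_output: str, keyword: str) -> list[str]:
    """Return vendor:product IDs of devices whose description contains keyword."""
    ids = []
    for line in lsusb_output.splitlines():
        m = LSUSB_LINE.match(line.strip())
        if m and keyword.lower() in m.group("name").lower():
            ids.append(f"{m.group('vid').lower()}:{m.group('pid').lower()}")
    return ids
```

Matching on vendor:product rather than bus/port keeps the mapping stable if the dongle is moved to another port. If two identical adapters are attached, however, a port-based mapping is the only way to tell them apart.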
```diff
@@ -194,7 +163,19 @@ Host was already removed from flake.nix and VM destroyed. Configuration cleaned
 
 Host configuration, services, and VM already removed.
 
-## Phase 6: Decommission ca Host ✓ COMPLETE
+### pgdb1
+
+Only consumer was Open WebUI on gunter, which is being migrated to use local PostgreSQL.
+
+1. Verify Open WebUI on gunter is using local PostgreSQL (not pgdb1)
+2. Remove host configuration from `hosts/pgdb1/`
+3. Remove `services/postgres/` (only used by pgdb1)
+4. Remove from `flake.nix`
+5. Remove Vault AppRole from `terraform/vault/approle.tf`
+6. Destroy the VM in Proxmox
+7. Commit cleanup
+
+## Phase 5: Decommission ca Host ✓ COMPLETE
 
 ~~Deferred until Phase 4c (PKI migration to OpenBao) is complete. Once all hosts use the
 OpenBao ACME endpoint for certificates, the step-ca host can be decommissioned following
```
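Step 1 of the pgdb1 decommission list ("Verify Open WebUI on gunter is using local PostgreSQL") can be checked by inspecting the connection URL Open WebUI is configured with. This is a minimal sketch. It assumes the setting is a PostgreSQL URL of the kind Open WebUI reads from `DATABASE_URL`. It also assumes that "local" means loopback or a Unix socket. Both assumptions may need adjusting for how gunter is actually set up.

```python
from urllib.parse import parse_qs, urlparse

# Host names treated as "this machine"; an assumption, extend as needed.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def uses_local_postgres(database_url: str) -> bool:
    """Return True only if the URL points at a PostgreSQL server on this machine."""
    parsed = urlparse(database_url)
    # SQLAlchemy-style URLs may carry a driver suffix, e.g. postgresql+psycopg2.
    if parsed.scheme.split("+")[0] not in ("postgres", "postgresql"):
        return False
    host = parsed.hostname
    if host is None:
        # No network host: libpq connects over a Unix socket, optionally
        # taking the socket directory from a ?host=/path query parameter.
        socket_dirs = parse_qs(parsed.query).get("host", [])
        return all(d.startswith("/") for d in socket_dirs)
    return host in LOCAL_HOSTS
```

A URL that still names pgdb1, or a SQLite URL, returns `False`, so steps 2 to 7 should wait.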
```diff
@@ -202,7 +183,7 @@ the same cleanup steps as the jump host.~~
 
 PKI migration to OpenBao complete. Host configuration, `services/ca/`, and VM removed.
 
-## Phase 7: Remove sops-nix ✓ COMPLETE
+## Phase 6: Remove sops-nix ✓ COMPLETE
 
 ~~Once `ca` is decommissioned (Phase 6), `sops-nix` is no longer used by any host. Remove
 all remnants:~~
```
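Both the sops-nix cleanup and the pgdb1 removal steps (host config, `services/postgres/`, `flake.nix`, Vault AppRole) end with making sure nothing in the repository still mentions the removed piece. This is a minimal sketch that walks a checkout and reports which lines reference any of the given terms. The term lists and the skipped `.git` directory are assumptions. A plain `grep -rn` does the same job.

```python
from __future__ import annotations

import os
from pathlib import Path


def find_references(root: str | Path, terms: list[str],
                    skip_dirs: tuple[str, ...] = (".git",)) -> dict[str, list[int]]:
    """Map relative file path -> 1-based line numbers containing any term."""
    root = Path(root)
    hits: dict[str, list[int]] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file
            lines = [i for i, line in enumerate(text.splitlines(), 1)
                     if any(t in line for t in terms)]
            if lines:
                hits[str(path.relative_to(root))] = lines
    return hits
```

For example, `find_references(".", ["sops-nix", "sops.secrets"])` or `find_references(".", ["pgdb1"])` should return an empty dict once the corresponding cleanup commit is in.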
|