docs: update for sops-to-openbao migration completion
Some checks failed
Run nix flake check / flake-check (push) Failing after 18m17s
Update CLAUDE.md and README.md to reflect that secrets are now managed by OpenBao, with sops only remaining for ca. Update migration plans with sops cleanup checklist and auth01 decommission.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
51
CLAUDE.md
@@ -52,7 +52,12 @@ nix develop

 ### Secrets Management

-Secrets are handled by sops. Do not edit any `.sops.yaml` or any file within `secrets/`. Ask the user to modify if necessary.
+Secrets are managed by OpenBao (Vault) using AppRole authentication. Most hosts use the
+`vault.secrets` option defined in `system/vault-secrets.nix` to fetch secrets at boot.
+Terraform manages the secrets and AppRole policies in `terraform/vault/`.
+
+Legacy sops-nix is still present but only actively used by the `ca` host. Do not edit any
+`.sops.yaml` or any file within `secrets/`. Ask the user to modify if necessary.

 ### Git Workflow

@@ -119,7 +124,7 @@ This ensures documentation matches the exact nixpkgs version (currently NixOS 25
 - `default.nix` - Entry point, imports configuration.nix and services
 - `configuration.nix` - Host-specific settings (networking, hardware, users)
 - `/system/` - Shared system-level configurations applied to ALL hosts
-- Core modules: nix.nix, sshd.nix, sops.nix, acme.nix, autoupgrade.nix
+- Core modules: nix.nix, sshd.nix, sops.nix (legacy), vault-secrets.nix, acme.nix, autoupgrade.nix
 - Monitoring: node-exporter and promtail on every host
 - `/modules/` - Custom NixOS modules
 - `homelab/` - Homelab-specific options (DNS automation, monitoring scrape targets)
@@ -131,13 +136,13 @@ This ensures documentation matches the exact nixpkgs version (currently NixOS 25
 - `monitoring/` - Observability stack (Prometheus, Grafana, Loki, Tempo)
 - `ns/` - DNS services (authoritative, resolver, zone generation)
 - `http-proxy/`, `ca/`, `postgres/`, `nats/`, `jellyfin/`, etc.
-- `/secrets/` - SOPS-encrypted secrets with age encryption
+- `/secrets/` - SOPS-encrypted secrets with age encryption (legacy, only used by ca)
 - `/common/` - Shared configurations (e.g., VM guest agent)
 - `/docs/` - Documentation and plans
 - `plans/` - Future plans and proposals
 - `plans/completed/` - Completed plans (moved here when done)
 - `/playbooks/` - Ansible playbooks for fleet management
-- `/.sops.yaml` - SOPS configuration with age keys for all servers
+- `/.sops.yaml` - SOPS configuration with age keys (legacy, only used by ca)

 ### Configuration Inheritance

@@ -153,7 +158,7 @@ hosts/<hostname>/default.nix
 All hosts automatically get:
 - Nix binary cache (nix-cache.home.2rjus.net)
 - SSH with root login enabled
-- SOPS secrets management with auto-generated age keys
+- OpenBao (Vault) secrets management via AppRole
 - Internal ACME CA integration (ca.home.2rjus.net)
 - Daily auto-upgrades with auto-reboot
 - Prometheus node-exporter + Promtail (logs to monitoring01)
@@ -173,7 +178,6 @@ Production servers managed by `rebuild-all.sh`:
 - `nix-cache01` - Binary cache server
 - `pgdb1` - PostgreSQL database
 - `nats1` - NATS messaging server
-- `auth01` - Authentication service

 Template/test hosts:
 - `template1` - Base template for cloning new hosts
@@ -182,7 +186,7 @@ Template/test hosts:

 - `nixpkgs` - NixOS 25.11 stable (primary)
 - `nixpkgs-unstable` - Unstable channel (available via overlay as `pkgs.unstable.<package>`)
-- `sops-nix` - Secrets management
+- `sops-nix` - Secrets management (legacy, only used by ca)
 - Custom packages from git.t-juice.club:
 - `alerttonotify` - Alert routing
 - `labmon` - Lab monitoring
@@ -198,12 +202,21 @@ Template/test hosts:

 ### Secrets Management

-- Uses SOPS with age encryption
-- Each server has unique age key in `.sops.yaml`
-- Keys auto-generated at `/var/lib/sops-nix/key.txt` on first boot
+Most hosts use OpenBao (Vault) for secrets:
+- Vault server at `vault01.home.2rjus.net:8200`
+- AppRole authentication with credentials at `/var/lib/vault/approle/`
+- Secrets defined in Terraform (`terraform/vault/secrets.tf`)
+- AppRole policies in Terraform (`terraform/vault/approle.tf`)
+- NixOS module: `system/vault-secrets.nix` with `vault.secrets.<name>` options
+- `extractKey` option extracts a single key from vault JSON as a plain file
+- Secrets fetched at boot by `vault-secret-<name>.service` systemd units
+- Fallback to cached secrets in `/var/lib/vault/cache/` when Vault is unreachable
+- Provision AppRole credentials: `nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=<host>`
+
+Legacy SOPS (only used by `ca` host):
+- SOPS with age encryption, keys in `.sops.yaml`
 - Shared secrets: `/secrets/secrets.yaml`
 - Per-host secrets: `/secrets/<hostname>/`
-- All production servers can decrypt shared secrets; host-specific secrets require specific host keys

 ### Auto-Upgrade System

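As a rough illustration of the OpenBao bullets in the hunk above, the following sketch prints (it does not run) commands one might use on a host to check a Vault-backed secret. The unit name pattern `vault-secret-<name>.service` and the two `/var/lib/vault/` directories come from the diff. The secret name `example-secret`, the helper name `vault_secret_checks`, and the choice of `journalctl` for inspecting the unit are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: print commands for checking one Vault-backed secret.
# It only prints; nothing is executed against systemd or the filesystem.
vault_secret_checks() {
  name="$1"  # placeholder secret name, as in vault.secrets.<name>
  # Unit that fetches the secret at boot (name pattern from the diff).
  printf 'systemctl status vault-secret-%s.service\n' "$name"
  # Assumption: logs for the current boot are the first place to look.
  printf 'journalctl -u vault-secret-%s.service -b\n' "$name"
  # Cache used as a fallback when Vault is unreachable (path from the diff).
  printf 'ls -l /var/lib/vault/cache/\n'
  # AppRole credentials directory (path from the diff).
  printf 'ls -l /var/lib/vault/approle/\n'
}

vault_secret_checks example-secret
```

Printing the commands rather than running them keeps the sketch safe to try on any machine; paste the output into a root shell on the target host if it looks right.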
@@ -303,13 +316,15 @@ This means:
 3. Add host entry to `flake.nix` nixosConfigurations
 4. Configure networking in `configuration.nix` (static IP via `systemd.network.networks`, DNS servers)
 5. (Optional) Add `homelab.dns.cnames` if the host needs CNAME aliases
-6. User clones template host
-7. User runs `prepare-host.sh` on new host, this deletes files which should be regenerated, like ssh host keys, machine-id etc. It also creates a new age key, and prints the public key
-8. This key is then added to `.sops.yaml`
-9. Create `/secrets/<hostname>/` if needed
-10. Commit changes, and merge to master.
-11. Deploy by running `nixos-rebuild boot --flake URL#<hostname>` on the host.
-12. Run auto-upgrade on DNS servers (ns1, ns2) to pick up the new host's DNS entry
+6. Add `vault.enable = true;` to the host configuration
+7. Add AppRole policy in `terraform/vault/approle.tf` and any secrets in `secrets.tf`
+8. Run `tofu apply` in `terraform/vault/`
+9. User clones template host
+10. User runs `prepare-host.sh` on new host
+11. Provision AppRole credentials: `nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=<host>`
+12. Commit changes, and merge to master.
+13. Deploy by running `nixos-rebuild boot --flake URL#<hostname>` on the host.
+14. Run auto-upgrade on DNS servers (ns1, ns2) to pick up the new host's DNS entry

 **Note:** DNS A records and Prometheus node-exporter scrape targets are auto-generated from the host's `systemd.network.networks` static IP configuration. No manual zone file or Prometheus config editing is required.

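The command-line parts of the new host steps above (8, 11 and 13) can be sketched as a dry-run helper that prints the commands for one hostname. The commands themselves are taken from the diff. The function name `new_host_vault_commands` and the hostname `examplehost` are hypothetical, and `URL` is left as the same unfilled placeholder the document uses.

```shell
#!/bin/sh
# Hypothetical helper: print the Vault-related commands for adding one host.
# Dry run only; nothing here calls tofu, ansible-playbook, or nixos-rebuild.
new_host_vault_commands() {
  host="$1"  # placeholder hostname
  # Step 8: apply the AppRole policy and secrets with OpenTofu.
  printf '(cd terraform/vault && tofu apply)\n'
  # Step 11: provision AppRole credentials onto the new host.
  printf 'nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=%s\n' "$host"
  # Step 13: run on the host itself; URL is a placeholder from the document.
  printf 'nixos-rebuild boot --flake URL#%s\n' "$host"
}

new_host_vault_commands examplehost
```

The order matters: the policy has to exist in OpenBao (step 8) before credentials are provisioned (step 11), and the host needs those credentials before its first deploy can fetch secrets at boot.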
@@ -15,7 +15,6 @@ NixOS Flake-based configuration repository for a homelab infrastructure. All hos
 | `nix-cache01` | Nix binary cache |
 | `pgdb1` | PostgreSQL |
 | `nats1` | NATS messaging |
-| `auth01` | Authentication (LLDAP + Authelia) |
 | `vault01` | OpenBao (Vault) secrets management |
 | `template1`, `template2` | VM templates for cloning new hosts |

@@ -28,7 +27,7 @@ system/ # Shared modules applied to ALL hosts
 services/ # Reusable service modules, selectively imported per host
 modules/ # Custom NixOS module definitions
 lib/ # Nix library functions (DNS zone generation, etc.)
-secrets/ # SOPS-encrypted secrets (age encryption)
+secrets/ # SOPS-encrypted secrets (legacy, only used by ca)
 common/ # Shared configurations (e.g., VM guest agent)
 terraform/ # OpenTofu configs for Proxmox VM provisioning
 terraform/vault/ # OpenTofu configs for OpenBao (secrets, PKI, AppRoles)
@@ -40,7 +39,7 @@ scripts/ # Helper scripts (create-host, vault-fetch)

 **Automatic DNS zone generation** - A records are derived from each host's static IP configuration. CNAME aliases are defined via `homelab.dns.cnames`. No manual zone file editing required.

-**SOPS secrets management** - Each host has a unique age key. Shared secrets live in `secrets/secrets.yaml`, per-host secrets in `secrets/<hostname>/`.
+**OpenBao (Vault) secrets** - Hosts authenticate via AppRole and fetch secrets at boot. Secrets and policies are managed as code in `terraform/vault/`. Legacy SOPS remains only for the `ca` host.

 **Daily auto-upgrades** - All hosts pull from the master branch and automatically rebuild and reboot on a randomized schedule.

@@ -1,6 +1,22 @@
 # Sops to OpenBao Secrets Migration Plan

-## Status: In Progress
+## Status: Complete (except ca, deferred)
+
+## Remaining sops cleanup
+
+The `sops-nix` flake input, `system/sops.nix`, `.sops.yaml`, and `secrets/` directory are
+still present because `ca` still uses sops for its step-ca secrets (5 secrets in
+`services/ca/default.nix`). The `services/authelia/` and `services/lldap/` modules also
+reference sops but are only used by auth01 (decommissioned).
+
+Once `ca` is migrated to OpenBao PKI (Phase 4c in host-migration-to-opentofu.md), remove:
+- `sops-nix` input from `flake.nix`
+- `sops-nix.nixosModules.sops` from all host module lists in `flake.nix`
+- `inherit sops-nix` from all specialArgs in `flake.nix`
+- `system/sops.nix` and its import in `system/default.nix`
+- `.sops.yaml`
+- `secrets/` directory
+- All `sops.secrets.*` declarations in `services/ca/`, `services/authelia/`, `services/lldap/`

 ## Overview

@@ -20,7 +20,7 @@ Hosts to migrate:
 | nix-cache01 | Stateless | Binary cache, recreate |
 | http-proxy | Stateless | Reverse proxy, recreate |
 | nats1 | Stateless | Messaging, recreate |
-| auth01 | Stateless | Authentication, recreate |
+| auth01 | Decommission | No longer in use |
 | ha1 | Stateful | Home Assistant + Zigbee2MQTT + Mosquitto |
 | monitoring01 | Stateful | Prometheus, Grafana, Loki |
 | jelly01 | Stateful | Jellyfin metadata, watch history, config |
@@ -94,8 +94,7 @@ These hosts have no meaningful state and can be recreated fresh. For each host:
 Migrate stateless hosts in an order that minimizes disruption:

 1. **nix-cache01** — low risk, no downstream dependencies during migration
-2. **auth01** — low risk
-3. **nats1** — low risk, verify no persistent JetStream streams first
+2. **nats1** — low risk, verify no persistent JetStream streams first
 4. **http-proxy** — brief disruption to proxied services, migrate during low-traffic window
 5. **ns1, ns2** — migrate one at a time, verify DNS resolution between each

@@ -168,8 +167,9 @@ OpenTofu/Proxmox. Verify the USB device ID on the hypervisor and add the appropr
 `usb` block to the VM definition in `terraform/vms.tf`. The USB device must be passed
 through before starting Zigbee2MQTT on the new host.

-## Phase 5: Decommission jump Host
+## Phase 5: Decommission jump and auth01 Hosts

+### jump
 1. Verify nothing depends on the jump host (no SSH proxy configs pointing to it, etc.)
 2. Remove host configuration from `hosts/jump/`
 3. Remove from `flake.nix`
@@ -178,12 +178,37 @@ through before starting Zigbee2MQTT on the new host.
 6. Destroy the VM in Proxmox
 7. Commit cleanup

+### auth01
+1. Remove host configuration from `hosts/auth01/`
+2. Remove from `flake.nix`
+3. Remove any secrets in `secrets/auth01/`
+4. Remove from `.sops.yaml`
+5. Remove `services/authelia/` and `services/lldap/` (only used by auth01)
+6. Destroy the VM in Proxmox
+7. Commit cleanup
+
 ## Phase 6: Decommission ca Host (Deferred)

 Deferred until Phase 4c (PKI migration to OpenBao) is complete. Once all hosts use the
 OpenBao ACME endpoint for certificates, the step-ca host can be decommissioned following
 the same cleanup steps as the jump host.

+## Phase 7: Remove sops-nix
+
+Once `ca` is decommissioned (Phase 6), `sops-nix` is no longer used by any host. Remove
+all remnants:
+- `sops-nix` input from `flake.nix` and `flake.lock`
+- `sops-nix.nixosModules.sops` from all host module lists in `flake.nix`
+- `inherit sops-nix` from all specialArgs in `flake.nix`
+- `system/sops.nix` and its import in `system/default.nix`
+- `.sops.yaml`
+- `secrets/` directory
+- All `sops.secrets.*` declarations in `services/ca/`, `services/authelia/`, `services/lldap/`
+- Template scripts that generate age keys for sops (`hosts/template/scripts.nix`,
+`hosts/template2/scripts.nix`)
+
+See `docs/plans/completed/sops-to-openbao-migration.md` for full context.
+
 ## Notes

 - Each host migration should be done individually, not in bulk, to limit blast radius