14 Commits

Author SHA1 Message Date
b5364d2ccc flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs-unstable':
    'github:nixos/nixpkgs/62c8382960464ceb98ea593cb8321a2cf8f9e3e5?narHash=sha256-kKB3bqYJU5nzYeIROI82Ef9VtTbu4uA3YydSk/Bioa8%3D' (2026-01-30)
  → 'github:nixos/nixpkgs/cb369ef2efd432b3cdf8622b0ffc0a97a02f3137?narHash=sha256-VKS4ZLNx4PNrABoB0L8KUpc1fE7CLpQXQs985tGfaCU%3D' (2026-02-02)
2026-02-03 00:01:39 +00:00
7fc69c40a6 docs: add truenas-migration plan
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m18s
Periodic flake update / flake-update (push) Successful in 1m13s
2026-02-02 18:29:11 +01:00
34a2f2ab50 docs: add infrastructure documentation
Some checks failed
Run nix flake check / flake-check (push) Failing after 11m9s
2026-02-02 17:36:55 +01:00
16b3214982 Merge pull request 'vault-setup' (#10) from vault-setup into master
Some checks failed
Run nix flake check / flake-check (push) Failing after 6m19s
Reviewed-on: #10
2026-02-02 15:28:58 +00:00
244dd0c78b flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/fa83fd837f3098e3e678e6cf017b2b36102c7211?narHash=sha256-e7VO/kGLgRMbWtpBqdWl0uFg8Y2XWFMdz0uUJvlML8o%3D' (2026-01-28)
  → 'github:nixos/nixpkgs/41e216c0ca66c83b12ab7a98cc326b5db01db646?narHash=sha256-I7Lmgj3owOTBGuauy9FL6qdpeK2umDoe07lM4V%2BPnyA%3D' (2026-01-31)
• Updated input 'nixpkgs-unstable':
    'github:nixos/nixpkgs/bfc1b8a4574108ceef22f02bafcf6611380c100d?narHash=sha256-msG8SU5WsBUfVVa/9RPLaymvi5bI8edTavbIq3vRlhI%3D' (2026-01-26)
  → 'github:nixos/nixpkgs/62c8382960464ceb98ea593cb8321a2cf8f9e3e5?narHash=sha256-kKB3bqYJU5nzYeIROI82Ef9VtTbu4uA3YydSk/Bioa8%3D' (2026-01-30)
• Updated input 'sops-nix':
    'github:Mic92/sops-nix/c5eebd4eb2e3372fe12a8d70a248a6ee9dd02eff?narHash=sha256-wFcr32ZqspCxk4%2BFvIxIL0AZktRs6DuF8oOsLt59YBU%3D' (2026-01-26)
  → 'github:Mic92/sops-nix/1e89149dcfc229e7e2ae24a8030f124a31e4f24f?narHash=sha256-twBMKGQvaztZQxFxbZnkg7y/50BW9yjtCBWwdjtOZew%3D' (2026-02-01)
2026-02-02 00:00:56 +00:00
238ad45c14 chore: update TODO.md
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m53s
Run nix flake check / flake-check (pull_request) Successful in 2m16s
2026-02-02 00:47:31 +01:00
c694b9889a vault: add auto-unseal
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m16s
2026-02-02 00:28:24 +01:00
3f2f91aedd terraform: add vault pki management to terraform
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
2026-02-01 23:23:03 +01:00
5d513fd5af terraform: add vault secret management to terraform 2026-02-01 23:07:47 +01:00
b6f1e80c2a chore: run tofu fmt 2026-02-01 23:04:02 +01:00
4133eafc4e flake: add openbao to devshell
Some checks failed
Run nix flake check / flake-check (push) Failing after 18m52s
2026-02-01 22:16:52 +01:00
ace848b29c vault: replace vault with openbao 2026-02-01 22:16:52 +01:00
b012df9f34 ns: add vault01 host to zone
Some checks failed
Run nix flake check / flake-check (push) Failing after 15m40s
Periodic flake update / flake-update (push) Successful in 1m7s
2026-02-01 20:54:22 +01:00
ab053c25bd opentofu: add tmp device to vms 2026-02-01 20:54:05 +01:00
20 changed files with 1733 additions and 187 deletions

.gitignore (vendored) · 9 lines changed

@@ -10,3 +10,12 @@ terraform/terraform.tfvars
 terraform/*.auto.tfvars
 terraform/crash.log
 terraform/crash.*.log
+terraform/vault/.terraform/
+terraform/vault/.terraform.lock.hcl
+terraform/vault/*.tfstate
+terraform/vault/*.tfstate.*
+terraform/vault/terraform.tfvars
+terraform/vault/*.auto.tfvars
+terraform/vault/crash.log
+terraform/vault/crash.*.log

TODO.md · 290 lines changed

@@ -153,7 +153,9 @@ create-host \
 ---
-### Phase 4: Secrets Management with HashiCorp Vault
+### Phase 4: Secrets Management with OpenBao (Vault)
+**Status:** 🚧 Phases 4a & 4b Complete, 4c & 4d In Progress
 **Challenge:** Current sops-nix approach has chicken-and-egg problem with age keys
@@ -164,161 +166,225 @@ create-host \
 4. User commits, pushes
 5. VM can now decrypt secrets
-**Selected approach:** Migrate to HashiCorp Vault for centralized secrets management
+**Selected approach:** Migrate to OpenBao (Vault fork) for centralized secrets management
+**Why OpenBao instead of HashiCorp Vault:**
+- HashiCorp Vault switched to BSL (Business Source License), unavailable in NixOS cache
+- OpenBao is the community fork maintaining the pre-BSL MPL 2.0 license
+- API-compatible with Vault, uses same Terraform provider
+- Maintains all Vault features we need
 **Benefits:**
-- Industry-standard secrets management (Vault experience transferable to work)
+- Industry-standard secrets management (Vault-compatible experience)
 - Eliminates manual age key distribution step
 - Secrets-as-code via OpenTofu (infrastructure-as-code aligned)
-- Centralized PKI management (replaces step-ca, consolidates TLS + SSH CA)
+- Centralized PKI management with ACME support (ready to replace step-ca)
 - Automatic secret rotation capabilities
-- Audit logging for all secret access
+- Audit logging for all secret access (not yet enabled)
 - AppRole authentication enables automated bootstrap
-**Architecture:**
+**Current Architecture:**
 ```
-vault.home.2rjus.net
-├─ KV Secrets Engine (replaces sops-nix)
-├─ PKI Engine (replaces step-ca for TLS)
-├─ SSH CA Engine (replaces step-ca SSH CA)
-└─ AppRole Auth (per-host authentication)
-New hosts authenticate on first boot
-Fetch secrets via Vault API
+vault.home.2rjus.net (10.69.13.19)
+├─ KV Secrets Engine (ready to replace sops-nix)
+│  ├─ secret/hosts/{hostname}/*
+│  ├─ secret/services/{service}/*
+│  └─ secret/shared/{category}/*
+├─ PKI Engine (ready to replace step-ca for TLS)
+│  ├─ Root CA (EC P-384, 10 year)
+│  ├─ Intermediate CA (EC P-384, 5 year)
+│  └─ ACME endpoint enabled
+├─ SSH CA Engine (TODO: Phase 4c)
+└─ AppRole Auth (per-host authentication configured)
+[Phase 4d] New hosts authenticate on first boot
+[Phase 4d] Fetch secrets via Vault API
 No manual key distribution needed
 ```
+**Completed:**
+- ✅ Phase 4a: OpenBao server with TPM2 auto-unseal
+- ✅ Phase 4b: Infrastructure-as-code (secrets, policies, AppRoles, PKI)
+**Next Steps:**
+- Phase 4c: Migrate from step-ca to OpenBao PKI
+- Phase 4d: Bootstrap integration for automated secrets access
 ---
-#### Phase 4a: Vault Server Setup
+#### Phase 4a: Vault Server Setup ✅ COMPLETED
+**Status:** ✅ Fully implemented and tested
+**Completed:** 2026-02-02
 **Goal:** Deploy and configure Vault server with auto-unseal
-**Tasks:**
-- [ ] Create `hosts/vault01/` configuration
-- [ ] Basic NixOS configuration (hostname, networking, etc.)
-- [ ] Vault service configuration
-- [ ] Firewall rules (8200 for API, 8201 for cluster)
-- [ ] Add to flake.nix and terraform
-- [ ] Implement auto-unseal mechanism
-- [ ] **Preferred:** TPM-based auto-unseal if hardware supports it
-- [ ] Use tpm2-tools to seal/unseal Vault keys
-- [ ] Systemd service to unseal on boot
-- [ ] **Fallback:** Shamir secret sharing with systemd automation
-- [ ] Generate 3 keys, threshold 2
-- [ ] Store 2 keys on disk (encrypted), keep 1 offline
-- [ ] Systemd service auto-unseals using 2 keys
-- [ ] Initial Vault setup
-- [ ] Initialize Vault
-- [ ] Configure storage backend (integrated raft or file)
-- [ ] Set up root token management
-- [ ] Enable audit logging
-- [ ] Deploy to infrastructure
-- [ ] Add DNS entry for vault.home.2rjus.net
-- [ ] Deploy VM via terraform
-- [ ] Bootstrap and verify Vault is running
-**Deliverable:** Running Vault server that auto-unseals on boot
+**Implementation:**
+- Used **OpenBao** (Vault fork) instead of HashiCorp Vault due to BSL licensing concerns
+- TPM2-based auto-unseal using systemd's native `LoadCredentialEncrypted`
+- Self-signed bootstrap TLS certificates (avoiding circular dependency with step-ca)
+- File-based storage backend at `/var/lib/openbao`
+- Unix socket + TCP listener (0.0.0.0:8200) configuration
+**Tasks:**
+- [x] Create `hosts/vault01/` configuration
+- [x] Basic NixOS configuration (hostname: vault01, IP: 10.69.13.19/24)
+- [x] Created reusable `services/vault` module
+- [x] Firewall not needed (trusted network)
+- [x] Already in flake.nix, deployed via terraform
+- [x] Implement auto-unseal mechanism
+- [x] **TPM2-based auto-unseal** (preferred option)
+- [x] systemd `LoadCredentialEncrypted` with TPM2 binding
+- [x] `writeShellApplication` script with proper runtime dependencies
+- [x] Reads multiple unseal keys (one per line) until unsealed
+- [x] Auto-unseals on service start via `ExecStartPost`
+- [x] Initial Vault setup
+- [x] Initialized OpenBao with Shamir secret sharing (5 keys, threshold 3)
+- [x] File storage backend
+- [x] Self-signed TLS certificates via LoadCredential
+- [x] Deploy to infrastructure
+- [x] DNS entry added for vault.home.2rjus.net
+- [x] VM deployed via terraform
+- [x] Verified OpenBao running and auto-unsealing
+**Changes from Original Plan:**
+- Used OpenBao instead of HashiCorp Vault (licensing)
+- Used systemd's native TPM2 support instead of tpm2-tools directly
+- Skipped audit logging (can be enabled later)
+- Used self-signed certs initially (will migrate to OpenBao PKI later)
+**Deliverable:** ✅ Running OpenBao server that auto-unseals on boot using TPM2
+**Documentation:**
+- `/services/vault/README.md` - Service module overview
+- `/docs/vault/auto-unseal.md` - Complete TPM2 auto-unseal setup guide
 ---
-#### Phase 4b: Vault-as-Code with OpenTofu
+#### Phase 4b: Vault-as-Code with OpenTofu ✅ COMPLETED
+**Status:** ✅ Fully implemented and tested
+**Completed:** 2026-02-02
 **Goal:** Manage all Vault configuration (secrets structure, policies, roles) as code
+**Implementation:**
+- Complete Terraform/OpenTofu configuration in `terraform/vault/`
+- Locals-based pattern (similar to `vms.tf`) for declaring secrets and policies
+- Auto-generation of secrets using `random_password` provider
+- Three-tier secrets path hierarchy: `hosts/`, `services/`, `shared/`
+- PKI infrastructure with **Elliptic Curve certificates** (P-384 for CAs, P-256 for leaf certs)
+- ACME support enabled on intermediate CA
 **Tasks:**
-- [ ] Set up Vault Terraform provider
-- [ ] Create `terraform/vault/` directory
-- [ ] Configure Vault provider (address, auth)
-- [ ] Store Vault token securely (terraform.tfvars, gitignored)
-- [ ] Enable and configure secrets engines
-- [ ] Enable KV v2 secrets engine at `secret/`
-- [ ] Define secret path structure (per-service, per-host)
-- [ ] Example: `secret/monitoring/grafana`, `secret/postgres/ha1`
-- [ ] Define policies as code
-- [ ] Create policies for different service tiers
-- [ ] Principle of least privilege (hosts only read their secrets)
-- [ ] Example: monitoring-policy allows read on `secret/monitoring/*`
-- [ ] Set up AppRole authentication
-- [ ] Enable AppRole auth backend
-- [ ] Create role per host type (monitoring, dns, database, etc.)
-- [ ] Bind policies to roles
-- [ ] Configure TTL and token policies
-- [ ] Migrate existing secrets from sops-nix
-- [ ] Create migration script/playbook
-- [ ] Decrypt sops secrets and load into Vault KV
-- [ ] Verify all secrets migrated successfully
-- [ ] Keep sops as backup during transition
-- [ ] Implement secrets-as-code patterns
-- [ ] Secret values in gitignored terraform.tfvars
-- [ ] Or use random_password for auto-generated secrets
-- [ ] Secret structure/paths in version-controlled .tf files
-**Example OpenTofu:**
-```hcl
-resource "vault_kv_secret_v2" "monitoring_grafana" {
-  mount     = "secret"
-  name      = "monitoring/grafana"
-  data_json = jsonencode({
-    admin_password = var.grafana_admin_password
-    smtp_password  = var.smtp_password
-  })
-}
-resource "vault_policy" "monitoring" {
-  name   = "monitoring-policy"
-  policy = <<EOT
-path "secret/data/monitoring/*" {
-  capabilities = ["read"]
-}
-EOT
-}
-resource "vault_approle_auth_backend_role" "monitoring01" {
-  backend        = "approle"
-  role_name      = "monitoring01"
-  token_policies = ["monitoring-policy"]
-}
-```
-**Deliverable:** All secrets and policies managed as OpenTofu code in `terraform/vault/`
+- [x] Set up Vault Terraform provider
+- [x] Created `terraform/vault/` directory
+- [x] Configured Vault provider (uses HashiCorp provider, compatible with OpenBao)
+- [x] Credentials in terraform.tfvars (gitignored)
+- [x] terraform.tfvars.example for reference
+- [x] Enable and configure secrets engines
+- [x] KV v2 engine at `secret/`
+- [x] Three-tier path structure:
+  - `secret/hosts/{hostname}/*` - Host-specific secrets
+  - `secret/services/{service}/*` - Service-wide secrets
+  - `secret/shared/{category}/*` - Shared secrets (SMTP, backups, etc.)
+- [x] Define policies as code
+- [x] Policies auto-generated from `locals.host_policies`
+- [x] Per-host policies with read/list on designated paths
+- [x] Principle of least privilege enforced
+- [x] Set up AppRole authentication
+- [x] AppRole backend enabled at `approle/`
+- [x] Roles auto-generated per host from `locals.host_policies`
+- [x] Token TTL: 1 hour, max 24 hours
+- [x] Policies bound to roles
+- [x] Implement secrets-as-code patterns
+- [x] Auto-generated secrets using `random_password` provider
+- [x] Manual secrets supported via variables in terraform.tfvars
+- [x] Secret structure versioned in .tf files
+- [x] Secret values excluded from git
+- [x] Set up PKI infrastructure
+- [x] Root CA (10 year TTL, EC P-384)
+- [x] Intermediate CA (5 year TTL, EC P-384)
+- [x] PKI role for `*.home.2rjus.net` (30 day max TTL, EC P-256)
+- [x] ACME enabled on intermediate CA
+- [x] Support for static certificate issuance via Terraform
+- [x] CRL, OCSP, and issuing certificate URLs configured
+**Changes from Original Plan:**
+- Used Elliptic Curve instead of RSA for all certificates (better performance, smaller keys)
+- Implemented PKI infrastructure in Phase 4b instead of Phase 4c (more logical grouping)
+- ACME support configured immediately (ready for migration from step-ca)
+- Did not migrate existing sops-nix secrets yet (deferred to gradual migration)
+**Files:**
+- `terraform/vault/main.tf` - Provider configuration
+- `terraform/vault/variables.tf` - Variable definitions
+- `terraform/vault/approle.tf` - AppRole authentication (locals-based pattern)
+- `terraform/vault/pki.tf` - PKI infrastructure with EC certificates
+- `terraform/vault/secrets.tf` - KV secrets engine (auto-generation support)
+- `terraform/vault/README.md` - Complete documentation and usage examples
+- `terraform/vault/terraform.tfvars.example` - Example credentials
+**Deliverable:** ✅ All secrets, policies, AppRoles, and PKI managed as OpenTofu code in `terraform/vault/`
+**Documentation:**
+- `/terraform/vault/README.md` - Comprehensive guide covering:
+  - Setup and deployment
+  - AppRole usage and host access patterns
+  - PKI certificate issuance (ACME, static, manual)
+  - Secrets management patterns
+  - ACME configuration and troubleshooting
 ---
 #### Phase 4c: PKI Migration (Replace step-ca)
-**Goal:** Consolidate PKI infrastructure into Vault
+**Goal:** Migrate hosts from step-ca to OpenBao PKI for TLS certificates
+**Note:** PKI infrastructure already set up in Phase 4b (root CA, intermediate CA, ACME support)
 **Tasks:**
-- [ ] Set up Vault PKI engines
-- [ ] Create root CA in Vault (`pki/` mount, 10 year TTL)
-- [ ] Create intermediate CA (`pki_int/` mount, 5 year TTL)
-- [ ] Sign intermediate with root CA
-- [ ] Configure CRL and OCSP
-- [ ] Enable ACME support
-- [ ] Enable ACME on intermediate CA (Vault 1.14+)
-- [ ] Create PKI role for homelab domain
-- [ ] Set certificate TTLs and allowed domains
-- [ ] Configure SSH CA in Vault
+- [x] Set up OpenBao PKI engines (completed in Phase 4b)
+- [x] Root CA (`pki/` mount, 10 year TTL, EC P-384)
+- [x] Intermediate CA (`pki_int/` mount, 5 year TTL, EC P-384)
+- [x] Signed intermediate with root CA
+- [x] Configured CRL, OCSP, and issuing certificate URLs
+- [x] Enable ACME support (completed in Phase 4b)
+- [x] Enabled ACME on intermediate CA
+- [x] Created PKI role for `*.home.2rjus.net`
+- [x] Set certificate TTLs (30 day max) and allowed domains
+- [x] ACME directory: `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
+- [ ] Download and distribute root CA certificate
+- [ ] Export root CA: `bao read -field=certificate pki/cert/ca > homelab-root-ca.crt`
+- [ ] Add to NixOS trust store on all hosts via `security.pki.certificateFiles`
+- [ ] Deploy via auto-upgrade
+- [ ] Test certificate issuance
+- [ ] Issue test certificate using ACME client (lego/certbot)
+- [ ] Or issue static certificate via OpenBao CLI
+- [ ] Verify certificate chain and trust
+- [ ] Migrate vault01's own certificate
+- [ ] Issue new certificate from OpenBao PKI (self-issued)
+- [ ] Replace self-signed bootstrap certificate
+- [ ] Update service configuration
+- [ ] Migrate hosts from step-ca to OpenBao
+- [ ] Update `system/acme.nix` to use OpenBao ACME endpoint
+- [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
+- [ ] Test on one host (non-critical service)
+- [ ] Roll out to all hosts via auto-upgrade
+- [ ] Configure SSH CA in OpenBao (optional, future work)
 - [ ] Enable SSH secrets engine (`ssh/` mount)
 - [ ] Generate SSH signing keys
 - [ ] Create roles for host and user certificates
 - [ ] Configure TTLs and allowed principals
-- [ ] Migrate hosts from step-ca to Vault
-- [ ] Update system/acme.nix to use Vault ACME endpoint
-- [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
-- [ ] Test certificate issuance on one host
-- [ ] Roll out to all hosts via auto-upgrade
-- [ ] Migrate SSH CA trust
-- [ ] Distribute Vault SSH CA public key to all hosts
-- [ ] Update sshd_config to trust Vault CA
-- [ ] Test SSH certificate authentication
+- [ ] Distribute SSH CA public key to all hosts
+- [ ] Update sshd_config to trust OpenBao CA
 - [ ] Decommission step-ca
-- [ ] Verify all services migrated
+- [ ] Verify all ACME services migrated and working
 - [ ] Stop step-ca service on ca host
 - [ ] Archive step-ca configuration for backup
-- [ ] Update documentation
-**Deliverable:** All TLS and SSH certificates issued by Vault, step-ca retired
+**Deliverable:** All TLS certificates issued by OpenBao PKI, step-ca retired
 ---
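The "locals-based pattern" described for Phase 4b can be sketched roughly as follows. This is a hypothetical illustration, not the contents of the real `terraform/vault/*.tf` files: the `host_policies` name comes from the notes above, but every host name, path, and value here is an example.

```hcl
# Hypothetical sketch of the locals-based pattern; host names, paths,
# and TTL values are examples, not the real terraform/vault contents.
locals {
  host_policies = {
    monitoring01 = ["secret/data/hosts/monitoring01/*"]
    ns1          = ["secret/data/hosts/ns1/*"]
  }
}

# One read/list policy per host, generated from the locals map
resource "vault_policy" "host" {
  for_each = local.host_policies
  name     = "${each.key}-policy"
  policy = join("\n", [for p in each.value :
    "path \"${p}\" {\n  capabilities = [\"read\", \"list\"]\n}"
  ])
}

# One AppRole per host, bound to its policy
resource "vault_approle_auth_backend_role" "host" {
  for_each       = local.host_policies
  backend        = "approle"
  role_name      = each.key
  token_policies = [vault_policy.host[each.key].name]
  token_ttl      = 3600  # 1 hour, as noted above
  token_max_ttl  = 86400 # 24 hours
}

# Auto-generated secret values via the random_password provider
resource "random_password" "host" {
  for_each = local.host_policies
  length   = 32
}
```

The same `for_each` over one locals map keeps the policy and the AppRole for each host defined in a single place, which is the appeal of the pattern.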

docs/infrastructure.md (new file) · 282 lines changed

@@ -0,0 +1,282 @@
# Homelab Infrastructure
This document describes the physical and virtual infrastructure components that support the NixOS-managed servers in this repository.
## Overview
The homelab consists of several core infrastructure components:
- **Proxmox VE** - Hypervisor hosting all NixOS VMs
- **TrueNAS** - Network storage and backup target
- **Ubiquiti EdgeRouter** - Primary router and gateway
- **Mikrotik Switch** - Core network switching
All NixOS configurations in this repository run as VMs on Proxmox and rely on these underlying infrastructure components.
## Network Topology
### Subnets
VLAN numbers are based on the third octet of the IP address.
TODO: VLAN naming is currently inconsistent across router/switch/Proxmox configurations. Need to standardize VLAN names and update all device configs to use consistent naming.
- `10.69.8.x` - Kubernetes (no longer in use)
- `10.69.12.x` - Core services
- `10.69.13.x` - NixOS VMs and core services
- `10.69.30.x` - Client network 1
- `10.69.31.x` - Clients network 2
- `10.69.99.x` - Management network
### Core Network Services
- **Gateway**: Web UI exposed on 10.69.10.1
- **DNS**: ns1 (10.69.13.5), ns2 (10.69.13.6)
- **Primary DNS Domain**: `home.2rjus.net`
## Hardware Components
### Proxmox Hypervisor
**Purpose**: Hosts all NixOS VMs defined in this repository
**Hardware**:
- CPU: AMD Ryzen 9 3900X 12-Core Processor
- RAM: 96GB (94Gi)
- Storage: 1TB NVMe SSD (nvme0n1)
**Management**:
- Web UI: `https://pve1.home.2rjus.net:8006`
- Cluster: Standalone
- Version: Proxmox VE 8.4.16 (kernel 6.8.12-18-pve)
**VM Provisioning**:
- Template VM: ID 9000 (built from `hosts/template2`)
- See `/terraform` directory for automated VM deployment using OpenTofu
**Storage**:
- ZFS pool: `rpool` on NVMe partition (nvme0n1p3)
- Total capacity: ~900GB (232GB used, 667GB available)
- Configuration: Single disk (no RAID)
- Scrub status: Last scrub completed successfully with 0 errors
**Networking**:
- Management interface: `vmbr0` - 10.69.12.75/24 (VLAN 12 - Core services)
- Physical interface: `enp9s0` (primary), `enp4s0` (unused)
- VM bridges:
- `vmbr0` - Main bridge (bridged to enp9s0)
- `vmbr0v8` - VLAN 8 (Kubernetes - deprecated)
- `vmbr0v13` - VLAN 13 (NixOS VMs and core services)
### TrueNAS
**Purpose**: Network storage, backup target, media storage
**Hardware**:
- Model: Custom build
- CPU: AMD Ryzen 5 5600G with Radeon Graphics
- RAM: 32GB (31.2 GiB)
- Disks:
- 2x Kingston SA400S37 240GB SSD (boot pool, mirrored)
- 2x Seagate ST16000NE000 16TB HDD (hdd-pool mirror-0)
- 2x WD WD80EFBX 8TB HDD (hdd-pool mirror-1)
- 2x Seagate ST8000VN004 8TB HDD (hdd-pool mirror-2)
- 1x NVMe 2TB (nvme-pool, no redundancy)
**Management**:
- Web UI: `https://nas.home.2rjus.net` (10.69.12.50)
- Hostname: `nas.home.2rjus.net`
- Version: TrueNAS-13.0-U6.1 (Core)
**Networking**:
- Primary interface: `mlxen0` - 10GbE (10Gbase-CX4) connected to sw1
- IP: 10.69.12.50/24 (VLAN 12 - Core services)
**ZFS Pools**:
- `boot-pool`: 206GB (mirrored SSDs) - 4% used
- Mirror of 2x Kingston 240GB SSDs
- Last scrub: No errors
- `hdd-pool`: 29.1TB total (three 2-disk mirror vdevs, 28.4TB used, 658GB free) - 97% capacity
- mirror-0: 2x 16TB Seagate ST16000NE000
- mirror-1: 2x 8TB WD WD80EFBX
- mirror-2: 2x 8TB Seagate ST8000VN004
- Last scrub: No errors
- `nvme-pool`: 1.81TB (single NVMe, 70.4GB used, 1.74TB free) - 3% capacity
- Single NVMe drive, no redundancy
- Last scrub: No errors
**NFS Exports**:
- `/mnt/hdd-pool/media` - Media storage (exported to 10.69.0.0/16, used by Jellyfin)
- `/mnt/hdd-pool/virt/nfs-iso` - ISO storage for Proxmox
- `/mnt/hdd-pool/virt/kube-prod-pvc` - Kubernetes storage (deprecated)
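On a consuming NixOS host, mounting the media export looks roughly like the sketch below; the mount point and options are illustrative, not copied from the actual host configs in this repo.

```nix
# Illustrative NFS client mount; adjust mount point and options as needed
fileSystems."/mnt/media" = {
  device = "nas.home.2rjus.net:/mnt/hdd-pool/media";
  fsType = "nfs";
  options = [ "ro" "soft" "noatime" ];
};
```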
**Jails**:
TrueNAS runs several FreeBSD jails for media management:
- nzbget - Usenet downloader
- restic-rest - Restic REST server for backups
- radarr - Movie management
- sonarr - TV show management
### Ubiquiti EdgeRouter
**Purpose**: Primary router, gateway, firewall, inter-VLAN routing
**Model**: EdgeRouter X 5-Port
**Hardware**:
- Serial: F09FC20E1A4C
**Management**:
- SSH: `ssh ubnt@10.69.10.1`
- Web UI: `https://10.69.10.1`
- Version: EdgeOS v2.0.9-hotfix.6 (build 5574651, 12/30/22)
**WAN Connection**:
- Interface: eth0
- Public IP: 84.213.73.123/20
- Gateway: 84.213.64.1
**Interface Layout**:
- **eth0**: WAN (public IP)
- **eth1**: 10.69.31.1/24 - Clients network 2
- **eth2**: Unused (down)
- **eth3**: 10.69.30.1/24 - Client network 1
- **eth4**: Trunk port to Mikrotik switch (carries all VLANs)
- eth4.8: 10.69.8.1/24 - K8S (deprecated)
- eth4.10: 10.69.10.1/24 - TRUSTED (management access)
- eth4.12: 10.69.12.1/24 - SERVER (Proxmox, TrueNAS, core services)
- eth4.13: 10.69.13.1/24 - SVC (NixOS VMs)
- eth4.21: 10.69.21.1/24 - CLIENTS
- eth4.22: 10.69.22.1/24 - WLAN (wireless clients)
- eth4.23: 10.69.23.1/24 - IOT
- eth4.99: 10.69.99.1/24 - MGMT (device management)
**Routing**:
- Default route: 0.0.0.0/0 via 84.213.64.1 (WAN gateway)
- Static route: 192.168.100.0/24 via eth0
- All internal VLANs directly connected
**DHCP Servers**:
Active DHCP pools on all networks:
- dhcp-8: VLAN 8 (K8S) - 91 addresses
- dhcp-12: VLAN 12 (SERVER) - 51 addresses
- dhcp-13: VLAN 13 (SVC) - 41 addresses
- dhcp-21: VLAN 21 (CLIENTS) - 141 addresses
- dhcp-22: VLAN 22 (WLAN) - 101 addresses
- dhcp-23: VLAN 23 (IOT) - 191 addresses
- dhcp-30: eth3 (Client network 1) - 101 addresses
- dhcp-31: eth1 (Clients network 2) - 21 addresses
- dhcp-mgmt: VLAN 99 (MGMT) - 51 addresses
**NAT/Firewall**:
- Masquerading on WAN interface (eth0)
### Mikrotik Switch
**Purpose**: Core Layer 2/3 switching
**Model**: MikroTik CRS326-24G-2S+ (24x 1GbE + 2x 10GbE SFP+)
**Hardware**:
- CPU: ARMv7 @ 800MHz
- RAM: 512MB
- Uptime: 21+ weeks
**Management**:
- Hostname: `sw1.home.2rjus.net`
- SSH access: `ssh admin@sw1.home.2rjus.net` (using gunter SSH key)
- Management IP: 10.69.99.2/24 (VLAN 99)
- Version: RouterOS 6.47.10 (long-term)
**VLANs**:
- VLAN 8: Kubernetes (deprecated)
- VLAN 12: SERVERS - Core services subnet
- VLAN 13: SVC - Services subnet
- VLAN 21: CLIENTS
- VLAN 22: WLAN - Wireless network
- VLAN 23: IOT
- VLAN 99: MGMT - Management network
**Port Layout** (active ports):
- **ether1**: Uplink to EdgeRouter (trunk, carries all VLANs)
- **ether11**: virt-mini1 (VLAN 12 - SERVERS)
- **ether12**: Home Assistant (VLAN 12 - SERVERS)
- **ether24**: Wireless AP (VLAN 22 - WLAN)
- **sfp-sfpplus1**: Media server/Jellyfin (VLAN 12) - 10Gbps, 7m copper DAC
- **sfp-sfpplus2**: TrueNAS (VLAN 12) - 10Gbps, 1m copper DAC
**Bridge Configuration**:
- All ports bridged to main bridge interface
- Hardware offloading enabled
- VLAN filtering enabled on bridge
## Backup & Disaster Recovery
### Backup Strategy
**NixOS VMs**:
- Declarative configurations in this git repository
- Secrets: SOPS-encrypted, backed up with repository
- State/data: some hosts are backed up to the NAS, but coverage should be improved and expanded to more hosts.
**Proxmox**:
- VM backups: Not currently implemented
**Critical Credentials**:
TODO: Document this
- OpenBao root token and unseal keys: _[offline secure storage location]_
- Proxmox root password: _[secure storage]_
- TrueNAS admin password: _[secure storage]_
- Router admin credentials: _[secure storage]_
### Disaster Recovery Procedures
**Total Infrastructure Loss**:
1. Restore Proxmox from installation media
2. Restore TrueNAS from installation media, import ZFS pools
3. Restore network configuration on EdgeRouter and Mikrotik
4. Rebuild NixOS VMs from this repository using Proxmox template
5. Restore stateful data from TrueNAS backups
6. Re-initialize OpenBao and restore from backup if needed
**Individual VM Loss**:
1. Deploy new VM from template using OpenTofu (`terraform/`)
2. Run `nixos-rebuild` with appropriate flake configuration
3. Restore any stateful data from backups
4. For vault01: follow re-provisioning steps in `docs/vault/auto-unseal.md`
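Steps 1 and 2 map roughly to the following commands; the host name is an example, and the exact flake attribute depends on which host is being rebuilt.

```bash
# Redeploy the VM definition (see terraform/ for the real layout)
tofu -chdir=terraform apply

# Rebuild the replacement host from this flake over SSH (example host name)
nixos-rebuild switch --flake .#monitoring01 \
  --target-host root@monitoring01.home.2rjus.net
```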
**Network Device Failure**:
- EdgeRouter: _[config backup location, restoration procedure]_
- Mikrotik: _[config backup location, restoration procedure]_
## Future Additions
- Additional Proxmox nodes for clustering
- Backup Proxmox Backup Server
- Additional TrueNAS for replication
## Maintenance Notes
### Proxmox Updates
- Update schedule: manual
- Pre-update checklist: yolo
### TrueNAS Updates
- Update schedule: manual
### Network Device Updates
- EdgeRouter: manual
- Mikrotik: manual
## Monitoring
**Infrastructure Monitoring**:
TODO: Improve monitoring for physical hosts (proxmox, nas)
TODO: Improve monitoring for networking equipment
All NixOS VMs ship metrics to monitoring01 via node-exporter and logs via Promtail. See `/services/monitoring/` for the observability stack configuration.


@@ -0,0 +1,151 @@
# TrueNAS Migration Planning
## Current State
### Hardware
- CPU: AMD Ryzen 5 5600G with Radeon Graphics
- RAM: 32GB
- Network: 10GbE (mlxen0)
- Software: TrueNAS-13.0-U6.1 (Core)
### Storage Status
**hdd-pool**: 29.1TB total, **28.4TB used, 658GB free (97% capacity)** ⚠️
- mirror-0: 2x Seagate ST16000NE000 16TB HDD (16TB usable)
- mirror-1: 2x WD WD80EFBX 8TB HDD (8TB usable)
- mirror-2: 2x Seagate ST8000VN004 8TB HDD (8TB usable)
## Goal
Expand storage capacity for the main hdd-pool. Since we need to add disks anyway, we are also evaluating whether to upgrade or replace the entire system.
## Decisions
### Migration Approach: Option 3 - Migrate to NixOS
**Decision**: Replace TrueNAS with NixOS bare metal installation
**Rationale**:
- Aligns with existing infrastructure (16+ NixOS hosts already managed in this repo)
- Declarative configuration fits homelab philosophy
- Automatic monitoring/logging integration (Prometheus + Promtail)
- Auto-upgrades via same mechanism as other hosts
- SOPS secrets management integration
- TrueNAS-specific features (WebGUI, jails) not heavily utilized
**Service migration**:
- radarr/sonarr: Native NixOS services (`services.radarr`, `services.sonarr`)
- restic-rest: `services.restic.server`
- nzbget: NixOS service or OCI container
- NFS exports: `services.nfs.server`
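A minimal sketch of what this migration could look like on the new host; the option names below exist in nixpkgs, but all values (paths, export network) are illustrative assumptions, not a finished configuration.

```nix
# Hypothetical service migration sketch; values are examples only
services.radarr.enable = true;
services.sonarr.enable = true;

# Restic REST server replacing the restic-rest jail
services.restic.server = {
  enable = true;
  dataDir = "/srv/restic";
};

# NFS exports replacing the TrueNAS shares
services.nfs.server = {
  enable = true;
  exports = ''
    /mnt/media 10.69.0.0/16(ro)
  '';
};
```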
### Filesystem: BTRFS RAID1
**Decision**: Migrate from ZFS to BTRFS with RAID1
**Rationale**:
- **In-kernel**: No out-of-tree module issues like ZFS
- **Flexible expansion**: Add individual disks, not required to buy pairs
- **Mixed disk sizes**: Better handling than ZFS multi-vdev approach
- **RAID level conversion**: Can convert between RAID levels in place
- Built-in checksumming, snapshots, compression (zstd)
- NixOS has good BTRFS support
**BTRFS RAID1 notes**:
- "RAID1" means 2 copies of all data
- Distributes across all available devices
- With 6+ disks, provides redundancy + capacity scaling
- RAID5/6 avoided (known issues), RAID1/10 are stable
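As a sanity check on capacity: btrfs RAID1 keeps two copies of every chunk, so as long as no single disk is larger than the sum of the rest, usable space is roughly half the raw total. A quick sketch for the six current disks (new disks not counted):

```shell
# Rough usable-capacity estimate for btrfs RAID1 (two copies of all data)
disks="16 16 8 8 8 8"  # sizes in TB: 2x16TB + 4x8TB

total=0
max=0
for d in $disks; do
  total=$((total + d))
  if [ "$d" -gt "$max" ]; then max=$d; fi
done

# Valid while the largest disk is no bigger than the sum of the others
usable=$((total / 2))
echo "raw=${total}TB usable~=${usable}TB"
# prints: raw=64TB usable~=32TB
```

The 16TB drives stay fully usable here because together the 8TB drives offer enough space to hold their mirror copies.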
### Hardware: Keep Existing + Add Disks
**Decision**: Retain current hardware, expand disk capacity
**Hardware to keep**:
- AMD Ryzen 5 5600G (sufficient for NAS workload)
- 32GB RAM (adequate)
- 10GbE network interface
- Chassis
**Storage architecture**:
**Bulk storage** (BTRFS RAID1 on HDDs):
- Current: 6x HDDs (2x16TB + 2x8TB + 2x8TB)
- Add: 2x new HDDs (size TBD)
- Use: Media, downloads, backups, non-critical data
- Risk tolerance: High (data mostly replaceable)
**Critical data** (small volume):
- Use 2x 240GB SSDs in mirror (BTRFS or ZFS)
- Or use 2TB NVMe for critical data
- Risk tolerance: Low (data important but small)
### Disk Purchase Decision
**Options under consideration**:
**Option A: 2x 16TB drives**
- Matches largest current drives
- Enables potential future RAID5 if desired (6x 16TB array)
- More conservative capacity increase
**Option B: 2x 20-24TB drives**
- Larger capacity headroom
- Better $/TB ratio typically
- Future-proofs better
**Initial purchase**: 2 drives (chassis has space for 2 more without modifications)
## Migration Strategy
### High-Level Plan
1. **Preparation**:
- Purchase 2x new HDDs (16TB or 20-24TB)
- Create NixOS configuration for new storage host
- Set up bare metal NixOS installation
2. **Initial BTRFS pool**:
- Install 2 new disks
- Create BTRFS filesystem in RAID1
- Mount and test NFS exports
3. **Data migration**:
- Copy data from TrueNAS ZFS pool to new BTRFS pool over 10GbE
- Verify data integrity
4. **Expand pool**:
- As old ZFS pool is emptied, wipe drives and add to BTRFS pool
- Pool grows incrementally: 2 → 4 → 6 → 8 disks
- BTRFS rebalances data across new devices
5. **Service migration**:
- Set up radarr/sonarr/nzbget/restic as NixOS services
- Update NFS client mounts on consuming hosts
6. **Cutover**:
- Point consumers to new NAS host
- Decommission TrueNAS
- Repurpose hardware or keep as spare
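Steps 2 and 4 of the plan reduce to a handful of btrfs commands; the device names and label below are hypothetical placeholders, not the final layout.

```bash
# Step 2: create the initial two-disk RAID1 filesystem
# (/dev/sdg, /dev/sdh and the "tank" label are placeholders)
mkfs.btrfs -L tank -d raid1 -m raid1 /dev/sdg /dev/sdh
mount /dev/sdg /mnt/tank

# Step 4: as old ZFS disks are freed, grow the pool incrementally
btrfs device add /dev/sda /dev/sdb /mnt/tank
# Rebalance so existing data spreads across the new devices;
# the "soft" filter skips chunks already in the target profile
btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/tank

# Check how data is distributed across devices
btrfs filesystem usage /mnt/tank
```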
### Migration Advantages
- **Low risk**: New pool created independently, old data remains intact during migration
- **Incremental**: Can add old disks one at a time as space allows
- **Flexible**: BTRFS handles mixed disk sizes gracefully
- **Reversible**: Keep TrueNAS running until fully validated
## Next Steps
1. Decide on disk size (16TB vs 20-24TB)
2. Purchase disks
3. Design NixOS host configuration (`hosts/nas1/`)
4. Plan detailed migration timeline
5. Document NFS export mapping (current → new)
## Open Questions
- [ ] Final decision on disk size?
- [ ] Hostname for new NAS host? (nas1? storage1?)
- [ ] IP address allocation (keep 10.69.12.50 or new IP?)
- [ ] Timeline/maintenance window for migration?

docs/vault/auto-unseal.md Normal file

@@ -0,0 +1,178 @@
# OpenBao TPM2 Auto-Unseal Setup
This document describes the one-time setup process for enabling TPM2-based auto-unsealing on vault01.
## Overview
The auto-unseal feature uses systemd's `LoadCredentialEncrypted` with TPM2 to securely store and retrieve an unseal key. On service start, systemd automatically decrypts the credential using the VM's TPM, and an `ExecStartPost` script feeds the key to OpenBao to unseal it.
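Reduced to a plain unit fragment, the wiring looks like this (illustrative only — the real unit is generated by the NixOS module, and the `ExecStartPost` path is a placeholder):

```
[Service]
# systemd decrypts this with the TPM2 and exposes the plaintext at
# $CREDENTIALS_DIRECTORY/unseal-key inside the service's namespace
LoadCredentialEncrypted=unseal-key:/var/lib/openbao/unseal-key.cred
# Runs after the server starts; reads the credential and calls
# `bao operator unseal` with it
ExecStartPost=/path/to/openbao-unseal
```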
## Prerequisites
- OpenBao must be initialized (`bao operator init` completed)
- You must have at least one unseal key from the initialization
- vault01 must have a TPM2 device (virtual TPM for Proxmox VMs)
## Initial Setup
Perform these steps on vault01 after deploying the service configuration:
### 1. Save Unseal Key
```bash
# Create temporary file with one of your unseal keys
echo "paste-your-unseal-key-here" > /tmp/unseal-key.txt
```
### 2. Encrypt with TPM2
```bash
# Encrypt the key using TPM2 binding
systemd-creds encrypt \
--with-key=tpm2 \
--name=unseal-key \
/tmp/unseal-key.txt \
/var/lib/openbao/unseal-key.cred
# Set proper ownership and permissions
chown openbao:openbao /var/lib/openbao/unseal-key.cred
chmod 600 /var/lib/openbao/unseal-key.cred
```
### 3. Cleanup
```bash
# Securely delete the plaintext key
shred -u /tmp/unseal-key.txt
```
### 4. Test Auto-Unseal
```bash
# Restart the service - it should auto-unseal
systemctl restart openbao
# Verify it's unsealed
bao status
# Should show: Sealed = false
```
## TPM PCR Binding
The default `--with-key=tpm2` binds the credential to PCR 7 (Secure Boot state). For stricter binding that includes firmware and boot state:
```bash
systemd-creds encrypt \
--with-key=tpm2 \
--tpm2-pcrs=0+7+14 \
--name=unseal-key \
/tmp/unseal-key.txt \
/var/lib/openbao/unseal-key.cred
```
PCR meanings:
- **PCR 0**: BIOS/UEFI firmware measurements
- **PCR 7**: Secure Boot state (UEFI variables)
- **PCR 14**: MOK (Machine Owner Key) state
**Trade-off**: Stricter PCR binding improves security but may require re-encrypting the credential after firmware updates or kernel changes.
## Re-provisioning
If you need to reprovision vault01 from scratch:
1. **Before destroying**: Back up your root token and all unseal keys (stored securely offline)
2. **After recreating the VM**:
- Initialize OpenBao: `bao operator init`
- Follow the setup steps above to encrypt a new unseal key with TPM2
3. **Restore data** (if migrating): Copy `/var/lib/openbao` from backup
## Handling System Changes
**After firmware updates, kernel updates, or boot configuration changes**, PCR values may change, causing TPM decryption to fail.
### Symptoms
- Service fails to start
- Logs show: `Failed to decrypt credentials`
- OpenBao remains sealed after reboot
### Fix
1. Unseal manually with one of your offline unseal keys:
```bash
bao operator unseal
```
2. Re-encrypt the credential with updated PCR values:
```bash
echo "your-unseal-key" > /tmp/unseal-key.txt
systemd-creds encrypt \
--with-key=tpm2 \
--name=unseal-key \
/tmp/unseal-key.txt \
/var/lib/openbao/unseal-key.cred
chown openbao:openbao /var/lib/openbao/unseal-key.cred
chmod 600 /var/lib/openbao/unseal-key.cred
shred -u /tmp/unseal-key.txt
```
3. Restart the service:
```bash
systemctl restart openbao
```
## Security Considerations
### What This Protects Against
- **Data at rest**: Vault data is encrypted and cannot be accessed without unsealing
- **VM snapshot theft**: An attacker with a VM snapshot cannot decrypt the unseal key without the TPM state
- **TPM binding**: The key can only be decrypted by the same VM with matching PCR values
### What This Does NOT Protect Against
- **Compromised host**: If an attacker gains root access to vault01 while running, they can access unsealed data
- **Boot-time attacks**: If an attacker can modify the boot process to match PCR values, they may retrieve the key
- **VM console access**: An attacker with VM console access during boot could potentially access the unsealed vault
### Recommendations
- **Keep offline backups** of root token and all unseal keys in a secure location (password manager, encrypted USB, etc.)
- **Use Shamir secret sharing**: With the default split (5 key shares, threshold 3), the single TPM-bound share is not enough on its own; an attacker who compromises it still needs additional shares to unseal
- **Monitor access**: Use OpenBao's audit logging to detect unauthorized access
- **Consider stricter PCR binding** (PCR 0+7+14) for production, accepting the maintenance overhead
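For the audit-logging recommendation, enabling a file audit device looks like this (the log path is an example):

```shell
# Enable a file audit device and confirm it is active
bao audit enable file file_path=/var/lib/openbao/audit.log
bao audit list
```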
## Troubleshooting
### Check if credential exists
```bash
ls -la /var/lib/openbao/unseal-key.cred
```
### Test credential decryption manually
```bash
# Should output your unseal key if TPM decryption works
systemd-creds decrypt /var/lib/openbao/unseal-key.cred -
```
### View service logs
```bash
journalctl -u openbao -n 50
```
### Manual unseal
```bash
bao operator unseal
# Enter one of your offline unseal keys when prompted
```
### Check TPM status
```bash
# Check if TPM2 is available
ls /dev/tpm*
# View TPM PCR values
tpm2_pcrread
```
## References
- [systemd.exec - Credentials](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Credentials)
- [systemd-creds man page](https://www.freedesktop.org/software/systemd/man/systemd-creds.html)
- [TPM2 PCR Documentation](https://uapi-group.org/specifications/specs/linux_tpm_pcr_registry/)
- [OpenBao Documentation](https://openbao.org/docs/)

flake.lock generated

@@ -65,11 +65,11 @@
     },
     "nixpkgs": {
       "locked": {
-        "lastModified": 1769598131,
-        "narHash": "sha256-e7VO/kGLgRMbWtpBqdWl0uFg8Y2XWFMdz0uUJvlML8o=",
+        "lastModified": 1769900590,
+        "narHash": "sha256-I7Lmgj3owOTBGuauy9FL6qdpeK2umDoe07lM4V+PnyA=",
         "owner": "nixos",
         "repo": "nixpkgs",
-        "rev": "fa83fd837f3098e3e678e6cf017b2b36102c7211",
+        "rev": "41e216c0ca66c83b12ab7a98cc326b5db01db646",
         "type": "github"
       },
       "original": {
@@ -81,11 +81,11 @@
     },
     "nixpkgs-unstable": {
       "locked": {
-        "lastModified": 1769461804,
-        "narHash": "sha256-msG8SU5WsBUfVVa/9RPLaymvi5bI8edTavbIq3vRlhI=",
+        "lastModified": 1770019141,
+        "narHash": "sha256-VKS4ZLNx4PNrABoB0L8KUpc1fE7CLpQXQs985tGfaCU=",
         "owner": "nixos",
         "repo": "nixpkgs",
-        "rev": "bfc1b8a4574108ceef22f02bafcf6611380c100d",
+        "rev": "cb369ef2efd432b3cdf8622b0ffc0a97a02f3137",
         "type": "github"
       },
       "original": {
@@ -112,11 +112,11 @@
       ]
     },
     "locked": {
-      "lastModified": 1769469829,
-      "narHash": "sha256-wFcr32ZqspCxk4+FvIxIL0AZktRs6DuF8oOsLt59YBU=",
+      "lastModified": 1769921679,
+      "narHash": "sha256-twBMKGQvaztZQxFxbZnkg7y/50BW9yjtCBWwdjtOZew=",
       "owner": "Mic92",
       "repo": "sops-nix",
-      "rev": "c5eebd4eb2e3372fe12a8d70a248a6ee9dd02eff",
+      "rev": "1e89149dcfc229e7e2ae24a8030f124a31e4f24f",
       "type": "github"
     },
     "original": {


@@ -380,6 +380,7 @@
       packages = with pkgs; [
         ansible
         opentofu
+        openbao
         (pkgs.callPackage ./scripts/create-host { })
       ];
     };


@@ -1,7 +1,7 @@
 $ORIGIN home.2rjus.net.
 $TTL 1800
 @ IN SOA ns1.home.2rjus.net. admin.test.2rjus.net. (
-    2063 ; serial number
+    2064 ; serial number
     3600 ; refresh
     900 ; retry
     1209600 ; expire
@@ -63,6 +63,7 @@ actions1 IN CNAME nix-cache01
 pgdb1 IN A 10.69.13.16
 nats1 IN A 10.69.13.17
 auth01 IN A 10.69.13.18
+vault01 IN A 10.69.13.19
 ; http-proxy cnames
 nzbget IN CNAME http-proxy

services/vault/README.md Normal file

@@ -0,0 +1,38 @@
# OpenBao Service Module
NixOS service module for OpenBao (open-source Vault fork) with TPM2-based auto-unsealing.
## Features
- TLS-enabled TCP listener on `0.0.0.0:8200`
- Unix socket listener at `/run/openbao/openbao.sock`
- File-based storage at `/var/lib/openbao`
- TPM2 auto-unseal on service start
## Configuration
The module expects:
- TLS certificate: `/var/lib/openbao/cert.pem`
- TLS private key: `/var/lib/openbao/key.pem`
- TPM2-encrypted unseal key: `/var/lib/openbao/unseal-key.cred`
Certificates are loaded via systemd `LoadCredential`, and the unseal key via `LoadCredentialEncrypted`.
## Setup
For initial setup and configuration instructions, see:
- **Auto-unseal setup**: `/docs/vault/auto-unseal.md`
- **Terraform configuration**: `/terraform/vault/README.md`
## Usage
```bash
# Check seal status
bao status
# Manually seal (for maintenance)
bao operator seal
# Service will auto-unseal on restart
systemctl restart openbao
```


@@ -1,8 +1,114 @@
-{ ... }:
+{ pkgs, ... }:
let
unsealScript = pkgs.writeShellApplication {
name = "openbao-unseal";
runtimeInputs = with pkgs; [
openbao
coreutils
gnugrep
getent
];
text = ''
# Set environment to use Unix socket
export BAO_ADDR='unix:///run/openbao/openbao.sock'
SOCKET_PATH="/run/openbao/openbao.sock"
CREDS_DIR="''${CREDENTIALS_DIRECTORY:-}"
# Wait for socket to exist
echo "Waiting for OpenBao socket..."
for _ in {1..30}; do
if [ -S "$SOCKET_PATH" ]; then
echo "Socket exists"
break
fi
sleep 1
done
# Wait for OpenBao to accept connections
echo "Waiting for OpenBao to be ready..."
for _ in {1..30}; do
output=$(timeout 2 bao status 2>&1 || true)
if echo "$output" | grep -q "Sealed.*false"; then
# Already unsealed
echo "OpenBao is already unsealed"
exit 0
elif echo "$output" | grep -qE "(Sealed|Initialized)"; then
# Got a valid response, OpenBao is ready (sealed)
echo "OpenBao is ready"
break
fi
sleep 1
done
# Check if already unsealed
output=$(timeout 2 bao status 2>&1 || true)
if echo "$output" | grep -q "Sealed.*false"; then
echo "OpenBao is already unsealed"
exit 0
fi
# Unseal using the TPM-decrypted keys (one per line)
if [ -n "$CREDS_DIR" ] && [ -f "$CREDS_DIR/unseal-key" ]; then
echo "Unsealing OpenBao..."
while IFS= read -r key; do
# Skip empty lines
[ -z "$key" ] && continue
echo "Applying unseal key..."
bao operator unseal "$key"
# Check if unsealed after each key
output=$(timeout 2 bao status 2>&1 || true)
if echo "$output" | grep -q "Sealed.*false"; then
echo "OpenBao unsealed successfully"
exit 0
fi
done < "$CREDS_DIR/unseal-key"
echo "WARNING: Applied all keys but OpenBao is still sealed"
exit 0
else
echo "WARNING: Unseal key credential not found, OpenBao remains sealed"
exit 0
fi
'';
};
in
 {
-  services.vault = {
+  services.openbao = {
     enable = true;
-    storageBackend = "file";
+    settings = {
+      ui = true;
+      storage.file.path = "/var/lib/openbao";
+      listener.default = {
+        type = "tcp";
+        address = "0.0.0.0:8200";
+        tls_cert_file = "/run/credentials/openbao.service/cert.pem";
+        tls_key_file = "/run/credentials/openbao.service/key.pem";
+      };
+      listener.socket = {
+        type = "unix";
+        address = "/run/openbao/openbao.sock";
+      };
+    };
+  };
+  systemd.services.openbao.serviceConfig = {
+    LoadCredential = [
+      "key.pem:/var/lib/openbao/key.pem"
+      "cert.pem:/var/lib/openbao/cert.pem"
+    ];
+    # TPM2-encrypted unseal key (created manually, see setup instructions)
+    LoadCredentialEncrypted = [
+      "unseal-key:/var/lib/openbao/unseal-key.cred"
+    ];
+    # Auto-unseal on service start
+    ExecStartPost = "${unsealScript}/bin/openbao-unseal";
   };
 }

terraform/vault/.terraform.lock.hcl generated Normal file

@@ -0,0 +1,37 @@
# This file is maintained automatically by "tofu init".
# Manual edits may be lost in future updates.
provider "registry.opentofu.org/hashicorp/random" {
version = "3.8.1"
constraints = "~> 3.6"
hashes = [
"h1:EHn3jsqOKhWjbg0X+psk0Ww96yz3N7ASqEKKuFvDFwo=",
"zh:25c458c7c676f15705e872202dad7dcd0982e4a48e7ea1800afa5fc64e77f4c8",
"zh:2edeaf6f1b20435b2f81855ad98a2e70956d473be9e52a5fdf57ccd0098ba476",
"zh:44becb9d5f75d55e36dfed0c5beabaf4c92e0a2bc61a3814d698271c646d48e7",
"zh:7699032612c3b16cc69928add8973de47b10ce81b1141f30644a0e8a895b5cd3",
"zh:86d07aa98d17703de9fbf402c89590dc1e01dbe5671dd6bc5e487eb8fe87eee0",
"zh:8c411c77b8390a49a8a1bc9f176529e6b32369dd33a723606c8533e5ca4d68c1",
"zh:a5ecc8255a612652a56b28149994985e2c4dc046e5d34d416d47fa7767f5c28f",
"zh:aea3fe1a5669b932eda9c5c72e5f327db8da707fe514aaca0d0ef60cb24892f9",
"zh:f56e26e6977f755d7ae56fa6320af96ecf4bb09580d47cb481efbf27f1c5afff",
]
}
provider "registry.opentofu.org/hashicorp/vault" {
version = "4.8.0"
constraints = "~> 4.0"
hashes = [
"h1:SQkjClJDo6SETUnq912GO8BdEExhU1ko8IG2mr4X/2A=",
"zh:0c07ef884c03083b08a54c2cf782f3ff7e124b05e7a4438a0b90a86e60c8d080",
"zh:13dcf2ed494c79e893b447249716d96b665616a868ffaf8f2c5abef07c7eee6f",
"zh:6f15a29fae3a6178e5904e3c95ba22b20f362d8ee491da816048c89f30e6b2de",
"zh:94b92a4bf7a2d250d9698a021f1ab60d1957d01b5bab81f7d9c00c2d6a9b3747",
"zh:a9e207540ef12cd2402e37b3b7567e08de14061a0a2635fd2f4fd09e0a3382aa",
"zh:b41667938ba541e8492036415b3f51fbd1758e456f6d5f0b63e26f4ad5728b21",
"zh:df0b73aff5f4b51e08fc0c273db7f677994db29a81deda66d91acfcfe3f1a370",
"zh:df904b217dc79b71a8b5f5f3ab2e52316d0f890810383721349cc10a72f7265b",
"zh:f0e0b3e6782e0126c40f05cf87ec80978c7291d90f52d7741300b5de1d9c01ba",
"zh:f8e599718b0ea22658eaa3e590671d3873aa723e7ce7d00daf3460ab41d3af14",
]
}

terraform/vault/README.md Normal file

@@ -0,0 +1,280 @@
# OpenBao Terraform Configuration
This directory contains Terraform/OpenTofu configuration for managing OpenBao (Vault) infrastructure as code.
## Overview
Manages the following OpenBao resources:
- **AppRole Authentication**: For host-based authentication
- **PKI Infrastructure**: Root CA + Intermediate CA for TLS certificates
- **KV Secrets Engine**: Key-value secret storage (v2)
- **Policies**: Access control policies
## Setup
1. **Copy the example tfvars file:**
```bash
cp terraform.tfvars.example terraform.tfvars
```
2. **Edit `terraform.tfvars` with your OpenBao credentials:**
```hcl
vault_address = "https://vault.home.2rjus.net:8200"
vault_token = "hvs.your-root-token-here"
vault_skip_tls_verify = true
```
3. **Initialize Terraform:**
```bash
tofu init
```
4. **Review the plan:**
```bash
tofu plan
```
5. **Apply the configuration:**
```bash
tofu apply
```
## Files
- `main.tf` - Provider configuration
- `variables.tf` - Variable definitions
- `approle.tf` - AppRole authentication backend and roles
- `pki.tf` - PKI engines (root CA and intermediate CA)
- `secrets.tf` - KV secrets engine and test secrets
- `terraform.tfvars` - Credentials (gitignored)
- `terraform.tfvars.example` - Example configuration
## Resources Created
### AppRole Authentication
- AppRole backend at `approle/`
- Host-based roles and policies (defined in `locals.host_policies`)
### PKI Infrastructure
- Root CA at `pki/` (10 year TTL)
- Intermediate CA at `pki_int/` (5 year TTL)
- Role `homelab` for issuing certificates to `*.home.2rjus.net`
- Certificate max TTL: 30 days
### Secrets
- KV v2 engine at `secret/`
- Secrets and policies defined in `locals.secrets` and `locals.host_policies`
## Usage Examples
### Adding a New Host
1. **Define the host policy in `approle.tf`:**
```hcl
locals {
host_policies = {
"monitoring01" = {
paths = [
"secret/data/hosts/monitoring01/*",
"secret/data/services/prometheus/*",
]
}
}
}
```
2. **Add secrets in `secrets.tf`:**
```hcl
locals {
secrets = {
"hosts/monitoring01/grafana-admin" = {
auto_generate = true
password_length = 32
}
}
}
```
3. **Apply changes:**
```bash
tofu apply
```
4. **Get AppRole credentials:**
```bash
# Get role_id
bao read auth/approle/role/monitoring01/role-id
# Generate secret_id
bao write -f auth/approle/role/monitoring01/secret-id
```
### Issue Certificates from PKI
**Method 1: ACME (Recommended for automated services)**
First, enable ACME support:
```bash
bao write pki_int/config/acme enabled=true
```
ACME directory endpoint:
```
https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory
```
Use with ACME clients (lego, certbot, cert-manager, etc.):
```bash
# Example with lego
lego --email admin@home.2rjus.net \
--dns manual \
--server https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory \
--accept-tos \
run -d test.home.2rjus.net
```
**Method 2: Static certificates via Terraform**
Define in `pki.tf`:
```hcl
locals {
static_certificates = {
"monitoring" = {
common_name = "monitoring.home.2rjus.net"
alt_names = ["grafana.home.2rjus.net", "prometheus.home.2rjus.net"]
ttl = "720h"
}
}
}
```
Terraform will auto-issue and auto-renew these certificates.
**Method 3: Manual CLI issuance**
```bash
# Issue certificate for a host
bao write pki_int/issue/homelab \
common_name="test.home.2rjus.net" \
ttl="720h"
```
### Read a secret
```bash
# Authenticate with AppRole first
bao write auth/approle/login \
role_id="..." \
secret_id="..."
# Read the test secret
bao kv get secret/test/example
```
## Managing Secrets
Secrets are defined in the `locals.secrets` block in `secrets.tf` using a declarative pattern:
### Auto-Generated Secrets (Recommended)
Most secrets can be auto-generated using the `random_password` provider:
```hcl
locals {
secrets = {
"hosts/monitoring01/grafana-admin" = {
auto_generate = true
password_length = 32
}
}
}
```
### Manual Secrets
For secrets that must have specific values (external services, etc.):
```hcl
# In variables.tf
variable "smtp_password" {
type = string
sensitive = true
}
# In secrets.tf locals block
locals {
secrets = {
"shared/smtp/credentials" = {
auto_generate = false
data = {
username = "notifications@2rjus.net"
password = var.smtp_password
server = "smtp.gmail.com"
}
}
}
}
# In terraform.tfvars
smtp_password = "super-secret-password"
```
### Path Structure
Secrets follow a three-tier hierarchy:
- `hosts/{hostname}/*` - Host-specific secrets
- `services/{service}/*` - Service-wide secrets (any host running the service)
- `shared/{category}/*` - Shared secrets (SMTP, backup, etc.)
## Security Notes
- `terraform.tfvars` is gitignored to prevent credential leakage
- Root token should be stored securely (consider using a limited admin token instead)
- `skip_tls_verify = true` is acceptable for self-signed certs in a homelab; trusting the internal root CA on clients is the cleaner long-term option
- AppRole secret_ids can be scoped to specific CIDR ranges for additional security
## Initial Setup Steps
After deploying this configuration, perform these one-time setup tasks:
### 1. Enable ACME
```bash
export BAO_ADDR='https://vault.home.2rjus.net:8200'
export BAO_TOKEN='your-root-token'
export BAO_SKIP_VERIFY=1
# Configure cluster path (required for ACME)
bao write pki_int/config/cluster path=https://vault.home.2rjus.net:8200/v1/pki_int
# Enable ACME on intermediate CA
bao write pki_int/config/acme enabled=true
# Verify ACME is enabled
curl -k https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory
```
### 2. Download Root CA Certificate
For trusting the internal CA on clients:
```bash
# Download root CA certificate
bao read -field=certificate pki/cert/ca > homelab-root-ca.crt
# Install on NixOS hosts (add to system/default.nix or similar)
security.pki.certificateFiles = [ ./homelab-root-ca.crt ];
```
### 3. Test Certificate Issuance
```bash
# Manual test
bao write pki_int/issue/homelab common_name="test.home.2rjus.net" ttl="24h"
```
## Next Steps
1. Replace step-ca ACME endpoint with OpenBao in `system/acme.nix`
2. Add more AppRoles for different host types
3. Migrate existing sops-nix secrets to OpenBao KV
4. Set up SSH CA for host and user certificates
5. Configure auto-unseal for vault01


@@ -0,0 +1,74 @@
# Enable AppRole auth backend
resource "vault_auth_backend" "approle" {
type = "approle"
path = "approle"
}
# Define host access policies
locals {
host_policies = {
# Example: monitoring01 host
# "monitoring01" = {
# paths = [
# "secret/data/hosts/monitoring01/*",
# "secret/data/services/prometheus/*",
# "secret/data/services/grafana/*",
# "secret/data/shared/smtp/*"
# ]
# }
# Example: ha1 host
# "ha1" = {
# paths = [
# "secret/data/hosts/ha1/*",
# "secret/data/shared/mqtt/*"
# ]
# }
# TODO: actually use this policy
"ha1" = {
paths = [
"secret/data/hosts/ha1/*",
]
}
# TODO: actually use this policy
"monitoring01" = {
paths = [
"secret/data/hosts/monitoring01/*",
]
}
}
}
# Generate policies for each host
resource "vault_policy" "host_policies" {
for_each = local.host_policies
name = "${each.key}-policy"
policy = <<EOT
%{~for path in each.value.paths~}
path "${path}" {
capabilities = ["read", "list"]
}
%{~endfor~}
EOT
}
# Generate AppRoles for each host
resource "vault_approle_auth_backend_role" "hosts" {
for_each = local.host_policies
backend = vault_auth_backend.approle.path
role_name = each.key
token_policies = ["${each.key}-policy"]
# Token configuration
token_ttl = 3600 # 1 hour
token_max_ttl = 86400 # 24 hours
# Security settings
bind_secret_id = true
secret_id_ttl = 0 # Never expire (we'll rotate manually)
}

terraform/vault/main.tf Normal file

@@ -0,0 +1,19 @@
terraform {
required_version = ">= 1.0"
required_providers {
vault = {
source = "hashicorp/vault"
version = "~> 4.0"
}
random = {
source = "hashicorp/random"
version = "~> 3.6"
}
}
}
provider "vault" {
address = var.vault_address
token = var.vault_token
skip_tls_verify = var.vault_skip_tls_verify
}

terraform/vault/pki.tf Normal file

@@ -0,0 +1,190 @@
# ============================================================================
# PKI Infrastructure Configuration
# ============================================================================
#
# This file configures a two-tier PKI hierarchy:
# - Root CA (pki/) - 10 year validity, EC P-384, kept offline (internal to Vault)
# - Intermediate CA (pki_int/) - 5 year validity, EC P-384, used for issuing certificates
# - Leaf certificates - Default to EC P-256 for optimal performance
#
# Key Type Choices:
# - Root/Intermediate: EC P-384 (secp384r1) for long-term security
# - Leaf certificates: EC P-256 (secp256r1) for performance and compatibility
# - EC provides smaller keys, faster operations, and lower CPU usage vs RSA
#
# Certificate Issuance Methods:
#
# 1. ACME (Automated Certificate Management Environment)
# - Services fetch certificates automatically using ACME protocol
# - ACME directory: https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory
# - Enable ACME: bao write pki_int/config/acme enabled=true
# - Compatible with cert-manager, lego, certbot, etc.
#
# 2. Direct Issuance (Non-ACME)
# - Certificates defined in locals.static_certificates
# - Terraform manages lifecycle (issuance, renewal)
# - Useful for services without ACME support
# - Certificates auto-renew 7 days before expiry
#
# 3. Manual Issuance (CLI)
# - bao write pki_int/issue/homelab common_name="service.home.2rjus.net"
# - Useful for one-off certificates or testing
#
# ============================================================================
# Root CA
resource "vault_mount" "pki_root" {
path = "pki"
type = "pki"
description = "Root CA"
default_lease_ttl_seconds = 315360000 # 10 years
max_lease_ttl_seconds = 315360000 # 10 years
}
resource "vault_pki_secret_backend_root_cert" "root" {
backend = vault_mount.pki_root.path
type = "internal"
common_name = "home.2rjus.net Root CA"
ttl = "315360000" # 10 years
format = "pem"
private_key_format = "der"
key_type = "ec"
key_bits = 384 # P-384 curve (NIST P-384, secp384r1)
exclude_cn_from_sans = true
organization = "Homelab"
country = "NO"
}
# Intermediate CA
resource "vault_mount" "pki_int" {
path = "pki_int"
type = "pki"
description = "Intermediate CA"
default_lease_ttl_seconds = 157680000 # 5 years
max_lease_ttl_seconds = 157680000 # 5 years
}
resource "vault_pki_secret_backend_intermediate_cert_request" "intermediate" {
backend = vault_mount.pki_int.path
type = "internal"
common_name = "home.2rjus.net Intermediate CA"
key_type = "ec"
key_bits = 384 # P-384 curve (NIST P-384, secp384r1)
organization = "Homelab"
country = "NO"
}
resource "vault_pki_secret_backend_root_sign_intermediate" "intermediate" {
backend = vault_mount.pki_root.path
csr = vault_pki_secret_backend_intermediate_cert_request.intermediate.csr
common_name = "Homelab Intermediate CA"
ttl = "157680000" # 5 years
exclude_cn_from_sans = true
organization = "Homelab"
country = "NO"
}
resource "vault_pki_secret_backend_intermediate_set_signed" "intermediate" {
backend = vault_mount.pki_int.path
certificate = vault_pki_secret_backend_root_sign_intermediate.intermediate.certificate
}
# PKI Role for issuing certificates via ACME and direct issuance
resource "vault_pki_secret_backend_role" "homelab" {
backend = vault_mount.pki_int.path
name = "homelab"
allowed_domains = ["home.2rjus.net"]
allow_subdomains = true
max_ttl = 2592000 # 30 days
ttl = 2592000 # 30 days default
# Key configuration - EC (Elliptic Curve) by default
key_type = "ec"
key_bits = 256 # P-256 curve (NIST P-256, secp256r1)
# ACME-friendly settings
allow_ip_sans = true # Allow IP addresses in SANs
allow_localhost = false # Disable localhost
allow_bare_domains = false # Require subdomain or FQDN
allow_glob_domains = false # Don't allow glob patterns in domain names
# Server authentication
server_flag = true
client_flag = false
code_signing_flag = false
email_protection_flag = false
# Key usage (appropriate for EC certificates)
key_usage = [
"DigitalSignature",
"KeyAgreement",
]
ext_key_usage = ["ServerAuth"]
# Certificate properties
require_cn = false # ACME doesn't always use CN
}
# Configure CRL and issuing URLs
resource "vault_pki_secret_backend_config_urls" "config_urls" {
backend = vault_mount.pki_int.path
issuing_certificates = [
"${var.vault_address}/v1/pki_int/ca"
]
crl_distribution_points = [
"${var.vault_address}/v1/pki_int/crl"
]
ocsp_servers = [
"${var.vault_address}/v1/pki_int/ocsp"
]
}
# ============================================================================
# Direct Certificate Issuance (Non-ACME)
# ============================================================================
# Define static certificates to be issued directly (not via ACME)
# Useful for services that don't support ACME or need long-lived certificates
locals {
static_certificates = {
# Example: Issue a certificate for a specific service
# "vault" = {
# common_name = "vault.home.2rjus.net"
# alt_names = ["vault01.home.2rjus.net"]
# ip_sans = ["10.69.13.19"]
# ttl = "8760h" # 1 year
# }
}
}
# Issue static certificates
resource "vault_pki_secret_backend_cert" "static_certs" {
for_each = local.static_certificates
backend = vault_mount.pki_int.path
name = vault_pki_secret_backend_role.homelab.name
common_name = each.value.common_name
alt_names = lookup(each.value, "alt_names", [])
ip_sans = lookup(each.value, "ip_sans", [])
ttl = lookup(each.value, "ttl", "720h") # 30 days default
auto_renew = true
min_seconds_remaining = 604800 # Renew 7 days before expiry
}
# Output static certificate data for use in configurations
output "static_certificates" {
description = "Static certificates issued by Vault PKI"
value = {
for k, v in vault_pki_secret_backend_cert.static_certs : k => {
common_name = v.common_name
serial = v.serial_number
expiration = v.expiration
issuing_ca = v.issuing_ca
certificate = v.certificate
private_key = v.private_key
}
}
sensitive = true
}


@@ -0,0 +1,76 @@
# Enable KV v2 secrets engine
resource "vault_mount" "kv" {
path = "secret"
type = "kv"
options = { version = "2" }
description = "KV Version 2 secret store"
}
# Define all secrets with auto-generation support
locals {
secrets = {
# Example host-specific secrets
# "hosts/monitoring01/grafana-admin" = {
# auto_generate = true
# password_length = 32
# }
# "hosts/ha1/mqtt-password" = {
# auto_generate = true
# password_length = 24
# }
# Example service secrets
# "services/prometheus/remote-write" = {
# auto_generate = true
# password_length = 40
# }
# Example shared secrets with manual values
# "shared/smtp/credentials" = {
# auto_generate = false
# data = {
# username = "notifications@2rjus.net"
# password = var.smtp_password # Define in variables.tf and set in terraform.tfvars
# server = "smtp.gmail.com"
# }
# }
# TODO: actually use the secret
"hosts/monitoring01/grafana-admin" = {
auto_generate = true
password_length = 32
}
# TODO: actually use the secret
"hosts/ha1/mqtt-password" = {
auto_generate = true
password_length = 24
}
}
}
# Auto-generate passwords for secrets with auto_generate = true
resource "random_password" "auto_secrets" {
for_each = {
for k, v in local.secrets : k => v
if lookup(v, "auto_generate", false)
}
length = each.value.password_length
special = true
}
# Create all secrets in Vault
resource "vault_kv_secret_v2" "secrets" {
for_each = local.secrets
mount = vault_mount.kv.path
name = each.key
data_json = jsonencode(
lookup(each.value, "auto_generate", false)
? { password = random_password.auto_secrets[each.key].result }
: each.value.data
)
}


@@ -0,0 +1,6 @@
# Copy this file to terraform.tfvars and fill in your values
# terraform.tfvars is gitignored to keep credentials safe
vault_address = "https://vault.home.2rjus.net:8200"
vault_token = "hvs.XXXXXXXXXXXXXXXXXXXX"
vault_skip_tls_verify = true


@@ -0,0 +1,26 @@
variable "vault_address" {
description = "OpenBao server address"
type = string
default = "https://vault.home.2rjus.net:8200"
}
variable "vault_token" {
description = "OpenBao root or admin token"
type = string
sensitive = true
}
variable "vault_skip_tls_verify" {
description = "Skip TLS verification (for self-signed certs)"
type = bool
default = true
}
# Example variables for manual secrets
# Uncomment and add to terraform.tfvars as needed
# variable "smtp_password" {
# description = "SMTP password for notifications"
# type = string
# sensitive = true
# }


@@ -43,6 +43,7 @@ locals {
     cpu_cores = 2
     memory    = 2048
     disk_size = "20G"
+    flake_branch = "vault-setup" # Bootstrap from this branch instead of master
   }
 }
@@ -118,6 +119,11 @@ resource "proxmox_vm_qemu" "vm" {
     }
   }
+  # TPM device
+  tpm_state {
+    storage = each.value.storage
+  }
   # Start on boot
   start_at_node_boot = true