Compare commits: host-vault...b5364d2ccc (14 commits)

Commits: b5364d2ccc, 7fc69c40a6, 34a2f2ab50, 16b3214982, 244dd0c78b, 238ad45c14, c694b9889a, 3f2f91aedd, 5d513fd5af, b6f1e80c2a, 4133eafc4e, ace848b29c, b012df9f34, ab053c25bd
**.gitignore** (vendored, +9 lines)

```diff
@@ -10,3 +10,12 @@ terraform/terraform.tfvars
 terraform/*.auto.tfvars
 terraform/crash.log
 terraform/crash.*.log
+
+terraform/vault/.terraform/
+terraform/vault/.terraform.lock.hcl
+terraform/vault/*.tfstate
+terraform/vault/*.tfstate.*
+terraform/vault/terraform.tfvars
+terraform/vault/*.auto.tfvars
+terraform/vault/crash.log
+terraform/vault/crash.*.log
```
**TODO.md** (290 lines)

````diff
@@ -153,7 +153,9 @@ create-host \
 
 ---
 
-### Phase 4: Secrets Management with HashiCorp Vault
+### Phase 4: Secrets Management with OpenBao (Vault)
+
+**Status:** 🚧 Phases 4a & 4b Complete, 4c & 4d In Progress
 
 **Challenge:** Current sops-nix approach has a chicken-and-egg problem with age keys
@@ -164,161 +166,225 @@ create-host \
 4. User commits, pushes
 5. VM can now decrypt secrets
 
-**Selected approach:** Migrate to HashiCorp Vault for centralized secrets management
+**Selected approach:** Migrate to OpenBao (Vault fork) for centralized secrets management
+
+**Why OpenBao instead of HashiCorp Vault:**
+- HashiCorp Vault switched to the BSL (Business Source License), unavailable in the NixOS cache
+- OpenBao is the community fork maintaining the pre-BSL MPL 2.0 license
+- API-compatible with Vault, uses the same Terraform provider
+- Maintains all the Vault features we need
 
 **Benefits:**
-- Industry-standard secrets management (Vault experience transferable to work)
+- Industry-standard secrets management (Vault-compatible experience)
 - Eliminates manual age key distribution step
 - Secrets-as-code via OpenTofu (infrastructure-as-code aligned)
-- Centralized PKI management (replaces step-ca, consolidates TLS + SSH CA)
+- Centralized PKI management with ACME support (ready to replace step-ca)
 - Automatic secret rotation capabilities
-- Audit logging for all secret access
+- Audit logging for all secret access (not yet enabled)
+- AppRole authentication enables automated bootstrap
 
-**Architecture:**
+**Current Architecture:**
 ```
-vault.home.2rjus.net
-├─ KV Secrets Engine (replaces sops-nix)
-├─ PKI Engine (replaces step-ca for TLS)
-├─ SSH CA Engine (replaces step-ca SSH CA)
-└─ AppRole Auth (per-host authentication)
+vault.home.2rjus.net (10.69.13.19)
+├─ KV Secrets Engine (ready to replace sops-nix)
+│  ├─ secret/hosts/{hostname}/*
+│  ├─ secret/services/{service}/*
+│  └─ secret/shared/{category}/*
+├─ PKI Engine (ready to replace step-ca for TLS)
+│  ├─ Root CA (EC P-384, 10 year)
+│  ├─ Intermediate CA (EC P-384, 5 year)
+│  └─ ACME endpoint enabled
+├─ SSH CA Engine (TODO: Phase 4c)
+└─ AppRole Auth (per-host authentication configured)
        ↓
-New hosts authenticate on first boot
-Fetch secrets via Vault API
+[Phase 4d] New hosts authenticate on first boot
+[Phase 4d] Fetch secrets via Vault API
 No manual key distribution needed
 ```
+
+**Completed:**
+- ✅ Phase 4a: OpenBao server with TPM2 auto-unseal
+- ✅ Phase 4b: Infrastructure-as-code (secrets, policies, AppRoles, PKI)
+
+**Next Steps:**
+- Phase 4c: Migrate from step-ca to OpenBao PKI
+- Phase 4d: Bootstrap integration for automated secrets access
 
 ---
 
-#### Phase 4a: Vault Server Setup
+#### Phase 4a: Vault Server Setup ✅ COMPLETED
+
+**Status:** ✅ Fully implemented and tested
+**Completed:** 2026-02-02
 
 **Goal:** Deploy and configure Vault server with auto-unseal
 
-**Tasks:**
-- [ ] Create `hosts/vault01/` configuration
-  - [ ] Basic NixOS configuration (hostname, networking, etc.)
-  - [ ] Vault service configuration
-  - [ ] Firewall rules (8200 for API, 8201 for cluster)
-  - [ ] Add to flake.nix and terraform
-- [ ] Implement auto-unseal mechanism
-  - [ ] **Preferred:** TPM-based auto-unseal if hardware supports it
-    - [ ] Use tpm2-tools to seal/unseal Vault keys
-    - [ ] Systemd service to unseal on boot
-  - [ ] **Fallback:** Shamir secret sharing with systemd automation
-    - [ ] Generate 3 keys, threshold 2
-    - [ ] Store 2 keys on disk (encrypted), keep 1 offline
-    - [ ] Systemd service auto-unseals using 2 keys
-- [ ] Initial Vault setup
-  - [ ] Initialize Vault
-  - [ ] Configure storage backend (integrated raft or file)
-  - [ ] Set up root token management
-  - [ ] Enable audit logging
-- [ ] Deploy to infrastructure
-  - [ ] Add DNS entry for vault.home.2rjus.net
-  - [ ] Deploy VM via terraform
-  - [ ] Bootstrap and verify Vault is running
+**Implementation:**
+- Used **OpenBao** (Vault fork) instead of HashiCorp Vault due to BSL licensing concerns
+- TPM2-based auto-unseal using systemd's native `LoadCredentialEncrypted`
+- Self-signed bootstrap TLS certificates (avoiding a circular dependency on step-ca)
+- File-based storage backend at `/var/lib/openbao`
+- Unix socket + TCP listener (0.0.0.0:8200) configuration
 
-**Deliverable:** Running Vault server that auto-unseals on boot
+**Tasks:**
+- [x] Create `hosts/vault01/` configuration
+  - [x] Basic NixOS configuration (hostname: vault01, IP: 10.69.13.19/24)
+  - [x] Created reusable `services/vault` module
+  - [x] Firewall not needed (trusted network)
+  - [x] Already in flake.nix, deployed via terraform
+- [x] Implement auto-unseal mechanism
+  - [x] **TPM2-based auto-unseal** (preferred option)
+    - [x] systemd `LoadCredentialEncrypted` with TPM2 binding
+    - [x] `writeShellApplication` script with proper runtime dependencies
+    - [x] Reads multiple unseal keys (one per line) until unsealed
+    - [x] Auto-unseals on service start via `ExecStartPost`
+- [x] Initial Vault setup
+  - [x] Initialized OpenBao with Shamir secret sharing (5 keys, threshold 3)
+  - [x] File storage backend
+  - [x] Self-signed TLS certificates via LoadCredential
+- [x] Deploy to infrastructure
+  - [x] DNS entry added for vault.home.2rjus.net
+  - [x] VM deployed via terraform
+  - [x] Verified OpenBao running and auto-unsealing
````
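The TPM2 auto-unseal described above can be reproduced with systemd's own credential tooling. A minimal sketch, assuming the unseal keys live in a root-only file and the credential is named `unseal-keys` (both assumptions, not taken from the repo):

```shell
# Seal the unseal keys (one per line) against this machine's TPM2.
# systemd decrypts the blob at service start when the unit declares:
#   LoadCredentialEncrypted=unseal-keys:/var/lib/openbao/unseal-keys.cred
systemd-creds encrypt --with-key=tpm2 --name=unseal-keys \
  /root/unseal-keys.txt /var/lib/openbao/unseal-keys.cred

# Inside the ExecStartPost unseal script, the plaintext appears at
# "$CREDENTIALS_DIRECTORY/unseal-keys"; feed keys until unsealed.
while read -r key; do
  bao operator unseal "$key"
  bao status >/dev/null 2>&1 && break   # `bao status` exits 0 once unsealed
done < "$CREDENTIALS_DIRECTORY/unseal-keys"
```

Because the credential is TPM2-bound, the encrypted blob is useless if copied off the host, which is what makes storing it on disk acceptable here.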
````diff
+
+**Changes from Original Plan:**
+- Used OpenBao instead of HashiCorp Vault (licensing)
+- Used systemd's native TPM2 support instead of tpm2-tools directly
+- Skipped audit logging (can be enabled later)
+- Used self-signed certs initially (will migrate to OpenBao PKI later)
+
+**Deliverable:** ✅ Running OpenBao server that auto-unseals on boot using TPM2
+
+**Documentation:**
+- `/services/vault/README.md` - Service module overview
+- `/docs/vault/auto-unseal.md` - Complete TPM2 auto-unseal setup guide
 
 ---
 
-#### Phase 4b: Vault-as-Code with OpenTofu
+#### Phase 4b: Vault-as-Code with OpenTofu ✅ COMPLETED
+
+**Status:** ✅ Fully implemented and tested
+**Completed:** 2026-02-02
 
 **Goal:** Manage all Vault configuration (secrets structure, policies, roles) as code
 
+**Implementation:**
+- Complete Terraform/OpenTofu configuration in `terraform/vault/`
+- Locals-based pattern (similar to `vms.tf`) for declaring secrets and policies
+- Auto-generation of secrets using the `random_password` provider
+- Three-tier secrets path hierarchy: `hosts/`, `services/`, `shared/`
+- PKI infrastructure with **Elliptic Curve certificates** (P-384 for CAs, P-256 for leaf certs)
+- ACME support enabled on the intermediate CA
+
 **Tasks:**
-- [ ] Set up Vault Terraform provider
-  - [ ] Create `terraform/vault/` directory
-  - [ ] Configure Vault provider (address, auth)
-  - [ ] Store Vault token securely (terraform.tfvars, gitignored)
-- [ ] Enable and configure secrets engines
-  - [ ] Enable KV v2 secrets engine at `secret/`
-  - [ ] Define secret path structure (per-service, per-host)
-  - [ ] Example: `secret/monitoring/grafana`, `secret/postgres/ha1`
-- [ ] Define policies as code
-  - [ ] Create policies for different service tiers
-  - [ ] Principle of least privilege (hosts only read their secrets)
-  - [ ] Example: monitoring-policy allows read on `secret/monitoring/*`
-- [ ] Set up AppRole authentication
-  - [ ] Enable AppRole auth backend
-  - [ ] Create role per host type (monitoring, dns, database, etc.)
-  - [ ] Bind policies to roles
-  - [ ] Configure TTL and token policies
-- [ ] Migrate existing secrets from sops-nix
-  - [ ] Create migration script/playbook
-  - [ ] Decrypt sops secrets and load into Vault KV
-  - [ ] Verify all secrets migrated successfully
-  - [ ] Keep sops as backup during transition
-- [ ] Implement secrets-as-code patterns
-  - [ ] Secret values in gitignored terraform.tfvars
-  - [ ] Or use random_password for auto-generated secrets
-  - [ ] Secret structure/paths in version-controlled .tf files
+- [x] Set up Vault Terraform provider
+  - [x] Created `terraform/vault/` directory
+  - [x] Configured Vault provider (uses the HashiCorp provider, compatible with OpenBao)
+  - [x] Credentials in terraform.tfvars (gitignored)
+  - [x] terraform.tfvars.example for reference
+- [x] Enable and configure secrets engines
+  - [x] KV v2 engine at `secret/`
+  - [x] Three-tier path structure:
+    - `secret/hosts/{hostname}/*` - Host-specific secrets
+    - `secret/services/{service}/*` - Service-wide secrets
+    - `secret/shared/{category}/*` - Shared secrets (SMTP, backups, etc.)
+- [x] Define policies as code
+  - [x] Policies auto-generated from `locals.host_policies`
+  - [x] Per-host policies with read/list on designated paths
+  - [x] Principle of least privilege enforced
+- [x] Set up AppRole authentication
+  - [x] AppRole backend enabled at `approle/`
+  - [x] Roles auto-generated per host from `locals.host_policies`
+  - [x] Token TTL: 1 hour, max 24 hours
+  - [x] Policies bound to roles
+- [x] Implement secrets-as-code patterns
+  - [x] Auto-generated secrets using the `random_password` provider
+  - [x] Manual secrets supported via variables in terraform.tfvars
+  - [x] Secret structure versioned in .tf files
+  - [x] Secret values excluded from git
+- [x] Set up PKI infrastructure
+  - [x] Root CA (10 year TTL, EC P-384)
+  - [x] Intermediate CA (5 year TTL, EC P-384)
+  - [x] PKI role for `*.home.2rjus.net` (30 day max TTL, EC P-256)
+  - [x] ACME enabled on the intermediate CA
+  - [x] Support for static certificate issuance via Terraform
+  - [x] CRL, OCSP, and issuing certificate URLs configured
````
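The AppRole flow these tasks configure can be exercised from a host with the `bao` CLI (Vault's `vault` CLI works identically). A sketch, in which the role-id/secret-id file paths and the example secret path are assumptions, not values from the repo:

```shell
# Endpoint (OpenBao's CLI also honours the VAULT_* compatibility variables)
export VAULT_ADDR=https://vault.home.2rjus.net:8200

# Exchange role_id + secret_id for a token (TTL 1h, max 24h per the config)
TOKEN=$(bao write -field=token auth/approle/login \
  role_id="$(cat /etc/openbao/role-id)" \
  secret_id="$(cat /etc/openbao/secret-id)")

# Read a host-scoped secret from the three-tier KV v2 layout
VAULT_TOKEN="$TOKEN" bao kv get -field=admin_password \
  secret/hosts/monitoring01/grafana
```

The per-host policy only grants read/list on that host's paths, so the same login on another host's role would fail on this `kv get`.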
````diff
-**Example OpenTofu:**
-```hcl
-resource "vault_kv_secret_v2" "monitoring_grafana" {
-  mount     = "secret"
-  name      = "monitoring/grafana"
-  data_json = jsonencode({
-    admin_password = var.grafana_admin_password
-    smtp_password  = var.smtp_password
-  })
-}
+**Changes from Original Plan:**
+- Used Elliptic Curve instead of RSA for all certificates (better performance, smaller keys)
+- Implemented PKI infrastructure in Phase 4b instead of Phase 4c (more logical grouping)
+- ACME support configured immediately (ready for migration from step-ca)
+- Did not migrate existing sops-nix secrets yet (deferred to gradual migration)
-
-resource "vault_policy" "monitoring" {
-  name   = "monitoring-policy"
-  policy = <<EOT
-path "secret/data/monitoring/*" {
-  capabilities = ["read"]
-}
-EOT
-}
+**Files:**
+- `terraform/vault/main.tf` - Provider configuration
+- `terraform/vault/variables.tf` - Variable definitions
+- `terraform/vault/approle.tf` - AppRole authentication (locals-based pattern)
+- `terraform/vault/pki.tf` - PKI infrastructure with EC certificates
+- `terraform/vault/secrets.tf` - KV secrets engine (auto-generation support)
+- `terraform/vault/README.md` - Complete documentation and usage examples
+- `terraform/vault/terraform.tfvars.example` - Example credentials
-
-resource "vault_approle_auth_backend_role" "monitoring01" {
-  backend        = "approle"
-  role_name      = "monitoring01"
-  token_policies = ["monitoring-policy"]
-}
-```
+**Deliverable:** ✅ All secrets, policies, AppRoles, and PKI managed as OpenTofu code in `terraform/vault/`
-**Deliverable:** All secrets and policies managed as OpenTofu code in `terraform/vault/`
+**Documentation:**
+- `/terraform/vault/README.md` - Comprehensive guide covering:
+  - Setup and deployment
+  - AppRole usage and host access patterns
+  - PKI certificate issuance (ACME, static, manual)
+  - Secrets management patterns
+  - ACME configuration and troubleshooting
 
 ---
 
 #### Phase 4c: PKI Migration (Replace step-ca)
 
-**Goal:** Consolidate PKI infrastructure into Vault
+**Goal:** Migrate hosts from step-ca to OpenBao PKI for TLS certificates
+
+**Note:** PKI infrastructure already set up in Phase 4b (root CA, intermediate CA, ACME support)
 
 **Tasks:**
-- [ ] Set up Vault PKI engines
-  - [ ] Create root CA in Vault (`pki/` mount, 10 year TTL)
-  - [ ] Create intermediate CA (`pki_int/` mount, 5 year TTL)
-  - [ ] Sign intermediate with root CA
-  - [ ] Configure CRL and OCSP
-- [ ] Enable ACME support
-  - [ ] Enable ACME on intermediate CA (Vault 1.14+)
-  - [ ] Create PKI role for homelab domain
-  - [ ] Set certificate TTLs and allowed domains
-- [ ] Configure SSH CA in Vault
+- [x] Set up OpenBao PKI engines (completed in Phase 4b)
+  - [x] Root CA (`pki/` mount, 10 year TTL, EC P-384)
+  - [x] Intermediate CA (`pki_int/` mount, 5 year TTL, EC P-384)
+  - [x] Signed intermediate with root CA
+  - [x] Configured CRL, OCSP, and issuing certificate URLs
+- [x] Enable ACME support (completed in Phase 4b)
+  - [x] Enabled ACME on intermediate CA
+  - [x] Created PKI role for `*.home.2rjus.net`
+  - [x] Set certificate TTLs (30 day max) and allowed domains
+  - [x] ACME directory: `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
+- [ ] Download and distribute root CA certificate
+  - [ ] Export root CA: `bao read -field=certificate pki/cert/ca > homelab-root-ca.crt`
+  - [ ] Add to NixOS trust store on all hosts via `security.pki.certificateFiles`
+  - [ ] Deploy via auto-upgrade
+- [ ] Test certificate issuance
+  - [ ] Issue test certificate using ACME client (lego/certbot)
+  - [ ] Or issue static certificate via OpenBao CLI
+  - [ ] Verify certificate chain and trust
+- [ ] Migrate vault01's own certificate
+  - [ ] Issue new certificate from OpenBao PKI (self-issued)
+  - [ ] Replace self-signed bootstrap certificate
+  - [ ] Update service configuration
+- [ ] Migrate hosts from step-ca to OpenBao
+  - [ ] Update `system/acme.nix` to use OpenBao ACME endpoint
+  - [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
+  - [ ] Test on one host (non-critical service)
+  - [ ] Roll out to all hosts via auto-upgrade
+- [ ] Configure SSH CA in OpenBao (optional, future work)
   - [ ] Enable SSH secrets engine (`ssh/` mount)
   - [ ] Generate SSH signing keys
   - [ ] Create roles for host and user certificates
   - [ ] Configure TTLs and allowed principals
-- [ ] Migrate hosts from step-ca to Vault
-  - [ ] Update system/acme.nix to use Vault ACME endpoint
-  - [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
-  - [ ] Test certificate issuance on one host
-  - [ ] Roll out to all hosts via auto-upgrade
-- [ ] Migrate SSH CA trust
-  - [ ] Distribute Vault SSH CA public key to all hosts
-  - [ ] Update sshd_config to trust Vault CA
-  - [ ] Test SSH certificate authentication
+  - [ ] Distribute SSH CA public key to all hosts
+  - [ ] Update sshd_config to trust OpenBao CA
 - [ ] Decommission step-ca
-  - [ ] Verify all services migrated
+  - [ ] Verify all ACME services migrated and working
   - [ ] Stop step-ca service on ca host
   - [ ] Archive step-ca configuration for backup
   - [ ] Update documentation
 
-**Deliverable:** All TLS and SSH certificates issued by Vault, step-ca retired
+**Deliverable:** All TLS certificates issued by OpenBao PKI, step-ca retired
 
 ---
````
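The "test certificate issuance" and "verify certificate chain" steps in Phase 4c can be sketched with `lego` against the ACME directory configured in Phase 4b. The hostname, e-mail, and lego's default output paths below are placeholders/assumptions, and the exported root CA must already be trusted (or passed explicitly) for the ACME TLS connection to succeed:

```shell
# Request a certificate over HTTP-01 from the OpenBao ACME endpoint
lego --accept-tos \
  --server https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory \
  --email hostmaster@2rjus.net \
  --domains test01.home.2rjus.net \
  --http run

# Verify the issued chain against the distributed root CA:
# leaf -> intermediate (-untrusted) -> root (-CAfile)
openssl verify -CAfile homelab-root-ca.crt \
  -untrusted ./.lego/certificates/test01.home.2rjus.net.issuer.crt \
  ./.lego/certificates/test01.home.2rjus.net.crt
```

A successful `openssl verify` here confirms both the 30-day leaf and the EC P-384 intermediate chain up to the exported root.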
**docs/infrastructure.md** (new file, 282 lines)

# Homelab Infrastructure

This document describes the physical and virtual infrastructure components that support the NixOS-managed servers in this repository.

## Overview

The homelab consists of several core infrastructure components:
- **Proxmox VE** - Hypervisor hosting all NixOS VMs
- **TrueNAS** - Network storage and backup target
- **Ubiquiti EdgeRouter** - Primary router and gateway
- **Mikrotik Switch** - Core network switching

All NixOS configurations in this repository run as VMs on Proxmox and rely on these underlying infrastructure components.

## Network Topology

### Subnets

VLAN numbers are based on the third octet of the IP address.

TODO: VLAN naming is currently inconsistent across router/switch/Proxmox configurations. Need to standardize VLAN names and update all device configs to use consistent naming.

- `10.69.8.x` - Kubernetes (no longer in use)
- `10.69.12.x` - Core services
- `10.69.13.x` - NixOS VMs and core services
- `10.69.30.x` - Client network 1
- `10.69.31.x` - Client network 2
- `10.69.99.x` - Management network

### Core Network Services

- **Gateway**: Web UI exposed on 10.69.10.1
- **DNS**: ns1 (10.69.13.5), ns2 (10.69.13.6)
- **Primary DNS Domain**: `home.2rjus.net`

## Hardware Components

### Proxmox Hypervisor

**Purpose**: Hosts all NixOS VMs defined in this repository

**Hardware**:
- CPU: AMD Ryzen 9 3900X 12-Core Processor
- RAM: 96GB (94Gi)
- Storage: 1TB NVMe SSD (nvme0n1)

**Management**:
- Web UI: `https://pve1.home.2rjus.net:8006`
- Cluster: Standalone
- Version: Proxmox VE 8.4.16 (kernel 6.8.12-18-pve)

**VM Provisioning**:
- Template VM: ID 9000 (built from `hosts/template2`)
- See the `/terraform` directory for automated VM deployment using OpenTofu

**Storage**:
- ZFS pool: `rpool` on NVMe partition (nvme0n1p3)
- Total capacity: ~900GB (232GB used, 667GB available)
- Configuration: Single disk (no RAID)
- Scrub status: Last scrub completed successfully with 0 errors

**Networking**:
- Management interface: `vmbr0` - 10.69.12.75/24 (VLAN 12 - Core services)
- Physical interfaces: `enp9s0` (primary), `enp4s0` (unused)
- VM bridges:
  - `vmbr0` - Main bridge (bridged to enp9s0)
  - `vmbr0v8` - VLAN 8 (Kubernetes - deprecated)
  - `vmbr0v13` - VLAN 13 (NixOS VMs and core services)

### TrueNAS

**Purpose**: Network storage, backup target, media storage

**Hardware**:
- Model: Custom build
- CPU: AMD Ryzen 5 5600G with Radeon Graphics
- RAM: 32GB (31.2 GiB)
- Disks:
  - 2x Kingston SA400S37 240GB SSD (boot pool, mirrored)
  - 2x Seagate ST16000NE000 16TB HDD (hdd-pool mirror-0)
  - 2x WD WD80EFBX 8TB HDD (hdd-pool mirror-1)
  - 2x Seagate ST8000VN004 8TB HDD (hdd-pool mirror-2)
  - 1x NVMe 2TB (nvme-pool, no redundancy)

**Management**:
- Web UI: `https://nas.home.2rjus.net` (10.69.12.50)
- Hostname: `nas.home.2rjus.net`
- Version: TrueNAS-13.0-U6.1 (Core)

**Networking**:
- Primary interface: `mlxen0` - 10GbE (10Gbase-CX4) connected to sw1
- IP: 10.69.12.50/24 (VLAN 12 - Core services)

**ZFS Pools**:
- `boot-pool`: 206GB (mirrored SSDs) - 4% used
  - Mirror of 2x Kingston 240GB SSDs
  - Last scrub: No errors
- `hdd-pool`: 29.1TB total (three mirror vdevs, 28.4TB used, 658GB free) - 97% capacity
  - mirror-0: 2x 16TB Seagate ST16000NE000
  - mirror-1: 2x 8TB WD WD80EFBX
  - mirror-2: 2x 8TB Seagate ST8000VN004
  - Last scrub: No errors
- `nvme-pool`: 1.81TB (single NVMe, 70.4GB used, 1.74TB free) - 3% capacity
  - Single NVMe drive, no redundancy
  - Last scrub: No errors

**NFS Exports**:
- `/mnt/hdd-pool/media` - Media storage (exported to 10.69.0.0/16, used by Jellyfin)
- `/mnt/hdd-pool/virt/nfs-iso` - ISO storage for Proxmox
- `/mnt/hdd-pool/virt/kube-prod-pvc` - Kubernetes storage (deprecated)

**Jails**:
TrueNAS runs several FreeBSD jails for media management:
- nzbget - Usenet downloader
- restic-rest - Restic REST server for backups
- radarr - Movie management
- sonarr - TV show management

### Ubiquiti EdgeRouter

**Purpose**: Primary router, gateway, firewall, inter-VLAN routing

**Model**: EdgeRouter X 5-Port

**Hardware**:
- Serial: F09FC20E1A4C

**Management**:
- SSH: `ssh ubnt@10.69.10.1`
- Web UI: `https://10.69.10.1`
- Version: EdgeOS v2.0.9-hotfix.6 (build 5574651, 12/30/22)

**WAN Connection**:
- Interface: eth0
- Public IP: 84.213.73.123/20
- Gateway: 84.213.64.1

**Interface Layout**:
- **eth0**: WAN (public IP)
- **eth1**: 10.69.31.1/24 - Client network 2
- **eth2**: Unused (down)
- **eth3**: 10.69.30.1/24 - Client network 1
- **eth4**: Trunk port to Mikrotik switch (carries all VLANs)
  - eth4.8: 10.69.8.1/24 - K8S (deprecated)
  - eth4.10: 10.69.10.1/24 - TRUSTED (management access)
  - eth4.12: 10.69.12.1/24 - SERVER (Proxmox, TrueNAS, core services)
  - eth4.13: 10.69.13.1/24 - SVC (NixOS VMs)
  - eth4.21: 10.69.21.1/24 - CLIENTS
  - eth4.22: 10.69.22.1/24 - WLAN (wireless clients)
  - eth4.23: 10.69.23.1/24 - IOT
  - eth4.99: 10.69.99.1/24 - MGMT (device management)

**Routing**:
- Default route: 0.0.0.0/0 via 84.213.64.1 (WAN gateway)
- Static route: 192.168.100.0/24 via eth0
- All internal VLANs directly connected

**DHCP Servers**:
Active DHCP pools on all networks:
- dhcp-8: VLAN 8 (K8S) - 91 addresses
- dhcp-12: VLAN 12 (SERVER) - 51 addresses
- dhcp-13: VLAN 13 (SVC) - 41 addresses
- dhcp-21: VLAN 21 (CLIENTS) - 141 addresses
- dhcp-22: VLAN 22 (WLAN) - 101 addresses
- dhcp-23: VLAN 23 (IOT) - 191 addresses
- dhcp-30: eth3 (Client network 1) - 101 addresses
- dhcp-31: eth1 (Client network 2) - 21 addresses
- dhcp-mgmt: VLAN 99 (MGMT) - 51 addresses

**NAT/Firewall**:
- Masquerading on WAN interface (eth0)

### Mikrotik Switch

**Purpose**: Core Layer 2/3 switching

**Model**: MikroTik CRS326-24G-2S+ (24x 1GbE + 2x 10GbE SFP+)

**Hardware**:
- CPU: ARMv7 @ 800MHz
- RAM: 512MB
- Uptime: 21+ weeks

**Management**:
- Hostname: `sw1.home.2rjus.net`
- SSH access: `ssh admin@sw1.home.2rjus.net` (using the gunter SSH key)
- Management IP: 10.69.99.2/24 (VLAN 99)
- Version: RouterOS 6.47.10 (long-term)

**VLANs**:
- VLAN 8: Kubernetes (deprecated)
- VLAN 12: SERVERS - Core services subnet
- VLAN 13: SVC - Services subnet
- VLAN 21: CLIENTS
- VLAN 22: WLAN - Wireless network
- VLAN 23: IOT
- VLAN 99: MGMT - Management network

**Port Layout** (active ports):
- **ether1**: Uplink to EdgeRouter (trunk, carries all VLANs)
- **ether11**: virt-mini1 (VLAN 12 - SERVERS)
- **ether12**: Home Assistant (VLAN 12 - SERVERS)
- **ether24**: Wireless AP (VLAN 22 - WLAN)
- **sfp-sfpplus1**: Media server/Jellyfin (VLAN 12) - 10Gbps, 7m copper DAC
- **sfp-sfpplus2**: TrueNAS (VLAN 12) - 10Gbps, 1m copper DAC

**Bridge Configuration**:
- All ports bridged to the main bridge interface
- Hardware offloading enabled
- VLAN filtering enabled on the bridge

## Backup & Disaster Recovery

### Backup Strategy

**NixOS VMs**:
- Declarative configurations in this git repository
- Secrets: SOPS-encrypted, backed up with the repository
- State/data: some hosts are backed up to the NAS, but this should be improved and expanded to more hosts

**Proxmox**:
- VM backups: Not currently implemented

**Critical Credentials**:

TODO: Document this

- OpenBao root token and unseal keys: _[offline secure storage location]_
- Proxmox root password: _[secure storage]_
- TrueNAS admin password: _[secure storage]_
- Router admin credentials: _[secure storage]_

### Disaster Recovery Procedures

**Total Infrastructure Loss**:
1. Restore Proxmox from installation media
2. Restore TrueNAS from installation media, import ZFS pools
3. Restore network configuration on the EdgeRouter and Mikrotik
4. Rebuild NixOS VMs from this repository using the Proxmox template
5. Restore stateful data from TrueNAS backups
6. Re-initialize OpenBao and restore from backup if needed

**Individual VM Loss**:
1. Deploy a new VM from the template using OpenTofu (`terraform/`)
2. Run `nixos-rebuild` with the appropriate flake configuration
3. Restore any stateful data from backups
4. For vault01: follow the re-provisioning steps in `docs/vault/auto-unseal.md`

**Network Device Failure**:
- EdgeRouter: _[config backup location, restoration procedure]_
- Mikrotik: _[config backup location, restoration procedure]_

## Future Additions

- Additional Proxmox nodes for clustering
- Proxmox Backup Server for VM backups
- Additional TrueNAS for replication

## Maintenance Notes

### Proxmox Updates

- Update schedule: manual
- Pre-update checklist: yolo

### TrueNAS Updates

- Update schedule: manual

### Network Device Updates

- EdgeRouter: manual
- Mikrotik: manual

## Monitoring

**Infrastructure Monitoring**:

TODO: Improve monitoring for physical hosts (proxmox, nas)
TODO: Improve monitoring for networking equipment

All NixOS VMs ship metrics to monitoring01 via node-exporter and logs via Promtail. See `/services/monitoring/` for the observability stack configuration.
**docs/plans/truenas-migration.md** (new file, 151 lines)

# TrueNAS Migration Planning

## Current State

### Hardware
- CPU: AMD Ryzen 5 5600G with Radeon Graphics
- RAM: 32GB
- Network: 10GbE (mlxen0)
- Software: TrueNAS-13.0-U6.1 (Core)

### Storage Status

**hdd-pool**: 29.1TB total, **28.4TB used, 658GB free (97% capacity)** ⚠️
- mirror-0: 2x Seagate ST16000NE000 16TB HDD (16TB usable)
- mirror-1: 2x WD WD80EFBX 8TB HDD (8TB usable)
- mirror-2: 2x Seagate ST8000VN004 8TB HDD (8TB usable)

## Goal

Expand storage capacity for the main hdd-pool. Since we need to add disks anyway, we are also evaluating whether to upgrade or replace the entire system.

## Decisions

### Migration Approach: Option 3 - Migrate to NixOS

**Decision**: Replace TrueNAS with a bare-metal NixOS installation

**Rationale**:
- Aligns with existing infrastructure (16+ NixOS hosts already managed in this repo)
- Declarative configuration fits the homelab philosophy
- Automatic monitoring/logging integration (Prometheus + Promtail)
- Auto-upgrades via the same mechanism as other hosts
- SOPS secrets management integration
- TrueNAS-specific features (WebGUI, jails) not heavily utilized

**Service migration**:
- radarr/sonarr: Native NixOS services (`services.radarr`, `services.sonarr`)
- restic-rest: `services.restic.server`
- nzbget: NixOS service or OCI container
- NFS exports: `services.nfs.server`

### Filesystem: BTRFS RAID1

**Decision**: Migrate from ZFS to BTRFS with RAID1

**Rationale**:
- **In-kernel**: No out-of-tree module issues like ZFS
- **Flexible expansion**: Add individual disks, not required to buy pairs
- **Mixed disk sizes**: Better handling than ZFS's multi-vdev approach
- **RAID level conversion**: Can convert between RAID levels in place
- Built-in checksumming, snapshots, compression (zstd)
- NixOS has good BTRFS support

**BTRFS RAID1 notes**:
- "RAID1" means 2 copies of all data
- Distributes across all available devices
- With 6+ disks, provides redundancy + capacity scaling
- RAID5/6 avoided (known issues); RAID1/10 are stable
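The RAID1 profile described in the notes above maps to a couple of btrfs-progs commands. A minimal sketch with placeholder device names and mount point:

```shell
# Two-disk starting pool: RAID1 for both data (-d) and metadata (-m),
# i.e. two copies of everything, each copy on a different device.
mkfs.btrfs -L hdd-pool -d raid1 -m raid1 /dev/sdg /dev/sdh

# Mounting any member device mounts the whole multi-device filesystem
mount -o compress=zstd /dev/sdg /mnt/hdd-pool

# Show per-profile allocation (data/metadata should both report RAID1)
btrfs filesystem df /mnt/hdd-pool
```

Because BTRFS places the two copies on whichever devices have the most free space, later devices of different sizes still contribute usable capacity, which is the "flexible expansion" property relied on here.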
|
||||
### Hardware: Keep Existing + Add Disks
|
||||
|
||||
**Decision**: Retain current hardware, expand disk capacity
|
||||
|
||||
**Hardware to keep**:
|
||||
- AMD Ryzen 5 5600G (sufficient for NAS workload)
|
||||
- 32GB RAM (adequate)
|
||||
- 10GbE network interface
|
||||
- Chassis
|
||||
|
||||
**Storage architecture**:
|
||||
|
||||
**Bulk storage** (BTRFS RAID1 on HDDs):
|
||||
- Current: 6x HDDs (2x16TB + 2x8TB + 2x8TB)
|
||||
- Add: 2x new HDDs (size TBD)
|
||||
- Use: Media, downloads, backups, non-critical data
|
||||
- Risk tolerance: High (data mostly replaceable)
|
||||
|
||||
**Critical data** (small volume):
|
||||
- Use 2x 240GB SSDs in mirror (BTRFS or ZFS)
|
||||
- Or use 2TB NVMe for critical data
|
||||
- Risk tolerance: Low (data important but small)
|
### Disk Purchase Decision

**Options under consideration**:

**Option A: 2x 16TB drives**
- Matches largest current drives
- Enables potential future RAID5 if desired (6x 16TB array)
- More conservative capacity increase

**Option B: 2x 20-24TB drives**
- Larger capacity headroom
- Typically better $/TB ratio
- Future-proofs better

**Initial purchase**: 2 drives (chassis has space for 2 more without modifications)

## Migration Strategy

### High-Level Plan

1. **Preparation**:
   - Purchase 2x new HDDs (16TB or 20-24TB)
   - Create NixOS configuration for new storage host
   - Set up bare metal NixOS installation

2. **Initial BTRFS pool**:
   - Install 2 new disks
   - Create BTRFS filesystem in RAID1
   - Mount and test NFS exports

3. **Data migration**:
   - Copy data from TrueNAS ZFS pool to new BTRFS pool over 10GbE
   - Verify data integrity
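The copy-and-verify step could look like this (hostnames and dataset paths are placeholders):

```bash
# Pull data from the old NAS over the 10GbE link, preserving attributes
rsync -aHAX --info=progress2 truenas:/mnt/tank/media/ /tank/media/

# Re-run with checksumming in dry-run mode; any itemized lines indicate mismatches
rsync -aHAXc --dry-run --itemize-changes truenas:/mnt/tank/media/ /tank/media/
```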
4. **Expand pool**:
   - As the old ZFS pool is emptied, wipe drives and add them to the BTRFS pool
   - Pool grows incrementally: 2 → 4 → 6 → 8 disks
   - BTRFS rebalances data across new devices
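Each incremental growth step above maps to `btrfs device add` plus a rebalance (sketch only; `/dev/sdc` is a placeholder for a wiped ex-ZFS drive, and the commands require root):

```bash
# Attach the emptied drive, then spread existing chunks onto it;
# the "soft" filter skips chunks that are already raid1
btrfs device add /dev/sdc /mnt/tank
btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/tank
```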
5. **Service migration**:
   - Set up radarr/sonarr/nzbget/restic as NixOS services
   - Update NFS client mounts on consuming hosts

6. **Cutover**:
   - Point consumers to the new NAS host
   - Decommission TrueNAS
   - Repurpose hardware or keep as a spare

### Migration Advantages

- **Low risk**: New pool is created independently; old data remains intact during migration
- **Incremental**: Can add old disks one at a time as space allows
- **Flexible**: BTRFS handles mixed disk sizes gracefully
- **Reversible**: Keep TrueNAS running until fully validated

## Next Steps

1. Decide on disk size (16TB vs 20-24TB)
2. Purchase disks
3. Design NixOS host configuration (`hosts/nas1/`)
4. Plan detailed migration timeline
5. Document NFS export mapping (current → new)

## Open Questions

- [ ] Final decision on disk size?
- [ ] Hostname for new NAS host? (nas1? storage1?)
- [ ] IP address allocation (keep 10.69.12.50 or new IP?)
- [ ] Timeline/maintenance window for migration?
178 docs/vault/auto-unseal.md Normal file
@@ -0,0 +1,178 @@
# OpenBao TPM2 Auto-Unseal Setup

This document describes the one-time setup process for enabling TPM2-based auto-unsealing on vault01.

## Overview

The auto-unseal feature uses systemd's `LoadCredentialEncrypted` with TPM2 to securely store and retrieve an unseal key. On service start, systemd automatically decrypts the credential using the VM's TPM, and the service unseals OpenBao.

## Prerequisites

- OpenBao must be initialized (`bao operator init` completed)
- You must have at least one unseal key from the initialization
- vault01 must have a TPM2 device (virtual TPM for Proxmox VMs)

## Initial Setup

Perform these steps on vault01 after deploying the service configuration:

### 1. Save Unseal Key

```bash
# Create a temporary file with one of your unseal keys
echo "paste-your-unseal-key-here" > /tmp/unseal-key.txt
```

### 2. Encrypt with TPM2

```bash
# Encrypt the key using TPM2 binding
systemd-creds encrypt \
  --with-key=tpm2 \
  --name=unseal-key \
  /tmp/unseal-key.txt \
  /var/lib/openbao/unseal-key.cred

# Set proper ownership and permissions
chown openbao:openbao /var/lib/openbao/unseal-key.cred
chmod 600 /var/lib/openbao/unseal-key.cred
```

### 3. Cleanup

```bash
# Securely delete the plaintext key
shred -u /tmp/unseal-key.txt
```

### 4. Test Auto-Unseal

```bash
# Restart the service - it should auto-unseal
systemctl restart openbao

# Verify it's unsealed
bao status
# Should show: Sealed = false
```

## TPM PCR Binding

The default `--with-key=tpm2` binds the credential to PCR 7 (Secure Boot state). For stricter binding that also covers firmware and boot state:

```bash
systemd-creds encrypt \
  --with-key=tpm2 \
  --tpm2-pcrs=0+7+14 \
  --name=unseal-key \
  /tmp/unseal-key.txt \
  /var/lib/openbao/unseal-key.cred
```

PCR meanings:
- **PCR 0**: BIOS/UEFI firmware measurements
- **PCR 7**: Secure Boot state (UEFI variables)
- **PCR 14**: MOK (Machine Owner Key) state

**Trade-off**: Stricter PCR binding improves security but may require re-encrypting the credential after firmware updates or kernel changes.

## Re-provisioning

If you need to reprovision vault01 from scratch:

1. **Before destroying**: Back up your root token and all unseal keys (stored securely offline)
2. **After recreating the VM**:
   - Initialize OpenBao: `bao operator init`
   - Follow the setup steps above to encrypt a new unseal key with TPM2
3. **Restore data** (if migrating): Copy `/var/lib/openbao` from backup

## Handling System Changes

**After firmware updates, kernel updates, or boot configuration changes**, PCR values may change, causing TPM decryption to fail.

### Symptoms
- Service fails to start
- Logs show: `Failed to decrypt credentials`
- OpenBao remains sealed after reboot

### Fix
1. Unseal manually with one of your offline unseal keys:
   ```bash
   bao operator unseal
   ```

2. Re-encrypt the credential with updated PCR values:
   ```bash
   echo "your-unseal-key" > /tmp/unseal-key.txt
   systemd-creds encrypt \
     --with-key=tpm2 \
     --name=unseal-key \
     /tmp/unseal-key.txt \
     /var/lib/openbao/unseal-key.cred
   chown openbao:openbao /var/lib/openbao/unseal-key.cred
   chmod 600 /var/lib/openbao/unseal-key.cred
   shred -u /tmp/unseal-key.txt
   ```

3. Restart the service:
   ```bash
   systemctl restart openbao
   ```

## Security Considerations

### What This Protects Against
- **Data at rest**: Vault data is encrypted and cannot be accessed without unsealing
- **VM snapshot theft**: An attacker with a VM snapshot cannot decrypt the unseal key without the TPM state
- **TPM binding**: The key can only be decrypted by the same VM with matching PCR values

### What This Does NOT Protect Against
- **Compromised host**: If an attacker gains root access to vault01 while it is running, they can access unsealed data
- **Boot-time attacks**: If an attacker can modify the boot process to match PCR values, they may retrieve the key
- **VM console access**: An attacker with VM console access during boot could potentially access the unsealed vault

### Recommendations
- **Keep offline backups** of the root token and all unseal keys in a secure location (password manager, encrypted USB, etc.)
- **Use Shamir secret sharing**: With the default split (5 shares, threshold 3), even if the TPM-stored key is compromised, an attacker still needs additional shares
- **Monitor access**: Use OpenBao's audit logging to detect unauthorized access
- **Consider stricter PCR binding** (PCR 0+7+14) for production, accepting the maintenance overhead

## Troubleshooting

### Check if credential exists
```bash
ls -la /var/lib/openbao/unseal-key.cred
```

### Test credential decryption manually
```bash
# Should output your unseal key if TPM decryption works
systemd-creds decrypt /var/lib/openbao/unseal-key.cred -
```

### View service logs
```bash
journalctl -u openbao -n 50
```

### Manual unseal
```bash
bao operator unseal
# Enter one of your offline unseal keys when prompted
```

### Check TPM status
```bash
# Check if TPM2 is available
ls /dev/tpm*

# View TPM PCR values
tpm2_pcrread
```

## References

- [systemd.exec - Credentials](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Credentials)
- [systemd-creds man page](https://www.freedesktop.org/software/systemd/man/systemd-creds.html)
- [TPM2 PCR Documentation](https://uapi-group.org/specifications/specs/linux_tpm_pcr_registry/)
- [OpenBao Documentation](https://openbao.org/docs/)
18 flake.lock generated
@@ -65,11 +65,11 @@
     },
     "nixpkgs": {
       "locked": {
-        "lastModified": 1769598131,
-        "narHash": "sha256-e7VO/kGLgRMbWtpBqdWl0uFg8Y2XWFMdz0uUJvlML8o=",
+        "lastModified": 1769900590,
+        "narHash": "sha256-I7Lmgj3owOTBGuauy9FL6qdpeK2umDoe07lM4V+PnyA=",
         "owner": "nixos",
         "repo": "nixpkgs",
-        "rev": "fa83fd837f3098e3e678e6cf017b2b36102c7211",
+        "rev": "41e216c0ca66c83b12ab7a98cc326b5db01db646",
         "type": "github"
       },
       "original": {
@@ -81,11 +81,11 @@
     },
     "nixpkgs-unstable": {
       "locked": {
-        "lastModified": 1769461804,
-        "narHash": "sha256-msG8SU5WsBUfVVa/9RPLaymvi5bI8edTavbIq3vRlhI=",
+        "lastModified": 1770019141,
+        "narHash": "sha256-VKS4ZLNx4PNrABoB0L8KUpc1fE7CLpQXQs985tGfaCU=",
         "owner": "nixos",
         "repo": "nixpkgs",
-        "rev": "bfc1b8a4574108ceef22f02bafcf6611380c100d",
+        "rev": "cb369ef2efd432b3cdf8622b0ffc0a97a02f3137",
         "type": "github"
       },
       "original": {
@@ -112,11 +112,11 @@
       ]
     },
     "locked": {
-      "lastModified": 1769469829,
-      "narHash": "sha256-wFcr32ZqspCxk4+FvIxIL0AZktRs6DuF8oOsLt59YBU=",
+      "lastModified": 1769921679,
+      "narHash": "sha256-twBMKGQvaztZQxFxbZnkg7y/50BW9yjtCBWwdjtOZew=",
      "owner": "Mic92",
       "repo": "sops-nix",
-      "rev": "c5eebd4eb2e3372fe12a8d70a248a6ee9dd02eff",
+      "rev": "1e89149dcfc229e7e2ae24a8030f124a31e4f24f",
       "type": "github"
     },
     "original": {
61 flake.nix
@@ -334,38 +334,38 @@
          sops-nix.nixosModules.sops
        ];
      };
      testvm01 = nixpkgs.lib.nixosSystem {
        inherit system;
        specialArgs = {
          inherit inputs self sops-nix;
        };
        modules = [
          (
            { config, pkgs, ... }:
            {
              nixpkgs.overlays = commonOverlays;
            }
          )
          ./hosts/testvm01
          sops-nix.nixosModules.sops
        ];
      };
      vault01 = nixpkgs.lib.nixosSystem {
        inherit system;
        specialArgs = {
          inherit inputs self sops-nix;
        };
        modules = [
          (
            { config, pkgs, ... }:
            {
              nixpkgs.overlays = commonOverlays;
            }
          )
          ./hosts/vault01
          sops-nix.nixosModules.sops
        ];
      };
    };
    packages = forAllSystems (
      { pkgs }:
@@ -380,6 +380,7 @@
       packages = with pkgs; [
         ansible
         opentofu
+        openbao
         (pkgs.callPackage ./scripts/create-host { })
       ];
     };
@@ -1,7 +1,7 @@
 $ORIGIN home.2rjus.net.
 $TTL 1800
 @ IN SOA ns1.home.2rjus.net. admin.test.2rjus.net. (
-                2063 ; serial number
+                2064 ; serial number
                 3600 ; refresh
                 900 ; retry
                 1209600 ; expire
@@ -63,6 +63,7 @@ actions1 IN CNAME nix-cache01
 pgdb1 IN A 10.69.13.16
 nats1 IN A 10.69.13.17
 auth01 IN A 10.69.13.18
+vault01 IN A 10.69.13.19

 ; http-proxy cnames
 nzbget IN CNAME http-proxy
38 services/vault/README.md Normal file
@@ -0,0 +1,38 @@
# OpenBao Service Module

NixOS service module for OpenBao (open-source Vault fork) with TPM2-based auto-unsealing.

## Features

- TLS-enabled TCP listener on `0.0.0.0:8200`
- Unix socket listener at `/run/openbao/openbao.sock`
- File-based storage at `/var/lib/openbao`
- TPM2 auto-unseal on service start

## Configuration

The module expects:
- TLS certificate: `/var/lib/openbao/cert.pem`
- TLS private key: `/var/lib/openbao/key.pem`
- TPM2-encrypted unseal key: `/var/lib/openbao/unseal-key.cred`

Certificates are loaded via systemd `LoadCredential`, and the unseal key via `LoadCredentialEncrypted`.

## Setup

For initial setup and configuration instructions, see:
- **Auto-unseal setup**: `/docs/vault/auto-unseal.md`
- **Terraform configuration**: `/terraform/vault/README.md`

## Usage

```bash
# Check seal status
bao status

# Manually seal (for maintenance)
bao operator seal

# Service will auto-unseal on restart
systemctl restart openbao
```
@@ -1,8 +1,114 @@
{ pkgs, ... }:
let
  unsealScript = pkgs.writeShellApplication {
    name = "openbao-unseal";
    runtimeInputs = with pkgs; [
      openbao
      coreutils
      gnugrep
      getent
    ];
    text = ''
      # Set environment to use the Unix socket
      export BAO_ADDR='unix:///run/openbao/openbao.sock'
      SOCKET_PATH="/run/openbao/openbao.sock"
      CREDS_DIR="''${CREDENTIALS_DIRECTORY:-}"

      # Wait for the socket to exist
      echo "Waiting for OpenBao socket..."
      for _ in {1..30}; do
        if [ -S "$SOCKET_PATH" ]; then
          echo "Socket exists"
          break
        fi
        sleep 1
      done

      # Wait for OpenBao to accept connections
      echo "Waiting for OpenBao to be ready..."
      for _ in {1..30}; do
        output=$(timeout 2 bao status 2>&1 || true)

        if echo "$output" | grep -q "Sealed.*false"; then
          # Already unsealed
          echo "OpenBao is already unsealed"
          exit 0
        elif echo "$output" | grep -qE "(Sealed|Initialized)"; then
          # Got a valid response, OpenBao is ready (sealed)
          echo "OpenBao is ready"
          break
        fi

        sleep 1
      done

      # Check if already unsealed
      if output=$(timeout 2 bao status 2>&1 || true); then
        if echo "$output" | grep -q "Sealed.*false"; then
          echo "OpenBao is already unsealed"
          exit 0
        fi
      fi

      # Unseal using the TPM-decrypted keys (one per line)
      if [ -n "$CREDS_DIR" ] && [ -f "$CREDS_DIR/unseal-key" ]; then
        echo "Unsealing OpenBao..."
        while IFS= read -r key; do
          # Skip empty lines
          [ -z "$key" ] && continue

          echo "Applying unseal key..."
          bao operator unseal "$key"

          # Check if unsealed after each key
          if output=$(timeout 2 bao status 2>&1 || true); then
            if echo "$output" | grep -q "Sealed.*false"; then
              echo "OpenBao unsealed successfully"
              exit 0
            fi
          fi
        done < "$CREDS_DIR/unseal-key"

        echo "WARNING: Applied all keys but OpenBao is still sealed"
        exit 0
      else
        echo "WARNING: Unseal key credential not found, OpenBao remains sealed"
        exit 0
      fi
    '';
  };
in
{
  services.openbao = {
    enable = true;

    settings = {
      ui = true;

      storage.file.path = "/var/lib/openbao";
      listener.default = {
        type = "tcp";
        address = "0.0.0.0:8200";
        tls_cert_file = "/run/credentials/openbao.service/cert.pem";
        tls_key_file = "/run/credentials/openbao.service/key.pem";
      };
      listener.socket = {
        type = "unix";
        address = "/run/openbao/openbao.sock";
      };
    };
  };

  systemd.services.openbao.serviceConfig = {
    LoadCredential = [
      "key.pem:/var/lib/openbao/key.pem"
      "cert.pem:/var/lib/openbao/cert.pem"
    ];
    # TPM2-encrypted unseal key (created manually, see setup instructions)
    LoadCredentialEncrypted = [
      "unseal-key:/var/lib/openbao/unseal-key.cred"
    ];
    # Auto-unseal on service start
    ExecStartPost = "${unsealScript}/bin/openbao-unseal";
  };
}
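The readiness and seal checks in the unseal script hinge on grepping `bao status` output for `Sealed.*false`. The same predicate can be exercised standalone (the sample status snippets below are illustrative, not captured from a live server):

```shell
#!/usr/bin/env bash
# Predicate matching the unseal script's check: does a status dump report Sealed=false?
is_unsealed() {
  grep -q "Sealed.*false"
}

# Illustrative sealed/unsealed status snippets
sealed="Initialized    true
Sealed         true"
unsealed="Initialized    true
Sealed         false"

echo "$sealed" | is_unsealed && echo "sealed input matched" || echo "sealed input rejected"
echo "$unsealed" | is_unsealed && echo "unsealed input matched"
```

Because the predicate only looks for the `Sealed ... false` line, it tolerates the extra columns and ordering differences in real `bao status` output.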
@@ -8,7 +8,7 @@ resource "proxmox_cloud_init_disk" "ci" {

  name     = each.key
  pve_node = each.value.target_node
  storage  = "local" # Cloud-init disks must be on storage that supports ISO/snippets

  # User data includes SSH keys and optionally NIXOS_FLAKE_BRANCH
  user_data = <<-EOT
@@ -25,34 +25,34 @@ resource "proxmox_cloud_init_disk" "ci" {
    : ""}
  EOT

  # Network configuration - static IP or DHCP
  network_config = each.value.ip != null ? yamlencode({
    version = 1
    config = [{
      type = "physical"
      name = "ens18"
      subnets = [{
        type            = "static"
        address         = each.value.ip
        gateway         = each.value.gateway
        dns_nameservers = split(" ", each.value.nameservers)
        dns_search      = [each.value.search_domain]
      }]
    }]
  }) : yamlencode({
    version = 1
    config = [{
      type = "physical"
      name = "ens18"
      subnets = [{
        type = "dhcp"
      }]
    }]
  })

  # Instance metadata
  meta_data = yamlencode({
    instance_id    = sha1(each.key)
    local-hostname = each.key
  })
}
37 terraform/vault/.terraform.lock.hcl generated Normal file
@@ -0,0 +1,37 @@
# This file is maintained automatically by "tofu init".
# Manual edits may be lost in future updates.

provider "registry.opentofu.org/hashicorp/random" {
  version     = "3.8.1"
  constraints = "~> 3.6"
  hashes = [
    "h1:EHn3jsqOKhWjbg0X+psk0Ww96yz3N7ASqEKKuFvDFwo=",
    "zh:25c458c7c676f15705e872202dad7dcd0982e4a48e7ea1800afa5fc64e77f4c8",
    "zh:2edeaf6f1b20435b2f81855ad98a2e70956d473be9e52a5fdf57ccd0098ba476",
    "zh:44becb9d5f75d55e36dfed0c5beabaf4c92e0a2bc61a3814d698271c646d48e7",
    "zh:7699032612c3b16cc69928add8973de47b10ce81b1141f30644a0e8a895b5cd3",
    "zh:86d07aa98d17703de9fbf402c89590dc1e01dbe5671dd6bc5e487eb8fe87eee0",
    "zh:8c411c77b8390a49a8a1bc9f176529e6b32369dd33a723606c8533e5ca4d68c1",
    "zh:a5ecc8255a612652a56b28149994985e2c4dc046e5d34d416d47fa7767f5c28f",
    "zh:aea3fe1a5669b932eda9c5c72e5f327db8da707fe514aaca0d0ef60cb24892f9",
    "zh:f56e26e6977f755d7ae56fa6320af96ecf4bb09580d47cb481efbf27f1c5afff",
  ]
}

provider "registry.opentofu.org/hashicorp/vault" {
  version     = "4.8.0"
  constraints = "~> 4.0"
  hashes = [
    "h1:SQkjClJDo6SETUnq912GO8BdEExhU1ko8IG2mr4X/2A=",
    "zh:0c07ef884c03083b08a54c2cf782f3ff7e124b05e7a4438a0b90a86e60c8d080",
    "zh:13dcf2ed494c79e893b447249716d96b665616a868ffaf8f2c5abef07c7eee6f",
    "zh:6f15a29fae3a6178e5904e3c95ba22b20f362d8ee491da816048c89f30e6b2de",
    "zh:94b92a4bf7a2d250d9698a021f1ab60d1957d01b5bab81f7d9c00c2d6a9b3747",
    "zh:a9e207540ef12cd2402e37b3b7567e08de14061a0a2635fd2f4fd09e0a3382aa",
    "zh:b41667938ba541e8492036415b3f51fbd1758e456f6d5f0b63e26f4ad5728b21",
    "zh:df0b73aff5f4b51e08fc0c273db7f677994db29a81deda66d91acfcfe3f1a370",
    "zh:df904b217dc79b71a8b5f5f3ab2e52316d0f890810383721349cc10a72f7265b",
    "zh:f0e0b3e6782e0126c40f05cf87ec80978c7291d90f52d7741300b5de1d9c01ba",
    "zh:f8e599718b0ea22658eaa3e590671d3873aa723e7ce7d00daf3460ab41d3af14",
  ]
}
280 terraform/vault/README.md Normal file
@@ -0,0 +1,280 @@
# OpenBao Terraform Configuration

This directory contains Terraform/OpenTofu configuration for managing OpenBao (Vault) infrastructure as code.

## Overview

Manages the following OpenBao resources:
- **AppRole Authentication**: For host-based authentication
- **PKI Infrastructure**: Root CA + Intermediate CA for TLS certificates
- **KV Secrets Engine**: Key-value secret storage (v2)
- **Policies**: Access control policies

## Setup

1. **Copy the example tfvars file:**
   ```bash
   cp terraform.tfvars.example terraform.tfvars
   ```

2. **Edit `terraform.tfvars` with your OpenBao credentials:**
   ```hcl
   vault_address         = "https://vault.home.2rjus.net:8200"
   vault_token           = "hvs.your-root-token-here"
   vault_skip_tls_verify = true
   ```

3. **Initialize Terraform:**
   ```bash
   tofu init
   ```

4. **Review the plan:**
   ```bash
   tofu plan
   ```

5. **Apply the configuration:**
   ```bash
   tofu apply
   ```

## Files

- `main.tf` - Provider configuration
- `variables.tf` - Variable definitions
- `approle.tf` - AppRole authentication backend and roles
- `pki.tf` - PKI engines (root CA and intermediate CA)
- `secrets.tf` - KV secrets engine and test secrets
- `terraform.tfvars` - Credentials (gitignored)
- `terraform.tfvars.example` - Example configuration

## Resources Created

### AppRole Authentication
- AppRole backend at `approle/`
- Host-based roles and policies (defined in `locals.host_policies`)

### PKI Infrastructure
- Root CA at `pki/` (10 year TTL)
- Intermediate CA at `pki_int/` (5 year TTL)
- Role `homelab` for issuing certificates to `*.home.2rjus.net`
- Certificate max TTL: 30 days

### Secrets
- KV v2 engine at `secret/`
- Secrets and policies defined in `locals.secrets` and `locals.host_policies`

## Usage Examples

### Adding a New Host

1. **Define the host policy in `approle.tf`:**
   ```hcl
   locals {
     host_policies = {
       "monitoring01" = {
         paths = [
           "secret/data/hosts/monitoring01/*",
           "secret/data/services/prometheus/*",
         ]
       }
     }
   }
   ```

2. **Add secrets in `secrets.tf`:**
   ```hcl
   locals {
     secrets = {
       "hosts/monitoring01/grafana-admin" = {
         auto_generate   = true
         password_length = 32
       }
     }
   }
   ```

3. **Apply changes:**
   ```bash
   tofu apply
   ```

4. **Get AppRole credentials:**
   ```bash
   # Get role_id
   bao read auth/approle/role/monitoring01/role-id

   # Generate secret_id
   bao write -f auth/approle/role/monitoring01/secret-id
   ```

### Issue Certificates from PKI

**Method 1: ACME (Recommended for automated services)**

First, enable ACME support:
```bash
bao write pki_int/config/acme enabled=true
```

ACME directory endpoint:
```
https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory
```

Use with ACME clients (lego, certbot, cert-manager, etc.):
```bash
# Example with lego
lego --email admin@home.2rjus.net \
  --dns manual \
  --server https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory \
  --accept-tos \
  run -d test.home.2rjus.net
```

**Method 2: Static certificates via Terraform**

Define in `pki.tf`:
```hcl
locals {
  static_certificates = {
    "monitoring" = {
      common_name = "monitoring.home.2rjus.net"
      alt_names   = ["grafana.home.2rjus.net", "prometheus.home.2rjus.net"]
      ttl         = "720h"
    }
  }
}
```

Terraform will auto-issue and auto-renew these certificates.

**Method 3: Manual CLI issuance**

```bash
# Issue a certificate for a host
bao write pki_int/issue/homelab \
  common_name="test.home.2rjus.net" \
  ttl="720h"
```

### Read a secret

```bash
# Authenticate with AppRole first
bao write auth/approle/login \
  role_id="..." \
  secret_id="..."

# Read the test secret
bao kv get secret/test/example
```

## Managing Secrets

Secrets are defined in the `locals.secrets` block in `secrets.tf` using a declarative pattern:

### Auto-Generated Secrets (Recommended)

Most secrets can be auto-generated using the `random_password` provider:

```hcl
locals {
  secrets = {
    "hosts/monitoring01/grafana-admin" = {
      auto_generate   = true
      password_length = 32
    }
  }
}
```

### Manual Secrets

For secrets that must have specific values (external services, etc.):

```hcl
# In variables.tf
variable "smtp_password" {
  type      = string
  sensitive = true
}

# In secrets.tf locals block
locals {
  secrets = {
    "shared/smtp/credentials" = {
      auto_generate = false
      data = {
        username = "notifications@2rjus.net"
        password = var.smtp_password
        server   = "smtp.gmail.com"
      }
    }
  }
}

# In terraform.tfvars
smtp_password = "super-secret-password"
```

### Path Structure

Secrets follow a three-tier hierarchy:
- `hosts/{hostname}/*` - Host-specific secrets
- `services/{service}/*` - Service-wide secrets (any host running the service)
- `shared/{category}/*` - Shared secrets (SMTP, backup, etc.)
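A quick way to sanity-check that a proposed secret path fits the hierarchy (hypothetical helper, not part of the repo):

```shell
#!/usr/bin/env bash
# Classify a KV path by its top-level tier; anything outside the hierarchy is rejected.
classify_secret_path() {
  case "$1" in
    hosts/*/*)    echo "host-specific" ;;
    services/*/*) echo "service-wide" ;;
    shared/*/*)   echo "shared" ;;
    *)            echo "invalid"; return 1 ;;
  esac
}

classify_secret_path "hosts/monitoring01/grafana-admin"
classify_secret_path "shared/smtp/credentials"
classify_secret_path "oops/typo" || true
```

Running a check like this before adding entries to `locals.secrets` keeps the tree consistent with the policy paths in `approle.tf`.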
## Security Notes

- `terraform.tfvars` is gitignored to prevent credential leakage
- The root token should be stored securely (consider using a limited admin token instead)
- `skip_tls_verify = true` is acceptable for self-signed certs in a homelab
- AppRole secret_ids can be scoped to specific CIDR ranges for additional security

## Initial Setup Steps

After deploying this configuration, perform these one-time setup tasks:

### 1. Enable ACME
```bash
export BAO_ADDR='https://vault.home.2rjus.net:8200'
export BAO_TOKEN='your-root-token'
export BAO_SKIP_VERIFY=1

# Configure the cluster path (required for ACME)
bao write pki_int/config/cluster path=https://vault.home.2rjus.net:8200/v1/pki_int

# Enable ACME on the intermediate CA
bao write pki_int/config/acme enabled=true

# Verify ACME is enabled
curl -k https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory
```

### 2. Download Root CA Certificate

For trusting the internal CA on clients:
```bash
# Download the root CA certificate
bao read -field=certificate pki/cert/ca > homelab-root-ca.crt
```

Then install it on NixOS hosts (add to `system/default.nix` or similar):
```nix
security.pki.certificateFiles = [ ./homelab-root-ca.crt ];
```

### 3. Test Certificate Issuance

```bash
# Manual test
bao write pki_int/issue/homelab common_name="test.home.2rjus.net" ttl="24h"
```

## Next Steps

1. Replace the step-ca ACME endpoint with OpenBao in `system/acme.nix`
2. Add more AppRoles for different host types
3. Migrate existing sops-nix secrets to OpenBao KV
4. Set up an SSH CA for host and user certificates
5. Configure auto-unseal for vault01
74 terraform/vault/approle.tf Normal file
@@ -0,0 +1,74 @@
# Enable the AppRole auth backend
resource "vault_auth_backend" "approle" {
  type = "approle"
  path = "approle"
}

# Define host access policies
locals {
  host_policies = {
    # Example: monitoring01 host
    # "monitoring01" = {
    #   paths = [
    #     "secret/data/hosts/monitoring01/*",
    #     "secret/data/services/prometheus/*",
    #     "secret/data/services/grafana/*",
    #     "secret/data/shared/smtp/*"
    #   ]
    # }

    # Example: ha1 host
    # "ha1" = {
    #   paths = [
    #     "secret/data/hosts/ha1/*",
    #     "secret/data/shared/mqtt/*"
    #   ]
    # }

    # TODO: actually use this policy
    "ha1" = {
      paths = [
        "secret/data/hosts/ha1/*",
      ]
    }

    # TODO: actually use this policy
    "monitoring01" = {
      paths = [
        "secret/data/hosts/monitoring01/*",
      ]
    }
  }
}

# Generate a policy for each host
resource "vault_policy" "host_policies" {
  for_each = local.host_policies

  name = "${each.key}-policy"

  policy = <<EOT
%{~for path in each.value.paths~}
path "${path}" {
  capabilities = ["read", "list"]
}
%{~endfor~}
EOT
}

# Generate an AppRole for each host
resource "vault_approle_auth_backend_role" "hosts" {
  for_each = local.host_policies

  backend        = vault_auth_backend.approle.path
  role_name      = each.key
  token_policies = ["${each.key}-policy"]

  # Token configuration
  token_ttl     = 3600  # 1 hour
  token_max_ttl = 86400 # 24 hours

  # Security settings
  bind_secret_id = true
  secret_id_ttl  = 0 # Never expires (rotated manually)
}
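For the `ha1` entry above, the `%{~for~}` template renders to roughly the following policy document (exact whitespace depends on the `~` strip markers):

```hcl
path "secret/data/hosts/ha1/*" {
  capabilities = ["read", "list"]
}
```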
19 terraform/vault/main.tf Normal file
@@ -0,0 +1,19 @@
terraform {
  required_version = ">= 1.0"
  required_providers {
    vault = {
      source  = "hashicorp/vault"
      version = "~> 4.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.6"
    }
  }
}

provider "vault" {
  address         = var.vault_address
  token           = var.vault_token
  skip_tls_verify = var.vault_skip_tls_verify
}
terraform/vault/pki.tf (new file, 190 lines)
@@ -0,0 +1,190 @@
# ============================================================================
# PKI Infrastructure Configuration
# ============================================================================
#
# This file configures a two-tier PKI hierarchy:
#   - Root CA (pki/) - 10 year validity, EC P-384, kept offline (internal to Vault)
#   - Intermediate CA (pki_int/) - 5 year validity, EC P-384, used for issuing certificates
#   - Leaf certificates - default to EC P-256 for optimal performance
#
# Key type choices:
#   - Root/Intermediate: EC P-384 (secp384r1) for long-term security
#   - Leaf certificates: EC P-256 (secp256r1) for performance and compatibility
#   - EC provides smaller keys, faster operations, and lower CPU usage vs RSA
#
# Certificate issuance methods:
#
# 1. ACME (Automated Certificate Management Environment)
#    - Services fetch certificates automatically using the ACME protocol
#    - ACME directory: https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory
#    - Enable ACME: bao write pki_int/config/acme enabled=true
#    - Compatible with cert-manager, lego, certbot, etc.
#
# 2. Direct issuance (non-ACME)
#    - Certificates defined in locals.static_certificates
#    - Terraform manages the lifecycle (issuance, renewal)
#    - Useful for services without ACME support
#    - Certificates auto-renew 7 days before expiry
#
# 3. Manual issuance (CLI)
#    - bao write pki_int/issue/homelab common_name="service.home.2rjus.net"
#    - Useful for one-off certificates or testing
#
# ============================================================================

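In practice, the ACME flow from method 1 looks roughly like this (the lego invocation and email address are illustrative assumptions; depending on the OpenBao/Vault version you may also need to set `pki_int/config/cluster` before ACME is usable):

```shell
# One-time: enable ACME on the intermediate mount
bao write pki_int/config/acme enabled=true

# On a client: fetch a certificate via the ACME directory with lego
lego --server https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory \
     --email admin@home.2rjus.net \
     --domains myservice.home.2rjus.net \
     --http run
```

lego then handles renewal the same way against the directory URL; certbot or cert-manager can point at the same endpoint.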
# Root CA
resource "vault_mount" "pki_root" {
  path                      = "pki"
  type                      = "pki"
  description               = "Root CA"
  default_lease_ttl_seconds = 315360000 # 10 years
  max_lease_ttl_seconds     = 315360000 # 10 years
}

resource "vault_pki_secret_backend_root_cert" "root" {
  backend              = vault_mount.pki_root.path
  type                 = "internal"
  common_name          = "home.2rjus.net Root CA"
  ttl                  = "315360000" # 10 years
  format               = "pem"
  private_key_format   = "der"
  key_type             = "ec"
  key_bits             = 384 # P-384 curve (NIST P-384, secp384r1)
  exclude_cn_from_sans = true
  organization         = "Homelab"
  country              = "NO"
}

# Intermediate CA
resource "vault_mount" "pki_int" {
  path                      = "pki_int"
  type                      = "pki"
  description               = "Intermediate CA"
  default_lease_ttl_seconds = 157680000 # 5 years
  max_lease_ttl_seconds     = 157680000 # 5 years
}

resource "vault_pki_secret_backend_intermediate_cert_request" "intermediate" {
  backend      = vault_mount.pki_int.path
  type         = "internal"
  common_name  = "home.2rjus.net Intermediate CA"
  key_type     = "ec"
  key_bits     = 384 # P-384 curve (NIST P-384, secp384r1)
  organization = "Homelab"
  country      = "NO"
}

resource "vault_pki_secret_backend_root_sign_intermediate" "intermediate" {
  backend              = vault_mount.pki_root.path
  csr                  = vault_pki_secret_backend_intermediate_cert_request.intermediate.csr
  common_name          = "Homelab Intermediate CA"
  ttl                  = "157680000" # 5 years
  exclude_cn_from_sans = true
  organization         = "Homelab"
  country              = "NO"
}

resource "vault_pki_secret_backend_intermediate_set_signed" "intermediate" {
  backend     = vault_mount.pki_int.path
  certificate = vault_pki_secret_backend_root_sign_intermediate.intermediate.certificate
}

# PKI role for issuing certificates via ACME and direct issuance
resource "vault_pki_secret_backend_role" "homelab" {
  backend          = vault_mount.pki_int.path
  name             = "homelab"
  allowed_domains  = ["home.2rjus.net"]
  allow_subdomains = true
  max_ttl          = 2592000 # 30 days
  ttl              = 2592000 # 30 days default

  # Key configuration - EC (Elliptic Curve) by default
  key_type = "ec"
  key_bits = 256 # P-256 curve (NIST P-256, secp256r1)

  # ACME-friendly settings
  allow_ip_sans      = true  # Allow IP addresses in SANs
  allow_localhost    = false # Disallow localhost
  allow_bare_domains = false # Require a subdomain / FQDN
  allow_glob_domains = false # Don't allow glob patterns in domain names

  # Server authentication only
  server_flag           = true
  client_flag           = false
  code_signing_flag     = false
  email_protection_flag = false

  # Key usage (appropriate for EC certificates)
  key_usage = [
    "DigitalSignature",
    "KeyAgreement",
  ]
  ext_key_usage = ["ServerAuth"]

  # Certificate properties
  require_cn = false # ACME doesn't always use a CN
}

# Configure CRL and issuing URLs
resource "vault_pki_secret_backend_config_urls" "config_urls" {
  backend = vault_mount.pki_int.path
  issuing_certificates = [
    "${var.vault_address}/v1/pki_int/ca"
  ]
  crl_distribution_points = [
    "${var.vault_address}/v1/pki_int/crl"
  ]
  ocsp_servers = [
    "${var.vault_address}/v1/pki_int/ocsp"
  ]
}

# ============================================================================
# Direct Certificate Issuance (Non-ACME)
# ============================================================================

# Static certificates issued directly (not via ACME).
# Useful for services that don't support ACME or need long-lived certificates.
locals {
  static_certificates = {
    # Example: issue a certificate for a specific service
    # "vault" = {
    #   common_name = "vault.home.2rjus.net"
    #   alt_names   = ["vault01.home.2rjus.net"]
    #   ip_sans     = ["10.69.13.19"]
    #   ttl         = "8760h" # 1 year
    # }
  }
}

# Issue static certificates
resource "vault_pki_secret_backend_cert" "static_certs" {
  for_each = local.static_certificates

  backend     = vault_mount.pki_int.path
  name        = vault_pki_secret_backend_role.homelab.name
  common_name = each.value.common_name

  alt_names = lookup(each.value, "alt_names", [])
  ip_sans   = lookup(each.value, "ip_sans", [])
  ttl       = lookup(each.value, "ttl", "720h") # 30 days default

  auto_renew            = true
  min_seconds_remaining = 604800 # Renew 7 days before expiry
}

# Output static certificate data for use in configurations
output "static_certificates" {
  description = "Static certificates issued by Vault PKI"
  value = {
    for k, v in vault_pki_secret_backend_cert.static_certs : k => {
      common_name = v.common_name
      serial      = v.serial_number
      expiration  = v.expiration
      issuing_ca  = v.issuing_ca
      certificate = v.certificate
      private_key = v.private_key
    }
  }
  sensitive = true
}
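Since the output is marked sensitive, the PEM material has to be pulled out explicitly with `terraform output -json`. A sketch using the commented-out "vault" example key above (key name and file paths are assumptions):

```shell
# Extract PEM material for the "vault" entry of static_certificates
terraform output -json static_certificates | jq -r '.vault.certificate' > vault.crt
terraform output -json static_certificates | jq -r '.vault.private_key' > vault.key
chmod 600 vault.key
```

Note the private key lands in Terraform state either way, which is the usual trade-off of direct issuance versus ACME.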
terraform/vault/secrets.tf (new file, 76 lines)
@@ -0,0 +1,76 @@
# Enable KV v2 secrets engine
resource "vault_mount" "kv" {
  path        = "secret"
  type        = "kv"
  options     = { version = "2" }
  description = "KV Version 2 secret store"
}

# Define all secrets with auto-generation support
locals {
  secrets = {
    # Example host-specific secrets
    # "hosts/monitoring01/grafana-admin" = {
    #   auto_generate   = true
    #   password_length = 32
    # }
    # "hosts/ha1/mqtt-password" = {
    #   auto_generate   = true
    #   password_length = 24
    # }

    # Example service secrets
    # "services/prometheus/remote-write" = {
    #   auto_generate   = true
    #   password_length = 40
    # }

    # Example shared secrets with manual values
    # "shared/smtp/credentials" = {
    #   auto_generate = false
    #   data = {
    #     username = "notifications@2rjus.net"
    #     password = var.smtp_password # Define in variables.tf and set in terraform.tfvars
    #     server   = "smtp.gmail.com"
    #   }
    # }

    # TODO: actually use the secret
    "hosts/monitoring01/grafana-admin" = {
      auto_generate   = true
      password_length = 32
    }

    # TODO: actually use the secret
    "hosts/ha1/mqtt-password" = {
      auto_generate   = true
      password_length = 24
    }
  }
}

# Auto-generate passwords for secrets with auto_generate = true
resource "random_password" "auto_secrets" {
  for_each = {
    for k, v in local.secrets : k => v
    if lookup(v, "auto_generate", false)
  }

  length  = each.value.password_length
  special = true
}

# Create all secrets in Vault
resource "vault_kv_secret_v2" "secrets" {
  for_each = local.secrets

  mount = vault_mount.kv.path
  name  = each.key

  data_json = jsonencode(
    lookup(each.value, "auto_generate", false)
    ? { password = random_password.auto_secrets[each.key].result }
    : each.value.data
  )
}
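The `data_json` conditional above produces one of two payload shapes. A hypothetical Python mirror of that logic (illustration only, not repo code) makes the two cases explicit:

```python
import json
import secrets
import string

# Mirror the Terraform data_json logic: auto-generated secrets get a
# {"password": ...} payload; manual ones pass their data through unchanged.
def build_payload(spec):
    if spec.get("auto_generate", False):
        # random_password with special=true draws from letters, digits, punctuation
        alphabet = string.ascii_letters + string.digits + string.punctuation
        pw = "".join(secrets.choice(alphabet) for _ in range(spec["password_length"]))
        return json.dumps({"password": pw})
    return json.dumps(spec["data"])

auto_payload = build_payload({"auto_generate": True, "password_length": 24})
manual_payload = build_payload({"auto_generate": False, "data": {"username": "svc"}})
```

So `hosts/ha1/mqtt-password` ends up as a single-key `password` secret, while a manual entry like the SMTP example stores its `data` map verbatim.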
terraform/vault/terraform.tfvars.example (new file, 6 lines)
@@ -0,0 +1,6 @@
# Copy this file to terraform.tfvars and fill in your values
# terraform.tfvars is gitignored to keep credentials safe

vault_address         = "https://vault.home.2rjus.net:8200"
vault_token           = "hvs.XXXXXXXXXXXXXXXXXXXX"
vault_skip_tls_verify = true
terraform/vault/variables.tf (new file, 26 lines)
@@ -0,0 +1,26 @@
variable "vault_address" {
  description = "OpenBao server address"
  type        = string
  default     = "https://vault.home.2rjus.net:8200"
}

variable "vault_token" {
  description = "OpenBao root or admin token"
  type        = string
  sensitive   = true
}

variable "vault_skip_tls_verify" {
  description = "Skip TLS verification (for self-signed certs)"
  type        = bool
  default     = true
}

# Example variables for manual secrets
# Uncomment and add to terraform.tfvars as needed

# variable "smtp_password" {
#   description = "SMTP password for notifications"
#   type        = string
#   sensitive   = true
# }
@@ -39,10 +39,11 @@ locals {
     flake_branch = "pipeline-testing-improvements"
   }
   "vault01" = {
-    ip        = "10.69.13.19/24"
-    cpu_cores = 2
-    memory    = 2048
-    disk_size = "20G"
+    ip           = "10.69.13.19/24"
+    cpu_cores    = 2
+    memory       = 2048
+    disk_size    = "20G"
+    flake_branch = "vault-setup" # Bootstrap from this branch instead of master
   }
 }

@@ -118,6 +119,11 @@ resource "proxmox_vm_qemu" "vm" {
     }
   }

+  # TPM device
+  tpm_state {
+    storage = each.value.storage
+  }
+
   # Start on boot
   start_at_node_boot = true