Compare commits
4 Commits
d194c147d6...host-vault
| Author | SHA1 | Date | |
|---|---|---|---|
| | 4afb37d730 | | |
| | a2c798bc30 | | |
| | 6d64e53586 | | |
| | e0ad445341 | | |
224
TODO.md
@@ -153,9 +153,9 @@ create-host \

---

### Phase 4: Secrets Management Automation
### Phase 4: Secrets Management with HashiCorp Vault

**Challenge:** sops needs age key, but age key is generated on first boot
**Challenge:** Current sops-nix approach has chicken-and-egg problem with age keys

**Current workflow:**
1. VM boots, generates age key at `/var/lib/sops-nix/key.txt`

@@ -164,27 +164,213 @@ create-host \
4. User commits, pushes
5. VM can now decrypt secrets

**Proposed solution:**
**Selected approach:** Migrate to HashiCorp Vault for centralized secrets management

**Option A: Pre-generate age keys**
- [ ] Generate age key pair during `create-host-config.sh`
- [ ] Add public key to `.sops.yaml` immediately
- [ ] Store private key temporarily (secure location)
- [ ] Inject private key via cloud-init write_files or Terraform file provisioner
- [ ] VM uses pre-configured key from first boot
**Benefits:**
- Industry-standard secrets management (Vault experience transferable to work)
- Eliminates manual age key distribution step
- Secrets-as-code via OpenTofu (infrastructure-as-code aligned)
- Centralized PKI management (replaces step-ca, consolidates TLS + SSH CA)
- Automatic secret rotation capabilities
- Audit logging for all secret access
- AppRole authentication enables automated bootstrap

**Option B: Post-deployment secret injection**
- [ ] VM boots with template, generates its own key
- [ ] Fetch public key via SSH after first boot
- [ ] Automatically add to `.sops.yaml` and commit
- [ ] Trigger rebuild on VM to pick up secrets access
**Architecture:**
```
vault.home.2rjus.net
├─ KV Secrets Engine (replaces sops-nix)
├─ PKI Engine (replaces step-ca for TLS)
├─ SSH CA Engine (replaces step-ca SSH CA)
└─ AppRole Auth (per-host authentication)
↓
New hosts authenticate on first boot
Fetch secrets via Vault API
No manual key distribution needed
```

**Option C: Separate secrets from initial deployment**
- [ ] Initial deployment works without secrets
- [ ] After VM is running, user manually adds age key
- [ ] Subsequent auto-upgrades pick up secrets
---

**Decision needed:** Option A is most automated, but requires secure key handling
#### Phase 4a: Vault Server Setup

**Goal:** Deploy and configure Vault server with auto-unseal

**Tasks:**
- [ ] Create `hosts/vault01/` configuration
- [ ] Basic NixOS configuration (hostname, networking, etc.)
- [ ] Vault service configuration
- [ ] Firewall rules (8200 for API, 8201 for cluster)
- [ ] Add to flake.nix and terraform
- [ ] Implement auto-unseal mechanism
- [ ] **Preferred:** TPM-based auto-unseal if hardware supports it
- [ ] Use tpm2-tools to seal/unseal Vault keys
- [ ] Systemd service to unseal on boot
- [ ] **Fallback:** Shamir secret sharing with systemd automation
- [ ] Generate 3 keys, threshold 2
- [ ] Store 2 keys on disk (encrypted), keep 1 offline
- [ ] Systemd service auto-unseals using 2 keys
- [ ] Initial Vault setup
- [ ] Initialize Vault
- [ ] Configure storage backend (integrated raft or file)
- [ ] Set up root token management
- [ ] Enable audit logging
- [ ] Deploy to infrastructure
- [ ] Add DNS entry for vault.home.2rjus.net
- [ ] Deploy VM via terraform
- [ ] Bootstrap and verify Vault is running

**Deliverable:** Running Vault server that auto-unseals on boot

---

#### Phase 4b: Vault-as-Code with OpenTofu

**Goal:** Manage all Vault configuration (secrets structure, policies, roles) as code

**Tasks:**
- [ ] Set up Vault Terraform provider
- [ ] Create `terraform/vault/` directory
- [ ] Configure Vault provider (address, auth)
- [ ] Store Vault token securely (terraform.tfvars, gitignored)
- [ ] Enable and configure secrets engines
- [ ] Enable KV v2 secrets engine at `secret/`
- [ ] Define secret path structure (per-service, per-host)
- [ ] Example: `secret/monitoring/grafana`, `secret/postgres/ha1`
- [ ] Define policies as code
- [ ] Create policies for different service tiers
- [ ] Principle of least privilege (hosts only read their secrets)
- [ ] Example: monitoring-policy allows read on `secret/monitoring/*`
- [ ] Set up AppRole authentication
- [ ] Enable AppRole auth backend
- [ ] Create role per host type (monitoring, dns, database, etc.)
- [ ] Bind policies to roles
- [ ] Configure TTL and token policies
- [ ] Migrate existing secrets from sops-nix
- [ ] Create migration script/playbook
- [ ] Decrypt sops secrets and load into Vault KV
- [ ] Verify all secrets migrated successfully
- [ ] Keep sops as backup during transition
- [ ] Implement secrets-as-code patterns
- [ ] Secret values in gitignored terraform.tfvars
- [ ] Or use random_password for auto-generated secrets
- [ ] Secret structure/paths in version-controlled .tf files

**Example OpenTofu:**
```hcl
resource "vault_kv_secret_v2" "monitoring_grafana" {
  mount = "secret"
  name = "monitoring/grafana"
  data_json = jsonencode({
    admin_password = var.grafana_admin_password
    smtp_password = var.smtp_password
  })
}

resource "vault_policy" "monitoring" {
  name = "monitoring-policy"
  policy = <<EOT
path "secret/data/monitoring/*" {
capabilities = ["read"]
}
EOT
}

resource "vault_approle_auth_backend_role" "monitoring01" {
  backend = "approle"
  role_name = "monitoring01"
  token_policies = ["monitoring-policy"]
}
```

**Deliverable:** All secrets and policies managed as OpenTofu code in `terraform/vault/`
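
The example resources assume that the `secret` mount and the `approle` auth backend already exist. A minimal sketch of the provider setup and those two mounts follows, assuming the `hashicorp/vault` and `hashicorp/random` providers. The variable names and the `generated/example` secret path are placeholders for illustration and are not taken from this repository.

```hcl
terraform {
  required_providers {
    vault  = { source = "hashicorp/vault" }
    random = { source = "hashicorp/random" }
  }
}

# Assumed variable names; the token value would live in the gitignored terraform.tfvars.
variable "vault_address" {
  type    = string
  default = "https://vault.home.2rjus.net:8200"
}

variable "vault_token" {
  type      = string
  sensitive = true
}

provider "vault" {
  address = var.vault_address
  token   = var.vault_token
}

# KV v2 secrets engine at secret/
resource "vault_mount" "secret" {
  path    = "secret"
  type    = "kv"
  options = { version = "2" }
}

# AppRole auth backend, mounted at the default path "approle"
resource "vault_auth_backend" "approle" {
  type = "approle"
}

# Auto-generated secret: the value is never written into .tf or .tfvars files.
resource "random_password" "example" {
  length  = 32
  special = false
}

resource "vault_kv_secret_v2" "example" {
  mount = vault_mount.secret.path
  name  = "generated/example" # hypothetical path
  data_json = jsonencode({
    password = random_password.example.result
  })
}
```

Generated values, and anything passed through `data_json`, still end up in the state file, so the state needs the same protection as the tfvars file.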

---

#### Phase 4c: PKI Migration (Replace step-ca)

**Goal:** Consolidate PKI infrastructure into Vault

**Tasks:**
- [ ] Set up Vault PKI engines
- [ ] Create root CA in Vault (`pki/` mount, 10 year TTL)
- [ ] Create intermediate CA (`pki_int/` mount, 5 year TTL)
- [ ] Sign intermediate with root CA
- [ ] Configure CRL and OCSP
- [ ] Enable ACME support
- [ ] Enable ACME on intermediate CA (Vault 1.14+)
- [ ] Create PKI role for homelab domain
- [ ] Set certificate TTLs and allowed domains
- [ ] Configure SSH CA in Vault
- [ ] Enable SSH secrets engine (`ssh/` mount)
- [ ] Generate SSH signing keys
- [ ] Create roles for host and user certificates
- [ ] Configure TTLs and allowed principals
- [ ] Migrate hosts from step-ca to Vault
- [ ] Update system/acme.nix to use Vault ACME endpoint
- [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
- [ ] Test certificate issuance on one host
- [ ] Roll out to all hosts via auto-upgrade
- [ ] Migrate SSH CA trust
- [ ] Distribute Vault SSH CA public key to all hosts
- [ ] Update sshd_config to trust Vault CA
- [ ] Test SSH certificate authentication
- [ ] Decommission step-ca
- [ ] Verify all services migrated
- [ ] Stop step-ca service on ca host
- [ ] Archive step-ca configuration for backup
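
The PKI tasks in this list could be expressed in the same OpenTofu style. This is a sketch only: the common names, the role name `homelab`, and the 30-day maximum leaf TTL are assumptions, while the mount paths and CA lifetimes follow the task list. ACME, CRL, and OCSP configuration are left out.

```hcl
# Root CA on pki/, 10 year maximum TTL (315360000 s)
resource "vault_mount" "pki" {
  path                  = "pki"
  type                  = "pki"
  max_lease_ttl_seconds = 315360000
}

resource "vault_pki_secret_backend_root_cert" "root" {
  backend     = vault_mount.pki.path
  type        = "internal"
  common_name = "home.2rjus.net Root CA" # assumed name
  ttl         = "315360000"
}

# Intermediate CA on pki_int/, 5 year maximum TTL (157680000 s)
resource "vault_mount" "pki_int" {
  path                  = "pki_int"
  type                  = "pki"
  max_lease_ttl_seconds = 157680000
}

resource "vault_pki_secret_backend_intermediate_cert_request" "int" {
  backend     = vault_mount.pki_int.path
  type        = "internal"
  common_name = "home.2rjus.net Intermediate CA" # assumed name
}

# Sign the intermediate CSR with the root CA
resource "vault_pki_secret_backend_root_sign_intermediate" "int" {
  backend     = vault_mount.pki.path
  csr         = vault_pki_secret_backend_intermediate_cert_request.int.csr
  common_name = "home.2rjus.net Intermediate CA"
  ttl         = "157680000"
  depends_on  = [vault_pki_secret_backend_root_cert.root]
}

# Install the signed certificate on the intermediate mount
resource "vault_pki_secret_backend_intermediate_set_signed" "int" {
  backend     = vault_mount.pki_int.path
  certificate = vault_pki_secret_backend_root_sign_intermediate.int.certificate
}

# Role for issuing leaf certificates under the homelab domain
resource "vault_pki_secret_backend_role" "homelab" {
  backend          = vault_mount.pki_int.path
  name             = "homelab"
  allowed_domains  = ["home.2rjus.net"]
  allow_subdomains = true
  max_ttl          = "2592000" # 30 days, assumed
}
```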

**Deliverable:** All TLS and SSH certificates issued by Vault, step-ca retired
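
For the SSH CA part of this phase, a comparable sketch is shown here. The role name `host` and the 30-day TTL are assumptions; only a host-certificate role is shown, and a user-certificate role would be defined the same way with `allow_user_certificates`.

```hcl
# SSH secrets engine on ssh/
resource "vault_mount" "ssh" {
  path = "ssh"
  type = "ssh"
}

# Let Vault generate the CA signing key pair
resource "vault_ssh_secret_backend_ca" "ca" {
  backend              = vault_mount.ssh.path
  generate_signing_key = true
}

# Role for signing host certificates
resource "vault_ssh_secret_backend_role" "host" {
  backend                 = vault_mount.ssh.path
  name                    = "host" # assumed role name
  key_type                = "ca"
  allow_host_certificates = true
  allowed_domains         = "home.2rjus.net"
  allow_subdomains        = true
  ttl                     = "2592000" # 30 days, assumed
}

# CA public key to distribute to hosts and clients
output "ssh_ca_public_key" {
  value = vault_ssh_secret_backend_ca.ca.public_key
}
```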

---

#### Phase 4d: Bootstrap Integration

**Goal:** New hosts automatically authenticate to Vault on first boot, no manual steps

**Tasks:**
- [ ] Update create-host tool
- [ ] Generate AppRole role_id + secret_id for new host
- [ ] Or create wrapped token for one-time bootstrap
- [ ] Add host-specific policy to Vault (via terraform)
- [ ] Store bootstrap credentials for cloud-init injection
- [ ] Update template2 for Vault authentication
- [ ] Create Vault authentication module
- [ ] Reads bootstrap credentials from cloud-init
- [ ] Authenticates to Vault, retrieves permanent AppRole credentials
- [ ] Stores role_id + secret_id locally for services to use
- [ ] Create NixOS Vault secrets module
- [ ] Replacement for sops.secrets
- [ ] Fetches secrets from Vault at nixos-rebuild/activation time
- [ ] Or runtime secret fetching for services
- [ ] Handle Vault token renewal
- [ ] Update bootstrap service
- [ ] After authenticating to Vault, fetch any bootstrap secrets
- [ ] Run nixos-rebuild with host configuration
- [ ] Services automatically fetch their secrets from Vault
- [ ] Update terraform cloud-init
- [ ] Inject Vault address and bootstrap credentials
- [ ] Pass via cloud-init user-data or write_files
- [ ] Credentials scoped to single use or short TTL
- [ ] Test complete flow
- [ ] Run create-host to generate new host config
- [ ] Deploy with terraform
- [ ] Verify host bootstraps and authenticates to Vault
- [ ] Verify services can fetch secrets
- [ ] Confirm no manual steps required

**Bootstrap flow:**
```
1. terraform apply (deploys VM with cloud-init)
2. Cloud-init sets hostname + Vault bootstrap credentials
3. nixos-bootstrap.service runs:
   - Authenticates to Vault with bootstrap credentials
   - Retrieves permanent AppRole credentials
   - Stores locally for service use
   - Runs nixos-rebuild
4. Host services fetch secrets from Vault as needed
5. Done - no manual intervention
```

**Deliverable:** Fully automated secrets access from first boot, zero manual steps
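
One way the bootstrap credentials for step 2 of the flow could be produced from OpenTofu is a response-wrapped `secret_id`. The sketch below assumes a per-host role named `newhost01`, a secret path `secret/data/hosts/newhost01/*`, and a ten-minute wrapping TTL, all of which are hypothetical.

```hcl
# Host-specific policy: the host may only read its own secrets (hypothetical path).
resource "vault_policy" "newhost01" {
  name   = "newhost01-policy"
  policy = <<EOT
path "secret/data/hosts/newhost01/*" {
  capabilities = ["read"]
}
EOT
}

resource "vault_approle_auth_backend_role" "newhost01" {
  backend        = "approle"
  role_name      = "newhost01"
  token_policies = [vault_policy.newhost01.name]
  token_ttl      = 3600  # assumed: 1 hour
  token_max_ttl  = 86400 # assumed: 24 hours
}

# With wrapping_ttl set, Vault returns a single-use wrapping token
# instead of the secret_id itself.
resource "vault_approle_auth_backend_role_secret_id" "newhost01" {
  backend      = "approle"
  role_name    = vault_approle_auth_backend_role.newhost01.role_name
  wrapping_ttl = "10m"
}

output "newhost01_role_id" {
  value = vault_approle_auth_backend_role.newhost01.role_id
}

output "newhost01_wrapping_token" {
  value     = vault_approle_auth_backend_role_secret_id.newhost01.wrapping_token
  sensitive = true
}
```

The host would unwrap the token once through `sys/wrapping/unwrap` to obtain its `secret_id`. A failed unwrap means the token has expired or was already used, which gives a simple signal that the bootstrap credential may have been intercepted. The wrapping token is also recorded in the state file, so the short TTL matters.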

---

16
flake.nix
@@ -350,6 +350,22 @@
      sops-nix.nixosModules.sops
    ];
  };
  vault01 = nixpkgs.lib.nixosSystem {
    inherit system;
    specialArgs = {
      inherit inputs self sops-nix;
    };
    modules = [
      (
        { config, pkgs, ... }:
        {
          nixpkgs.overlays = commonOverlays;
        }
      )
      ./hosts/vault01
      sops-nix.nixosModules.sops
    ];
  };
};
packages = forAllSystems (
  { pkgs }:

63
hosts/vault01/configuration.nix
Normal file
@@ -0,0 +1,63 @@
{
  config,
  lib,
  pkgs,
  ...
}:

{
  imports = [
    ../template2/hardware-configuration.nix

    ../../system
    ../../common/vm
    ../../services/vault
  ];

  nixpkgs.config.allowUnfree = true;
  boot.loader.grub.enable = true;
  boot.loader.grub.device = "/dev/vda";

  networking.hostName = "vault01";
  networking.domain = "home.2rjus.net";
  networking.useNetworkd = true;
  networking.useDHCP = false;
  services.resolved.enable = true;
  networking.nameservers = [
    "10.69.13.5"
    "10.69.13.6"
  ];

  systemd.network.enable = true;
  systemd.network.networks."ens18" = {
    matchConfig.Name = "ens18";
    address = [
      "10.69.13.19/24"
    ];
    routes = [
      { Gateway = "10.69.13.1"; }
    ];
    linkConfig.RequiredForOnline = "routable";
  };
  time.timeZone = "Europe/Oslo";

  nix.settings.experimental-features = [
    "nix-command"
    "flakes"
  ];
  nix.settings.tarball-ttl = 0;
  environment.systemPackages = with pkgs; [
    vim
    wget
    git
  ];

  # Open ports in the firewall.
  # networking.firewall.allowedTCPPorts = [ ... ];
  # networking.firewall.allowedUDPPorts = [ ... ];
  # Or disable the firewall altogether.
  networking.firewall.enable = false;

  system.stateVersion = "25.11"; # Did you read the comment?
}

5
hosts/vault01/default.nix
Normal file
@@ -0,0 +1,5 @@
{ ... }: {
  imports = [
    ./configuration.nix
  ];
}

@@ -21,7 +21,7 @@
  networking.domain = "{{ domain }}";
  networking.useNetworkd = true;
  networking.useDHCP = false;
  services.resolved.enable = false;
  services.resolved.enable = true;
  networking.nameservers = [
    {% for ns in nameservers %}
    "{{ ns }}"

8
services/vault/default.nix
Normal file
@@ -0,0 +1,8 @@
{ ... }:
{
  services.vault = {
    enable = true;

    storageBackend = "file";
  };
}

@@ -38,6 +38,12 @@ locals {
      disk_size = "20G"
      flake_branch = "pipeline-testing-improvements"
    }
    "vault01" = {
      ip = "10.69.13.19/24"
      cpu_cores = 2
      memory = 2048
      disk_size = "20G"
    }
  }

  # Compute VM configurations with defaults applied
