Compare commits: host-vault (3 commits: e413de6437, b0c743b093, 74071887ad)

TODO.md (224 changes)
@@ -153,9 +153,9 @@ create-host \

---

### Phase 4: Secrets Management with HashiCorp Vault
### Phase 4: Secrets Management Automation

**Challenge:** The current sops-nix approach has a chicken-and-egg problem with age keys
**Challenge:** sops needs an age key, but the age key is generated on first boot

**Current workflow:**
1. VM boots, generates age key at `/var/lib/sops-nix/key.txt`
@@ -164,213 +164,27 @@ create-host \
4. User commits and pushes
5. VM can now decrypt secrets

**Selected approach:** Migrate to HashiCorp Vault for centralized secrets management
**Proposed solution:**

**Benefits:**
- Industry-standard secrets management (Vault experience is transferable to work)
- Eliminates the manual age-key distribution step
- Secrets-as-code via OpenTofu (aligned with infrastructure-as-code)
- Centralized PKI management (replaces step-ca, consolidating the TLS and SSH CAs)
- Automatic secret rotation capabilities
- Audit logging for all secret access
- AppRole authentication enables automated bootstrap
**Option A: Pre-generate age keys**
- [ ] Generate an age key pair during `create-host-config.sh`
- [ ] Add the public key to `.sops.yaml` immediately
- [ ] Store the private key temporarily (in a secure location)
- [ ] Inject the private key via cloud-init write_files or a Terraform file provisioner
- [ ] VM uses the pre-configured key from first boot

**Architecture:**
```
vault.home.2rjus.net
├─ KV Secrets Engine (replaces sops-nix)
├─ PKI Engine (replaces step-ca for TLS)
├─ SSH CA Engine (replaces step-ca SSH CA)
└─ AppRole Auth (per-host authentication)
   ↓
New hosts authenticate on first boot
Fetch secrets via the Vault API
No manual key distribution needed
```
**Option B: Post-deployment secret injection**
- [ ] VM boots with the template and generates its own key
- [ ] Fetch the public key via SSH after first boot
- [ ] Automatically add it to `.sops.yaml` and commit
- [ ] Trigger a rebuild on the VM to pick up secrets access

---

**Option C: Separate secrets from the initial deployment**
- [ ] Initial deployment works without secrets
- [ ] After the VM is running, the user manually adds the age key
- [ ] Subsequent auto-upgrades pick up the secrets
#### Phase 4a: Vault Server Setup

**Goal:** Deploy and configure a Vault server with auto-unseal

**Tasks:**
- [ ] Create the `hosts/vault01/` configuration
  - [ ] Basic NixOS configuration (hostname, networking, etc.)
  - [ ] Vault service configuration
  - [ ] Firewall rules (8200 for the API, 8201 for cluster traffic)
  - [ ] Add to flake.nix and terraform
- [ ] Implement an auto-unseal mechanism
  - [ ] **Preferred:** TPM-based auto-unseal if the hardware supports it
    - [ ] Use tpm2-tools to seal/unseal the Vault keys
    - [ ] Systemd service to unseal on boot
  - [ ] **Fallback:** Shamir secret sharing with systemd automation
    - [ ] Generate 3 key shares with a threshold of 2
    - [ ] Store 2 shares on disk (encrypted), keep 1 offline
    - [ ] Systemd service auto-unseals using the 2 on-disk shares
- [ ] Initial Vault setup
  - [ ] Initialize Vault
  - [ ] Configure the storage backend (integrated raft or file)
  - [ ] Set up root token management
  - [ ] Enable audit logging
- [ ] Deploy to infrastructure
  - [ ] Add a DNS entry for vault.home.2rjus.net
  - [ ] Deploy the VM via terraform
  - [ ] Bootstrap and verify Vault is running

**Deliverable:** Running Vault server that auto-unseals on boot
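The Shamir fallback above is a k-of-n threshold scheme: any 2 of the 3 shares reconstruct the unseal key, while a single share reveals nothing. A minimal sketch of the underlying math (illustrative only — Vault ships its own hardened implementation):

```python
import random

P = 2**127 - 1  # a Mersenne prime; all arithmetic happens in GF(P)

def split(secret: int, shares: int = 3, threshold: int = 2):
    """Evaluate a random degree-(threshold-1) polynomial with f(0) = secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(threshold - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, shares + 1)]

def combine(points):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * -xj % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret
```

Any two of the three shares suffice, which is what the "store 2 on disk, keep 1 offline" split relies on.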

---
#### Phase 4b: Vault-as-Code with OpenTofu

**Goal:** Manage all Vault configuration (secret structure, policies, roles) as code

**Tasks:**
- [ ] Set up the Vault Terraform provider
  - [ ] Create the `terraform/vault/` directory
  - [ ] Configure the Vault provider (address, auth)
  - [ ] Store the Vault token securely (terraform.tfvars, gitignored)
- [ ] Enable and configure secrets engines
  - [ ] Enable the KV v2 secrets engine at `secret/`
  - [ ] Define the secret path structure (per-service, per-host)
  - [ ] Example: `secret/monitoring/grafana`, `secret/postgres/ha1`
- [ ] Define policies as code
  - [ ] Create policies for the different service tiers
  - [ ] Principle of least privilege (hosts only read their own secrets)
  - [ ] Example: monitoring-policy allows read on `secret/monitoring/*`
- [ ] Set up AppRole authentication
  - [ ] Enable the AppRole auth backend
  - [ ] Create a role per host type (monitoring, dns, database, etc.)
  - [ ] Bind policies to roles
  - [ ] Configure TTLs and token policies
- [ ] Migrate existing secrets from sops-nix
  - [ ] Create a migration script/playbook
  - [ ] Decrypt the sops secrets and load them into the Vault KV store
  - [ ] Verify all secrets migrated successfully
  - [ ] Keep sops as a backup during the transition
- [ ] Implement secrets-as-code patterns
  - [ ] Secret values in gitignored terraform.tfvars
  - [ ] Or use random_password for auto-generated secrets
  - [ ] Secret structure/paths in version-controlled .tf files
**Example OpenTofu:**
```hcl
resource "vault_kv_secret_v2" "monitoring_grafana" {
  mount = "secret"
  name  = "monitoring/grafana"
  data_json = jsonencode({
    admin_password = var.grafana_admin_password
    smtp_password  = var.smtp_password
  })
}

resource "vault_policy" "monitoring" {
  name   = "monitoring-policy"
  policy = <<EOT
path "secret/data/monitoring/*" {
  capabilities = ["read"]
}
EOT
}

resource "vault_approle_auth_backend_role" "monitoring01" {
  backend        = "approle"
  role_name      = "monitoring01"
  token_policies = ["monitoring-policy"]
}
```
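Two details behind the example policy are easy to trip over, sketched here in Python (a simplification — Vault's real ACL engine also supports `+` segment wildcards and policy templating): KV v2 inserts `data/` between the mount and the secret name, which is why the resource at mount `secret`, name `monitoring/grafana` is guarded by a policy on `secret/data/monitoring/*`; and a trailing `*` in a policy path acts as a prefix match.

```python
def kv2_api_path(mount: str, name: str) -> str:
    """KV v2 read path: the engine adds 'data/' after the mount point."""
    return f"{mount}/data/{name}"

def policy_allows(policy_path: str, request_path: str) -> bool:
    """Trailing-'*' glob as in Vault ACL paths; otherwise an exact match."""
    if policy_path.endswith("*"):
        return request_path.startswith(policy_path[:-1])
    return request_path == policy_path
```

So the monitoring policy matches `secret/data/monitoring/grafana` but not `secret/data/postgres/ha1`, which is the least-privilege property the tasks above call for.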
**Deliverable:** All secrets and policies managed as OpenTofu code in `terraform/vault/`

---

#### Phase 4c: PKI Migration (Replace step-ca)

**Goal:** Consolidate the PKI infrastructure into Vault

**Tasks:**
- [ ] Set up the Vault PKI engines
  - [ ] Create a root CA in Vault (`pki/` mount, 10-year TTL)
  - [ ] Create an intermediate CA (`pki_int/` mount, 5-year TTL)
  - [ ] Sign the intermediate with the root CA
  - [ ] Configure CRL and OCSP
- [ ] Enable ACME support
  - [ ] Enable ACME on the intermediate CA (Vault 1.14+)
  - [ ] Create a PKI role for the homelab domain
  - [ ] Set certificate TTLs and allowed domains
- [ ] Configure the SSH CA in Vault
  - [ ] Enable the SSH secrets engine (`ssh/` mount)
  - [ ] Generate SSH signing keys
  - [ ] Create roles for host and user certificates
  - [ ] Configure TTLs and allowed principals
- [ ] Migrate hosts from step-ca to Vault
  - [ ] Update system/acme.nix to use the Vault ACME endpoint
  - [ ] Change the server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
  - [ ] Test certificate issuance on one host
  - [ ] Roll out to all hosts via auto-upgrade
- [ ] Migrate SSH CA trust
  - [ ] Distribute the Vault SSH CA public key to all hosts
  - [ ] Update sshd_config to trust the Vault CA
  - [ ] Test SSH certificate authentication
- [ ] Decommission step-ca
  - [ ] Verify all services have migrated
  - [ ] Stop the step-ca service on the ca host
  - [ ] Archive the step-ca configuration as a backup

**Deliverable:** All TLS and SSH certificates issued by Vault; step-ca retired
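One consequence of the 10-year root / 5-year intermediate layout: a CA cannot issue certificates that outlive itself, so leaf TTLs effectively shrink as `pki_int` approaches expiry (Vault exposes this as the issuer's `leaf_not_after_behavior`, which can error or truncate). The truncation arithmetic, with an illustrative helper name:

```python
from datetime import datetime, timedelta

def effective_ttl(requested: timedelta, ca_not_after: datetime, now: datetime) -> timedelta:
    """Clamp a requested certificate TTL to the signing CA's remaining lifetime."""
    return min(requested, ca_not_after - now)
```

This is worth remembering when the intermediate enters its final months: 90-day ACME certs will silently come back shorter (or fail, depending on configuration) until the intermediate is rotated.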

---
#### Phase 4d: Bootstrap Integration

**Goal:** New hosts automatically authenticate to Vault on first boot, with no manual steps

**Tasks:**
- [ ] Update the create-host tool
  - [ ] Generate an AppRole role_id + secret_id for the new host
  - [ ] Or create a wrapped token for one-time bootstrap
  - [ ] Add a host-specific policy to Vault (via terraform)
  - [ ] Store the bootstrap credentials for cloud-init injection
- [ ] Update template2 for Vault authentication
  - [ ] Create a Vault authentication module that:
    - [ ] Reads the bootstrap credentials from cloud-init
    - [ ] Authenticates to Vault and retrieves the permanent AppRole credentials
    - [ ] Stores the role_id + secret_id locally for services to use
- [ ] Create a NixOS Vault secrets module
  - [ ] Replacement for sops.secrets
  - [ ] Fetches secrets from Vault at nixos-rebuild/activation time
  - [ ] Or runtime secret fetching for services
  - [ ] Handle Vault token renewal
- [ ] Update the bootstrap service
  - [ ] After authenticating to Vault, fetch any bootstrap secrets
  - [ ] Run nixos-rebuild with the host configuration
  - [ ] Services automatically fetch their secrets from Vault
- [ ] Update the terraform cloud-init
  - [ ] Inject the Vault address and bootstrap credentials
  - [ ] Pass them via cloud-init user-data or write_files
  - [ ] Scope the credentials to single use or a short TTL
- [ ] Test the complete flow
  - [ ] Run create-host to generate a new host config
  - [ ] Deploy with terraform
  - [ ] Verify the host bootstraps and authenticates to Vault
  - [ ] Verify services can fetch secrets
  - [ ] Confirm no manual steps are required
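The "Handle Vault token renewal" task hides a small state machine the secrets module would need. A sketch of plausible decision logic (the half-life threshold is an assumption for illustration, not a Vault default):

```python
def next_action(remaining_ttl: int, lease_duration: int, renewable: bool) -> str:
    """Decide what a long-running service should do with its Vault token.

    remaining_ttl / lease_duration are in seconds. Renew once the token has
    burned through half its lease; fall back to a full re-authentication
    (AppRole login) when the token is expired or not renewable.
    """
    if remaining_ttl <= 0:
        return "reauthenticate"
    if remaining_ttl < lease_duration / 2:
        return "renew" if renewable else "reauthenticate"
    return "wait"
```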
**Bootstrap flow:**
```
1. terraform apply (deploys VM with cloud-init)
2. Cloud-init sets hostname + Vault bootstrap credentials
3. nixos-bootstrap.service runs:
   - Authenticates to Vault with the bootstrap credentials
   - Retrieves the permanent AppRole credentials
   - Stores them locally for service use
   - Runs nixos-rebuild
4. Host services fetch secrets from Vault as needed
5. Done - no manual intervention
```
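The "wrapped token for one-time bootstrap" option can be modeled as below — a toy in-memory stand-in for Vault's response wrapping, not its implementation. The single-use property is the point: if cloud-init user-data leaks and an attacker unwraps first, the legitimate host's unwrap fails loudly instead of both parties silently holding the credentials.

```python
import secrets

class WrapStore:
    """Toy model of response wrapping: each wrapping token unwraps exactly once."""

    def __init__(self):
        self._wrapped = {}

    def wrap(self, payload: dict) -> str:
        token = secrets.token_hex(16)
        self._wrapped[token] = payload
        return token

    def unwrap(self, token: str) -> dict:
        # pop() makes the token single-use; a replay raises instead of returning data
        if token not in self._wrapped:
            raise PermissionError("wrapping token already used or invalid")
        return self._wrapped.pop(token)
```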
**Deliverable:** Fully automated secrets access from first boot, with zero manual steps
**Decision needed:** Option A is the most automated, but requires secure key handling

---
flake.nix (18 changes)

@@ -334,6 +334,7 @@
    sops-nix.nixosModules.sops
  ];
};
};
testvm01 = nixpkgs.lib.nixosSystem {
  inherit system;
  specialArgs = {
@@ -350,23 +351,6 @@
    sops-nix.nixosModules.sops
  ];
};
vault01 = nixpkgs.lib.nixosSystem {
  inherit system;
  specialArgs = {
    inherit inputs self sops-nix;
  };
  modules = [
    (
      { config, pkgs, ... }:
      {
        nixpkgs.overlays = commonOverlays;
      }
    )
    ./hosts/vault01
    sops-nix.nixosModules.sops
  ];
};
};
packages = forAllSystems (
  { pkgs }:
  {
@@ -7,15 +7,16 @@

{
  imports = [
    ../template2/hardware-configuration.nix
    ../template/hardware-configuration.nix

    ../../system
    ../../common/vm
  ];

  nixpkgs.config.allowUnfree = true;
  # Use the systemd-boot EFI boot loader.
  boot.loader.grub.enable = true;
  boot.loader.grub.device = "/dev/vda";
  boot.loader.grub.device = "/dev/sda";

  networking.hostName = "testvm01";
  networking.domain = "home.2rjus.net";
@@ -1,63 +0,0 @@
{
  config,
  lib,
  pkgs,
  ...
}:

{
  imports = [
    ../template2/hardware-configuration.nix

    ../../system
    ../../common/vm
    ../../services/vault
  ];

  nixpkgs.config.allowUnfree = true;
  boot.loader.grub.enable = true;
  boot.loader.grub.device = "/dev/vda";

  networking.hostName = "vault01";
  networking.domain = "home.2rjus.net";
  networking.useNetworkd = true;
  networking.useDHCP = false;
  services.resolved.enable = true;
  networking.nameservers = [
    "10.69.13.5"
    "10.69.13.6"
  ];

  systemd.network.enable = true;
  systemd.network.networks."ens18" = {
    matchConfig.Name = "ens18";
    address = [
      "10.69.13.19/24"
    ];
    routes = [
      { Gateway = "10.69.13.1"; }
    ];
    linkConfig.RequiredForOnline = "routable";
  };
  time.timeZone = "Europe/Oslo";

  nix.settings.experimental-features = [
    "nix-command"
    "flakes"
  ];
  nix.settings.tarball-ttl = 0;
  environment.systemPackages = with pkgs; [
    vim
    wget
    git
  ];

  # Open ports in the firewall.
  # networking.firewall.allowedTCPPorts = [ ... ];
  # networking.firewall.allowedUDPPorts = [ ... ];
  # Or disable the firewall altogether.
  networking.firewall.enable = false;

  system.stateVersion = "25.11"; # Did you read the comment?
}
@@ -1,5 +0,0 @@
{ ... }: {
  imports = [
    ./configuration.nix
  ];
}
@@ -50,17 +50,17 @@ def update_flake_nix(config: HostConfig, repo_root: Path, force: bool = False) -
        if count == 0:
            raise ValueError(f"Could not find existing entry for {config.hostname} in flake.nix")
    else:
        # Insert new entry before closing brace of nixosConfigurations
        # Pattern: " };\n packages = forAllSystems"
        pattern = r"( \};)\n( packages = forAllSystems)"
        replacement = rf"{new_entry}\g<1>\n\g<2>"
        # Insert new entry before closing brace
        # Pattern: " };\n packages ="
        pattern = r"( \};)\n( packages =)"
        replacement = rf"\g<1>\n{new_entry}\g<2>"

    new_content, count = re.subn(pattern, replacement, content)

    if count == 0:
        raise ValueError(
            "Could not find insertion point in flake.nix. "
            "Looking for pattern: ' };\\n packages = forAllSystems'"
            "Looking for pattern: ' };\\n packages ='"
        )

    flake_path.write_text(new_content)
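The `re.subn` insertion in `update_flake_nix` can be exercised on a toy input (the content and entry below are made up; the real flake.nix has different indentation, which the actual pattern's leading spaces account for). The `\g<1>`/`\g<2>` backreferences re-emit the matched anchor lines with the new entry spliced between them:

```python
import re

# Hypothetical miniature of the flake.nix edit: insert a new host entry
# between the closing brace and the `packages =` line that follows it.
content = "stuff\n};\npackages = forAllSystems;\n"
new_entry = "vault01 = nixpkgs.lib.nixosSystem { };\n"

pattern = r"(\};)\n(packages =)"
replacement = rf"\g<1>\n{new_entry}\g<2>"
new_content, count = re.subn(pattern, replacement, content)
```

`count` doubles as the success signal, which is exactly what the script's `if count == 0: raise ValueError(...)` guard checks.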
@@ -7,21 +7,22 @@

{
  imports = [
    ../template2/hardware-configuration.nix
    ../template/hardware-configuration.nix

    ../../system
    ../../common/vm
  ];

  nixpkgs.config.allowUnfree = true;
  # Use the systemd-boot EFI boot loader.
  boot.loader.grub.enable = true;
  boot.loader.grub.device = "/dev/vda";
  boot.loader.grub.device = "/dev/sda";

  networking.hostName = "{{ hostname }}";
  networking.domain = "{{ domain }}";
  networking.useNetworkd = true;
  networking.useDHCP = false;
  services.resolved.enable = true;
  services.resolved.enable = false;
  networking.nameservers = [
    {% for ns in nameservers %}
    "{{ ns }}"
@@ -1,8 +0,0 @@
{ ... }:
{
  services.vault = {
    enable = true;

    storageBackend = "file";
  };
}
@@ -8,7 +8,7 @@ resource "proxmox_cloud_init_disk" "ci" {

  name     = each.key
  pve_node = each.value.target_node
  storage  = "local" # Cloud-init disks must be on storage that supports ISO/snippets
  storage  = each.value.storage

  # User data includes SSH keys and optionally NIXOS_FLAKE_BRANCH
  user_data = <<-EOT
@@ -25,34 +25,27 @@ resource "proxmox_cloud_init_disk" "ci" {
  : ""}
  EOT

  # Network configuration - static IP or DHCP
  network_config = each.value.ip != null ? yamlencode({
    version = 1
    config = [{
      type = "physical"
      name = "ens18"
      subnets = [{
        type            = "static"
        address         = each.value.ip
        gateway         = each.value.gateway
        dns_nameservers = split(" ", each.value.nameservers)
        dns_search      = [each.value.search_domain]
      }]
    }]
  }) : yamlencode({
    version = 1
    config = [{
      type = "physical"
      name = "ens18"
      subnets = [{
        type = "dhcp"
      }]
    }]
  })

  # Network configuration - static IP or DHCP
  network_config = yamlencode({
    version = 1
    config = [{
      type = "physical"
      name = "ens18"
      subnets = each.value.ip != null ? [{
        type            = "static"
        address         = each.value.ip
        gateway         = each.value.gateway
        dns_nameservers = split(" ", each.value.nameservers)
        dns_search      = [each.value.search_domain]
      }] : [{
        type = "dhcp"
      }]
    }]
  })

  # Instance metadata
  meta_data = yamlencode({
    instance_id    = sha1(each.key)
    local-hostname = each.key
  })
}
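The refactor in the hunk above moves the `each.value.ip != null` ternary from the whole `yamlencode` document down to just the `subnets` entry, so the shared scaffolding (version, interface name) is written once. The same shape in Python, with field names following cloud-init's network-config v1 schema (defaults here are illustrative):

```python
def network_config(ip=None, gateway=None, nameservers="", search_domain=""):
    """Build a cloud-init v1 network config: static when an IP is given, else DHCP."""
    static = [{
        "type": "static",
        "address": ip,
        "gateway": gateway,
        "dns_nameservers": nameservers.split(),
        "dns_search": [search_domain],
    }]
    dhcp = [{"type": "dhcp"}]
    return {
        "version": 1,
        "config": [{
            "type": "physical",
            "name": "ens18",
            # only the subnets entry is conditional, mirroring the HCL refactor
            "subnets": static if ip is not None else dhcp,
        }],
    }
```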
@@ -38,12 +38,6 @@ locals {
    disk_size    = "20G"
    flake_branch = "pipeline-testing-improvements"
  }
  "vault01" = {
    ip        = "10.69.13.19/24"
    cpu_cores = 2
    memory    = 2048
    disk_size = "20G"
  }
}

# Compute VM configurations with defaults applied