9 Commits

Author SHA1 Message Date
2669b10f0e flake: update homelab-deploy
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 06:53:13 +01:00
db6d610e16 homelab: add deploy.enable option with assertion
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m3s
- Add homelab.deploy.enable option (requires vault.enable)
- Create shared homelab-deploy Vault policy for all hosts
- Enable homelab.deploy on all vault-enabled hosts
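A minimal sketch of what this option and its assertion might look like; only the option path and the vault.enable requirement come from the commit message, the module structure and message wording are assumptions:

```nix
# Hypothetical sketch of homelab.deploy.enable; structure and wording are
# assumptions, the vault.enable requirement is stated in the commit.
{ config, lib, ... }:
{
  options.homelab.deploy.enable = lib.mkEnableOption "remote deployment via NATS";

  config.assertions = [
    {
      assertion = config.homelab.deploy.enable -> config.vault.enable;
      message = "homelab.deploy.enable requires vault.enable = true";
    }
  ];
}
```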

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 06:47:12 +01:00
e4eb8afe5c system: enable homelab-deploy listener for all vault hosts
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m4s
Add system/homelab-deploy.nix module that automatically enables the
listener on all hosts with vault.enable=true. Uses homelab.host.tier
and homelab.host.role for NATS subject subscriptions.

- Add homelab-deploy access to all host AppRole policies
- Remove manual listener config from vaulttest01 (now handled by system module)
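A rough sketch of such a system module; the natsUrl, nkeyFile, and flakeUrl values appear in a diff further down, while the `services.homelab-deploy.listener` option path and the tier/role wiring are assumptions inferred from the commit text:

```nix
# Hypothetical sketch of system/homelab-deploy.nix; the listener option
# path and the tier/role attributes are assumed, not confirmed.
{ config, lib, ... }:
{
  config = lib.mkIf config.vault.enable {
    services.homelab-deploy.listener = {
      enable = true;
      natsUrl = "nats://nats1.home.2rjus.net:4222";
      nkeyFile = "/run/secrets/homelab-deploy-nkey";
      flakeUrl = "git+https://git.t-juice.club/torjus/nixos-servers.git";
      # tier and role select the NATS subjects this host subscribes to
      tier = config.homelab.host.tier;
      role = config.homelab.host.role;
    };
  };
}
```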

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 06:41:03 +01:00
df9246a0f8 flake: update homelab-deploy
Some checks failed
Run nix flake check / flake-check (push) Failing after 12m46s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 06:27:21 +01:00
ec3b87f7fa flake: update homelab-deploy
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m5s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 06:20:14 +01:00
913fa11c64 flake: update homelab-deploy
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m39s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 06:11:37 +01:00
3e85e2527f flake: update homelab-deploy
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m7s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 05:58:26 +01:00
543ca18b14 flake: update homelab-deploy
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m6s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 05:54:05 +01:00
c83218b3bc flake: update homelab-deploy, add to devShell
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m6s
Update homelab-deploy to include a bugfix. Add the CLI to the devShell for
easier testing and deployment operations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 05:45:54 +01:00
20 changed files with 206 additions and 536 deletions

View File

@@ -22,17 +22,6 @@
"ALERTMANAGER_URL": "https://alertmanager.home.2rjus.net",
"LOKI_URL": "http://monitoring01.home.2rjus.net:3100"
}
},
"homelab-deploy": {
"command": "nix",
"args": [
"run",
"git+https://git.t-juice.club/torjus/homelab-deploy",
"--",
"mcp",
"--nats-url", "nats://nats1.home.2rjus.net:4222",
"--nkey-file", "/home/torjus/.config/homelab-deploy/test-deployer.nkey"
]
}
}
}

View File

@@ -194,51 +194,6 @@ node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
node_filesystem_avail_bytes{mountpoint="/"}
```
### Deploying to Test Hosts
The **homelab-deploy** MCP server enables remote deployments to test-tier hosts via NATS messaging.
**Available Tools:**
- `deploy` - Deploy NixOS configuration to test-tier hosts
- `list_hosts` - List available deployment targets
**Deploy Parameters:**
- `hostname` - Target a specific host (e.g., `vaulttest01`)
- `role` - Deploy to all hosts with a specific role (e.g., `vault`)
- `all` - Deploy to all test-tier hosts
- `action` - nixos-rebuild action: `switch` (default), `boot`, `test`, `dry-activate`
- `branch` - Git branch or commit to deploy (default: `master`)
**Examples:**
```
# List available hosts
list_hosts()
# Deploy to a specific host
deploy(hostname="vaulttest01", action="switch")
# Dry-run deployment
deploy(hostname="vaulttest01", action="dry-activate")
# Deploy to all hosts with a role
deploy(role="vault", action="switch")
```
**Note:** Only test-tier hosts with `homelab.deploy.enable = true` and the listener service running will respond to deployments.
**Verifying Deployments:**
After deploying, use the `nixos_flake_info` metric from nixos-exporter to verify the host is running the expected revision:
```promql
nixos_flake_info{instance=~"vaulttest01.*"}
```
The `current_rev` label contains the git commit hash of the deployed flake configuration.
## Architecture
### Directory Structure

View File

@@ -1,122 +0,0 @@
# Long-Term Metrics Storage Options
## Problem Statement
The current Prometheus configuration retains metrics for 30 days (`retentionTime = "30d"`). Extending retention further would increase disk usage on the homelab hypervisor, which has limited local storage.
Prometheus does not support downsampling - it stores all data at full resolution until the retention period expires, then deletes it entirely.
## Current Configuration
Location: `services/monitoring/prometheus.nix`
- **Retention**: 30 days
- **Scrape interval**: 15s
- **Features**: Alertmanager, Pushgateway, auto-generated scrape configs from flake hosts
- **Storage**: Local disk on monitoring01
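For reference, a condensed sketch of what this configuration might look like in `services/monitoring/prometheus.nix`, using standard NixOS Prometheus options (the auto-generated scrape configs are elided):

```nix
# Condensed sketch of the current setup described above; the real module
# also auto-generates scrape configs from the flake's hosts.
{ config, ... }:
{
  services.prometheus = {
    enable = true;
    retentionTime = "30d";
    globalConfig.scrape_interval = "15s";
    alertmanager.enable = true;
    pushgateway.enable = true;
  };
}
```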
## Options Evaluated
### Option 1: VictoriaMetrics
VictoriaMetrics is a Prometheus-compatible TSDB with significantly better compression (5-10x smaller storage footprint).
**NixOS Options Available:**
- `services.victoriametrics.enable`
- `services.victoriametrics.prometheusConfig` - accepts Prometheus scrape config format
- `services.victoriametrics.retentionPeriod` - e.g., "6m" for 6 months
- `services.vmagent` - dedicated scraping agent
- `services.vmalert` - alerting rules evaluation
**Pros:**
- Simple migration - single service replacement
- Same PromQL query language - Grafana dashboards work unchanged
- Same scrape config format - existing auto-generated configs work as-is
- 5-10x better compression means 30 days of Prometheus data could become 180+ days
- Lightweight, single binary
**Cons:**
- No automatic downsampling (relies on compression alone)
- Alerting rule evaluation must move to vmalert instead of Prometheus's built-in Alertmanager integration
- Would need to migrate existing data or start fresh
**Migration Steps:**
1. Replace `services.prometheus` with `services.victoriametrics`
2. Move scrape configs to `prometheusConfig`
3. Set up `services.vmalert` for alerting rules
4. Update Grafana datasource to VictoriaMetrics port (8428)
5. Keep Alertmanager for notification routing
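A minimal sketch of steps 1 and 2, assuming the NixOS options listed above; the scrape target and retention value are illustrative:

```nix
# Illustrative sketch only: swaps Prometheus for VictoriaMetrics.
# The scrape target below is an example, not the generated config.
{ config, ... }:
{
  services.prometheus.enable = false;

  services.victoriametrics = {
    enable = true;
    retentionPeriod = "6m"; # roughly six months instead of 30 days
    prometheusConfig = {
      scrape_configs = [
        {
          job_name = "node";
          static_configs = [
            { targets = [ "monitoring01.home.2rjus.net:9100" ]; }
          ];
        }
      ];
    };
  };
}
```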
### Option 2: Thanos
Thanos extends Prometheus with long-term storage and automatic downsampling by uploading data to object storage.
**NixOS Options Available:**
- `services.thanos.sidecar` - uploads Prometheus blocks to object storage
- `services.thanos.compact` - compacts and downsamples data
- `services.thanos.query` - unified query gateway
- `services.thanos.query-frontend` - query caching and parallelization
- `services.thanos.downsample` - dedicated downsampling service
**Downsampling Behavior:**
- Raw resolution kept for configurable period (default: indefinite)
- 5-minute resolution created after 40 hours
- 1-hour resolution created after 10 days
**Retention Configuration (in compactor):**
```nix
services.thanos.compact = {
retention.resolution-raw = "30d"; # Keep raw for 30 days
retention.resolution-5m = "180d"; # Keep 5m samples for 6 months
retention.resolution-1h = "2y"; # Keep 1h samples for 2 years
};
```
**Pros:**
- True downsampling - older data uses progressively less storage
- Keep metrics for years with minimal storage impact
- Prometheus continues running unchanged
- Existing Alertmanager integration preserved
**Cons:**
- Requires object storage (MinIO, S3, or local filesystem)
- Multiple services to manage (sidecar, compactor, query)
- More complex architecture
- Additional infrastructure (MinIO) may be needed
**Required Components:**
1. Thanos Sidecar (runs alongside Prometheus)
2. Object storage (MinIO or local filesystem)
3. Thanos Compactor (handles downsampling)
4. Thanos Query (provides unified query endpoint)
**Migration Steps:**
1. Deploy object storage (MinIO or configure filesystem backend)
2. Add Thanos sidecar pointing to Prometheus data directory
3. Add Thanos compactor with retention policies
4. Add Thanos query gateway
5. Update Grafana datasource to Thanos Query port (10902)
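A rough sketch of the sidecar and compactor wiring, assuming the `services.thanos.*` options named above and an objstore config file describing the chosen backend; all paths are illustrative:

```nix
# Illustrative sketch: sidecar uploads Prometheus blocks, compactor applies
# the retention policies from the example above. /etc/thanos/objstore.yml
# is assumed to point at MinIO or a filesystem bucket.
{ config, ... }:
{
  services.thanos.sidecar = {
    enable = true;
    prometheus.url = "http://127.0.0.1:9090";
    objstore.config-file = "/etc/thanos/objstore.yml";
  };

  services.thanos.compact = {
    enable = true;
    objstore.config-file = "/etc/thanos/objstore.yml";
    retention.resolution-raw = "30d";
    retention.resolution-5m = "180d";
    retention.resolution-1h = "2y";
  };
}
```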
## Comparison
| Aspect | VictoriaMetrics | Thanos |
|--------|-----------------|--------|
| Complexity | Low (1 service) | Higher (3-4 services) |
| Downsampling | No | Yes (automatic) |
| Storage savings | 5-10x compression | Compression + downsampling |
| Object storage required | No | Yes |
| Migration effort | Minimal | Moderate |
| Grafana changes | Change port only | Change port only |
| Alerting changes | Need vmalert | Keep existing |
## Recommendation
**Start with VictoriaMetrics** for simplicity. The compression alone may provide 6+ months of retention in the same disk space currently used for 30 days.
If multi-year retention with true downsampling becomes necessary, Thanos can be evaluated later. However, it requires deploying object storage infrastructure (MinIO), which adds operational complexity.
## References
- VictoriaMetrics docs: https://docs.victoriametrics.com/
- Thanos docs: https://thanos.io/tip/thanos/getting-started.md/
- NixOS options searched from nixpkgs revision e576e3c9 (NixOS 25.11)

flake.lock generated
View File

@@ -28,11 +28,11 @@
]
},
"locked": {
"lastModified": 1770447502,
"narHash": "sha256-xH1PNyE3ydj4udhe1IpK8VQxBPZETGLuORZdSWYRmSU=",
"lastModified": 1770443536,
"narHash": "sha256-UufZIVggiioMFDSjKx+ifgkDOk9alNSiRmkvc4/+HIA=",
"ref": "master",
"rev": "79db119d1ca6630023947ef0a65896cc3307c2ff",
"revCount": 22,
"rev": "95b795dcfd86b7b36045bba67e536b3a1c61dd33",
"revCount": 20,
"type": "git",
"url": "https://git.t-juice.club/torjus/homelab-deploy"
},

View File

@@ -186,15 +186,6 @@
./hosts/nats1
];
};
vault01 = nixpkgs.lib.nixosSystem {
inherit system;
specialArgs = {
inherit inputs self sops-nix;
};
modules = commonModules ++ [
./hosts/vault01
];
};
testvm01 = nixpkgs.lib.nixosSystem {
inherit system;
specialArgs = {
@@ -204,22 +195,22 @@
./hosts/testvm01
];
};
testvm02 = nixpkgs.lib.nixosSystem {
vault01 = nixpkgs.lib.nixosSystem {
inherit system;
specialArgs = {
inherit inputs self sops-nix;
};
modules = commonModules ++ [
./hosts/testvm02
./hosts/vault01
];
};
testvm03 = nixpkgs.lib.nixosSystem {
vaulttest01 = nixpkgs.lib.nixosSystem {
inherit system;
specialArgs = {
inherit inputs self sops-nix;
};
modules = commonModules ++ [
./hosts/testvm03
./hosts/vaulttest01
];
};
};

View File

@@ -13,17 +13,14 @@
../../common/vm
];
# Host metadata (adjust as needed)
# Test VM - exclude from DNS zone generation
homelab.dns.enable = false;
homelab.host = {
tier = "test"; # Start in test tier, move to prod after validation
tier = "test";
priority = "low";
};
# Enable Vault integration
vault.enable = true;
# Enable remote deployment via NATS
homelab.deploy.enable = true;
nixpkgs.config.allowUnfree = true;
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/vda";
@@ -32,7 +29,7 @@
networking.domain = "home.2rjus.net";
networking.useNetworkd = true;
networking.useDHCP = false;
services.resolved.enable = true;
services.resolved.enable = false;
networking.nameservers = [
"10.69.13.5"
"10.69.13.6"
@@ -42,7 +39,7 @@
systemd.network.networks."ens18" = {
matchConfig.Name = "ens18";
address = [
"10.69.13.20/24"
"10.69.13.101/24"
];
routes = [
{ Gateway = "10.69.13.1"; }

View File

@@ -1,72 +0,0 @@
{
config,
lib,
pkgs,
...
}:
{
imports = [
../template2/hardware-configuration.nix
../../system
../../common/vm
];
# Host metadata (adjust as needed)
homelab.host = {
tier = "test"; # Start in test tier, move to prod after validation
};
# Enable Vault integration
vault.enable = true;
# Enable remote deployment via NATS
homelab.deploy.enable = true;
nixpkgs.config.allowUnfree = true;
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/vda";
networking.hostName = "testvm02";
networking.domain = "home.2rjus.net";
networking.useNetworkd = true;
networking.useDHCP = false;
services.resolved.enable = true;
networking.nameservers = [
"10.69.13.5"
"10.69.13.6"
];
systemd.network.enable = true;
systemd.network.networks."ens18" = {
matchConfig.Name = "ens18";
address = [
"10.69.13.21/24"
];
routes = [
{ Gateway = "10.69.13.1"; }
];
linkConfig.RequiredForOnline = "routable";
};
time.timeZone = "Europe/Oslo";
nix.settings.experimental-features = [
"nix-command"
"flakes"
];
nix.settings.tarball-ttl = 0;
environment.systemPackages = with pkgs; [
vim
wget
git
];
# Open ports in the firewall.
# networking.firewall.allowedTCPPorts = [ ... ];
# networking.firewall.allowedUDPPorts = [ ... ];
# Or disable the firewall altogether.
networking.firewall.enable = false;
system.stateVersion = "25.11"; # Did you read the comment?
}

View File

@@ -1,72 +0,0 @@
{
config,
lib,
pkgs,
...
}:
{
imports = [
../template2/hardware-configuration.nix
../../system
../../common/vm
];
# Host metadata (adjust as needed)
homelab.host = {
tier = "test"; # Start in test tier, move to prod after validation
};
# Enable Vault integration
vault.enable = true;
# Enable remote deployment via NATS
homelab.deploy.enable = true;
nixpkgs.config.allowUnfree = true;
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/vda";
networking.hostName = "testvm03";
networking.domain = "home.2rjus.net";
networking.useNetworkd = true;
networking.useDHCP = false;
services.resolved.enable = true;
networking.nameservers = [
"10.69.13.5"
"10.69.13.6"
];
systemd.network.enable = true;
systemd.network.networks."ens18" = {
matchConfig.Name = "ens18";
address = [
"10.69.13.22/24"
];
routes = [
{ Gateway = "10.69.13.1"; }
];
linkConfig.RequiredForOnline = "routable";
};
time.timeZone = "Europe/Oslo";
nix.settings.experimental-features = [
"nix-command"
"flakes"
];
nix.settings.tarball-ttl = 0;
environment.systemPackages = with pkgs; [
vim
wget
git
];
# Open ports in the firewall.
# networking.firewall.allowedTCPPorts = [ ... ];
# networking.firewall.allowedUDPPorts = [ ... ];
# Or disable the firewall altogether.
networking.firewall.enable = false;
system.stateVersion = "25.11"; # Did you read the comment?
}

View File

@@ -1,5 +0,0 @@
{ ... }: {
imports = [
./configuration.nix
];
}

View File

@@ -0,0 +1,134 @@
{
config,
lib,
pkgs,
...
}:
let
vault-test-script = pkgs.writeShellApplication {
name = "vault-test";
text = ''
echo "=== Vault Secret Test ==="
echo "Secret path: hosts/vaulttest01/test-service"
if [ -f /run/secrets/test-service/password ]; then
echo " Password file exists"
echo "Password length: $(wc -c < /run/secrets/test-service/password)"
else
echo " Password file missing!"
exit 1
fi
if [ -d /var/lib/vault/cache/test-service ]; then
echo " Cache directory exists"
else
echo " Cache directory missing!"
exit 1
fi
echo "Test successful!"
'';
};
in
{
imports = [
../template2/hardware-configuration.nix
../../system
../../common/vm
];
homelab.host = {
tier = "test";
priority = "low";
role = "vault";
};
nixpkgs.config.allowUnfree = true;
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/vda";
networking.hostName = "vaulttest01";
networking.domain = "home.2rjus.net";
networking.useNetworkd = true;
networking.useDHCP = false;
services.resolved.enable = true;
networking.nameservers = [
"10.69.13.5"
"10.69.13.6"
];
systemd.network.enable = true;
systemd.network.networks."ens18" = {
matchConfig.Name = "ens18";
address = [
"10.69.13.150/24"
];
routes = [
{ Gateway = "10.69.13.1"; }
];
linkConfig.RequiredForOnline = "routable";
};
time.timeZone = "Europe/Oslo";
nix.settings.experimental-features = [
"nix-command"
"flakes"
];
nix.settings.tarball-ttl = 0;
environment.systemPackages = with pkgs; [
vim
wget
git
];
# Open ports in the firewall.
# networking.firewall.allowedTCPPorts = [ ... ];
# networking.firewall.allowedUDPPorts = [ ... ];
# Or disable the firewall altogether.
networking.firewall.enable = false;
# Testing config
# Enable Vault secrets management
vault.enable = true;
homelab.deploy.enable = true;
# Define a test secret
vault.secrets.test-service = {
secretPath = "hosts/vaulttest01/test-service";
restartTrigger = true;
restartInterval = "daily";
services = [ "vault-test" ];
};
# Create a test service that uses the secret
systemd.services.vault-test = {
description = "Test Vault secret fetching";
wantedBy = [ "multi-user.target" ];
after = [ "vault-secret-test-service.service" ];
serviceConfig = {
Type = "oneshot";
RemainAfterExit = true;
ExecStart = lib.getExe vault-test-script;
StandardOutput = "journal+console";
};
};
# Test ACME certificate issuance from OpenBao PKI
# Override the global ACME server (from system/acme.nix) to use OpenBao instead of step-ca
security.acme.defaults.server = lib.mkForce "https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory";
# Request a certificate for this host
# Using HTTP-01 challenge with standalone listener on port 80
security.acme.certs."vaulttest01.home.2rjus.net" = {
listenHTTP = ":80";
enableDebugLogs = true;
};
system.stateVersion = "25.11"; # Did you read the comment?
}

View File

@@ -18,8 +18,6 @@ from manipulators import (
remove_from_flake_nix,
remove_from_terraform_vms,
remove_from_vault_terraform,
remove_from_approle_tf,
find_host_secrets,
check_entries_exist,
)
from models import HostConfig
@@ -257,10 +255,7 @@ def handle_remove(
sys.exit(1)
# Check what entries exist
flake_exists, terraform_exists, vault_exists, approle_exists = check_entries_exist(hostname, repo_root)
# Check for secrets in secrets.tf
host_secrets = find_host_secrets(hostname, repo_root)
flake_exists, terraform_exists, vault_exists = check_entries_exist(hostname, repo_root)
# Collect all files in the host directory recursively
files_in_host_dir = sorted([f for f in host_dir.rglob("*") if f.is_file()])
@@ -299,21 +294,6 @@ def handle_remove(
else:
console.print(f" • terraform/vault/hosts-generated.tf [dim](not found)[/dim]")
if approle_exists:
console.print(f' • terraform/vault/approle.tf (host_policies["{hostname}"])')
else:
console.print(f" • terraform/vault/approle.tf [dim](not found)[/dim]")
# Warn about secrets in secrets.tf
if host_secrets:
console.print(f"\n[yellow]⚠️ Warning: Found {len(host_secrets)} secret(s) in terraform/vault/secrets.tf:[/yellow]")
for secret_path in host_secrets:
console.print(f'"{secret_path}"')
console.print(f"\n [yellow]These will NOT be removed automatically.[/yellow]")
console.print(f" After removal, manually edit secrets.tf and run:")
for secret_path in host_secrets:
console.print(f" [white]vault kv delete secret/{secret_path}[/white]")
# Warn about secrets directory
if secrets_exist:
console.print(f"\n[yellow]⚠️ Warning: secrets/{hostname}/ directory exists and will NOT be deleted[/yellow]")
@@ -343,13 +323,6 @@ def handle_remove(
else:
console.print("[yellow]⚠[/yellow] Could not remove from terraform/vault/hosts-generated.tf")
# Remove from terraform/vault/approle.tf
if approle_exists:
if remove_from_approle_tf(hostname, repo_root):
console.print("[green]✓[/green] Removed from terraform/vault/approle.tf")
else:
console.print("[yellow]⚠[/yellow] Could not remove from terraform/vault/approle.tf")
# Remove from terraform/vms.tf
if terraform_exists:
if remove_from_terraform_vms(hostname, repo_root):
@@ -372,34 +345,19 @@ def handle_remove(
console.print(f"\n[bold green]✓ Host {hostname} removed successfully![/bold green]\n")
# Display next steps
display_removal_next_steps(hostname, vault_exists, approle_exists, host_secrets)
display_removal_next_steps(hostname, vault_exists)
def display_removal_next_steps(hostname: str, had_vault: bool, had_approle: bool, host_secrets: list) -> None:
def display_removal_next_steps(hostname: str, had_vault: bool) -> None:
"""Display next steps after successful removal."""
vault_files = ""
if had_vault:
vault_files += " terraform/vault/hosts-generated.tf"
if had_approle:
vault_files += " terraform/vault/approle.tf"
vault_file = " terraform/vault/hosts-generated.tf" if had_vault else ""
vault_apply = ""
if had_vault or had_approle:
if had_vault:
vault_apply = f"""
3. Apply Vault changes:
[white]cd terraform/vault && tofu apply[/white]
"""
secrets_cleanup = ""
if host_secrets:
secrets_cleanup = f"""
5. Clean up secrets (manual):
Edit terraform/vault/secrets.tf to remove entries for {hostname}
Then delete from Vault:"""
for secret_path in host_secrets:
secrets_cleanup += f"\n [white]vault kv delete secret/{secret_path}[/white]"
secrets_cleanup += "\n"
next_steps = f"""[bold cyan]Next Steps:[/bold cyan]
1. Review changes:
@@ -409,9 +367,9 @@ def display_removal_next_steps(hostname: str, had_vault: bool, had_approle: bool
[white]cd terraform && tofu destroy -target='proxmox_vm_qemu.vm["{hostname}"]'[/white]
{vault_apply}
4. Commit changes:
[white]git add -u hosts/{hostname} flake.nix terraform/vms.tf{vault_files}
[white]git add -u hosts/{hostname} flake.nix terraform/vms.tf{vault_file}
git commit -m "hosts: remove {hostname}"[/white]
{secrets_cleanup}"""
"""
console.print(Panel(next_steps, border_style="cyan"))

View File

@@ -144,7 +144,7 @@ resource "vault_approle_auth_backend_role" "generated_hosts" {
backend = vault_auth_backend.approle.path
role_name = each.key
token_policies = ["host-\${each.key}", "homelab-deploy"]
token_policies = ["host-\${each.key}"]
secret_id_ttl = 0 # Never expire (wrapped tokens provide time limit)
token_ttl = 3600
token_max_ttl = 3600

View File

@@ -22,12 +22,12 @@ def remove_from_flake_nix(hostname: str, repo_root: Path) -> bool:
content = flake_path.read_text()
# Check if hostname exists
hostname_pattern = rf"^ {re.escape(hostname)} = nixpkgs\.lib\.nixosSystem"
hostname_pattern = rf"^ {re.escape(hostname)} = nixpkgs\.lib\.nixosSystem"
if not re.search(hostname_pattern, content, re.MULTILINE):
return False
# Match the entire block from "hostname = " to "};"
replace_pattern = rf"^ {re.escape(hostname)} = nixpkgs\.lib\.nixosSystem \{{.*?^ \}};\n"
replace_pattern = rf"^ {re.escape(hostname)} = nixpkgs\.lib\.nixosSystem \{{.*?^ \}};\n"
new_content, count = re.subn(replace_pattern, "", content, flags=re.MULTILINE | re.DOTALL)
if count == 0:
@@ -101,68 +101,7 @@ def remove_from_vault_terraform(hostname: str, repo_root: Path) -> bool:
return True
def remove_from_approle_tf(hostname: str, repo_root: Path) -> bool:
"""
Remove host entry from terraform/vault/approle.tf locals.host_policies.
Args:
hostname: Hostname to remove
repo_root: Path to repository root
Returns:
True if found and removed, False if not found
"""
approle_path = repo_root / "terraform" / "vault" / "approle.tf"
if not approle_path.exists():
return False
content = approle_path.read_text()
# Check if hostname exists in host_policies
hostname_pattern = rf'^\s+"{re.escape(hostname)}" = \{{'
if not re.search(hostname_pattern, content, re.MULTILINE):
return False
# Match the entire block from "hostname" = { to closing }
# The block contains paths = [ ... ] and possibly extra_policies = [...]
replace_pattern = rf'\n?\s+"{re.escape(hostname)}" = \{{[^}}]*\}}\n?'
new_content, count = re.subn(replace_pattern, "\n", content, flags=re.DOTALL)
if count == 0:
return False
approle_path.write_text(new_content)
return True
def find_host_secrets(hostname: str, repo_root: Path) -> list:
"""
Find secrets in terraform/vault/secrets.tf that belong to a host.
Args:
hostname: Hostname to search for
repo_root: Path to repository root
Returns:
List of secret paths found (e.g., ["hosts/hostname/test-service"])
"""
secrets_path = repo_root / "terraform" / "vault" / "secrets.tf"
if not secrets_path.exists():
return []
content = secrets_path.read_text()
# Find all secret paths matching hosts/{hostname}/
pattern = rf'"(hosts/{re.escape(hostname)}/[^"]+)"'
matches = re.findall(pattern, content)
# Return unique paths, preserving order
return list(dict.fromkeys(matches))
def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, bool, bool]:
def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, bool]:
"""
Check which entries exist for a hostname.
@@ -171,12 +110,12 @@ def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, boo
repo_root: Path to repository root
Returns:
Tuple of (flake_exists, terraform_vms_exists, vault_generated_exists, approle_exists)
Tuple of (flake_exists, terraform_vms_exists, vault_exists)
"""
# Check flake.nix
flake_path = repo_root / "flake.nix"
flake_content = flake_path.read_text()
flake_pattern = rf"^ {re.escape(hostname)} = nixpkgs\.lib\.nixosSystem"
flake_pattern = rf"^ {re.escape(hostname)} = nixpkgs\.lib\.nixosSystem"
flake_exists = bool(re.search(flake_pattern, flake_content, re.MULTILINE))
# Check terraform/vms.tf
@@ -192,15 +131,7 @@ def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, boo
vault_content = vault_tf_path.read_text()
vault_exists = f'"{hostname}"' in vault_content
# Check terraform/vault/approle.tf
approle_path = repo_root / "terraform" / "vault" / "approle.tf"
approle_exists = False
if approle_path.exists():
approle_content = approle_path.read_text()
approle_pattern = rf'^\s+"{re.escape(hostname)}" = \{{'
approle_exists = bool(re.search(approle_pattern, approle_content, re.MULTILINE))
return (flake_exists, terraform_exists, vault_exists, approle_exists)
return (flake_exists, terraform_exists, vault_exists)
def update_flake_nix(config: HostConfig, repo_root: Path, force: bool = False) -> None:
@@ -216,25 +147,32 @@ def update_flake_nix(config: HostConfig, repo_root: Path, force: bool = False) -
content = flake_path.read_text()
# Create new entry
new_entry = f""" {config.hostname} = nixpkgs.lib.nixosSystem {{
inherit system;
specialArgs = {{
inherit inputs self sops-nix;
}};
modules = commonModules ++ [
./hosts/{config.hostname}
];
new_entry = f""" {config.hostname} = nixpkgs.lib.nixosSystem {{
inherit system;
specialArgs = {{
inherit inputs self sops-nix;
}};
modules = [
(
{{ config, pkgs, ... }}:
{{
nixpkgs.overlays = commonOverlays;
}}
)
./hosts/{config.hostname}
sops-nix.nixosModules.sops
];
}};
"""
# Check if hostname already exists
hostname_pattern = rf"^ {re.escape(config.hostname)} = nixpkgs\.lib\.nixosSystem"
hostname_pattern = rf"^ {re.escape(config.hostname)} = nixpkgs\.lib\.nixosSystem"
existing_match = re.search(hostname_pattern, content, re.MULTILINE)
if existing_match and force:
# Replace existing entry
# Match the entire block from "hostname = " to "};"
replace_pattern = rf"^ {re.escape(config.hostname)} = nixpkgs\.lib\.nixosSystem \{{.*?^ \}};\n"
replace_pattern = rf"^ {re.escape(config.hostname)} = nixpkgs\.lib\.nixosSystem \{{.*?^ \}};\n"
new_content, count = re.subn(replace_pattern, new_entry, content, flags=re.MULTILINE | re.DOTALL)
if count == 0:

View File

@@ -18,12 +18,6 @@
tier = "test"; # Start in test tier, move to prod after validation
};
# Enable Vault integration
vault.enable = true;
# Enable remote deployment via NATS
homelab.deploy.enable = true;
nixpkgs.config.allowUnfree = true;
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/vda";

View File

@@ -19,15 +19,8 @@ in
natsUrl = "nats://nats1.home.2rjus.net:4222";
nkeyFile = "/run/secrets/homelab-deploy-nkey";
flakeUrl = "git+https://git.t-juice.club/torjus/nixos-servers.git";
metrics.enable = true;
};
# Expose metrics for Prometheus scraping
homelab.monitoring.scrapeTargets = [{
job_name = "homelab-deploy";
port = 9972;
}];
# Ensure listener starts after vault secret is available
systemd.services.homelab-deploy-listener = {
after = [ "vault-secret-homelab-deploy-nkey.service" ];

View File

@@ -101,6 +101,11 @@ locals {
]
}
"vaulttest01" = {
paths = [
"secret/data/hosts/vaulttest01/*",
]
}
}
}

View File

@@ -5,22 +5,12 @@
# Each host gets access to its own secrets under hosts/<hostname>/*
locals {
generated_host_policies = {
"testvm01" = {
"vaulttest01" = {
paths = [
"secret/data/hosts/testvm01/*",
"secret/data/hosts/vaulttest01/*",
]
}
"testvm02" = {
paths = [
"secret/data/hosts/testvm02/*",
]
}
"testvm03" = {
paths = [
"secret/data/hosts/testvm03/*",
]
}
}
# Placeholder secrets - user should add actual secrets manually or via tofu
@@ -50,7 +40,7 @@ resource "vault_approle_auth_backend_role" "generated_hosts" {
backend = vault_auth_backend.approle.path
role_name = each.key
token_policies = ["host-${each.key}", "homelab-deploy"]
token_policies = ["host-${each.key}"]
secret_id_ttl = 0 # Never expire (wrapped tokens provide time limit)
token_ttl = 3600
token_max_ttl = 3600

View File

@@ -45,6 +45,12 @@ locals {
password_length = 24
}
# TODO: Remove after testing
"hosts/vaulttest01/test-service" = {
auto_generate = true
password_length = 32
}
# Shared backup password (auto-generated, add alongside existing restic key)
"shared/backup/password" = {
auto_generate = true

View File

@@ -31,6 +31,13 @@ locals {
# Example Minimal VM using all defaults (uncomment to deploy):
# "minimal-vm" = {}
# "bootstrap-verify-test" = {}
"testvm01" = {
ip = "10.69.13.101/24"
cpu_cores = 2
memory = 2048
disk_size = "20G"
flake_branch = "pipeline-testing-improvements"
}
"vault01" = {
ip = "10.69.13.19/24"
cpu_cores = 2
@@ -38,29 +45,13 @@ locals {
disk_size = "20G"
flake_branch = "vault-setup" # Bootstrap from this branch instead of master
}
"testvm01" = {
ip = "10.69.13.20/24"
"vaulttest01" = {
ip = "10.69.13.150/24"
cpu_cores = 2
memory = 2048
disk_size = "20G"
flake_branch = "deploy-test-hosts"
vault_wrapped_token = "s.YRGRpAZVVtSYEa3wOYOqFmjt"
}
"testvm02" = {
ip = "10.69.13.21/24"
cpu_cores = 2
memory = 2048
disk_size = "20G"
flake_branch = "deploy-test-hosts"
vault_wrapped_token = "s.tvs8yhJOkLjBs548STs6DBw7"
}
"testvm03" = {
ip = "10.69.13.22/24"
cpu_cores = 2
memory = 2048
disk_size = "20G"
flake_branch = "deploy-test-hosts"
vault_wrapped_token = "s.sQ80FZGeG3z6jgrsuh74IopC"
flake_branch = "pki-migration"
vault_wrapped_token = "s.UCpQCOp7cOKDdtGGBvfRWwAt"
}
}