22 Commits

13d6d0ea3a Merge pull request 'improve-bootstrap-visibility' (#29) from improve-bootstrap-visibility into master
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
Reviewed-on: #29
2026-02-07 15:00:09 +00:00
eea000b337 CLAUDE.md: document bootstrap logs in Loki
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 15:57:51 +01:00
f19ba2f4b6 CLAUDE.md: use tofu -chdir instead of cd
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 15:41:59 +01:00
a90d9c33d5 CLAUDE.md: prefer nix develop -c for devshell commands
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 15:39:56 +01:00
09c9df1bbe terraform: regenerate wrapped token for testvm01
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 15:36:25 +01:00
ae3039af19 template2: send bootstrap status to Loki for remote monitoring
Adds log_to_loki function that pushes structured log entries to Loki
at key bootstrap stages (starting, network_ok, vault_*, building,
success, failed). Enables querying bootstrap state via LogQL without
console access.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 15:34:47 +01:00
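The push described in this commit uses Loki's standard `/loki/api/v1/push` JSON body: one stream per label set, with values as `[nanosecond-timestamp, line]` pairs. A minimal Python sketch of the same payload construction — the URL and labels mirror the commit description, but the helper name is mine and this is an illustration, not the script itself:

```python
import json
import time

# Assumed endpoint, taken from the bootstrap script in this repo
LOKI_URL = "http://monitoring01.home.2rjus.net:3100/loki/api/v1/push"

def bootstrap_payload(host: str, stage: str, branch: str, message: str) -> str:
    """Build a Loki push body: one stream keyed by labels, one log entry.

    Loki expects the timestamp as a nanosecond-precision string and each
    entry as a [timestamp, line] pair inside "values".
    """
    ts_ns = str(int(time.time()) * 1_000_000_000)
    return json.dumps({
        "streams": [{
            "stream": {"job": "bootstrap", "host": host,
                       "stage": stage, "branch": branch},
            "values": [[ts_ns, message]],
        }]
    })

# Sending it would then be a plain HTTP POST, e.g.:
#   requests.post(LOKI_URL, data=payload,
#                 headers={"Content-Type": "application/json"}, timeout=5)
```

Because the labels include `stage`, each stage transition becomes its own stream, which is what makes the `{job="bootstrap", stage="failed"}` style queries possible.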
11261c4636 template2: revert to journal+console output for bootstrap
TTY output was causing nixos-rebuild to fail. Keep the custom
greeting line to indicate bootstrap image, but use journal+console
for reliable logging.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 15:24:39 +01:00
4ca3c8890f terraform: add flake_branch and token for testvm01
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 15:14:57 +01:00
78e8d7a600 template2: add ncurses for clear command in bootstrap
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 15:10:25 +01:00
0cf72ec191 terraform: update template to nixos-25.11.20260203.e576e3c
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 15:02:16 +01:00
6a3a51407e playbooks: auto-update terraform template name after deploy
Add a third play to build-and-deploy-template.yml that updates
terraform/variables.tf with the new template name after deploying
to Proxmox. Only updates if the template name has changed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 14:59:13 +01:00
a1ae766eb8 template2: show bootstrap progress on tty1
- Display bootstrap banner and live progress on tty1 instead of login prompt
- Add custom getty greeting on other ttys indicating this is a bootstrap image
- Disable getty on tty1 during bootstrap so output is visible

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 14:49:58 +01:00
11999b37f3 flake: update homelab-deploy
Fixes false "Some deployments failed" warning in MCP server when
deployments are still in progress.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 14:24:41 +01:00
29b2b7db52 Merge branch 'deploy-test-hosts'
Add three permanent test hosts (testvm01, testvm02, testvm03) with:
- Static IPs: 10.69.13.20-22
- Vault AppRole integration with homelab-deploy policy
- Remote deployment via NATS (homelab.deploy.enable)
- Test tier configuration

Also updates create-host template to include vault.enable and
homelab.deploy.enable by default.
2026-02-07 14:09:40 +01:00
b046a1b862 terraform: remove flake_branch from test VMs
VMs are now bootstrapped and running. Remove temporary flake_branch
and vault_wrapped_token settings so they use master going forward.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 14:09:30 +01:00
38348c5980 vault: add homelab-deploy policy to generated hosts
The homelab-deploy listener requires access to shared/homelab-deploy/*
secrets. Update hosts-generated.tf and the generator script to include
this policy automatically.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 14:05:42 +01:00
370cf2b03a hosts: enable vault and deploy listener on test VMs
- Add vault.enable = true to testvm01, testvm02, testvm03
- Add homelab.deploy.enable = true for remote deployment via NATS
- Update create-host template to include these by default

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 13:55:33 +01:00
7bc465b414 hosts: add testvm01, testvm02, testvm03 test hosts
Three permanent test hosts for validating deployment and bootstrapping
workflow. Each host configured with:
- Static IP (10.69.13.20-22/24)
- Vault AppRole integration
- Bootstrap from deploy-test-hosts branch

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 13:34:16 +01:00
8d7bc50108 hosts: remove testvm01
Test host no longer needed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 12:58:24 +01:00
03e70ac094 hosts: remove vaulttest01
Test host no longer needed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 12:55:38 +01:00
3b32c9479f create-host: add approle removal and secrets detection
- Remove host entries from terraform/vault/approle.tf on --remove
- Detect and warn about secrets in terraform/vault/secrets.tf
- Include vault kv delete commands in removal instructions
- Update check_entries_exist to return approle status

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 12:54:42 +01:00
b0d35f9a99 create-host: fix flake.nix indentation patterns
The regex patterns expected 6 spaces of indentation but flake.nix uses
8 spaces for host entries. Also updated generated entry template to
match current flake.nix style (using commonModules ++).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 12:48:29 +01:00
20 changed files with 508 additions and 223 deletions


@@ -61,10 +61,31 @@ Do not run `nix flake update`. Should only be done manually by user.
 ### Development Environment
 
 ```bash
-# Enter development shell (provides ansible, python3)
+# Enter development shell
 nix develop
 ```
+
+The devshell provides: `ansible`, `tofu` (OpenTofu), `vault` (OpenBao CLI), `create-host`, and `homelab-deploy`.
+
+**Important:** When suggesting commands that use devshell tools, always use `nix develop -c <command>` syntax rather than assuming the user is already in a devshell. For example:
+
+```bash
+# Good - works regardless of current shell
+nix develop -c tofu plan
+
+# Avoid - requires user to be in devshell
+tofu plan
+```
+
+**OpenTofu:** Use the `-chdir` option instead of `cd` when running tofu commands in subdirectories:
+
+```bash
+# Good - uses -chdir option
+nix develop -c tofu -chdir=terraform plan
+nix develop -c tofu -chdir=terraform/vault apply
+
+# Avoid - changing directories
+cd terraform && tofu plan
+```
+
 ### Secrets Management
 
 Secrets are managed by OpenBao (Vault) using AppRole authentication. Most hosts use the
@@ -140,11 +161,27 @@ The **lab-monitoring** MCP server can query logs from Loki. All hosts ship syste
 - `host` - Hostname (e.g., `ns1`, `ns2`, `monitoring01`, `ha1`). Use this label, not `hostname`.
 - `systemd_unit` - Systemd unit name (e.g., `nsd.service`, `prometheus.service`, `nixos-upgrade.service`)
-- `job` - Either `systemd-journal` (most logs) or `varlog` (file-based logs like caddy access logs)
+- `job` - Either `systemd-journal` (most logs), `varlog` (file-based logs), or `bootstrap` (VM bootstrap logs)
 - `filename` - For `varlog` job, the log file path (e.g., `/var/log/caddy/nix-cache.log`)
 
 Journal log entries are JSON-formatted with the actual log message in the `MESSAGE` field. Other useful fields include `PRIORITY` and `SYSLOG_IDENTIFIER`.
 
+**Bootstrap Logs:**
+
+VMs provisioned from template2 send bootstrap progress directly to Loki via curl (before promtail is available). These logs use `job="bootstrap"` with additional labels:
+
+- `host` - Target hostname
+- `branch` - Git branch being deployed
+- `stage` - Bootstrap stage: `starting`, `network_ok`, `vault_ok`/`vault_skip`/`vault_warn`, `building`, `success`, `failed`
+
+Query bootstrap status:
+
+```
+{job="bootstrap"}                              # All bootstrap logs
+{job="bootstrap", host="testvm01"}             # Specific host
+{job="bootstrap", stage="failed"}              # All failures
+{job="bootstrap", stage=~"building|success"}   # Track build progress
+```
+
 **Example LogQL queries:**
 ```
 # Logs from a specific service on a host

flake.lock generated

@@ -28,11 +28,11 @@
       ]
     },
     "locked": {
-      "lastModified": 1770447502,
-      "narHash": "sha256-xH1PNyE3ydj4udhe1IpK8VQxBPZETGLuORZdSWYRmSU=",
+      "lastModified": 1770470613,
+      "narHash": "sha256-FOqOBdsQnLzA7+Y3hXuN9PcOHmpEDvNw64Ju8op9J2w=",
       "ref": "master",
-      "rev": "79db119d1ca6630023947ef0a65896cc3307c2ff",
-      "revCount": 22,
+      "rev": "36a74b8cf9a873202e28eecfb90f9af8650cca8b",
+      "revCount": 23,
       "type": "git",
       "url": "https://git.t-juice.club/torjus/homelab-deploy"
     },


@@ -186,15 +186,6 @@
       ./hosts/nats1
     ];
   };
-  testvm01 = nixpkgs.lib.nixosSystem {
-    inherit system;
-    specialArgs = {
-      inherit inputs self sops-nix;
-    };
-    modules = commonModules ++ [
-      ./hosts/testvm01
-    ];
-  };
   vault01 = nixpkgs.lib.nixosSystem {
     inherit system;
     specialArgs = {
@@ -204,13 +195,31 @@
       ./hosts/vault01
     ];
   };
-  vaulttest01 = nixpkgs.lib.nixosSystem {
+  testvm01 = nixpkgs.lib.nixosSystem {
     inherit system;
     specialArgs = {
       inherit inputs self sops-nix;
     };
     modules = commonModules ++ [
-      ./hosts/vaulttest01
+      ./hosts/testvm01
+    ];
+  };
+  testvm02 = nixpkgs.lib.nixosSystem {
+    inherit system;
+    specialArgs = {
+      inherit inputs self sops-nix;
+    };
+    modules = commonModules ++ [
+      ./hosts/testvm02
+    ];
+  };
+  testvm03 = nixpkgs.lib.nixosSystem {
+    inherit system;
+    specialArgs = {
+      inherit inputs self sops-nix;
+    };
+    modules = commonModules ++ [
+      ./hosts/testvm03
     ];
   };
 };


@@ -6,22 +6,72 @@ let
     text = ''
       set -euo pipefail
 
+      LOKI_URL="http://monitoring01.home.2rjus.net:3100/loki/api/v1/push"
+
+      # Send a log entry to Loki with bootstrap status
+      # Usage: log_to_loki <stage> <message>
+      # Fails silently if Loki is unreachable
+      log_to_loki() {
+        local stage="$1"
+        local message="$2"
+        local timestamp_ns
+        timestamp_ns="$(date +%s)000000000"
+        local payload
+        payload=$(jq -n \
+          --arg host "$HOSTNAME" \
+          --arg stage "$stage" \
+          --arg branch "''${BRANCH:-master}" \
+          --arg ts "$timestamp_ns" \
+          --arg msg "$message" \
+          '{
+            streams: [{
+              stream: {
+                job: "bootstrap",
+                host: $host,
+                stage: $stage,
+                branch: $branch
+              },
+              values: [[$ts, $msg]]
+            }]
+          }')
+        curl -s --connect-timeout 2 --max-time 5 \
+          -X POST \
+          -H "Content-Type: application/json" \
+          -d "$payload" \
+          "$LOKI_URL" >/dev/null 2>&1 || true
+      }
+
+      echo "================================================================================"
+      echo "                       NIXOS BOOTSTRAP IN PROGRESS"
+      echo "================================================================================"
+      echo ""
+
       # Read hostname set by cloud-init (from Terraform VM name via user-data)
       # Cloud-init sets the system hostname from user-data.txt, so we read it from hostnamectl
       HOSTNAME=$(hostnamectl hostname)
-      echo "DEBUG: Hostname from hostnamectl: '$HOSTNAME'"
+
+      # Read git branch from environment, default to master
+      BRANCH="''${NIXOS_FLAKE_BRANCH:-master}"
+
+      echo "Hostname: $HOSTNAME"
+      echo ""
 
       echo "Starting NixOS bootstrap for host: $HOSTNAME"
+      log_to_loki "starting" "Bootstrap starting for $HOSTNAME (branch: $BRANCH)"
+
       echo "Waiting for network connectivity..."
 
       # Verify we can reach the git server via HTTPS (doesn't respond to ping)
       if ! curl -s --connect-timeout 5 --max-time 10 https://git.t-juice.club >/dev/null 2>&1; then
         echo "ERROR: Cannot reach git.t-juice.club via HTTPS"
         echo "Check network configuration and DNS settings"
+        log_to_loki "failed" "Network check failed - cannot reach git.t-juice.club"
         exit 1
       fi
       echo "Network connectivity confirmed"
+      log_to_loki "network_ok" "Network connectivity confirmed"
 
       # Unwrap Vault token and store AppRole credentials (if provided)
       if [ -n "''${VAULT_WRAPPED_TOKEN:-}" ]; then
@@ -50,6 +100,7 @@ let
           chmod 600 /var/lib/vault/approle/secret-id
 
           echo "Vault credentials unwrapped and stored successfully"
+          log_to_loki "vault_ok" "Vault credentials unwrapped and stored"
         else
           echo "WARNING: Failed to unwrap Vault token"
           if [ -n "$UNWRAP_RESPONSE" ]; then
@@ -63,17 +114,17 @@ let
           echo "To regenerate token, run: create-host --hostname $HOSTNAME --force"
           echo ""
           echo "Vault secrets will not be available, but continuing bootstrap..."
+          log_to_loki "vault_warn" "Failed to unwrap Vault token - continuing without secrets"
         fi
       else
         echo "No Vault wrapped token provided (VAULT_WRAPPED_TOKEN not set)"
         echo "Skipping Vault credential setup"
+        log_to_loki "vault_skip" "No Vault token provided - skipping credential setup"
       fi
 
       echo "Fetching and building NixOS configuration from flake..."
 
-      # Read git branch from environment, default to master
-      BRANCH="''${NIXOS_FLAKE_BRANCH:-master}"
       echo "Using git branch: $BRANCH"
+      log_to_loki "building" "Starting nixos-rebuild boot"
 
       # Build and activate the host-specific configuration
       FLAKE_URL="git+https://git.t-juice.club/torjus/nixos-servers.git?ref=$BRANCH#''${HOSTNAME}"
@@ -81,18 +132,30 @@ let
       if nixos-rebuild boot --flake "$FLAKE_URL"; then
         echo "Successfully built configuration for $HOSTNAME"
         echo "Rebooting into new configuration..."
+        log_to_loki "success" "Build successful - rebooting into new configuration"
         sleep 2
         systemctl reboot
       else
         echo "ERROR: nixos-rebuild failed for $HOSTNAME"
         echo "Check that flake has configuration for this hostname"
         echo "Manual intervention required - system will not reboot"
+        log_to_loki "failed" "nixos-rebuild failed - manual intervention required"
         exit 1
       fi
     '';
   };
 in
 {
+  # Custom greeting line to indicate this is a bootstrap image
+  services.getty.greetingLine = lib.mkForce ''
+    ================================================================================
+                    BOOTSTRAP IMAGE - NixOS \V (\l)
+    ================================================================================
+    Bootstrap service is running. Logs are displayed on tty1.
+    Check status: journalctl -fu nixos-bootstrap
+  '';
+
   systemd.services."nixos-bootstrap" = {
     description = "Bootstrap NixOS configuration from flake on first boot";
@@ -107,12 +170,12 @@ in
     serviceConfig = {
       Type = "oneshot";
       RemainAfterExit = true;
-      ExecStart = "${bootstrap-script}/bin/nixos-bootstrap";
+      ExecStart = lib.getExe bootstrap-script;
       # Read environment variables from cloud-init (set by cloud-init write_files)
       EnvironmentFile = "-/run/cloud-init-env";
-      # Logging to journald
+      # Log to journal and console
       StandardOutput = "journal+console";
       StandardError = "journal+console";
     };


@@ -13,14 +13,17 @@
     ../../common/vm
   ];
 
-  # Test VM - exclude from DNS zone generation
-  homelab.dns.enable = false;
+  # Host metadata (adjust as needed)
   homelab.host = {
-    tier = "test";
-    priority = "low";
+    tier = "test"; # Start in test tier, move to prod after validation
   };
 
+  # Enable Vault integration
+  vault.enable = true;
+
+  # Enable remote deployment via NATS
+  homelab.deploy.enable = true;
+
   nixpkgs.config.allowUnfree = true;
 
   boot.loader.grub.enable = true;
   boot.loader.grub.device = "/dev/vda";
@@ -29,7 +32,7 @@
   networking.domain = "home.2rjus.net";
   networking.useNetworkd = true;
   networking.useDHCP = false;
-  services.resolved.enable = false;
+  services.resolved.enable = true;
   networking.nameservers = [
     "10.69.13.5"
     "10.69.13.6"
@@ -39,7 +42,7 @@
   systemd.network.networks."ens18" = {
     matchConfig.Name = "ens18";
     address = [
-      "10.69.13.101/24"
+      "10.69.13.20/24"
     ];
     routes = [
       { Gateway = "10.69.13.1"; }


@@ -0,0 +1,72 @@
{
  config,
  lib,
  pkgs,
  ...
}:
{
  imports = [
    ../template2/hardware-configuration.nix
    ../../system
    ../../common/vm
  ];

  # Host metadata (adjust as needed)
  homelab.host = {
    tier = "test"; # Start in test tier, move to prod after validation
  };

  # Enable Vault integration
  vault.enable = true;

  # Enable remote deployment via NATS
  homelab.deploy.enable = true;

  nixpkgs.config.allowUnfree = true;

  boot.loader.grub.enable = true;
  boot.loader.grub.device = "/dev/vda";

  networking.hostName = "testvm02";
  networking.domain = "home.2rjus.net";
  networking.useNetworkd = true;
  networking.useDHCP = false;
  services.resolved.enable = true;
  networking.nameservers = [
    "10.69.13.5"
    "10.69.13.6"
  ];
  systemd.network.enable = true;
  systemd.network.networks."ens18" = {
    matchConfig.Name = "ens18";
    address = [
      "10.69.13.21/24"
    ];
    routes = [
      { Gateway = "10.69.13.1"; }
    ];
    linkConfig.RequiredForOnline = "routable";
  };

  time.timeZone = "Europe/Oslo";

  nix.settings.experimental-features = [
    "nix-command"
    "flakes"
  ];
  nix.settings.tarball-ttl = 0;

  environment.systemPackages = with pkgs; [
    vim
    wget
    git
  ];

  # Open ports in the firewall.
  # networking.firewall.allowedTCPPorts = [ ... ];
  # networking.firewall.allowedUDPPorts = [ ... ];
  # Or disable the firewall altogether.
  networking.firewall.enable = false;

  system.stateVersion = "25.11"; # Did you read the comment?
}


@@ -0,0 +1,72 @@
{
  config,
  lib,
  pkgs,
  ...
}:
{
  imports = [
    ../template2/hardware-configuration.nix
    ../../system
    ../../common/vm
  ];

  # Host metadata (adjust as needed)
  homelab.host = {
    tier = "test"; # Start in test tier, move to prod after validation
  };

  # Enable Vault integration
  vault.enable = true;

  # Enable remote deployment via NATS
  homelab.deploy.enable = true;

  nixpkgs.config.allowUnfree = true;

  boot.loader.grub.enable = true;
  boot.loader.grub.device = "/dev/vda";

  networking.hostName = "testvm03";
  networking.domain = "home.2rjus.net";
  networking.useNetworkd = true;
  networking.useDHCP = false;
  services.resolved.enable = true;
  networking.nameservers = [
    "10.69.13.5"
    "10.69.13.6"
  ];
  systemd.network.enable = true;
  systemd.network.networks."ens18" = {
    matchConfig.Name = "ens18";
    address = [
      "10.69.13.22/24"
    ];
    routes = [
      { Gateway = "10.69.13.1"; }
    ];
    linkConfig.RequiredForOnline = "routable";
  };

  time.timeZone = "Europe/Oslo";

  nix.settings.experimental-features = [
    "nix-command"
    "flakes"
  ];
  nix.settings.tarball-ttl = 0;

  environment.systemPackages = with pkgs; [
    vim
    wget
    git
  ];

  # Open ports in the firewall.
  # networking.firewall.allowedTCPPorts = [ ... ];
  # networking.firewall.allowedUDPPorts = [ ... ];
  # Or disable the firewall altogether.
  networking.firewall.enable = false;

  system.stateVersion = "25.11"; # Did you read the comment?
}


@@ -0,0 +1,5 @@
{ ... }: {
  imports = [
    ./configuration.nix
  ];
}


@@ -1,135 +0,0 @@
{
  config,
  lib,
  pkgs,
  ...
}:
let
  vault-test-script = pkgs.writeShellApplication {
    name = "vault-test";
    text = ''
      echo "=== Vault Secret Test ==="
      echo "Secret path: hosts/vaulttest01/test-service"

      if [ -f /run/secrets/test-service/password ]; then
        echo " Password file exists"
        echo "Password length: $(wc -c < /run/secrets/test-service/password)"
      else
        echo " Password file missing!"
        exit 1
      fi

      if [ -d /var/lib/vault/cache/test-service ]; then
        echo " Cache directory exists"
      else
        echo " Cache directory missing!"
        exit 1
      fi

      echo "Test successful!"
    '';
  };
in
{
  imports = [
    ../template2/hardware-configuration.nix
    ../../system
    ../../common/vm
  ];

  homelab.host = {
    tier = "test";
    priority = "low";
    role = "vault";
  };

  nixpkgs.config.allowUnfree = true;

  boot.loader.grub.enable = true;
  boot.loader.grub.device = "/dev/vda";

  networking.hostName = "vaulttest01";
  networking.domain = "home.2rjus.net";
  networking.useNetworkd = true;
  networking.useDHCP = false;
  services.resolved.enable = true;
  networking.nameservers = [
    "10.69.13.5"
    "10.69.13.6"
  ];
  systemd.network.enable = true;
  systemd.network.networks."ens18" = {
    matchConfig.Name = "ens18";
    address = [
      "10.69.13.150/24"
    ];
    routes = [
      { Gateway = "10.69.13.1"; }
    ];
    linkConfig.RequiredForOnline = "routable";
  };

  time.timeZone = "Europe/Oslo";

  nix.settings.experimental-features = [
    "nix-command"
    "flakes"
  ];
  nix.settings.tarball-ttl = 0;

  environment.systemPackages = with pkgs; [
    vim
    wget
    git
    htop # test deploy verification
  ];

  # Open ports in the firewall.
  # networking.firewall.allowedTCPPorts = [ ... ];
  # networking.firewall.allowedUDPPorts = [ ... ];
  # Or disable the firewall altogether.
  networking.firewall.enable = false;

  # Testing config
  # Enable Vault secrets management
  vault.enable = true;
  homelab.deploy.enable = true;

  # Define a test secret
  vault.secrets.test-service = {
    secretPath = "hosts/vaulttest01/test-service";
    restartTrigger = true;
    restartInterval = "daily";
    services = [ "vault-test" ];
  };

  # Create a test service that uses the secret
  systemd.services.vault-test = {
    description = "Test Vault secret fetching";
    wantedBy = [ "multi-user.target" ];
    after = [ "vault-secret-test-service.service" ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
      ExecStart = lib.getExe vault-test-script;
      StandardOutput = "journal+console";
    };
  };

  # Test ACME certificate issuance from OpenBao PKI
  # Override the global ACME server (from system/acme.nix) to use OpenBao instead of step-ca
  security.acme.defaults.server = lib.mkForce "https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory";

  # Request a certificate for this host
  # Using HTTP-01 challenge with standalone listener on port 80
  security.acme.certs."vaulttest01.home.2rjus.net" = {
    listenHTTP = ":80";
    enableDebugLogs = true;
  };

  system.stateVersion = "25.11"; # Did you read the comment?
}


@@ -99,3 +99,48 @@
     - name: Display success message
       ansible.builtin.debug:
         msg: "Template VM {{ template_vmid }} created successfully on {{ storage }}"
+
+- name: Update Terraform template name
+  hosts: localhost
+  gather_facts: false
+  vars:
+    terraform_dir: "{{ playbook_dir }}/../terraform"
+  tasks:
+    - name: Get image filename from earlier play
+      ansible.builtin.set_fact:
+        image_filename: "{{ hostvars['localhost']['image_filename'] }}"
+
+    - name: Extract template name from image filename
+      ansible.builtin.set_fact:
+        new_template_name: "{{ image_filename | regex_replace('\\.vma\\.zst$', '') | regex_replace('^vzdump-qemu-', '') }}"
+
+    - name: Read current Terraform variables file
+      ansible.builtin.slurp:
+        src: "{{ terraform_dir }}/variables.tf"
+      register: variables_tf_content
+
+    - name: Extract current template name from variables.tf
+      ansible.builtin.set_fact:
+        current_template_name: "{{ (variables_tf_content.content | b64decode) | regex_search('variable \"default_template_name\"[^}]+default\\s*=\\s*\"([^\"]+)\"', '\\1') | first }}"
+
+    - name: Check if template name has changed
+      ansible.builtin.set_fact:
+        template_name_changed: "{{ current_template_name != new_template_name }}"
+
+    - name: Display template name status
+      ansible.builtin.debug:
+        msg: "Template name: {{ current_template_name }} -> {{ new_template_name }} ({{ 'changed' if template_name_changed else 'unchanged' }})"
+
+    - name: Update default_template_name in variables.tf
+      ansible.builtin.replace:
+        path: "{{ terraform_dir }}/variables.tf"
+        regexp: '(variable "default_template_name"[^}]+default\s*=\s*)"[^"]+"'
+        replace: '\1"{{ new_template_name }}"'
+      when: template_name_changed
+
+    - name: Display update result
+      ansible.builtin.debug:
+        msg: "Updated terraform/variables.tf with new template name: {{ new_template_name }}"
+      when: template_name_changed
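The name derivation in the play above is two chained `regex_replace` filters: strip the `.vma.zst` suffix, then the `vzdump-qemu-` prefix. The same transformation in plain Python, for reference (the sample filename follows the naming used elsewhere in this log, but is mine):

```python
import re

def template_name(image_filename: str) -> str:
    """Derive the Proxmox template name from a vzdump backup filename,
    mirroring the playbook's chained regex_replace filters."""
    name = re.sub(r"\.vma\.zst$", "", image_filename)
    return re.sub(r"^vzdump-qemu-", "", name)

print(template_name("vzdump-qemu-nixos-25.11.20260203.e576e3c.vma.zst"))
# → nixos-25.11.20260203.e576e3c
```

Anchoring both patterns (`$` and `^`) keeps the substitutions from touching anything in the middle of the name.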


@@ -18,6 +18,8 @@ from manipulators import (
remove_from_flake_nix, remove_from_flake_nix,
remove_from_terraform_vms, remove_from_terraform_vms,
remove_from_vault_terraform, remove_from_vault_terraform,
remove_from_approle_tf,
find_host_secrets,
check_entries_exist, check_entries_exist,
) )
from models import HostConfig from models import HostConfig
@@ -255,7 +257,10 @@ def handle_remove(
sys.exit(1) sys.exit(1)
# Check what entries exist # Check what entries exist
flake_exists, terraform_exists, vault_exists = check_entries_exist(hostname, repo_root) flake_exists, terraform_exists, vault_exists, approle_exists = check_entries_exist(hostname, repo_root)
# Check for secrets in secrets.tf
host_secrets = find_host_secrets(hostname, repo_root)
# Collect all files in the host directory recursively # Collect all files in the host directory recursively
files_in_host_dir = sorted([f for f in host_dir.rglob("*") if f.is_file()]) files_in_host_dir = sorted([f for f in host_dir.rglob("*") if f.is_file()])
@@ -294,6 +299,21 @@ def handle_remove(
else: else:
console.print(f" • terraform/vault/hosts-generated.tf [dim](not found)[/dim]") console.print(f" • terraform/vault/hosts-generated.tf [dim](not found)[/dim]")
if approle_exists:
console.print(f' • terraform/vault/approle.tf (host_policies["{hostname}"])')
else:
console.print(f" • terraform/vault/approle.tf [dim](not found)[/dim]")
# Warn about secrets in secrets.tf
if host_secrets:
console.print(f"\n[yellow]⚠️ Warning: Found {len(host_secrets)} secret(s) in terraform/vault/secrets.tf:[/yellow]")
for secret_path in host_secrets:
console.print(f'"{secret_path}"')
console.print(f"\n [yellow]These will NOT be removed automatically.[/yellow]")
console.print(f" After removal, manually edit secrets.tf and run:")
for secret_path in host_secrets:
console.print(f" [white]vault kv delete secret/{secret_path}[/white]")
# Warn about secrets directory # Warn about secrets directory
if secrets_exist: if secrets_exist:
console.print(f"\n[yellow]⚠️ Warning: secrets/{hostname}/ directory exists and will NOT be deleted[/yellow]") console.print(f"\n[yellow]⚠️ Warning: secrets/{hostname}/ directory exists and will NOT be deleted[/yellow]")
@@ -323,6 +343,13 @@ def handle_remove(
else: else:
console.print("[yellow]⚠[/yellow] Could not remove from terraform/vault/hosts-generated.tf") console.print("[yellow]⚠[/yellow] Could not remove from terraform/vault/hosts-generated.tf")
# Remove from terraform/vault/approle.tf
if approle_exists:
if remove_from_approle_tf(hostname, repo_root):
console.print("[green]✓[/green] Removed from terraform/vault/approle.tf")
else:
+                console.print("[yellow]⚠[/yellow] Could not remove from terraform/vault/approle.tf")

     # Remove from terraform/vms.tf
     if terraform_exists:
         if remove_from_terraform_vms(hostname, repo_root):
@@ -345,19 +372,34 @@ def handle_remove(
     console.print(f"\n[bold green]✓ Host {hostname} removed successfully![/bold green]\n")

     # Display next steps
-    display_removal_next_steps(hostname, vault_exists)
+    display_removal_next_steps(hostname, vault_exists, approle_exists, host_secrets)


-def display_removal_next_steps(hostname: str, had_vault: bool) -> None:
+def display_removal_next_steps(hostname: str, had_vault: bool, had_approle: bool, host_secrets: list) -> None:
     """Display next steps after successful removal."""
-    vault_file = " terraform/vault/hosts-generated.tf" if had_vault else ""
-    vault_apply = ""
-    if had_vault:
+    vault_files = ""
+    if had_vault:
+        vault_files += " terraform/vault/hosts-generated.tf"
+    if had_approle:
+        vault_files += " terraform/vault/approle.tf"
+
+    vault_apply = ""
+    if had_vault or had_approle:
         vault_apply = f"""
 3. Apply Vault changes:
    [white]cd terraform/vault && tofu apply[/white]
 """
+
+    secrets_cleanup = ""
+    if host_secrets:
+        secrets_cleanup = f"""
+5. Clean up secrets (manual):
+   Edit terraform/vault/secrets.tf to remove entries for {hostname}
+   Then delete from Vault:"""
+        for secret_path in host_secrets:
+            secrets_cleanup += f"\n   [white]vault kv delete secret/{secret_path}[/white]"
+        secrets_cleanup += "\n"
+
     next_steps = f"""[bold cyan]Next Steps:[/bold cyan]

 1. Review changes:
@@ -367,9 +409,9 @@ def display_removal_next_steps(hostname: str, had_vault: bool) -> None:
    [white]cd terraform && tofu destroy -target='proxmox_vm_qemu.vm["{hostname}"]'[/white]
 {vault_apply}
 4. Commit changes:
-   [white]git add -u hosts/{hostname} flake.nix terraform/vms.tf{vault_file}
+   [white]git add -u hosts/{hostname} flake.nix terraform/vms.tf{vault_files}
    git commit -m "hosts: remove {hostname}"[/white]
-"""
+{secrets_cleanup}"""
     console.print(Panel(next_steps, border_style="cyan"))
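As a standalone illustration, the `secrets_cleanup` section of the next-steps panel can be exercised with hypothetical inputs (the hostname and secret paths below are made up):

```python
# Hypothetical inputs mirroring what handle_remove passes in.
hostname = "testvm01"
host_secrets = ["hosts/testvm01/test-service", "hosts/testvm01/db-password"]

secrets_cleanup = ""
if host_secrets:
    # Header for the manual-cleanup step.
    secrets_cleanup = f"""
5. Clean up secrets (manual):
   Edit terraform/vault/secrets.tf to remove entries for {hostname}
   Then delete from Vault:"""
    # One `vault kv delete` line per leftover secret path.
    for secret_path in host_secrets:
        secrets_cleanup += f"\n   [white]vault kv delete secret/{secret_path}[/white]"
    secrets_cleanup += "\n"

print(secrets_cleanup)
```

With no leftover secrets, `secrets_cleanup` stays empty and the panel omits step 5 entirely.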


@@ -144,7 +144,7 @@ resource "vault_approle_auth_backend_role" "generated_hosts" {
   backend        = vault_auth_backend.approle.path
   role_name      = each.key
-  token_policies = ["host-${each.key}"]
+  token_policies = ["host-${each.key}", "homelab-deploy"]
   secret_id_ttl  = 0 # Never expire (wrapped tokens provide time limit)
   token_ttl      = 3600
   token_max_ttl  = 3600


@@ -101,7 +101,68 @@ def remove_from_vault_terraform(hostname: str, repo_root: Path) -> bool:
     return True


+def remove_from_approle_tf(hostname: str, repo_root: Path) -> bool:
+    """
+    Remove host entry from terraform/vault/approle.tf locals.host_policies.
+
+    Args:
+        hostname: Hostname to remove
+        repo_root: Path to repository root
+
+    Returns:
+        True if found and removed, False if not found
+    """
+    approle_path = repo_root / "terraform" / "vault" / "approle.tf"
+    if not approle_path.exists():
+        return False
+
+    content = approle_path.read_text()
+
+    # Check if hostname exists in host_policies
+    hostname_pattern = rf'^\s+"{re.escape(hostname)}" = \{{'
+    if not re.search(hostname_pattern, content, re.MULTILINE):
+        return False
+
+    # Match the entire block from "hostname" = { to the closing }.
+    # The block contains paths = [ ... ] and possibly extra_policies = [...]
+    replace_pattern = rf'\n?\s+"{re.escape(hostname)}" = \{{[^}}]*\}}\n?'
+    new_content, count = re.subn(replace_pattern, "\n", content, flags=re.DOTALL)
+    if count == 0:
+        return False
+
+    approle_path.write_text(new_content)
+    return True
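The block-removal regex in `remove_from_approle_tf` can be sketched in isolation; the HCL fragment below is a hypothetical approle.tf excerpt, and note that `[^}]*` stops at the first closing brace, so this only works because the host blocks contain no nested braces:

```python
import re

# Hypothetical approle.tf fragment with two host entries.
content = '''locals {
  generated_host_policies = {
    "testvm01" = {
      paths = [
        "secret/data/hosts/testvm01/*",
      ]
    }
    "testvm02" = {
      paths = [
        "secret/data/hosts/testvm02/*",
      ]
    }
  }
}
'''

hostname = "testvm01"
# Same pattern shape as remove_from_approle_tf: the whole `"<host>" = { ... }`
# block, plus surrounding newlines, collapsed to a single newline.
replace_pattern = rf'\n?\s+"{re.escape(hostname)}" = \{{[^}}]*\}}\n?'
new_content, count = re.subn(replace_pattern, "\n", content, flags=re.DOTALL)
```

Running this removes exactly one block: the testvm01 entry (including its `paths` line) disappears while the testvm02 entry is untouched.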
+def find_host_secrets(hostname: str, repo_root: Path) -> list:
+    """
+    Find secrets in terraform/vault/secrets.tf that belong to a host.
+
+    Args:
+        hostname: Hostname to search for
+        repo_root: Path to repository root
+
+    Returns:
+        List of secret paths found (e.g., ["hosts/hostname/test-service"])
+    """
+    secrets_path = repo_root / "terraform" / "vault" / "secrets.tf"
+    if not secrets_path.exists():
+        return []
+
+    content = secrets_path.read_text()
+
+    # Find all secret paths matching hosts/{hostname}/
+    pattern = rf'"(hosts/{re.escape(hostname)}/[^"]+)"'
+    matches = re.findall(pattern, content)
+
+    # Return unique paths, preserving order
+    return list(dict.fromkeys(matches))
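A quick sketch of the extraction and de-duplication done by `find_host_secrets`; the secrets.tf fragment below is hypothetical, with one path deliberately repeated to show why `dict.fromkeys` is used:

```python
import re

# Hypothetical secrets.tf fragment: the same quoted path appears twice
# (a definition plus a later reference), and one path belongs to another host.
content = '''
  "hosts/testvm01/test-service" = {
    auto_generate   = true
    password_length = 32
  }
  # see also "hosts/testvm01/test-service"
  "hosts/testvm01/db-password" = {
    auto_generate = true
  }
  "hosts/otherhost/token" = {}
'''

hostname = "testvm01"
pattern = rf'"(hosts/{re.escape(hostname)}/[^"]+)"'
matches = re.findall(pattern, content)

# dict.fromkeys keeps first-seen order while dropping duplicates.
unique = list(dict.fromkeys(matches))
```

Only testvm01's paths are captured, and the duplicate collapses to a single entry in insertion order.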
-def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, bool]:
+def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, bool, bool]:
     """
     Check which entries exist for a hostname.
@@ -110,7 +171,7 @@ def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, boo
         repo_root: Path to repository root

     Returns:
-        Tuple of (flake_exists, terraform_vms_exists, vault_exists)
+        Tuple of (flake_exists, terraform_vms_exists, vault_generated_exists, approle_exists)
     """
     # Check flake.nix
     flake_path = repo_root / "flake.nix"
@@ -131,7 +192,15 @@ def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, boo
         vault_content = vault_tf_path.read_text()
         vault_exists = f'"{hostname}"' in vault_content

-    return (flake_exists, terraform_exists, vault_exists)
+    # Check terraform/vault/approle.tf
+    approle_path = repo_root / "terraform" / "vault" / "approle.tf"
+    approle_exists = False
+    if approle_path.exists():
+        approle_content = approle_path.read_text()
+        approle_pattern = rf'^\s+"{re.escape(hostname)}" = \{{'
+        approle_exists = bool(re.search(approle_pattern, approle_content, re.MULTILINE))
+
+    return (flake_exists, terraform_exists, vault_exists, approle_exists)
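The anchored, multiline pattern used for the approle check matters: unlike the plain substring test used for hosts-generated.tf, it only fires on a `"<hostname>" = {` block opener, not on a hostname that merely appears inside some path string. A standalone sketch (the HCL fragment is hypothetical):

```python
import re

def entry_exists(hostname: str, content: str) -> bool:
    # Same anchored pattern as check_entries_exist: an indented
    # `"<hostname>" = {` at the start of a line, nothing else.
    pattern = rf'^\s+"{re.escape(hostname)}" = \{{'
    return bool(re.search(pattern, content, re.MULTILINE))

# Hypothetical fragment: testvm01 has a block; testvm02 only appears
# inside a secret path string.
hcl = '''locals {
  generated_host_policies = {
    "testvm01" = {
      paths = ["secret/data/hosts/testvm02/*"]
    }
  }
}
'''
```

Here `entry_exists("testvm01", hcl)` is true while `entry_exists("testvm02", hcl)` is false, even though both hostnames occur as substrings.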
 def update_flake_nix(config: HostConfig, repo_root: Path, force: bool = False) -> None:
@@ -152,15 +221,8 @@ def update_flake_nix(config: HostConfig, repo_root: Path, force: bool = False) -
       specialArgs = {{
         inherit inputs self sops-nix;
       }};
-      modules = [
-        (
-          {{ config, pkgs, ... }}:
-          {{
-            nixpkgs.overlays = commonOverlays;
-          }}
-        )
+      modules = commonModules ++ [
         ./hosts/{config.hostname}
-        sops-nix.nixosModules.sops
       ];
     }};
"""


@@ -18,6 +18,12 @@
     tier = "test"; # Start in test tier, move to prod after validation
   };

+  # Enable Vault integration
+  vault.enable = true;
+
+  # Enable remote deployment via NATS
+  homelab.deploy.enable = true;
+
   nixpkgs.config.allowUnfree = true;

   boot.loader.grub.enable = true;
   boot.loader.grub.device = "/dev/vda";


@@ -33,7 +33,7 @@ variable "default_target_node" {
 variable "default_template_name" {
   description = "Default template VM name to clone from"
   type        = string
-  default     = "nixos-25.11.20260131.41e216c"
+  default     = "nixos-25.11.20260203.e576e3c"
 }

 variable "default_ssh_public_key" {


@@ -101,11 +101,6 @@ locals {
       ]
     }

-    "vaulttest01" = {
-      paths = [
-        "secret/data/hosts/vaulttest01/*",
-      ]
-    }
   }
 }


@@ -5,9 +5,19 @@
 # Each host gets access to its own secrets under hosts/<hostname>/*
 locals {
   generated_host_policies = {
-    "vaulttest01" = {
+    "testvm01" = {
       paths = [
-        "secret/data/hosts/vaulttest01/*",
+        "secret/data/hosts/testvm01/*",
+      ]
+    }
+    "testvm02" = {
+      paths = [
+        "secret/data/hosts/testvm02/*",
+      ]
+    }
+    "testvm03" = {
+      paths = [
+        "secret/data/hosts/testvm03/*",
       ]
     }
@@ -40,7 +50,7 @@ resource "vault_approle_auth_backend_role" "generated_hosts" {
   backend        = vault_auth_backend.approle.path
   role_name      = each.key
-  token_policies = ["host-${each.key}"]
+  token_policies = ["host-${each.key}", "homelab-deploy"]
   secret_id_ttl  = 0 # Never expire (wrapped tokens provide time limit)
   token_ttl      = 3600
   token_max_ttl  = 3600


@@ -45,12 +45,6 @@ locals {
       password_length = 24
     }

-    # TODO: Remove after testing
-    "hosts/vaulttest01/test-service" = {
-      auto_generate   = true
-      password_length = 32
-    }
-
     # Shared backup password (auto-generated, add alongside existing restic key)
     "shared/backup/password" = {
       auto_generate = true


@@ -31,13 +31,6 @@ locals {
     # Example Minimal VM using all defaults (uncomment to deploy):
     # "minimal-vm" = {}
     # "bootstrap-verify-test" = {}

-    "testvm01" = {
-      ip           = "10.69.13.101/24"
-      cpu_cores    = 2
-      memory       = 2048
-      disk_size    = "20G"
-      flake_branch = "pipeline-testing-improvements"
-    }
     "vault01" = {
       ip        = "10.69.13.19/24"
       cpu_cores = 2
@@ -45,13 +38,25 @@ locals {
       disk_size = "20G"
       flake_branch = "vault-setup" # Bootstrap from this branch instead of master
     }

-    "vaulttest01" = {
-      ip        = "10.69.13.150/24"
+    "testvm01" = {
+      ip           = "10.69.13.20/24"
+      cpu_cores    = 2
+      memory       = 2048
+      disk_size    = "20G"
+      flake_branch = "improve-bootstrap-visibility"
+      vault_wrapped_token = "s.l5q88wzXfEcr5SMDHmO6o96b"
+    }
+    "testvm02" = {
+      ip        = "10.69.13.21/24"
+      cpu_cores = 2
+      memory    = 2048
+      disk_size = "20G"
+    }
+    "testvm03" = {
+      ip        = "10.69.13.22/24"
       cpu_cores = 2
       memory    = 2048
       disk_size = "20G"
-      flake_branch = "pki-migration"
-      vault_wrapped_token = "s.UCpQCOp7cOKDdtGGBvfRWwAt"
     }
   }