Compare commits

30 Commits

713d1e7584 chore: migrate module path from git.t-juice.club to code.t-juice.club
Gitea-to-Forgejo host migration: update the Go module path and all
import references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 19:37:47 +01:00
2d26de5055 fix(metrics): adjust build duration histogram buckets for better resolution
Add lower buckets (5s, 10s) for cached builds and higher buckets (7200s, 14400s)
for cold builds up to the 4h timeout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 09:03:15 +01:00
e5e8be86ec Merge pull request 'feat(builder): log build failure output as separate lines' (#3) from feat/builder-logging into master
Reviewed-on: #3
2026-02-13 17:35:23 +00:00
3ac5d9777f feat(builder): log build failure output as separate lines
Log each line of build failure output as a separate structured log entry
at WARN level, making output readable and queryable in Loki/Grafana.
Add repo and rev fields to all build-related log entries. Add
truncateOutputLines helper that returns a []string for per-line logging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 18:34:21 +01:00
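A minimal sketch of the per-line logging this commit describes, using the standard `log/slog` package. The `truncateOutputLines` signature and the field names are assumptions reconstructed from the commit message, not the project's actual code.

```go
package main

import (
	"log/slog"
	"os"
	"strings"
)

// truncateOutputLines is a hypothetical helper matching the commit message:
// it splits build output into at most `max` lines suitable for per-line
// structured logging.
func truncateOutputLines(output string, max int) []string {
	lines := strings.Split(strings.TrimRight(output, "\n"), "\n")
	if len(lines) > max {
		lines = lines[:max]
	}
	return lines
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	output := "error: builder failed\nnote: see log for details\n"
	// Log each line as its own WARN entry so it stays readable and
	// queryable in Loki/Grafana, with repo and rev attached to every entry.
	for _, line := range truncateOutputLines(output, 100) {
		logger.Warn("build output", "repo", "nixos-servers", "rev", "abc123", "line", line)
	}
}
```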
1a23847d31 fix(builder): separate build output from error to preserve timeout messages
When a build timed out, the timeout error was silently replaced by
truncated stderr output. Split into separate Error and Output fields
on BuildHostResult so the cause (e.g. "build timed out after 30m0s")
is always visible in logs and CLI output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 13:24:04 +01:00
c13914bf5a fix(builder): truncate large error output to prevent log overflow
Build errors from nix can be very large (100k+ chars). This truncates
error output to the first 50 and last 50 lines when it exceeds 100
lines, preventing journal and NATS message overflow.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 00:42:13 +01:00
a8aab16d0e Merge pull request 'feat: add builder mode for centralized Nix builds' (#2) from feat/builder into master
Reviewed-on: #2
2026-02-10 21:16:05 +00:00
00899489ac feat(nixos): add settings option for builder config
Allow defining builder repository configuration directly in Nix using
the `settings.repos` option, which is more idiomatic for NixOS modules.

Users can now choose between:
- `settings.repos` - Define repos in Nix (recommended)
- `configFile` - Point to an external YAML file

The module generates a YAML config file from settings when configFile
is not specified. An assertion ensures at least one method is used.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-10 22:13:33 +01:00
c52e88ca7e fix: add validation for config and reply subjects
Address medium severity security issues:

- Validate repo names in config only allow alphanumeric, dash, underscore
  (prevents NATS subject injection via dots or wildcards)
- Validate repo URLs must start with git+https://, git+ssh://, or git+file://
- Validate ReplyTo field must start with "build.responses." to prevent
  publishing responses to arbitrary NATS subjects

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-10 22:09:51 +01:00
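The two checks this commit describes can be sketched like so. The exact regex and function names are assumptions; the rules (alphanumeric/dash/underscore repo names, `build.responses.` prefix for replies) come from the commit message.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// repoNamePattern allows only alphanumerics, dash, and underscore, so a repo
// name can never smuggle NATS token separators ('.') or wildcards ('*', '>')
// into a subject.
var repoNamePattern = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)

func validateRepoName(name string) error {
	if !repoNamePattern.MatchString(name) {
		return fmt.Errorf("invalid repo name %q", name)
	}
	return nil
}

// validateReplyTo confines responses to the dedicated response hierarchy,
// preventing a request from directing publishes to arbitrary subjects.
func validateReplyTo(subject string) error {
	if !strings.HasPrefix(subject, "build.responses.") {
		return fmt.Errorf("reply_to must start with build.responses.: %q", subject)
	}
	return nil
}

func main() {
	fmt.Println(validateRepoName("nixos-servers")) // accepted
	fmt.Println(validateRepoName("evil.>"))        // rejected
	fmt.Println(validateReplyTo("build.responses.abc123"))
}
```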
08f1fcc6ac fix: validate target and hostname inputs to prevent injection
Add input validation to address security concerns:

- Validate Target field in BuildRequest against safe character pattern
  (must be "all" or match alphanumeric/dash/underscore/dot pattern)
- Filter hostnames discovered from nix flake show output, skipping any
  with invalid characters before using them in build commands

This prevents potential command injection via crafted NATS messages or
malicious flake configurations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-10 22:07:26 +01:00
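The Target check might look like the following sketch; the exact character class is an assumption based on the commit message ("all" or alphanumeric/dash/underscore/dot).

```go
package main

import (
	"fmt"
	"regexp"
)

// targetPattern reflects the commit message: a target must be "all" or
// consist only of alphanumerics, dots, dashes, and underscores, so a crafted
// NATS message cannot inject shell metacharacters into a build command.
var targetPattern = regexp.MustCompile(`^[a-zA-Z0-9._-]+$`)

func validateTarget(target string) error {
	if target == "all" || targetPattern.MatchString(target) {
		return nil
	}
	return fmt.Errorf("invalid target %q", target)
}

func main() {
	fmt.Println(validateTarget("all"))
	fmt.Println(validateTarget("web-01.example"))
	fmt.Println(validateTarget("host; rm -rf /")) // rejected
}
```

The same filter can be applied to hostnames discovered from `nix flake show` output before they are interpolated into build commands.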
14f5b31faf feat: add builder mode for centralized Nix builds
Add a new "builder" capability to trigger Nix builds on a dedicated
build host via NATS messaging. This allows pre-building NixOS
configurations before deployment.

New components:
- Builder mode: subscribes to build.<repo>.* subjects, executes nix build
- Build CLI command: triggers builds with progress tracking
- MCP build tool: available with --enable-builds flag
- Builder metrics: tracks build success/failure per repo and host
- NixOS module: services.homelab-deploy.builder

The builder uses a YAML config file to define allowed repositories
with their URLs and default branches. Builds can target all hosts
or specific hosts, with real-time progress updates.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-10 22:03:14 +01:00
277a49a666 chore: update flake inputs
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 15:44:18 +01:00
bc02393c5a fix: wait for metrics scrape before restarting after switch deployment
After a successful switch deployment, the listener now waits for Prometheus
to scrape the /metrics endpoint before exiting for restart. This ensures
deployment metrics are captured before the process restarts and resets
in-memory counters. It falls back to a 60-second timeout if no scrape occurs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 15:44:14 +01:00
746e30b24f fix: initialize counter and histogram metrics at startup
Counter and histogram metrics were absent from Prometheus scrapes until
the first deployment occurred, making it impossible to distinguish
"no deployments" from "exporter not running" in dashboards and alerts.

Initialize all expected label combinations with zero values when the
collector is created so metrics appear in every scrape from startup.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 21:29:36 +01:00
fd0d63b103 fix: flush NATS buffer after sending completed response
The CLI was reporting deployment failures even when the listener showed
success. This was a race condition: after a successful switch deployment,
the listener would send the "completed" response then immediately signal
restart. The NATS connection closed before the buffered message was
actually sent to the broker, so the CLI never received it.

Adding Flush() after sending the completed response ensures the message
reaches NATS before the listener can exit.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 17:30:34 +01:00
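The ordering fix can be illustrated without a live NATS server. In nats.go, `Publish` only writes to an in-memory buffer while `Flush` forces a round trip to the broker; the interface and fake below stand in for `*nats.Conn` to show why the flush must happen before exit.

```go
package main

import "fmt"

// publisher abstracts the two *nats.Conn methods the fix relies on.
type publisher interface {
	Publish(subject string, data []byte) error
	Flush() error
}

// sendCompleted publishes the final status and, crucially, flushes the
// connection before the caller is allowed to exit and restart the service.
func sendCompleted(p publisher, replyTo string) error {
	if err := p.Publish(replyTo, []byte(`{"status":"completed"}`)); err != nil {
		return err
	}
	// Without this Flush, closing the connection could drop the buffered
	// message, and the CLI would report a failure despite a successful deploy.
	return p.Flush()
}

// fakeConn records the call order for demonstration purposes.
type fakeConn struct{ calls []string }

func (f *fakeConn) Publish(subject string, data []byte) error {
	f.calls = append(f.calls, "publish:"+subject)
	return nil
}

func (f *fakeConn) Flush() error {
	f.calls = append(f.calls, "flush")
	return nil
}

func main() {
	c := &fakeConn{}
	_ = sendCompleted(c, "deploy.responses.abc123")
	fmt.Println(c.calls) // [publish:deploy.responses.abc123 flush]
}
```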
36a74b8cf9 feat: add heartbeat status updates during deployment
Send periodic "running" status messages while nixos-rebuild executes,
preventing the idle timeout from triggering before deployments complete.
This fixes false "Some deployments failed" warnings in MCP when builds
take longer than 30 seconds.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 14:23:33 +01:00
79db119d1c feat: add Prometheus metrics to listener service
Add an optional Prometheus metrics HTTP endpoint to the listener for
monitoring deployment operations. Includes four metrics:

- homelab_deploy_deployments_total (counter with status/action/error_code)
- homelab_deploy_deployment_duration_seconds (histogram with action/success)
- homelab_deploy_deployment_in_progress (gauge)
- homelab_deploy_info (gauge with hostname/tier/role/version)

New CLI flags: --metrics-enabled, --metrics-addr (default :9972)
New NixOS options: metrics.enable, metrics.address, metrics.openFirewall

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 07:58:22 +01:00
56365835c7 feat: add list-hosts command to CLI
Adds a list-hosts command that mirrors the MCP list_hosts functionality,
allowing discovery of available deployment targets from the command line.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 07:30:57 +01:00
95b795dcfd fix: remove systemd hardening to allow nix sandbox namespace creation
The previous hardening options (ProtectControlGroups, LockPersonality,
SystemCallArchitectures, etc.) prevented Nix from creating the kernel
namespaces required for build sandboxing. This follows the approach of
the NixOS auto-upgrade module, which applies no hardening because
nixos-rebuild requires broad system access.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 06:52:16 +01:00
71d6aa8b61 fix: disable PrivateDevices to allow nix sandbox namespace creation
The PrivateDevices=true systemd hardening option was preventing Nix
from creating the kernel namespaces required for its build sandbox.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 06:26:53 +01:00
2c97b6140c fix: check only final responses in AllSucceeded to determine deployment success
The CLI was incorrectly reporting "some deployments failed" even when
deployments succeeded. This was because AllSucceeded() checked if every
response had StatusCompleted, but the Responses slice contains all
messages including intermediate ones like "started". Since started !=
completed, it returned false.

Now AllSucceeded() only examines final responses (using IsFinal()) and
checks that each host's final status is completed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 06:19:08 +01:00
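The corrected logic might look like this sketch; the `Response` shape and method names are assumptions based on the commit message (`AllSucceeded`, `IsFinal`, `StatusCompleted`).

```go
package main

import "fmt"

type Response struct {
	Host   string
	Status string
}

// IsFinal mirrors the commit message: only terminal statuses count,
// so intermediate "started"/"running" messages are ignored.
func (r Response) IsFinal() bool {
	return r.Status == "completed" || r.Status == "failed" || r.Status == "rejected"
}

// AllSucceeded inspects only each host's final response. Checking every
// response would fail on the intermediate "started" messages that the
// Responses slice also contains.
func AllSucceeded(responses []Response) bool {
	final := map[string]string{}
	for _, r := range responses {
		if r.IsFinal() {
			final[r.Host] = r.Status
		}
	}
	if len(final) == 0 {
		return false
	}
	for _, status := range final {
		if status != "completed" {
			return false
		}
	}
	return true
}

func main() {
	responses := []Response{
		{"host1", "started"},
		{"host1", "completed"},
	}
	fmt.Println(AllSucceeded(responses)) // true
}
```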
efacb13b86 feat: exit listener after successful switch for automatic restart
After a successful switch deployment, the listener now exits gracefully
so systemd can restart it with the new binary. This works together with
stopIfChanged/restartIfChanged to ensure deployments complete before
the service restarts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 06:11:03 +01:00
ac3c9c7de6 fix: prevent listener service from restarting during deployment
Add stopIfChanged and restartIfChanged options to prevent the listener
from being interrupted when nixos-rebuild switch activates a new
configuration that changes the service definition.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 06:06:47 +01:00
9f205fee5e fix: add writable cache directory for nix git flake fetching
The listener service had ProtectHome=read-only which prevented Nix
from writing to /root/.cache when fetching git flakes. This adds a
CacheDirectory managed by systemd and sets XDG_CACHE_HOME to use it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 05:57:59 +01:00
5f3cfc3d21 fix: add nixos-rebuild to PATH and fix CLI hanging after deploy failure
- Add nixos-rebuild to listener service PATH in NixOS module
- Fix CLI deploy command hanging after receiving final status by properly
  tracking lastResponse time and exiting when all hosts have responded

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 05:53:22 +01:00
c9b85435ba fix: add git to listener service PATH for revision validation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 05:43:23 +01:00
cf3b1ce2c9 refactor: use flake package directly in NixOS module
Instead of requiring users to provide the package via overlay,
the module now receives `self` from the flake and uses the
package directly from `self.packages`.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 05:08:02 +01:00
9237814fed docs: add NATS subject structure and example server configuration
Document the complete subject hierarchy including deploy subjects,
response subjects, and discovery subject. Add example NATS server
configuration demonstrating tiered authentication with listener,
test deployer, and admin deployer permission patterns.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 04:48:11 +01:00
f03eb5f7dc feat: add environment variable support for deploy command flags
Allows setting --nats-url, --nkey-file, --branch, --action, and --timeout
via HOMELAB_DEPLOY_* environment variables.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 04:43:50 +01:00
f51058964d fix: verify NKey file has secure permissions before reading
Reject NKey files that are readable by group or others (permissions
more permissive than 0600). This prevents accidental exposure of
private keys through overly permissive file permissions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 04:40:53 +01:00
28 changed files with 3193 additions and 141 deletions

README.md

@@ -4,11 +4,12 @@ A message-based deployment system for NixOS configurations using NATS for messag
## Overview
The `homelab-deploy` binary provides three operational modes:
The `homelab-deploy` binary provides four operational modes:
1. **Listener mode** - Runs on each NixOS host as a systemd service, subscribing to NATS subjects and executing `nixos-rebuild` when deployment requests arrive
2. **MCP mode** - Runs as an MCP (Model Context Protocol) server, exposing deployment tools for AI assistants
3. **CLI mode** - Manual deployment commands for administrators
2. **Builder mode** - Runs on a dedicated build host, subscribing to NATS subjects and executing `nix build` to pre-build configurations
3. **MCP mode** - Runs as an MCP (Model Context Protocol) server, exposing deployment tools for AI assistants
4. **CLI mode** - Manual deployment and build commands for administrators
## Installation
@@ -61,6 +62,8 @@ homelab-deploy listener \
| `--timeout` | No | Deployment timeout in seconds (default: 600) |
| `--deploy-subject` | No | NATS subjects to subscribe to (repeatable) |
| `--discover-subject` | No | Discovery subject (default: `deploy.discover`) |
| `--metrics-enabled` | No | Enable Prometheus metrics endpoint |
| `--metrics-addr` | No | Metrics HTTP server address (default: `:9972`) |
#### Subject Templates
@@ -102,13 +105,13 @@ homelab-deploy deploy deploy.prod.role.dns \
#### Deploy Flags
| Flag | Required | Description |
|------|----------|-------------|
| `--nats-url` | Yes | NATS server URL |
| `--nkey-file` | Yes | Path to NKey seed file |
| `--branch` | No | Git branch or commit (default: `master`) |
| `--action` | No | nixos-rebuild action (default: `switch`) |
| `--timeout` | No | Response timeout in seconds (default: 900) |
| Flag | Required | Env Var | Description |
|------|----------|---------|-------------|
| `--nats-url` | Yes | `HOMELAB_DEPLOY_NATS_URL` | NATS server URL |
| `--nkey-file` | Yes | `HOMELAB_DEPLOY_NKEY_FILE` | Path to NKey seed file |
| `--branch` | No | `HOMELAB_DEPLOY_BRANCH` | Git branch or commit (default: `master`) |
| `--action` | No | `HOMELAB_DEPLOY_ACTION` | nixos-rebuild action (default: `switch`) |
| `--timeout` | No | `HOMELAB_DEPLOY_TIMEOUT` | Response timeout in seconds (default: 900) |
#### Subject Aliases
@@ -126,6 +129,82 @@ homelab-deploy deploy prod-dns --nats-url ... --nkey-file ...
Alias lookup: `HOMELAB_DEPLOY_ALIAS_<NAME>` where name is uppercased and hyphens become underscores.
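The alias-to-environment-variable transformation can be sketched as follows; `aliasEnvVar` is an illustrative name, only the rule (uppercase, hyphens to underscores) comes from the documentation above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// aliasEnvVar derives the environment variable consulted for a subject
// alias: uppercase the name and turn hyphens into underscores.
func aliasEnvVar(alias string) string {
	name := strings.ReplaceAll(strings.ToUpper(alias), "-", "_")
	return "HOMELAB_DEPLOY_ALIAS_" + name
}

func main() {
	os.Setenv("HOMELAB_DEPLOY_ALIAS_PROD_DNS", "deploy.prod.role.dns")
	key := aliasEnvVar("prod-dns")
	fmt.Println(key, "=", os.Getenv(key))
}
```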
### Builder Mode
Run on a dedicated build host to pre-build NixOS configurations:
```bash
homelab-deploy builder \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/builder.nkey \
--config /etc/homelab-deploy/builder.yaml \
--timeout 1800 \
--metrics-enabled \
--metrics-addr :9973
```
#### Builder Configuration File
The builder uses a YAML configuration file to define allowed repositories:
```yaml
repos:
nixos-servers:
url: "git+https://git.example.com/org/nixos-servers.git"
default_branch: "master"
homelab:
url: "git+ssh://git@github.com/user/homelab.git"
default_branch: "main"
```
#### Builder Flags
| Flag | Required | Description |
|------|----------|-------------|
| `--nats-url` | Yes | NATS server URL |
| `--nkey-file` | Yes | Path to NKey seed file |
| `--config` | Yes | Path to builder configuration file |
| `--timeout` | No | Build timeout per host in seconds (default: 1800) |
| `--metrics-enabled` | No | Enable Prometheus metrics endpoint |
| `--metrics-addr` | No | Metrics HTTP server address (default: `:9973`) |
### Build Command
Trigger a build on the build server:
```bash
# Build all hosts in a repository
homelab-deploy build nixos-servers --all \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/deployer.nkey
# Build a specific host
homelab-deploy build nixos-servers myhost \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/deployer.nkey
# Build with a specific branch
homelab-deploy build nixos-servers --all --branch feature-x \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/deployer.nkey
# JSON output for scripting
homelab-deploy build nixos-servers --all --json \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/deployer.nkey
```
#### Build Flags
| Flag | Required | Env Var | Description |
|------|----------|---------|-------------|
| `--nats-url` | Yes | `HOMELAB_DEPLOY_NATS_URL` | NATS server URL |
| `--nkey-file` | Yes | `HOMELAB_DEPLOY_NKEY_FILE` | Path to NKey seed file |
| `--branch` | No | `HOMELAB_DEPLOY_BRANCH` | Git branch (uses repo default if not specified) |
| `--all` | No | - | Build all hosts in the repository |
| `--timeout` | No | `HOMELAB_DEPLOY_BUILD_TIMEOUT` | Response timeout in seconds (default: 3600) |
| `--json` | No | - | Output results as JSON |
### MCP Server Mode
Run as an MCP server for AI assistant integration:
@@ -142,6 +221,12 @@ homelab-deploy mcp \
--nkey-file /run/secrets/mcp.nkey \
--enable-admin \
--admin-nkey-file /run/secrets/admin.nkey
# With build tool enabled
homelab-deploy mcp \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/mcp.nkey \
--enable-builds
```
#### MCP Tools
@@ -151,6 +236,7 @@ homelab-deploy mcp \
| `deploy` | Deploy to test-tier hosts only |
| `deploy_admin` | Deploy to any tier (requires `--enable-admin`) |
| `list_hosts` | Discover available deployment targets |
| `build` | Trigger builds on the build server (requires `--enable-builds`) |
#### Tool Parameters
@@ -165,6 +251,12 @@ homelab-deploy mcp \
**list_hosts:**
- `tier` - Filter by tier (optional)
**build:**
- `repo` - Repository name (required, must match builder config)
- `target` - Target hostname (optional, defaults to all)
- `all` - Build all hosts (default if no target specified)
- `branch` - Git branch (uses repo default if not specified)
## NixOS Module
Add the module to your NixOS configuration:
@@ -198,7 +290,7 @@ Add the module to your NixOS configuration:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enable` | bool | `false` | Enable the listener service |
| `package` | package | `pkgs.homelab-deploy` | Package to use |
| `package` | package | from flake | Package to use |
| `hostname` | string | `config.networking.hostName` | Hostname for subject templates |
| `tier` | enum | required | `"test"` or `"prod"` |
| `role` | string | `null` | Role for role-based targeting |
@@ -209,6 +301,9 @@ Add the module to your NixOS configuration:
| `deploySubjects` | list of string | see below | Subjects to subscribe to |
| `discoverSubject` | string | `"deploy.discover"` | Discovery subject |
| `environment` | attrs | `{}` | Additional environment variables |
| `metrics.enable` | bool | `false` | Enable Prometheus metrics endpoint |
| `metrics.address` | string | `":9972"` | Metrics HTTP server address |
| `metrics.openFirewall` | bool | `false` | Open firewall for metrics port |
Default `deploySubjects`:
```nix
@@ -219,6 +314,157 @@ Default `deploySubjects`:
]
```
### Builder Module Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enable` | bool | `false` | Enable the builder service |
| `package` | package | from flake | Package to use |
| `natsUrl` | string | required | NATS server URL |
| `nkeyFile` | path | required | Path to NKey seed file |
| `configFile` | path | `null` | Path to builder config file (alternative to `settings`) |
| `settings.repos` | attrs | `{}` | Repository configuration (see below) |
| `timeout` | int | `1800` | Build timeout per host in seconds |
| `environment` | attrs | `{}` | Additional environment variables |
| `metrics.enable` | bool | `false` | Enable Prometheus metrics endpoint |
| `metrics.address` | string | `":9973"` | Metrics HTTP server address |
| `metrics.openFirewall` | bool | `false` | Open firewall for metrics port |
Each entry in `settings.repos` is an attribute set with:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `url` | string | required | Git flake URL (must start with `git+https://`, `git+ssh://`, or `git+file://`) |
| `defaultBranch` | string | `"master"` | Default branch to build when not specified |
Example builder configuration using `settings`:
```nix
services.homelab-deploy.builder = {
enable = true;
natsUrl = "nats://nats.example.com:4222";
nkeyFile = "/run/secrets/homelab-deploy-builder-nkey";
settings.repos = {
nixos-servers = {
url = "git+https://git.example.com/org/nixos-servers.git";
defaultBranch = "master";
};
homelab = {
url = "git+ssh://git@github.com/user/homelab.git";
defaultBranch = "main";
};
};
metrics = {
enable = true;
address = ":9973";
openFirewall = true;
};
};
```
Alternatively, you can use `configFile` to point to an external YAML file:
```nix
services.homelab-deploy.builder = {
enable = true;
natsUrl = "nats://nats.example.com:4222";
nkeyFile = "/run/secrets/homelab-deploy-builder-nkey";
configFile = "/etc/homelab-deploy/builder.yaml";
};
```
## Prometheus Metrics
The listener can expose Prometheus metrics for monitoring deployment operations.
### Enabling Metrics
**CLI:**
```bash
homelab-deploy listener \
--hostname myhost \
--tier prod \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/listener.nkey \
--flake-url git+https://git.example.com/user/nixos-configs.git \
--metrics-enabled \
--metrics-addr :9972
```
**NixOS module:**
```nix
services.homelab-deploy.listener = {
enable = true;
tier = "prod";
natsUrl = "nats://nats.example.com:4222";
nkeyFile = "/run/secrets/homelab-deploy-nkey";
flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
metrics = {
enable = true;
address = ":9972";
openFirewall = true; # Optional: open firewall for Prometheus scraping
};
};
```
### Available Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `homelab_deploy_deployments_total` | Counter | `status`, `action`, `error_code` | Total deployment requests processed |
| `homelab_deploy_deployment_duration_seconds` | Histogram | `action`, `success` | Deployment execution time |
| `homelab_deploy_deployment_in_progress` | Gauge | - | 1 if deployment running, 0 otherwise |
| `homelab_deploy_info` | Gauge | `hostname`, `tier`, `role`, `version` | Static instance metadata |
**Label values:**
- `status`: `completed`, `failed`, `rejected`
- `action`: `switch`, `boot`, `test`, `dry-activate`
- `error_code`: `invalid_action`, `invalid_revision`, `already_running`, `build_failed`, `timeout`, or empty
- `success`: `true`, `false`
### HTTP Endpoints
| Endpoint | Description |
|----------|-------------|
| `/metrics` | Prometheus metrics in text format |
| `/health` | Health check (returns `ok`) |
### Example Prometheus Queries
```promql
# Average deployment duration (last hour)
rate(homelab_deploy_deployment_duration_seconds_sum[1h]) /
rate(homelab_deploy_deployment_duration_seconds_count[1h])
# Deployment success rate (last 24 hours)
sum(rate(homelab_deploy_deployments_total{status="completed"}[24h])) /
sum(rate(homelab_deploy_deployments_total{status=~"completed|failed"}[24h]))
# 95th percentile deployment time
histogram_quantile(0.95, rate(homelab_deploy_deployment_duration_seconds_bucket[1h]))
# Currently running deployments across all hosts
sum(homelab_deploy_deployment_in_progress)
```
### Builder Metrics
When running in builder mode, additional metrics are available:
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `homelab_deploy_builds_total` | Counter | `repo`, `status` | Total builds processed |
| `homelab_deploy_build_host_total` | Counter | `repo`, `host`, `status` | Total host builds processed |
| `homelab_deploy_build_duration_seconds` | Histogram | `repo`, `host` | Build execution time per host |
| `homelab_deploy_build_last_timestamp` | Gauge | `repo` | Timestamp of last build attempt |
| `homelab_deploy_build_last_success_timestamp` | Gauge | `repo` | Timestamp of last successful build |
| `homelab_deploy_build_last_failure_timestamp` | Gauge | `repo` | Timestamp of last failed build |
**Label values:**
- `status`: `success`, `failure`
- `repo`: Repository name from config
- `host`: Host name being built
## Message Protocol
### Deploy Request
@@ -246,6 +492,37 @@ Default `deploySubjects`:
**Error codes:** `invalid_revision`, `invalid_action`, `already_running`, `build_failed`, `timeout`
### Build Request
```json
{
"repo": "nixos-servers",
"target": "all",
"branch": "main",
"reply_to": "build.responses.abc123"
}
```
### Build Response
```json
{
"status": "completed",
"message": "built 5/5 hosts successfully",
"results": [
{"host": "host1", "success": true, "duration_seconds": 120.5},
{"host": "host2", "success": true, "duration_seconds": 95.3}
],
"total_duration_seconds": 450.2,
"succeeded": 5,
"failed": 0
}
```
**Status values:** `started`, `progress`, `completed`, `failed`, `rejected`
Progress updates include `host`, `host_success`, `hosts_completed`, and `hosts_total` fields.
## NATS Authentication
All connections use NKey authentication. Generate keys with:
@@ -256,6 +533,150 @@ nk -gen user -pubout
Configure appropriate publish/subscribe permissions in your NATS server for each credential type.
## NATS Subject Structure
The deployment system uses the following NATS subject hierarchy:
### Deploy Subjects
| Subject Pattern | Purpose |
|-----------------|---------|
| `deploy.<tier>.<hostname>` | Deploy to a specific host |
| `deploy.<tier>.all` | Deploy to all hosts in a tier |
| `deploy.<tier>.role.<role>` | Deploy to hosts with a specific role in a tier |
**Tier values:** `test`, `prod`
**Examples:**
- `deploy.test.myhost` - Deploy to myhost in test tier
- `deploy.prod.all` - Deploy to all production hosts
- `deploy.prod.role.dns` - Deploy to all DNS servers in production
### Build Subjects
| Subject Pattern | Purpose |
|-----------------|---------|
| `build.<repo>.*` | Build requests for a repository |
| `build.<repo>.all` | Build all hosts in a repository |
| `build.<repo>.<hostname>` | Build a specific host |
### Response Subjects
| Subject Pattern | Purpose |
|-----------------|---------|
| `deploy.responses.<uuid>` | Unique reply subject for each deployment request |
| `build.responses.<uuid>` | Unique reply subject for each build request |
Deployers and build clients create a unique response subject for each request and include it in the `reply_to` field. Listeners and builders publish status updates to this subject.
### Discovery Subject
| Subject Pattern | Purpose |
|-----------------|---------|
| `deploy.discover` | Host discovery requests and responses |
Used by the `list_hosts` MCP tool and for discovering available deployment targets.
## Example NATS Configuration
Below is an example NATS server configuration implementing tiered authentication. This setup provides:
- **Listeners** - Each host has credentials to subscribe to its own subjects and publish responses
- **Test deployer** - Can deploy to test tier only (suitable for MCP without admin access)
- **Admin deployer** - Can deploy to all tiers (for CLI or MCP with admin access)
```conf
authorization {
users = [
# Listener for a test-tier host
{
nkey: "UTEST_HOST1_PUBLIC_KEY_HERE"
permissions: {
subscribe: [
"deploy.test.testhost1"
"deploy.test.all"
"deploy.test.role.>"
"deploy.discover"
]
publish: [
"deploy.responses.>"
"deploy.discover"
]
}
}
# Listener for a prod-tier host with 'dns' role
{
nkey: "UPROD_DNS1_PUBLIC_KEY_HERE"
permissions: {
subscribe: [
"deploy.prod.dns1"
"deploy.prod.all"
"deploy.prod.role.dns"
"deploy.discover"
]
publish: [
"deploy.responses.>"
"deploy.discover"
]
}
}
# Test-tier deployer (MCP without admin)
{
nkey: "UTEST_DEPLOYER_PUBLIC_KEY_HERE"
permissions: {
publish: [
"deploy.test.>"
"deploy.discover"
]
subscribe: [
"deploy.responses.>"
"deploy.discover"
]
}
}
# Admin deployer (full access to all tiers)
{
nkey: "UADMIN_DEPLOYER_PUBLIC_KEY_HERE"
permissions: {
publish: [
"deploy.>"
]
subscribe: [
"deploy.>"
]
}
}
]
}
```
### Key Permission Patterns
| Credential Type | Publish | Subscribe |
|-----------------|---------|-----------|
| Listener | `deploy.responses.>`, `deploy.discover` | Own subjects, `deploy.discover` |
| Builder | `build.responses.>` | `build.<repo>.*` for each configured repo |
| Test deployer | `deploy.test.>`, `deploy.discover` | `deploy.responses.>`, `deploy.discover` |
| Build client | `build.<repo>.*` | `build.responses.>` |
| Admin deployer | `deploy.>` | `deploy.>` |
### Generating NKeys
```bash
# Generate a keypair (outputs public key, saves seed to file)
nk -gen user -pubout > mykey.pub
# The seed (private key) is printed to stderr - save it securely
# Or generate and save seed directly
nk -gen user > mykey.seed
nk -inkey mykey.seed -pubout # Get public key from seed
```
The public key (starting with `U`) goes in the NATS server config. The seed file (starting with `SU`) is used by homelab-deploy via `--nkey-file`.
## License
MIT


@@ -9,14 +9,15 @@ import (
"syscall"
"time"
deploycli "git.t-juice.club/torjus/homelab-deploy/internal/cli"
"git.t-juice.club/torjus/homelab-deploy/internal/listener"
"git.t-juice.club/torjus/homelab-deploy/internal/mcp"
"git.t-juice.club/torjus/homelab-deploy/internal/messages"
"code.t-juice.club/torjus/homelab-deploy/internal/builder"
deploycli "code.t-juice.club/torjus/homelab-deploy/internal/cli"
"code.t-juice.club/torjus/homelab-deploy/internal/listener"
"code.t-juice.club/torjus/homelab-deploy/internal/mcp"
"code.t-juice.club/torjus/homelab-deploy/internal/messages"
"github.com/urfave/cli/v3"
)
const version = "0.1.0"
const version = "0.2.5"
func main() {
app := &cli.Command{
@@ -25,8 +26,11 @@ func main() {
Version: version,
Commands: []*cli.Command{
listenerCommand(),
builderCommand(),
mcpCommand(),
deployCommand(),
buildCommand(),
listHostsCommand(),
},
}
@@ -89,6 +93,20 @@ func listenerCommand() *cli.Command {
Usage: "NATS subject for host discovery requests",
Value: "deploy.discover",
},
&cli.BoolFlag{
Name: "metrics-enabled",
Usage: "Enable Prometheus metrics endpoint",
},
&cli.StringFlag{
Name: "metrics-addr",
Usage: "Address for Prometheus metrics HTTP server",
Value: ":9972",
},
&cli.IntFlag{
Name: "heartbeat-interval",
Usage: "Interval in seconds for sending status updates during deployment (0 to disable)",
Value: 15,
},
},
Action: func(ctx context.Context, c *cli.Command) error {
tier := c.String("tier")
@@ -97,15 +115,19 @@ func listenerCommand() *cli.Command {
}
cfg := listener.Config{
Hostname: c.String("hostname"),
Tier: tier,
Role: c.String("role"),
NATSUrl: c.String("nats-url"),
NKeyFile: c.String("nkey-file"),
FlakeURL: c.String("flake-url"),
Timeout: time.Duration(c.Int("timeout")) * time.Second,
DeploySubjects: c.StringSlice("deploy-subject"),
DiscoverSubject: c.String("discover-subject"),
Hostname: c.String("hostname"),
Tier: tier,
Role: c.String("role"),
NATSUrl: c.String("nats-url"),
NKeyFile: c.String("nkey-file"),
FlakeURL: c.String("flake-url"),
Timeout: time.Duration(c.Int("timeout")) * time.Second,
HeartbeatInterval: time.Duration(c.Int("heartbeat-interval")) * time.Second,
DeploySubjects: c.StringSlice("deploy-subject"),
DiscoverSubject: c.String("discover-subject"),
MetricsEnabled: c.Bool("metrics-enabled"),
MetricsAddr: c.String("metrics-addr"),
Version: version,
}
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
@@ -156,6 +178,10 @@ func mcpCommand() *cli.Command {
Usage: "Timeout in seconds for deployment operations",
Value: 900,
},
&cli.BoolFlag{
Name: "enable-builds",
Usage: "Enable build tool",
},
},
Action: func(_ context.Context, c *cli.Command) error {
enableAdmin := c.Bool("enable-admin")
@@ -170,6 +196,7 @@ func mcpCommand() *cli.Command {
NKeyFile: c.String("nkey-file"),
EnableAdmin: enableAdmin,
AdminNKeyFile: adminNKeyFile,
EnableBuilds: c.Bool("enable-builds"),
DiscoverSubject: c.String("discover-subject"),
Timeout: time.Duration(c.Int("timeout")) * time.Second,
}
@@ -189,27 +216,32 @@ func deployCommand() *cli.Command {
&cli.StringFlag{
Name: "nats-url",
Usage: "NATS server URL",
Sources: cli.EnvVars("HOMELAB_DEPLOY_NATS_URL"),
Required: true,
},
&cli.StringFlag{
Name: "nkey-file",
Usage: "Path to NKey seed file for NATS authentication",
Sources: cli.EnvVars("HOMELAB_DEPLOY_NKEY_FILE"),
Required: true,
},
&cli.StringFlag{
Name: "branch",
Usage: "Git branch or commit to deploy",
Value: "master",
Name: "branch",
Usage: "Git branch or commit to deploy",
Sources: cli.EnvVars("HOMELAB_DEPLOY_BRANCH"),
Value: "master",
},
&cli.StringFlag{
Name: "action",
Usage: "nixos-rebuild action (switch, boot, test, dry-activate)",
Value: "switch",
Name: "action",
Usage: "nixos-rebuild action (switch, boot, test, dry-activate)",
Sources: cli.EnvVars("HOMELAB_DEPLOY_ACTION"),
Value: "switch",
},
&cli.IntFlag{
Name: "timeout",
Usage: "Timeout in seconds for collecting responses",
Value: 900,
Name: "timeout",
Usage: "Timeout in seconds for collecting responses",
Sources: cli.EnvVars("HOMELAB_DEPLOY_TIMEOUT"),
Value: 900,
},
},
Action: func(ctx context.Context, c *cli.Command) error {
@@ -265,3 +297,289 @@ func deployCommand() *cli.Command {
},
}
}
func listHostsCommand() *cli.Command {
return &cli.Command{
Name: "list-hosts",
Usage: "List available deployment targets",
Flags: []cli.Flag{
&cli.StringFlag{
Name: "nats-url",
Usage: "NATS server URL",
Sources: cli.EnvVars("HOMELAB_DEPLOY_NATS_URL"),
Required: true,
},
&cli.StringFlag{
Name: "nkey-file",
Usage: "Path to NKey seed file for NATS authentication",
Sources: cli.EnvVars("HOMELAB_DEPLOY_NKEY_FILE"),
Required: true,
},
&cli.StringFlag{
Name: "tier",
Usage: "Filter by tier (test or prod)",
Sources: cli.EnvVars("HOMELAB_DEPLOY_TIER"),
},
&cli.StringFlag{
Name: "discover-subject",
Usage: "NATS subject for host discovery",
Sources: cli.EnvVars("HOMELAB_DEPLOY_DISCOVER_SUBJECT"),
Value: "deploy.discover",
},
&cli.IntFlag{
Name: "timeout",
Usage: "Timeout in seconds for discovery",
Sources: cli.EnvVars("HOMELAB_DEPLOY_DISCOVER_TIMEOUT"),
Value: 5,
},
},
Action: func(ctx context.Context, c *cli.Command) error {
tierFilter := c.String("tier")
if tierFilter != "" && tierFilter != "test" && tierFilter != "prod" {
return fmt.Errorf("tier must be 'test' or 'prod', got %q", tierFilter)
}
// Handle shutdown signals
ctx, cancel := signal.NotifyContext(ctx, syscall.SIGINT, syscall.SIGTERM)
defer cancel()
responses, err := deploycli.Discover(
ctx,
c.String("nats-url"),
c.String("nkey-file"),
c.String("discover-subject"),
time.Duration(c.Int("timeout"))*time.Second,
)
if err != nil {
return fmt.Errorf("discovery failed: %w", err)
}
if len(responses) == 0 {
fmt.Println("No hosts responded to discovery request")
return nil
}
fmt.Println("Available deployment targets:")
fmt.Println()
for _, resp := range responses {
if tierFilter != "" && resp.Tier != tierFilter {
continue
}
role := resp.Role
if role == "" {
role = "(none)"
}
fmt.Printf("- %s (tier=%s, role=%s)\n", resp.Hostname, resp.Tier, role)
for _, subj := range resp.DeploySubjects {
fmt.Printf(" %s\n", subj)
}
}
return nil
},
}
}
func builderCommand() *cli.Command {
return &cli.Command{
Name: "builder",
Usage: "Run as a build server (systemd service mode)",
Flags: []cli.Flag{
&cli.StringFlag{
Name: "nats-url",
Usage: "NATS server URL",
Required: true,
},
&cli.StringFlag{
Name: "nkey-file",
Usage: "Path to NKey seed file for NATS authentication",
Required: true,
},
&cli.StringFlag{
Name: "config",
Usage: "Path to builder configuration file",
Required: true,
},
&cli.IntFlag{
Name: "timeout",
Usage: "Build timeout in seconds per host",
Value: 1800,
},
&cli.BoolFlag{
Name: "metrics-enabled",
Usage: "Enable Prometheus metrics endpoint",
},
&cli.StringFlag{
Name: "metrics-addr",
Usage: "Address for Prometheus metrics HTTP server",
Value: ":9973",
},
},
Action: func(ctx context.Context, c *cli.Command) error {
repoCfg, err := builder.LoadConfig(c.String("config"))
if err != nil {
return fmt.Errorf("failed to load config: %w", err)
}
cfg := builder.BuilderConfig{
NATSUrl: c.String("nats-url"),
NKeyFile: c.String("nkey-file"),
ConfigFile: c.String("config"),
Timeout: time.Duration(c.Int("timeout")) * time.Second,
MetricsEnabled: c.Bool("metrics-enabled"),
MetricsAddr: c.String("metrics-addr"),
}
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
Level: slog.LevelInfo,
}))
b := builder.New(cfg, repoCfg, logger)
// Handle shutdown signals
ctx, cancel := signal.NotifyContext(ctx, syscall.SIGINT, syscall.SIGTERM)
defer cancel()
return b.Run(ctx)
},
}
}
func buildCommand() *cli.Command {
return &cli.Command{
Name: "build",
Usage: "Trigger a build on the build server",
ArgsUsage: "<repo> [hostname]",
Flags: []cli.Flag{
&cli.StringFlag{
Name: "nats-url",
Usage: "NATS server URL",
Sources: cli.EnvVars("HOMELAB_DEPLOY_NATS_URL"),
Required: true,
},
&cli.StringFlag{
Name: "nkey-file",
Usage: "Path to NKey seed file for NATS authentication",
Sources: cli.EnvVars("HOMELAB_DEPLOY_NKEY_FILE"),
Required: true,
},
&cli.StringFlag{
Name: "branch",
Usage: "Git branch to build (uses repo default if not specified)",
Sources: cli.EnvVars("HOMELAB_DEPLOY_BRANCH"),
},
&cli.BoolFlag{
Name: "all",
Usage: "Build all hosts in the repo",
},
&cli.IntFlag{
Name: "timeout",
Usage: "Timeout in seconds for collecting responses",
Sources: cli.EnvVars("HOMELAB_DEPLOY_BUILD_TIMEOUT"),
Value: 3600,
},
&cli.BoolFlag{
Name: "json",
Usage: "Output results as JSON",
},
},
Action: func(ctx context.Context, c *cli.Command) error {
if c.Args().Len() < 1 {
return fmt.Errorf("repo argument required")
}
repo := c.Args().First()
target := c.Args().Get(1)
all := c.Bool("all")
if target == "" && !all {
return fmt.Errorf("must specify hostname or --all")
}
if target != "" && all {
return fmt.Errorf("cannot specify both hostname and --all")
}
if all {
target = "all"
}
cfg := deploycli.BuildConfig{
NATSUrl: c.String("nats-url"),
NKeyFile: c.String("nkey-file"),
Repo: repo,
Target: target,
Branch: c.String("branch"),
Timeout: time.Duration(c.Int("timeout")) * time.Second,
}
jsonOutput := c.Bool("json")
if !jsonOutput {
branchStr := cfg.Branch
if branchStr == "" {
branchStr = "(default)"
}
fmt.Printf("Building %s target=%s branch=%s\n", repo, target, branchStr)
}
// Handle shutdown signals
ctx, cancel := signal.NotifyContext(ctx, syscall.SIGINT, syscall.SIGTERM)
defer cancel()
result, err := deploycli.Build(ctx, cfg, func(resp *messages.BuildResponse) {
if jsonOutput {
return
}
switch resp.Status {
case messages.BuildStatusStarted:
fmt.Printf("Started: %s\n", resp.Message)
case messages.BuildStatusProgress:
successStr := "..."
if resp.HostSuccess != nil {
if *resp.HostSuccess {
successStr = "success"
} else {
successStr = "failed"
}
}
fmt.Printf("[%d/%d] %s: %s\n", resp.HostsCompleted, resp.HostsTotal, resp.Host, successStr)
case messages.BuildStatusCompleted, messages.BuildStatusFailed:
fmt.Printf("\n%s\n", resp.Message)
case messages.BuildStatusRejected:
fmt.Printf("Rejected: %s\n", resp.Message)
}
})
if err != nil {
return fmt.Errorf("build failed: %w", err)
}
if jsonOutput {
data, err := result.MarshalJSON()
if err != nil {
return fmt.Errorf("failed to marshal result: %w", err)
}
fmt.Println(string(data))
} else if result.FinalResponse != nil {
fmt.Printf("\nBuild complete: %d succeeded, %d failed (%.1fs)\n",
result.FinalResponse.Succeeded,
result.FinalResponse.Failed,
result.FinalResponse.TotalDurationSeconds)
for _, hr := range result.FinalResponse.Results {
if !hr.Success {
fmt.Printf("\n--- %s (error: %s) ---\n", hr.Host, hr.Error)
if hr.Output != "" {
fmt.Println(hr.Output)
}
}
}
}
if !result.AllSucceeded() {
return fmt.Errorf("some builds failed")
}
return nil
},
}
}

flake.lock generated

@@ -2,11 +2,11 @@
"nodes": {
"nixpkgs": {
"locked": {
"lastModified": 1770197578,
"narHash": "sha256-AYqlWrX09+HvGs8zM6ebZ1pwUqjkfpnv8mewYwAo+iM=",
"lastModified": 1770562336,
"narHash": "sha256-ub1gpAONMFsT/GU2hV6ZWJjur8rJ6kKxdm9IlCT0j84=",
"owner": "nixos",
"repo": "nixpkgs",
"rev": "00c21e4c93d963c50d4c0c89bfa84ed6e0694df2",
"rev": "d6c71932130818840fc8fe9509cf50be8c64634f",
"type": "github"
},
"original": {


@@ -26,7 +26,7 @@
pname = "homelab-deploy";
inherit version;
src = ./.;
vendorHash = "sha256-JXa+obN62zrrwXlplqojY7dvEunUqDdSTee6N8c5JTg=";
vendorHash = "sha256-CN+l0JbQu+HDfotkt3PUFzBexHCHpCKIIZpAQRyojBk=";
subPackages = [ "cmd/homelab-deploy" ];
};
default = self.packages.${system}.homelab-deploy;
@@ -49,7 +49,7 @@
};
});
nixosModules.default = import ./nixos/module.nix;
nixosModules.default = import ./nixos/module.nix { inherit self; };
nixosModules.homelab-deploy = self.nixosModules.default;
};
}

go.mod

@@ -1,4 +1,4 @@
module git.t-juice.club/torjus/homelab-deploy
module code.t-juice.club/torjus/homelab-deploy
go 1.25.5
@@ -7,20 +7,30 @@ require (
github.com/mark3labs/mcp-go v0.43.2
github.com/nats-io/nats.go v1.48.0
github.com/nats-io/nkeys v0.4.15
github.com/prometheus/client_golang v1.23.2
github.com/urfave/cli/v3 v3.6.2
gopkg.in/yaml.v3 v3.0.1
)
require (
github.com/bahlo/generic-list-go v0.2.0 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/buger/jsonparser v1.1.1 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/invopop/jsonschema v0.13.0 // indirect
github.com/klauspost/compress v1.18.0 // indirect
github.com/kylelemons/godebug v1.1.0 // indirect
github.com/mailru/easyjson v0.7.7 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/nats-io/nuid v1.0.1 // indirect
github.com/prometheus/client_model v0.6.2 // indirect
github.com/prometheus/common v0.66.1 // indirect
github.com/prometheus/procfs v0.16.1 // indirect
github.com/spf13/cast v1.7.1 // indirect
github.com/wk8/go-ordered-map/v2 v2.1.8 // indirect
github.com/yosida95/uritemplate/v3 v3.0.2 // indirect
go.yaml.in/yaml/v2 v2.4.2 // indirect
golang.org/x/crypto v0.47.0 // indirect
golang.org/x/sys v0.40.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
google.golang.org/protobuf v1.36.8 // indirect
)

go.sum

@@ -1,13 +1,17 @@
github.com/bahlo/generic-list-go v0.2.0 h1:5sz/EEAK+ls5wF+NeqDpk5+iNdMDXrh3z3nPnH1Wvgk=
github.com/bahlo/generic-list-go v0.2.0/go.mod h1:2KvAjgMlE5NNynlg/5iLrrCCZ2+5xWbdbCW3pNTGyYg=
github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw=
github.com/buger/jsonparser v1.1.1 h1:2PnMjfWD7wBILjqQbt530v576A/cAbQvEW9gGIpYMUs=
github.com/buger/jsonparser v1.1.1/go.mod h1:6RYKKt7H4d4+iWqouImQ9R2FZql3VbhNgx27UK13J/0=
github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs=
github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/frankban/quicktest v1.14.6 h1:7Xjx+VpznH+oBnejlPUj8oUpdxnVs4f8XU8WnHkI4W8=
github.com/frankban/quicktest v1.14.6/go.mod h1:4ptaffx2x8+WTWXmUCuVU6aPUX1/Mz7zb5vbUoiM6w0=
github.com/google/go-cmp v0.5.9 h1:O2Tfq5qg4qc4AmwVlvv0oLiVAGB7enBSJ2x2DqQFi38=
github.com/google/go-cmp v0.5.9/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8=
github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/invopop/jsonschema v0.13.0 h1:KvpoAJWEjR3uD9Kbm2HWJmqsEaHt8lBUpd0qHcIi21E=
@@ -19,10 +23,14 @@ github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk=
github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
github.com/kylelemons/godebug v1.1.0 h1:RPNrshWIDI6G2gRW9EHilWtl7Z6Sb1BR0xunSBf0SNc=
github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=
github.com/mailru/easyjson v0.7.7 h1:UGYAvKxe3sBsEDzO8ZeWOSlIQfWFlxbzLZe7hwFURr0=
github.com/mailru/easyjson v0.7.7/go.mod h1:xzfreul335JAWq5oZzymOObrkdz5UnU4kGfJJLY9Nlc=
github.com/mark3labs/mcp-go v0.43.2 h1:21PUSlWWiSbUPQwXIJ5WKlETixpFpq+WBpbMGDSVy/I=
github.com/mark3labs/mcp-go v0.43.2/go.mod h1:YnJfOL382MIWDx1kMY+2zsRHU/q78dBg9aFb8W6Thdw=
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 h1:C3w9PqII01/Oq1c1nUAm88MOHcQC9l5mIlSMApZMrHA=
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822/go.mod h1:+n7T8mK8HuQTcFwEeznm/DIxMOiR9yIdICNftLE1DvQ=
github.com/nats-io/nats.go v1.48.0 h1:pSFyXApG+yWU/TgbKCjmm5K4wrHu86231/w84qRVR+U=
github.com/nats-io/nats.go v1.48.0/go.mod h1:iRWIPokVIFbVijxuMQq4y9ttaBTMe0SFdlZfMDd+33g=
github.com/nats-io/nkeys v0.4.15 h1:JACV5jRVO9V856KOapQ7x+EY8Jo3qw1vJt/9Jpwzkk4=
@@ -31,8 +39,16 @@ github.com/nats-io/nuid v1.0.1 h1:5iA8DT8V7q8WK2EScv2padNa/rTESc1KdnPw4TC2paw=
github.com/nats-io/nuid v1.0.1/go.mod h1:19wcPz3Ph3q0Jbyiqsd0kePYG7A95tJPxeL+1OSON2c=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/rogpeppe/go-internal v1.9.0 h1:73kH8U+JUqXU8lRuOHeVHaa/SZPifC7BkcraZVejAe8=
github.com/rogpeppe/go-internal v1.9.0/go.mod h1:WtVeX8xhTBvf0smdhujwtBcq4Qrzq/fJaraNFVN+nFs=
github.com/prometheus/client_golang v1.23.2 h1:Je96obch5RDVy3FDMndoUsjAhG5Edi49h0RJWRi/o0o=
github.com/prometheus/client_golang v1.23.2/go.mod h1:Tb1a6LWHB3/SPIzCoaDXI4I8UHKeFTEQ1YCr+0Gyqmg=
github.com/prometheus/client_model v0.6.2 h1:oBsgwpGs7iVziMvrGhE53c/GrLUsZdHnqNwqPLxwZyk=
github.com/prometheus/client_model v0.6.2/go.mod h1:y3m2F6Gdpfy6Ut/GBsUqTWZqCUvMVzSfMLjcu6wAwpE=
github.com/prometheus/common v0.66.1 h1:h5E0h5/Y8niHc5DlaLlWLArTQI7tMrsfQjHV+d9ZoGs=
github.com/prometheus/common v0.66.1/go.mod h1:gcaUsgf3KfRSwHY4dIMXLPV0K/Wg1oZ8+SbZk/HH/dA=
github.com/prometheus/procfs v0.16.1 h1:hZ15bTNuirocR6u0JZ6BAHHmwS1p8B4P6MRqxtzMyRg=
github.com/prometheus/procfs v0.16.1/go.mod h1:teAbpZRB1iIAJYREa1LsoWUXykVXA1KlTmWl8x/U+Is=
github.com/rogpeppe/go-internal v1.10.0 h1:TMyTOH3F/DB16zRVcYyreMH6GnZZrwQVAoYjRBZyWFQ=
github.com/rogpeppe/go-internal v1.10.0/go.mod h1:UQnix2H7Ngw/k4C5ijL5+65zddjncjaFoBhdsK/akog=
github.com/spf13/cast v1.7.1 h1:cuNEagBQEHWN1FnbGEjCXL2szYEXqfJPbP2HNUaca9Y=
github.com/spf13/cast v1.7.1/go.mod h1:ancEpBxwJDODSW/UG4rDrAqiKolqNNh2DX3mk86cAdo=
github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U=
@@ -43,11 +59,18 @@ github.com/wk8/go-ordered-map/v2 v2.1.8 h1:5h/BUHu93oj4gIdvHHHGsScSTMijfx5PeYkE/
github.com/wk8/go-ordered-map/v2 v2.1.8/go.mod h1:5nJHM5DyteebpVlHnWMV0rPz6Zp7+xBAnxjb1X5vnTw=
github.com/yosida95/uritemplate/v3 v3.0.2 h1:Ed3Oyj9yrmi9087+NczuL5BwkIc4wvTb5zIM+UJPGz4=
github.com/yosida95/uritemplate/v3 v3.0.2/go.mod h1:ILOh0sOhIJR3+L/8afwt/kE++YT040gmv5BQTMR2HP4=
go.uber.org/goleak v1.3.0 h1:2K3zAYmnTNqV73imy9J1T3WC+gmCePx2hEGkimedGto=
go.uber.org/goleak v1.3.0/go.mod h1:CoHD4mav9JJNrW/WLlf7HGZPjdw8EucARQHekz1X6bE=
go.yaml.in/yaml/v2 v2.4.2 h1:DzmwEr2rDGHl7lsFgAHxmNz/1NlQ7xLIrlN2h5d1eGI=
go.yaml.in/yaml/v2 v2.4.2/go.mod h1:081UH+NErpNdqlCXm3TtEran0rJZGxAYx9hb/ELlsPU=
golang.org/x/crypto v0.47.0 h1:V6e3FRj+n4dbpw86FJ8Fv7XVOql7TEwpHapKoMJ/GO8=
golang.org/x/crypto v0.47.0/go.mod h1:ff3Y9VzzKbwSSEzWqJsJVBnWmRwRSHt/6Op5n9bQc4A=
golang.org/x/sys v0.40.0 h1:DBZZqJ2Rkml6QMQsZywtnjnnGvHza6BTfYFWY9kjEWQ=
golang.org/x/sys v0.40.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
google.golang.org/protobuf v1.36.8 h1:xHScyCOEuuwZEc6UtSOvPbAT4zRh0xcNRYekJwfqyMc=
google.golang.org/protobuf v1.36.8/go.mod h1:fuxRtAxBytpl4zzqUh6/eyUujkJdNiuEkXntxiD/uRU=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

internal/builder/builder.go Normal file

@@ -0,0 +1,377 @@
package builder
import (
"context"
"fmt"
"log/slog"
"regexp"
"sort"
"strings"
"sync"
"time"
"code.t-juice.club/torjus/homelab-deploy/internal/messages"
"code.t-juice.club/torjus/homelab-deploy/internal/metrics"
"code.t-juice.club/torjus/homelab-deploy/internal/nats"
)
// hostnameRegex validates hostnames from flake output.
// Allows: alphanumeric, dashes, underscores, dots.
var hostnameRegex = regexp.MustCompile(`^[a-zA-Z0-9._-]+$`)
// truncateOutputLines truncates output to the first and last N lines if it exceeds 2*N lines,
// returning the result as a slice of strings.
func truncateOutputLines(output string, keepLines int) []string {
lines := strings.Split(output, "\n")
if len(lines) <= keepLines*2 {
return lines
}
head := lines[:keepLines]
tail := lines[len(lines)-keepLines:]
omitted := len(lines) - keepLines*2
result := make([]string, 0, keepLines*2+1)
result = append(result, head...)
result = append(result, fmt.Sprintf("... (%d lines omitted) ...", omitted))
result = append(result, tail...)
return result
}
// truncateOutput truncates output to the first and last N lines if it exceeds 2*N lines.
func truncateOutput(output string, keepLines int) string {
lines := strings.Split(output, "\n")
if len(lines) <= keepLines*2 {
return output
}
head := lines[:keepLines]
tail := lines[len(lines)-keepLines:]
omitted := len(lines) - keepLines*2
return strings.Join(head, "\n") + fmt.Sprintf("\n\n... (%d lines omitted) ...\n\n", omitted) + strings.Join(tail, "\n")
}
// BuilderConfig holds the configuration for the builder.
type BuilderConfig struct {
NATSUrl string
NKeyFile string
ConfigFile string
Timeout time.Duration
MetricsEnabled bool
MetricsAddr string
}
// Builder handles build requests from NATS.
type Builder struct {
cfg BuilderConfig
repoCfg *Config
client *nats.Client
executor *Executor
lock sync.Mutex
busy bool
logger *slog.Logger
// metrics server and collector (nil if metrics disabled)
metricsServer *metrics.Server
metrics *metrics.BuildCollector
}
// New creates a new builder with the given configuration.
func New(cfg BuilderConfig, repoCfg *Config, logger *slog.Logger) *Builder {
if logger == nil {
logger = slog.Default()
}
b := &Builder{
cfg: cfg,
repoCfg: repoCfg,
executor: NewExecutor(cfg.Timeout),
logger: logger,
}
if cfg.MetricsEnabled {
b.metricsServer = metrics.NewServer(metrics.ServerConfig{
Addr: cfg.MetricsAddr,
Logger: logger,
})
b.metrics = metrics.NewBuildCollector(b.metricsServer.Registry())
}
return b
}
// Run starts the builder and blocks until the context is cancelled.
func (b *Builder) Run(ctx context.Context) error {
// Start metrics server if enabled
if b.metricsServer != nil {
if err := b.metricsServer.Start(); err != nil {
return fmt.Errorf("failed to start metrics server: %w", err)
}
defer func() {
shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
_ = b.metricsServer.Shutdown(shutdownCtx)
}()
}
// Connect to NATS
b.logger.Info("connecting to NATS", "url", b.cfg.NATSUrl)
client, err := nats.Connect(nats.Config{
URL: b.cfg.NATSUrl,
NKeyFile: b.cfg.NKeyFile,
Name: "homelab-deploy-builder",
})
if err != nil {
return fmt.Errorf("failed to connect to NATS: %w", err)
}
b.client = client
defer b.client.Close()
b.logger.Info("connected to NATS")
// Subscribe to build subjects for each repo
for repoName := range b.repoCfg.Repos {
// Subscribe to build.<repo>.all and build.<repo>.<hostname>
allSubject := fmt.Sprintf("build.%s.*", repoName)
b.logger.Info("subscribing to build subject", "subject", allSubject)
if _, err := b.client.Subscribe(allSubject, b.handleBuildRequest); err != nil {
return fmt.Errorf("failed to subscribe to %s: %w", allSubject, err)
}
}
b.logger.Info("builder started", "repos", len(b.repoCfg.Repos))
// Wait for context cancellation
<-ctx.Done()
b.logger.Info("shutting down builder")
return nil
}
func (b *Builder) handleBuildRequest(subject string, data []byte) {
req, err := messages.UnmarshalBuildRequest(data)
if err != nil {
b.logger.Error("failed to unmarshal build request",
"subject", subject,
"error", err,
)
return
}
b.logger.Info("received build request",
"subject", subject,
"repo", req.Repo,
"target", req.Target,
"branch", req.Branch,
"reply_to", req.ReplyTo,
)
// Validate request
if err := req.Validate(); err != nil {
b.logger.Warn("invalid build request", "error", err)
b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
messages.BuildStatusRejected,
err.Error(),
))
return
}
// Get repo config
repo, err := b.repoCfg.GetRepo(req.Repo)
if err != nil {
b.logger.Warn("unknown repo", "repo", req.Repo)
b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
messages.BuildStatusRejected,
fmt.Sprintf("unknown repo: %s", req.Repo),
))
return
}
// Try to acquire lock
b.lock.Lock()
if b.busy {
b.lock.Unlock()
b.logger.Warn("build already in progress")
b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
messages.BuildStatusRejected,
"another build is already in progress",
))
return
}
b.busy = true
b.lock.Unlock()
defer func() {
b.lock.Lock()
b.busy = false
b.lock.Unlock()
}()
// Use default branch if not specified
branch := req.Branch
if branch == "" {
branch = repo.DefaultBranch
}
// Determine hosts to build
var hosts []string
if req.Target == "all" {
// List hosts from flake
b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
messages.BuildStatusStarted,
"discovering hosts...",
))
hosts, err = b.executor.ListHosts(context.Background(), repo.URL, branch)
if err != nil {
b.logger.Error("failed to list hosts", "error", err)
b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
messages.BuildStatusFailed,
fmt.Sprintf("failed to list hosts: %v", err),
).WithError(err.Error()))
if b.metrics != nil {
b.metrics.RecordBuildFailure(req.Repo, "")
}
return
}
// Filter out hostnames with invalid characters (security: prevent injection)
validHosts := make([]string, 0, len(hosts))
for _, host := range hosts {
if hostnameRegex.MatchString(host) {
validHosts = append(validHosts, host)
} else {
b.logger.Warn("skipping hostname with invalid characters", "hostname", host)
}
}
hosts = validHosts
// Sort hosts for consistent ordering
sort.Strings(hosts)
} else {
hosts = []string{req.Target}
}
if len(hosts) == 0 {
b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
messages.BuildStatusFailed,
"no hosts to build",
))
return
}
// Send started response
b.sendResponse(req.ReplyTo, &messages.BuildResponse{
Status: messages.BuildStatusStarted,
Message: fmt.Sprintf("building %d host(s)", len(hosts)),
HostsTotal: len(hosts),
})
// Build each host sequentially
startTime := time.Now()
results := make([]messages.BuildHostResult, 0, len(hosts))
succeeded := 0
failed := 0
for i, host := range hosts {
hostStart := time.Now()
b.logger.Info("building host",
"host", host,
"repo", req.Repo,
"rev", branch,
"progress", fmt.Sprintf("%d/%d", i+1, len(hosts)),
"command", b.executor.BuildCommand(repo.URL, branch, host),
)
result := b.executor.Build(context.Background(), repo.URL, branch, host)
hostDuration := time.Since(hostStart).Seconds()
hostResult := messages.BuildHostResult{
Host: host,
Success: result.Success,
DurationSeconds: hostDuration,
}
if !result.Success {
if result.Error != nil {
hostResult.Error = result.Error.Error()
}
if result.Stderr != "" {
hostResult.Output = truncateOutput(result.Stderr, 50)
}
}
results = append(results, hostResult)
if result.Success {
succeeded++
b.logger.Info("host build succeeded", "host", host, "repo", req.Repo, "rev", branch, "duration", hostDuration)
if b.metrics != nil {
b.metrics.RecordHostBuildSuccess(req.Repo, host, hostDuration)
}
} else {
failed++
b.logger.Error("host build failed", "host", host, "repo", req.Repo, "rev", branch, "error", hostResult.Error)
if result.Stderr != "" {
for _, line := range truncateOutputLines(result.Stderr, 50) {
b.logger.Warn("build output", "host", host, "repo", req.Repo, "line", line)
}
}
if b.metrics != nil {
b.metrics.RecordHostBuildFailure(req.Repo, host, hostDuration)
}
}
// Send progress update
success := result.Success
b.sendResponse(req.ReplyTo, &messages.BuildResponse{
Status: messages.BuildStatusProgress,
Host: host,
HostSuccess: &success,
HostsCompleted: i + 1,
HostsTotal: len(hosts),
})
}
totalDuration := time.Since(startTime).Seconds()
// Send final response
status := messages.BuildStatusCompleted
message := fmt.Sprintf("built %d/%d hosts successfully", succeeded, len(hosts))
if failed > 0 {
status = messages.BuildStatusFailed
message = fmt.Sprintf("build failed: %d/%d hosts failed", failed, len(hosts))
}
b.sendResponse(req.ReplyTo, &messages.BuildResponse{
Status: status,
Message: message,
Results: results,
TotalDurationSeconds: totalDuration,
Succeeded: succeeded,
Failed: failed,
})
// Record overall build metrics
if b.metrics != nil {
if failed == 0 {
b.metrics.RecordBuildSuccess(req.Repo)
} else {
b.metrics.RecordBuildFailure(req.Repo, "")
}
}
}
func (b *Builder) sendResponse(replyTo string, resp *messages.BuildResponse) {
data, err := resp.Marshal()
if err != nil {
b.logger.Error("failed to marshal build response", "error", err)
return
}
if err := b.client.Publish(replyTo, data); err != nil {
b.logger.Error("failed to publish build response",
"reply_to", replyTo,
"error", err,
)
}
// Flush to ensure response is sent immediately
if err := b.client.Flush(); err != nil {
b.logger.Error("failed to flush", "error", err)
}
}


@@ -0,0 +1,164 @@
package builder
import (
"fmt"
"strings"
"testing"
)
func TestTruncateOutput(t *testing.T) {
tests := []struct {
name string
input string
keepLines int
wantLines int
wantOmit bool
}{
{
name: "short output unchanged",
input: "line1\nline2\nline3",
keepLines: 50,
wantLines: 3,
wantOmit: false,
},
{
name: "exactly at threshold unchanged",
input: strings.Join(makeLines(100), "\n"),
keepLines: 50,
wantLines: 100,
wantOmit: false,
},
{
name: "over threshold truncated",
input: strings.Join(makeLines(150), "\n"),
keepLines: 50,
wantLines: 103, // 50 + 1 (empty) + 1 (omitted msg) + 1 (empty) + 50
wantOmit: true,
},
{
name: "large output truncated",
input: strings.Join(makeLines(1000), "\n"),
keepLines: 50,
wantLines: 103,
wantOmit: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := truncateOutput(tt.input, tt.keepLines)
gotLines := strings.Split(got, "\n")
if len(gotLines) != tt.wantLines {
t.Errorf("got %d lines, want %d", len(gotLines), tt.wantLines)
}
hasOmit := strings.Contains(got, "lines omitted")
if hasOmit != tt.wantOmit {
t.Errorf("got omit marker = %v, want %v", hasOmit, tt.wantOmit)
}
if tt.wantOmit {
// Verify first and last lines are preserved
inputLines := strings.Split(tt.input, "\n")
firstLine := inputLines[0]
lastLine := inputLines[len(inputLines)-1]
if !strings.HasPrefix(got, firstLine+"\n") {
t.Errorf("first line not preserved, got prefix %q, want %q",
gotLines[0], firstLine)
}
if !strings.HasSuffix(got, lastLine) {
t.Errorf("last line not preserved, got suffix %q, want %q",
gotLines[len(gotLines)-1], lastLine)
}
}
})
}
}
func makeLines(n int) []string {
lines := make([]string, n)
for i := range lines {
lines[i] = "line " + strings.Repeat("x", i%80)
}
return lines
}
func TestTruncateOutputLines(t *testing.T) {
t.Run("short output returns all lines", func(t *testing.T) {
input := "line1\nline2\nline3"
got := truncateOutputLines(input, 50)
if len(got) != 3 {
t.Errorf("got %d lines, want 3", len(got))
}
if got[0] != "line1" || got[1] != "line2" || got[2] != "line3" {
t.Errorf("unexpected lines: %v", got)
}
})
t.Run("over threshold returns head + marker + tail", func(t *testing.T) {
lines := makeLines(200)
input := strings.Join(lines, "\n")
got := truncateOutputLines(input, 50)
// Should be 50 head + 1 marker + 50 tail = 101
if len(got) != 101 {
t.Errorf("got %d lines, want 101", len(got))
}
// Check first and last lines preserved
if got[0] != lines[0] {
t.Errorf("first line = %q, want %q", got[0], lines[0])
}
if got[len(got)-1] != lines[len(lines)-1] {
t.Errorf("last line = %q, want %q", got[len(got)-1], lines[len(lines)-1])
}
// Check omitted marker
marker := got[50]
expected := fmt.Sprintf("... (%d lines omitted) ...", 100)
if marker != expected {
t.Errorf("marker = %q, want %q", marker, expected)
}
})
t.Run("exactly at threshold returns all lines", func(t *testing.T) {
lines := makeLines(100)
input := strings.Join(lines, "\n")
got := truncateOutputLines(input, 50)
if len(got) != 100 {
t.Errorf("got %d lines, want 100", len(got))
}
})
}
func TestTruncateOutputPreservesContent(t *testing.T) {
// Create input with distinct first and last lines
lines := make([]string, 200)
for i := range lines {
lines[i] = "middle"
}
lines[0] = "FIRST"
lines[49] = "LAST_OF_HEAD"
lines[150] = "FIRST_OF_TAIL"
lines[199] = "LAST"
input := strings.Join(lines, "\n")
got := truncateOutput(input, 50)
if !strings.Contains(got, "FIRST") {
t.Error("missing FIRST")
}
if !strings.Contains(got, "LAST_OF_HEAD") {
t.Error("missing LAST_OF_HEAD")
}
if !strings.Contains(got, "FIRST_OF_TAIL") {
t.Error("missing FIRST_OF_TAIL")
}
if !strings.Contains(got, "LAST") {
t.Error("missing LAST")
}
if !strings.Contains(got, "(100 lines omitted)") {
t.Errorf("wrong omitted count, got: %s", got)
}
}


@@ -0,0 +1,96 @@
package builder
import (
"fmt"
"os"
"regexp"
"strings"
"gopkg.in/yaml.v3"
)
// repoNameRegex validates repository names for safe use in NATS subjects.
// Only allows alphanumeric, dashes, and underscores (no dots or wildcards).
var repoNameRegex = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)
// validURLPrefixes are the allowed prefixes for repository URLs.
var validURLPrefixes = []string{
"git+https://",
"git+ssh://",
"git+file://",
}
// RepoConfig holds configuration for a single repository.
type RepoConfig struct {
URL string `yaml:"url"`
DefaultBranch string `yaml:"default_branch"`
}
// Config holds the builder configuration.
type Config struct {
Repos map[string]RepoConfig `yaml:"repos"`
}
// LoadConfig loads configuration from a YAML file.
func LoadConfig(path string) (*Config, error) {
data, err := os.ReadFile(path)
if err != nil {
return nil, fmt.Errorf("failed to read config file: %w", err)
}
var cfg Config
if err := yaml.Unmarshal(data, &cfg); err != nil {
return nil, fmt.Errorf("failed to parse config file: %w", err)
}
if err := cfg.Validate(); err != nil {
return nil, err
}
return &cfg, nil
}
// Validate checks that the configuration is valid.
func (c *Config) Validate() error {
if len(c.Repos) == 0 {
return fmt.Errorf("no repos configured")
}
for name, repo := range c.Repos {
// Validate repo name for safe use in NATS subjects
if !repoNameRegex.MatchString(name) {
return fmt.Errorf("repo name %q contains invalid characters (only alphanumeric, dash, underscore allowed)", name)
}
if repo.URL == "" {
return fmt.Errorf("repo %q: url is required", name)
}
// Validate URL format
validURL := false
for _, prefix := range validURLPrefixes {
if strings.HasPrefix(repo.URL, prefix) {
validURL = true
break
}
}
if !validURL {
return fmt.Errorf("repo %q: url must start with git+https://, git+ssh://, or git+file://", name)
}
if repo.DefaultBranch == "" {
return fmt.Errorf("repo %q: default_branch is required", name)
}
}
return nil
}
// GetRepo returns the configuration for a repository, or an error if not found.
func (c *Config) GetRepo(name string) (*RepoConfig, error) {
repo, ok := c.Repos[name]
if !ok {
return nil, fmt.Errorf("repo %q not found in configuration", name)
}
return &repo, nil
}


@@ -0,0 +1,116 @@
package builder
import (
"bytes"
"context"
"encoding/json"
"fmt"
"os/exec"
"time"
)
// Executor handles the execution of nix build commands.
type Executor struct {
timeout time.Duration
}
// NewExecutor creates a new build executor.
func NewExecutor(timeout time.Duration) *Executor {
return &Executor{
timeout: timeout,
}
}
// BuildResult contains the result of a build execution.
type BuildResult struct {
Success bool
ExitCode int
Stdout string
Stderr string
Error error
}
// FlakeShowResult contains the parsed output of nix flake show.
type FlakeShowResult struct {
NixosConfigurations map[string]any `json:"nixosConfigurations"`
}
// ListHosts returns the list of hosts (nixosConfigurations) available in a flake.
func (e *Executor) ListHosts(ctx context.Context, flakeURL, branch string) ([]string, error) {
ctx, cancel := context.WithTimeout(ctx, 60*time.Second)
defer cancel()
flakeRef := fmt.Sprintf("%s?ref=%s", flakeURL, branch)
cmd := exec.CommandContext(ctx, "nix", "flake", "show", "--json", flakeRef)
var stdout, stderr bytes.Buffer
cmd.Stdout = &stdout
cmd.Stderr = &stderr
if err := cmd.Run(); err != nil {
if ctx.Err() == context.DeadlineExceeded {
return nil, fmt.Errorf("listing hosts timed out after 60s")
}
return nil, fmt.Errorf("failed to list hosts: %w\n%s", err, stderr.String())
}
var result FlakeShowResult
if err := json.Unmarshal(stdout.Bytes(), &result); err != nil {
return nil, fmt.Errorf("failed to parse flake show output: %w", err)
}
hosts := make([]string, 0, len(result.NixosConfigurations))
for host := range result.NixosConfigurations {
hosts = append(hosts, host)
}
return hosts, nil
}
// Build builds a single host's system configuration.
func (e *Executor) Build(ctx context.Context, flakeURL, branch, host string) *BuildResult {
ctx, cancel := context.WithTimeout(ctx, e.timeout)
defer cancel()
// Build the flake reference for the system toplevel
flakeRef := fmt.Sprintf("%s?ref=%s#nixosConfigurations.%s.config.system.build.toplevel", flakeURL, branch, host)
cmd := exec.CommandContext(ctx, "nix", "build", "--no-link", flakeRef)
var stdout, stderr bytes.Buffer
cmd.Stdout = &stdout
cmd.Stderr = &stderr
err := cmd.Run()
result := &BuildResult{
Stdout: stdout.String(),
Stderr: stderr.String(),
}
if err != nil {
result.Success = false
result.Error = err
if ctx.Err() == context.DeadlineExceeded {
result.Error = fmt.Errorf("build timed out after %v", e.timeout)
}
if exitErr, ok := err.(*exec.ExitError); ok {
result.ExitCode = exitErr.ExitCode()
} else {
result.ExitCode = -1
}
} else {
result.Success = true
result.ExitCode = 0
}
return result
}
// BuildCommand returns the command that would be executed (for logging/debugging).
func (e *Executor) BuildCommand(flakeURL, branch, host string) string {
flakeRef := fmt.Sprintf("%s?ref=%s#nixosConfigurations.%s.config.system.build.toplevel", flakeURL, branch, host)
return fmt.Sprintf("nix build --no-link %s", flakeRef)
}
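For reference, the flake reference format assembled by `Build` and `BuildCommand` expands like this; the URL, branch, and host are illustrative, and `buildFlakeRef` is a local stand-in for the `fmt.Sprintf` call above:

```go
package main

import "fmt"

// buildFlakeRef mirrors the flake reference construction in Executor.Build.
func buildFlakeRef(flakeURL, branch, host string) string {
	return fmt.Sprintf("%s?ref=%s#nixosConfigurations.%s.config.system.build.toplevel", flakeURL, branch, host)
}

func main() {
	ref := buildFlakeRef("git+https://example.com/torjus/nixos-config", "master", "host1")
	fmt.Println("nix build --no-link " + ref)
}
```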

internal/cli/build.go Normal file

@@ -0,0 +1,140 @@
package cli
import (
"context"
"encoding/json"
"fmt"
"sync"
"time"
"github.com/google/uuid"
"code.t-juice.club/torjus/homelab-deploy/internal/messages"
"code.t-juice.club/torjus/homelab-deploy/internal/nats"
)
// BuildConfig holds configuration for a build operation.
type BuildConfig struct {
NATSUrl string
NKeyFile string
Repo string
Target string
Branch string
Timeout time.Duration
}
// BuildResult contains the aggregated results from a build.
type BuildResult struct {
Responses []*messages.BuildResponse
FinalResponse *messages.BuildResponse
Errors []error
}
// AllSucceeded returns true if the build completed successfully.
func (r *BuildResult) AllSucceeded() bool {
if len(r.Errors) > 0 {
return false
}
if r.FinalResponse == nil {
return false
}
return r.FinalResponse.Status == messages.BuildStatusCompleted && r.FinalResponse.Failed == 0
}
// MarshalJSON returns the JSON representation of the build result.
func (r *BuildResult) MarshalJSON() ([]byte, error) {
if r.FinalResponse != nil {
return json.Marshal(r.FinalResponse)
}
return json.Marshal(map[string]any{
"status": "unknown",
"responses": r.Responses,
"errors": r.Errors,
})
}
// Build triggers a build and collects responses.
func Build(ctx context.Context, cfg BuildConfig, onResponse func(*messages.BuildResponse)) (*BuildResult, error) {
// Connect to NATS
client, err := nats.Connect(nats.Config{
URL: cfg.NATSUrl,
NKeyFile: cfg.NKeyFile,
Name: "homelab-deploy-build-cli",
})
if err != nil {
return nil, fmt.Errorf("failed to connect to NATS: %w", err)
}
defer client.Close()
// Generate unique reply subject
requestID := uuid.New().String()
replySubject := fmt.Sprintf("build.responses.%s", requestID)
var mu sync.Mutex
result := &BuildResult{}
done := make(chan struct{})
// Subscribe to reply subject
sub, err := client.Subscribe(replySubject, func(subject string, data []byte) {
resp, err := messages.UnmarshalBuildResponse(data)
if err != nil {
mu.Lock()
result.Errors = append(result.Errors, fmt.Errorf("failed to unmarshal response: %w", err))
mu.Unlock()
return
}
mu.Lock()
result.Responses = append(result.Responses, resp)
if resp.Status.IsFinal() {
result.FinalResponse = resp
select {
case <-done:
default:
close(done)
}
}
mu.Unlock()
if onResponse != nil {
onResponse(resp)
}
})
if err != nil {
return nil, fmt.Errorf("failed to subscribe to reply subject: %w", err)
}
defer func() { _ = sub.Unsubscribe() }()
// Build and send request
req := &messages.BuildRequest{
Repo: cfg.Repo,
Target: cfg.Target,
Branch: cfg.Branch,
ReplyTo: replySubject,
}
data, err := req.Marshal()
if err != nil {
return nil, fmt.Errorf("failed to marshal request: %w", err)
}
// Publish to build.<repo>.<target>
buildSubject := fmt.Sprintf("build.%s.%s", cfg.Repo, cfg.Target)
if err := client.Publish(buildSubject, data); err != nil {
return nil, fmt.Errorf("failed to publish request: %w", err)
}
if err := client.Flush(); err != nil {
return nil, fmt.Errorf("failed to flush: %w", err)
}
// Wait for final response or timeout
select {
case <-ctx.Done():
return result, ctx.Err()
case <-done:
return result, nil
case <-time.After(cfg.Timeout):
return result, nil
}
}
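The select-with-default around `close(done)` in the subscription callback is what makes a second final response harmless; note the original relies on the surrounding mutex to keep this race-free. The pattern in isolation, with `closeOnce` as a hypothetical name:

```go
package main

import "fmt"

// closeOnce closes done only if it is not already closed, using the
// same select-with-default pattern as the build response handler.
// Callers must serialize access (the handler above holds a mutex).
func closeOnce(done chan struct{}) {
	select {
	case <-done:
		// receive succeeds immediately on a closed channel; nothing to do
	default:
		close(done)
	}
}

func main() {
	done := make(chan struct{})
	closeOnce(done)
	closeOnce(done) // second call is a no-op instead of panicking
	_, open := <-done
	fmt.Println(open) // false: channel is closed
}
```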


@@ -8,8 +8,8 @@ import (
"github.com/google/uuid"
"git.t-juice.club/torjus/homelab-deploy/internal/messages"
"git.t-juice.club/torjus/homelab-deploy/internal/nats"
"code.t-juice.club/torjus/homelab-deploy/internal/messages"
"code.t-juice.club/torjus/homelab-deploy/internal/nats"
)
// DeployConfig holds configuration for a deploy operation.
@@ -28,14 +28,32 @@ type DeployResult struct {
Errors []error
}
// AllSucceeded returns true if all responses indicate success.
// AllSucceeded returns true if all hosts' final responses indicate success.
func (r *DeployResult) AllSucceeded() bool {
if len(r.Errors) > 0 {
return false
}
// Track the final status for each host
finalStatus := make(map[string]messages.Status)
for _, resp := range r.Responses {
if resp.Status != messages.StatusCompleted {
if resp.Status.IsFinal() {
finalStatus[resp.Hostname] = resp.Status
}
}
// Need at least one host with a final status
if len(finalStatus) == 0 {
return false
}
// All final statuses must be completed
for _, status := range finalStatus {
if status != messages.StatusCompleted {
return false
}
}
return len(r.Responses) > 0 && len(r.Errors) == 0
return true
}
// HostCount returns the number of unique hosts that responded.
@@ -67,7 +85,9 @@ func Deploy(ctx context.Context, cfg DeployConfig, onResponse func(*messages.Dep
// Track responses by hostname to handle multiple messages per host
var mu sync.Mutex
result := &DeployResult{}
hostFinal := make(map[string]bool) // track which hosts have sent final status
hostFinal := make(map[string]bool) // track which hosts have sent final status
hostSeen := make(map[string]bool) // track all hosts that have responded
lastResponse := time.Now()
// Subscribe to reply subject
sub, err := client.Subscribe(replySubject, func(subject string, data []byte) {
@@ -81,9 +101,11 @@ func Deploy(ctx context.Context, cfg DeployConfig, onResponse func(*messages.Dep
mu.Lock()
result.Responses = append(result.Responses, resp)
hostSeen[resp.Hostname] = true
if resp.Status.IsFinal() {
hostFinal[resp.Hostname] = true
}
lastResponse = time.Now()
mu.Unlock()
if onResponse != nil {
@@ -119,8 +141,7 @@ func Deploy(ctx context.Context, cfg DeployConfig, onResponse func(*messages.Dep
// Use a dynamic timeout: wait for initial responses, then extend
// timeout after each response until no new responses or max timeout
deadline := time.Now().Add(cfg.Timeout)
lastResponse := time.Now()
idleTimeout := 30 * time.Second // wait this long after last response
idleTimeout := 30 * time.Second // wait this long after last response for new hosts
for {
select {
@@ -128,7 +149,9 @@ func Deploy(ctx context.Context, cfg DeployConfig, onResponse func(*messages.Dep
return result, ctx.Err()
case <-time.After(1 * time.Second):
mu.Lock()
responseCount := len(result.Responses)
seenCount := len(hostSeen)
finalCount := len(hostFinal)
lastResponseTime := lastResponse
mu.Unlock()
now := time.Now()
@@ -138,21 +161,19 @@ func Deploy(ctx context.Context, cfg DeployConfig, onResponse func(*messages.Dep
return result, nil
}
// If we have responses, use idle timeout
if responseCount > 0 {
mu.Lock()
lastResponseTime := lastResponse
// Update lastResponse time if we got new responses
if responseCount > 0 {
// Simple approximation - in practice you'd track this more precisely
lastResponseTime = now
}
mu.Unlock()
if now.Sub(lastResponseTime) > idleTimeout {
// If all hosts that responded have sent final status, we're done
// Add a short grace period for late arrivals from other hosts
if seenCount > 0 && seenCount == finalCount {
// Wait a bit for any other hosts to respond
if now.Sub(lastResponseTime) > 2*time.Second {
return result, nil
}
}
// If we have responses but waiting for more hosts, use idle timeout
if seenCount > 0 && now.Sub(lastResponseTime) > idleTimeout {
return result, nil
}
}
}
}


@@ -3,7 +3,7 @@ package cli
import (
"testing"
"git.t-juice.club/torjus/homelab-deploy/internal/messages"
"code.t-juice.club/torjus/homelab-deploy/internal/messages"
)
func TestDeployResult_AllSucceeded(t *testing.T) {
@@ -49,6 +49,40 @@ func TestDeployResult_AllSucceeded(t *testing.T) {
errors: []error{nil}, // placeholder error
want: false,
},
{
name: "with intermediate responses - success",
responses: []*messages.DeployResponse{
{Hostname: "host1", Status: messages.StatusStarted},
{Hostname: "host1", Status: messages.StatusCompleted},
},
want: true,
},
{
name: "with intermediate responses - failure",
responses: []*messages.DeployResponse{
{Hostname: "host1", Status: messages.StatusStarted},
{Hostname: "host1", Status: messages.StatusFailed},
},
want: false,
},
{
name: "multiple hosts with intermediate responses",
responses: []*messages.DeployResponse{
{Hostname: "host1", Status: messages.StatusStarted},
{Hostname: "host2", Status: messages.StatusStarted},
{Hostname: "host1", Status: messages.StatusCompleted},
{Hostname: "host2", Status: messages.StatusCompleted},
},
want: true,
},
{
name: "only intermediate responses - no final",
responses: []*messages.DeployResponse{
{Hostname: "host1", Status: messages.StatusStarted},
{Hostname: "host1", Status: messages.StatusAccepted},
},
want: false,
},
}
for _, tc := range tests {


@@ -7,7 +7,7 @@ import (
"os/exec"
"time"
"git.t-juice.club/torjus/homelab-deploy/internal/messages"
"code.t-juice.club/torjus/homelab-deploy/internal/messages"
)
// Executor handles the execution of nixos-rebuild commands.
@@ -35,6 +35,15 @@ type Result struct {
Error error
}
// ExecuteOptions contains optional settings for Execute.
type ExecuteOptions struct {
// HeartbeatInterval is how often to call the heartbeat callback.
// If zero, no heartbeat is sent.
HeartbeatInterval time.Duration
// HeartbeatCallback is called periodically with elapsed time while the command runs.
HeartbeatCallback func(elapsed time.Duration)
}
// ValidateRevision checks if a revision exists in the remote repository.
// It uses git ls-remote to verify the ref exists.
func (e *Executor) ValidateRevision(ctx context.Context, revision string) error {
@@ -65,6 +74,11 @@ func (e *Executor) ValidateRevision(ctx context.Context, revision string) error
// Execute runs nixos-rebuild with the specified action and revision.
func (e *Executor) Execute(ctx context.Context, action messages.Action, revision string) *Result {
return e.ExecuteWithOptions(ctx, action, revision, nil)
}
// ExecuteWithOptions runs nixos-rebuild with the specified action, revision, and options.
func (e *Executor) ExecuteWithOptions(ctx context.Context, action messages.Action, revision string, opts *ExecuteOptions) *Result {
ctx, cancel := context.WithTimeout(ctx, e.timeout)
defer cancel()
@@ -77,7 +91,41 @@ func (e *Executor) Execute(ctx context.Context, action messages.Action, revision
cmd.Stdout = &stdout
cmd.Stderr = &stderr
err := cmd.Run()
// Start the command
startTime := time.Now()
if err := cmd.Start(); err != nil {
return &Result{
Success: false,
ExitCode: -1,
Error: fmt.Errorf("failed to start command: %w", err),
}
}
// Set up heartbeat if configured
var heartbeatDone chan struct{}
if opts != nil && opts.HeartbeatInterval > 0 && opts.HeartbeatCallback != nil {
heartbeatDone = make(chan struct{})
go func() {
ticker := time.NewTicker(opts.HeartbeatInterval)
defer ticker.Stop()
for {
select {
case <-heartbeatDone:
return
case <-ticker.C:
opts.HeartbeatCallback(time.Since(startTime))
}
}
}()
}
// Wait for command to complete
err := cmd.Wait()
// Stop heartbeat goroutine
if heartbeatDone != nil {
close(heartbeatDone)
}
result := &Result{
Stdout: stdout.String(),


@@ -4,7 +4,7 @@ import (
"testing"
"time"
"git.t-juice.club/torjus/homelab-deploy/internal/messages"
"code.t-juice.club/torjus/homelab-deploy/internal/messages"
)
func TestExecutor_BuildCommand(t *testing.T) {


@@ -6,22 +6,27 @@ import (
"log/slog"
"time"
"git.t-juice.club/torjus/homelab-deploy/internal/deploy"
"git.t-juice.club/torjus/homelab-deploy/internal/messages"
"git.t-juice.club/torjus/homelab-deploy/internal/nats"
"code.t-juice.club/torjus/homelab-deploy/internal/deploy"
"code.t-juice.club/torjus/homelab-deploy/internal/messages"
"code.t-juice.club/torjus/homelab-deploy/internal/metrics"
"code.t-juice.club/torjus/homelab-deploy/internal/nats"
)
// Config holds the configuration for the listener.
type Config struct {
Hostname string
Tier string
Role string
NATSUrl string
NKeyFile string
FlakeURL string
Timeout time.Duration
DeploySubjects []string
DiscoverSubject string
Hostname string
Tier string
Role string
NATSUrl string
NKeyFile string
FlakeURL string
Timeout time.Duration
HeartbeatInterval time.Duration
DeploySubjects []string
DiscoverSubject string
MetricsEnabled bool
MetricsAddr string
Version string
}
// Listener handles deployment requests from NATS.
@@ -34,6 +39,14 @@ type Listener struct {
// Expanded subjects for discovery responses
expandedSubjects []string
// restartCh signals that the listener should exit for restart
// (e.g., after a successful switch deployment)
restartCh chan struct{}
// metrics server and collector (nil if metrics disabled)
metricsServer *metrics.Server
metrics *metrics.Collector
}
// New creates a new listener with the given configuration.
@@ -42,16 +55,42 @@ func New(cfg Config, logger *slog.Logger) *Listener {
logger = slog.Default()
}
return &Listener{
cfg: cfg,
executor: deploy.NewExecutor(cfg.FlakeURL, cfg.Hostname, cfg.Timeout),
lock: deploy.NewLock(),
logger: logger,
l := &Listener{
cfg: cfg,
executor: deploy.NewExecutor(cfg.FlakeURL, cfg.Hostname, cfg.Timeout),
lock: deploy.NewLock(),
logger: logger,
restartCh: make(chan struct{}, 1),
}
if cfg.MetricsEnabled {
l.metricsServer = metrics.NewServer(metrics.ServerConfig{
Addr: cfg.MetricsAddr,
Logger: logger,
})
l.metrics = l.metricsServer.Collector()
}
return l
}
// Run starts the listener and blocks until the context is cancelled.
func (l *Listener) Run(ctx context.Context) error {
// Start metrics server if enabled
if l.metricsServer != nil {
if err := l.metricsServer.Start(); err != nil {
return fmt.Errorf("failed to start metrics server: %w", err)
}
defer func() {
shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
_ = l.metricsServer.Shutdown(shutdownCtx)
}()
// Set instance info metric
l.metrics.SetInfo(l.cfg.Hostname, l.cfg.Tier, l.cfg.Role, l.cfg.Version)
}
// Connect to NATS
l.logger.Info("connecting to NATS",
"url", l.cfg.NATSUrl,
@@ -93,9 +132,13 @@ func (l *Listener) Run(ctx context.Context) error {
l.logger.Info("listener started", "deploy_subjects", l.expandedSubjects, "discover_subject", discoverSubject)
// Wait for context cancellation
<-ctx.Done()
l.logger.Info("shutting down listener")
// Wait for context cancellation or restart signal
select {
case <-ctx.Done():
l.logger.Info("shutting down listener")
case <-l.restartCh:
l.logger.Info("exiting for restart after successful switch deployment")
}
return nil
}
@@ -127,6 +170,9 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
messages.StatusRejected,
err.Error(),
).WithError(messages.ErrorInvalidAction))
if l.metrics != nil {
l.metrics.RecordRejection(req.Action, messages.ErrorInvalidAction)
}
return
}
@@ -141,6 +187,9 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
messages.StatusRejected,
"another deployment is already in progress",
).WithError(messages.ErrorAlreadyRunning))
if l.metrics != nil {
l.metrics.RecordRejection(req.Action, messages.ErrorAlreadyRunning)
}
return
}
defer l.lock.Release()
@@ -152,6 +201,12 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
fmt.Sprintf("starting deployment: %s", l.executor.BuildCommand(req.Action, req.Revision)),
))
// Record deployment start for metrics
if l.metrics != nil {
l.metrics.RecordDeploymentStart()
}
startTime := time.Now()
// Validate revision
ctx := context.Background()
if err := l.executor.ValidateRevision(ctx, req.Revision); err != nil {
@@ -164,6 +219,10 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
messages.StatusFailed,
fmt.Sprintf("revision validation failed: %v", err),
).WithError(messages.ErrorInvalidRevision))
if l.metrics != nil {
duration := time.Since(startTime).Seconds()
l.metrics.RecordDeploymentFailure(req.Action, messages.ErrorInvalidRevision, duration)
}
return
}
@@ -174,7 +233,23 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
"command", l.executor.BuildCommand(req.Action, req.Revision),
)
result := l.executor.Execute(ctx, req.Action, req.Revision)
// Set up heartbeat options to send periodic status updates
var opts *deploy.ExecuteOptions
if l.cfg.HeartbeatInterval > 0 {
opts = &deploy.ExecuteOptions{
HeartbeatInterval: l.cfg.HeartbeatInterval,
HeartbeatCallback: func(elapsed time.Duration) {
l.sendResponse(req.ReplyTo, messages.NewDeployResponse(
l.cfg.Hostname,
messages.StatusRunning,
fmt.Sprintf("deployment in progress (%s elapsed)", elapsed.Round(time.Second)),
))
},
}
}
result := l.executor.ExecuteWithOptions(ctx, req.Action, req.Revision, opts)
duration := time.Since(startTime).Seconds()
if result.Success {
l.logger.Info("deployment completed successfully",
@@ -185,6 +260,33 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
messages.StatusCompleted,
"deployment completed successfully",
))
// Flush to ensure the completed response is sent before we potentially restart
if err := l.client.Flush(); err != nil {
l.logger.Error("failed to flush completed response", "error", err)
}
if l.metrics != nil {
l.metrics.RecordDeploymentEnd(req.Action, true, duration)
}
// After a successful switch, signal restart so we pick up any new version
if req.Action == messages.ActionSwitch {
// Wait for metrics scrape before restarting (if metrics enabled)
if l.metricsServer != nil {
l.logger.Info("waiting for metrics scrape before restart")
select {
case <-l.metricsServer.ScrapeCh():
l.logger.Info("metrics scraped, proceeding with restart")
case <-time.After(60 * time.Second):
l.logger.Warn("no metrics scrape within timeout, proceeding with restart anyway")
}
}
select {
case l.restartCh <- struct{}{}:
default:
// Channel already has a signal pending
}
}
} else {
l.logger.Error("deployment failed",
"exit_code", result.ExitCode,
@@ -202,6 +304,9 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
messages.StatusFailed,
fmt.Sprintf("deployment failed (exit code %d): %s", result.ExitCode, result.Stderr),
).WithError(errorCode))
if l.metrics != nil {
l.metrics.RecordDeploymentFailure(req.Action, errorCode, duration)
}
}
}

internal/mcp/build_tools.go Normal file

@@ -0,0 +1,109 @@
package mcp
import (
"context"
"fmt"
"strings"
"github.com/mark3labs/mcp-go/mcp"
deploycli "code.t-juice.club/torjus/homelab-deploy/internal/cli"
"code.t-juice.club/torjus/homelab-deploy/internal/messages"
)
// BuildTool creates the build tool definition.
func BuildTool() mcp.Tool {
return mcp.NewTool(
"build",
mcp.WithDescription("Trigger a Nix build on the build server"),
mcp.WithString("repo",
mcp.Required(),
mcp.Description("Repository name (must match builder config)"),
),
mcp.WithString("target",
mcp.Description("Target hostname, or omit to build all hosts"),
),
mcp.WithBoolean("all",
mcp.Description("Build all hosts in the repository (default if no target specified)"),
),
mcp.WithString("branch",
mcp.Description("Git branch to build (uses repo default if not specified)"),
),
)
}
// HandleBuild handles the build tool.
func (h *ToolHandler) HandleBuild(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
repo, err := request.RequireString("repo")
if err != nil {
return mcp.NewToolResultError("repo is required"), nil
}
target := request.GetString("target", "")
all := request.GetBool("all", false)
branch := request.GetString("branch", "")
// Default to "all" if no target specified
if target == "" {
all = true
target = "all"
} else if all && target != "all" {
return mcp.NewToolResultError("cannot specify both target and all"), nil
}
cfg := deploycli.BuildConfig{
NATSUrl: h.cfg.NATSUrl,
NKeyFile: h.cfg.NKeyFile,
Repo: repo,
Target: target,
Branch: branch,
Timeout: h.cfg.Timeout,
}
var output strings.Builder
branchStr := branch
if branchStr == "" {
branchStr = "(default)"
}
output.WriteString(fmt.Sprintf("Building %s target=%s branch=%s\n\n", repo, target, branchStr))
result, err := deploycli.Build(ctx, cfg, func(resp *messages.BuildResponse) {
switch resp.Status {
case messages.BuildStatusStarted:
output.WriteString(fmt.Sprintf("Started: %s\n", resp.Message))
case messages.BuildStatusProgress:
successStr := "..."
if resp.HostSuccess != nil {
if *resp.HostSuccess {
successStr = "success"
} else {
successStr = "failed"
}
}
output.WriteString(fmt.Sprintf("[%d/%d] %s: %s\n", resp.HostsCompleted, resp.HostsTotal, resp.Host, successStr))
case messages.BuildStatusCompleted, messages.BuildStatusFailed:
output.WriteString(fmt.Sprintf("\n%s\n", resp.Message))
case messages.BuildStatusRejected:
output.WriteString(fmt.Sprintf("Rejected: %s\n", resp.Message))
}
})
if err != nil {
return mcp.NewToolResultError(fmt.Sprintf("build failed: %v", err)), nil
}
if result.FinalResponse != nil {
output.WriteString(fmt.Sprintf("\nBuild complete: %d succeeded, %d failed (%.1fs)\n",
result.FinalResponse.Succeeded,
result.FinalResponse.Failed,
result.FinalResponse.TotalDurationSeconds))
}
if !result.AllSucceeded() {
output.WriteString("WARNING: Some builds failed\n")
}
return mcp.NewToolResultText(output.String()), nil
}


@@ -12,6 +12,7 @@ type ServerConfig struct {
NKeyFile string
EnableAdmin bool
AdminNKeyFile string
EnableBuilds bool
DiscoverSubject string
Timeout time.Duration
}
@@ -49,6 +50,11 @@ func New(cfg ServerConfig) *Server {
s.AddTool(DeployAdminTool(), handler.HandleDeployAdmin)
}
// Optionally register build tool
if cfg.EnableBuilds {
s.AddTool(BuildTool(), handler.HandleBuild)
}
return &Server{
cfg: cfg,
server: s,


@@ -9,8 +9,8 @@ import (
"github.com/mark3labs/mcp-go/mcp"
deploycli "git.t-juice.club/torjus/homelab-deploy/internal/cli"
"git.t-juice.club/torjus/homelab-deploy/internal/messages"
deploycli "code.t-juice.club/torjus/homelab-deploy/internal/cli"
"code.t-juice.club/torjus/homelab-deploy/internal/messages"
)
// ToolConfig holds configuration for the MCP tools.

internal/messages/build.go Normal file

@@ -0,0 +1,135 @@
package messages
import (
"encoding/json"
"fmt"
"strings"
)
// BuildStatus represents the status of a build response.
type BuildStatus string
const (
BuildStatusStarted BuildStatus = "started"
BuildStatusProgress BuildStatus = "progress"
BuildStatusCompleted BuildStatus = "completed"
BuildStatusFailed BuildStatus = "failed"
BuildStatusRejected BuildStatus = "rejected"
)
// IsFinal returns true if this status indicates a terminal state.
func (s BuildStatus) IsFinal() bool {
switch s {
case BuildStatusCompleted, BuildStatusFailed, BuildStatusRejected:
return true
default:
return false
}
}
// BuildRequest is the message sent to request a build.
type BuildRequest struct {
Repo string `json:"repo"` // Must match config
Target string `json:"target"` // Hostname or "all"
Branch string `json:"branch,omitempty"` // Optional, uses repo default
ReplyTo string `json:"reply_to"`
}
// Validate checks that the request is valid.
func (r *BuildRequest) Validate() error {
if r.Repo == "" {
return fmt.Errorf("repo is required")
}
if !revisionRegex.MatchString(r.Repo) {
return fmt.Errorf("invalid repo name format: %q", r.Repo)
}
if r.Target == "" {
return fmt.Errorf("target is required")
}
// Target must be "all" or a valid hostname (same format as revision/branch)
if r.Target != "all" && !revisionRegex.MatchString(r.Target) {
return fmt.Errorf("invalid target format: %q", r.Target)
}
if r.Branch != "" && !revisionRegex.MatchString(r.Branch) {
return fmt.Errorf("invalid branch format: %q", r.Branch)
}
if r.ReplyTo == "" {
return fmt.Errorf("reply_to is required")
}
// Validate reply_to format to prevent publishing to arbitrary subjects
if !strings.HasPrefix(r.ReplyTo, "build.responses.") {
return fmt.Errorf("invalid reply_to format: must start with 'build.responses.'")
}
return nil
}
// Marshal serializes the request to JSON.
func (r *BuildRequest) Marshal() ([]byte, error) {
return json.Marshal(r)
}
// UnmarshalBuildRequest deserializes a request from JSON.
func UnmarshalBuildRequest(data []byte) (*BuildRequest, error) {
var r BuildRequest
if err := json.Unmarshal(data, &r); err != nil {
return nil, fmt.Errorf("failed to unmarshal build request: %w", err)
}
return &r, nil
}
// BuildHostResult contains the result of building a single host.
type BuildHostResult struct {
Host string `json:"host"`
Success bool `json:"success"`
Error string `json:"error,omitempty"`
Output string `json:"output,omitempty"`
DurationSeconds float64 `json:"duration_seconds"`
}
// BuildResponse is the message sent in response to a build request.
type BuildResponse struct {
Status BuildStatus `json:"status"`
Message string `json:"message,omitempty"`
// Progress updates
Host string `json:"host,omitempty"`
HostSuccess *bool `json:"host_success,omitempty"`
HostsCompleted int `json:"hosts_completed,omitempty"`
HostsTotal int `json:"hosts_total,omitempty"`
// Final response
Results []BuildHostResult `json:"results,omitempty"`
TotalDurationSeconds float64 `json:"total_duration_seconds,omitempty"`
Succeeded int `json:"succeeded,omitempty"`
Failed int `json:"failed,omitempty"`
Error string `json:"error,omitempty"`
}
// NewBuildResponse creates a new response with the given status and message.
func NewBuildResponse(status BuildStatus, message string) *BuildResponse {
return &BuildResponse{
Status: status,
Message: message,
}
}
// WithError adds an error message to the response.
func (r *BuildResponse) WithError(err string) *BuildResponse {
r.Error = err
return r
}
// Marshal serializes the response to JSON.
func (r *BuildResponse) Marshal() ([]byte, error) {
return json.Marshal(r)
}
// UnmarshalBuildResponse deserializes a response from JSON.
func UnmarshalBuildResponse(data []byte) (*BuildResponse, error) {
var r BuildResponse
if err := json.Unmarshal(data, &r); err != nil {
return nil, fmt.Errorf("failed to unmarshal build response: %w", err)
}
return &r, nil
}


@@ -35,6 +35,7 @@ const (
StatusAccepted Status = "accepted"
StatusRejected Status = "rejected"
StatusStarted Status = "started"
StatusRunning Status = "running"
StatusCompleted Status = "completed"
StatusFailed Status = "failed"
)


@@ -0,0 +1,99 @@
package metrics
import (
"github.com/prometheus/client_golang/prometheus"
)
// BuildCollector holds all Prometheus metrics for the builder.
type BuildCollector struct {
buildsTotal *prometheus.CounterVec
buildHostTotal *prometheus.CounterVec
buildDuration *prometheus.HistogramVec
buildLastTimestamp *prometheus.GaugeVec
buildLastSuccessTime *prometheus.GaugeVec
buildLastFailureTime *prometheus.GaugeVec
}
// NewBuildCollector creates a new build metrics collector and registers it with the given registerer.
func NewBuildCollector(reg prometheus.Registerer) *BuildCollector {
c := &BuildCollector{
buildsTotal: prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "homelab_deploy_builds_total",
Help: "Total builds processed",
},
[]string{"repo", "status"},
),
buildHostTotal: prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "homelab_deploy_build_host_total",
Help: "Total host builds processed",
},
[]string{"repo", "host", "status"},
),
buildDuration: prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "homelab_deploy_build_duration_seconds",
Help: "Build execution time per host",
Buckets: []float64{5, 10, 30, 60, 120, 300, 600, 1800, 3600, 7200, 14400},
},
[]string{"repo", "host"},
),
buildLastTimestamp: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "homelab_deploy_build_last_timestamp",
Help: "Timestamp of last build attempt",
},
[]string{"repo"},
),
buildLastSuccessTime: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "homelab_deploy_build_last_success_timestamp",
Help: "Timestamp of last successful build",
},
[]string{"repo"},
),
buildLastFailureTime: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "homelab_deploy_build_last_failure_timestamp",
Help: "Timestamp of last failed build",
},
[]string{"repo"},
),
}
reg.MustRegister(c.buildsTotal)
reg.MustRegister(c.buildHostTotal)
reg.MustRegister(c.buildDuration)
reg.MustRegister(c.buildLastTimestamp)
reg.MustRegister(c.buildLastSuccessTime)
reg.MustRegister(c.buildLastFailureTime)
return c
}
// RecordBuildSuccess records a successful build.
func (c *BuildCollector) RecordBuildSuccess(repo string) {
c.buildsTotal.WithLabelValues(repo, "success").Inc()
c.buildLastTimestamp.WithLabelValues(repo).SetToCurrentTime()
c.buildLastSuccessTime.WithLabelValues(repo).SetToCurrentTime()
}
// RecordBuildFailure records a failed build.
// errorCode is currently unused as a label; builds are counted by repo and status only.
func (c *BuildCollector) RecordBuildFailure(repo, errorCode string) {
c.buildsTotal.WithLabelValues(repo, "failure").Inc()
c.buildLastTimestamp.WithLabelValues(repo).SetToCurrentTime()
c.buildLastFailureTime.WithLabelValues(repo).SetToCurrentTime()
}
// RecordHostBuildSuccess records a successful host build.
func (c *BuildCollector) RecordHostBuildSuccess(repo, host string, durationSeconds float64) {
c.buildHostTotal.WithLabelValues(repo, host, "success").Inc()
c.buildDuration.WithLabelValues(repo, host).Observe(durationSeconds)
}
// RecordHostBuildFailure records a failed host build.
func (c *BuildCollector) RecordHostBuildFailure(repo, host string, durationSeconds float64) {
c.buildHostTotal.WithLabelValues(repo, host, "failure").Inc()
c.buildDuration.WithLabelValues(repo, host).Observe(durationSeconds)
}
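With the buckets above, an observation lands in every bucket whose upper bound is at least the observed value; the effective resolution is the first such bound. A sketch of which bucket a build duration falls into (`bucketFor` is illustrative, not a client_golang API):

```go
package main

import "fmt"

// buckets matches homelab_deploy_build_duration_seconds above: low
// buckets for cached builds, high buckets up to the 4h timeout.
var buckets = []float64{5, 10, 30, 60, 120, 300, 600, 1800, 3600, 7200, 14400}

// bucketFor returns the upper bound of the smallest bucket containing d,
// or -1 if d exceeds every bound (the implicit +Inf bucket in Prometheus).
func bucketFor(d float64) float64 {
	for _, b := range buckets {
		if d <= b {
			return b
		}
	}
	return -1
}

func main() {
	fmt.Println(bucketFor(3))    // 5: cached build
	fmt.Println(bucketFor(45))   // 60
	fmt.Println(bucketFor(9000)) // 14400: cold build
}
```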

internal/metrics/metrics.go Normal file

@@ -0,0 +1,125 @@
// Package metrics provides Prometheus metrics for the homelab-deploy listener.
package metrics
import (
"code.t-juice.club/torjus/homelab-deploy/internal/messages"
"github.com/prometheus/client_golang/prometheus"
)
// Collector holds all Prometheus metrics for the listener.
type Collector struct {
deploymentsTotal *prometheus.CounterVec
deploymentDuration *prometheus.HistogramVec
deploymentInProgress prometheus.Gauge
info *prometheus.GaugeVec
}
// NewCollector creates a new metrics collector and registers it with the given registerer.
func NewCollector(reg prometheus.Registerer) *Collector {
c := &Collector{
deploymentsTotal: prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "homelab_deploy_deployments_total",
Help: "Total deployment requests processed",
},
[]string{"status", "action", "error_code"},
),
deploymentDuration: prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "homelab_deploy_deployment_duration_seconds",
Help: "Deployment execution time",
// Bucket boundaries for typical NixOS build times
Buckets: []float64{30, 60, 120, 300, 600, 900, 1200, 1800},
},
[]string{"action", "success"},
),
deploymentInProgress: prometheus.NewGauge(
prometheus.GaugeOpts{
Name: "homelab_deploy_deployment_in_progress",
Help: "1 if deployment running, 0 otherwise",
},
),
info: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "homelab_deploy_info",
Help: "Static instance metadata",
},
[]string{"hostname", "tier", "role", "version"},
),
}
reg.MustRegister(c.deploymentsTotal)
reg.MustRegister(c.deploymentDuration)
reg.MustRegister(c.deploymentInProgress)
reg.MustRegister(c.info)
c.initMetrics()
return c
}
// initMetrics initializes all metric label combinations with zero values.
// This ensures metrics appear in Prometheus scrapes before any deployments occur.
func (c *Collector) initMetrics() {
actions := []messages.Action{
messages.ActionSwitch,
messages.ActionBoot,
messages.ActionTest,
messages.ActionDryActivate,
}
// Initialize deployment counter for common status/action combinations
for _, action := range actions {
// Successful completions (no error code)
c.deploymentsTotal.WithLabelValues("completed", string(action), "")
// Failed deployments (no error code - from RecordDeploymentEnd)
c.deploymentsTotal.WithLabelValues("failed", string(action), "")
}
// Initialize histogram for all action/success combinations
for _, action := range actions {
c.deploymentDuration.WithLabelValues(string(action), "true")
c.deploymentDuration.WithLabelValues(string(action), "false")
}
}
// SetInfo sets the static instance metadata.
func (c *Collector) SetInfo(hostname, tier, role, version string) {
c.info.WithLabelValues(hostname, tier, role, version).Set(1)
}
// RecordDeploymentStart marks the start of a deployment.
func (c *Collector) RecordDeploymentStart() {
c.deploymentInProgress.Set(1)
}
// RecordDeploymentEnd records the completion of a deployment.
func (c *Collector) RecordDeploymentEnd(action messages.Action, success bool, durationSeconds float64) {
c.deploymentInProgress.Set(0)
successLabel := "false"
if success {
successLabel = "true"
}
c.deploymentDuration.WithLabelValues(string(action), successLabel).Observe(durationSeconds)
status := "completed"
if !success {
status = "failed"
}
c.deploymentsTotal.WithLabelValues(status, string(action), "").Inc()
}
// RecordDeploymentFailure records a deployment failure with an error code.
func (c *Collector) RecordDeploymentFailure(action messages.Action, errorCode messages.ErrorCode, durationSeconds float64) {
c.deploymentInProgress.Set(0)
c.deploymentDuration.WithLabelValues(string(action), "false").Observe(durationSeconds)
c.deploymentsTotal.WithLabelValues("failed", string(action), string(errorCode)).Inc()
}
// RecordRejection records a rejected deployment request.
func (c *Collector) RecordRejection(action messages.Action, errorCode messages.ErrorCode) {
c.deploymentsTotal.WithLabelValues("rejected", string(action), string(errorCode)).Inc()
}


@@ -0,0 +1,359 @@
package metrics
import (
"context"
"io"
"net/http"
"strings"
"testing"
"time"
"code.t-juice.club/torjus/homelab-deploy/internal/messages"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/testutil"
)
func TestCollector_SetInfo(t *testing.T) {
reg := prometheus.NewRegistry()
c := NewCollector(reg)
c.SetInfo("testhost", "test", "web", "1.0.0")
expected := `
# HELP homelab_deploy_info Static instance metadata
# TYPE homelab_deploy_info gauge
homelab_deploy_info{hostname="testhost",role="web",tier="test",version="1.0.0"} 1
`
if err := testutil.GatherAndCompare(reg, strings.NewReader(expected), "homelab_deploy_info"); err != nil {
t.Errorf("unexpected metrics: %v", err)
}
}
func TestCollector_RecordDeploymentStart(t *testing.T) {
reg := prometheus.NewRegistry()
c := NewCollector(reg)
c.RecordDeploymentStart()
expected := `
# HELP homelab_deploy_deployment_in_progress 1 if deployment running, 0 otherwise
# TYPE homelab_deploy_deployment_in_progress gauge
homelab_deploy_deployment_in_progress 1
`
if err := testutil.GatherAndCompare(reg, strings.NewReader(expected), "homelab_deploy_deployment_in_progress"); err != nil {
t.Errorf("unexpected metrics: %v", err)
}
}
func TestCollector_RecordDeploymentEnd_Success(t *testing.T) {
reg := prometheus.NewRegistry()
c := NewCollector(reg)
c.RecordDeploymentStart()
c.RecordDeploymentEnd(messages.ActionSwitch, true, 120.5)
// Check in_progress is 0
inProgressExpected := `
# HELP homelab_deploy_deployment_in_progress 1 if deployment running, 0 otherwise
# TYPE homelab_deploy_deployment_in_progress gauge
homelab_deploy_deployment_in_progress 0
`
if err := testutil.GatherAndCompare(reg, strings.NewReader(inProgressExpected), "homelab_deploy_deployment_in_progress"); err != nil {
t.Errorf("unexpected in_progress metrics: %v", err)
}
// Check counter incremented (includes all pre-initialized metrics)
counterExpected := `
# HELP homelab_deploy_deployments_total Total deployment requests processed
# TYPE homelab_deploy_deployments_total counter
homelab_deploy_deployments_total{action="boot",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="boot",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="completed"} 1
homelab_deploy_deployments_total{action="switch",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="failed"} 0
`
if err := testutil.GatherAndCompare(reg, strings.NewReader(counterExpected), "homelab_deploy_deployments_total"); err != nil {
t.Errorf("unexpected counter metrics: %v", err)
}
}
func TestCollector_RecordDeploymentEnd_Failure(t *testing.T) {
reg := prometheus.NewRegistry()
c := NewCollector(reg)
c.RecordDeploymentStart()
c.RecordDeploymentEnd(messages.ActionBoot, false, 60.0)
counterExpected := `
# HELP homelab_deploy_deployments_total Total deployment requests processed
# TYPE homelab_deploy_deployments_total counter
homelab_deploy_deployments_total{action="boot",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="boot",error_code="",status="failed"} 1
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="failed"} 0
`
if err := testutil.GatherAndCompare(reg, strings.NewReader(counterExpected), "homelab_deploy_deployments_total"); err != nil {
t.Errorf("unexpected counter metrics: %v", err)
}
}
func TestCollector_RecordDeploymentFailure(t *testing.T) {
reg := prometheus.NewRegistry()
c := NewCollector(reg)
c.RecordDeploymentStart()
c.RecordDeploymentFailure(messages.ActionSwitch, messages.ErrorBuildFailed, 300.0)
counterExpected := `
# HELP homelab_deploy_deployments_total Total deployment requests processed
# TYPE homelab_deploy_deployments_total counter
homelab_deploy_deployments_total{action="boot",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="boot",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="switch",error_code="build_failed",status="failed"} 1
homelab_deploy_deployments_total{action="test",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="failed"} 0
`
if err := testutil.GatherAndCompare(reg, strings.NewReader(counterExpected), "homelab_deploy_deployments_total"); err != nil {
t.Errorf("unexpected counter metrics: %v", err)
}
}
func TestCollector_RecordRejection(t *testing.T) {
reg := prometheus.NewRegistry()
c := NewCollector(reg)
c.RecordRejection(messages.ActionSwitch, messages.ErrorAlreadyRunning)
expected := `
# HELP homelab_deploy_deployments_total Total deployment requests processed
# TYPE homelab_deploy_deployments_total counter
homelab_deploy_deployments_total{action="boot",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="boot",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="switch",error_code="already_running",status="rejected"} 1
homelab_deploy_deployments_total{action="test",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="failed"} 0
`
if err := testutil.GatherAndCompare(reg, strings.NewReader(expected), "homelab_deploy_deployments_total"); err != nil {
t.Errorf("unexpected metrics: %v", err)
}
}
func TestCollector_MetricsInitializedAtStartup(t *testing.T) {
reg := prometheus.NewRegistry()
_ = NewCollector(reg)
// Verify counter metrics are initialized with zero values before any deployments
counterExpected := `
# HELP homelab_deploy_deployments_total Total deployment requests processed
# TYPE homelab_deploy_deployments_total counter
homelab_deploy_deployments_total{action="boot",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="boot",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="failed"} 0
`
if err := testutil.GatherAndCompare(reg, strings.NewReader(counterExpected), "homelab_deploy_deployments_total"); err != nil {
t.Errorf("counter metrics not initialized: %v", err)
}
// Verify histogram metrics are initialized with zero values before any deployments
histogramExpected := `
# HELP homelab_deploy_deployment_duration_seconds Deployment execution time
# TYPE homelab_deploy_deployment_duration_seconds histogram
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="switch",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="switch",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="test",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="test",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="test",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="test",success="true"} 0
`
if err := testutil.GatherAndCompare(reg, strings.NewReader(histogramExpected), "homelab_deploy_deployment_duration_seconds"); err != nil {
t.Errorf("histogram metrics not initialized: %v", err)
}
}
func TestServer_StartShutdown(t *testing.T) {
srv := NewServer(ServerConfig{
Addr: ":0", // Let OS pick a free port
})
if err := srv.Start(); err != nil {
t.Fatalf("failed to start server: %v", err)
}
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil {
t.Errorf("failed to shutdown server: %v", err)
}
}
func TestServer_Endpoints(t *testing.T) {
srv := NewServer(ServerConfig{
Addr: "127.0.0.1:19972", // Use a fixed port for testing
})
if err := srv.Start(); err != nil {
t.Fatalf("failed to start server: %v", err)
}
defer func() {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
_ = srv.Shutdown(ctx)
}()
// Give server time to start
time.Sleep(50 * time.Millisecond)
t.Run("health endpoint", func(t *testing.T) {
resp, err := http.Get("http://127.0.0.1:19972/health")
if err != nil {
t.Fatalf("failed to get health endpoint: %v", err)
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusOK {
t.Errorf("expected status 200, got %d", resp.StatusCode)
}
body, _ := io.ReadAll(resp.Body)
if string(body) != "ok" {
t.Errorf("expected body 'ok', got %q", string(body))
}
})
t.Run("metrics endpoint", func(t *testing.T) {
// Set some info to have metrics to display
srv.Collector().SetInfo("testhost", "test", "web", "1.0.0")
resp, err := http.Get("http://127.0.0.1:19972/metrics")
if err != nil {
t.Fatalf("failed to get metrics endpoint: %v", err)
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusOK {
t.Errorf("expected status 200, got %d", resp.StatusCode)
}
body, _ := io.ReadAll(resp.Body)
bodyStr := string(body)
if !strings.Contains(bodyStr, "homelab_deploy_info") {
t.Error("expected metrics to contain homelab_deploy_info")
}
})
}
func TestServer_Collector(t *testing.T) {
srv := NewServer(ServerConfig{
Addr: ":0",
})
collector := srv.Collector()
if collector == nil {
t.Error("expected non-nil collector")
}
}

internal/metrics/server.go Normal file

@@ -0,0 +1,107 @@
package metrics
import (
"context"
"fmt"
"log/slog"
"net/http"
"time"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promhttp"
)
// ServerConfig holds configuration for the metrics server.
type ServerConfig struct {
Addr string
Logger *slog.Logger
}
// Server serves Prometheus metrics over HTTP.
type Server struct {
httpServer *http.Server
registry *prometheus.Registry
collector *Collector
logger *slog.Logger
scrapeCh chan struct{}
}
// NewServer creates a new metrics server.
func NewServer(cfg ServerConfig) *Server {
logger := cfg.Logger
if logger == nil {
logger = slog.Default()
}
registry := prometheus.NewRegistry()
collector := NewCollector(registry)
scrapeCh := make(chan struct{})
metricsHandler := promhttp.HandlerFor(registry, promhttp.HandlerOpts{
Registry: registry,
})
mux := http.NewServeMux()
mux.Handle("/metrics", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
metricsHandler.ServeHTTP(w, r)
// Signal that a scrape occurred (non-blocking)
select {
case scrapeCh <- struct{}{}:
default:
}
}))
mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
w.WriteHeader(http.StatusOK)
_, _ = w.Write([]byte("ok"))
})
return &Server{
httpServer: &http.Server{
Addr: cfg.Addr,
Handler: mux,
ReadHeaderTimeout: 10 * time.Second,
},
registry: registry,
collector: collector,
logger: logger,
scrapeCh: scrapeCh,
}
}
// Collector returns the metrics collector.
func (s *Server) Collector() *Collector {
return s.collector
}
// Registry returns the Prometheus registry.
func (s *Server) Registry() *prometheus.Registry {
return s.registry
}
// ScrapeCh returns a channel that receives a signal each time the metrics endpoint is scraped.
func (s *Server) ScrapeCh() <-chan struct{} {
return s.scrapeCh
}
// Start starts the HTTP server in a goroutine.
func (s *Server) Start() error {
s.logger.Info("starting metrics server", "addr", s.httpServer.Addr)
go func() {
if err := s.httpServer.ListenAndServe(); err != nil && err != http.ErrServerClosed {
s.logger.Error("metrics server error", "error", err)
}
}()
return nil
}
// Shutdown gracefully shuts down the server.
func (s *Server) Shutdown(ctx context.Context) error {
s.logger.Info("shutting down metrics server")
if err := s.httpServer.Shutdown(ctx); err != nil {
return fmt.Errorf("failed to shutdown metrics server: %w", err)
}
return nil
}


@@ -25,6 +25,15 @@ type Client struct {
// Connect establishes a connection to NATS using NKey authentication.
func Connect(cfg Config) (*Client, error) {
// Verify NKey file has secure permissions (no group/other access)
info, err := os.Stat(cfg.NKeyFile)
if err != nil {
return nil, fmt.Errorf("failed to stat nkey file: %w", err)
}
if perm := info.Mode().Perm(); perm&0o077 != 0 {
return nil, fmt.Errorf("nkey file has insecure permissions %04o: must not be accessible by group or others", perm)
}
seed, err := os.ReadFile(cfg.NKeyFile)
if err != nil {
return nil, fmt.Errorf("failed to read nkey file: %w", err)


@@ -21,6 +21,29 @@ func TestConnect_InvalidNKeyFile(t *testing.T) {
}
}
func TestConnect_InsecureNKeyFilePermissions(t *testing.T) {
// Create a temp file with insecure permissions
tmpDir := t.TempDir()
keyFile := filepath.Join(tmpDir, "insecure.nkey")
if err := os.WriteFile(keyFile, []byte("test-content"), 0644); err != nil {
t.Fatalf("failed to write temp file: %v", err)
}
cfg := Config{
URL: "nats://localhost:4222",
NKeyFile: keyFile,
Name: "test",
}
_, err := Connect(cfg)
if err == nil {
t.Error("expected error for insecure nkey file permissions")
}
if err != nil && !contains(err.Error(), "insecure permissions") {
t.Errorf("expected insecure permissions error, got: %v", err)
}
}
func TestConnect_InvalidNKeySeed(t *testing.T) {
// Create a temp file with invalid content
tmpDir := t.TempDir()


@@ -1,27 +1,72 @@
{ self }:
{ config, lib, pkgs, ... }:
let
cfg = config.services.homelab-deploy.listener;
# Build command line arguments from configuration
args = lib.concatStringsSep " " ([
"--hostname ${lib.escapeShellArg cfg.hostname}"
"--tier ${cfg.tier}"
"--nats-url ${lib.escapeShellArg cfg.natsUrl}"
"--nkey-file ${lib.escapeShellArg cfg.nkeyFile}"
"--flake-url ${lib.escapeShellArg cfg.flakeUrl}"
"--timeout ${toString cfg.timeout}"
"--discover-subject ${lib.escapeShellArg cfg.discoverSubject}"
]
++ lib.optional (cfg.role != null) "--role ${lib.escapeShellArg cfg.role}"
++ map (s: "--deploy-subject ${lib.escapeShellArg s}") cfg.deploySubjects);
listenerCfg = config.services.homelab-deploy.listener;
builderCfg = config.services.homelab-deploy.builder;
# Generate YAML config from settings
generatedConfigFile = pkgs.writeText "builder.yaml" (lib.generators.toYAML {} {
repos = lib.mapAttrs (name: repo: {
url = repo.url;
default_branch = repo.defaultBranch;
}) builderCfg.settings.repos;
});
# Use provided configFile or generate from settings
builderConfigFile =
if builderCfg.configFile != null
then builderCfg.configFile
else generatedConfigFile;
# Build command line arguments for listener from configuration
listenerArgs = lib.concatStringsSep " " ([
"--hostname ${lib.escapeShellArg listenerCfg.hostname}"
"--tier ${listenerCfg.tier}"
"--nats-url ${lib.escapeShellArg listenerCfg.natsUrl}"
"--nkey-file ${lib.escapeShellArg listenerCfg.nkeyFile}"
"--flake-url ${lib.escapeShellArg listenerCfg.flakeUrl}"
"--timeout ${toString listenerCfg.timeout}"
"--discover-subject ${lib.escapeShellArg listenerCfg.discoverSubject}"
]
++ lib.optional (listenerCfg.role != null) "--role ${lib.escapeShellArg listenerCfg.role}"
++ map (s: "--deploy-subject ${lib.escapeShellArg s}") listenerCfg.deploySubjects
++ lib.optionals listenerCfg.metrics.enable [
"--metrics-enabled"
"--metrics-addr ${lib.escapeShellArg listenerCfg.metrics.address}"
]);
# Build command line arguments for builder from configuration
builderArgs = lib.concatStringsSep " " ([
"--nats-url ${lib.escapeShellArg builderCfg.natsUrl}"
"--nkey-file ${lib.escapeShellArg builderCfg.nkeyFile}"
"--config ${builderConfigFile}"
"--timeout ${toString builderCfg.timeout}"
]
++ lib.optionals builderCfg.metrics.enable [
"--metrics-enabled"
"--metrics-addr ${lib.escapeShellArg builderCfg.metrics.address}"
]);
# Extract port from metrics address for firewall rule
extractPort = addr: let
# Handle both ":9972" and "0.0.0.0:9972" formats
parts = lib.splitString ":" addr;
in lib.toInt (lib.last parts);
listenerMetricsPort = extractPort listenerCfg.metrics.address;
builderMetricsPort = extractPort builderCfg.metrics.address;
in
{
options.services.homelab-deploy.listener = {
enable = lib.mkEnableOption "homelab-deploy listener service";
package = lib.mkPackageOption pkgs "homelab-deploy" { };
package = lib.mkOption {
type = lib.types.package;
default = self.packages.${pkgs.system}.homelab-deploy;
description = "The homelab-deploy package to use";
};
hostname = lib.mkOption {
type = lib.types.str;
@@ -89,44 +134,205 @@ in
description = "Additional environment variables for the service";
example = { GIT_SSH_COMMAND = "ssh -i /run/secrets/deploy-key"; };
};
};
metrics = {
enable = lib.mkEnableOption "Prometheus metrics endpoint";
address = lib.mkOption {
type = lib.types.str;
default = ":9972";
description = "Address for Prometheus metrics HTTP server";
example = "127.0.0.1:9972";
};
openFirewall = lib.mkOption {
type = lib.types.bool;
default = false;
description = "Open firewall for metrics port";
};
};
};
config = lib.mkIf cfg.enable {
systemd.services.homelab-deploy-listener = {
description = "homelab-deploy listener";
wantedBy = [ "multi-user.target" ];
after = [ "network-online.target" ];
wants = [ "network-online.target" ];
environment = cfg.environment;
serviceConfig = {
Type = "simple";
ExecStart = "${cfg.package}/bin/homelab-deploy listener ${args}";
Restart = "always";
RestartSec = 10;
# Hardening (compatible with nixos-rebuild requirements)
# Note: Some options are relaxed because nixos-rebuild requires:
# - Write access to /nix/store for building
# - Ability to activate system configurations
# - Network access for fetching from git/cache
# - Namespace support for nix sandbox builds
NoNewPrivileges = false;
ProtectSystem = "false";
ProtectHome = "read-only";
PrivateTmp = true;
PrivateDevices = true;
ProtectKernelTunables = true;
ProtectKernelModules = true;
ProtectControlGroups = true;
RestrictAddressFamilies = [ "AF_UNIX" "AF_INET" "AF_INET6" ];
RestrictNamespaces = false;
RestrictSUIDSGID = true;
LockPersonality = true;
MemoryDenyWriteExecute = false;
SystemCallArchitectures = "native";
};
};
};
options.services.homelab-deploy.builder = {
enable = lib.mkEnableOption "homelab-deploy builder service";
package = lib.mkOption {
type = lib.types.package;
default = self.packages.${pkgs.system}.homelab-deploy;
description = "The homelab-deploy package to use";
};
natsUrl = lib.mkOption {
type = lib.types.str;
description = "NATS server URL";
example = "nats://nats.example.com:4222";
};
nkeyFile = lib.mkOption {
type = lib.types.path;
description = "Path to NKey seed file for NATS authentication";
example = "/run/secrets/homelab-deploy-builder-nkey";
};
configFile = lib.mkOption {
type = lib.types.nullOr lib.types.path;
default = null;
description = ''
Path to builder configuration file (YAML).
If not specified, a config file will be generated from the `settings` option.
'';
example = "/etc/homelab-deploy/builder.yaml";
};
settings = {
repos = lib.mkOption {
type = lib.types.attrsOf (lib.types.submodule {
options = {
url = lib.mkOption {
type = lib.types.str;
description = "Git flake URL for the repository";
example = "git+https://git.example.com/org/nixos-configs.git";
};
defaultBranch = lib.mkOption {
type = lib.types.str;
default = "master";
description = "Default branch to build when not specified in the request";
example = "main";
};
};
});
default = {};
description = ''
Repository configuration for the builder.
Each key is the repository name used in build requests.
'';
example = lib.literalExpression ''
{
nixos-servers = {
url = "git+https://git.example.com/org/nixos-servers.git";
defaultBranch = "master";
};
homelab = {
url = "git+ssh://git@github.com/user/homelab.git";
defaultBranch = "main";
};
}
'';
};
};
timeout = lib.mkOption {
type = lib.types.int;
default = 1800;
description = "Build timeout in seconds per host";
};
environment = lib.mkOption {
type = lib.types.attrsOf lib.types.str;
default = { };
description = "Additional environment variables for the service";
example = { GIT_SSH_COMMAND = "ssh -i /run/secrets/deploy-key"; };
};
metrics = {
enable = lib.mkEnableOption "Prometheus metrics endpoint";
address = lib.mkOption {
type = lib.types.str;
default = ":9973";
description = "Address for Prometheus metrics HTTP server";
example = "127.0.0.1:9973";
};
openFirewall = lib.mkOption {
type = lib.types.bool;
default = false;
description = "Open firewall for metrics port";
};
};
};
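# Example consumer configuration (a sketch; the URLs and secret paths
# below are illustrative, not real endpoints):
#
#   services.homelab-deploy.builder = {
#     enable = true;
#     natsUrl = "nats://nats.example.com:4222";
#     nkeyFile = "/run/secrets/homelab-deploy-builder-nkey";
#     settings.repos.homelab = {
#       url = "git+https://git.example.com/org/homelab.git";
#       defaultBranch = "main";
#     };
#   };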
config = lib.mkMerge [
(lib.mkIf builderCfg.enable {
assertions = [
{
assertion = builderCfg.configFile != null || builderCfg.settings.repos != {};
message = "services.homelab-deploy.builder: either configFile or settings.repos must be specified";
}
];
})
(lib.mkIf listenerCfg.enable {
systemd.services.homelab-deploy-listener = {
description = "homelab-deploy listener";
wantedBy = [ "multi-user.target" ];
after = [ "network-online.target" ];
wants = [ "network-online.target" ];
# Prevent self-interruption during nixos-rebuild switch
# The service will continue running the old version until manually restarted
stopIfChanged = false;
restartIfChanged = false;
environment = listenerCfg.environment // {
# Nix needs a writable cache for git flake fetching
XDG_CACHE_HOME = "/var/cache/homelab-deploy";
};
path = [ pkgs.git config.system.build.nixos-rebuild ];
serviceConfig = {
CacheDirectory = "homelab-deploy";
Type = "simple";
ExecStart = "${listenerCfg.package}/bin/homelab-deploy listener ${listenerArgs}";
Restart = "always";
RestartSec = 10;
# Minimal hardening - nixos-rebuild requires broad system access:
# - Write access to /nix/store for building
# - Kernel namespace support for nix sandbox builds
# - Ability to activate system configurations
# - Network access for fetching from git/cache
# Following the approach of the NixOS auto-upgrade service, which applies no hardening
};
};
networking.firewall.allowedTCPPorts = lib.mkIf (listenerCfg.metrics.enable && listenerCfg.metrics.openFirewall) [
listenerMetricsPort
];
})
(lib.mkIf builderCfg.enable {
systemd.services.homelab-deploy-builder = {
description = "homelab-deploy builder";
wantedBy = [ "multi-user.target" ];
after = [ "network-online.target" ];
wants = [ "network-online.target" ];
environment = builderCfg.environment // {
# Nix needs a writable cache for git flake fetching
XDG_CACHE_HOME = "/var/cache/homelab-deploy-builder";
};
path = [ pkgs.git pkgs.nix ];
serviceConfig = {
CacheDirectory = "homelab-deploy-builder";
Type = "simple";
ExecStart = "${builderCfg.package}/bin/homelab-deploy builder ${builderArgs}";
Restart = "always";
RestartSec = 10;
# Minimal hardening - nix build requires broad system access:
# - Write access to /nix/store for building
# - Kernel namespace support for nix sandbox builds
# - Network access for fetching from git/cache
};
};
networking.firewall.allowedTCPPorts = lib.mkIf (builderCfg.metrics.enable && builderCfg.metrics.openFirewall) [
builderMetricsPort
];
})
];
}