feat: add builder mode for centralized Nix builds
Add a new "builder" capability to trigger Nix builds on a dedicated build host via NATS messaging. This allows pre-building NixOS configurations before deployment. New components: - Builder mode: subscribes to build.<repo>.* subjects, executes nix build - Build CLI command: triggers builds with progress tracking - MCP build tool: available with --enable-builds flag - Builder metrics: tracks build success/failure per repo and host - NixOS module: services.homelab-deploy.builder The builder uses a YAML config file to define allowed repositories with their URLs and default branches. Builds can target all hosts or specific hosts, with real-time progress updates. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
189
README.md
189
README.md
@@ -4,11 +4,12 @@ A message-based deployment system for NixOS configurations using NATS for messag
|
|||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
|
|
||||||
The `homelab-deploy` binary provides three operational modes:
|
The `homelab-deploy` binary provides four operational modes:
|
||||||
|
|
||||||
1. **Listener mode** - Runs on each NixOS host as a systemd service, subscribing to NATS subjects and executing `nixos-rebuild` when deployment requests arrive
|
1. **Listener mode** - Runs on each NixOS host as a systemd service, subscribing to NATS subjects and executing `nixos-rebuild` when deployment requests arrive
|
||||||
2. **MCP mode** - Runs as an MCP (Model Context Protocol) server, exposing deployment tools for AI assistants
|
2. **Builder mode** - Runs on a dedicated build host, subscribing to NATS subjects and executing `nix build` to pre-build configurations
|
||||||
3. **CLI mode** - Manual deployment commands for administrators
|
3. **MCP mode** - Runs as an MCP (Model Context Protocol) server, exposing deployment tools for AI assistants
|
||||||
|
4. **CLI mode** - Manual deployment and build commands for administrators
|
||||||
|
|
||||||
## Installation
|
## Installation
|
||||||
|
|
||||||
@@ -128,6 +129,82 @@ homelab-deploy deploy prod-dns --nats-url ... --nkey-file ...
|
|||||||
|
|
||||||
Alias lookup: `HOMELAB_DEPLOY_ALIAS_<NAME>` where name is uppercased and hyphens become underscores.
|
Alias lookup: `HOMELAB_DEPLOY_ALIAS_<NAME>` where name is uppercased and hyphens become underscores.
|
||||||
|
|
||||||
|
### Builder Mode
|
||||||
|
|
||||||
|
Run on a dedicated build host to pre-build NixOS configurations:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
homelab-deploy builder \
|
||||||
|
--nats-url nats://nats.example.com:4222 \
|
||||||
|
--nkey-file /run/secrets/builder.nkey \
|
||||||
|
--config /etc/homelab-deploy/builder.yaml \
|
||||||
|
--timeout 1800 \
|
||||||
|
--metrics-enabled \
|
||||||
|
--metrics-addr :9973
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Builder Configuration File
|
||||||
|
|
||||||
|
The builder uses a YAML configuration file to define allowed repositories:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
repos:
|
||||||
|
nixos-servers:
|
||||||
|
url: "git+https://git.example.com/org/nixos-servers.git"
|
||||||
|
default_branch: "master"
|
||||||
|
homelab:
|
||||||
|
url: "git+ssh://git@github.com/user/homelab.git"
|
||||||
|
default_branch: "main"
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Builder Flags
|
||||||
|
|
||||||
|
| Flag | Required | Description |
|
||||||
|
|------|----------|-------------|
|
||||||
|
| `--nats-url` | Yes | NATS server URL |
|
||||||
|
| `--nkey-file` | Yes | Path to NKey seed file |
|
||||||
|
| `--config` | Yes | Path to builder configuration file |
|
||||||
|
| `--timeout` | No | Build timeout per host in seconds (default: 1800) |
|
||||||
|
| `--metrics-enabled` | No | Enable Prometheus metrics endpoint |
|
||||||
|
| `--metrics-addr` | No | Metrics HTTP server address (default: `:9973`) |
|
||||||
|
|
||||||
|
### Build Command
|
||||||
|
|
||||||
|
Trigger a build on the build server:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Build all hosts in a repository
|
||||||
|
homelab-deploy build nixos-servers --all \
|
||||||
|
--nats-url nats://nats.example.com:4222 \
|
||||||
|
--nkey-file /run/secrets/deployer.nkey
|
||||||
|
|
||||||
|
# Build a specific host
|
||||||
|
homelab-deploy build nixos-servers myhost \
|
||||||
|
--nats-url nats://nats.example.com:4222 \
|
||||||
|
--nkey-file /run/secrets/deployer.nkey
|
||||||
|
|
||||||
|
# Build with a specific branch
|
||||||
|
homelab-deploy build nixos-servers --all --branch feature-x \
|
||||||
|
--nats-url nats://nats.example.com:4222 \
|
||||||
|
--nkey-file /run/secrets/deployer.nkey
|
||||||
|
|
||||||
|
# JSON output for scripting
|
||||||
|
homelab-deploy build nixos-servers --all --json \
|
||||||
|
--nats-url nats://nats.example.com:4222 \
|
||||||
|
--nkey-file /run/secrets/deployer.nkey
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Build Flags
|
||||||
|
|
||||||
|
| Flag | Required | Env Var | Description |
|
||||||
|
|------|----------|---------|-------------|
|
||||||
|
| `--nats-url` | Yes | `HOMELAB_DEPLOY_NATS_URL` | NATS server URL |
|
||||||
|
| `--nkey-file` | Yes | `HOMELAB_DEPLOY_NKEY_FILE` | Path to NKey seed file |
|
||||||
|
| `--branch` | No | `HOMELAB_DEPLOY_BRANCH` | Git branch (uses repo default if not specified) |
|
||||||
|
| `--all` | No | - | Build all hosts in the repository |
|
||||||
|
| `--timeout` | No | `HOMELAB_DEPLOY_BUILD_TIMEOUT` | Response timeout in seconds (default: 3600) |
|
||||||
|
| `--json` | No | - | Output results as JSON |
|
||||||
|
|
||||||
### MCP Server Mode
|
### MCP Server Mode
|
||||||
|
|
||||||
Run as an MCP server for AI assistant integration:
|
Run as an MCP server for AI assistant integration:
|
||||||
@@ -144,6 +221,12 @@ homelab-deploy mcp \
|
|||||||
--nkey-file /run/secrets/mcp.nkey \
|
--nkey-file /run/secrets/mcp.nkey \
|
||||||
--enable-admin \
|
--enable-admin \
|
||||||
--admin-nkey-file /run/secrets/admin.nkey
|
--admin-nkey-file /run/secrets/admin.nkey
|
||||||
|
|
||||||
|
# With build tool enabled
|
||||||
|
homelab-deploy mcp \
|
||||||
|
--nats-url nats://nats.example.com:4222 \
|
||||||
|
--nkey-file /run/secrets/mcp.nkey \
|
||||||
|
--enable-builds
|
||||||
```
|
```
|
||||||
|
|
||||||
#### MCP Tools
|
#### MCP Tools
|
||||||
@@ -153,6 +236,7 @@ homelab-deploy mcp \
|
|||||||
| `deploy` | Deploy to test-tier hosts only |
|
| `deploy` | Deploy to test-tier hosts only |
|
||||||
| `deploy_admin` | Deploy to any tier (requires `--enable-admin`) |
|
| `deploy_admin` | Deploy to any tier (requires `--enable-admin`) |
|
||||||
| `list_hosts` | Discover available deployment targets |
|
| `list_hosts` | Discover available deployment targets |
|
||||||
|
| `build` | Trigger builds on the build server (requires `--enable-builds`) |
|
||||||
|
|
||||||
#### Tool Parameters
|
#### Tool Parameters
|
||||||
|
|
||||||
@@ -167,6 +251,12 @@ homelab-deploy mcp \
|
|||||||
**list_hosts:**
|
**list_hosts:**
|
||||||
- `tier` - Filter by tier (optional)
|
- `tier` - Filter by tier (optional)
|
||||||
|
|
||||||
|
**build:**
|
||||||
|
- `repo` - Repository name (required, must match builder config)
|
||||||
|
- `target` - Target hostname (optional, defaults to all)
|
||||||
|
- `all` - Build all hosts (default if no target specified)
|
||||||
|
- `branch` - Git branch (uses repo default if not specified)
|
||||||
|
|
||||||
## NixOS Module
|
## NixOS Module
|
||||||
|
|
||||||
Add the module to your NixOS configuration:
|
Add the module to your NixOS configuration:
|
||||||
@@ -224,6 +314,37 @@ Default `deploySubjects`:
|
|||||||
]
|
]
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### Builder Module Options
|
||||||
|
|
||||||
|
| Option | Type | Default | Description |
|
||||||
|
|--------|------|---------|-------------|
|
||||||
|
| `enable` | bool | `false` | Enable the builder service |
|
||||||
|
| `package` | package | from flake | Package to use |
|
||||||
|
| `natsUrl` | string | required | NATS server URL |
|
||||||
|
| `nkeyFile` | path | required | Path to NKey seed file |
|
||||||
|
| `configFile` | path | required | Path to builder configuration file |
|
||||||
|
| `timeout` | int | `1800` | Build timeout per host in seconds |
|
||||||
|
| `environment` | attrs | `{}` | Additional environment variables |
|
||||||
|
| `metrics.enable` | bool | `false` | Enable Prometheus metrics endpoint |
|
||||||
|
| `metrics.address` | string | `":9973"` | Metrics HTTP server address |
|
||||||
|
| `metrics.openFirewall` | bool | `false` | Open firewall for metrics port |
|
||||||
|
|
||||||
|
Example builder configuration:
|
||||||
|
|
||||||
|
```nix
|
||||||
|
services.homelab-deploy.builder = {
|
||||||
|
enable = true;
|
||||||
|
natsUrl = "nats://nats.example.com:4222";
|
||||||
|
nkeyFile = "/run/secrets/homelab-deploy-builder-nkey";
|
||||||
|
configFile = "/etc/homelab-deploy/builder.yaml";
|
||||||
|
metrics = {
|
||||||
|
enable = true;
|
||||||
|
address = ":9973";
|
||||||
|
openFirewall = true;
|
||||||
|
};
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
## Prometheus Metrics
|
## Prometheus Metrics
|
||||||
|
|
||||||
The listener can expose Prometheus metrics for monitoring deployment operations.
|
The listener can expose Prometheus metrics for monitoring deployment operations.
|
||||||
@@ -298,6 +419,24 @@ histogram_quantile(0.95, rate(homelab_deploy_deployment_duration_seconds_bucket[
|
|||||||
sum(homelab_deploy_deployment_in_progress)
|
sum(homelab_deploy_deployment_in_progress)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### Builder Metrics
|
||||||
|
|
||||||
|
When running in builder mode, additional metrics are available:
|
||||||
|
|
||||||
|
| Metric | Type | Labels | Description |
|
||||||
|
|--------|------|--------|-------------|
|
||||||
|
| `homelab_deploy_builds_total` | Counter | `repo`, `status` | Total builds processed |
|
||||||
|
| `homelab_deploy_build_host_total` | Counter | `repo`, `host`, `status` | Total host builds processed |
|
||||||
|
| `homelab_deploy_build_duration_seconds` | Histogram | `repo`, `host` | Build execution time per host |
|
||||||
|
| `homelab_deploy_build_last_timestamp` | Gauge | `repo` | Timestamp of last build attempt |
|
||||||
|
| `homelab_deploy_build_last_success_timestamp` | Gauge | `repo` | Timestamp of last successful build |
|
||||||
|
| `homelab_deploy_build_last_failure_timestamp` | Gauge | `repo` | Timestamp of last failed build |
|
||||||
|
|
||||||
|
**Label values:**
|
||||||
|
- `status`: `success`, `failure`
|
||||||
|
- `repo`: Repository name from config
|
||||||
|
- `host`: Host name being built
|
||||||
|
|
||||||
## Message Protocol
|
## Message Protocol
|
||||||
|
|
||||||
### Deploy Request
|
### Deploy Request
|
||||||
@@ -325,6 +464,37 @@ sum(homelab_deploy_deployment_in_progress)
|
|||||||
|
|
||||||
**Error codes:** `invalid_revision`, `invalid_action`, `already_running`, `build_failed`, `timeout`
|
**Error codes:** `invalid_revision`, `invalid_action`, `already_running`, `build_failed`, `timeout`
|
||||||
|
|
||||||
|
### Build Request
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"repo": "nixos-servers",
|
||||||
|
"target": "all",
|
||||||
|
"branch": "main",
|
||||||
|
"reply_to": "build.responses.abc123"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Build Response
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"status": "completed",
|
||||||
|
"message": "built 5/5 hosts successfully",
|
||||||
|
"results": [
|
||||||
|
{"host": "host1", "success": true, "duration_seconds": 120.5},
|
||||||
|
{"host": "host2", "success": true, "duration_seconds": 95.3}
|
||||||
|
],
|
||||||
|
"total_duration_seconds": 450.2,
|
||||||
|
"succeeded": 5,
|
||||||
|
"failed": 0
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Status values:** `started`, `progress`, `completed`, `failed`, `rejected`
|
||||||
|
|
||||||
|
Progress updates include `host`, `host_success`, `hosts_completed`, and `hosts_total` fields.
|
||||||
|
|
||||||
## NATS Authentication
|
## NATS Authentication
|
||||||
|
|
||||||
All connections use NKey authentication. Generate keys with:
|
All connections use NKey authentication. Generate keys with:
|
||||||
@@ -354,13 +524,22 @@ The deployment system uses the following NATS subject hierarchy:
|
|||||||
- `deploy.prod.all` - Deploy to all production hosts
|
- `deploy.prod.all` - Deploy to all production hosts
|
||||||
- `deploy.prod.role.dns` - Deploy to all DNS servers in production
|
- `deploy.prod.role.dns` - Deploy to all DNS servers in production
|
||||||
|
|
||||||
|
### Build Subjects
|
||||||
|
|
||||||
|
| Subject Pattern | Purpose |
|
||||||
|
|-----------------|---------|
|
||||||
|
| `build.<repo>.*` | Build requests for a repository |
|
||||||
|
| `build.<repo>.all` | Build all hosts in a repository |
|
||||||
|
| `build.<repo>.<hostname>` | Build a specific host |
|
||||||
|
|
||||||
### Response Subjects
|
### Response Subjects
|
||||||
|
|
||||||
| Subject Pattern | Purpose |
|
| Subject Pattern | Purpose |
|
||||||
|-----------------|---------|
|
|-----------------|---------|
|
||||||
| `deploy.responses.<uuid>` | Unique reply subject for each deployment request |
|
| `deploy.responses.<uuid>` | Unique reply subject for each deployment request |
|
||||||
|
| `build.responses.<uuid>` | Unique reply subject for each build request |
|
||||||
|
|
||||||
Deployers create a unique response subject for each request and include it in the `reply_to` field. Listeners publish status updates to this subject.
|
Deployers and build clients create a unique response subject for each request and include it in the `reply_to` field. Listeners and builders publish status updates to this subject.
|
||||||
|
|
||||||
### Discovery Subject
|
### Discovery Subject
|
||||||
|
|
||||||
@@ -451,7 +630,9 @@ authorization {
|
|||||||
| Credential Type | Publish | Subscribe |
|
| Credential Type | Publish | Subscribe |
|
||||||
|-----------------|---------|-----------|
|
|-----------------|---------|-----------|
|
||||||
| Listener | `deploy.responses.>`, `deploy.discover` | Own subjects, `deploy.discover` |
|
| Listener | `deploy.responses.>`, `deploy.discover` | Own subjects, `deploy.discover` |
|
||||||
|
| Builder | `build.responses.>` | `build.<repo>.*` for each configured repo |
|
||||||
| Test deployer | `deploy.test.>`, `deploy.discover` | `deploy.responses.>`, `deploy.discover` |
|
| Test deployer | `deploy.test.>`, `deploy.discover` | `deploy.responses.>`, `deploy.discover` |
|
||||||
|
| Build client | `build.<repo>.*` | `build.responses.>` |
|
||||||
| Admin deployer | `deploy.>` | `deploy.>` |
|
| Admin deployer | `deploy.>` | `deploy.>` |
|
||||||
|
|
||||||
### Generating NKeys
|
### Generating NKeys
|
||||||
|
|||||||
@@ -9,6 +9,7 @@ import (
|
|||||||
"syscall"
|
"syscall"
|
||||||
"time"
|
"time"
|
||||||
|
|
||||||
|
"git.t-juice.club/torjus/homelab-deploy/internal/builder"
|
||||||
deploycli "git.t-juice.club/torjus/homelab-deploy/internal/cli"
|
deploycli "git.t-juice.club/torjus/homelab-deploy/internal/cli"
|
||||||
"git.t-juice.club/torjus/homelab-deploy/internal/listener"
|
"git.t-juice.club/torjus/homelab-deploy/internal/listener"
|
||||||
"git.t-juice.club/torjus/homelab-deploy/internal/mcp"
|
"git.t-juice.club/torjus/homelab-deploy/internal/mcp"
|
||||||
@@ -16,7 +17,7 @@ import (
|
|||||||
"github.com/urfave/cli/v3"
|
"github.com/urfave/cli/v3"
|
||||||
)
|
)
|
||||||
|
|
||||||
const version = "0.1.13"
|
const version = "0.2.0"
|
||||||
|
|
||||||
func main() {
|
func main() {
|
||||||
app := &cli.Command{
|
app := &cli.Command{
|
||||||
@@ -25,8 +26,10 @@ func main() {
|
|||||||
Version: version,
|
Version: version,
|
||||||
Commands: []*cli.Command{
|
Commands: []*cli.Command{
|
||||||
listenerCommand(),
|
listenerCommand(),
|
||||||
|
builderCommand(),
|
||||||
mcpCommand(),
|
mcpCommand(),
|
||||||
deployCommand(),
|
deployCommand(),
|
||||||
|
buildCommand(),
|
||||||
listHostsCommand(),
|
listHostsCommand(),
|
||||||
},
|
},
|
||||||
}
|
}
|
||||||
@@ -175,6 +178,10 @@ func mcpCommand() *cli.Command {
|
|||||||
Usage: "Timeout in seconds for deployment operations",
|
Usage: "Timeout in seconds for deployment operations",
|
||||||
Value: 900,
|
Value: 900,
|
||||||
},
|
},
|
||||||
|
&cli.BoolFlag{
|
||||||
|
Name: "enable-builds",
|
||||||
|
Usage: "Enable build tool",
|
||||||
|
},
|
||||||
},
|
},
|
||||||
Action: func(_ context.Context, c *cli.Command) error {
|
Action: func(_ context.Context, c *cli.Command) error {
|
||||||
enableAdmin := c.Bool("enable-admin")
|
enableAdmin := c.Bool("enable-admin")
|
||||||
@@ -189,6 +196,7 @@ func mcpCommand() *cli.Command {
|
|||||||
NKeyFile: c.String("nkey-file"),
|
NKeyFile: c.String("nkey-file"),
|
||||||
EnableAdmin: enableAdmin,
|
EnableAdmin: enableAdmin,
|
||||||
AdminNKeyFile: adminNKeyFile,
|
AdminNKeyFile: adminNKeyFile,
|
||||||
|
EnableBuilds: c.Bool("enable-builds"),
|
||||||
DiscoverSubject: c.String("discover-subject"),
|
DiscoverSubject: c.String("discover-subject"),
|
||||||
Timeout: time.Duration(c.Int("timeout")) * time.Second,
|
Timeout: time.Duration(c.Int("timeout")) * time.Second,
|
||||||
}
|
}
|
||||||
@@ -374,3 +382,196 @@ func listHostsCommand() *cli.Command {
|
|||||||
},
|
},
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func builderCommand() *cli.Command {
|
||||||
|
return &cli.Command{
|
||||||
|
Name: "builder",
|
||||||
|
Usage: "Run as a build server (systemd service mode)",
|
||||||
|
Flags: []cli.Flag{
|
||||||
|
&cli.StringFlag{
|
||||||
|
Name: "nats-url",
|
||||||
|
Usage: "NATS server URL",
|
||||||
|
Required: true,
|
||||||
|
},
|
||||||
|
&cli.StringFlag{
|
||||||
|
Name: "nkey-file",
|
||||||
|
Usage: "Path to NKey seed file for NATS authentication",
|
||||||
|
Required: true,
|
||||||
|
},
|
||||||
|
&cli.StringFlag{
|
||||||
|
Name: "config",
|
||||||
|
Usage: "Path to builder configuration file",
|
||||||
|
Required: true,
|
||||||
|
},
|
||||||
|
&cli.IntFlag{
|
||||||
|
Name: "timeout",
|
||||||
|
Usage: "Build timeout in seconds per host",
|
||||||
|
Value: 1800,
|
||||||
|
},
|
||||||
|
&cli.BoolFlag{
|
||||||
|
Name: "metrics-enabled",
|
||||||
|
Usage: "Enable Prometheus metrics endpoint",
|
||||||
|
},
|
||||||
|
&cli.StringFlag{
|
||||||
|
Name: "metrics-addr",
|
||||||
|
Usage: "Address for Prometheus metrics HTTP server",
|
||||||
|
Value: ":9973",
|
||||||
|
},
|
||||||
|
},
|
||||||
|
Action: func(ctx context.Context, c *cli.Command) error {
|
||||||
|
repoCfg, err := builder.LoadConfig(c.String("config"))
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("failed to load config: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
cfg := builder.BuilderConfig{
|
||||||
|
NATSUrl: c.String("nats-url"),
|
||||||
|
NKeyFile: c.String("nkey-file"),
|
||||||
|
ConfigFile: c.String("config"),
|
||||||
|
Timeout: time.Duration(c.Int("timeout")) * time.Second,
|
||||||
|
MetricsEnabled: c.Bool("metrics-enabled"),
|
||||||
|
MetricsAddr: c.String("metrics-addr"),
|
||||||
|
}
|
||||||
|
|
||||||
|
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
|
||||||
|
Level: slog.LevelInfo,
|
||||||
|
}))
|
||||||
|
|
||||||
|
b := builder.New(cfg, repoCfg, logger)
|
||||||
|
|
||||||
|
// Handle shutdown signals
|
||||||
|
ctx, cancel := signal.NotifyContext(ctx, syscall.SIGINT, syscall.SIGTERM)
|
||||||
|
defer cancel()
|
||||||
|
|
||||||
|
return b.Run(ctx)
|
||||||
|
},
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func buildCommand() *cli.Command {
|
||||||
|
return &cli.Command{
|
||||||
|
Name: "build",
|
||||||
|
Usage: "Trigger a build on the build server",
|
||||||
|
ArgsUsage: "<repo> [hostname]",
|
||||||
|
Flags: []cli.Flag{
|
||||||
|
&cli.StringFlag{
|
||||||
|
Name: "nats-url",
|
||||||
|
Usage: "NATS server URL",
|
||||||
|
Sources: cli.EnvVars("HOMELAB_DEPLOY_NATS_URL"),
|
||||||
|
Required: true,
|
||||||
|
},
|
||||||
|
&cli.StringFlag{
|
||||||
|
Name: "nkey-file",
|
||||||
|
Usage: "Path to NKey seed file for NATS authentication",
|
||||||
|
Sources: cli.EnvVars("HOMELAB_DEPLOY_NKEY_FILE"),
|
||||||
|
Required: true,
|
||||||
|
},
|
||||||
|
&cli.StringFlag{
|
||||||
|
Name: "branch",
|
||||||
|
Usage: "Git branch to build (uses repo default if not specified)",
|
||||||
|
Sources: cli.EnvVars("HOMELAB_DEPLOY_BRANCH"),
|
||||||
|
},
|
||||||
|
&cli.BoolFlag{
|
||||||
|
Name: "all",
|
||||||
|
Usage: "Build all hosts in the repo",
|
||||||
|
},
|
||||||
|
&cli.IntFlag{
|
||||||
|
Name: "timeout",
|
||||||
|
Usage: "Timeout in seconds for collecting responses",
|
||||||
|
Sources: cli.EnvVars("HOMELAB_DEPLOY_BUILD_TIMEOUT"),
|
||||||
|
Value: 3600,
|
||||||
|
},
|
||||||
|
&cli.BoolFlag{
|
||||||
|
Name: "json",
|
||||||
|
Usage: "Output results as JSON",
|
||||||
|
},
|
||||||
|
},
|
||||||
|
Action: func(ctx context.Context, c *cli.Command) error {
|
||||||
|
if c.Args().Len() < 1 {
|
||||||
|
return fmt.Errorf("repo argument required")
|
||||||
|
}
|
||||||
|
|
||||||
|
repo := c.Args().First()
|
||||||
|
target := c.Args().Get(1)
|
||||||
|
all := c.Bool("all")
|
||||||
|
|
||||||
|
if target == "" && !all {
|
||||||
|
return fmt.Errorf("must specify hostname or --all")
|
||||||
|
}
|
||||||
|
if target != "" && all {
|
||||||
|
return fmt.Errorf("cannot specify both hostname and --all")
|
||||||
|
}
|
||||||
|
if all {
|
||||||
|
target = "all"
|
||||||
|
}
|
||||||
|
|
||||||
|
cfg := deploycli.BuildConfig{
|
||||||
|
NATSUrl: c.String("nats-url"),
|
||||||
|
NKeyFile: c.String("nkey-file"),
|
||||||
|
Repo: repo,
|
||||||
|
Target: target,
|
||||||
|
Branch: c.String("branch"),
|
||||||
|
Timeout: time.Duration(c.Int("timeout")) * time.Second,
|
||||||
|
}
|
||||||
|
|
||||||
|
jsonOutput := c.Bool("json")
|
||||||
|
if !jsonOutput {
|
||||||
|
branchStr := cfg.Branch
|
||||||
|
if branchStr == "" {
|
||||||
|
branchStr = "(default)"
|
||||||
|
}
|
||||||
|
fmt.Printf("Building %s target=%s branch=%s\n", repo, target, branchStr)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Handle shutdown signals
|
||||||
|
ctx, cancel := signal.NotifyContext(ctx, syscall.SIGINT, syscall.SIGTERM)
|
||||||
|
defer cancel()
|
||||||
|
|
||||||
|
result, err := deploycli.Build(ctx, cfg, func(resp *messages.BuildResponse) {
|
||||||
|
if jsonOutput {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
switch resp.Status {
|
||||||
|
case messages.BuildStatusStarted:
|
||||||
|
fmt.Printf("Started: %s\n", resp.Message)
|
||||||
|
case messages.BuildStatusProgress:
|
||||||
|
successStr := "..."
|
||||||
|
if resp.HostSuccess != nil {
|
||||||
|
if *resp.HostSuccess {
|
||||||
|
successStr = "success"
|
||||||
|
} else {
|
||||||
|
successStr = "failed"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
fmt.Printf("[%d/%d] %s: %s\n", resp.HostsCompleted, resp.HostsTotal, resp.Host, successStr)
|
||||||
|
case messages.BuildStatusCompleted, messages.BuildStatusFailed:
|
||||||
|
fmt.Printf("\n%s\n", resp.Message)
|
||||||
|
case messages.BuildStatusRejected:
|
||||||
|
fmt.Printf("Rejected: %s\n", resp.Message)
|
||||||
|
}
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("build failed: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
if jsonOutput {
|
||||||
|
data, err := result.MarshalJSON()
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("failed to marshal result: %w", err)
|
||||||
|
}
|
||||||
|
fmt.Println(string(data))
|
||||||
|
} else if result.FinalResponse != nil {
|
||||||
|
fmt.Printf("\nBuild complete: %d succeeded, %d failed (%.1fs)\n",
|
||||||
|
result.FinalResponse.Succeeded,
|
||||||
|
result.FinalResponse.Failed,
|
||||||
|
result.FinalResponse.TotalDurationSeconds)
|
||||||
|
}
|
||||||
|
|
||||||
|
if !result.AllSucceeded() {
|
||||||
|
return fmt.Errorf("some builds failed")
|
||||||
|
}
|
||||||
|
|
||||||
|
return nil
|
||||||
|
},
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|||||||
2
go.mod
2
go.mod
@@ -9,6 +9,7 @@ require (
|
|||||||
github.com/nats-io/nkeys v0.4.15
|
github.com/nats-io/nkeys v0.4.15
|
||||||
github.com/prometheus/client_golang v1.23.2
|
github.com/prometheus/client_golang v1.23.2
|
||||||
github.com/urfave/cli/v3 v3.6.2
|
github.com/urfave/cli/v3 v3.6.2
|
||||||
|
gopkg.in/yaml.v3 v3.0.1
|
||||||
)
|
)
|
||||||
|
|
||||||
require (
|
require (
|
||||||
@@ -32,5 +33,4 @@ require (
|
|||||||
golang.org/x/crypto v0.47.0 // indirect
|
golang.org/x/crypto v0.47.0 // indirect
|
||||||
golang.org/x/sys v0.40.0 // indirect
|
golang.org/x/sys v0.40.0 // indirect
|
||||||
google.golang.org/protobuf v1.36.8 // indirect
|
google.golang.org/protobuf v1.36.8 // indirect
|
||||||
gopkg.in/yaml.v3 v3.0.1 // indirect
|
|
||||||
)
|
)
|
||||||
|
|||||||
323
internal/builder/builder.go
Normal file
323
internal/builder/builder.go
Normal file
@@ -0,0 +1,323 @@
|
|||||||
|
package builder
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"fmt"
|
||||||
|
"log/slog"
|
||||||
|
"sort"
|
||||||
|
"sync"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"git.t-juice.club/torjus/homelab-deploy/internal/messages"
|
||||||
|
"git.t-juice.club/torjus/homelab-deploy/internal/metrics"
|
||||||
|
"git.t-juice.club/torjus/homelab-deploy/internal/nats"
|
||||||
|
)
|
||||||
|
|
||||||
|
// BuilderConfig holds the configuration for the builder.
|
||||||
|
type BuilderConfig struct {
|
||||||
|
NATSUrl string
|
||||||
|
NKeyFile string
|
||||||
|
ConfigFile string
|
||||||
|
Timeout time.Duration
|
||||||
|
MetricsEnabled bool
|
||||||
|
MetricsAddr string
|
||||||
|
}
|
||||||
|
|
||||||
|
// Builder handles build requests from NATS.
|
||||||
|
type Builder struct {
|
||||||
|
cfg BuilderConfig
|
||||||
|
repoCfg *Config
|
||||||
|
client *nats.Client
|
||||||
|
executor *Executor
|
||||||
|
lock sync.Mutex
|
||||||
|
busy bool
|
||||||
|
logger *slog.Logger
|
||||||
|
|
||||||
|
// metrics server and collector (nil if metrics disabled)
|
||||||
|
metricsServer *metrics.Server
|
||||||
|
metrics *metrics.BuildCollector
|
||||||
|
}
|
||||||
|
|
||||||
|
// New creates a new builder with the given configuration.
|
||||||
|
func New(cfg BuilderConfig, repoCfg *Config, logger *slog.Logger) *Builder {
|
||||||
|
if logger == nil {
|
||||||
|
logger = slog.Default()
|
||||||
|
}
|
||||||
|
|
||||||
|
b := &Builder{
|
||||||
|
cfg: cfg,
|
||||||
|
repoCfg: repoCfg,
|
||||||
|
executor: NewExecutor(cfg.Timeout),
|
||||||
|
logger: logger,
|
||||||
|
}
|
||||||
|
|
||||||
|
if cfg.MetricsEnabled {
|
||||||
|
b.metricsServer = metrics.NewServer(metrics.ServerConfig{
|
||||||
|
Addr: cfg.MetricsAddr,
|
||||||
|
Logger: logger,
|
||||||
|
})
|
||||||
|
b.metrics = metrics.NewBuildCollector(b.metricsServer.Registry())
|
||||||
|
}
|
||||||
|
|
||||||
|
return b
|
||||||
|
}
|
||||||
|
|
||||||
|
// Run starts the builder and blocks until the context is cancelled.
|
||||||
|
func (b *Builder) Run(ctx context.Context) error {
|
||||||
|
// Start metrics server if enabled
|
||||||
|
if b.metricsServer != nil {
|
||||||
|
if err := b.metricsServer.Start(); err != nil {
|
||||||
|
return fmt.Errorf("failed to start metrics server: %w", err)
|
||||||
|
}
|
||||||
|
defer func() {
|
||||||
|
shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||||
|
defer cancel()
|
||||||
|
_ = b.metricsServer.Shutdown(shutdownCtx)
|
||||||
|
}()
|
||||||
|
}
|
||||||
|
|
||||||
|
// Connect to NATS
|
||||||
|
b.logger.Info("connecting to NATS", "url", b.cfg.NATSUrl)
|
||||||
|
|
||||||
|
client, err := nats.Connect(nats.Config{
|
||||||
|
URL: b.cfg.NATSUrl,
|
||||||
|
NKeyFile: b.cfg.NKeyFile,
|
||||||
|
Name: "homelab-deploy-builder",
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("failed to connect to NATS: %w", err)
|
||||||
|
}
|
||||||
|
b.client = client
|
||||||
|
defer b.client.Close()
|
||||||
|
|
||||||
|
b.logger.Info("connected to NATS")
|
||||||
|
|
||||||
|
// Subscribe to build subjects for each repo
|
||||||
|
for repoName := range b.repoCfg.Repos {
|
||||||
|
// Subscribe to build.<repo>.all and build.<repo>.<hostname>
|
||||||
|
allSubject := fmt.Sprintf("build.%s.*", repoName)
|
||||||
|
b.logger.Info("subscribing to build subject", "subject", allSubject)
|
||||||
|
if _, err := b.client.Subscribe(allSubject, b.handleBuildRequest); err != nil {
|
||||||
|
return fmt.Errorf("failed to subscribe to %s: %w", allSubject, err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
b.logger.Info("builder started", "repos", len(b.repoCfg.Repos))
|
||||||
|
|
||||||
|
// Wait for context cancellation
|
||||||
|
<-ctx.Done()
|
||||||
|
b.logger.Info("shutting down builder")
|
||||||
|
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (b *Builder) handleBuildRequest(subject string, data []byte) {
|
||||||
|
req, err := messages.UnmarshalBuildRequest(data)
|
||||||
|
if err != nil {
|
||||||
|
b.logger.Error("failed to unmarshal build request",
|
||||||
|
"subject", subject,
|
||||||
|
"error", err,
|
||||||
|
)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
b.logger.Info("received build request",
|
||||||
|
"subject", subject,
|
||||||
|
"repo", req.Repo,
|
||||||
|
"target", req.Target,
|
||||||
|
"branch", req.Branch,
|
||||||
|
"reply_to", req.ReplyTo,
|
||||||
|
)
|
||||||
|
|
||||||
|
// Validate request
|
||||||
|
if err := req.Validate(); err != nil {
|
||||||
|
b.logger.Warn("invalid build request", "error", err)
|
||||||
|
b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
|
||||||
|
messages.BuildStatusRejected,
|
||||||
|
err.Error(),
|
||||||
|
))
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// Get repo config
|
||||||
|
repo, err := b.repoCfg.GetRepo(req.Repo)
|
||||||
|
if err != nil {
|
||||||
|
b.logger.Warn("unknown repo", "repo", req.Repo)
|
||||||
|
b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
|
||||||
|
messages.BuildStatusRejected,
|
||||||
|
fmt.Sprintf("unknown repo: %s", req.Repo),
|
||||||
|
))
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// Try to acquire lock
|
||||||
|
b.lock.Lock()
|
||||||
|
if b.busy {
|
||||||
|
b.lock.Unlock()
|
||||||
|
b.logger.Warn("build already in progress")
|
||||||
|
b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
|
||||||
|
messages.BuildStatusRejected,
|
||||||
|
"another build is already in progress",
|
||||||
|
))
|
||||||
|
return
|
||||||
|
}
|
||||||
|
b.busy = true
|
||||||
|
b.lock.Unlock()
|
||||||
|
|
||||||
|
defer func() {
|
||||||
|
b.lock.Lock()
|
||||||
|
b.busy = false
|
||||||
|
b.lock.Unlock()
|
||||||
|
}()
|
||||||
|
|
||||||
|
// Use default branch if not specified
|
||||||
|
branch := req.Branch
|
||||||
|
if branch == "" {
|
||||||
|
branch = repo.DefaultBranch
|
||||||
|
}
|
||||||
|
|
||||||
|
// Determine hosts to build
|
||||||
|
var hosts []string
|
||||||
|
if req.Target == "all" {
|
||||||
|
// List hosts from flake
|
||||||
|
b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
|
||||||
|
messages.BuildStatusStarted,
|
||||||
|
"discovering hosts...",
|
||||||
|
))
|
||||||
|
|
||||||
|
hosts, err = b.executor.ListHosts(context.Background(), repo.URL, branch)
|
||||||
|
if err != nil {
|
||||||
|
b.logger.Error("failed to list hosts", "error", err)
|
||||||
|
b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
|
||||||
|
messages.BuildStatusFailed,
|
||||||
|
fmt.Sprintf("failed to list hosts: %v", err),
|
||||||
|
).WithError(err.Error()))
|
||||||
|
if b.metrics != nil {
|
||||||
|
b.metrics.RecordBuildFailure(req.Repo, "")
|
||||||
|
}
|
||||||
|
return
|
||||||
|
}
|
||||||
|
// Sort hosts for consistent ordering
|
||||||
|
sort.Strings(hosts)
|
||||||
|
} else {
|
||||||
|
hosts = []string{req.Target}
|
||||||
|
}
|
||||||
|
|
||||||
|
if len(hosts) == 0 {
|
||||||
|
b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
|
||||||
|
messages.BuildStatusFailed,
|
||||||
|
"no hosts to build",
|
||||||
|
))
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// Send started response
|
||||||
|
b.sendResponse(req.ReplyTo, &messages.BuildResponse{
|
||||||
|
Status: messages.BuildStatusStarted,
|
||||||
|
Message: fmt.Sprintf("building %d host(s)", len(hosts)),
|
||||||
|
HostsTotal: len(hosts),
|
||||||
|
})
|
||||||
|
|
||||||
|
// Build each host sequentially
|
||||||
|
startTime := time.Now()
|
||||||
|
results := make([]messages.BuildHostResult, 0, len(hosts))
|
||||||
|
succeeded := 0
|
||||||
|
failed := 0
|
||||||
|
|
||||||
|
for i, host := range hosts {
|
||||||
|
hostStart := time.Now()
|
||||||
|
b.logger.Info("building host",
|
||||||
|
"host", host,
|
||||||
|
"progress", fmt.Sprintf("%d/%d", i+1, len(hosts)),
|
||||||
|
"command", b.executor.BuildCommand(repo.URL, branch, host),
|
||||||
|
)
|
||||||
|
|
||||||
|
result := b.executor.Build(context.Background(), repo.URL, branch, host)
|
||||||
|
hostDuration := time.Since(hostStart).Seconds()
|
||||||
|
|
||||||
|
hostResult := messages.BuildHostResult{
|
||||||
|
Host: host,
|
||||||
|
Success: result.Success,
|
||||||
|
DurationSeconds: hostDuration,
|
||||||
|
}
|
||||||
|
if !result.Success {
|
||||||
|
hostResult.Error = result.Stderr
|
||||||
|
if hostResult.Error == "" && result.Error != nil {
|
||||||
|
hostResult.Error = result.Error.Error()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
results = append(results, hostResult)
|
||||||
|
|
||||||
|
if result.Success {
|
||||||
|
succeeded++
|
||||||
|
b.logger.Info("host build succeeded", "host", host, "duration", hostDuration)
|
||||||
|
if b.metrics != nil {
|
||||||
|
b.metrics.RecordHostBuildSuccess(req.Repo, host, hostDuration)
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
failed++
|
||||||
|
b.logger.Error("host build failed", "host", host, "error", hostResult.Error)
|
||||||
|
if b.metrics != nil {
|
||||||
|
b.metrics.RecordHostBuildFailure(req.Repo, host, hostDuration)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Send progress update
|
||||||
|
success := result.Success
|
||||||
|
b.sendResponse(req.ReplyTo, &messages.BuildResponse{
|
||||||
|
Status: messages.BuildStatusProgress,
|
||||||
|
Host: host,
|
||||||
|
HostSuccess: &success,
|
||||||
|
HostsCompleted: i + 1,
|
||||||
|
HostsTotal: len(hosts),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
totalDuration := time.Since(startTime).Seconds()
|
||||||
|
|
||||||
|
// Send final response
|
||||||
|
status := messages.BuildStatusCompleted
|
||||||
|
message := fmt.Sprintf("built %d/%d hosts successfully", succeeded, len(hosts))
|
||||||
|
if failed > 0 {
|
||||||
|
status = messages.BuildStatusFailed
|
||||||
|
message = fmt.Sprintf("build failed: %d/%d hosts failed", failed, len(hosts))
|
||||||
|
}
|
||||||
|
|
||||||
|
b.sendResponse(req.ReplyTo, &messages.BuildResponse{
|
||||||
|
Status: status,
|
||||||
|
Message: message,
|
||||||
|
Results: results,
|
||||||
|
TotalDurationSeconds: totalDuration,
|
||||||
|
Succeeded: succeeded,
|
||||||
|
Failed: failed,
|
||||||
|
})
|
||||||
|
|
||||||
|
// Record overall build metrics
|
||||||
|
if b.metrics != nil {
|
||||||
|
if failed == 0 {
|
||||||
|
b.metrics.RecordBuildSuccess(req.Repo)
|
||||||
|
} else {
|
||||||
|
b.metrics.RecordBuildFailure(req.Repo, "")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (b *Builder) sendResponse(replyTo string, resp *messages.BuildResponse) {
|
||||||
|
data, err := resp.Marshal()
|
||||||
|
if err != nil {
|
||||||
|
b.logger.Error("failed to marshal build response", "error", err)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := b.client.Publish(replyTo, data); err != nil {
|
||||||
|
b.logger.Error("failed to publish build response",
|
||||||
|
"reply_to", replyTo,
|
||||||
|
"error", err,
|
||||||
|
)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Flush to ensure response is sent immediately
|
||||||
|
if err := b.client.Flush(); err != nil {
|
||||||
|
b.logger.Error("failed to flush", "error", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
65
internal/builder/config.go
Normal file
65
internal/builder/config.go
Normal file
@@ -0,0 +1,65 @@
|
|||||||
|
package builder
|
||||||
|
|
||||||
|
import (
|
||||||
|
"fmt"
|
||||||
|
"os"
|
||||||
|
|
||||||
|
"gopkg.in/yaml.v3"
|
||||||
|
)
|
||||||
|
|
||||||
|
// RepoConfig holds configuration for a single repository.
|
||||||
|
type RepoConfig struct {
|
||||||
|
URL string `yaml:"url"`
|
||||||
|
DefaultBranch string `yaml:"default_branch"`
|
||||||
|
}
|
||||||
|
|
||||||
|
// Config holds the builder configuration.
|
||||||
|
type Config struct {
|
||||||
|
Repos map[string]RepoConfig `yaml:"repos"`
|
||||||
|
}
|
||||||
|
|
||||||
|
// LoadConfig loads configuration from a YAML file.
|
||||||
|
func LoadConfig(path string) (*Config, error) {
|
||||||
|
data, err := os.ReadFile(path)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to read config file: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
var cfg Config
|
||||||
|
if err := yaml.Unmarshal(data, &cfg); err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to parse config file: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := cfg.Validate(); err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
|
||||||
|
return &cfg, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// Validate checks that the configuration is valid.
|
||||||
|
func (c *Config) Validate() error {
|
||||||
|
if len(c.Repos) == 0 {
|
||||||
|
return fmt.Errorf("no repos configured")
|
||||||
|
}
|
||||||
|
|
||||||
|
for name, repo := range c.Repos {
|
||||||
|
if repo.URL == "" {
|
||||||
|
return fmt.Errorf("repo %q: url is required", name)
|
||||||
|
}
|
||||||
|
if repo.DefaultBranch == "" {
|
||||||
|
return fmt.Errorf("repo %q: default_branch is required", name)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// GetRepo returns the configuration for a repository, or an error if not found.
|
||||||
|
func (c *Config) GetRepo(name string) (*RepoConfig, error) {
|
||||||
|
repo, ok := c.Repos[name]
|
||||||
|
if !ok {
|
||||||
|
return nil, fmt.Errorf("repo %q not found in configuration", name)
|
||||||
|
}
|
||||||
|
return &repo, nil
|
||||||
|
}
|
||||||
116
internal/builder/executor.go
Normal file
116
internal/builder/executor.go
Normal file
@@ -0,0 +1,116 @@
|
|||||||
|
package builder
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bytes"
|
||||||
|
"context"
|
||||||
|
"encoding/json"
|
||||||
|
"fmt"
|
||||||
|
"os/exec"
|
||||||
|
"time"
|
||||||
|
)
|
||||||
|
|
||||||
|
// Executor handles the execution of nix build commands.
|
||||||
|
type Executor struct {
|
||||||
|
timeout time.Duration
|
||||||
|
}
|
||||||
|
|
||||||
|
// NewExecutor creates a new build executor.
|
||||||
|
func NewExecutor(timeout time.Duration) *Executor {
|
||||||
|
return &Executor{
|
||||||
|
timeout: timeout,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// BuildResult contains the result of a build execution.
|
||||||
|
type BuildResult struct {
|
||||||
|
Success bool
|
||||||
|
ExitCode int
|
||||||
|
Stdout string
|
||||||
|
Stderr string
|
||||||
|
Error error
|
||||||
|
}
|
||||||
|
|
||||||
|
// FlakeShowResult contains the parsed output of nix flake show.
|
||||||
|
type FlakeShowResult struct {
|
||||||
|
NixosConfigurations map[string]any `json:"nixosConfigurations"`
|
||||||
|
}
|
||||||
|
|
||||||
|
// ListHosts returns the list of hosts (nixosConfigurations) available in a flake.
|
||||||
|
func (e *Executor) ListHosts(ctx context.Context, flakeURL, branch string) ([]string, error) {
|
||||||
|
ctx, cancel := context.WithTimeout(ctx, 60*time.Second)
|
||||||
|
defer cancel()
|
||||||
|
|
||||||
|
flakeRef := fmt.Sprintf("%s?ref=%s", flakeURL, branch)
|
||||||
|
cmd := exec.CommandContext(ctx, "nix", "flake", "show", "--json", flakeRef)
|
||||||
|
|
||||||
|
var stdout, stderr bytes.Buffer
|
||||||
|
cmd.Stdout = &stdout
|
||||||
|
cmd.Stderr = &stderr
|
||||||
|
|
||||||
|
if err := cmd.Run(); err != nil {
|
||||||
|
if ctx.Err() == context.DeadlineExceeded {
|
||||||
|
return nil, fmt.Errorf("timeout listing hosts")
|
||||||
|
}
|
||||||
|
return nil, fmt.Errorf("failed to list hosts: %w\n%s", err, stderr.String())
|
||||||
|
}
|
||||||
|
|
||||||
|
var result FlakeShowResult
|
||||||
|
if err := json.Unmarshal(stdout.Bytes(), &result); err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to parse flake show output: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
hosts := make([]string, 0, len(result.NixosConfigurations))
|
||||||
|
for host := range result.NixosConfigurations {
|
||||||
|
hosts = append(hosts, host)
|
||||||
|
}
|
||||||
|
|
||||||
|
return hosts, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// Build builds a single host's system configuration.
|
||||||
|
func (e *Executor) Build(ctx context.Context, flakeURL, branch, host string) *BuildResult {
|
||||||
|
ctx, cancel := context.WithTimeout(ctx, e.timeout)
|
||||||
|
defer cancel()
|
||||||
|
|
||||||
|
// Build the flake reference for the system toplevel
|
||||||
|
flakeRef := fmt.Sprintf("%s?ref=%s#nixosConfigurations.%s.config.system.build.toplevel", flakeURL, branch, host)
|
||||||
|
|
||||||
|
cmd := exec.CommandContext(ctx, "nix", "build", "--no-link", flakeRef)
|
||||||
|
|
||||||
|
var stdout, stderr bytes.Buffer
|
||||||
|
cmd.Stdout = &stdout
|
||||||
|
cmd.Stderr = &stderr
|
||||||
|
|
||||||
|
err := cmd.Run()
|
||||||
|
|
||||||
|
result := &BuildResult{
|
||||||
|
Stdout: stdout.String(),
|
||||||
|
Stderr: stderr.String(),
|
||||||
|
}
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
result.Success = false
|
||||||
|
result.Error = err
|
||||||
|
|
||||||
|
if ctx.Err() == context.DeadlineExceeded {
|
||||||
|
result.Error = fmt.Errorf("build timed out after %v", e.timeout)
|
||||||
|
}
|
||||||
|
|
||||||
|
if exitErr, ok := err.(*exec.ExitError); ok {
|
||||||
|
result.ExitCode = exitErr.ExitCode()
|
||||||
|
} else {
|
||||||
|
result.ExitCode = -1
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
result.Success = true
|
||||||
|
result.ExitCode = 0
|
||||||
|
}
|
||||||
|
|
||||||
|
return result
|
||||||
|
}
|
||||||
|
|
||||||
|
// BuildCommand returns the command that would be executed (for logging/debugging).
|
||||||
|
func (e *Executor) BuildCommand(flakeURL, branch, host string) string {
|
||||||
|
flakeRef := fmt.Sprintf("%s?ref=%s#nixosConfigurations.%s.config.system.build.toplevel", flakeURL, branch, host)
|
||||||
|
return fmt.Sprintf("nix build --no-link %s", flakeRef)
|
||||||
|
}
|
||||||
140
internal/cli/build.go
Normal file
140
internal/cli/build.go
Normal file
@@ -0,0 +1,140 @@
|
|||||||
|
package cli
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"encoding/json"
|
||||||
|
"fmt"
|
||||||
|
"sync"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"github.com/google/uuid"
|
||||||
|
|
||||||
|
"git.t-juice.club/torjus/homelab-deploy/internal/messages"
|
||||||
|
"git.t-juice.club/torjus/homelab-deploy/internal/nats"
|
||||||
|
)
|
||||||
|
|
||||||
|
// BuildConfig holds configuration for a build operation.
|
||||||
|
type BuildConfig struct {
|
||||||
|
NATSUrl string
|
||||||
|
NKeyFile string
|
||||||
|
Repo string
|
||||||
|
Target string
|
||||||
|
Branch string
|
||||||
|
Timeout time.Duration
|
||||||
|
}
|
||||||
|
|
||||||
|
// BuildResult contains the aggregated results from a build.
|
||||||
|
type BuildResult struct {
|
||||||
|
Responses []*messages.BuildResponse
|
||||||
|
FinalResponse *messages.BuildResponse
|
||||||
|
Errors []error
|
||||||
|
}
|
||||||
|
|
||||||
|
// AllSucceeded returns true if the build completed successfully.
|
||||||
|
func (r *BuildResult) AllSucceeded() bool {
|
||||||
|
if len(r.Errors) > 0 {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
if r.FinalResponse == nil {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
return r.FinalResponse.Status == messages.BuildStatusCompleted && r.FinalResponse.Failed == 0
|
||||||
|
}
|
||||||
|
|
||||||
|
// MarshalJSON returns the JSON representation of the build result.
|
||||||
|
func (r *BuildResult) MarshalJSON() ([]byte, error) {
|
||||||
|
if r.FinalResponse != nil {
|
||||||
|
return json.Marshal(r.FinalResponse)
|
||||||
|
}
|
||||||
|
return json.Marshal(map[string]any{
|
||||||
|
"status": "unknown",
|
||||||
|
"responses": r.Responses,
|
||||||
|
"errors": r.Errors,
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
// Build triggers a build and collects responses.
|
||||||
|
func Build(ctx context.Context, cfg BuildConfig, onResponse func(*messages.BuildResponse)) (*BuildResult, error) {
|
||||||
|
// Connect to NATS
|
||||||
|
client, err := nats.Connect(nats.Config{
|
||||||
|
URL: cfg.NATSUrl,
|
||||||
|
NKeyFile: cfg.NKeyFile,
|
||||||
|
Name: "homelab-deploy-build-cli",
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to connect to NATS: %w", err)
|
||||||
|
}
|
||||||
|
defer client.Close()
|
||||||
|
|
||||||
|
// Generate unique reply subject
|
||||||
|
requestID := uuid.New().String()
|
||||||
|
replySubject := fmt.Sprintf("build.responses.%s", requestID)
|
||||||
|
|
||||||
|
var mu sync.Mutex
|
||||||
|
result := &BuildResult{}
|
||||||
|
done := make(chan struct{})
|
||||||
|
|
||||||
|
// Subscribe to reply subject
|
||||||
|
sub, err := client.Subscribe(replySubject, func(subject string, data []byte) {
|
||||||
|
resp, err := messages.UnmarshalBuildResponse(data)
|
||||||
|
if err != nil {
|
||||||
|
mu.Lock()
|
||||||
|
result.Errors = append(result.Errors, fmt.Errorf("failed to unmarshal response: %w", err))
|
||||||
|
mu.Unlock()
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
mu.Lock()
|
||||||
|
result.Responses = append(result.Responses, resp)
|
||||||
|
if resp.Status.IsFinal() {
|
||||||
|
result.FinalResponse = resp
|
||||||
|
select {
|
||||||
|
case <-done:
|
||||||
|
default:
|
||||||
|
close(done)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
mu.Unlock()
|
||||||
|
|
||||||
|
if onResponse != nil {
|
||||||
|
onResponse(resp)
|
||||||
|
}
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to subscribe to reply subject: %w", err)
|
||||||
|
}
|
||||||
|
defer func() { _ = sub.Unsubscribe() }()
|
||||||
|
|
||||||
|
// Build and send request
|
||||||
|
req := &messages.BuildRequest{
|
||||||
|
Repo: cfg.Repo,
|
||||||
|
Target: cfg.Target,
|
||||||
|
Branch: cfg.Branch,
|
||||||
|
ReplyTo: replySubject,
|
||||||
|
}
|
||||||
|
|
||||||
|
data, err := req.Marshal()
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to marshal request: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Publish to build.<repo>.<target>
|
||||||
|
buildSubject := fmt.Sprintf("build.%s.%s", cfg.Repo, cfg.Target)
|
||||||
|
if err := client.Publish(buildSubject, data); err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to publish request: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := client.Flush(); err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to flush: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Wait for final response or timeout
|
||||||
|
select {
|
||||||
|
case <-ctx.Done():
|
||||||
|
return result, ctx.Err()
|
||||||
|
case <-done:
|
||||||
|
return result, nil
|
||||||
|
case <-time.After(cfg.Timeout):
|
||||||
|
return result, nil
|
||||||
|
}
|
||||||
|
}
|
||||||
109
internal/mcp/build_tools.go
Normal file
109
internal/mcp/build_tools.go
Normal file
@@ -0,0 +1,109 @@
|
|||||||
|
package mcp
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"fmt"
|
||||||
|
"strings"
|
||||||
|
|
||||||
|
"github.com/mark3labs/mcp-go/mcp"
|
||||||
|
|
||||||
|
deploycli "git.t-juice.club/torjus/homelab-deploy/internal/cli"
|
||||||
|
"git.t-juice.club/torjus/homelab-deploy/internal/messages"
|
||||||
|
)
|
||||||
|
|
||||||
|
// BuildTool creates the build tool definition.
|
||||||
|
func BuildTool() mcp.Tool {
|
||||||
|
return mcp.NewTool(
|
||||||
|
"build",
|
||||||
|
mcp.WithDescription("Trigger a Nix build on the build server"),
|
||||||
|
mcp.WithString("repo",
|
||||||
|
mcp.Required(),
|
||||||
|
mcp.Description("Repository name (must match builder config)"),
|
||||||
|
),
|
||||||
|
mcp.WithString("target",
|
||||||
|
mcp.Description("Target hostname, or omit to build all hosts"),
|
||||||
|
),
|
||||||
|
mcp.WithBoolean("all",
|
||||||
|
mcp.Description("Build all hosts in the repository (default if no target specified)"),
|
||||||
|
),
|
||||||
|
mcp.WithString("branch",
|
||||||
|
mcp.Description("Git branch to build (uses repo default if not specified)"),
|
||||||
|
),
|
||||||
|
)
|
||||||
|
}
|
||||||
|
|
||||||
|
// HandleBuild handles the build tool.
|
||||||
|
func (h *ToolHandler) HandleBuild(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
|
||||||
|
repo, err := request.RequireString("repo")
|
||||||
|
if err != nil {
|
||||||
|
return mcp.NewToolResultError("repo is required"), nil
|
||||||
|
}
|
||||||
|
|
||||||
|
target := request.GetString("target", "")
|
||||||
|
all := request.GetBool("all", false)
|
||||||
|
branch := request.GetString("branch", "")
|
||||||
|
|
||||||
|
// Default to "all" if no target specified
|
||||||
|
if target == "" {
|
||||||
|
if !all {
|
||||||
|
all = true
|
||||||
|
}
|
||||||
|
target = "all"
|
||||||
|
}
|
||||||
|
if all && target != "all" {
|
||||||
|
return mcp.NewToolResultError("cannot specify both target and all"), nil
|
||||||
|
}
|
||||||
|
|
||||||
|
cfg := deploycli.BuildConfig{
|
||||||
|
NATSUrl: h.cfg.NATSUrl,
|
||||||
|
NKeyFile: h.cfg.NKeyFile,
|
||||||
|
Repo: repo,
|
||||||
|
Target: target,
|
||||||
|
Branch: branch,
|
||||||
|
Timeout: h.cfg.Timeout,
|
||||||
|
}
|
||||||
|
|
||||||
|
var output strings.Builder
|
||||||
|
branchStr := branch
|
||||||
|
if branchStr == "" {
|
||||||
|
branchStr = "(default)"
|
||||||
|
}
|
||||||
|
output.WriteString(fmt.Sprintf("Building %s target=%s branch=%s\n\n", repo, target, branchStr))
|
||||||
|
|
||||||
|
result, err := deploycli.Build(ctx, cfg, func(resp *messages.BuildResponse) {
|
||||||
|
switch resp.Status {
|
||||||
|
case messages.BuildStatusStarted:
|
||||||
|
output.WriteString(fmt.Sprintf("Started: %s\n", resp.Message))
|
||||||
|
case messages.BuildStatusProgress:
|
||||||
|
successStr := "..."
|
||||||
|
if resp.HostSuccess != nil {
|
||||||
|
if *resp.HostSuccess {
|
||||||
|
successStr = "success"
|
||||||
|
} else {
|
||||||
|
successStr = "failed"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
output.WriteString(fmt.Sprintf("[%d/%d] %s: %s\n", resp.HostsCompleted, resp.HostsTotal, resp.Host, successStr))
|
||||||
|
case messages.BuildStatusCompleted, messages.BuildStatusFailed:
|
||||||
|
output.WriteString(fmt.Sprintf("\n%s\n", resp.Message))
|
||||||
|
case messages.BuildStatusRejected:
|
||||||
|
output.WriteString(fmt.Sprintf("Rejected: %s\n", resp.Message))
|
||||||
|
}
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
return mcp.NewToolResultError(fmt.Sprintf("build failed: %v", err)), nil
|
||||||
|
}
|
||||||
|
|
||||||
|
if result.FinalResponse != nil {
|
||||||
|
output.WriteString(fmt.Sprintf("\nBuild complete: %d succeeded, %d failed (%.1fs)\n",
|
||||||
|
result.FinalResponse.Succeeded,
|
||||||
|
result.FinalResponse.Failed,
|
||||||
|
result.FinalResponse.TotalDurationSeconds))
|
||||||
|
}
|
||||||
|
|
||||||
|
if !result.AllSucceeded() {
|
||||||
|
output.WriteString("WARNING: Some builds failed\n")
|
||||||
|
}
|
||||||
|
|
||||||
|
return mcp.NewToolResultText(output.String()), nil
|
||||||
|
}
|
||||||
@@ -12,6 +12,7 @@ type ServerConfig struct {
|
|||||||
NKeyFile string
|
NKeyFile string
|
||||||
EnableAdmin bool
|
EnableAdmin bool
|
||||||
AdminNKeyFile string
|
AdminNKeyFile string
|
||||||
|
EnableBuilds bool
|
||||||
DiscoverSubject string
|
DiscoverSubject string
|
||||||
Timeout time.Duration
|
Timeout time.Duration
|
||||||
}
|
}
|
||||||
@@ -49,6 +50,11 @@ func New(cfg ServerConfig) *Server {
|
|||||||
s.AddTool(DeployAdminTool(), handler.HandleDeployAdmin)
|
s.AddTool(DeployAdminTool(), handler.HandleDeployAdmin)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Optionally register build tool
|
||||||
|
if cfg.EnableBuilds {
|
||||||
|
s.AddTool(BuildTool(), handler.HandleBuild)
|
||||||
|
}
|
||||||
|
|
||||||
return &Server{
|
return &Server{
|
||||||
cfg: cfg,
|
cfg: cfg,
|
||||||
server: s,
|
server: s,
|
||||||
|
|||||||
125
internal/messages/build.go
Normal file
125
internal/messages/build.go
Normal file
@@ -0,0 +1,125 @@
|
|||||||
|
package messages
|
||||||
|
|
||||||
|
import (
|
||||||
|
"encoding/json"
|
||||||
|
"fmt"
|
||||||
|
)
|
||||||
|
|
||||||
|
// BuildStatus represents the status of a build response.
|
||||||
|
type BuildStatus string
|
||||||
|
|
||||||
|
const (
|
||||||
|
BuildStatusStarted BuildStatus = "started"
|
||||||
|
BuildStatusProgress BuildStatus = "progress"
|
||||||
|
BuildStatusCompleted BuildStatus = "completed"
|
||||||
|
BuildStatusFailed BuildStatus = "failed"
|
||||||
|
BuildStatusRejected BuildStatus = "rejected"
|
||||||
|
)
|
||||||
|
|
||||||
|
// IsFinal returns true if this status indicates a terminal state.
|
||||||
|
func (s BuildStatus) IsFinal() bool {
|
||||||
|
switch s {
|
||||||
|
case BuildStatusCompleted, BuildStatusFailed, BuildStatusRejected:
|
||||||
|
return true
|
||||||
|
default:
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// BuildRequest is the message sent to request a build.
|
||||||
|
type BuildRequest struct {
|
||||||
|
Repo string `json:"repo"` // Must match config
|
||||||
|
Target string `json:"target"` // Hostname or "all"
|
||||||
|
Branch string `json:"branch,omitempty"` // Optional, uses repo default
|
||||||
|
ReplyTo string `json:"reply_to"`
|
||||||
|
}
|
||||||
|
|
||||||
|
// Validate checks that the request is valid.
|
||||||
|
func (r *BuildRequest) Validate() error {
|
||||||
|
if r.Repo == "" {
|
||||||
|
return fmt.Errorf("repo is required")
|
||||||
|
}
|
||||||
|
if !revisionRegex.MatchString(r.Repo) {
|
||||||
|
return fmt.Errorf("invalid repo name format: %q", r.Repo)
|
||||||
|
}
|
||||||
|
if r.Target == "" {
|
||||||
|
return fmt.Errorf("target is required")
|
||||||
|
}
|
||||||
|
if r.Branch != "" && !revisionRegex.MatchString(r.Branch) {
|
||||||
|
return fmt.Errorf("invalid branch format: %q", r.Branch)
|
||||||
|
}
|
||||||
|
if r.ReplyTo == "" {
|
||||||
|
return fmt.Errorf("reply_to is required")
|
||||||
|
}
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// Marshal serializes the request to JSON.
|
||||||
|
func (r *BuildRequest) Marshal() ([]byte, error) {
|
||||||
|
return json.Marshal(r)
|
||||||
|
}
|
||||||
|
|
||||||
|
// UnmarshalBuildRequest deserializes a request from JSON.
|
||||||
|
func UnmarshalBuildRequest(data []byte) (*BuildRequest, error) {
|
||||||
|
var r BuildRequest
|
||||||
|
if err := json.Unmarshal(data, &r); err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to unmarshal build request: %w", err)
|
||||||
|
}
|
||||||
|
return &r, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// BuildHostResult contains the result of building a single host.
|
||||||
|
type BuildHostResult struct {
|
||||||
|
Host string `json:"host"`
|
||||||
|
Success bool `json:"success"`
|
||||||
|
Error string `json:"error,omitempty"`
|
||||||
|
DurationSeconds float64 `json:"duration_seconds"`
|
||||||
|
}
|
||||||
|
|
||||||
|
// BuildResponse is the message sent in response to a build request.
|
||||||
|
type BuildResponse struct {
|
||||||
|
Status BuildStatus `json:"status"`
|
||||||
|
Message string `json:"message,omitempty"`
|
||||||
|
|
||||||
|
// Progress updates
|
||||||
|
Host string `json:"host,omitempty"`
|
||||||
|
HostSuccess *bool `json:"host_success,omitempty"`
|
||||||
|
HostsCompleted int `json:"hosts_completed,omitempty"`
|
||||||
|
HostsTotal int `json:"hosts_total,omitempty"`
|
||||||
|
|
||||||
|
// Final response
|
||||||
|
Results []BuildHostResult `json:"results,omitempty"`
|
||||||
|
TotalDurationSeconds float64 `json:"total_duration_seconds,omitempty"`
|
||||||
|
Succeeded int `json:"succeeded,omitempty"`
|
||||||
|
Failed int `json:"failed,omitempty"`
|
||||||
|
|
||||||
|
Error string `json:"error,omitempty"`
|
||||||
|
}
|
||||||
|
|
||||||
|
// NewBuildResponse creates a new response with the given status and message.
|
||||||
|
func NewBuildResponse(status BuildStatus, message string) *BuildResponse {
|
||||||
|
return &BuildResponse{
|
||||||
|
Status: status,
|
||||||
|
Message: message,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// WithError adds an error message to the response.
|
||||||
|
func (r *BuildResponse) WithError(err string) *BuildResponse {
|
||||||
|
r.Error = err
|
||||||
|
return r
|
||||||
|
}
|
||||||
|
|
||||||
|
// Marshal serializes the response to JSON.
|
||||||
|
func (r *BuildResponse) Marshal() ([]byte, error) {
|
||||||
|
return json.Marshal(r)
|
||||||
|
}
|
||||||
|
|
||||||
|
// UnmarshalBuildResponse deserializes a response from JSON.
|
||||||
|
func UnmarshalBuildResponse(data []byte) (*BuildResponse, error) {
|
||||||
|
var r BuildResponse
|
||||||
|
if err := json.Unmarshal(data, &r); err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to unmarshal build response: %w", err)
|
||||||
|
}
|
||||||
|
return &r, nil
|
||||||
|
}
|
||||||
99
internal/metrics/build_metrics.go
Normal file
99
internal/metrics/build_metrics.go
Normal file
@@ -0,0 +1,99 @@
|
|||||||
|
package metrics
|
||||||
|
|
||||||
|
import (
|
||||||
|
"github.com/prometheus/client_golang/prometheus"
|
||||||
|
)
|
||||||
|
|
||||||
|
// BuildCollector holds all Prometheus metrics for the builder.
|
||||||
|
type BuildCollector struct {
|
||||||
|
buildsTotal *prometheus.CounterVec
|
||||||
|
buildHostTotal *prometheus.CounterVec
|
||||||
|
buildDuration *prometheus.HistogramVec
|
||||||
|
buildLastTimestamp *prometheus.GaugeVec
|
||||||
|
buildLastSuccessTime *prometheus.GaugeVec
|
||||||
|
buildLastFailureTime *prometheus.GaugeVec
|
||||||
|
}
|
||||||
|
|
||||||
|
// NewBuildCollector creates a new build metrics collector and registers it with the given registerer.
|
||||||
|
func NewBuildCollector(reg prometheus.Registerer) *BuildCollector {
|
||||||
|
c := &BuildCollector{
|
||||||
|
buildsTotal: prometheus.NewCounterVec(
|
||||||
|
prometheus.CounterOpts{
|
||||||
|
Name: "homelab_deploy_builds_total",
|
||||||
|
Help: "Total builds processed",
|
||||||
|
},
|
||||||
|
[]string{"repo", "status"},
|
||||||
|
),
|
||||||
|
buildHostTotal: prometheus.NewCounterVec(
|
||||||
|
prometheus.CounterOpts{
|
||||||
|
Name: "homelab_deploy_build_host_total",
|
||||||
|
Help: "Total host builds processed",
|
||||||
|
},
|
||||||
|
[]string{"repo", "host", "status"},
|
||||||
|
),
|
||||||
|
buildDuration: prometheus.NewHistogramVec(
|
||||||
|
prometheus.HistogramOpts{
|
||||||
|
Name: "homelab_deploy_build_duration_seconds",
|
||||||
|
Help: "Build execution time per host",
|
||||||
|
Buckets: []float64{30, 60, 120, 300, 600, 900, 1200, 1800, 3600},
|
||||||
|
},
|
||||||
|
[]string{"repo", "host"},
|
||||||
|
),
|
||||||
|
buildLastTimestamp: prometheus.NewGaugeVec(
|
||||||
|
prometheus.GaugeOpts{
|
||||||
|
Name: "homelab_deploy_build_last_timestamp",
|
||||||
|
Help: "Timestamp of last build attempt",
|
||||||
|
},
|
||||||
|
[]string{"repo"},
|
||||||
|
),
|
||||||
|
buildLastSuccessTime: prometheus.NewGaugeVec(
|
||||||
|
prometheus.GaugeOpts{
|
||||||
|
Name: "homelab_deploy_build_last_success_timestamp",
|
||||||
|
Help: "Timestamp of last successful build",
|
||||||
|
},
|
||||||
|
[]string{"repo"},
|
||||||
|
),
|
||||||
|
buildLastFailureTime: prometheus.NewGaugeVec(
|
||||||
|
prometheus.GaugeOpts{
|
||||||
|
Name: "homelab_deploy_build_last_failure_timestamp",
|
||||||
|
Help: "Timestamp of last failed build",
|
||||||
|
},
|
||||||
|
[]string{"repo"},
|
||||||
|
),
|
||||||
|
}
|
||||||
|
|
||||||
|
reg.MustRegister(c.buildsTotal)
|
||||||
|
reg.MustRegister(c.buildHostTotal)
|
||||||
|
reg.MustRegister(c.buildDuration)
|
||||||
|
reg.MustRegister(c.buildLastTimestamp)
|
||||||
|
reg.MustRegister(c.buildLastSuccessTime)
|
||||||
|
reg.MustRegister(c.buildLastFailureTime)
|
||||||
|
|
||||||
|
return c
|
||||||
|
}
|
||||||
|
|
||||||
|
// RecordBuildSuccess records a successful build.
|
||||||
|
func (c *BuildCollector) RecordBuildSuccess(repo string) {
|
||||||
|
c.buildsTotal.WithLabelValues(repo, "success").Inc()
|
||||||
|
c.buildLastTimestamp.WithLabelValues(repo).SetToCurrentTime()
|
||||||
|
c.buildLastSuccessTime.WithLabelValues(repo).SetToCurrentTime()
|
||||||
|
}
|
||||||
|
|
||||||
|
// RecordBuildFailure records a failed build.
|
||||||
|
func (c *BuildCollector) RecordBuildFailure(repo, errorCode string) {
|
||||||
|
c.buildsTotal.WithLabelValues(repo, "failure").Inc()
|
||||||
|
c.buildLastTimestamp.WithLabelValues(repo).SetToCurrentTime()
|
||||||
|
c.buildLastFailureTime.WithLabelValues(repo).SetToCurrentTime()
|
||||||
|
}
|
||||||
|
|
||||||
|
// RecordHostBuildSuccess records a successful host build.
|
||||||
|
func (c *BuildCollector) RecordHostBuildSuccess(repo, host string, durationSeconds float64) {
|
||||||
|
c.buildHostTotal.WithLabelValues(repo, host, "success").Inc()
|
||||||
|
c.buildDuration.WithLabelValues(repo, host).Observe(durationSeconds)
|
||||||
|
}
|
||||||
|
|
||||||
|
// RecordHostBuildFailure records a failed host build.
|
||||||
|
func (c *BuildCollector) RecordHostBuildFailure(repo, host string, durationSeconds float64) {
|
||||||
|
c.buildHostTotal.WithLabelValues(repo, host, "failure").Inc()
|
||||||
|
c.buildDuration.WithLabelValues(repo, host).Observe(durationSeconds)
|
||||||
|
}
|
||||||
@@ -74,6 +74,11 @@ func (s *Server) Collector() *Collector {
|
|||||||
return s.collector
|
return s.collector
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Registry returns the Prometheus registry.
|
||||||
|
func (s *Server) Registry() *prometheus.Registry {
|
||||||
|
return s.registry
|
||||||
|
}
|
||||||
|
|
||||||
// ScrapeCh returns a channel that receives a signal each time the metrics endpoint is scraped.
|
// ScrapeCh returns a channel that receives a signal each time the metrics endpoint is scraped.
|
||||||
func (s *Server) ScrapeCh() <-chan struct{} {
|
func (s *Server) ScrapeCh() <-chan struct{} {
|
||||||
return s.scrapeCh
|
return s.scrapeCh
|
||||||
|
|||||||
210
nixos/module.nix
210
nixos/module.nix
@@ -2,32 +2,47 @@
|
|||||||
{ config, lib, pkgs, ... }:
|
{ config, lib, pkgs, ... }:
|
||||||
|
|
||||||
let
|
let
|
||||||
cfg = config.services.homelab-deploy.listener;
|
listenerCfg = config.services.homelab-deploy.listener;
|
||||||
|
builderCfg = config.services.homelab-deploy.builder;
|
||||||
|
|
||||||
# Build command line arguments from configuration
|
# Build command line arguments for listener from configuration
|
||||||
args = lib.concatStringsSep " " ([
|
listenerArgs = lib.concatStringsSep " " ([
|
||||||
"--hostname ${lib.escapeShellArg cfg.hostname}"
|
"--hostname ${lib.escapeShellArg listenerCfg.hostname}"
|
||||||
"--tier ${cfg.tier}"
|
"--tier ${listenerCfg.tier}"
|
||||||
"--nats-url ${lib.escapeShellArg cfg.natsUrl}"
|
"--nats-url ${lib.escapeShellArg listenerCfg.natsUrl}"
|
||||||
"--nkey-file ${lib.escapeShellArg cfg.nkeyFile}"
|
"--nkey-file ${lib.escapeShellArg listenerCfg.nkeyFile}"
|
||||||
"--flake-url ${lib.escapeShellArg cfg.flakeUrl}"
|
"--flake-url ${lib.escapeShellArg listenerCfg.flakeUrl}"
|
||||||
"--timeout ${toString cfg.timeout}"
|
"--timeout ${toString listenerCfg.timeout}"
|
||||||
"--discover-subject ${lib.escapeShellArg cfg.discoverSubject}"
|
"--discover-subject ${lib.escapeShellArg listenerCfg.discoverSubject}"
|
||||||
]
|
]
|
||||||
++ lib.optional (cfg.role != null) "--role ${lib.escapeShellArg cfg.role}"
|
++ lib.optional (listenerCfg.role != null) "--role ${lib.escapeShellArg listenerCfg.role}"
|
||||||
++ map (s: "--deploy-subject ${lib.escapeShellArg s}") cfg.deploySubjects
|
++ map (s: "--deploy-subject ${lib.escapeShellArg s}") listenerCfg.deploySubjects
|
||||||
++ lib.optionals cfg.metrics.enable [
|
++ lib.optionals listenerCfg.metrics.enable [
|
||||||
"--metrics-enabled"
|
"--metrics-enabled"
|
||||||
"--metrics-addr ${lib.escapeShellArg cfg.metrics.address}"
|
"--metrics-addr ${lib.escapeShellArg listenerCfg.metrics.address}"
|
||||||
|
]);
|
||||||
|
|
||||||
|
# Build command line arguments for builder from configuration
|
||||||
|
builderArgs = lib.concatStringsSep " " ([
|
||||||
|
"--nats-url ${lib.escapeShellArg builderCfg.natsUrl}"
|
||||||
|
"--nkey-file ${lib.escapeShellArg builderCfg.nkeyFile}"
|
||||||
|
"--config ${lib.escapeShellArg builderCfg.configFile}"
|
||||||
|
"--timeout ${toString builderCfg.timeout}"
|
||||||
|
]
|
||||||
|
++ lib.optionals builderCfg.metrics.enable [
|
||||||
|
"--metrics-enabled"
|
||||||
|
"--metrics-addr ${lib.escapeShellArg builderCfg.metrics.address}"
|
||||||
]);
|
]);
|
||||||
|
|
||||||
# Extract port from metrics address for firewall rule
|
# Extract port from metrics address for firewall rule
|
||||||
metricsPort = let
|
extractPort = addr: let
|
||||||
addr = cfg.metrics.address;
|
|
||||||
# Handle both ":9972" and "0.0.0.0:9972" formats
|
# Handle both ":9972" and "0.0.0.0:9972" formats
|
||||||
parts = lib.splitString ":" addr;
|
parts = lib.splitString ":" addr;
|
||||||
in lib.toInt (lib.last parts);
|
in lib.toInt (lib.last parts);
|
||||||
|
|
||||||
|
listenerMetricsPort = extractPort listenerCfg.metrics.address;
|
||||||
|
builderMetricsPort = extractPort builderCfg.metrics.address;
|
||||||
|
|
||||||
in
|
in
|
||||||
{
|
{
|
||||||
options.services.homelab-deploy.listener = {
|
options.services.homelab-deploy.listener = {
|
||||||
@@ -124,43 +139,136 @@ in
|
|||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
config = lib.mkIf cfg.enable {
|
options.services.homelab-deploy.builder = {
|
||||||
systemd.services.homelab-deploy-listener = {
|
enable = lib.mkEnableOption "homelab-deploy builder service";
|
||||||
description = "homelab-deploy listener";
|
|
||||||
wantedBy = [ "multi-user.target" ];
|
|
||||||
after = [ "network-online.target" ];
|
|
||||||
wants = [ "network-online.target" ];
|
|
||||||
|
|
||||||
# Prevent self-interruption during nixos-rebuild switch
|
package = lib.mkOption {
|
||||||
# The service will continue running the old version until manually restarted
|
type = lib.types.package;
|
||||||
stopIfChanged = false;
|
default = self.packages.${pkgs.system}.homelab-deploy;
|
||||||
restartIfChanged = false;
|
description = "The homelab-deploy package to use";
|
||||||
|
|
||||||
environment = cfg.environment // {
|
|
||||||
# Nix needs a writable cache for git flake fetching
|
|
||||||
XDG_CACHE_HOME = "/var/cache/homelab-deploy";
|
|
||||||
};
|
|
||||||
|
|
||||||
path = [ pkgs.git config.system.build.nixos-rebuild ];
|
|
||||||
|
|
||||||
serviceConfig = {
|
|
||||||
CacheDirectory = "homelab-deploy";
|
|
||||||
Type = "simple";
|
|
||||||
ExecStart = "${cfg.package}/bin/homelab-deploy listener ${args}";
|
|
||||||
Restart = "always";
|
|
||||||
RestartSec = 10;
|
|
||||||
|
|
||||||
# Minimal hardening - nixos-rebuild requires broad system access:
|
|
||||||
# - Write access to /nix/store for building
|
|
||||||
# - Kernel namespace support for nix sandbox builds
|
|
||||||
# - Ability to activate system configurations
|
|
||||||
# - Network access for fetching from git/cache
|
|
||||||
# Following the approach of nixos auto-upgrade which has no hardening
|
|
||||||
};
|
|
||||||
};
|
};
|
||||||
|
|
||||||
networking.firewall.allowedTCPPorts = lib.mkIf (cfg.metrics.enable && cfg.metrics.openFirewall) [
|
natsUrl = lib.mkOption {
|
||||||
metricsPort
|
type = lib.types.str;
|
||||||
];
|
description = "NATS server URL";
|
||||||
|
example = "nats://nats.example.com:4222";
|
||||||
|
};
|
||||||
|
|
||||||
|
nkeyFile = lib.mkOption {
|
||||||
|
type = lib.types.path;
|
||||||
|
description = "Path to NKey seed file for NATS authentication";
|
||||||
|
example = "/run/secrets/homelab-deploy-builder-nkey";
|
||||||
|
};
|
||||||
|
|
||||||
|
configFile = lib.mkOption {
|
||||||
|
type = lib.types.path;
|
||||||
|
description = "Path to builder configuration file (YAML)";
|
||||||
|
example = "/etc/homelab-deploy/builder.yaml";
|
||||||
|
};
|
||||||
|
|
||||||
|
timeout = lib.mkOption {
|
||||||
|
type = lib.types.int;
|
||||||
|
default = 1800;
|
||||||
|
description = "Build timeout in seconds per host";
|
||||||
|
};
|
||||||
|
|
||||||
|
environment = lib.mkOption {
|
||||||
|
type = lib.types.attrsOf lib.types.str;
|
||||||
|
default = { };
|
||||||
|
description = "Additional environment variables for the service";
|
||||||
|
example = { GIT_SSH_COMMAND = "ssh -i /run/secrets/deploy-key"; };
|
||||||
|
};
|
||||||
|
|
||||||
|
metrics = {
|
||||||
|
enable = lib.mkEnableOption "Prometheus metrics endpoint";
|
||||||
|
|
||||||
|
address = lib.mkOption {
|
||||||
|
type = lib.types.str;
|
||||||
|
default = ":9973";
|
||||||
|
description = "Address for Prometheus metrics HTTP server";
|
||||||
|
example = "127.0.0.1:9973";
|
||||||
|
};
|
||||||
|
|
||||||
|
openFirewall = lib.mkOption {
|
||||||
|
type = lib.types.bool;
|
||||||
|
default = false;
|
||||||
|
description = "Open firewall for metrics port";
|
||||||
|
};
|
||||||
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
|
config = lib.mkMerge [
|
||||||
|
(lib.mkIf listenerCfg.enable {
|
||||||
|
systemd.services.homelab-deploy-listener = {
|
||||||
|
description = "homelab-deploy listener";
|
||||||
|
wantedBy = [ "multi-user.target" ];
|
||||||
|
after = [ "network-online.target" ];
|
||||||
|
wants = [ "network-online.target" ];
|
||||||
|
|
||||||
|
# Prevent self-interruption during nixos-rebuild switch
|
||||||
|
# The service will continue running the old version until manually restarted
|
||||||
|
stopIfChanged = false;
|
||||||
|
restartIfChanged = false;
|
||||||
|
|
||||||
|
environment = listenerCfg.environment // {
|
||||||
|
# Nix needs a writable cache for git flake fetching
|
||||||
|
XDG_CACHE_HOME = "/var/cache/homelab-deploy";
|
||||||
|
};
|
||||||
|
|
||||||
|
path = [ pkgs.git config.system.build.nixos-rebuild ];
|
||||||
|
|
||||||
|
serviceConfig = {
|
||||||
|
CacheDirectory = "homelab-deploy";
|
||||||
|
Type = "simple";
|
||||||
|
ExecStart = "${listenerCfg.package}/bin/homelab-deploy listener ${listenerArgs}";
|
||||||
|
Restart = "always";
|
||||||
|
RestartSec = 10;
|
||||||
|
|
||||||
|
# Minimal hardening - nixos-rebuild requires broad system access:
|
||||||
|
# - Write access to /nix/store for building
|
||||||
|
# - Kernel namespace support for nix sandbox builds
|
||||||
|
# - Ability to activate system configurations
|
||||||
|
# - Network access for fetching from git/cache
|
||||||
|
# Following the approach of nixos auto-upgrade which has no hardening
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
networking.firewall.allowedTCPPorts = lib.mkIf (listenerCfg.metrics.enable && listenerCfg.metrics.openFirewall) [
|
||||||
|
listenerMetricsPort
|
||||||
|
];
|
||||||
|
})
|
||||||
|
|
||||||
|
(lib.mkIf builderCfg.enable {
|
||||||
|
systemd.services.homelab-deploy-builder = {
|
||||||
|
description = "homelab-deploy builder";
|
||||||
|
wantedBy = [ "multi-user.target" ];
|
||||||
|
after = [ "network-online.target" ];
|
||||||
|
wants = [ "network-online.target" ];
|
||||||
|
|
||||||
|
environment = builderCfg.environment // {
|
||||||
|
# Nix needs a writable cache for git flake fetching
|
||||||
|
XDG_CACHE_HOME = "/var/cache/homelab-deploy-builder";
|
||||||
|
};
|
||||||
|
|
||||||
|
path = [ pkgs.git pkgs.nix ];
|
||||||
|
|
||||||
|
serviceConfig = {
|
||||||
|
CacheDirectory = "homelab-deploy-builder";
|
||||||
|
Type = "simple";
|
||||||
|
ExecStart = "${builderCfg.package}/bin/homelab-deploy builder ${builderArgs}";
|
||||||
|
Restart = "always";
|
||||||
|
RestartSec = 10;
|
||||||
|
|
||||||
|
# Minimal hardening - nix build requires broad system access:
|
||||||
|
# - Write access to /nix/store for building
|
||||||
|
# - Kernel namespace support for nix sandbox builds
|
||||||
|
# - Network access for fetching from git/cache
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
networking.firewall.allowedTCPPorts = lib.mkIf (builderCfg.metrics.enable && builderCfg.metrics.openFirewall) [
|
||||||
|
builderMetricsPort
|
||||||
|
];
|
||||||
|
})
|
||||||
|
];
|
||||||
}
|
}
|
||||||
|
|||||||
Reference in New Issue
Block a user