Compare commits: master...fix/metric (4 commits)

Commits: c272ce6903, c934d1ba38, 723a1f769f, 46fc6a7e96

263 README.md
@@ -4,12 +4,11 @@ A message-based deployment system for NixOS configurations using NATS for messaging

## Overview

-The `homelab-deploy` binary provides four operational modes:
+The `homelab-deploy` binary provides three operational modes:

1. **Listener mode** - Runs on each NixOS host as a systemd service, subscribing to NATS subjects and executing `nixos-rebuild` when deployment requests arrive
-2. **Builder mode** - Runs on a dedicated build host, subscribing to NATS subjects and executing `nix build` to pre-build configurations
-3. **MCP mode** - Runs as an MCP (Model Context Protocol) server, exposing deployment tools for AI assistants
-4. **CLI mode** - Manual deployment and build commands for administrators
+2. **MCP mode** - Runs as an MCP (Model Context Protocol) server, exposing deployment tools for AI assistants
+3. **CLI mode** - Manual deployment commands for administrators
## Installation

@@ -64,6 +63,8 @@ homelab-deploy listener \

| `--discover-subject` | No | Discovery subject (default: `deploy.discover`) |
| `--metrics-enabled` | No | Enable Prometheus metrics endpoint |
| `--metrics-addr` | No | Metrics HTTP server address (default: `:9972`) |
| `--heartbeat-interval` | No | Status update interval in seconds during deployment (default: 15) |
| `--debug` | No | Enable debug logging for troubleshooting |

#### Subject Templates
@@ -129,82 +130,6 @@ homelab-deploy deploy prod-dns --nats-url ... --nkey-file ...

Alias lookup: `HOMELAB_DEPLOY_ALIAS_<NAME>` where name is uppercased and hyphens become underscores.
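The name transformation can be checked in plain shell; the alias `prod-dns` here is just an illustrative placeholder:

```shell
# Derive the env var name for a hypothetical alias "prod-dns":
# uppercase the name and replace hyphens with underscores.
name="prod-dns"
var="HOMELAB_DEPLOY_ALIAS_$(echo "$name" | tr '[:lower:]' '[:upper:]' | tr '-' '_')"
echo "$var"   # HOMELAB_DEPLOY_ALIAS_PROD_DNS
```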
### Builder Mode

Run on a dedicated build host to pre-build NixOS configurations:

```bash
homelab-deploy builder \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/builder.nkey \
  --config /etc/homelab-deploy/builder.yaml \
  --timeout 1800 \
  --metrics-enabled \
  --metrics-addr :9973
```

#### Builder Configuration File

The builder uses a YAML configuration file to define allowed repositories:

```yaml
repos:
  nixos-servers:
    url: "git+https://git.example.com/org/nixos-servers.git"
    default_branch: "master"
  homelab:
    url: "git+ssh://git@github.com/user/homelab.git"
    default_branch: "main"
```
#### Builder Flags

| Flag | Required | Description |
|------|----------|-------------|
| `--nats-url` | Yes | NATS server URL |
| `--nkey-file` | Yes | Path to NKey seed file |
| `--config` | Yes | Path to builder configuration file |
| `--timeout` | No | Build timeout per host in seconds (default: 1800) |
| `--metrics-enabled` | No | Enable Prometheus metrics endpoint |
| `--metrics-addr` | No | Metrics HTTP server address (default: `:9973`) |
### Build Command

Trigger a build on the build server:

```bash
# Build all hosts in a repository
homelab-deploy build nixos-servers --all \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/deployer.nkey

# Build a specific host
homelab-deploy build nixos-servers myhost \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/deployer.nkey

# Build with a specific branch
homelab-deploy build nixos-servers --all --branch feature-x \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/deployer.nkey

# JSON output for scripting
homelab-deploy build nixos-servers --all --json \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/deployer.nkey
```
#### Build Flags

| Flag | Required | Env Var | Description |
|------|----------|---------|-------------|
| `--nats-url` | Yes | `HOMELAB_DEPLOY_NATS_URL` | NATS server URL |
| `--nkey-file` | Yes | `HOMELAB_DEPLOY_NKEY_FILE` | Path to NKey seed file |
| `--branch` | No | `HOMELAB_DEPLOY_BRANCH` | Git branch (uses repo default if not specified) |
| `--all` | No | - | Build all hosts in the repository |
| `--timeout` | No | `HOMELAB_DEPLOY_BUILD_TIMEOUT` | Response timeout in seconds (default: 3600) |
| `--json` | No | - | Output results as JSON |
### MCP Server Mode

Run as an MCP server for AI assistant integration:

@@ -221,12 +146,6 @@ homelab-deploy mcp \

```bash
  --nkey-file /run/secrets/mcp.nkey \
  --enable-admin \
  --admin-nkey-file /run/secrets/admin.nkey

# With build tool enabled
homelab-deploy mcp \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/mcp.nkey \
  --enable-builds
```
#### MCP Tools

@@ -236,7 +155,6 @@ homelab-deploy mcp \

| `deploy` | Deploy to test-tier hosts only |
| `deploy_admin` | Deploy to any tier (requires `--enable-admin`) |
| `list_hosts` | Discover available deployment targets |
| `build` | Trigger builds on the build server (requires `--enable-builds`) |
#### Tool Parameters

@@ -251,12 +169,6 @@ homelab-deploy mcp \

**list_hosts:**
- `tier` - Filter by tier (optional)

**build:**
- `repo` - Repository name (required, must match builder config)
- `target` - Target hostname (optional, defaults to all)
- `all` - Build all hosts (default if no target specified)
- `branch` - Git branch (uses repo default if not specified)
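As a sketch, an MCP client invoking the `build` tool would send a `tools/call` arguments payload shaped roughly like this (the exact envelope depends on the MCP client; `nixos-servers` and `myhost` are placeholders):

```json
{
  "name": "build",
  "arguments": {
    "repo": "nixos-servers",
    "target": "myhost",
    "branch": "main"
  }
}
```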
## NixOS Module

Add the module to your NixOS configuration:

@@ -304,6 +216,7 @@ Add the module to your NixOS configuration:

| `metrics.enable` | bool | `false` | Enable Prometheus metrics endpoint |
| `metrics.address` | string | `":9972"` | Metrics HTTP server address |
| `metrics.openFirewall` | bool | `false` | Open firewall for metrics port |
| `extraArgs` | list of string | `[]` | Extra command line arguments (e.g., `["--debug"]`) |

Default `deploySubjects`:

```nix
@@ -314,65 +227,6 @@ Default `deploySubjects`:
]
```
### Builder Module Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enable` | bool | `false` | Enable the builder service |
| `package` | package | from flake | Package to use |
| `natsUrl` | string | required | NATS server URL |
| `nkeyFile` | path | required | Path to NKey seed file |
| `configFile` | path | `null` | Path to builder config file (alternative to `settings`) |
| `settings.repos` | attrs | `{}` | Repository configuration (see below) |
| `timeout` | int | `1800` | Build timeout per host in seconds |
| `environment` | attrs | `{}` | Additional environment variables |
| `metrics.enable` | bool | `false` | Enable Prometheus metrics endpoint |
| `metrics.address` | string | `":9973"` | Metrics HTTP server address |
| `metrics.openFirewall` | bool | `false` | Open firewall for metrics port |

Each entry in `settings.repos` is an attribute set with:

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `url` | string | required | Git flake URL (must start with `git+https://`, `git+ssh://`, or `git+file://`) |
| `defaultBranch` | string | `"master"` | Default branch to build when not specified |

Example builder configuration using `settings`:

```nix
services.homelab-deploy.builder = {
  enable = true;
  natsUrl = "nats://nats.example.com:4222";
  nkeyFile = "/run/secrets/homelab-deploy-builder-nkey";
  settings.repos = {
    nixos-servers = {
      url = "git+https://git.example.com/org/nixos-servers.git";
      defaultBranch = "master";
    };
    homelab = {
      url = "git+ssh://git@github.com/user/homelab.git";
      defaultBranch = "main";
    };
  };
  metrics = {
    enable = true;
    address = ":9973";
    openFirewall = true;
  };
};
```

Alternatively, you can use `configFile` to point to an external YAML file:

```nix
services.homelab-deploy.builder = {
  enable = true;
  natsUrl = "nats://nats.example.com:4222";
  nkeyFile = "/run/secrets/homelab-deploy-builder-nkey";
  configFile = "/etc/homelab-deploy/builder.yaml";
};
```
## Prometheus Metrics

The listener can expose Prometheus metrics for monitoring deployment operations.

@@ -447,23 +301,56 @@ histogram_quantile(0.95, rate(homelab_deploy_deployment_duration_seconds_bucket[

```
sum(homelab_deploy_deployment_in_progress)
```
### Builder Metrics

When running in builder mode, additional metrics are available:

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `homelab_deploy_builds_total` | Counter | `repo`, `status` | Total builds processed |
| `homelab_deploy_build_host_total` | Counter | `repo`, `host`, `status` | Total host builds processed |
| `homelab_deploy_build_duration_seconds` | Histogram | `repo`, `host` | Build execution time per host |
| `homelab_deploy_build_last_timestamp` | Gauge | `repo` | Timestamp of last build attempt |
| `homelab_deploy_build_last_success_timestamp` | Gauge | `repo` | Timestamp of last successful build |
| `homelab_deploy_build_last_failure_timestamp` | Gauge | `repo` | Timestamp of last failed build |

**Label values:**
- `status`: `success`, `failure`
- `repo`: Repository name from config
- `host`: Host name being built

## Troubleshooting

### Debug Logging

Enable debug logging to diagnose issues with deployments or metrics:
**CLI:**
```bash
homelab-deploy listener --debug \
  --hostname myhost \
  --tier prod \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/listener.nkey \
  --flake-url git+https://git.example.com/user/nixos-configs.git \
  --metrics-enabled
```

**NixOS module:**
```nix
services.homelab-deploy.listener = {
  enable = true;
  tier = "prod";
  natsUrl = "nats://nats.example.com:4222";
  nkeyFile = "/run/secrets/homelab-deploy-nkey";
  flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
  metrics.enable = true;
  extraArgs = [ "--debug" ];
};
```

With debug logging enabled, the listener outputs detailed information about metrics recording:

```json
{"level":"DEBUG","msg":"recording deployment start metric","metrics_enabled":true}
{"level":"DEBUG","msg":"recording deployment end metric (success)","action":"switch","success":true,"duration_seconds":120.5}
```
### Metrics Showing Zero

If deployment metrics remain at zero after deployments:

1. **Check metrics are enabled**: Verify `--metrics-enabled` is set and the metrics endpoint is accessible at `/metrics`

2. **Enable debug logging**: Use `--debug` to confirm metrics recording is being called

3. **Check deployment status**: Metrics are only recorded for deployments that complete (success or failure). Rejected requests (e.g., already running) increment the counter with `status="rejected"` but don't record duration

4. **Check after restart**: After a successful `switch` deployment, the listener restarts. Metrics reset to zero in the new instance. The listener waits up to 60 seconds for a Prometheus scrape before restarting to capture the final metrics

5. **Verify Prometheus scrape timing**: Ensure Prometheus scrapes frequently enough to capture metrics before the listener restarts
## Message Protocol

@@ -492,37 +379,6 @@ When running in builder mode, additional metrics are available:

**Error codes:** `invalid_revision`, `invalid_action`, `already_running`, `build_failed`, `timeout`
### Build Request

```json
{
  "repo": "nixos-servers",
  "target": "all",
  "branch": "main",
  "reply_to": "build.responses.abc123"
}
```

### Build Response

```json
{
  "status": "completed",
  "message": "built 5/5 hosts successfully",
  "results": [
    {"host": "host1", "success": true, "duration_seconds": 120.5},
    {"host": "host2", "success": true, "duration_seconds": 95.3}
  ],
  "total_duration_seconds": 450.2,
  "succeeded": 5,
  "failed": 0
}
```

**Status values:** `started`, `progress`, `completed`, `failed`, `rejected`

Progress updates include `host`, `host_success`, `hosts_completed`, and `hosts_total` fields.
## NATS Authentication

All connections use NKey authentication. Generate keys with:

@@ -552,22 +408,13 @@ The deployment system uses the following NATS subject hierarchy:

- `deploy.prod.all` - Deploy to all production hosts
- `deploy.prod.role.dns` - Deploy to all DNS servers in production

### Build Subjects

| Subject Pattern | Purpose |
|-----------------|---------|
| `build.<repo>.*` | Build requests for a repository |
| `build.<repo>.all` | Build all hosts in a repository |
| `build.<repo>.<hostname>` | Build a specific host |

### Response Subjects

| Subject Pattern | Purpose |
|-----------------|---------|
| `deploy.responses.<uuid>` | Unique reply subject for each deployment request |
| `build.responses.<uuid>` | Unique reply subject for each build request |

-Deployers and build clients create a unique response subject for each request and include it in the `reply_to` field. Listeners and builders publish status updates to this subject.
+Deployers create a unique response subject for each request and include it in the `reply_to` field. Listeners publish status updates to this subject.
### Discovery Subject

@@ -658,9 +505,7 @@ authorization {

| Credential Type | Publish | Subscribe |
|-----------------|---------|-----------|
| Listener | `deploy.responses.>`, `deploy.discover` | Own subjects, `deploy.discover` |
| Builder | `build.responses.>` | `build.<repo>.*` for each configured repo |
| Test deployer | `deploy.test.>`, `deploy.discover` | `deploy.responses.>`, `deploy.discover` |
| Build client | `build.<repo>.*` | `build.responses.>` |
| Admin deployer | `deploy.>` | `deploy.>` |
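As a sketch, the listener row in the table above might map onto a NATS server `authorization` block like the following. This is an assumption about how the credentials are wired up, not taken from this repository; the nkey public key is a placeholder and the subject lists depend on your hostnames:

```
authorization {
  users = [
    {
      # placeholder listener public key
      nkey: "UA..."
      permissions: {
        publish: ["deploy.responses.>", "deploy.discover"]
        subscribe: ["deploy.test.myhost", "deploy.discover"]
      }
    }
  ]
}
```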
### Generating NKeys
@@ -9,15 +9,14 @@ import (

	"syscall"
	"time"

-	"code.t-juice.club/torjus/homelab-deploy/internal/builder"
-	deploycli "code.t-juice.club/torjus/homelab-deploy/internal/cli"
-	"code.t-juice.club/torjus/homelab-deploy/internal/listener"
-	"code.t-juice.club/torjus/homelab-deploy/internal/mcp"
-	"code.t-juice.club/torjus/homelab-deploy/internal/messages"
+	deploycli "git.t-juice.club/torjus/homelab-deploy/internal/cli"
+	"git.t-juice.club/torjus/homelab-deploy/internal/listener"
+	"git.t-juice.club/torjus/homelab-deploy/internal/mcp"
+	"git.t-juice.club/torjus/homelab-deploy/internal/messages"
	"github.com/urfave/cli/v3"
)
-const version = "0.2.5"
+const version = "0.1.14"
func main() {
	app := &cli.Command{

@@ -26,10 +25,8 @@ func main() {

		Version: version,
		Commands: []*cli.Command{
			listenerCommand(),
-			builderCommand(),
			mcpCommand(),
			deployCommand(),
-			buildCommand(),
			listHostsCommand(),
		},
	}
@@ -45,6 +42,10 @@ func listenerCommand() *cli.Command {

		Name:  "listener",
		Usage: "Run as a deployment listener (systemd service mode)",
		Flags: []cli.Flag{
+			&cli.BoolFlag{
+				Name:  "debug",
+				Usage: "Enable debug logging for troubleshooting",
+			},
			&cli.StringFlag{
				Name:  "hostname",
				Usage: "Hostname for this listener",

@@ -128,10 +129,16 @@

				MetricsEnabled: c.Bool("metrics-enabled"),
				MetricsAddr:    c.String("metrics-addr"),
				Version:        version,
+				Debug:          c.Bool("debug"),
			}

+			logLevel := slog.LevelInfo
+			if c.Bool("debug") {
+				logLevel = slog.LevelDebug
+			}

			logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
-				Level: slog.LevelInfo,
+				Level: logLevel,
			}))

			l := listener.New(cfg, logger)
@@ -178,10 +185,6 @@ func mcpCommand() *cli.Command {

				Usage: "Timeout in seconds for deployment operations",
				Value: 900,
			},
-			&cli.BoolFlag{
-				Name:  "enable-builds",
-				Usage: "Enable build tool",
-			},
		},
		Action: func(_ context.Context, c *cli.Command) error {
			enableAdmin := c.Bool("enable-admin")

@@ -196,7 +199,6 @@

				NKeyFile:        c.String("nkey-file"),
				EnableAdmin:     enableAdmin,
				AdminNKeyFile:   adminNKeyFile,
-				EnableBuilds:    c.Bool("enable-builds"),
				DiscoverSubject: c.String("discover-subject"),
				Timeout:         time.Duration(c.Int("timeout")) * time.Second,
			}
@@ -382,204 +384,3 @@ func listHostsCommand() *cli.Command {

		},
	}
}

func builderCommand() *cli.Command {
	return &cli.Command{
		Name:  "builder",
		Usage: "Run as a build server (systemd service mode)",
		Flags: []cli.Flag{
			&cli.StringFlag{
				Name:     "nats-url",
				Usage:    "NATS server URL",
				Required: true,
			},
			&cli.StringFlag{
				Name:     "nkey-file",
				Usage:    "Path to NKey seed file for NATS authentication",
				Required: true,
			},
			&cli.StringFlag{
				Name:     "config",
				Usage:    "Path to builder configuration file",
				Required: true,
			},
			&cli.IntFlag{
				Name:  "timeout",
				Usage: "Build timeout in seconds per host",
				Value: 1800,
			},
			&cli.BoolFlag{
				Name:  "metrics-enabled",
				Usage: "Enable Prometheus metrics endpoint",
			},
			&cli.StringFlag{
				Name:  "metrics-addr",
				Usage: "Address for Prometheus metrics HTTP server",
				Value: ":9973",
			},
		},
		Action: func(ctx context.Context, c *cli.Command) error {
			repoCfg, err := builder.LoadConfig(c.String("config"))
			if err != nil {
				return fmt.Errorf("failed to load config: %w", err)
			}

			cfg := builder.BuilderConfig{
				NATSUrl:        c.String("nats-url"),
				NKeyFile:       c.String("nkey-file"),
				ConfigFile:     c.String("config"),
				Timeout:        time.Duration(c.Int("timeout")) * time.Second,
				MetricsEnabled: c.Bool("metrics-enabled"),
				MetricsAddr:    c.String("metrics-addr"),
			}

			logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
				Level: slog.LevelInfo,
			}))

			b := builder.New(cfg, repoCfg, logger)

			// Handle shutdown signals
			ctx, cancel := signal.NotifyContext(ctx, syscall.SIGINT, syscall.SIGTERM)
			defer cancel()

			return b.Run(ctx)
		},
	}
}

func buildCommand() *cli.Command {
	return &cli.Command{
		Name:      "build",
		Usage:     "Trigger a build on the build server",
		ArgsUsage: "<repo> [hostname]",
		Flags: []cli.Flag{
			&cli.StringFlag{
				Name:     "nats-url",
				Usage:    "NATS server URL",
				Sources:  cli.EnvVars("HOMELAB_DEPLOY_NATS_URL"),
				Required: true,
			},
			&cli.StringFlag{
				Name:     "nkey-file",
				Usage:    "Path to NKey seed file for NATS authentication",
				Sources:  cli.EnvVars("HOMELAB_DEPLOY_NKEY_FILE"),
				Required: true,
			},
			&cli.StringFlag{
				Name:    "branch",
				Usage:   "Git branch to build (uses repo default if not specified)",
				Sources: cli.EnvVars("HOMELAB_DEPLOY_BRANCH"),
			},
			&cli.BoolFlag{
				Name:  "all",
				Usage: "Build all hosts in the repo",
			},
			&cli.IntFlag{
				Name:    "timeout",
				Usage:   "Timeout in seconds for collecting responses",
				Sources: cli.EnvVars("HOMELAB_DEPLOY_BUILD_TIMEOUT"),
				Value:   3600,
			},
			&cli.BoolFlag{
				Name:  "json",
				Usage: "Output results as JSON",
			},
		},
		Action: func(ctx context.Context, c *cli.Command) error {
			if c.Args().Len() < 1 {
				return fmt.Errorf("repo argument required")
			}

			repo := c.Args().First()
			target := c.Args().Get(1)
			all := c.Bool("all")

			if target == "" && !all {
				return fmt.Errorf("must specify hostname or --all")
			}
			if target != "" && all {
				return fmt.Errorf("cannot specify both hostname and --all")
			}
			if all {
				target = "all"
			}

			cfg := deploycli.BuildConfig{
				NATSUrl:  c.String("nats-url"),
				NKeyFile: c.String("nkey-file"),
				Repo:     repo,
				Target:   target,
				Branch:   c.String("branch"),
				Timeout:  time.Duration(c.Int("timeout")) * time.Second,
			}

			jsonOutput := c.Bool("json")
			if !jsonOutput {
				branchStr := cfg.Branch
				if branchStr == "" {
					branchStr = "(default)"
				}
				fmt.Printf("Building %s target=%s branch=%s\n", repo, target, branchStr)
			}

			// Handle shutdown signals
			ctx, cancel := signal.NotifyContext(ctx, syscall.SIGINT, syscall.SIGTERM)
			defer cancel()

			result, err := deploycli.Build(ctx, cfg, func(resp *messages.BuildResponse) {
				if jsonOutput {
					return
				}
				switch resp.Status {
				case messages.BuildStatusStarted:
					fmt.Printf("Started: %s\n", resp.Message)
				case messages.BuildStatusProgress:
					successStr := "..."
					if resp.HostSuccess != nil {
						if *resp.HostSuccess {
							successStr = "success"
						} else {
							successStr = "failed"
						}
					}
					fmt.Printf("[%d/%d] %s: %s\n", resp.HostsCompleted, resp.HostsTotal, resp.Host, successStr)
				case messages.BuildStatusCompleted, messages.BuildStatusFailed:
					fmt.Printf("\n%s\n", resp.Message)
				case messages.BuildStatusRejected:
					fmt.Printf("Rejected: %s\n", resp.Message)
				}
			})
			if err != nil {
				return fmt.Errorf("build failed: %w", err)
			}

			if jsonOutput {
				data, err := result.MarshalJSON()
				if err != nil {
					return fmt.Errorf("failed to marshal result: %w", err)
				}
				fmt.Println(string(data))
			} else if result.FinalResponse != nil {
				fmt.Printf("\nBuild complete: %d succeeded, %d failed (%.1fs)\n",
					result.FinalResponse.Succeeded,
					result.FinalResponse.Failed,
					result.FinalResponse.TotalDurationSeconds)
				for _, hr := range result.FinalResponse.Results {
					if !hr.Success {
						fmt.Printf("\n--- %s (error: %s) ---\n", hr.Host, hr.Error)
						if hr.Output != "" {
							fmt.Println(hr.Output)
						}
					}
				}
			}

			if !result.AllSucceeded() {
				return fmt.Errorf("some builds failed")
			}

			return nil
		},
	}
}
6 flake.lock (generated)

@@ -2,11 +2,11 @@

  "nodes": {
    "nixpkgs": {
      "locked": {
-        "lastModified": 1770562336,
-        "narHash": "sha256-ub1gpAONMFsT/GU2hV6ZWJjur8rJ6kKxdm9IlCT0j84=",
+        "lastModified": 1770197578,
+        "narHash": "sha256-AYqlWrX09+HvGs8zM6ebZ1pwUqjkfpnv8mewYwAo+iM=",
        "owner": "nixos",
        "repo": "nixpkgs",
-        "rev": "d6c71932130818840fc8fe9509cf50be8c64634f",
+        "rev": "00c21e4c93d963c50d4c0c89bfa84ed6e0694df2",
        "type": "github"
      },
      "original": {
4 go.mod

@@ -1,4 +1,4 @@

-module code.t-juice.club/torjus/homelab-deploy
+module git.t-juice.club/torjus/homelab-deploy

go 1.25.5

@@ -9,7 +9,6 @@ require (

	github.com/nats-io/nkeys v0.4.15
	github.com/prometheus/client_golang v1.23.2
	github.com/urfave/cli/v3 v3.6.2
-	gopkg.in/yaml.v3 v3.0.1
)

require (

@@ -33,4 +32,5 @@ require (

	golang.org/x/crypto v0.47.0 // indirect
	golang.org/x/sys v0.40.0 // indirect
	google.golang.org/protobuf v1.36.8 // indirect
+	gopkg.in/yaml.v3 v3.0.1 // indirect
)
@@ -1,377 +0,0 @@

package builder

import (
	"context"
	"fmt"
	"log/slog"
	"regexp"
	"sort"
	"strings"
	"sync"
	"time"

	"code.t-juice.club/torjus/homelab-deploy/internal/messages"
	"code.t-juice.club/torjus/homelab-deploy/internal/metrics"
	"code.t-juice.club/torjus/homelab-deploy/internal/nats"
)

// hostnameRegex validates hostnames from flake output.
// Allows: alphanumeric, dashes, underscores, dots.
var hostnameRegex = regexp.MustCompile(`^[a-zA-Z0-9._-]+$`)
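A quick standalone check of what this pattern accepts and rejects (the sample hostnames are made up):

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as hostnameRegex above: one or more letters, digits,
// dots, underscores, or dashes, anchored to the whole string.
var hostnameRegex = regexp.MustCompile(`^[a-zA-Z0-9._-]+$`)

func main() {
	fmt.Println(hostnameRegex.MatchString("web-01.example_lan")) // true
	fmt.Println(hostnameRegex.MatchString("host;rm -rf /"))      // false: shell metacharacters rejected
	fmt.Println(hostnameRegex.MatchString(""))                   // false: `+` requires at least one character
}
```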
// truncateOutputLines truncates output to the first and last N lines if it exceeds 2*N lines,
// returning the result as a slice of strings.
func truncateOutputLines(output string, keepLines int) []string {
	lines := strings.Split(output, "\n")
	if len(lines) <= keepLines*2 {
		return lines
	}
	head := lines[:keepLines]
	tail := lines[len(lines)-keepLines:]
	omitted := len(lines) - keepLines*2
	result := make([]string, 0, keepLines*2+1)
	result = append(result, head...)
	result = append(result, fmt.Sprintf("... (%d lines omitted) ...", omitted))
	result = append(result, tail...)
	return result
}

// truncateOutput truncates output to the first and last N lines if it exceeds 2*N lines.
func truncateOutput(output string, keepLines int) string {
	lines := strings.Split(output, "\n")
	if len(lines) <= keepLines*2 {
		return output
	}
	head := lines[:keepLines]
	tail := lines[len(lines)-keepLines:]
	omitted := len(lines) - keepLines*2
	return strings.Join(head, "\n") + fmt.Sprintf("\n\n... (%d lines omitted) ...\n\n", omitted) + strings.Join(tail, "\n")
}
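To make the truncation behavior concrete, here is a standalone sketch that copies `truncateOutput` from above and runs it on a five-line string with `keepLines = 2`:

```go
package main

import (
	"fmt"
	"strings"
)

// Copied from the package above: keep the first and last keepLines lines
// and replace the middle with an omission marker.
func truncateOutput(output string, keepLines int) string {
	lines := strings.Split(output, "\n")
	if len(lines) <= keepLines*2 {
		return output
	}
	head := lines[:keepLines]
	tail := lines[len(lines)-keepLines:]
	omitted := len(lines) - keepLines*2
	return strings.Join(head, "\n") + fmt.Sprintf("\n\n... (%d lines omitted) ...\n\n", omitted) + strings.Join(tail, "\n")
}

func main() {
	// Five lines, keepLines=2: the middle line is replaced by the marker.
	fmt.Println(truncateOutput("l1\nl2\nl3\nl4\nl5", 2))
}
```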
// BuilderConfig holds the configuration for the builder.
type BuilderConfig struct {
	NATSUrl        string
	NKeyFile       string
	ConfigFile     string
	Timeout        time.Duration
	MetricsEnabled bool
	MetricsAddr    string
}

// Builder handles build requests from NATS.
type Builder struct {
	cfg      BuilderConfig
	repoCfg  *Config
	client   *nats.Client
	executor *Executor
	lock     sync.Mutex
	busy     bool
	logger   *slog.Logger

	// metrics server and collector (nil if metrics disabled)
	metricsServer *metrics.Server
	metrics       *metrics.BuildCollector
}

// New creates a new builder with the given configuration.
func New(cfg BuilderConfig, repoCfg *Config, logger *slog.Logger) *Builder {
	if logger == nil {
		logger = slog.Default()
	}

	b := &Builder{
		cfg:      cfg,
		repoCfg:  repoCfg,
		executor: NewExecutor(cfg.Timeout),
		logger:   logger,
	}

	if cfg.MetricsEnabled {
		b.metricsServer = metrics.NewServer(metrics.ServerConfig{
			Addr:   cfg.MetricsAddr,
			Logger: logger,
		})
		b.metrics = metrics.NewBuildCollector(b.metricsServer.Registry())
	}

	return b
}

// Run starts the builder and blocks until the context is cancelled.
func (b *Builder) Run(ctx context.Context) error {
	// Start metrics server if enabled
	if b.metricsServer != nil {
		if err := b.metricsServer.Start(); err != nil {
			return fmt.Errorf("failed to start metrics server: %w", err)
		}
		defer func() {
			shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
			defer cancel()
			_ = b.metricsServer.Shutdown(shutdownCtx)
		}()
	}

	// Connect to NATS
	b.logger.Info("connecting to NATS", "url", b.cfg.NATSUrl)

	client, err := nats.Connect(nats.Config{
		URL:      b.cfg.NATSUrl,
		NKeyFile: b.cfg.NKeyFile,
		Name:     "homelab-deploy-builder",
	})
	if err != nil {
		return fmt.Errorf("failed to connect to NATS: %w", err)
	}
	b.client = client
	defer b.client.Close()

	b.logger.Info("connected to NATS")

	// Subscribe to build subjects for each repo
	for repoName := range b.repoCfg.Repos {
		// Subscribe to build.<repo>.all and build.<repo>.<hostname>
		allSubject := fmt.Sprintf("build.%s.*", repoName)
		b.logger.Info("subscribing to build subject", "subject", allSubject)
		if _, err := b.client.Subscribe(allSubject, b.handleBuildRequest); err != nil {
			return fmt.Errorf("failed to subscribe to %s: %w", allSubject, err)
		}
	}

	b.logger.Info("builder started", "repos", len(b.repoCfg.Repos))

	// Wait for context cancellation
	<-ctx.Done()
	b.logger.Info("shutting down builder")

	return nil
}

func (b *Builder) handleBuildRequest(subject string, data []byte) {
	req, err := messages.UnmarshalBuildRequest(data)
	if err != nil {
		b.logger.Error("failed to unmarshal build request",
			"subject", subject,
			"error", err,
		)
		return
	}

	b.logger.Info("received build request",
		"subject", subject,
		"repo", req.Repo,
		"target", req.Target,
		"branch", req.Branch,
		"reply_to", req.ReplyTo,
	)

	// Validate request
	if err := req.Validate(); err != nil {
		b.logger.Warn("invalid build request", "error", err)
		b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
			messages.BuildStatusRejected,
			err.Error(),
		))
		return
	}

	// Get repo config
	repo, err := b.repoCfg.GetRepo(req.Repo)
	if err != nil {
		b.logger.Warn("unknown repo", "repo", req.Repo)
		b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
			messages.BuildStatusRejected,
			fmt.Sprintf("unknown repo: %s", req.Repo),
		))
		return
	}

	// Try to acquire lock
	b.lock.Lock()
	if b.busy {
		b.lock.Unlock()
		b.logger.Warn("build already in progress")
		b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
			messages.BuildStatusRejected,
			"another build is already in progress",
		))
		return
	}
	b.busy = true
	b.lock.Unlock()

	defer func() {
		b.lock.Lock()
		b.busy = false
		b.lock.Unlock()
	}()

	// Use default branch if not specified
	branch := req.Branch
	if branch == "" {
		branch = repo.DefaultBranch
	}

	// Determine hosts to build
	var hosts []string
	if req.Target == "all" {
		// List hosts from flake
		b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
			messages.BuildStatusStarted,
			"discovering hosts...",
		))

		hosts, err = b.executor.ListHosts(context.Background(), repo.URL, branch)
		if err != nil {
			b.logger.Error("failed to list hosts", "error", err)
			b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
				messages.BuildStatusFailed,
				fmt.Sprintf("failed to list hosts: %v", err),
			).WithError(err.Error()))
			if b.metrics != nil {
				b.metrics.RecordBuildFailure(req.Repo, "")
			}
			return
		}
		// Filter out hostnames with invalid characters (security: prevent injection)
		validHosts := make([]string, 0, len(hosts))
		for _, host := range hosts {
			if hostnameRegex.MatchString(host) {
				validHosts = append(validHosts, host)
			} else {
				b.logger.Warn("skipping hostname with invalid characters", "hostname", host)
			}
		}
		hosts = validHosts
		// Sort hosts for consistent ordering
		sort.Strings(hosts)
	} else {
		hosts = []string{req.Target}
	}

	if len(hosts) == 0 {
		b.sendResponse(req.ReplyTo, messages.NewBuildResponse(
			messages.BuildStatusFailed,
			"no hosts to build",
		))
		return
	}

	// Send started response
	b.sendResponse(req.ReplyTo, &messages.BuildResponse{
		Status:     messages.BuildStatusStarted,
		Message:    fmt.Sprintf("building %d host(s)", len(hosts)),
		HostsTotal: len(hosts),
	})

	// Build each host sequentially
|
||||
startTime := time.Now()
|
||||
results := make([]messages.BuildHostResult, 0, len(hosts))
|
||||
succeeded := 0
|
||||
failed := 0
|
||||
|
||||
for i, host := range hosts {
|
||||
hostStart := time.Now()
|
||||
b.logger.Info("building host",
|
||||
"host", host,
|
||||
"repo", req.Repo,
|
||||
"rev", branch,
|
||||
"progress", fmt.Sprintf("%d/%d", i+1, len(hosts)),
|
||||
"command", b.executor.BuildCommand(repo.URL, branch, host),
|
||||
)
|
||||
|
||||
result := b.executor.Build(context.Background(), repo.URL, branch, host)
|
||||
hostDuration := time.Since(hostStart).Seconds()
|
||||
|
||||
hostResult := messages.BuildHostResult{
|
||||
Host: host,
|
||||
Success: result.Success,
|
||||
DurationSeconds: hostDuration,
|
||||
}
|
||||
if !result.Success {
|
||||
if result.Error != nil {
|
||||
hostResult.Error = result.Error.Error()
|
||||
}
|
||||
if result.Stderr != "" {
|
||||
hostResult.Output = truncateOutput(result.Stderr, 50)
|
||||
}
|
||||
}
|
||||
results = append(results, hostResult)
|
||||
|
||||
if result.Success {
|
||||
succeeded++
|
||||
b.logger.Info("host build succeeded", "host", host, "repo", req.Repo, "rev", branch, "duration", hostDuration)
|
||||
if b.metrics != nil {
|
||||
b.metrics.RecordHostBuildSuccess(req.Repo, host, hostDuration)
|
||||
}
|
||||
} else {
|
||||
failed++
|
||||
b.logger.Error("host build failed", "host", host, "repo", req.Repo, "rev", branch, "error", hostResult.Error)
|
||||
if result.Stderr != "" {
|
||||
for _, line := range truncateOutputLines(result.Stderr, 50) {
|
||||
b.logger.Warn("build output", "host", host, "repo", req.Repo, "line", line)
|
||||
}
|
||||
}
|
||||
if b.metrics != nil {
|
||||
b.metrics.RecordHostBuildFailure(req.Repo, host, hostDuration)
|
||||
}
|
||||
}
|
||||
|
||||
// Send progress update
|
||||
success := result.Success
|
||||
b.sendResponse(req.ReplyTo, &messages.BuildResponse{
|
||||
Status: messages.BuildStatusProgress,
|
||||
Host: host,
|
||||
HostSuccess: &success,
|
||||
HostsCompleted: i + 1,
|
||||
HostsTotal: len(hosts),
|
||||
})
|
||||
}
|
||||
|
||||
totalDuration := time.Since(startTime).Seconds()
|
||||
|
||||
// Send final response
|
||||
status := messages.BuildStatusCompleted
|
||||
message := fmt.Sprintf("built %d/%d hosts successfully", succeeded, len(hosts))
|
||||
if failed > 0 {
|
||||
status = messages.BuildStatusFailed
|
||||
message = fmt.Sprintf("build failed: %d/%d hosts failed", failed, len(hosts))
|
||||
}
|
||||
|
||||
b.sendResponse(req.ReplyTo, &messages.BuildResponse{
|
||||
Status: status,
|
||||
Message: message,
|
||||
Results: results,
|
||||
TotalDurationSeconds: totalDuration,
|
||||
Succeeded: succeeded,
|
||||
Failed: failed,
|
||||
})
|
||||
|
||||
// Record overall build metrics
|
||||
if b.metrics != nil {
|
||||
if failed == 0 {
|
||||
b.metrics.RecordBuildSuccess(req.Repo)
|
||||
} else {
|
||||
b.metrics.RecordBuildFailure(req.Repo, "")
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func (b *Builder) sendResponse(replyTo string, resp *messages.BuildResponse) {
|
||||
data, err := resp.Marshal()
|
||||
if err != nil {
|
||||
b.logger.Error("failed to marshal build response", "error", err)
|
||||
return
|
||||
}
|
||||
|
||||
if err := b.client.Publish(replyTo, data); err != nil {
|
||||
b.logger.Error("failed to publish build response",
|
||||
"reply_to", replyTo,
|
||||
"error", err,
|
||||
)
|
||||
}
|
||||
|
||||
// Flush to ensure response is sent immediately
|
||||
if err := b.client.Flush(); err != nil {
|
||||
b.logger.Error("failed to flush", "error", err)
|
||||
}
|
||||
}
|
||||
@@ -1,164 +0,0 @@
package builder

import (
	"fmt"
	"strings"
	"testing"
)

func TestTruncateOutput(t *testing.T) {
	tests := []struct {
		name      string
		input     string
		keepLines int
		wantLines int
		wantOmit  bool
	}{
		{
			name:      "short output unchanged",
			input:     "line1\nline2\nline3",
			keepLines: 50,
			wantLines: 3,
			wantOmit:  false,
		},
		{
			name:      "exactly at threshold unchanged",
			input:     strings.Join(makeLines(100), "\n"),
			keepLines: 50,
			wantLines: 100,
			wantOmit:  false,
		},
		{
			name:      "over threshold truncated",
			input:     strings.Join(makeLines(150), "\n"),
			keepLines: 50,
			wantLines: 103, // 50 + 1 (empty) + 1 (omitted msg) + 1 (empty) + 50
			wantOmit:  true,
		},
		{
			name:      "large output truncated",
			input:     strings.Join(makeLines(1000), "\n"),
			keepLines: 50,
			wantLines: 103,
			wantOmit:  true,
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got := truncateOutput(tt.input, tt.keepLines)
			gotLines := strings.Split(got, "\n")

			if len(gotLines) != tt.wantLines {
				t.Errorf("got %d lines, want %d", len(gotLines), tt.wantLines)
			}

			hasOmit := strings.Contains(got, "lines omitted")
			if hasOmit != tt.wantOmit {
				t.Errorf("got omit marker = %v, want %v", hasOmit, tt.wantOmit)
			}

			if tt.wantOmit {
				// Verify first and last lines are preserved
				inputLines := strings.Split(tt.input, "\n")
				firstLine := inputLines[0]
				lastLine := inputLines[len(inputLines)-1]
				if !strings.HasPrefix(got, firstLine+"\n") {
					t.Errorf("first line not preserved, got prefix %q, want %q",
						gotLines[0], firstLine)
				}
				if !strings.HasSuffix(got, lastLine) {
					t.Errorf("last line not preserved, got suffix %q, want %q",
						gotLines[len(gotLines)-1], lastLine)
				}
			}
		})
	}
}

func makeLines(n int) []string {
	lines := make([]string, n)
	for i := range lines {
		lines[i] = "line " + strings.Repeat("x", i%80)
	}
	return lines
}

func TestTruncateOutputLines(t *testing.T) {
	t.Run("short output returns all lines", func(t *testing.T) {
		input := "line1\nline2\nline3"
		got := truncateOutputLines(input, 50)
		if len(got) != 3 {
			t.Errorf("got %d lines, want 3", len(got))
		}
		if got[0] != "line1" || got[1] != "line2" || got[2] != "line3" {
			t.Errorf("unexpected lines: %v", got)
		}
	})

	t.Run("over threshold returns head + marker + tail", func(t *testing.T) {
		lines := makeLines(200)
		input := strings.Join(lines, "\n")
		got := truncateOutputLines(input, 50)

		// Should be 50 head + 1 marker + 50 tail = 101
		if len(got) != 101 {
			t.Errorf("got %d lines, want 101", len(got))
		}

		// Check first and last lines preserved
		if got[0] != lines[0] {
			t.Errorf("first line = %q, want %q", got[0], lines[0])
		}
		if got[len(got)-1] != lines[len(lines)-1] {
			t.Errorf("last line = %q, want %q", got[len(got)-1], lines[len(lines)-1])
		}

		// Check omitted marker
		marker := got[50]
		expected := fmt.Sprintf("... (%d lines omitted) ...", 100)
		if marker != expected {
			t.Errorf("marker = %q, want %q", marker, expected)
		}
	})

	t.Run("exactly at threshold returns all lines", func(t *testing.T) {
		lines := makeLines(100)
		input := strings.Join(lines, "\n")
		got := truncateOutputLines(input, 50)
		if len(got) != 100 {
			t.Errorf("got %d lines, want 100", len(got))
		}
	})
}

func TestTruncateOutputPreservesContent(t *testing.T) {
	// Create input with distinct first and last lines
	lines := make([]string, 200)
	for i := range lines {
		lines[i] = "middle"
	}
	lines[0] = "FIRST"
	lines[49] = "LAST_OF_HEAD"
	lines[150] = "FIRST_OF_TAIL"
	lines[199] = "LAST"

	input := strings.Join(lines, "\n")
	got := truncateOutput(input, 50)

	if !strings.Contains(got, "FIRST") {
		t.Error("missing FIRST")
	}
	if !strings.Contains(got, "LAST_OF_HEAD") {
		t.Error("missing LAST_OF_HEAD")
	}
	if !strings.Contains(got, "FIRST_OF_TAIL") {
		t.Error("missing FIRST_OF_TAIL")
	}
	if !strings.Contains(got, "LAST") {
		t.Error("missing LAST")
	}
	if !strings.Contains(got, "(100 lines omitted)") {
		t.Errorf("wrong omitted count, got: %s", got)
	}
}
@@ -1,96 +0,0 @@
package builder

import (
	"fmt"
	"os"
	"regexp"
	"strings"

	"gopkg.in/yaml.v3"
)

// repoNameRegex validates repository names for safe use in NATS subjects.
// Only allows alphanumeric, dashes, and underscores (no dots or wildcards).
var repoNameRegex = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)

// validURLPrefixes are the allowed prefixes for repository URLs.
var validURLPrefixes = []string{
	"git+https://",
	"git+ssh://",
	"git+file://",
}

// RepoConfig holds configuration for a single repository.
type RepoConfig struct {
	URL           string `yaml:"url"`
	DefaultBranch string `yaml:"default_branch"`
}

// Config holds the builder configuration.
type Config struct {
	Repos map[string]RepoConfig `yaml:"repos"`
}

// LoadConfig loads configuration from a YAML file.
func LoadConfig(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("failed to read config file: %w", err)
	}

	var cfg Config
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("failed to parse config file: %w", err)
	}

	if err := cfg.Validate(); err != nil {
		return nil, err
	}

	return &cfg, nil
}

// Validate checks that the configuration is valid.
func (c *Config) Validate() error {
	if len(c.Repos) == 0 {
		return fmt.Errorf("no repos configured")
	}

	for name, repo := range c.Repos {
		// Validate repo name for safe use in NATS subjects
		if !repoNameRegex.MatchString(name) {
			return fmt.Errorf("repo name %q contains invalid characters (only alphanumeric, dash, underscore allowed)", name)
		}

		if repo.URL == "" {
			return fmt.Errorf("repo %q: url is required", name)
		}

		// Validate URL format
		validURL := false
		for _, prefix := range validURLPrefixes {
			if strings.HasPrefix(repo.URL, prefix) {
				validURL = true
				break
			}
		}
		if !validURL {
			return fmt.Errorf("repo %q: url must start with git+https://, git+ssh://, or git+file://", name)
		}

		if repo.DefaultBranch == "" {
			return fmt.Errorf("repo %q: default_branch is required", name)
		}
	}

	return nil
}

// GetRepo returns the configuration for a repository, or an error if not found.
func (c *Config) GetRepo(name string) (*RepoConfig, error) {
	repo, ok := c.Repos[name]
	if !ok {
		return nil, fmt.Errorf("repo %q not found in configuration", name)
	}
	return &repo, nil
}
@@ -1,116 +0,0 @@
package builder

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"os/exec"
	"time"
)

// Executor handles the execution of nix build commands.
type Executor struct {
	timeout time.Duration
}

// NewExecutor creates a new build executor.
func NewExecutor(timeout time.Duration) *Executor {
	return &Executor{
		timeout: timeout,
	}
}

// BuildResult contains the result of a build execution.
type BuildResult struct {
	Success  bool
	ExitCode int
	Stdout   string
	Stderr   string
	Error    error
}

// FlakeShowResult contains the parsed output of nix flake show.
type FlakeShowResult struct {
	NixosConfigurations map[string]any `json:"nixosConfigurations"`
}

// ListHosts returns the list of hosts (nixosConfigurations) available in a flake.
func (e *Executor) ListHosts(ctx context.Context, flakeURL, branch string) ([]string, error) {
	ctx, cancel := context.WithTimeout(ctx, 60*time.Second)
	defer cancel()

	flakeRef := fmt.Sprintf("%s?ref=%s", flakeURL, branch)
	cmd := exec.CommandContext(ctx, "nix", "flake", "show", "--json", flakeRef)

	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	if err := cmd.Run(); err != nil {
		if ctx.Err() == context.DeadlineExceeded {
			return nil, fmt.Errorf("timeout listing hosts")
		}
		return nil, fmt.Errorf("failed to list hosts: %w\n%s", err, stderr.String())
	}

	var result FlakeShowResult
	if err := json.Unmarshal(stdout.Bytes(), &result); err != nil {
		return nil, fmt.Errorf("failed to parse flake show output: %w", err)
	}

	hosts := make([]string, 0, len(result.NixosConfigurations))
	for host := range result.NixosConfigurations {
		hosts = append(hosts, host)
	}

	return hosts, nil
}

// Build builds a single host's system configuration.
func (e *Executor) Build(ctx context.Context, flakeURL, branch, host string) *BuildResult {
	ctx, cancel := context.WithTimeout(ctx, e.timeout)
	defer cancel()

	// Build the flake reference for the system toplevel
	flakeRef := fmt.Sprintf("%s?ref=%s#nixosConfigurations.%s.config.system.build.toplevel", flakeURL, branch, host)

	cmd := exec.CommandContext(ctx, "nix", "build", "--no-link", flakeRef)

	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	err := cmd.Run()

	result := &BuildResult{
		Stdout: stdout.String(),
		Stderr: stderr.String(),
	}

	if err != nil {
		result.Success = false
		result.Error = err

		if ctx.Err() == context.DeadlineExceeded {
			result.Error = fmt.Errorf("build timed out after %v", e.timeout)
		}

		if exitErr, ok := err.(*exec.ExitError); ok {
			result.ExitCode = exitErr.ExitCode()
		} else {
			result.ExitCode = -1
		}
	} else {
		result.Success = true
		result.ExitCode = 0
	}

	return result
}

// BuildCommand returns the command that would be executed (for logging/debugging).
func (e *Executor) BuildCommand(flakeURL, branch, host string) string {
	flakeRef := fmt.Sprintf("%s?ref=%s#nixosConfigurations.%s.config.system.build.toplevel", flakeURL, branch, host)
	return fmt.Sprintf("nix build --no-link %s", flakeRef)
}
@@ -1,140 +0,0 @@
package cli

import (
	"context"
	"encoding/json"
	"fmt"
	"sync"
	"time"

	"github.com/google/uuid"

	"code.t-juice.club/torjus/homelab-deploy/internal/messages"
	"code.t-juice.club/torjus/homelab-deploy/internal/nats"
)

// BuildConfig holds configuration for a build operation.
type BuildConfig struct {
	NATSUrl  string
	NKeyFile string
	Repo     string
	Target   string
	Branch   string
	Timeout  time.Duration
}

// BuildResult contains the aggregated results from a build.
type BuildResult struct {
	Responses     []*messages.BuildResponse
	FinalResponse *messages.BuildResponse
	Errors        []error
}

// AllSucceeded returns true if the build completed successfully.
func (r *BuildResult) AllSucceeded() bool {
	if len(r.Errors) > 0 {
		return false
	}
	if r.FinalResponse == nil {
		return false
	}
	return r.FinalResponse.Status == messages.BuildStatusCompleted && r.FinalResponse.Failed == 0
}

// MarshalJSON returns the JSON representation of the build result.
func (r *BuildResult) MarshalJSON() ([]byte, error) {
	if r.FinalResponse != nil {
		return json.Marshal(r.FinalResponse)
	}
	return json.Marshal(map[string]any{
		"status":    "unknown",
		"responses": r.Responses,
		"errors":    r.Errors,
	})
}

// Build triggers a build and collects responses.
func Build(ctx context.Context, cfg BuildConfig, onResponse func(*messages.BuildResponse)) (*BuildResult, error) {
	// Connect to NATS
	client, err := nats.Connect(nats.Config{
		URL:      cfg.NATSUrl,
		NKeyFile: cfg.NKeyFile,
		Name:     "homelab-deploy-build-cli",
	})
	if err != nil {
		return nil, fmt.Errorf("failed to connect to NATS: %w", err)
	}
	defer client.Close()

	// Generate unique reply subject
	requestID := uuid.New().String()
	replySubject := fmt.Sprintf("build.responses.%s", requestID)

	var mu sync.Mutex
	result := &BuildResult{}
	done := make(chan struct{})

	// Subscribe to reply subject
	sub, err := client.Subscribe(replySubject, func(subject string, data []byte) {
		resp, err := messages.UnmarshalBuildResponse(data)
		if err != nil {
			mu.Lock()
			result.Errors = append(result.Errors, fmt.Errorf("failed to unmarshal response: %w", err))
			mu.Unlock()
			return
		}

		mu.Lock()
		result.Responses = append(result.Responses, resp)
		if resp.Status.IsFinal() {
			result.FinalResponse = resp
			select {
			case <-done:
			default:
				close(done)
			}
		}
		mu.Unlock()

		if onResponse != nil {
			onResponse(resp)
		}
	})
	if err != nil {
		return nil, fmt.Errorf("failed to subscribe to reply subject: %w", err)
	}
	defer func() { _ = sub.Unsubscribe() }()

	// Build and send request
	req := &messages.BuildRequest{
		Repo:    cfg.Repo,
		Target:  cfg.Target,
		Branch:  cfg.Branch,
		ReplyTo: replySubject,
	}

	data, err := req.Marshal()
	if err != nil {
		return nil, fmt.Errorf("failed to marshal request: %w", err)
	}

	// Publish to build.<repo>.<target>
	buildSubject := fmt.Sprintf("build.%s.%s", cfg.Repo, cfg.Target)
	if err := client.Publish(buildSubject, data); err != nil {
		return nil, fmt.Errorf("failed to publish request: %w", err)
	}

	if err := client.Flush(); err != nil {
		return nil, fmt.Errorf("failed to flush: %w", err)
	}

	// Wait for final response or timeout
	select {
	case <-ctx.Done():
		return result, ctx.Err()
	case <-done:
		return result, nil
	case <-time.After(cfg.Timeout):
		return result, nil
	}
}
@@ -8,8 +8,8 @@ import (

	"github.com/google/uuid"

	"code.t-juice.club/torjus/homelab-deploy/internal/messages"
	"code.t-juice.club/torjus/homelab-deploy/internal/nats"
	"git.t-juice.club/torjus/homelab-deploy/internal/messages"
	"git.t-juice.club/torjus/homelab-deploy/internal/nats"
)

// DeployConfig holds configuration for a deploy operation.
@@ -3,7 +3,7 @@ package cli
import (
	"testing"

	"code.t-juice.club/torjus/homelab-deploy/internal/messages"
	"git.t-juice.club/torjus/homelab-deploy/internal/messages"
)

func TestDeployResult_AllSucceeded(t *testing.T) {
@@ -7,7 +7,7 @@ import (
	"os/exec"
	"time"

	"code.t-juice.club/torjus/homelab-deploy/internal/messages"
	"git.t-juice.club/torjus/homelab-deploy/internal/messages"
)

// Executor handles the execution of nixos-rebuild commands.
@@ -4,7 +4,7 @@ import (
	"testing"
	"time"

	"code.t-juice.club/torjus/homelab-deploy/internal/messages"
	"git.t-juice.club/torjus/homelab-deploy/internal/messages"
)

func TestExecutor_BuildCommand(t *testing.T) {
@@ -6,10 +6,10 @@ import (
	"log/slog"
	"time"

	"code.t-juice.club/torjus/homelab-deploy/internal/deploy"
	"code.t-juice.club/torjus/homelab-deploy/internal/messages"
	"code.t-juice.club/torjus/homelab-deploy/internal/metrics"
	"code.t-juice.club/torjus/homelab-deploy/internal/nats"
	"git.t-juice.club/torjus/homelab-deploy/internal/deploy"
	"git.t-juice.club/torjus/homelab-deploy/internal/messages"
	"git.t-juice.club/torjus/homelab-deploy/internal/metrics"
	"git.t-juice.club/torjus/homelab-deploy/internal/nats"
)

// Config holds the configuration for the listener.
@@ -27,6 +27,7 @@ type Config struct {
	MetricsEnabled bool
	MetricsAddr    string
	Version        string
	Debug          bool
}

// Listener handles deployment requests from NATS.
@@ -203,7 +204,14 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {

	// Record deployment start for metrics
	if l.metrics != nil {
		l.logger.Debug("recording deployment start metric",
			"metrics_enabled", true,
		)
		l.metrics.RecordDeploymentStart()
	} else {
		l.logger.Debug("skipping deployment start metric",
			"metrics_enabled", false,
		)
	}
	startTime := time.Now()
@@ -219,9 +227,19 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
			messages.StatusFailed,
			fmt.Sprintf("revision validation failed: %v", err),
		).WithError(messages.ErrorInvalidRevision))
		if l.metrics != nil {
		duration := time.Since(startTime).Seconds()
		if l.metrics != nil {
			l.logger.Debug("recording deployment failure metric (revision validation)",
				"action", req.Action,
				"error_code", messages.ErrorInvalidRevision,
				"duration_seconds", duration,
			)
			l.metrics.RecordDeploymentFailure(req.Action, messages.ErrorInvalidRevision, duration)
		} else {
			l.logger.Debug("skipping deployment failure metric",
				"metrics_enabled", false,
				"duration_seconds", duration,
			)
		}
		return
	}
@@ -265,7 +283,17 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
		l.logger.Error("failed to flush completed response", "error", err)
	}
	if l.metrics != nil {
		l.logger.Debug("recording deployment end metric (success)",
			"action", req.Action,
			"success", true,
			"duration_seconds", duration,
		)
		l.metrics.RecordDeploymentEnd(req.Action, true, duration)
	} else {
		l.logger.Debug("skipping deployment end metric",
			"metrics_enabled", false,
			"duration_seconds", duration,
		)
	}

	// After a successful switch, signal restart so we pick up any new version
@@ -305,7 +333,17 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
			fmt.Sprintf("deployment failed (exit code %d): %s", result.ExitCode, result.Stderr),
		).WithError(errorCode))
		if l.metrics != nil {
			l.logger.Debug("recording deployment failure metric",
				"action", req.Action,
				"error_code", errorCode,
				"duration_seconds", duration,
			)
			l.metrics.RecordDeploymentFailure(req.Action, errorCode, duration)
		} else {
			l.logger.Debug("skipping deployment failure metric",
				"metrics_enabled", false,
				"duration_seconds", duration,
			)
		}
	}
}
@@ -2,8 +2,14 @@ package listener

import (
	"log/slog"
	"strings"
	"testing"
	"time"

	"git.t-juice.club/torjus/homelab-deploy/internal/messages"
	"git.t-juice.club/torjus/homelab-deploy/internal/metrics"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/testutil"
)

func TestNew(t *testing.T) {
@@ -51,3 +57,148 @@ func TestNew_WithLogger(t *testing.T) {
		t.Error("should use provided logger")
	}
}

func TestNew_WithMetricsEnabled(t *testing.T) {
	cfg := Config{
		Hostname:       "test-host",
		Tier:           "test",
		MetricsEnabled: true,
		MetricsAddr:    ":0",
	}

	l := New(cfg, nil)

	if l.metricsServer == nil {
		t.Error("metricsServer should not be nil when MetricsEnabled is true")
	}
	if l.metrics == nil {
		t.Error("metrics should not be nil when MetricsEnabled is true")
	}
}

func TestListener_MetricsRecordedOnDeployment(t *testing.T) {
	// This test verifies that the listener correctly calls metrics functions
	// when processing deployments. We test this by directly calling the internal
	// metrics recording logic that handleDeployRequest uses.

	reg := prometheus.NewRegistry()
	collector := metrics.NewCollector(reg)

	// Simulate what handleDeployRequest does for a successful deployment
	collector.RecordDeploymentStart()
	collector.RecordDeploymentEnd(messages.ActionSwitch, true, 120.5)

	// Verify counter was incremented
	counterExpected := `
# HELP homelab_deploy_deployments_total Total deployment requests processed
# TYPE homelab_deploy_deployments_total counter
homelab_deploy_deployments_total{action="boot",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="boot",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="completed"} 1
homelab_deploy_deployments_total{action="switch",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="failed"} 0
`
	if err := testutil.GatherAndCompare(reg, strings.NewReader(counterExpected), "homelab_deploy_deployments_total"); err != nil {
		t.Errorf("unexpected counter metrics: %v", err)
	}

	// Verify histogram was updated (120.5 seconds falls into le="300" and higher buckets)
	histogramExpected := `
# HELP homelab_deploy_deployment_duration_seconds Deployment execution time
# TYPE homelab_deploy_deployment_duration_seconds histogram
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="300"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="600"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="900"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1200"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1800"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="+Inf"} 0
|
||||
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="false"} 0
|
||||
homelab_deploy_deployment_duration_seconds_count{action="switch",success="false"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="30"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="60"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="120"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="300"} 1
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="600"} 1
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="900"} 1
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1200"} 1
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1800"} 1
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="+Inf"} 1
|
||||
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="true"} 120.5
|
||||
homelab_deploy_deployment_duration_seconds_count{action="switch",success="true"} 1
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="30"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="60"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="120"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="300"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="600"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="900"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1200"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1800"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="+Inf"} 0
|
||||
homelab_deploy_deployment_duration_seconds_sum{action="test",success="false"} 0
|
||||
homelab_deploy_deployment_duration_seconds_count{action="test",success="false"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="30"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="60"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="120"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="300"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="600"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="900"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1200"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1800"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="+Inf"} 0
|
||||
homelab_deploy_deployment_duration_seconds_sum{action="test",success="true"} 0
|
||||
homelab_deploy_deployment_duration_seconds_count{action="test",success="true"} 0
|
||||
`
|
||||
if err := testutil.GatherAndCompare(reg, strings.NewReader(histogramExpected), "homelab_deploy_deployment_duration_seconds"); err != nil {
|
||||
t.Errorf("unexpected histogram metrics: %v", err)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1,109 +0,0 @@
package mcp

import (
"context"
"fmt"
"strings"

"github.com/mark3labs/mcp-go/mcp"

deploycli "code.t-juice.club/torjus/homelab-deploy/internal/cli"
"code.t-juice.club/torjus/homelab-deploy/internal/messages"
)

// BuildTool creates the build tool definition.
func BuildTool() mcp.Tool {
return mcp.NewTool(
"build",
mcp.WithDescription("Trigger a Nix build on the build server"),
mcp.WithString("repo",
mcp.Required(),
mcp.Description("Repository name (must match builder config)"),
),
mcp.WithString("target",
mcp.Description("Target hostname, or omit to build all hosts"),
),
mcp.WithBoolean("all",
mcp.Description("Build all hosts in the repository (default if no target specified)"),
),
mcp.WithString("branch",
mcp.Description("Git branch to build (uses repo default if not specified)"),
),
)
}

// HandleBuild handles the build tool.
func (h *ToolHandler) HandleBuild(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
repo, err := request.RequireString("repo")
if err != nil {
return mcp.NewToolResultError("repo is required"), nil
}

target := request.GetString("target", "")
all := request.GetBool("all", false)
branch := request.GetString("branch", "")

// Default to "all" if no target specified
if target == "" {
if !all {
all = true
}
target = "all"
}
if all && target != "all" {
return mcp.NewToolResultError("cannot specify both target and all"), nil
}

cfg := deploycli.BuildConfig{
NATSUrl: h.cfg.NATSUrl,
NKeyFile: h.cfg.NKeyFile,
Repo: repo,
Target: target,
Branch: branch,
Timeout: h.cfg.Timeout,
}

var output strings.Builder
branchStr := branch
if branchStr == "" {
branchStr = "(default)"
}
output.WriteString(fmt.Sprintf("Building %s target=%s branch=%s\n\n", repo, target, branchStr))

result, err := deploycli.Build(ctx, cfg, func(resp *messages.BuildResponse) {
switch resp.Status {
case messages.BuildStatusStarted:
output.WriteString(fmt.Sprintf("Started: %s\n", resp.Message))
case messages.BuildStatusProgress:
successStr := "..."
if resp.HostSuccess != nil {
if *resp.HostSuccess {
successStr = "success"
} else {
successStr = "failed"
}
}
output.WriteString(fmt.Sprintf("[%d/%d] %s: %s\n", resp.HostsCompleted, resp.HostsTotal, resp.Host, successStr))
case messages.BuildStatusCompleted, messages.BuildStatusFailed:
output.WriteString(fmt.Sprintf("\n%s\n", resp.Message))
case messages.BuildStatusRejected:
output.WriteString(fmt.Sprintf("Rejected: %s\n", resp.Message))
}
})
if err != nil {
return mcp.NewToolResultError(fmt.Sprintf("build failed: %v", err)), nil
}

if result.FinalResponse != nil {
output.WriteString(fmt.Sprintf("\nBuild complete: %d succeeded, %d failed (%.1fs)\n",
result.FinalResponse.Succeeded,
result.FinalResponse.Failed,
result.FinalResponse.TotalDurationSeconds))
}

if !result.AllSucceeded() {
output.WriteString("WARNING: Some builds failed\n")
}

return mcp.NewToolResultText(output.String()), nil
}
@@ -12,7 +12,6 @@ type ServerConfig struct {
NKeyFile string
EnableAdmin bool
AdminNKeyFile string
EnableBuilds bool
DiscoverSubject string
Timeout time.Duration
}
@@ -50,11 +49,6 @@ func New(cfg ServerConfig) *Server {
s.AddTool(DeployAdminTool(), handler.HandleDeployAdmin)
}

// Optionally register build tool
if cfg.EnableBuilds {
s.AddTool(BuildTool(), handler.HandleBuild)
}

return &Server{
cfg: cfg,
server: s,

@@ -9,8 +9,8 @@ import (

"github.com/mark3labs/mcp-go/mcp"

deploycli "code.t-juice.club/torjus/homelab-deploy/internal/cli"
"code.t-juice.club/torjus/homelab-deploy/internal/messages"
deploycli "git.t-juice.club/torjus/homelab-deploy/internal/cli"
"git.t-juice.club/torjus/homelab-deploy/internal/messages"
)

// ToolConfig holds configuration for the MCP tools.

@@ -1,135 +0,0 @@
package messages

import (
"encoding/json"
"fmt"
"strings"
)

// BuildStatus represents the status of a build response.
type BuildStatus string

const (
BuildStatusStarted BuildStatus = "started"
BuildStatusProgress BuildStatus = "progress"
BuildStatusCompleted BuildStatus = "completed"
BuildStatusFailed BuildStatus = "failed"
BuildStatusRejected BuildStatus = "rejected"
)

// IsFinal returns true if this status indicates a terminal state.
func (s BuildStatus) IsFinal() bool {
switch s {
case BuildStatusCompleted, BuildStatusFailed, BuildStatusRejected:
return true
default:
return false
}
}

// BuildRequest is the message sent to request a build.
type BuildRequest struct {
Repo string `json:"repo"` // Must match config
Target string `json:"target"` // Hostname or "all"
Branch string `json:"branch,omitempty"` // Optional, uses repo default
ReplyTo string `json:"reply_to"`
}

// Validate checks that the request is valid.
func (r *BuildRequest) Validate() error {
if r.Repo == "" {
return fmt.Errorf("repo is required")
}
if !revisionRegex.MatchString(r.Repo) {
return fmt.Errorf("invalid repo name format: %q", r.Repo)
}
if r.Target == "" {
return fmt.Errorf("target is required")
}
// Target must be "all" or a valid hostname (same format as revision/branch)
if r.Target != "all" && !revisionRegex.MatchString(r.Target) {
return fmt.Errorf("invalid target format: %q", r.Target)
}
if r.Branch != "" && !revisionRegex.MatchString(r.Branch) {
return fmt.Errorf("invalid branch format: %q", r.Branch)
}
if r.ReplyTo == "" {
return fmt.Errorf("reply_to is required")
}
// Validate reply_to format to prevent publishing to arbitrary subjects
if !strings.HasPrefix(r.ReplyTo, "build.responses.") {
return fmt.Errorf("invalid reply_to format: must start with 'build.responses.'")
}
return nil
}

// Marshal serializes the request to JSON.
func (r *BuildRequest) Marshal() ([]byte, error) {
return json.Marshal(r)
}

// UnmarshalBuildRequest deserializes a request from JSON.
func UnmarshalBuildRequest(data []byte) (*BuildRequest, error) {
var r BuildRequest
if err := json.Unmarshal(data, &r); err != nil {
return nil, fmt.Errorf("failed to unmarshal build request: %w", err)
}
return &r, nil
}

// BuildHostResult contains the result of building a single host.
type BuildHostResult struct {
Host string `json:"host"`
Success bool `json:"success"`
Error string `json:"error,omitempty"`
Output string `json:"output,omitempty"`
DurationSeconds float64 `json:"duration_seconds"`
}

// BuildResponse is the message sent in response to a build request.
type BuildResponse struct {
Status BuildStatus `json:"status"`
Message string `json:"message,omitempty"`

// Progress updates
Host string `json:"host,omitempty"`
HostSuccess *bool `json:"host_success,omitempty"`
HostsCompleted int `json:"hosts_completed,omitempty"`
HostsTotal int `json:"hosts_total,omitempty"`

// Final response
Results []BuildHostResult `json:"results,omitempty"`
TotalDurationSeconds float64 `json:"total_duration_seconds,omitempty"`
Succeeded int `json:"succeeded,omitempty"`
Failed int `json:"failed,omitempty"`

Error string `json:"error,omitempty"`
}

// NewBuildResponse creates a new response with the given status and message.
func NewBuildResponse(status BuildStatus, message string) *BuildResponse {
return &BuildResponse{
Status: status,
Message: message,
}
}

// WithError adds an error message to the response.
func (r *BuildResponse) WithError(err string) *BuildResponse {
r.Error = err
return r
}

// Marshal serializes the response to JSON.
func (r *BuildResponse) Marshal() ([]byte, error) {
return json.Marshal(r)
}

// UnmarshalBuildResponse deserializes a response from JSON.
func UnmarshalBuildResponse(data []byte) (*BuildResponse, error) {
var r BuildResponse
if err := json.Unmarshal(data, &r); err != nil {
return nil, fmt.Errorf("failed to unmarshal build response: %w", err)
}
return &r, nil
}
@@ -1,99 +0,0 @@
package metrics

import (
"github.com/prometheus/client_golang/prometheus"
)

// BuildCollector holds all Prometheus metrics for the builder.
type BuildCollector struct {
buildsTotal *prometheus.CounterVec
buildHostTotal *prometheus.CounterVec
buildDuration *prometheus.HistogramVec
buildLastTimestamp *prometheus.GaugeVec
buildLastSuccessTime *prometheus.GaugeVec
buildLastFailureTime *prometheus.GaugeVec
}

// NewBuildCollector creates a new build metrics collector and registers it with the given registerer.
func NewBuildCollector(reg prometheus.Registerer) *BuildCollector {
c := &BuildCollector{
buildsTotal: prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "homelab_deploy_builds_total",
Help: "Total builds processed",
},
[]string{"repo", "status"},
),
buildHostTotal: prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "homelab_deploy_build_host_total",
Help: "Total host builds processed",
},
[]string{"repo", "host", "status"},
),
buildDuration: prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "homelab_deploy_build_duration_seconds",
Help: "Build execution time per host",
Buckets: []float64{5, 10, 30, 60, 120, 300, 600, 1800, 3600, 7200, 14400},
},
[]string{"repo", "host"},
),
buildLastTimestamp: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "homelab_deploy_build_last_timestamp",
Help: "Timestamp of last build attempt",
},
[]string{"repo"},
),
buildLastSuccessTime: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "homelab_deploy_build_last_success_timestamp",
Help: "Timestamp of last successful build",
},
[]string{"repo"},
),
buildLastFailureTime: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "homelab_deploy_build_last_failure_timestamp",
Help: "Timestamp of last failed build",
},
[]string{"repo"},
),
}

reg.MustRegister(c.buildsTotal)
reg.MustRegister(c.buildHostTotal)
reg.MustRegister(c.buildDuration)
reg.MustRegister(c.buildLastTimestamp)
reg.MustRegister(c.buildLastSuccessTime)
reg.MustRegister(c.buildLastFailureTime)

return c
}

// RecordBuildSuccess records a successful build.
func (c *BuildCollector) RecordBuildSuccess(repo string) {
c.buildsTotal.WithLabelValues(repo, "success").Inc()
c.buildLastTimestamp.WithLabelValues(repo).SetToCurrentTime()
c.buildLastSuccessTime.WithLabelValues(repo).SetToCurrentTime()
}

// RecordBuildFailure records a failed build.
func (c *BuildCollector) RecordBuildFailure(repo, errorCode string) {
c.buildsTotal.WithLabelValues(repo, "failure").Inc()
c.buildLastTimestamp.WithLabelValues(repo).SetToCurrentTime()
c.buildLastFailureTime.WithLabelValues(repo).SetToCurrentTime()
}

// RecordHostBuildSuccess records a successful host build.
func (c *BuildCollector) RecordHostBuildSuccess(repo, host string, durationSeconds float64) {
c.buildHostTotal.WithLabelValues(repo, host, "success").Inc()
c.buildDuration.WithLabelValues(repo, host).Observe(durationSeconds)
}

// RecordHostBuildFailure records a failed host build.
func (c *BuildCollector) RecordHostBuildFailure(repo, host string, durationSeconds float64) {
c.buildHostTotal.WithLabelValues(repo, host, "failure").Inc()
c.buildDuration.WithLabelValues(repo, host).Observe(durationSeconds)
}
@@ -2,7 +2,7 @@
package metrics

import (
"code.t-juice.club/torjus/homelab-deploy/internal/messages"
"git.t-juice.club/torjus/homelab-deploy/internal/messages"
"github.com/prometheus/client_golang/prometheus"
)


@@ -8,7 +8,7 @@ import (
"testing"
"time"

"code.t-juice.club/torjus/homelab-deploy/internal/messages"
"git.t-juice.club/torjus/homelab-deploy/internal/messages"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/testutil"
)
@@ -78,6 +78,103 @@ homelab_deploy_deployments_total{action="test",error_code="",status="failed"} 0
if err := testutil.GatherAndCompare(reg, strings.NewReader(counterExpected), "homelab_deploy_deployments_total"); err != nil {
t.Errorf("unexpected counter metrics: %v", err)
}

// Check histogram recorded the duration (120.5 seconds falls into le="300" and higher buckets)
histogramExpected := `
# HELP homelab_deploy_deployment_duration_seconds Deployment execution time
# TYPE homelab_deploy_deployment_duration_seconds histogram
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="switch",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="300"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="600"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="900"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1200"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1800"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="+Inf"} 1
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="true"} 120.5
homelab_deploy_deployment_duration_seconds_count{action="switch",success="true"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="test",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="test",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="test",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="test",success="true"} 0
`
if err := testutil.GatherAndCompare(reg, strings.NewReader(histogramExpected), "homelab_deploy_deployment_duration_seconds"); err != nil {
t.Errorf("unexpected histogram metrics: %v", err)
}
}

func TestCollector_RecordDeploymentEnd_Failure(t *testing.T) {
@@ -102,6 +199,103 @@ homelab_deploy_deployments_total{action="test",error_code="",status="failed"} 0
if err := testutil.GatherAndCompare(reg, strings.NewReader(counterExpected), "homelab_deploy_deployments_total"); err != nil {
t.Errorf("unexpected counter metrics: %v", err)
}

// Check histogram recorded the duration (60.0 seconds falls into le="60" and higher buckets)
histogramExpected := `
# HELP homelab_deploy_deployment_duration_seconds Deployment execution time
# TYPE homelab_deploy_deployment_duration_seconds histogram
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="60"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="120"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="300"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="600"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="900"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1200"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1800"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="+Inf"} 1
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="false"} 60
homelab_deploy_deployment_duration_seconds_count{action="boot",success="false"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="300"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="600"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="900"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1200"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1800"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="+Inf"} 0
|
||||
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="true"} 0
|
||||
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="true"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="30"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="60"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="120"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="300"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="600"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="900"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1200"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1800"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="+Inf"} 0
|
||||
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="false"} 0
|
||||
homelab_deploy_deployment_duration_seconds_count{action="switch",success="false"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="30"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="60"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="120"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="300"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="600"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="900"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1200"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1800"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="+Inf"} 0
|
||||
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="true"} 0
|
||||
homelab_deploy_deployment_duration_seconds_count{action="switch",success="true"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="30"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="60"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="120"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="300"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="600"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="900"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1200"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1800"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="+Inf"} 0
|
||||
homelab_deploy_deployment_duration_seconds_sum{action="test",success="false"} 0
|
||||
homelab_deploy_deployment_duration_seconds_count{action="test",success="false"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="30"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="60"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="120"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="300"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="600"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="900"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1200"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1800"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="+Inf"} 0
|
||||
homelab_deploy_deployment_duration_seconds_sum{action="test",success="true"} 0
|
||||
homelab_deploy_deployment_duration_seconds_count{action="test",success="true"} 0
|
||||
`
|
||||
if err := testutil.GatherAndCompare(reg, strings.NewReader(histogramExpected), "homelab_deploy_deployment_duration_seconds"); err != nil {
|
||||
t.Errorf("unexpected histogram metrics: %v", err)
|
||||
}
|
||||
}
|
||||
|
||||
func TestCollector_RecordDeploymentFailure(t *testing.T) {
|
||||
@@ -127,6 +321,103 @@ homelab_deploy_deployments_total{action="test",error_code="",status="failed"} 0
	if err := testutil.GatherAndCompare(reg, strings.NewReader(counterExpected), "homelab_deploy_deployments_total"); err != nil {
		t.Errorf("unexpected counter metrics: %v", err)
	}

	// Check histogram recorded the duration (300.0 seconds falls into le="300" and higher buckets)
	histogramExpected := `
# HELP homelab_deploy_deployment_duration_seconds Deployment execution time
# TYPE homelab_deploy_deployment_duration_seconds histogram
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="300"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="600"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="900"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1200"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1800"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="+Inf"} 1
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="false"} 300
homelab_deploy_deployment_duration_seconds_count{action="switch",success="false"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="switch",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="test",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="test",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="test",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="test",success="true"} 0
`
	if err := testutil.GatherAndCompare(reg, strings.NewReader(histogramExpected), "homelab_deploy_deployment_duration_seconds"); err != nil {
		t.Errorf("unexpected histogram metrics: %v", err)
	}
}

func TestCollector_RecordRejection(t *testing.T) {
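The bucket comments in these tests ("60.0 seconds falls into le="60" and higher buckets") rely on Prometheus's cumulative histogram semantics: an observation is counted in every bucket whose upper bound (`le`) is greater than or equal to the observed value. A stdlib-only sketch of that rule, using the listener's duration bounds (illustrative, not code from this repo):

```go
package main

import "fmt"

// bucketCounts applies Prometheus's cumulative bucket rule: an observation
// increments every bucket whose upper bound (le) is >= the observed value.
func bucketCounts(v float64, bounds []float64) []int {
	counts := make([]int, len(bounds)+1) // final slot is the +Inf bucket
	for i, le := range bounds {
		if v <= le {
			counts[i]++
		}
	}
	counts[len(bounds)]++ // +Inf catches every observation
	return counts
}

func main() {
	// The duration bounds asserted in the tests above, and the 60 s
	// failed "boot" deployment from the first test.
	bounds := []float64{30, 60, 120, 300, 600, 900, 1200, 1800}
	fmt.Println(bucketCounts(60.0, bounds)) // le=30 stays 0; le=60 through +Inf become 1
}
```

This is why the expectation strings show a run of `1`s starting at the first bucket wide enough to hold the observation, never an isolated `1` in a single bucket.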
@@ -36,7 +36,7 @@ func NewServer(cfg ServerConfig) *Server {
 	registry := prometheus.NewRegistry()
 	collector := NewCollector(registry)

-	scrapeCh := make(chan struct{})
+	scrapeCh := make(chan struct{}, 1)

 	metricsHandler := promhttp.HandlerFor(registry, promhttp.HandlerOpts{
 		Registry: registry,
@@ -74,11 +74,6 @@ func (s *Server) Collector() *Collector {
 	return s.collector
 }

-// Registry returns the Prometheus registry.
-func (s *Server) Registry() *prometheus.Registry {
-	return s.registry
-}
-
 // ScrapeCh returns a channel that receives a signal each time the metrics endpoint is scraped.
 func (s *Server) ScrapeCh() <-chan struct{} {
 	return s.scrapeCh
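The one-line fix above gives `scrapeCh` a capacity of one, so a scrape can be signalled without blocking even when no consumer is receiving at that instant. A minimal stdlib sketch of the non-blocking-send pattern a capacity-1 channel enables (names illustrative, not this repo's code):

```go
package main

import "fmt"

// notify performs a non-blocking send on a signal channel. With capacity 1,
// at most one signal is parked for the consumer; further signals are dropped
// instead of blocking the sender (e.g. an HTTP handler goroutine).
func notify(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true // signal parked in the buffer
	default:
		return false // a signal is already pending; drop this one
	}
}

func main() {
	scrapeCh := make(chan struct{}, 1) // the fix: buffer of one

	fmt.Println(notify(scrapeCh)) // buffer empty, signal parked
	fmt.Println(notify(scrapeCh)) // signal already pending, dropped
	<-scrapeCh                    // consumer drains the pending signal
	fmt.Println(notify(scrapeCh)) // buffer empty again, signal parked
}
```

With the original unbuffered channel, a signal sent while no receiver was waiting would be lost (or block, without the `select`); the buffer lets exactly one scrape event survive until the consumer gets to it.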
nixos/module.nix (218 lines changed)
@@ -2,61 +2,33 @@
 { config, lib, pkgs, ... }:

 let
-  listenerCfg = config.services.homelab-deploy.listener;
-  builderCfg = config.services.homelab-deploy.builder;
+  cfg = config.services.homelab-deploy.listener;

-  # Generate YAML config from settings
-  generatedConfigFile = pkgs.writeText "builder.yaml" (lib.generators.toYAML {} {
-    repos = lib.mapAttrs (name: repo: {
-      url = repo.url;
-      default_branch = repo.defaultBranch;
-    }) builderCfg.settings.repos;
-  });
-
-  # Use provided configFile or generate from settings
-  builderConfigFile =
-    if builderCfg.configFile != null
-    then builderCfg.configFile
-    else generatedConfigFile;
-
-  # Build command line arguments for listener from configuration
-  listenerArgs = lib.concatStringsSep " " ([
-    "--hostname ${lib.escapeShellArg listenerCfg.hostname}"
-    "--tier ${listenerCfg.tier}"
-    "--nats-url ${lib.escapeShellArg listenerCfg.natsUrl}"
-    "--nkey-file ${lib.escapeShellArg listenerCfg.nkeyFile}"
-    "--flake-url ${lib.escapeShellArg listenerCfg.flakeUrl}"
-    "--timeout ${toString listenerCfg.timeout}"
-    "--discover-subject ${lib.escapeShellArg listenerCfg.discoverSubject}"
+  # Build command line arguments from configuration
+  args = lib.concatStringsSep " " ([
+    "--hostname ${lib.escapeShellArg cfg.hostname}"
+    "--tier ${cfg.tier}"
+    "--nats-url ${lib.escapeShellArg cfg.natsUrl}"
+    "--nkey-file ${lib.escapeShellArg cfg.nkeyFile}"
+    "--flake-url ${lib.escapeShellArg cfg.flakeUrl}"
+    "--timeout ${toString cfg.timeout}"
+    "--discover-subject ${lib.escapeShellArg cfg.discoverSubject}"
   ]
-  ++ lib.optional (listenerCfg.role != null) "--role ${lib.escapeShellArg listenerCfg.role}"
-  ++ map (s: "--deploy-subject ${lib.escapeShellArg s}") listenerCfg.deploySubjects
-  ++ lib.optionals listenerCfg.metrics.enable [
+  ++ lib.optional (cfg.role != null) "--role ${lib.escapeShellArg cfg.role}"
+  ++ map (s: "--deploy-subject ${lib.escapeShellArg s}") cfg.deploySubjects
+  ++ lib.optionals cfg.metrics.enable [
     "--metrics-enabled"
-    "--metrics-addr ${lib.escapeShellArg listenerCfg.metrics.address}"
-  ]);
-
-  # Build command line arguments for builder from configuration
-  builderArgs = lib.concatStringsSep " " ([
-    "--nats-url ${lib.escapeShellArg builderCfg.natsUrl}"
-    "--nkey-file ${lib.escapeShellArg builderCfg.nkeyFile}"
-    "--config ${builderConfigFile}"
-    "--timeout ${toString builderCfg.timeout}"
-  ]
-  ++ lib.optionals builderCfg.metrics.enable [
-    "--metrics-enabled"
-    "--metrics-addr ${lib.escapeShellArg builderCfg.metrics.address}"
-  ]);
+    "--metrics-addr ${lib.escapeShellArg cfg.metrics.address}"
+  ]
+  ++ cfg.extraArgs);

   # Extract port from metrics address for firewall rule
-  extractPort = addr: let
+  metricsPort = let
+    addr = cfg.metrics.address;
     # Handle both ":9972" and "0.0.0.0:9972" formats
     parts = lib.splitString ":" addr;
   in lib.toInt (lib.last parts);
-
-  listenerMetricsPort = extractPort listenerCfg.metrics.address;
-  builderMetricsPort = extractPort builderCfg.metrics.address;

 in
 {
   options.services.homelab-deploy.listener = {
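The `metricsPort` helper above splits on `":"` and takes the last component, so both `":9972"` and `"0.0.0.0:9972"` yield a port for the firewall rule. For comparison, the Go side could accept the same two forms with the standard library; a hedged sketch (an assumption for illustration, not necessarily how the binary parses `--metrics-addr`):

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// metricsPort extracts the port from a listen address, accepting both
// ":9972" and "0.0.0.0:9972" forms, mirroring the Nix helper above.
func metricsPort(addr string) (int, error) {
	_, portStr, err := net.SplitHostPort(addr)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(portStr)
}

func main() {
	for _, addr := range []string{":9972", "0.0.0.0:9972", "127.0.0.1:9973"} {
		p, err := metricsPort(addr)
		fmt.Println(p, err)
	}
}
```

`net.SplitHostPort` tolerates an empty host, which is what makes the bare `":9972"` default work without special-casing.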
@@ -151,118 +123,16 @@ in
       description = "Open firewall for metrics port";
     };
   };
-  };
-
-  options.services.homelab-deploy.builder = {
-    enable = lib.mkEnableOption "homelab-deploy builder service";
-
-    package = lib.mkOption {
-      type = lib.types.package;
-      default = self.packages.${pkgs.system}.homelab-deploy;
-      description = "The homelab-deploy package to use";
-    };
-
-    natsUrl = lib.mkOption {
-      type = lib.types.str;
-      description = "NATS server URL";
-      example = "nats://nats.example.com:4222";
-    };
-
-    nkeyFile = lib.mkOption {
-      type = lib.types.path;
-      description = "Path to NKey seed file for NATS authentication";
-      example = "/run/secrets/homelab-deploy-builder-nkey";
-    };
-
-    configFile = lib.mkOption {
-      type = lib.types.nullOr lib.types.path;
-      default = null;
-      description = ''
-        Path to builder configuration file (YAML).
-        If not specified, a config file will be generated from the `settings` option.
-      '';
-      example = "/etc/homelab-deploy/builder.yaml";
-    };
-
-    settings = {
-      repos = lib.mkOption {
-        type = lib.types.attrsOf (lib.types.submodule {
-          options = {
-            url = lib.mkOption {
-              type = lib.types.str;
-              description = "Git flake URL for the repository";
-              example = "git+https://git.example.com/org/nixos-configs.git";
-            };
-            defaultBranch = lib.mkOption {
-              type = lib.types.str;
-              default = "master";
-              description = "Default branch to build when not specified in request";
-              example = "main";
-            };
-          };
-        });
-        default = {};
-        description = ''
-          Repository configuration for the builder.
-          Each key is the repository name used in build requests.
-        '';
-        example = lib.literalExpression ''
-          {
-            nixos-servers = {
-              url = "git+https://git.example.com/org/nixos-servers.git";
-              defaultBranch = "master";
-            };
-            homelab = {
-              url = "git+ssh://git@github.com/user/homelab.git";
-              defaultBranch = "main";
-            };
-          }
-        '';
+    extraArgs = lib.mkOption {
+      type = lib.types.listOf lib.types.str;
+      default = [ ];
+      description = "Extra command line arguments to pass to the listener";
+      example = [ "--debug" ];
     };
-
-    timeout = lib.mkOption {
-      type = lib.types.int;
-      default = 1800;
-      description = "Build timeout in seconds per host";
-    };
-
-    environment = lib.mkOption {
-      type = lib.types.attrsOf lib.types.str;
-      default = { };
-      description = "Additional environment variables for the service";
-      example = { GIT_SSH_COMMAND = "ssh -i /run/secrets/deploy-key"; };
-    };
-
-    metrics = {
-      enable = lib.mkEnableOption "Prometheus metrics endpoint";
-
-      address = lib.mkOption {
-        type = lib.types.str;
-        default = ":9973";
-        description = "Address for Prometheus metrics HTTP server";
-        example = "127.0.0.1:9973";
-      };
-
-      openFirewall = lib.mkOption {
-        type = lib.types.bool;
-        default = false;
-        description = "Open firewall for metrics port";
-      };
-    };
   };

-  config = lib.mkMerge [
-    (lib.mkIf builderCfg.enable {
-      assertions = [
-        {
-          assertion = builderCfg.configFile != null || builderCfg.settings.repos != {};
-          message = "services.homelab-deploy.builder: either configFile or settings.repos must be specified";
-        }
-      ];
-    })
-
-    (lib.mkIf listenerCfg.enable {
+  config = lib.mkIf cfg.enable {
     systemd.services.homelab-deploy-listener = {
       description = "homelab-deploy listener";
       wantedBy = [ "multi-user.target" ];
@@ -274,7 +144,7 @@ in
       stopIfChanged = false;
       restartIfChanged = false;

-      environment = listenerCfg.environment // {
+      environment = cfg.environment // {
        # Nix needs a writable cache for git flake fetching
        XDG_CACHE_HOME = "/var/cache/homelab-deploy";
      };
@@ -284,7 +154,7 @@ in
      serviceConfig = {
        CacheDirectory = "homelab-deploy";
        Type = "simple";
-        ExecStart = "${listenerCfg.package}/bin/homelab-deploy listener ${listenerArgs}";
+        ExecStart = "${cfg.package}/bin/homelab-deploy listener ${args}";
        Restart = "always";
        RestartSec = 10;
@@ -297,42 +167,8 @@ in
      };
    };

-      networking.firewall.allowedTCPPorts = lib.mkIf (listenerCfg.metrics.enable && listenerCfg.metrics.openFirewall) [
-        listenerMetricsPort
-      ];
-    })
-
-    (lib.mkIf builderCfg.enable {
-      systemd.services.homelab-deploy-builder = {
-        description = "homelab-deploy builder";
-        wantedBy = [ "multi-user.target" ];
-        after = [ "network-online.target" ];
-        wants = [ "network-online.target" ];
-
-        environment = builderCfg.environment // {
-          # Nix needs a writable cache for git flake fetching
-          XDG_CACHE_HOME = "/var/cache/homelab-deploy-builder";
-        };
-
-        path = [ pkgs.git pkgs.nix ];
-
-        serviceConfig = {
-          CacheDirectory = "homelab-deploy-builder";
-          Type = "simple";
-          ExecStart = "${builderCfg.package}/bin/homelab-deploy builder ${builderArgs}";
-          Restart = "always";
-          RestartSec = 10;
-
-          # Minimal hardening - nix build requires broad system access:
-          # - Write access to /nix/store for building
-          # - Kernel namespace support for nix sandbox builds
-          # - Network access for fetching from git/cache
-        };
-      };
-
-      networking.firewall.allowedTCPPorts = lib.mkIf (builderCfg.metrics.enable && builderCfg.metrics.openFirewall) [
-        builderMetricsPort
-      ];
-    })
-  ];
+    networking.firewall.allowedTCPPorts = lib.mkIf (cfg.metrics.enable && cfg.metrics.openFirewall) [
+      metricsPort
+    ];
+  };
 }