Compare commits
23 Commits
a19ed50394...fix/metric

| SHA1 |
|---|
| c272ce6903 |
| c934d1ba38 |
| 723a1f769f |
| 46fc6a7e96 |
| 746e30b24f |
| fd0d63b103 |
| 36a74b8cf9 |
| 79db119d1c |
| 56365835c7 |
| 95b795dcfd |
| 71d6aa8b61 |
| 2c97b6140c |
| efacb13b86 |
| ac3c9c7de6 |
| 9f205fee5e |
| 5f3cfc3d21 |
| c9b85435ba |
| cf3b1ce2c9 |
| 9237814fed |
| f03eb5f7dc |
| f51058964d |
| 95fbfb2339 |
| e1ab4599a8 |
CLAUDE.md (22 lines changed)
@@ -56,13 +56,14 @@ Key Go libraries:
- `github.com/nats-io/nats.go` - NATS client
- `github.com/nats-io/nkeys` - NKey authentication
- `github.com/mark3labs/mcp-go` - MCP server implementation
- `github.com/google/uuid` - UUID generation for reply subjects

## Build Commands

Run commands through the Nix development shell using `nix develop -c`:

```bash
# Build
# Build (for quick syntax checking)
nix develop -c go build ./...

# Run tests
@@ -77,10 +78,7 @@ nix develop -c golangci-lint run
# Vulnerability check
nix develop -c govulncheck ./...

# Test Nix build
nix build

# Run the binary (prefer this over go build + running binary)
# Run the binary (preferred method - builds and runs via Nix)
# To pass arguments, use -- before them: nix run .#default -- --help
nix run .#default
```
@@ -92,7 +90,7 @@ Before committing, run the following checks:
1. `nix develop -c go test ./...` - Unit tests
2. `nix develop -c golangci-lint run` - Linting
3. `nix develop -c govulncheck ./...` - Vulnerability scanning
4. `nix build` - Verify nix build works
4. `nix run .#default -- --version` - Verify nix build works

## Commit Message Format

@@ -115,6 +113,16 @@ Follow semantic versioning:
- **Minor** (0.x.0): Non-breaking changes adding features
- **Major** (x.0.0): Breaking changes

Update the `const version` in `main.go`. The Nix build extracts the version from there automatically.
Update `const version` in `cmd/homelab-deploy/main.go`. The Nix build extracts the version from there automatically.

**When to bump**: If any Go code has changed, bump the version before committing. Do this automatically when asked to commit. On feature branches, only bump once per branch (check if version has already been bumped compared to master).

## Updating Dependencies

When adding or updating Go dependencies:

1. Run `go get <package>` or `go mod tidy`
2. Update `vendorHash` in `flake.nix`:
   - Set to a fake hash: `vendorHash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";`
   - Run `nix run .#default -- --version` - the error will show the correct hash
   - Replace with the correct hash from the error message
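The versioning rules above ask for a single bump per feature branch, checked against master. Here is a minimal sketch of that check. It assumes plain `MAJOR.MINOR.PATCH` strings without pre-release suffixes. `parseVersion`, `alreadyBumped` and `bumpPatch` are hypothetical helpers, not part of homelab-deploy, and the patch bump is used only for illustration:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits "MAJOR.MINOR.PATCH" (an optional leading "v" is ignored)
// into three non-negative integers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("version %q: want MAJOR.MINOR.PATCH", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("version %q: bad component %q", v, p)
		}
		out[i] = n
	}
	return out, nil
}

// alreadyBumped reports whether branchVer is strictly newer than masterVer,
// comparing components numerically so that 0.1.14 > 0.1.9.
func alreadyBumped(masterVer, branchVer string) (bool, error) {
	m, err := parseVersion(masterVer)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(branchVer)
	if err != nil {
		return false, err
	}
	for i := range m {
		if b[i] != m[i] {
			return b[i] > m[i], nil
		}
	}
	return false, nil
}

// bumpPatch returns the next patch release, e.g. "0.1.0" -> "0.1.1".
func bumpPatch(v string) (string, error) {
	p, err := parseVersion(v)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%d.%d.%d", p[0], p[1], p[2]+1), nil
}

func main() {
	// Hypothetical values, as read from main.go on master and on the feature branch.
	master, branch := "0.1.0", "0.1.0"
	bumped, err := alreadyBumped(master, branch)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if bumped {
		fmt.Println("already bumped on this branch; leave version alone")
		return
	}
	next, _ := bumpPatch(branch)
	fmt.Println("bump to", next) // prints "bump to 0.1.1"
}
```

The components must be compared numerically rather than as strings: compared as text, `"0.1.14"` sorts before `"0.1.9"`.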
README.md (282 lines changed)
@@ -61,6 +61,10 @@ homelab-deploy listener \
| `--timeout` | No | Deployment timeout in seconds (default: 600) |
| `--deploy-subject` | No | NATS subjects to subscribe to (repeatable) |
| `--discover-subject` | No | Discovery subject (default: `deploy.discover`) |
| `--metrics-enabled` | No | Enable Prometheus metrics endpoint |
| `--metrics-addr` | No | Metrics HTTP server address (default: `:9972`) |
| `--heartbeat-interval` | No | Status update interval in seconds during deployment (default: 15) |
| `--debug` | No | Enable debug logging for troubleshooting |

#### Subject Templates

@@ -102,13 +106,13 @@ homelab-deploy deploy deploy.prod.role.dns \

#### Deploy Flags

| Flag | Required | Description |
|------|----------|-------------|
| `--nats-url` | Yes | NATS server URL |
| `--nkey-file` | Yes | Path to NKey seed file |
| `--branch` | No | Git branch or commit (default: `master`) |
| `--action` | No | nixos-rebuild action (default: `switch`) |
| `--timeout` | No | Response timeout in seconds (default: 900) |
| Flag | Required | Env Var | Description |
|------|----------|---------|-------------|
| `--nats-url` | Yes | `HOMELAB_DEPLOY_NATS_URL` | NATS server URL |
| `--nkey-file` | Yes | `HOMELAB_DEPLOY_NKEY_FILE` | Path to NKey seed file |
| `--branch` | No | `HOMELAB_DEPLOY_BRANCH` | Git branch or commit (default: `master`) |
| `--action` | No | `HOMELAB_DEPLOY_ACTION` | nixos-rebuild action (default: `switch`) |
| `--timeout` | No | `HOMELAB_DEPLOY_TIMEOUT` | Response timeout in seconds (default: 900) |

#### Subject Aliases

@@ -198,7 +202,7 @@ Add the module to your NixOS configuration:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enable` | bool | `false` | Enable the listener service |
| `package` | package | `pkgs.homelab-deploy` | Package to use |
| `package` | package | from flake | Package to use |
| `hostname` | string | `config.networking.hostName` | Hostname for subject templates |
| `tier` | enum | required | `"test"` or `"prod"` |
| `role` | string | `null` | Role for role-based targeting |
@@ -209,6 +213,10 @@ Add the module to your NixOS configuration:
| `deploySubjects` | list of string | see below | Subjects to subscribe to |
| `discoverSubject` | string | `"deploy.discover"` | Discovery subject |
| `environment` | attrs | `{}` | Additional environment variables |
| `metrics.enable` | bool | `false` | Enable Prometheus metrics endpoint |
| `metrics.address` | string | `":9972"` | Metrics HTTP server address |
| `metrics.openFirewall` | bool | `false` | Open firewall for metrics port |
| `extraArgs` | list of string | `[]` | Extra command line arguments (e.g., `["--debug"]`) |

Default `deploySubjects`:
```nix
@@ -219,6 +227,131 @@ Default `deploySubjects`:
]
```

## Prometheus Metrics

The listener can expose Prometheus metrics for monitoring deployment operations.

### Enabling Metrics

**CLI:**
```bash
homelab-deploy listener \
  --hostname myhost \
  --tier prod \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/listener.nkey \
  --flake-url git+https://git.example.com/user/nixos-configs.git \
  --metrics-enabled \
  --metrics-addr :9972
```

**NixOS module:**
```nix
services.homelab-deploy.listener = {
  enable = true;
  tier = "prod";
  natsUrl = "nats://nats.example.com:4222";
  nkeyFile = "/run/secrets/homelab-deploy-nkey";
  flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
  metrics = {
    enable = true;
    address = ":9972";
    openFirewall = true; # Optional: open firewall for Prometheus scraping
  };
};
```

### Available Metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `homelab_deploy_deployments_total` | Counter | `status`, `action`, `error_code` | Total deployment requests processed |
| `homelab_deploy_deployment_duration_seconds` | Histogram | `action`, `success` | Deployment execution time |
| `homelab_deploy_deployment_in_progress` | Gauge | - | 1 if deployment running, 0 otherwise |
| `homelab_deploy_info` | Gauge | `hostname`, `tier`, `role`, `version` | Static instance metadata |

**Label values:**
- `status`: `completed`, `failed`, `rejected`
- `action`: `switch`, `boot`, `test`, `dry-activate`
- `error_code`: `invalid_action`, `invalid_revision`, `already_running`, `build_failed`, `timeout`, or empty
- `success`: `true`, `false`

### HTTP Endpoints

| Endpoint | Description |
|----------|-------------|
| `/metrics` | Prometheus metrics in text format |
| `/health` | Health check (returns `ok`) |

### Example Prometheus Queries

```promql
# Average deployment duration (last hour)
rate(homelab_deploy_deployment_duration_seconds_sum[1h]) /
rate(homelab_deploy_deployment_duration_seconds_count[1h])

# Deployment success rate (last 24 hours)
sum(rate(homelab_deploy_deployments_total{status="completed"}[24h])) /
sum(rate(homelab_deploy_deployments_total{status=~"completed|failed"}[24h]))

# 95th percentile deployment time
histogram_quantile(0.95, rate(homelab_deploy_deployment_duration_seconds_bucket[1h]))

# Currently running deployments across all hosts
sum(homelab_deploy_deployment_in_progress)
```
## Troubleshooting

### Debug Logging

Enable debug logging to diagnose issues with deployments or metrics:

**CLI:**
```bash
homelab-deploy listener --debug \
  --hostname myhost \
  --tier prod \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/listener.nkey \
  --flake-url git+https://git.example.com/user/nixos-configs.git \
  --metrics-enabled
```

**NixOS module:**
```nix
services.homelab-deploy.listener = {
  enable = true;
  tier = "prod";
  natsUrl = "nats://nats.example.com:4222";
  nkeyFile = "/run/secrets/homelab-deploy-nkey";
  flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
  metrics.enable = true;
  extraArgs = [ "--debug" ];
};
```

With debug logging enabled, the listener outputs detailed information about metrics recording:

```json
{"level":"DEBUG","msg":"recording deployment start metric","metrics_enabled":true}
{"level":"DEBUG","msg":"recording deployment end metric (success)","action":"switch","success":true,"duration_seconds":120.5}
```

### Metrics Showing Zero

If deployment metrics remain at zero after deployments:

1. **Check metrics are enabled**: Verify `--metrics-enabled` is set and the metrics endpoint is accessible at `/metrics`

2. **Enable debug logging**: Use `--debug` to confirm metrics recording is being called

3. **Check deployment status**: Metrics are only recorded for deployments that complete (success or failure). Rejected requests (e.g., already running) increment the counter with `status="rejected"` but don't record duration

4. **Check after restart**: After a successful `switch` deployment, the listener restarts. Metrics reset to zero in the new instance. The listener waits up to 60 seconds for a Prometheus scrape before restarting to capture the final metrics

5. **Verify Prometheus scrape timing**: Ensure Prometheus scrapes frequently enough to capture metrics before the listener restarts
## Message Protocol

### Deploy Request
@@ -256,6 +389,139 @@ nk -gen user -pubout

Configure appropriate publish/subscribe permissions in your NATS server for each credential type.

## NATS Subject Structure

The deployment system uses the following NATS subject hierarchy:

### Deploy Subjects

| Subject Pattern | Purpose |
|-----------------|---------|
| `deploy.<tier>.<hostname>` | Deploy to a specific host |
| `deploy.<tier>.all` | Deploy to all hosts in a tier |
| `deploy.<tier>.role.<role>` | Deploy to hosts with a specific role in a tier |

**Tier values:** `test`, `prod`

**Examples:**
- `deploy.test.myhost` - Deploy to myhost in test tier
- `deploy.prod.all` - Deploy to all production hosts
- `deploy.prod.role.dns` - Deploy to all DNS servers in production

### Response Subjects

| Subject Pattern | Purpose |
|-----------------|---------|
| `deploy.responses.<uuid>` | Unique reply subject for each deployment request |

Deployers create a unique response subject for each request and include it in the `reply_to` field. Listeners publish status updates to this subject.
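A unique reply subject can be built by appending a random UUID to the `deploy.responses` prefix. CLAUDE.md lists `github.com/google/uuid` for this. To keep the sketch standard-library only, it generates a version-4 UUID with `crypto/rand` instead. `replySubject` is a hypothetical helper:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 returns a random RFC 4122 version-4 UUID, standing in for
// uuid.NewString() from github.com/google/uuid so this sketch needs only the stdlib.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant (10xx)
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// replySubject builds a per-request reply subject of the form <prefix>.<uuid>.
func replySubject(prefix string) (string, error) {
	id, err := newUUIDv4()
	if err != nil {
		return "", err
	}
	return prefix + "." + id, nil
}

func main() {
	subj, err := replySubject("deploy.responses")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(subj) // random; differs on every run
}
```

The `Deploy` hunk later in this diff subscribes to its reply subject. Subscribing before publishing the request avoids missing early status updates.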
### Discovery Subject

| Subject Pattern | Purpose |
|-----------------|---------|
| `deploy.discover` | Host discovery requests and responses |

Used by the `list_hosts` MCP tool and for discovering available deployment targets.

## Example NATS Configuration

Below is an example NATS server configuration implementing tiered authentication. This setup provides:

- **Listeners** - Each host has credentials to subscribe to its own subjects and publish responses
- **Test deployer** - Can deploy to test tier only (suitable for MCP without admin access)
- **Admin deployer** - Can deploy to all tiers (for CLI or MCP with admin access)

```conf
authorization {
  users = [
    # Listener for a test-tier host
    {
      nkey: "UTEST_HOST1_PUBLIC_KEY_HERE"
      permissions: {
        subscribe: [
          "deploy.test.testhost1"
          "deploy.test.all"
          "deploy.test.role.>"
          "deploy.discover"
        ]
        publish: [
          "deploy.responses.>"
          "deploy.discover"
        ]
      }
    }

    # Listener for a prod-tier host with 'dns' role
    {
      nkey: "UPROD_DNS1_PUBLIC_KEY_HERE"
      permissions: {
        subscribe: [
          "deploy.prod.dns1"
          "deploy.prod.all"
          "deploy.prod.role.dns"
          "deploy.discover"
        ]
        publish: [
          "deploy.responses.>"
          "deploy.discover"
        ]
      }
    }

    # Test-tier deployer (MCP without admin)
    {
      nkey: "UTEST_DEPLOYER_PUBLIC_KEY_HERE"
      permissions: {
        publish: [
          "deploy.test.>"
          "deploy.discover"
        ]
        subscribe: [
          "deploy.responses.>"
          "deploy.discover"
        ]
      }
    }

    # Admin deployer (full access to all tiers)
    {
      nkey: "UADMIN_DEPLOYER_PUBLIC_KEY_HERE"
      permissions: {
        publish: [
          "deploy.>"
        ]
        subscribe: [
          "deploy.>"
        ]
      }
    }
  ]
}
```

### Key Permission Patterns

| Credential Type | Publish | Subscribe |
|-----------------|---------|-----------|
| Listener | `deploy.responses.>`, `deploy.discover` | Own subjects, `deploy.discover` |
| Test deployer | `deploy.test.>`, `deploy.discover` | `deploy.responses.>`, `deploy.discover` |
| Admin deployer | `deploy.>` | `deploy.>` |
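The permission patterns rely on NATS subject wildcards. Tokens are separated by `.`, `*` matches exactly one token, and `>` matches one or more trailing tokens. The sketch below models only those rules. It shows, for example, why the test deployer may publish to `deploy.test.testhost1` but not to `deploy.prod.all`. It is not the NATS server's implementation and skips validation such as rejecting empty tokens. `subjectMatches` and `allowed` are hypothetical names:

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches reports whether subject matches pattern under NATS wildcard
// rules: '*' matches exactly one token, '>' (last token only) matches one or
// more remaining tokens, and any other token must match literally.
func subjectMatches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			// '>' must be the final pattern token and needs at least one subject token.
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) {
			return false
		}
		if tok != "*" && tok != s[i] {
			return false
		}
	}
	return len(p) == len(s)
}

// allowed reports whether any permission pattern matches the subject.
func allowed(patterns []string, subject string) bool {
	for _, p := range patterns {
		if subjectMatches(p, subject) {
			return true
		}
	}
	return false
}

func main() {
	// Publish permissions of the test-tier deployer from the example config.
	testDeployer := []string{"deploy.test.>", "deploy.discover"}
	for _, subj := range []string{"deploy.test.testhost1", "deploy.prod.all", "deploy.discover"} {
		fmt.Printf("%-22s allowed=%v\n", subj, allowed(testDeployer, subj))
	}
}
```

`deploy.test.>` does not match `deploy.test` itself, because `>` needs at least one token to match.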
### Generating NKeys

```bash
# Generate a keypair (outputs public key, saves seed to file)
nk -gen user -pubout > mykey.pub
# The seed (private key) is printed to stderr - save it securely

# Or generate and save seed directly
nk -gen user > mykey.seed
nk -inkey mykey.seed -pubout # Get public key from seed
```

The public key (starting with `U`) goes in the NATS server config. The seed file (starting with `SU`) is used by homelab-deploy via `--nkey-file`.

## License

MIT

cmd/homelab-deploy/main.go
@@ -16,7 +16,7 @@ import (
	"github.com/urfave/cli/v3"
)

const version = "0.1.0"
const version = "0.1.14"

func main() {
	app := &cli.Command{
@@ -27,6 +27,7 @@ func main() {
			listenerCommand(),
			mcpCommand(),
			deployCommand(),
			listHostsCommand(),
		},
	}

@@ -41,6 +42,10 @@ func listenerCommand() *cli.Command {
		Name:  "listener",
		Usage: "Run as a deployment listener (systemd service mode)",
		Flags: []cli.Flag{
			&cli.BoolFlag{
				Name:  "debug",
				Usage: "Enable debug logging for troubleshooting",
			},
			&cli.StringFlag{
				Name:  "hostname",
				Usage: "Hostname for this listener",
@@ -89,6 +94,20 @@ func listenerCommand() *cli.Command {
				Usage: "NATS subject for host discovery requests",
				Value: "deploy.discover",
			},
			&cli.BoolFlag{
				Name:  "metrics-enabled",
				Usage: "Enable Prometheus metrics endpoint",
			},
			&cli.StringFlag{
				Name:  "metrics-addr",
				Usage: "Address for Prometheus metrics HTTP server",
				Value: ":9972",
			},
			&cli.IntFlag{
				Name:  "heartbeat-interval",
				Usage: "Interval in seconds for sending status updates during deployment (0 to disable)",
				Value: 15,
			},
		},
		Action: func(ctx context.Context, c *cli.Command) error {
			tier := c.String("tier")
@@ -97,19 +116,29 @@ func listenerCommand() *cli.Command {
			}

			cfg := listener.Config{
				Hostname:        c.String("hostname"),
				Tier:            tier,
				Role:            c.String("role"),
				NATSUrl:         c.String("nats-url"),
				NKeyFile:        c.String("nkey-file"),
				FlakeURL:        c.String("flake-url"),
				Timeout:         time.Duration(c.Int("timeout")) * time.Second,
				DeploySubjects:  c.StringSlice("deploy-subject"),
				DiscoverSubject: c.String("discover-subject"),
				Hostname:          c.String("hostname"),
				Tier:              tier,
				Role:              c.String("role"),
				NATSUrl:           c.String("nats-url"),
				NKeyFile:          c.String("nkey-file"),
				FlakeURL:          c.String("flake-url"),
				Timeout:           time.Duration(c.Int("timeout")) * time.Second,
				HeartbeatInterval: time.Duration(c.Int("heartbeat-interval")) * time.Second,
				DeploySubjects:    c.StringSlice("deploy-subject"),
				DiscoverSubject:   c.String("discover-subject"),
				MetricsEnabled:    c.Bool("metrics-enabled"),
				MetricsAddr:       c.String("metrics-addr"),
				Version:           version,
				Debug:             c.Bool("debug"),
			}

			logLevel := slog.LevelInfo
			if c.Bool("debug") {
				logLevel = slog.LevelDebug
			}

			logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
				Level: slog.LevelInfo,
				Level: logLevel,
			}))

			l := listener.New(cfg, logger)
@@ -189,27 +218,32 @@ func deployCommand() *cli.Command {
			&cli.StringFlag{
				Name:     "nats-url",
				Usage:    "NATS server URL",
				Sources:  cli.EnvVars("HOMELAB_DEPLOY_NATS_URL"),
				Required: true,
			},
			&cli.StringFlag{
				Name:     "nkey-file",
				Usage:    "Path to NKey seed file for NATS authentication",
				Sources:  cli.EnvVars("HOMELAB_DEPLOY_NKEY_FILE"),
				Required: true,
			},
			&cli.StringFlag{
				Name:  "branch",
				Usage: "Git branch or commit to deploy",
				Value: "master",
				Name:    "branch",
				Usage:   "Git branch or commit to deploy",
				Sources: cli.EnvVars("HOMELAB_DEPLOY_BRANCH"),
				Value:   "master",
			},
			&cli.StringFlag{
				Name:  "action",
				Usage: "nixos-rebuild action (switch, boot, test, dry-activate)",
				Value: "switch",
				Name:    "action",
				Usage:   "nixos-rebuild action (switch, boot, test, dry-activate)",
				Sources: cli.EnvVars("HOMELAB_DEPLOY_ACTION"),
				Value:   "switch",
			},
			&cli.IntFlag{
				Name:  "timeout",
				Usage: "Timeout in seconds for collecting responses",
				Value: 900,
				Name:    "timeout",
				Usage:   "Timeout in seconds for collecting responses",
				Sources: cli.EnvVars("HOMELAB_DEPLOY_TIMEOUT"),
				Value:   900,
			},
		},
		Action: func(ctx context.Context, c *cli.Command) error {
@@ -265,3 +299,88 @@ func deployCommand() *cli.Command {
		},
	}
}

func listHostsCommand() *cli.Command {
	return &cli.Command{
		Name:  "list-hosts",
		Usage: "List available deployment targets",
		Flags: []cli.Flag{
			&cli.StringFlag{
				Name:     "nats-url",
				Usage:    "NATS server URL",
				Sources:  cli.EnvVars("HOMELAB_DEPLOY_NATS_URL"),
				Required: true,
			},
			&cli.StringFlag{
				Name:     "nkey-file",
				Usage:    "Path to NKey seed file for NATS authentication",
				Sources:  cli.EnvVars("HOMELAB_DEPLOY_NKEY_FILE"),
				Required: true,
			},
			&cli.StringFlag{
				Name:    "tier",
				Usage:   "Filter by tier (test or prod)",
				Sources: cli.EnvVars("HOMELAB_DEPLOY_TIER"),
			},
			&cli.StringFlag{
				Name:    "discover-subject",
				Usage:   "NATS subject for host discovery",
				Sources: cli.EnvVars("HOMELAB_DEPLOY_DISCOVER_SUBJECT"),
				Value:   "deploy.discover",
			},
			&cli.IntFlag{
				Name:    "timeout",
				Usage:   "Timeout in seconds for discovery",
				Sources: cli.EnvVars("HOMELAB_DEPLOY_DISCOVER_TIMEOUT"),
				Value:   5,
			},
		},
		Action: func(ctx context.Context, c *cli.Command) error {
			tierFilter := c.String("tier")
			if tierFilter != "" && tierFilter != "test" && tierFilter != "prod" {
				return fmt.Errorf("tier must be 'test' or 'prod', got %q", tierFilter)
			}

			// Handle shutdown signals
			ctx, cancel := signal.NotifyContext(ctx, syscall.SIGINT, syscall.SIGTERM)
			defer cancel()

			responses, err := deploycli.Discover(
				ctx,
				c.String("nats-url"),
				c.String("nkey-file"),
				c.String("discover-subject"),
				time.Duration(c.Int("timeout"))*time.Second,
			)
			if err != nil {
				return fmt.Errorf("discovery failed: %w", err)
			}

			if len(responses) == 0 {
				fmt.Println("No hosts responded to discovery request")
				return nil
			}

			fmt.Println("Available deployment targets:")
			fmt.Println()

			for _, resp := range responses {
				if tierFilter != "" && resp.Tier != tierFilter {
					continue
				}

				role := resp.Role
				if role == "" {
					role = "(none)"
				}

				fmt.Printf("- %s (tier=%s, role=%s)\n", resp.Hostname, resp.Tier, role)
				for _, subj := range resp.DeploySubjects {
					fmt.Printf(" %s\n", subj)
				}
			}

			return nil
		},
	}
}

flake.nix (11 lines changed)
@@ -15,13 +15,18 @@
    packages = forAllSystems (system:
      let
        pkgs = pkgsFor system;
        # Extract version from main.go
        version = builtins.head (
          builtins.match ''.*const version = "([^"]+)".*''
            (builtins.readFile ./cmd/homelab-deploy/main.go)
        );
      in
      {
        homelab-deploy = pkgs.buildGoModule {
          pname = "homelab-deploy";
          version = "0.1.0";
          inherit version;
          src = ./.;
          vendorHash = "sha256-JXa+obN62zrrwXlplqojY7dvEunUqDdSTee6N8c5JTg=";
          vendorHash = "sha256-CN+l0JbQu+HDfotkt3PUFzBexHCHpCKIIZpAQRyojBk=";
          subPackages = [ "cmd/homelab-deploy" ];
        };
        default = self.packages.${system}.homelab-deploy;
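The flake now reads the package version from `const version = "..."` in `cmd/homelab-deploy/main.go`. The same extraction can be done in Go, which is useful for a script or test that checks the two stay in sync. `extractVersion` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"regexp"
)

// versionRe captures the string literal assigned to `const version`, the same
// text the flake's builtins.match pattern extracts from main.go.
var versionRe = regexp.MustCompile(`const version = "([^"]+)"`)

// extractVersion returns the version string from Go source, or an error if none is found.
func extractVersion(src string) (string, error) {
	m := versionRe.FindStringSubmatch(src)
	if m == nil {
		return "", fmt.Errorf("no `const version = \"...\"` declaration found")
	}
	return m[1], nil
}

func main() {
	src := "package main\n\nimport \"fmt\"\n\nconst version = \"0.1.14\"\n\nfunc main() { fmt.Println(version) }\n"
	v, err := extractVersion(src)
	fmt.Println(v, err) // prints "0.1.14 <nil>"
}
```

Unlike Go's `FindStringSubmatch`, Nix's `builtins.match` must match the entire string, which is why the flake pattern is wrapped in `.*` on both sides. Both patterns stop working if `version` moves into a `const ( ... )` block, where the `const` keyword no longer directly precedes the name.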
@@ -44,7 +49,7 @@
      };
    });

    nixosModules.default = import ./nixos/module.nix;
    nixosModules.default = import ./nixos/module.nix { inherit self; };
    nixosModules.homelab-deploy = self.nixosModules.default;
  };
}

go.mod (10 lines changed)
@@ -7,20 +7,30 @@ require (
	github.com/mark3labs/mcp-go v0.43.2
	github.com/nats-io/nats.go v1.48.0
	github.com/nats-io/nkeys v0.4.15
	github.com/prometheus/client_golang v1.23.2
	github.com/urfave/cli/v3 v3.6.2
)

require (
	github.com/bahlo/generic-list-go v0.2.0 // indirect
	github.com/beorn7/perks v1.0.1 // indirect
	github.com/buger/jsonparser v1.1.1 // indirect
	github.com/cespare/xxhash/v2 v2.3.0 // indirect
	github.com/invopop/jsonschema v0.13.0 // indirect
	github.com/klauspost/compress v1.18.0 // indirect
	github.com/kylelemons/godebug v1.1.0 // indirect
	github.com/mailru/easyjson v0.7.7 // indirect
	github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
	github.com/nats-io/nuid v1.0.1 // indirect
	github.com/prometheus/client_model v0.6.2 // indirect
	github.com/prometheus/common v0.66.1 // indirect
	github.com/prometheus/procfs v0.16.1 // indirect
	github.com/spf13/cast v1.7.1 // indirect
	github.com/wk8/go-ordered-map/v2 v2.1.8 // indirect
	github.com/yosida95/uritemplate/v3 v3.0.2 // indirect
	go.yaml.in/yaml/v2 v2.4.2 // indirect
	golang.org/x/crypto v0.47.0 // indirect
	golang.org/x/sys v0.40.0 // indirect
	google.golang.org/protobuf v1.36.8 // indirect
	gopkg.in/yaml.v3 v3.0.1 // indirect
)
go.sum (33 lines changed)
@@ -1,13 +1,17 @@
github.com/bahlo/generic-list-go v0.2.0 h1:5sz/EEAK+ls5wF+NeqDpk5+iNdMDXrh3z3nPnH1Wvgk=
github.com/bahlo/generic-list-go v0.2.0/go.mod h1:2KvAjgMlE5NNynlg/5iLrrCCZ2+5xWbdbCW3pNTGyYg=
github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw=
github.com/buger/jsonparser v1.1.1 h1:2PnMjfWD7wBILjqQbt530v576A/cAbQvEW9gGIpYMUs=
github.com/buger/jsonparser v1.1.1/go.mod h1:6RYKKt7H4d4+iWqouImQ9R2FZql3VbhNgx27UK13J/0=
github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs=
github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/frankban/quicktest v1.14.6 h1:7Xjx+VpznH+oBnejlPUj8oUpdxnVs4f8XU8WnHkI4W8=
github.com/frankban/quicktest v1.14.6/go.mod h1:4ptaffx2x8+WTWXmUCuVU6aPUX1/Mz7zb5vbUoiM6w0=
github.com/google/go-cmp v0.5.9 h1:O2Tfq5qg4qc4AmwVlvv0oLiVAGB7enBSJ2x2DqQFi38=
github.com/google/go-cmp v0.5.9/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8=
github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/invopop/jsonschema v0.13.0 h1:KvpoAJWEjR3uD9Kbm2HWJmqsEaHt8lBUpd0qHcIi21E=
@@ -19,10 +23,14 @@ github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk=
github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
github.com/kylelemons/godebug v1.1.0 h1:RPNrshWIDI6G2gRW9EHilWtl7Z6Sb1BR0xunSBf0SNc=
github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=
github.com/mailru/easyjson v0.7.7 h1:UGYAvKxe3sBsEDzO8ZeWOSlIQfWFlxbzLZe7hwFURr0=
github.com/mailru/easyjson v0.7.7/go.mod h1:xzfreul335JAWq5oZzymOObrkdz5UnU4kGfJJLY9Nlc=
github.com/mark3labs/mcp-go v0.43.2 h1:21PUSlWWiSbUPQwXIJ5WKlETixpFpq+WBpbMGDSVy/I=
github.com/mark3labs/mcp-go v0.43.2/go.mod h1:YnJfOL382MIWDx1kMY+2zsRHU/q78dBg9aFb8W6Thdw=
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 h1:C3w9PqII01/Oq1c1nUAm88MOHcQC9l5mIlSMApZMrHA=
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822/go.mod h1:+n7T8mK8HuQTcFwEeznm/DIxMOiR9yIdICNftLE1DvQ=
github.com/nats-io/nats.go v1.48.0 h1:pSFyXApG+yWU/TgbKCjmm5K4wrHu86231/w84qRVR+U=
github.com/nats-io/nats.go v1.48.0/go.mod h1:iRWIPokVIFbVijxuMQq4y9ttaBTMe0SFdlZfMDd+33g=
github.com/nats-io/nkeys v0.4.15 h1:JACV5jRVO9V856KOapQ7x+EY8Jo3qw1vJt/9Jpwzkk4=
@@ -31,8 +39,16 @@ github.com/nats-io/nuid v1.0.1 h1:5iA8DT8V7q8WK2EScv2padNa/rTESc1KdnPw4TC2paw=
github.com/nats-io/nuid v1.0.1/go.mod h1:19wcPz3Ph3q0Jbyiqsd0kePYG7A95tJPxeL+1OSON2c=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/rogpeppe/go-internal v1.9.0 h1:73kH8U+JUqXU8lRuOHeVHaa/SZPifC7BkcraZVejAe8=
github.com/rogpeppe/go-internal v1.9.0/go.mod h1:WtVeX8xhTBvf0smdhujwtBcq4Qrzq/fJaraNFVN+nFs=
github.com/prometheus/client_golang v1.23.2 h1:Je96obch5RDVy3FDMndoUsjAhG5Edi49h0RJWRi/o0o=
github.com/prometheus/client_golang v1.23.2/go.mod h1:Tb1a6LWHB3/SPIzCoaDXI4I8UHKeFTEQ1YCr+0Gyqmg=
github.com/prometheus/client_model v0.6.2 h1:oBsgwpGs7iVziMvrGhE53c/GrLUsZdHnqNwqPLxwZyk=
github.com/prometheus/client_model v0.6.2/go.mod h1:y3m2F6Gdpfy6Ut/GBsUqTWZqCUvMVzSfMLjcu6wAwpE=
github.com/prometheus/common v0.66.1 h1:h5E0h5/Y8niHc5DlaLlWLArTQI7tMrsfQjHV+d9ZoGs=
github.com/prometheus/common v0.66.1/go.mod h1:gcaUsgf3KfRSwHY4dIMXLPV0K/Wg1oZ8+SbZk/HH/dA=
github.com/prometheus/procfs v0.16.1 h1:hZ15bTNuirocR6u0JZ6BAHHmwS1p8B4P6MRqxtzMyRg=
github.com/prometheus/procfs v0.16.1/go.mod h1:teAbpZRB1iIAJYREa1LsoWUXykVXA1KlTmWl8x/U+Is=
github.com/rogpeppe/go-internal v1.10.0 h1:TMyTOH3F/DB16zRVcYyreMH6GnZZrwQVAoYjRBZyWFQ=
github.com/rogpeppe/go-internal v1.10.0/go.mod h1:UQnix2H7Ngw/k4C5ijL5+65zddjncjaFoBhdsK/akog=
github.com/spf13/cast v1.7.1 h1:cuNEagBQEHWN1FnbGEjCXL2szYEXqfJPbP2HNUaca9Y=
github.com/spf13/cast v1.7.1/go.mod h1:ancEpBxwJDODSW/UG4rDrAqiKolqNNh2DX3mk86cAdo=
github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U=
@@ -43,11 +59,18 @@ github.com/wk8/go-ordered-map/v2 v2.1.8 h1:5h/BUHu93oj4gIdvHHHGsScSTMijfx5PeYkE/
github.com/wk8/go-ordered-map/v2 v2.1.8/go.mod h1:5nJHM5DyteebpVlHnWMV0rPz6Zp7+xBAnxjb1X5vnTw=
github.com/yosida95/uritemplate/v3 v3.0.2 h1:Ed3Oyj9yrmi9087+NczuL5BwkIc4wvTb5zIM+UJPGz4=
github.com/yosida95/uritemplate/v3 v3.0.2/go.mod h1:ILOh0sOhIJR3+L/8afwt/kE++YT040gmv5BQTMR2HP4=
go.uber.org/goleak v1.3.0 h1:2K3zAYmnTNqV73imy9J1T3WC+gmCePx2hEGkimedGto=
go.uber.org/goleak v1.3.0/go.mod h1:CoHD4mav9JJNrW/WLlf7HGZPjdw8EucARQHekz1X6bE=
go.yaml.in/yaml/v2 v2.4.2 h1:DzmwEr2rDGHl7lsFgAHxmNz/1NlQ7xLIrlN2h5d1eGI=
go.yaml.in/yaml/v2 v2.4.2/go.mod h1:081UH+NErpNdqlCXm3TtEran0rJZGxAYx9hb/ELlsPU=
golang.org/x/crypto v0.47.0 h1:V6e3FRj+n4dbpw86FJ8Fv7XVOql7TEwpHapKoMJ/GO8=
golang.org/x/crypto v0.47.0/go.mod h1:ff3Y9VzzKbwSSEzWqJsJVBnWmRwRSHt/6Op5n9bQc4A=
golang.org/x/sys v0.40.0 h1:DBZZqJ2Rkml6QMQsZywtnjnnGvHza6BTfYFWY9kjEWQ=
golang.org/x/sys v0.40.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
google.golang.org/protobuf v1.36.8 h1:xHScyCOEuuwZEc6UtSOvPbAT4zRh0xcNRYekJwfqyMc=
google.golang.org/protobuf v1.36.8/go.mod h1:fuxRtAxBytpl4zzqUh6/eyUujkJdNiuEkXntxiD/uRU=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
||||
|
||||
@@ -28,14 +28,32 @@ type DeployResult struct {
Errors []error
}

// AllSucceeded returns true if all responses indicate success.
// AllSucceeded returns true if all hosts' final responses indicate success.
func (r *DeployResult) AllSucceeded() bool {
if len(r.Errors) > 0 {
return false
}

// Track the final status for each host
finalStatus := make(map[string]messages.Status)
for _, resp := range r.Responses {
if resp.Status != messages.StatusCompleted {
if resp.Status.IsFinal() {
finalStatus[resp.Hostname] = resp.Status
}
}

// Need at least one host with a final status
if len(finalStatus) == 0 {
return false
}

// All final statuses must be completed
for _, status := range finalStatus {
if status != messages.StatusCompleted {
return false
}
}
return len(r.Responses) > 0 && len(r.Errors) == 0
return true
}

// HostCount returns the number of unique hosts that responded.
@@ -67,7 +85,9 @@ func Deploy(ctx context.Context, cfg DeployConfig, onResponse func(*messages.Dep
// Track responses by hostname to handle multiple messages per host
var mu sync.Mutex
result := &DeployResult{}
hostFinal := make(map[string]bool) // track which hosts have sent final status
hostFinal := make(map[string]bool) // track which hosts have sent final status
hostSeen := make(map[string]bool) // track all hosts that have responded
lastResponse := time.Now()

// Subscribe to reply subject
sub, err := client.Subscribe(replySubject, func(subject string, data []byte) {
@@ -81,9 +101,11 @@ func Deploy(ctx context.Context, cfg DeployConfig, onResponse func(*messages.Dep

mu.Lock()
result.Responses = append(result.Responses, resp)
hostSeen[resp.Hostname] = true
if resp.Status.IsFinal() {
hostFinal[resp.Hostname] = true
}
lastResponse = time.Now()
mu.Unlock()

if onResponse != nil {
@@ -119,8 +141,7 @@ func Deploy(ctx context.Context, cfg DeployConfig, onResponse func(*messages.Dep
// Use a dynamic timeout: wait for initial responses, then extend
// timeout after each response until no new responses or max timeout
deadline := time.Now().Add(cfg.Timeout)
lastResponse := time.Now()
idleTimeout := 30 * time.Second // wait this long after last response
idleTimeout := 30 * time.Second // wait this long after last response for new hosts

for {
select {
@@ -128,7 +149,9 @@ func Deploy(ctx context.Context, cfg DeployConfig, onResponse func(*messages.Dep
return result, ctx.Err()
case <-time.After(1 * time.Second):
mu.Lock()
responseCount := len(result.Responses)
seenCount := len(hostSeen)
finalCount := len(hostFinal)
lastResponseTime := lastResponse
mu.Unlock()

now := time.Now()
@@ -138,21 +161,19 @@ func Deploy(ctx context.Context, cfg DeployConfig, onResponse func(*messages.Dep
return result, nil
}

// If we have responses, use idle timeout
if responseCount > 0 {
mu.Lock()
lastResponseTime := lastResponse
// Update lastResponse time if we got new responses
if responseCount > 0 {
// Simple approximation - in practice you'd track this more precisely
lastResponseTime = now
}
mu.Unlock()

if now.Sub(lastResponseTime) > idleTimeout {
// If all hosts that responded have sent final status, we're done
// Add a short grace period for late arrivals from other hosts
if seenCount > 0 && seenCount == finalCount {
// Wait a bit for any other hosts to respond
if now.Sub(lastResponseTime) > 2*time.Second {
return result, nil
}
}

// If we have responses but waiting for more hosts, use idle timeout
if seenCount > 0 && now.Sub(lastResponseTime) > idleTimeout {
return result, nil
}
}
}
}
@@ -49,6 +49,40 @@ func TestDeployResult_AllSucceeded(t *testing.T) {
errors: []error{nil}, // placeholder error
want: false,
},
{
name: "with intermediate responses - success",
responses: []*messages.DeployResponse{
{Hostname: "host1", Status: messages.StatusStarted},
{Hostname: "host1", Status: messages.StatusCompleted},
},
want: true,
},
{
name: "with intermediate responses - failure",
responses: []*messages.DeployResponse{
{Hostname: "host1", Status: messages.StatusStarted},
{Hostname: "host1", Status: messages.StatusFailed},
},
want: false,
},
{
name: "multiple hosts with intermediate responses",
responses: []*messages.DeployResponse{
{Hostname: "host1", Status: messages.StatusStarted},
{Hostname: "host2", Status: messages.StatusStarted},
{Hostname: "host1", Status: messages.StatusCompleted},
{Hostname: "host2", Status: messages.StatusCompleted},
},
want: true,
},
{
name: "only intermediate responses - no final",
responses: []*messages.DeployResponse{
{Hostname: "host1", Status: messages.StatusStarted},
{Hostname: "host1", Status: messages.StatusAccepted},
},
want: false,
},
}

for _, tc := range tests {
@@ -35,6 +35,15 @@ type Result struct {
Error error
}

// ExecuteOptions contains optional settings for Execute.
type ExecuteOptions struct {
// HeartbeatInterval is how often to call the heartbeat callback.
// If zero, no heartbeat is sent.
HeartbeatInterval time.Duration
// HeartbeatCallback is called periodically with elapsed time while the command runs.
HeartbeatCallback func(elapsed time.Duration)
}

// ValidateRevision checks if a revision exists in the remote repository.
// It uses git ls-remote to verify the ref exists.
func (e *Executor) ValidateRevision(ctx context.Context, revision string) error {
@@ -65,6 +74,11 @@ func (e *Executor) ValidateRevision(ctx context.Context, revision string) error

// Execute runs nixos-rebuild with the specified action and revision.
func (e *Executor) Execute(ctx context.Context, action messages.Action, revision string) *Result {
return e.ExecuteWithOptions(ctx, action, revision, nil)
}

// ExecuteWithOptions runs nixos-rebuild with the specified action, revision, and options.
func (e *Executor) ExecuteWithOptions(ctx context.Context, action messages.Action, revision string, opts *ExecuteOptions) *Result {
ctx, cancel := context.WithTimeout(ctx, e.timeout)
defer cancel()

@@ -77,7 +91,41 @@ func (e *Executor) Execute(ctx context.Context, action messages.Action, revision
cmd.Stdout = &stdout
cmd.Stderr = &stderr

err := cmd.Run()
// Start the command
startTime := time.Now()
if err := cmd.Start(); err != nil {
return &Result{
Success: false,
ExitCode: -1,
Error: fmt.Errorf("failed to start command: %w", err),
}
}

// Set up heartbeat if configured
var heartbeatDone chan struct{}
if opts != nil && opts.HeartbeatInterval > 0 && opts.HeartbeatCallback != nil {
heartbeatDone = make(chan struct{})
go func() {
ticker := time.NewTicker(opts.HeartbeatInterval)
defer ticker.Stop()
for {
select {
case <-heartbeatDone:
return
case <-ticker.C:
opts.HeartbeatCallback(time.Since(startTime))
}
}
}()
}

// Wait for command to complete
err := cmd.Wait()

// Stop heartbeat goroutine
if heartbeatDone != nil {
close(heartbeatDone)
}

result := &Result{
Stdout: stdout.String(),
@@ -8,20 +8,26 @@ import (

"git.t-juice.club/torjus/homelab-deploy/internal/deploy"
"git.t-juice.club/torjus/homelab-deploy/internal/messages"
"git.t-juice.club/torjus/homelab-deploy/internal/metrics"
"git.t-juice.club/torjus/homelab-deploy/internal/nats"
)

// Config holds the configuration for the listener.
type Config struct {
Hostname string
Tier string
Role string
NATSUrl string
NKeyFile string
FlakeURL string
Timeout time.Duration
DeploySubjects []string
DiscoverSubject string
Hostname string
Tier string
Role string
NATSUrl string
NKeyFile string
FlakeURL string
Timeout time.Duration
HeartbeatInterval time.Duration
DeploySubjects []string
DiscoverSubject string
MetricsEnabled bool
MetricsAddr string
Version string
Debug bool
}

// Listener handles deployment requests from NATS.
@@ -34,6 +40,14 @@ type Listener struct {

// Expanded subjects for discovery responses
expandedSubjects []string

// restartCh signals that the listener should exit for restart
// (e.g., after a successful switch deployment)
restartCh chan struct{}

// metrics server and collector (nil if metrics disabled)
metricsServer *metrics.Server
metrics *metrics.Collector
}

// New creates a new listener with the given configuration.
@@ -42,16 +56,42 @@ func New(cfg Config, logger *slog.Logger) *Listener {
logger = slog.Default()
}

return &Listener{
cfg: cfg,
executor: deploy.NewExecutor(cfg.FlakeURL, cfg.Hostname, cfg.Timeout),
lock: deploy.NewLock(),
logger: logger,
l := &Listener{
cfg: cfg,
executor: deploy.NewExecutor(cfg.FlakeURL, cfg.Hostname, cfg.Timeout),
lock: deploy.NewLock(),
logger: logger,
restartCh: make(chan struct{}, 1),
}

if cfg.MetricsEnabled {
l.metricsServer = metrics.NewServer(metrics.ServerConfig{
Addr: cfg.MetricsAddr,
Logger: logger,
})
l.metrics = l.metricsServer.Collector()
}

return l
}

// Run starts the listener and blocks until the context is cancelled.
func (l *Listener) Run(ctx context.Context) error {
// Start metrics server if enabled
if l.metricsServer != nil {
if err := l.metricsServer.Start(); err != nil {
return fmt.Errorf("failed to start metrics server: %w", err)
}
defer func() {
shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
_ = l.metricsServer.Shutdown(shutdownCtx)
}()

// Set instance info metric
l.metrics.SetInfo(l.cfg.Hostname, l.cfg.Tier, l.cfg.Role, l.cfg.Version)
}

// Connect to NATS
l.logger.Info("connecting to NATS",
"url", l.cfg.NATSUrl,
@@ -93,9 +133,13 @@ func (l *Listener) Run(ctx context.Context) error {

l.logger.Info("listener started", "deploy_subjects", l.expandedSubjects, "discover_subject", discoverSubject)

// Wait for context cancellation
<-ctx.Done()
l.logger.Info("shutting down listener")
// Wait for context cancellation or restart signal
select {
case <-ctx.Done():
l.logger.Info("shutting down listener")
case <-l.restartCh:
l.logger.Info("exiting for restart after successful switch deployment")
}

return nil
}
@@ -127,6 +171,9 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
messages.StatusRejected,
err.Error(),
).WithError(messages.ErrorInvalidAction))
if l.metrics != nil {
l.metrics.RecordRejection(req.Action, messages.ErrorInvalidAction)
}
return
}

@@ -141,6 +188,9 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
messages.StatusRejected,
"another deployment is already in progress",
).WithError(messages.ErrorAlreadyRunning))
if l.metrics != nil {
l.metrics.RecordRejection(req.Action, messages.ErrorAlreadyRunning)
}
return
}
defer l.lock.Release()
@@ -152,6 +202,19 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
fmt.Sprintf("starting deployment: %s", l.executor.BuildCommand(req.Action, req.Revision)),
))

// Record deployment start for metrics
if l.metrics != nil {
l.logger.Debug("recording deployment start metric",
"metrics_enabled", true,
)
l.metrics.RecordDeploymentStart()
} else {
l.logger.Debug("skipping deployment start metric",
"metrics_enabled", false,
)
}
startTime := time.Now()

// Validate revision
ctx := context.Background()
if err := l.executor.ValidateRevision(ctx, req.Revision); err != nil {
@@ -164,6 +227,20 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
messages.StatusFailed,
fmt.Sprintf("revision validation failed: %v", err),
).WithError(messages.ErrorInvalidRevision))
duration := time.Since(startTime).Seconds()
if l.metrics != nil {
l.logger.Debug("recording deployment failure metric (revision validation)",
"action", req.Action,
"error_code", messages.ErrorInvalidRevision,
"duration_seconds", duration,
)
l.metrics.RecordDeploymentFailure(req.Action, messages.ErrorInvalidRevision, duration)
} else {
l.logger.Debug("skipping deployment failure metric",
"metrics_enabled", false,
"duration_seconds", duration,
)
}
return
}

@@ -174,7 +251,23 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
"command", l.executor.BuildCommand(req.Action, req.Revision),
)

result := l.executor.Execute(ctx, req.Action, req.Revision)
// Set up heartbeat options to send periodic status updates
var opts *deploy.ExecuteOptions
if l.cfg.HeartbeatInterval > 0 {
opts = &deploy.ExecuteOptions{
HeartbeatInterval: l.cfg.HeartbeatInterval,
HeartbeatCallback: func(elapsed time.Duration) {
l.sendResponse(req.ReplyTo, messages.NewDeployResponse(
l.cfg.Hostname,
messages.StatusRunning,
fmt.Sprintf("deployment in progress (%s elapsed)", elapsed.Round(time.Second)),
))
},
}
}

result := l.executor.ExecuteWithOptions(ctx, req.Action, req.Revision, opts)
duration := time.Since(startTime).Seconds()

if result.Success {
l.logger.Info("deployment completed successfully",
@@ -185,6 +278,43 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
messages.StatusCompleted,
"deployment completed successfully",
))
// Flush to ensure the completed response is sent before we potentially restart
if err := l.client.Flush(); err != nil {
l.logger.Error("failed to flush completed response", "error", err)
}
if l.metrics != nil {
l.logger.Debug("recording deployment end metric (success)",
"action", req.Action,
"success", true,
"duration_seconds", duration,
)
l.metrics.RecordDeploymentEnd(req.Action, true, duration)
} else {
l.logger.Debug("skipping deployment end metric",
"metrics_enabled", false,
"duration_seconds", duration,
)
}

// After a successful switch, signal restart so we pick up any new version
if req.Action == messages.ActionSwitch {
// Wait for metrics scrape before restarting (if metrics enabled)
if l.metricsServer != nil {
l.logger.Info("waiting for metrics scrape before restart")
select {
case <-l.metricsServer.ScrapeCh():
l.logger.Info("metrics scraped, proceeding with restart")
case <-time.After(60 * time.Second):
l.logger.Warn("no metrics scrape within timeout, proceeding with restart anyway")
}
}

select {
case l.restartCh <- struct{}{}:
default:
// Channel already has a signal pending
}
}
} else {
l.logger.Error("deployment failed",
"exit_code", result.ExitCode,
@@ -202,6 +332,19 @@ func (l *Listener) handleDeployRequest(subject string, data []byte) {
messages.StatusFailed,
fmt.Sprintf("deployment failed (exit code %d): %s", result.ExitCode, result.Stderr),
).WithError(errorCode))
if l.metrics != nil {
l.logger.Debug("recording deployment failure metric",
"action", req.Action,
"error_code", errorCode,
"duration_seconds", duration,
)
l.metrics.RecordDeploymentFailure(req.Action, errorCode, duration)
} else {
l.logger.Debug("skipping deployment failure metric",
"metrics_enabled", false,
"duration_seconds", duration,
)
}
}
}
@@ -2,8 +2,14 @@ package listener

import (
"log/slog"
"strings"
"testing"
"time"

"git.t-juice.club/torjus/homelab-deploy/internal/messages"
"git.t-juice.club/torjus/homelab-deploy/internal/metrics"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/testutil"
)

func TestNew(t *testing.T) {
@@ -51,3 +57,148 @@ func TestNew_WithLogger(t *testing.T) {
t.Error("should use provided logger")
}
}

func TestNew_WithMetricsEnabled(t *testing.T) {
cfg := Config{
Hostname: "test-host",
Tier: "test",
MetricsEnabled: true,
MetricsAddr: ":0",
}

l := New(cfg, nil)

if l.metricsServer == nil {
t.Error("metricsServer should not be nil when MetricsEnabled is true")
}
if l.metrics == nil {
t.Error("metrics should not be nil when MetricsEnabled is true")
}
}

func TestListener_MetricsRecordedOnDeployment(t *testing.T) {
// This test verifies that the listener correctly calls metrics functions
// when processing deployments. We test this by directly calling the internal
// metrics recording logic that handleDeployRequest uses.

reg := prometheus.NewRegistry()
collector := metrics.NewCollector(reg)

// Simulate what handleDeployRequest does for a successful deployment
collector.RecordDeploymentStart()
collector.RecordDeploymentEnd(messages.ActionSwitch, true, 120.5)

// Verify counter was incremented
counterExpected := `
# HELP homelab_deploy_deployments_total Total deployment requests processed
# TYPE homelab_deploy_deployments_total counter
homelab_deploy_deployments_total{action="boot",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="boot",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="completed"} 1
homelab_deploy_deployments_total{action="switch",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="failed"} 0
`
if err := testutil.GatherAndCompare(reg, strings.NewReader(counterExpected), "homelab_deploy_deployments_total"); err != nil {
t.Errorf("unexpected counter metrics: %v", err)
}

// Verify histogram was updated (120.5 seconds falls into le="300" and higher buckets)
histogramExpected := `
# HELP homelab_deploy_deployment_duration_seconds Deployment execution time
# TYPE homelab_deploy_deployment_duration_seconds histogram
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="switch",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="300"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="600"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="900"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1200"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1800"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="+Inf"} 1
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="true"} 120.5
homelab_deploy_deployment_duration_seconds_count{action="switch",success="true"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="test",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="test",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="test",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="test",success="true"} 0
`
if err := testutil.GatherAndCompare(reg, strings.NewReader(histogramExpected), "homelab_deploy_deployment_duration_seconds"); err != nil {
t.Errorf("unexpected histogram metrics: %v", err)
}
}
@@ -35,6 +35,7 @@ const (
StatusAccepted Status = "accepted"
StatusRejected Status = "rejected"
StatusStarted Status = "started"
StatusRunning Status = "running"
StatusCompleted Status = "completed"
StatusFailed Status = "failed"
)
125
internal/metrics/metrics.go
Normal file
125
internal/metrics/metrics.go
Normal file
@@ -0,0 +1,125 @@
// Package metrics provides Prometheus metrics for the homelab-deploy listener.
package metrics

import (
	"git.t-juice.club/torjus/homelab-deploy/internal/messages"
	"github.com/prometheus/client_golang/prometheus"
)

// Collector holds all Prometheus metrics for the listener.
type Collector struct {
	deploymentsTotal     *prometheus.CounterVec
	deploymentDuration   *prometheus.HistogramVec
	deploymentInProgress prometheus.Gauge
	info                 *prometheus.GaugeVec
}

// NewCollector creates a new metrics collector and registers it with the given registerer.
func NewCollector(reg prometheus.Registerer) *Collector {
	c := &Collector{
		deploymentsTotal: prometheus.NewCounterVec(
			prometheus.CounterOpts{
				Name: "homelab_deploy_deployments_total",
				Help: "Total deployment requests processed",
			},
			[]string{"status", "action", "error_code"},
		),
		deploymentDuration: prometheus.NewHistogramVec(
			prometheus.HistogramOpts{
				Name: "homelab_deploy_deployment_duration_seconds",
				Help: "Deployment execution time",
				// Bucket boundaries for typical NixOS build times
				Buckets: []float64{30, 60, 120, 300, 600, 900, 1200, 1800},
			},
			[]string{"action", "success"},
		),
		deploymentInProgress: prometheus.NewGauge(
			prometheus.GaugeOpts{
				Name: "homelab_deploy_deployment_in_progress",
				Help: "1 if deployment running, 0 otherwise",
			},
		),
		info: prometheus.NewGaugeVec(
			prometheus.GaugeOpts{
				Name: "homelab_deploy_info",
				Help: "Static instance metadata",
			},
			[]string{"hostname", "tier", "role", "version"},
		),
	}

	reg.MustRegister(c.deploymentsTotal)
	reg.MustRegister(c.deploymentDuration)
	reg.MustRegister(c.deploymentInProgress)
	reg.MustRegister(c.info)

	c.initMetrics()

	return c
}

// initMetrics initializes all metric label combinations with zero values.
// This ensures metrics appear in Prometheus scrapes before any deployments occur.
func (c *Collector) initMetrics() {
	actions := []messages.Action{
		messages.ActionSwitch,
		messages.ActionBoot,
		messages.ActionTest,
		messages.ActionDryActivate,
	}

	// Initialize deployment counter for common status/action combinations
	for _, action := range actions {
		// Successful completions (no error code)
		c.deploymentsTotal.WithLabelValues("completed", string(action), "")
		// Failed deployments (no error code - from RecordDeploymentEnd)
		c.deploymentsTotal.WithLabelValues("failed", string(action), "")
	}

	// Initialize histogram for all action/success combinations
	for _, action := range actions {
		c.deploymentDuration.WithLabelValues(string(action), "true")
		c.deploymentDuration.WithLabelValues(string(action), "false")
	}
}

// SetInfo sets the static instance metadata.
func (c *Collector) SetInfo(hostname, tier, role, version string) {
	c.info.WithLabelValues(hostname, tier, role, version).Set(1)
}

// RecordDeploymentStart marks the start of a deployment.
func (c *Collector) RecordDeploymentStart() {
	c.deploymentInProgress.Set(1)
}

// RecordDeploymentEnd records the completion of a deployment.
func (c *Collector) RecordDeploymentEnd(action messages.Action, success bool, durationSeconds float64) {
	c.deploymentInProgress.Set(0)

	successLabel := "false"
	if success {
		successLabel = "true"
	}

	c.deploymentDuration.WithLabelValues(string(action), successLabel).Observe(durationSeconds)

	status := "completed"
	if !success {
		status = "failed"
	}

	c.deploymentsTotal.WithLabelValues(status, string(action), "").Inc()
}

// RecordDeploymentFailure records a deployment failure with an error code.
func (c *Collector) RecordDeploymentFailure(action messages.Action, errorCode messages.ErrorCode, durationSeconds float64) {
	c.deploymentInProgress.Set(0)
	c.deploymentDuration.WithLabelValues(string(action), "false").Observe(durationSeconds)
	c.deploymentsTotal.WithLabelValues("failed", string(action), string(errorCode)).Inc()
}

// RecordRejection records a rejected deployment request.
func (c *Collector) RecordRejection(action messages.Action, errorCode messages.ErrorCode) {
	c.deploymentsTotal.WithLabelValues("rejected", string(action), string(errorCode)).Inc()
}
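The bucket counts asserted by the histogram tests in `metrics_test.go` follow Prometheus's cumulative histogram semantics: each `le` bucket counts every observation less than or equal to its upper bound, and `+Inf` counts all observations. The stdlib-only Go sketch below reproduces that counting for the `deploymentDuration` boundaries. `cumulativeBuckets` is a hypothetical helper written for illustration only; it is not part of this package or of `client_golang`, which does its own bookkeeping internally.

```go
package main

import "fmt"

// cumulativeBuckets is a hypothetical helper (not part of client_golang) that
// mimics how a Prometheus histogram reports bucket counts: bucket i counts
// every observation v with v <= bounds[i], and the extra final slot is the
// implicit +Inf bucket, which counts every observation.
func cumulativeBuckets(bounds, observations []float64) []int {
	counts := make([]int, len(bounds)+1)
	for _, v := range observations {
		for i, le := range bounds {
			if v <= le { // upper bounds are inclusive
				counts[i]++
			}
		}
		counts[len(bounds)]++ // +Inf bucket
	}
	return counts
}

func main() {
	// Same boundaries as homelab_deploy_deployment_duration_seconds.
	bounds := []float64{30, 60, 120, 300, 600, 900, 1200, 1800}

	// Durations used by the RecordDeploymentEnd/RecordDeploymentFailure tests.
	for _, d := range []float64{120.5, 60.0, 300.0} {
		counts := cumulativeBuckets(bounds, []float64{d})
		fmt.Printf("%6.1fs:", d)
		for i, le := range bounds {
			fmt.Printf(" le=%g:%d", le, counts[i])
		}
		fmt.Printf(" le=+Inf:%d\n", counts[len(bounds)])
	}
}
```

Because the bounds are inclusive, a 60.0 s deployment is already counted in `le="60"`, while a 120.5 s deployment first appears in `le="300"`, which is what the comments in the tests describe.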
650
internal/metrics/metrics_test.go
Normal file
@@ -0,0 +1,650 @@
package metrics

import (
	"context"
	"io"
	"net/http"
	"strings"
	"testing"
	"time"

	"git.t-juice.club/torjus/homelab-deploy/internal/messages"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/testutil"
)

func TestCollector_SetInfo(t *testing.T) {
	reg := prometheus.NewRegistry()
	c := NewCollector(reg)

	c.SetInfo("testhost", "test", "web", "1.0.0")

	expected := `
# HELP homelab_deploy_info Static instance metadata
# TYPE homelab_deploy_info gauge
homelab_deploy_info{hostname="testhost",role="web",tier="test",version="1.0.0"} 1
`
	if err := testutil.GatherAndCompare(reg, strings.NewReader(expected), "homelab_deploy_info"); err != nil {
		t.Errorf("unexpected metrics: %v", err)
	}
}

func TestCollector_RecordDeploymentStart(t *testing.T) {
	reg := prometheus.NewRegistry()
	c := NewCollector(reg)

	c.RecordDeploymentStart()

	expected := `
# HELP homelab_deploy_deployment_in_progress 1 if deployment running, 0 otherwise
# TYPE homelab_deploy_deployment_in_progress gauge
homelab_deploy_deployment_in_progress 1
`
	if err := testutil.GatherAndCompare(reg, strings.NewReader(expected), "homelab_deploy_deployment_in_progress"); err != nil {
		t.Errorf("unexpected metrics: %v", err)
	}
}

func TestCollector_RecordDeploymentEnd_Success(t *testing.T) {
	reg := prometheus.NewRegistry()
	c := NewCollector(reg)

	c.RecordDeploymentStart()
	c.RecordDeploymentEnd(messages.ActionSwitch, true, 120.5)

	// Check in_progress is 0
	inProgressExpected := `
# HELP homelab_deploy_deployment_in_progress 1 if deployment running, 0 otherwise
# TYPE homelab_deploy_deployment_in_progress gauge
homelab_deploy_deployment_in_progress 0
`
	if err := testutil.GatherAndCompare(reg, strings.NewReader(inProgressExpected), "homelab_deploy_deployment_in_progress"); err != nil {
		t.Errorf("unexpected in_progress metrics: %v", err)
	}

	// Check counter incremented (includes all pre-initialized metrics)
	counterExpected := `
# HELP homelab_deploy_deployments_total Total deployment requests processed
# TYPE homelab_deploy_deployments_total counter
homelab_deploy_deployments_total{action="boot",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="boot",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="completed"} 1
homelab_deploy_deployments_total{action="switch",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="failed"} 0
`
	if err := testutil.GatherAndCompare(reg, strings.NewReader(counterExpected), "homelab_deploy_deployments_total"); err != nil {
		t.Errorf("unexpected counter metrics: %v", err)
	}

	// Check histogram recorded the duration (120.5 seconds falls into le="300" and higher buckets)
	histogramExpected := `
# HELP homelab_deploy_deployment_duration_seconds Deployment execution time
# TYPE homelab_deploy_deployment_duration_seconds histogram
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="switch",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="300"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="600"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="900"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1200"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1800"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="+Inf"} 1
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="true"} 120.5
homelab_deploy_deployment_duration_seconds_count{action="switch",success="true"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="test",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="test",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="test",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="test",success="true"} 0
`
	if err := testutil.GatherAndCompare(reg, strings.NewReader(histogramExpected), "homelab_deploy_deployment_duration_seconds"); err != nil {
		t.Errorf("unexpected histogram metrics: %v", err)
	}
}

func TestCollector_RecordDeploymentEnd_Failure(t *testing.T) {
	reg := prometheus.NewRegistry()
	c := NewCollector(reg)

	c.RecordDeploymentStart()
	c.RecordDeploymentEnd(messages.ActionBoot, false, 60.0)

	counterExpected := `
# HELP homelab_deploy_deployments_total Total deployment requests processed
# TYPE homelab_deploy_deployments_total counter
homelab_deploy_deployments_total{action="boot",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="boot",error_code="",status="failed"} 1
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="failed"} 0
`
	if err := testutil.GatherAndCompare(reg, strings.NewReader(counterExpected), "homelab_deploy_deployments_total"); err != nil {
		t.Errorf("unexpected counter metrics: %v", err)
	}

	// Check histogram recorded the duration (60.0 seconds falls into le="60" and higher buckets)
	histogramExpected := `
# HELP homelab_deploy_deployment_duration_seconds Deployment execution time
# TYPE homelab_deploy_deployment_duration_seconds histogram
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="60"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="120"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="300"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="600"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="900"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1200"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1800"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="+Inf"} 1
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="false"} 60
homelab_deploy_deployment_duration_seconds_count{action="boot",success="false"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="switch",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="switch",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="test",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="test",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="test",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="test",success="true"} 0
`
	if err := testutil.GatherAndCompare(reg, strings.NewReader(histogramExpected), "homelab_deploy_deployment_duration_seconds"); err != nil {
		t.Errorf("unexpected histogram metrics: %v", err)
	}
}

func TestCollector_RecordDeploymentFailure(t *testing.T) {
	reg := prometheus.NewRegistry()
	c := NewCollector(reg)

	c.RecordDeploymentStart()
	c.RecordDeploymentFailure(messages.ActionSwitch, messages.ErrorBuildFailed, 300.0)

	counterExpected := `
# HELP homelab_deploy_deployments_total Total deployment requests processed
# TYPE homelab_deploy_deployments_total counter
homelab_deploy_deployments_total{action="boot",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="boot",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="switch",error_code="build_failed",status="failed"} 1
homelab_deploy_deployments_total{action="test",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="failed"} 0
`
	if err := testutil.GatherAndCompare(reg, strings.NewReader(counterExpected), "homelab_deploy_deployments_total"); err != nil {
		t.Errorf("unexpected counter metrics: %v", err)
	}

	// Check histogram recorded the duration (300.0 seconds falls into le="300" and higher buckets)
	histogramExpected := `
# HELP homelab_deploy_deployment_duration_seconds Deployment execution time
# TYPE homelab_deploy_deployment_duration_seconds histogram
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="300"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="600"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="900"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1200"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1800"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="+Inf"} 1
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="false"} 300
homelab_deploy_deployment_duration_seconds_count{action="switch",success="false"} 1
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="60"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="120"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="300"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="600"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="900"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1200"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1800"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="+Inf"} 0
|
||||
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="true"} 0
|
||||
homelab_deploy_deployment_duration_seconds_count{action="switch",success="true"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="30"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="60"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="120"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="300"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="600"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="900"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1200"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1800"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="+Inf"} 0
|
||||
homelab_deploy_deployment_duration_seconds_sum{action="test",success="false"} 0
|
||||
homelab_deploy_deployment_duration_seconds_count{action="test",success="false"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="30"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="60"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="120"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="300"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="600"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="900"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1200"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1800"} 0
|
||||
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="+Inf"} 0
|
||||
homelab_deploy_deployment_duration_seconds_sum{action="test",success="true"} 0
|
||||
homelab_deploy_deployment_duration_seconds_count{action="test",success="true"} 0
|
||||
`
|
||||
if err := testutil.GatherAndCompare(reg, strings.NewReader(histogramExpected), "homelab_deploy_deployment_duration_seconds"); err != nil {
|
||||
t.Errorf("unexpected histogram metrics: %v", err)
|
||||
}
|
||||
}
|
||||

func TestCollector_RecordRejection(t *testing.T) {
	reg := prometheus.NewRegistry()
	c := NewCollector(reg)

	c.RecordRejection(messages.ActionSwitch, messages.ErrorAlreadyRunning)

	expected := `
# HELP homelab_deploy_deployments_total Total deployment requests processed
# TYPE homelab_deploy_deployments_total counter
homelab_deploy_deployments_total{action="boot",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="boot",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="switch",error_code="already_running",status="rejected"} 1
homelab_deploy_deployments_total{action="test",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="failed"} 0
`
	if err := testutil.GatherAndCompare(reg, strings.NewReader(expected), "homelab_deploy_deployments_total"); err != nil {
		t.Errorf("unexpected metrics: %v", err)
	}
}

func TestCollector_MetricsInitializedAtStartup(t *testing.T) {
	reg := prometheus.NewRegistry()
	_ = NewCollector(reg)

	// Verify counter metrics are initialized with zero values before any deployments
	counterExpected := `
# HELP homelab_deploy_deployments_total Total deployment requests processed
# TYPE homelab_deploy_deployments_total counter
homelab_deploy_deployments_total{action="boot",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="boot",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="dry-activate",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="switch",error_code="",status="failed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="completed"} 0
homelab_deploy_deployments_total{action="test",error_code="",status="failed"} 0
`
	if err := testutil.GatherAndCompare(reg, strings.NewReader(counterExpected), "homelab_deploy_deployments_total"); err != nil {
		t.Errorf("counter metrics not initialized: %v", err)
	}

	// Verify histogram metrics are initialized with zero values before any deployments
	histogramExpected := `
# HELP homelab_deploy_deployment_duration_seconds Deployment execution time
# TYPE homelab_deploy_deployment_duration_seconds histogram
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="boot",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="boot",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="dry-activate",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="dry-activate",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="switch",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="switch",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="switch",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="switch",success="true"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="false",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="test",success="false"} 0
homelab_deploy_deployment_duration_seconds_count{action="test",success="false"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="30"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="60"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="120"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="300"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="600"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="900"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1200"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="1800"} 0
homelab_deploy_deployment_duration_seconds_bucket{action="test",success="true",le="+Inf"} 0
homelab_deploy_deployment_duration_seconds_sum{action="test",success="true"} 0
homelab_deploy_deployment_duration_seconds_count{action="test",success="true"} 0
`
	if err := testutil.GatherAndCompare(reg, strings.NewReader(histogramExpected), "homelab_deploy_deployment_duration_seconds"); err != nil {
		t.Errorf("histogram metrics not initialized: %v", err)
	}
}

func TestServer_StartShutdown(t *testing.T) {
	srv := NewServer(ServerConfig{
		Addr: ":0", // Let OS pick a free port
	})

	if err := srv.Start(); err != nil {
		t.Fatalf("failed to start server: %v", err)
	}

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	if err := srv.Shutdown(ctx); err != nil {
		t.Errorf("failed to shutdown server: %v", err)
	}
}

func TestServer_Endpoints(t *testing.T) {
	srv := NewServer(ServerConfig{
		Addr: "127.0.0.1:19972", // Use a fixed port for testing
	})

	if err := srv.Start(); err != nil {
		t.Fatalf("failed to start server: %v", err)
	}

	defer func() {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		_ = srv.Shutdown(ctx)
	}()

	// Give server time to start
	time.Sleep(50 * time.Millisecond)

	t.Run("health endpoint", func(t *testing.T) {
		resp, err := http.Get("http://127.0.0.1:19972/health")
		if err != nil {
			t.Fatalf("failed to get health endpoint: %v", err)
		}
		defer func() { _ = resp.Body.Close() }()

		if resp.StatusCode != http.StatusOK {
			t.Errorf("expected status 200, got %d", resp.StatusCode)
		}

		body, _ := io.ReadAll(resp.Body)
		if string(body) != "ok" {
			t.Errorf("expected body 'ok', got %q", string(body))
		}
	})

	t.Run("metrics endpoint", func(t *testing.T) {
		// Set some info to have metrics to display
		srv.Collector().SetInfo("testhost", "test", "web", "1.0.0")

		resp, err := http.Get("http://127.0.0.1:19972/metrics")
		if err != nil {
			t.Fatalf("failed to get metrics endpoint: %v", err)
		}
		defer func() { _ = resp.Body.Close() }()

		if resp.StatusCode != http.StatusOK {
			t.Errorf("expected status 200, got %d", resp.StatusCode)
		}

		body, _ := io.ReadAll(resp.Body)
		bodyStr := string(body)

		if !strings.Contains(bodyStr, "homelab_deploy_info") {
			t.Error("expected metrics to contain homelab_deploy_info")
		}
	})
}

func TestServer_Collector(t *testing.T) {
	srv := NewServer(ServerConfig{
		Addr: ":0",
	})

	collector := srv.Collector()
	if collector == nil {
		t.Error("expected non-nil collector")
	}
}
102
internal/metrics/server.go
Normal file
@@ -0,0 +1,102 @@
package metrics

import (
	"context"
	"fmt"
	"log/slog"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// ServerConfig holds configuration for the metrics server.
type ServerConfig struct {
	Addr   string
	Logger *slog.Logger
}

// Server serves Prometheus metrics over HTTP.
type Server struct {
	httpServer *http.Server
	registry   *prometheus.Registry
	collector  *Collector
	logger     *slog.Logger
	scrapeCh   chan struct{}
}

// NewServer creates a new metrics server.
func NewServer(cfg ServerConfig) *Server {
	logger := cfg.Logger
	if logger == nil {
		logger = slog.Default()
	}

	registry := prometheus.NewRegistry()
	collector := NewCollector(registry)

	scrapeCh := make(chan struct{}, 1)

	metricsHandler := promhttp.HandlerFor(registry, promhttp.HandlerOpts{
		Registry: registry,
	})

	mux := http.NewServeMux()
	mux.Handle("/metrics", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		metricsHandler.ServeHTTP(w, r)
		// Signal that a scrape occurred (non-blocking)
		select {
		case scrapeCh <- struct{}{}:
		default:
		}
	}))
	mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte("ok"))
	})

	return &Server{
		httpServer: &http.Server{
			Addr:              cfg.Addr,
			Handler:           mux,
			ReadHeaderTimeout: 10 * time.Second,
		},
		registry:  registry,
		collector: collector,
		logger:    logger,
		scrapeCh:  scrapeCh,
	}
}

// Collector returns the metrics collector.
func (s *Server) Collector() *Collector {
	return s.collector
}

// ScrapeCh returns a channel that receives a signal each time the metrics endpoint is scraped.
func (s *Server) ScrapeCh() <-chan struct{} {
	return s.scrapeCh
}

// Start starts the HTTP server in a goroutine.
func (s *Server) Start() error {
	s.logger.Info("starting metrics server", "addr", s.httpServer.Addr)

	go func() {
		if err := s.httpServer.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			s.logger.Error("metrics server error", "error", err)
		}
	}()

	return nil
}

// Shutdown gracefully shuts down the server.
func (s *Server) Shutdown(ctx context.Context) error {
	s.logger.Info("shutting down metrics server")
	if err := s.httpServer.Shutdown(ctx); err != nil {
		return fmt.Errorf("failed to shutdown metrics server: %w", err)
	}
	return nil
}
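The `/metrics` handler signals `scrapeCh` with a `select`/`default` send on a channel of capacity 1. A scrape therefore never blocks on a slow or absent reader, and any number of scrapes between two reads collapse into a single pending signal. Below is a stdlib sketch of that pattern in isolation. `notify` is a hypothetical name; the server inlines the same `select`.

```go
package main

import "fmt"

// notify performs a non-blocking send: if a signal is already pending in the
// buffered channel, the new one is dropped instead of blocking the caller.
func notify(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan struct{}, 1)

	// Three "scrapes" with no reader: only the first one is buffered.
	fmt.Println(notify(ch), notify(ch), notify(ch)) // true false false

	// Draining the channel leaves room for exactly one new signal.
	<-ch
	fmt.Println(len(ch), notify(ch)) // 0 true
}
```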
@@ -25,6 +25,15 @@ type Client struct {

// Connect establishes a connection to NATS using NKey authentication.
func Connect(cfg Config) (*Client, error) {
	// Verify NKey file has secure permissions (no group/other access)
	info, err := os.Stat(cfg.NKeyFile)
	if err != nil {
		return nil, fmt.Errorf("failed to stat nkey file: %w", err)
	}
	if perm := info.Mode().Perm(); perm&0o077 != 0 {
		return nil, fmt.Errorf("nkey file has insecure permissions %04o: must not be accessible by group or others", perm)
	}

	seed, err := os.ReadFile(cfg.NKeyFile)
	if err != nil {
		return nil, fmt.Errorf("failed to read nkey file: %w", err)
@@ -21,6 +21,29 @@ func TestConnect_InvalidNKeyFile(t *testing.T) {
	}
}

func TestConnect_InsecureNKeyFilePermissions(t *testing.T) {
	// Create a temp file with insecure permissions
	tmpDir := t.TempDir()
	keyFile := filepath.Join(tmpDir, "insecure.nkey")
	if err := os.WriteFile(keyFile, []byte("test-content"), 0644); err != nil {
		t.Fatalf("failed to write temp file: %v", err)
	}

	cfg := Config{
		URL:      "nats://localhost:4222",
		NKeyFile: keyFile,
		Name:     "test",
	}

	_, err := Connect(cfg)
	if err == nil {
		t.Error("expected error for insecure nkey file permissions")
	}
	if err != nil && !contains(err.Error(), "insecure permissions") {
		t.Errorf("expected insecure permissions error, got: %v", err)
	}
}

func TestConnect_InvalidNKeySeed(t *testing.T) {
	// Create a temp file with invalid content
	tmpDir := t.TempDir()
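`Connect` now rejects an NKey seed file whose permission bits grant anything to group or others. `perm&0o077` isolates the group and other bits, so `0600` and `0400` pass while `0640` or `0644` fail. Below is a standalone sketch of the same check on an `os.FileMode` value. `checkSeedPerm` is a hypothetical helper, not a function in the package.

```go
package main

import (
	"fmt"
	"os"
)

// checkSeedPerm rejects modes that give group or others any access,
// mirroring the perm&0o077 test in Connect.
func checkSeedPerm(mode os.FileMode) error {
	if perm := mode.Perm(); perm&0o077 != 0 {
		return fmt.Errorf("insecure permissions %04o", perm)
	}
	return nil
}

func main() {
	for _, m := range []os.FileMode{0o600, 0o400, 0o640, 0o644} {
		fmt.Printf("%04o -> %v\n", m, checkSeedPerm(m))
	}
}
```

`os.WriteFile` applies the process umask to the mode it is given. Under the common `022` umask, the test's `0644` file therefore stays group- and other-readable and triggers the error. A stricter umask such as `077` would create it as `0600`.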
@@ -1,3 +1,4 @@
{ self }:
{ config, lib, pkgs, ... }:

let
@@ -14,14 +15,30 @@ let
    "--discover-subject ${lib.escapeShellArg cfg.discoverSubject}"
  ]
  ++ lib.optional (cfg.role != null) "--role ${lib.escapeShellArg cfg.role}"
  ++ map (s: "--deploy-subject ${lib.escapeShellArg s}") cfg.deploySubjects);
  ++ map (s: "--deploy-subject ${lib.escapeShellArg s}") cfg.deploySubjects
  ++ lib.optionals cfg.metrics.enable [
    "--metrics-enabled"
    "--metrics-addr ${lib.escapeShellArg cfg.metrics.address}"
  ]
  ++ cfg.extraArgs);

  # Extract port from metrics address for firewall rule
  metricsPort = let
    addr = cfg.metrics.address;
    # Handle both ":9972" and "0.0.0.0:9972" formats
    parts = lib.splitString ":" addr;
  in lib.toInt (lib.last parts);

in
{
  options.services.homelab-deploy.listener = {
    enable = lib.mkEnableOption "homelab-deploy listener service";

    package = lib.mkPackageOption pkgs "homelab-deploy" { };
    package = lib.mkOption {
      type = lib.types.package;
      default = self.packages.${pkgs.system}.homelab-deploy;
      description = "The homelab-deploy package to use";
    };

    hostname = lib.mkOption {
      type = lib.types.str;
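`metricsPort` takes whatever follows the last `:` in `metrics.address` and converts it with `lib.toInt`. So `":9972"` and `"0.0.0.0:9972"` both yield 9972, and a bracketed IPv6 address such as `"[::1]:9972"` also works. An address with no port fails evaluation, because `lib.toInt` rejects non-numeric input. The same rule is sketched below in Go for illustration. `portFromAddr` is a hypothetical name; the module does this in Nix.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// portFromAddr mirrors `lib.toInt (lib.last (lib.splitString ":" addr))`:
// split on every ':' and parse the final segment as an integer.
func portFromAddr(addr string) (int, error) {
	parts := strings.Split(addr, ":") // always at least one element
	return strconv.Atoi(parts[len(parts)-1])
}

func main() {
	for _, a := range []string{":9972", "0.0.0.0:9972", "[::1]:9972", "localhost"} {
		p, err := portFromAddr(a)
		fmt.Println(a, p, err)
	}
}
```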
@@ -89,6 +106,30 @@ in
      description = "Additional environment variables for the service";
      example = { GIT_SSH_COMMAND = "ssh -i /run/secrets/deploy-key"; };
    };

    metrics = {
      enable = lib.mkEnableOption "Prometheus metrics endpoint";

      address = lib.mkOption {
        type = lib.types.str;
        default = ":9972";
        description = "Address for Prometheus metrics HTTP server";
        example = "127.0.0.1:9972";
      };

      openFirewall = lib.mkOption {
        type = lib.types.bool;
        default = false;
        description = "Open firewall for metrics port";
      };
    };

    extraArgs = lib.mkOption {
      type = lib.types.listOf lib.types.str;
      default = [ ];
      description = "Extra command line arguments to pass to the listener";
      example = [ "--debug" ];
    };
  };

  config = lib.mkIf cfg.enable {
@@ -98,35 +139,36 @@ in
      after = [ "network-online.target" ];
      wants = [ "network-online.target" ];

      environment = cfg.environment;
      # Prevent self-interruption during nixos-rebuild switch
      # The service will continue running the old version until manually restarted
      stopIfChanged = false;
      restartIfChanged = false;

      environment = cfg.environment // {
        # Nix needs a writable cache for git flake fetching
        XDG_CACHE_HOME = "/var/cache/homelab-deploy";
      };

      path = [ pkgs.git config.system.build.nixos-rebuild ];

      serviceConfig = {
        CacheDirectory = "homelab-deploy";
        Type = "simple";
        ExecStart = "${cfg.package}/bin/homelab-deploy listener ${args}";
        Restart = "always";
        RestartSec = 10;

        # Hardening (compatible with nixos-rebuild requirements)
        # Note: Some options are relaxed because nixos-rebuild requires:
        # Minimal hardening - nixos-rebuild requires broad system access:
        # - Write access to /nix/store for building
        # - Kernel namespace support for nix sandbox builds
        # - Ability to activate system configurations
        # - Network access for fetching from git/cache
        # - Namespace support for nix sandbox builds
        NoNewPrivileges = false;
        ProtectSystem = "false";
        ProtectHome = "read-only";
        PrivateTmp = true;
        PrivateDevices = true;
        ProtectKernelTunables = true;
        ProtectKernelModules = true;
        ProtectControlGroups = true;
        RestrictAddressFamilies = [ "AF_UNIX" "AF_INET" "AF_INET6" ];
        RestrictNamespaces = false;
        RestrictSUIDSGID = true;
        LockPersonality = true;
        MemoryDenyWriteExecute = false;
        SystemCallArchitectures = "native";
        # Following the approach of nixos auto-upgrade which has no hardening
      };
    };

    networking.firewall.allowedTCPPorts = lib.mkIf (cfg.metrics.enable && cfg.metrics.openFirewall) [
      metricsPort
    ];
  };
}
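The line `environment = cfg.environment // { XDG_CACHE_HOME = ...; };` merges the user's `environment` option with a fixed cache path. Nix's `//` is a shallow, right-biased attribute-set union. As a result, `XDG_CACHE_HOME` always points at `/var/cache/homelab-deploy`, the directory systemd creates for `CacheDirectory = "homelab-deploy"`, even if the user sets that variable themselves. A Go analogue of that merge over flat string maps is shown below for illustration only. `mergeRightBiased` is a hypothetical name.

```go
package main

import "fmt"

// mergeRightBiased behaves like Nix's `left // right` on flat attribute sets:
// keys from left are kept unless right defines them, in which case right wins.
func mergeRightBiased(left, right map[string]string) map[string]string {
	out := make(map[string]string, len(left)+len(right))
	for k, v := range left {
		out[k] = v
	}
	for k, v := range right {
		out[k] = v
	}
	return out
}

func main() {
	user := map[string]string{
		"GIT_SSH_COMMAND": "ssh -i /run/secrets/deploy-key",
		"XDG_CACHE_HOME":  "/tmp/cache",
	}
	fixed := map[string]string{"XDG_CACHE_HOME": "/var/cache/homelab-deploy"}

	env := mergeRightBiased(user, fixed)
	fmt.Println(env["GIT_SSH_COMMAND"])
	fmt.Println(env["XDG_CACHE_HOME"])
}
```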