feat: add Prometheus metrics to listener service

Add an optional Prometheus metrics HTTP endpoint to the listener for
monitoring deployment operations. Includes four metrics:

- homelab_deploy_deployments_total (counter with status/action/error_code)
- homelab_deploy_deployment_duration_seconds (histogram with action/success)
- homelab_deploy_deployment_in_progress (gauge)
- homelab_deploy_info (gauge with hostname/tier/role/version)

New CLI flags: --metrics-enabled, --metrics-addr (default :9972)
New NixOS options: metrics.enable, metrics.address, metrics.openFirewall

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
2026-02-07 07:58:22 +01:00
parent 56365835c7
commit 79db119d1c
10 changed files with 613 additions and 9 deletions

View File

@@ -61,6 +61,8 @@ homelab-deploy listener \
| `--timeout` | No | Deployment timeout in seconds (default: 600) |
| `--deploy-subject` | No | NATS subjects to subscribe to (repeatable) |
| `--discover-subject` | No | Discovery subject (default: `deploy.discover`) |
| `--metrics-enabled` | No | Enable Prometheus metrics endpoint |
| `--metrics-addr` | No | Metrics HTTP server address (default: `:9972`) |
#### Subject Templates
@@ -209,6 +211,9 @@ Add the module to your NixOS configuration:
| `deploySubjects` | list of string | see below | Subjects to subscribe to |
| `discoverSubject` | string | `"deploy.discover"` | Discovery subject |
| `environment` | attrs | `{}` | Additional environment variables |
| `metrics.enable` | bool | `false` | Enable Prometheus metrics endpoint |
| `metrics.address` | string | `":9972"` | Metrics HTTP server address |
| `metrics.openFirewall` | bool | `false` | Open firewall for metrics port |
Default `deploySubjects`:
```nix
@@ -219,6 +224,80 @@ Default `deploySubjects`:
]
```
## Prometheus Metrics
The listener can expose Prometheus metrics for monitoring deployment operations.
### Enabling Metrics
**CLI:**
```bash
homelab-deploy listener \
--hostname myhost \
--tier prod \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/listener.nkey \
--flake-url git+https://git.example.com/user/nixos-configs.git \
--metrics-enabled \
--metrics-addr :9972
```
**NixOS module:**
```nix
services.homelab-deploy.listener = {
enable = true;
tier = "prod";
natsUrl = "nats://nats.example.com:4222";
nkeyFile = "/run/secrets/homelab-deploy-nkey";
flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
metrics = {
enable = true;
address = ":9972";
openFirewall = true; # Optional: open firewall for Prometheus scraping
};
};
```
### Available Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `homelab_deploy_deployments_total` | Counter | `status`, `action`, `error_code` | Total deployment requests processed |
| `homelab_deploy_deployment_duration_seconds` | Histogram | `action`, `success` | Deployment execution time |
| `homelab_deploy_deployment_in_progress` | Gauge | - | 1 if deployment running, 0 otherwise |
| `homelab_deploy_info` | Gauge | `hostname`, `tier`, `role`, `version` | Static instance metadata |
**Label values:**
- `status`: `completed`, `failed`, `rejected`
- `action`: `switch`, `boot`, `test`, `dry-activate`
- `error_code`: `invalid_action`, `invalid_revision`, `already_running`, `build_failed`, `timeout`, or empty
- `success`: `true`, `false`
### HTTP Endpoints
| Endpoint | Description |
|----------|-------------|
| `/metrics` | Prometheus metrics in text format |
| `/health` | Health check (returns `ok`) |
### Example Prometheus Queries
```promql
# Average deployment duration (last hour)
rate(homelab_deploy_deployment_duration_seconds_sum[1h]) /
rate(homelab_deploy_deployment_duration_seconds_count[1h])
# Deployment success rate (last 24 hours)
sum(rate(homelab_deploy_deployments_total{status="completed"}[24h])) /
sum(rate(homelab_deploy_deployments_total{status=~"completed|failed"}[24h]))
# 95th percentile deployment time
histogram_quantile(0.95, rate(homelab_deploy_deployment_duration_seconds_bucket[1h]))
# Currently running deployments across all hosts
sum(homelab_deploy_deployment_in_progress)
```
## Message Protocol
### Deploy Request