# homelab-deploy

A message-based deployment system for NixOS configurations using NATS for messaging. Deploy NixOS configurations across a fleet of hosts with support for tiered access control, role-based targeting, and AI assistant integration.

## Overview

The `homelab-deploy` binary provides three operational modes:

1. **Listener mode** - Runs on each NixOS host as a systemd service, subscribing to NATS subjects and executing `nixos-rebuild` when deployment requests arrive
2. **MCP mode** - Runs as an MCP (Model Context Protocol) server, exposing deployment tools for AI assistants
3. **CLI mode** - Manual deployment commands for administrators

## Installation

### Using Nix Flakes

```bash
# Run directly
nix run github:torjus/homelab-deploy -- --help

# Add to your flake inputs
{
  inputs.homelab-deploy.url = "github:torjus/homelab-deploy";
}
```

### Building from source

```bash
nix develop
go build ./cmd/homelab-deploy
```

## CLI Usage

### Listener Mode

Run on each NixOS host to listen for deployment requests:

```bash
homelab-deploy listener \
  --hostname myhost \
  --tier prod \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/listener.nkey \
  --flake-url git+https://git.example.com/user/nixos-configs.git \
  --role dns \
  --timeout 600
```

#### Listener Flags

| Flag | Required | Description |
|------|----------|-------------|
| `--hostname` | Yes | Hostname for this listener |
| `--tier` | Yes | Deployment tier (`test` or `prod`) |
| `--nats-url` | Yes | NATS server URL |
| `--nkey-file` | Yes | Path to NKey seed file |
| `--flake-url` | Yes | Git flake URL for nixos-rebuild |
| `--role` | No | Role for role-based targeting |
| `--timeout` | No | Deployment timeout in seconds (default: 600) |
| `--deploy-subject` | No | NATS subjects to subscribe to (repeatable) |
| `--discover-subject` | No | Discovery subject (default: `deploy.discover`) |
| `--metrics-enabled` | No | Enable Prometheus metrics endpoint |
| `--metrics-addr` | No | Metrics HTTP server address (default: `:9972`) |
| `--heartbeat-interval` | No | Status update interval in seconds during deployment (default: 15) |
| `--debug` | No | Enable debug logging for troubleshooting |

#### Subject Templates

Deploy subjects support template variables that are expanded at startup:

- `<hostname>` - The listener's hostname
- `<tier>` - The listener's tier
- `<role>` - The listener's role (subjects with `<role>` are skipped if role is not set)

Default subjects:

```
deploy.<tier>.<hostname>
deploy.<tier>.all
deploy.<tier>.role.<role>
```

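The expansion rules above can be sketched in Go as follows. This is an illustration only, not the listener's actual code: the `expandSubjects` function and its signature are hypothetical, and only the three placeholders and the skip-when-no-role rule come from this README.

```go
package main

import (
	"fmt"
	"strings"
)

// expandSubjects substitutes <hostname>, <tier> and <role> in each template.
// Templates that mention <role> are dropped when role is empty, as described
// in the README. The function name and signature are hypothetical.
func expandSubjects(templates []string, hostname, tier, role string) []string {
	replacer := strings.NewReplacer(
		"<hostname>", hostname,
		"<tier>", tier,
		"<role>", role,
	)
	var out []string
	for _, t := range templates {
		if role == "" && strings.Contains(t, "<role>") {
			continue // no role configured: skip role-based subjects
		}
		out = append(out, replacer.Replace(t))
	}
	return out
}

func main() {
	defaults := []string{
		"deploy.<tier>.<hostname>",
		"deploy.<tier>.all",
		"deploy.<tier>.role.<role>",
	}
	// Listener with a role: all three subjects are produced.
	fmt.Println(expandSubjects(defaults, "myhost", "prod", "dns"))
	// Listener without a role: the role subject is skipped.
	fmt.Println(expandSubjects(defaults, "myhost", "prod", ""))
}
```
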
### Deploy Command

Deploy to hosts via NATS:

```bash
# Deploy to a specific host
homelab-deploy deploy deploy.prod.myhost \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/deployer.nkey \
  --branch main \
  --action switch

# Deploy to all test hosts
homelab-deploy deploy deploy.test.all \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/deployer.nkey

# Deploy to all prod DNS servers
homelab-deploy deploy deploy.prod.role.dns \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/deployer.nkey
```

#### Deploy Flags

| Flag | Required | Env Var | Description |
|------|----------|---------|-------------|
| `--nats-url` | Yes | `HOMELAB_DEPLOY_NATS_URL` | NATS server URL |
| `--nkey-file` | Yes | `HOMELAB_DEPLOY_NKEY_FILE` | Path to NKey seed file |
| `--branch` | No | `HOMELAB_DEPLOY_BRANCH` | Git branch or commit (default: `master`) |
| `--action` | No | `HOMELAB_DEPLOY_ACTION` | nixos-rebuild action (default: `switch`) |
| `--timeout` | No | `HOMELAB_DEPLOY_TIMEOUT` | Response timeout in seconds (default: 900) |

#### Subject Aliases

Configure aliases via environment variables to simplify common deployments:

```bash
export HOMELAB_DEPLOY_ALIAS_TEST="deploy.test.all"
export HOMELAB_DEPLOY_ALIAS_PROD="deploy.prod.all"
export HOMELAB_DEPLOY_ALIAS_PROD_DNS="deploy.prod.role.dns"

# Now use short aliases
homelab-deploy deploy test --nats-url ... --nkey-file ...
homelab-deploy deploy prod-dns --nats-url ... --nkey-file ...
```

Alias lookup: `HOMELAB_DEPLOY_ALIAS_<NAME>` where name is uppercased and hyphens become underscores.

### MCP Server Mode

Run as an MCP server for AI assistant integration:

```bash
# Test-tier only access
homelab-deploy mcp \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/mcp.nkey

# With admin access to all tiers
homelab-deploy mcp \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/mcp.nkey \
  --enable-admin \
  --admin-nkey-file /run/secrets/admin.nkey
```

#### MCP Tools

| Tool | Description |
|------|-------------|
| `deploy` | Deploy to test-tier hosts only |
| `deploy_admin` | Deploy to any tier (requires `--enable-admin`) |
| `list_hosts` | Discover available deployment targets |

#### Tool Parameters

**deploy / deploy_admin:**
- `hostname` - Target specific host
- `all` - Deploy to all hosts (in tier)
- `role` - Deploy to hosts with this role
- `branch` - Git branch/commit (default: master)
- `action` - switch, boot, test, dry-activate (default: switch)
- `tier` - Required for deploy_admin only

**list_hosts:**
- `tier` - Filter by tier (optional)

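One plausible way to picture how the targeting parameters map onto deploy subjects is sketched below. The `targetSubject` function is hypothetical, and the rule that exactly one of `hostname`, `all`, or `role` must be given is an assumption made for the example; only the subject patterns and tier values are taken from this README.

```go
package main

import (
	"errors"
	"fmt"
)

// targetSubject builds a deploy subject from the targeting parameters.
// Assumption: exactly one of hostname, all, or role selects the target.
func targetSubject(tier, hostname, role string, all bool) (string, error) {
	if tier != "test" && tier != "prod" {
		return "", fmt.Errorf("unknown tier %q", tier)
	}
	selected := 0
	if hostname != "" {
		selected++
	}
	if role != "" {
		selected++
	}
	if all {
		selected++
	}
	if selected != 1 {
		return "", errors.New("exactly one of hostname, all, or role must be set")
	}
	switch {
	case hostname != "":
		return fmt.Sprintf("deploy.%s.%s", tier, hostname), nil
	case all:
		return fmt.Sprintf("deploy.%s.all", tier), nil
	default:
		return fmt.Sprintf("deploy.%s.role.%s", tier, role), nil
	}
}

func main() {
	// The plain deploy tool is limited to the test tier, so tier is fixed here.
	subject, err := targetSubject("test", "", "dns", false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(subject)
}
```
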
## NixOS Module

Add the module to your NixOS configuration:

```nix
{
  inputs.homelab-deploy.url = "github:torjus/homelab-deploy";

  outputs = { self, nixpkgs, homelab-deploy, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      modules = [
        homelab-deploy.nixosModules.default
        {
          services.homelab-deploy.listener = {
            enable = true;
            tier = "prod";
            role = "dns";
            natsUrl = "nats://nats.example.com:4222";
            nkeyFile = "/run/secrets/homelab-deploy-nkey";
            flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
          };
        }
      ];
    };
  };
}
```

### Module Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enable` | bool | `false` | Enable the listener service |
| `package` | package | from flake | Package to use |
| `hostname` | string | `config.networking.hostName` | Hostname for subject templates |
| `tier` | enum | required | `"test"` or `"prod"` |
| `role` | string | `null` | Role for role-based targeting |
| `natsUrl` | string | required | NATS server URL |
| `nkeyFile` | path | required | Path to NKey seed file |
| `flakeUrl` | string | required | Git flake URL |
| `timeout` | int | `600` | Deployment timeout in seconds |
| `deploySubjects` | list of string | see below | Subjects to subscribe to |
| `discoverSubject` | string | `"deploy.discover"` | Discovery subject |
| `environment` | attrs | `{}` | Additional environment variables |
| `metrics.enable` | bool | `false` | Enable Prometheus metrics endpoint |
| `metrics.address` | string | `":9972"` | Metrics HTTP server address |
| `metrics.openFirewall` | bool | `false` | Open firewall for metrics port |
| `extraArgs` | list of string | `[]` | Extra command line arguments (e.g., `["--debug"]`) |

Default `deploySubjects`:

```nix
[
  "deploy.<tier>.<hostname>"
  "deploy.<tier>.all"
  "deploy.<tier>.role.<role>"
]
```

## Prometheus Metrics

The listener can expose Prometheus metrics for monitoring deployment operations.

### Enabling Metrics

**CLI:**
```bash
homelab-deploy listener \
  --hostname myhost \
  --tier prod \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/listener.nkey \
  --flake-url git+https://git.example.com/user/nixos-configs.git \
  --metrics-enabled \
  --metrics-addr :9972
```

**NixOS module:**
```nix
services.homelab-deploy.listener = {
  enable = true;
  tier = "prod";
  natsUrl = "nats://nats.example.com:4222";
  nkeyFile = "/run/secrets/homelab-deploy-nkey";
  flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
  metrics = {
    enable = true;
    address = ":9972";
    openFirewall = true;  # Optional: open firewall for Prometheus scraping
  };
};
```

### Available Metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `homelab_deploy_deployments_total` | Counter | `status`, `action`, `error_code` | Total deployment requests processed |
| `homelab_deploy_deployment_duration_seconds` | Histogram | `action`, `success` | Deployment execution time |
| `homelab_deploy_deployment_in_progress` | Gauge | - | 1 if deployment running, 0 otherwise |
| `homelab_deploy_info` | Gauge | `hostname`, `tier`, `role`, `version` | Static instance metadata |

**Label values:**
- `status`: `completed`, `failed`, `rejected`
- `action`: `switch`, `boot`, `test`, `dry-activate`
- `error_code`: `invalid_action`, `invalid_revision`, `already_running`, `build_failed`, `timeout`, or empty
- `success`: `true`, `false`

### HTTP Endpoints

| Endpoint | Description |
|----------|-------------|
| `/metrics` | Prometheus metrics in text format |
| `/health` | Health check (returns `ok`) |

### Example Prometheus Queries

```promql
# Average deployment duration (last hour)
rate(homelab_deploy_deployment_duration_seconds_sum[1h]) /
rate(homelab_deploy_deployment_duration_seconds_count[1h])

# Deployment success rate (last 24 hours)
sum(rate(homelab_deploy_deployments_total{status="completed"}[24h])) /
sum(rate(homelab_deploy_deployments_total{status=~"completed|failed"}[24h]))

# 95th percentile deployment time
histogram_quantile(0.95, rate(homelab_deploy_deployment_duration_seconds_bucket[1h]))

# Currently running deployments across all hosts
sum(homelab_deploy_deployment_in_progress)
```

## Troubleshooting

### Debug Logging

Enable debug logging to diagnose issues with deployments or metrics:

**CLI:**
```bash
homelab-deploy listener --debug \
  --hostname myhost \
  --tier prod \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/listener.nkey \
  --flake-url git+https://git.example.com/user/nixos-configs.git \
  --metrics-enabled
```

**NixOS module:**
```nix
services.homelab-deploy.listener = {
  enable = true;
  tier = "prod";
  natsUrl = "nats://nats.example.com:4222";
  nkeyFile = "/run/secrets/homelab-deploy-nkey";
  flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
  metrics.enable = true;
  extraArgs = [ "--debug" ];
};
```

With debug logging enabled, the listener outputs detailed information about metrics recording:

```json
{"level":"DEBUG","msg":"recording deployment start metric","metrics_enabled":true}
{"level":"DEBUG","msg":"recording deployment end metric (success)","action":"switch","success":true,"duration_seconds":120.5}
```

### Metrics Showing Zero

If deployment metrics remain at zero after deployments:

1. **Check metrics are enabled**: Verify `--metrics-enabled` is set and the metrics endpoint is accessible at `/metrics`

2. **Enable debug logging**: Use `--debug` to confirm metrics recording is being called

3. **Check deployment status**: Duration metrics are only recorded for deployments that run to completion (success or failure). Rejected requests (e.g., already running) increment `homelab_deploy_deployments_total` with `status="rejected"` but don't record a duration

4. **Check after restart**: After a successful `switch` deployment, the listener restarts and its metrics reset to zero in the new instance. Before restarting, the listener waits up to 60 seconds for a Prometheus scrape so the final metrics can be captured

5. **Verify Prometheus scrape timing**: Ensure Prometheus scrapes frequently enough (a scrape interval under the 60-second wait) to capture metrics before the listener restarts

## Message Protocol

### Deploy Request

```json
{
  "action": "switch",
  "revision": "main",
  "reply_to": "deploy.responses.abc123"
}
```

### Deploy Response

```json
{
  "hostname": "myhost",
  "status": "completed",
  "error": null,
  "message": "Successfully switched to generation 42"
}
```

**Status values:** `accepted`, `rejected`, `started`, `completed`, `failed`

**Error codes:** `invalid_revision`, `invalid_action`, `already_running`, `build_failed`, `timeout`

## NATS Authentication

All connections use NKey authentication. Generate keys with:

```bash
nk -gen user -pubout
```

Configure appropriate publish/subscribe permissions in your NATS server for each credential type.

## NATS Subject Structure

The deployment system uses the following NATS subject hierarchy:

### Deploy Subjects

| Subject Pattern | Purpose |
|-----------------|---------|
| `deploy.<tier>.<hostname>` | Deploy to a specific host |
| `deploy.<tier>.all` | Deploy to all hosts in a tier |
| `deploy.<tier>.role.<role>` | Deploy to hosts with a specific role in a tier |

**Tier values:** `test`, `prod`

**Examples:**
- `deploy.test.myhost` - Deploy to myhost in test tier
- `deploy.prod.all` - Deploy to all production hosts
- `deploy.prod.role.dns` - Deploy to all DNS servers in production

### Response Subjects

| Subject Pattern | Purpose |
|-----------------|---------|
| `deploy.responses.<uuid>` | Unique reply subject for each deployment request |

Deployers create a unique response subject for each request and include it in the `reply_to` field. Listeners publish status updates to this subject.

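Generating such a per-request reply subject might look like the sketch below. It builds a random (version 4) UUID from `crypto/rand` using only the standard library; the `newReplySubject` helper is hypothetical, and whether homelab-deploy uses this exact UUID variant is not stated in this README.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

const replyPrefix = "deploy.responses."

// newReplySubject returns "deploy.responses.<uuid>" with a random UUID.
// Assumption: a version 4 UUID; the real deployer may use another scheme.
func newReplySubject() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", fmt.Errorf("generating random bytes: %w", err)
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%s%x-%x-%x-%x-%x",
		replyPrefix, b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	subject, err := newReplySubject()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// This value would be placed in the request's reply_to field.
	fmt.Println(subject)
}
```
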
### Discovery Subject

| Subject Pattern | Purpose |
|-----------------|---------|
| `deploy.discover` | Host discovery requests and responses |

Used by the `list_hosts` MCP tool and for discovering available deployment targets.

## Example NATS Configuration

Below is an example NATS server configuration implementing tiered authentication. This setup provides:

- **Listeners** - Each host has credentials to subscribe to its own subjects and publish responses
- **Test deployer** - Can deploy to test tier only (suitable for MCP without admin access)
- **Admin deployer** - Can deploy to all tiers (for CLI or MCP with admin access)

```conf
authorization {
  users = [
    # Listener for a test-tier host
    {
      nkey: "UTEST_HOST1_PUBLIC_KEY_HERE"
      permissions: {
        subscribe: [
          "deploy.test.testhost1"
          "deploy.test.all"
          "deploy.test.role.>"
          "deploy.discover"
        ]
        publish: [
          "deploy.responses.>"
          "deploy.discover"
        ]
      }
    }

    # Listener for a prod-tier host with 'dns' role
    {
      nkey: "UPROD_DNS1_PUBLIC_KEY_HERE"
      permissions: {
        subscribe: [
          "deploy.prod.dns1"
          "deploy.prod.all"
          "deploy.prod.role.dns"
          "deploy.discover"
        ]
        publish: [
          "deploy.responses.>"
          "deploy.discover"
        ]
      }
    }

    # Test-tier deployer (MCP without admin)
    {
      nkey: "UTEST_DEPLOYER_PUBLIC_KEY_HERE"
      permissions: {
        publish: [
          "deploy.test.>"
          "deploy.discover"
        ]
        subscribe: [
          "deploy.responses.>"
          "deploy.discover"
        ]
      }
    }

    # Admin deployer (full access to all tiers)
    {
      nkey: "UADMIN_DEPLOYER_PUBLIC_KEY_HERE"
      permissions: {
        publish: [
          "deploy.>"
        ]
        subscribe: [
          "deploy.>"
        ]
      }
    }
  ]
}
```

### Key Permission Patterns

| Credential Type | Publish | Subscribe |
|-----------------|---------|-----------|
| Listener | `deploy.responses.>`, `deploy.discover` | Own subjects, `deploy.discover` |
| Test deployer | `deploy.test.>`, `deploy.discover` | `deploy.responses.>`, `deploy.discover` |
| Admin deployer | `deploy.>` | `deploy.>` |

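To reason about which subjects a permission pattern covers, the following sketch re-implements NATS wildcard matching in Go: `*` matches exactly one token and `>` matches one or more trailing tokens. It is a simplified illustration for checking patterns by hand, not the NATS server's own matcher, and the `subjectMatches` and `allowed` helpers are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches reports whether a literal subject matches a pattern.
// "*" matches exactly one token; ">" matches one or more trailing tokens
// and is only valid as the last token of the pattern.
func subjectMatches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) {
			return false // pattern has more tokens than the subject
		}
		if tok != "*" && tok != s[i] {
			return false
		}
	}
	return len(p) == len(s)
}

// allowed reports whether any pattern in the list covers the subject.
func allowed(patterns []string, subject string) bool {
	for _, pattern := range patterns {
		if subjectMatches(pattern, subject) {
			return true
		}
	}
	return false
}

func main() {
	// Publish permissions of the test deployer from the table above.
	testDeployer := []string{"deploy.test.>", "deploy.discover"}

	for _, subject := range []string{"deploy.test.all", "deploy.test.role.dns", "deploy.prod.all"} {
		fmt.Printf("%-22s allowed=%v\n", subject, allowed(testDeployer, subject))
	}
}
```

With the test deployer's publish patterns, the two `deploy.test.` subjects are covered while `deploy.prod.all` is not, which is the tier separation the example configuration relies on.
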
### Generating NKeys

```bash
# Generate a keypair; with -pubout, nk prints the seed followed by the public key
nk -gen user -pubout
# The seed (private key, starting with SU) must be saved securely

# Or generate and save seed directly
nk -gen user > mykey.seed
nk -inkey mykey.seed -pubout  # Get public key from seed
```

The public key (starting with `U`) goes in the NATS server config. The seed (starting with `SU`) is stored in a file that homelab-deploy reads via `--nkey-file`.

## License

MIT