This repository has been archived on 2026-03-09. You can view files and clone it. You cannot open issues or pull requests or push a commit.
Torjus Håkestad c272ce6903 docs: document --debug flag and extraArgs module option
Add documentation for:
- --debug flag in Listener Flags table
- --heartbeat-interval flag (was missing)
- extraArgs NixOS module option
- New Troubleshooting section with debug logging examples
  and guidance for diagnosing metrics issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 01:28:21 +01:00

# homelab-deploy
A message-based deployment system for NixOS configurations using NATS for messaging. Deploy NixOS configurations across a fleet of hosts with support for tiered access control, role-based targeting, and AI assistant integration.
## Overview
The `homelab-deploy` binary provides three operational modes:
1. **Listener mode** - Runs on each NixOS host as a systemd service, subscribing to NATS subjects and executing `nixos-rebuild` when deployment requests arrive
2. **MCP mode** - Runs as an MCP (Model Context Protocol) server, exposing deployment tools for AI assistants
3. **CLI mode** - Manual deployment commands for administrators
## Installation
### Using Nix Flakes
```bash
# Run directly
nix run github:torjus/homelab-deploy -- --help
```
Or add it to your flake inputs:
```nix
{
  inputs.homelab-deploy.url = "github:torjus/homelab-deploy";
}
```
### Building from source
```bash
nix develop
go build ./cmd/homelab-deploy
```
## CLI Usage
### Listener Mode
Run on each NixOS host to listen for deployment requests:
```bash
homelab-deploy listener \
  --hostname myhost \
  --tier prod \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/listener.nkey \
  --flake-url git+https://git.example.com/user/nixos-configs.git \
  --role dns \
  --timeout 600
```
#### Listener Flags
| Flag | Required | Description |
|------|----------|-------------|
| `--hostname` | Yes | Hostname for this listener |
| `--tier` | Yes | Deployment tier (`test` or `prod`) |
| `--nats-url` | Yes | NATS server URL |
| `--nkey-file` | Yes | Path to NKey seed file |
| `--flake-url` | Yes | Git flake URL for nixos-rebuild |
| `--role` | No | Role for role-based targeting |
| `--timeout` | No | Deployment timeout in seconds (default: 600) |
| `--deploy-subject` | No | NATS subjects to subscribe to (repeatable) |
| `--discover-subject` | No | Discovery subject (default: `deploy.discover`) |
| `--metrics-enabled` | No | Enable Prometheus metrics endpoint |
| `--metrics-addr` | No | Metrics HTTP server address (default: `:9972`) |
| `--heartbeat-interval` | No | Status update interval in seconds during deployment (default: 15) |
| `--debug` | No | Enable debug logging for troubleshooting |
#### Subject Templates
Deploy subjects support template variables that are expanded at startup:
- `<hostname>` - The listener's hostname
- `<tier>` - The listener's tier
- `<role>` - The listener's role (subjects with `<role>` are skipped if role is not set)
Default subjects:
```
deploy.<tier>.<hostname>
deploy.<tier>.all
deploy.<tier>.role.<role>
```
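The expansion rules above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the listener's actual code; the function name `expand_subjects` is hypothetical.

```python
# Illustrative sketch only: mirrors the documented template rules,
# not the real implementation inside homelab-deploy.
DEFAULT_SUBJECTS = [
    "deploy.<tier>.<hostname>",
    "deploy.<tier>.all",
    "deploy.<tier>.role.<role>",
]


def expand_subjects(templates, hostname, tier, role=None):
    """Expand <hostname>, <tier>, and <role> in each template."""
    expanded = []
    for template in templates:
        if "<role>" in template and not role:
            continue  # subjects with <role> are skipped if no role is set
        subject = (
            template.replace("<hostname>", hostname)
            .replace("<tier>", tier)
            .replace("<role>", role or "")
        )
        expanded.append(subject)
    return expanded


print(expand_subjects(DEFAULT_SUBJECTS, "myhost", "prod", "dns"))
print(expand_subjects(DEFAULT_SUBJECTS, "myhost", "test"))
```

With a role set, all three default subjects are produced; without one, only the host and `all` subjects remain.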
### Deploy Command
Deploy to hosts via NATS:
```bash
# Deploy to a specific host
homelab-deploy deploy deploy.prod.myhost \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/deployer.nkey \
  --branch main \
  --action switch

# Deploy to all test hosts
homelab-deploy deploy deploy.test.all \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/deployer.nkey

# Deploy to all prod DNS servers
homelab-deploy deploy deploy.prod.role.dns \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/deployer.nkey
```
#### Deploy Flags
| Flag | Required | Env Var | Description |
|------|----------|---------|-------------|
| `--nats-url` | Yes | `HOMELAB_DEPLOY_NATS_URL` | NATS server URL |
| `--nkey-file` | Yes | `HOMELAB_DEPLOY_NKEY_FILE` | Path to NKey seed file |
| `--branch` | No | `HOMELAB_DEPLOY_BRANCH` | Git branch or commit (default: `master`) |
| `--action` | No | `HOMELAB_DEPLOY_ACTION` | nixos-rebuild action (default: `switch`) |
| `--timeout` | No | `HOMELAB_DEPLOY_TIMEOUT` | Response timeout in seconds (default: 900) |
#### Subject Aliases
Configure aliases via environment variables to simplify common deployments:
```bash
export HOMELAB_DEPLOY_ALIAS_TEST="deploy.test.all"
export HOMELAB_DEPLOY_ALIAS_PROD="deploy.prod.all"
export HOMELAB_DEPLOY_ALIAS_PROD_DNS="deploy.prod.role.dns"
# Now use short aliases
homelab-deploy deploy test --nats-url ... --nkey-file ...
homelab-deploy deploy prod-dns --nats-url ... --nkey-file ...
```
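Alias lookup: the target name is uppercased, hyphens are replaced with underscores, and the result is looked up as `HOMELAB_DEPLOY_ALIAS_<NAME>`. For example, `prod-dns` resolves through `HOMELAB_DEPLOY_ALIAS_PROD_DNS`.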
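As a rough sketch of that lookup rule in Python (the function name `resolve_target` is hypothetical, and falling back to the literal target when no alias is defined is an assumption about the CLI's behaviour):

```python
import os

ALIAS_PREFIX = "HOMELAB_DEPLOY_ALIAS_"


def resolve_target(target, environ=None):
    """Return the aliased subject for `target`, or `target` itself."""
    environ = os.environ if environ is None else environ
    # Documented rule: uppercase the name, hyphens become underscores.
    key = ALIAS_PREFIX + target.upper().replace("-", "_")
    # Assumption: a target without a matching alias is used as-is.
    return environ.get(key, target)


example_env = {"HOMELAB_DEPLOY_ALIAS_PROD_DNS": "deploy.prod.role.dns"}
print(resolve_target("prod-dns", example_env))
print(resolve_target("deploy.test.all", example_env))
```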
### MCP Server Mode
Run as an MCP server for AI assistant integration:
```bash
# Test-tier only access
homelab-deploy mcp \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/mcp.nkey

# With admin access to all tiers
homelab-deploy mcp \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/mcp.nkey \
  --enable-admin \
  --admin-nkey-file /run/secrets/admin.nkey
```
#### MCP Tools
| Tool | Description |
|------|-------------|
| `deploy` | Deploy to test-tier hosts only |
| `deploy_admin` | Deploy to any tier (requires `--enable-admin`) |
| `list_hosts` | Discover available deployment targets |
#### Tool Parameters
**deploy / deploy_admin:**
- `hostname` - Target specific host
- `all` - Deploy to all hosts (in tier)
- `role` - Deploy to hosts with this role
- `branch` - Git branch/commit (default: master)
- `action` - switch, boot, test, dry-activate (default: switch)
- `tier` - Required for deploy_admin only
**list_hosts:**
- `tier` - Filter by tier (optional)
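To make the relationship between the targeting parameters and the deploy subjects concrete, here is a minimal Python sketch. The function `target_subject` is hypothetical, and requiring exactly one of `hostname`, `all`, or `role` is an assumption rather than documented behaviour.

```python
def target_subject(tier, hostname=None, all_hosts=False, role=None):
    """Map tool-style parameters to a deploy subject.

    `all_hosts` stands in for the `all` parameter, since `all` is a
    Python builtin.
    """
    chosen = [bool(hostname), bool(all_hosts), bool(role)]
    if sum(chosen) != 1:
        # Assumption: the targeting options are mutually exclusive.
        raise ValueError("specify exactly one of hostname, all, or role")
    if tier not in ("test", "prod"):
        raise ValueError(f"unknown tier: {tier!r}")
    if hostname:
        return f"deploy.{tier}.{hostname}"
    if all_hosts:
        return f"deploy.{tier}.all"
    return f"deploy.{tier}.role.{role}"


# The `deploy` tool is limited to the test tier; `deploy_admin` takes a tier.
print(target_subject("test", hostname="myhost"))
print(target_subject("prod", role="dns"))
```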
## NixOS Module
Add the module to your NixOS configuration:
```nix
{
  inputs.homelab-deploy.url = "github:torjus/homelab-deploy";

  outputs = { self, nixpkgs, homelab-deploy, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      modules = [
        homelab-deploy.nixosModules.default
        {
          services.homelab-deploy.listener = {
            enable = true;
            tier = "prod";
            role = "dns";
            natsUrl = "nats://nats.example.com:4222";
            nkeyFile = "/run/secrets/homelab-deploy-nkey";
            flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
          };
        }
      ];
    };
  };
}
```
### Module Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enable` | bool | `false` | Enable the listener service |
| `package` | package | from flake | Package to use |
| `hostname` | string | `config.networking.hostName` | Hostname for subject templates |
| `tier` | enum | required | `"test"` or `"prod"` |
| `role` | string | `null` | Role for role-based targeting |
| `natsUrl` | string | required | NATS server URL |
| `nkeyFile` | path | required | Path to NKey seed file |
| `flakeUrl` | string | required | Git flake URL |
| `timeout` | int | `600` | Deployment timeout in seconds |
| `deploySubjects` | list of string | see below | Subjects to subscribe to |
| `discoverSubject` | string | `"deploy.discover"` | Discovery subject |
| `environment` | attrs | `{}` | Additional environment variables |
| `metrics.enable` | bool | `false` | Enable Prometheus metrics endpoint |
| `metrics.address` | string | `":9972"` | Metrics HTTP server address |
| `metrics.openFirewall` | bool | `false` | Open firewall for metrics port |
| `extraArgs` | list of string | `[]` | Extra command line arguments (e.g., `["--debug"]`) |
Default `deploySubjects`:
```nix
[
  "deploy.<tier>.<hostname>"
  "deploy.<tier>.all"
  "deploy.<tier>.role.<role>"
]
```
## Prometheus Metrics
The listener can expose Prometheus metrics for monitoring deployment operations.
### Enabling Metrics
**CLI:**
```bash
homelab-deploy listener \
  --hostname myhost \
  --tier prod \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/listener.nkey \
  --flake-url git+https://git.example.com/user/nixos-configs.git \
  --metrics-enabled \
  --metrics-addr :9972
```
**NixOS module:**
```nix
services.homelab-deploy.listener = {
  enable = true;
  tier = "prod";
  natsUrl = "nats://nats.example.com:4222";
  nkeyFile = "/run/secrets/homelab-deploy-nkey";
  flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
  metrics = {
    enable = true;
    address = ":9972";
    openFirewall = true; # Optional: open firewall for Prometheus scraping
  };
};
```
### Available Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `homelab_deploy_deployments_total` | Counter | `status`, `action`, `error_code` | Total deployment requests processed |
| `homelab_deploy_deployment_duration_seconds` | Histogram | `action`, `success` | Deployment execution time |
| `homelab_deploy_deployment_in_progress` | Gauge | - | 1 if deployment running, 0 otherwise |
| `homelab_deploy_info` | Gauge | `hostname`, `tier`, `role`, `version` | Static instance metadata |
**Label values:**
- `status`: `completed`, `failed`, `rejected`
- `action`: `switch`, `boot`, `test`, `dry-activate`
- `error_code`: `invalid_action`, `invalid_revision`, `already_running`, `build_failed`, `timeout`, or empty
- `success`: `true`, `false`
### HTTP Endpoints
| Endpoint | Description |
|----------|-------------|
| `/metrics` | Prometheus metrics in text format |
| `/health` | Health check (returns `ok`) |
### Example Prometheus Queries
```promql
# Average deployment duration (last hour)
rate(homelab_deploy_deployment_duration_seconds_sum[1h]) /
rate(homelab_deploy_deployment_duration_seconds_count[1h])
# Deployment success rate (last 24 hours)
sum(rate(homelab_deploy_deployments_total{status="completed"}[24h])) /
sum(rate(homelab_deploy_deployments_total{status=~"completed|failed"}[24h]))
# 95th percentile deployment time
histogram_quantile(0.95, rate(homelab_deploy_deployment_duration_seconds_bucket[1h]))
# Currently running deployments across all hosts
sum(homelab_deploy_deployment_in_progress)
```
## Troubleshooting
### Debug Logging
Enable debug logging to diagnose issues with deployments or metrics:
**CLI:**
```bash
homelab-deploy listener --debug \
  --hostname myhost \
  --tier prod \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/listener.nkey \
  --flake-url git+https://git.example.com/user/nixos-configs.git \
  --metrics-enabled
```
**NixOS module:**
```nix
services.homelab-deploy.listener = {
  enable = true;
  tier = "prod";
  natsUrl = "nats://nats.example.com:4222";
  nkeyFile = "/run/secrets/homelab-deploy-nkey";
  flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
  metrics.enable = true;
  extraArgs = [ "--debug" ];
};
```
With debug logging enabled, the listener outputs detailed information about metrics recording:
```json
{"level":"DEBUG","msg":"recording deployment start metric","metrics_enabled":true}
{"level":"DEBUG","msg":"recording deployment end metric (success)","action":"switch","success":true,"duration_seconds":120.5}
```
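When the debug output is mixed with other log lines, a small filter can help isolate the metric-related entries. The Python sketch below assumes the log is available as JSON lines shaped like the sample above; the helper `metric_events` and the substring match on `msg` are illustrative choices, not part of homelab-deploy.

```python
import json

# Sample lines taken from the debug output shown above, plus one non-JSON line.
SAMPLE_LOG = """\
{"level":"DEBUG","msg":"recording deployment start metric","metrics_enabled":true}
{"level":"DEBUG","msg":"recording deployment end metric (success)","action":"switch","success":true,"duration_seconds":120.5}
not json, e.g. output from another program
"""


def metric_events(lines):
    """Return parsed log records whose message mentions metrics."""
    events = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log line
        if isinstance(record, dict) and "metric" in record.get("msg", ""):
            events.append(record)
    return events


events = metric_events(SAMPLE_LOG.splitlines())
for event in events:
    print(event["msg"], event.get("duration_seconds"))
```

If a deployment ran but no such entries appear, the metrics recording path was probably never reached.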
### Metrics Showing Zero
If deployment metrics remain at zero after deployments:
1. **Check metrics are enabled**: Verify `--metrics-enabled` is set and the metrics endpoint is accessible at `/metrics`
2. **Enable debug logging**: Use `--debug` to confirm metrics recording is being called
3. **Check deployment status**: Duration metrics are only recorded for deployments that run to completion (success or failure). Rejected requests (e.g., already running) increment the counter with `status="rejected"` but don't record a duration
4. **Check after restart**: After a successful `switch` deployment, the listener restarts, and metrics reset to zero in the new instance. Before restarting, the listener waits up to 60 seconds for a Prometheus scrape so that the final metrics can be captured
5. **Verify Prometheus scrape timing**: Ensure Prometheus scrapes frequently enough to capture metrics before the listener restarts
## Message Protocol
### Deploy Request
```json
{
  "action": "switch",
  "revision": "main",
  "reply_to": "deploy.responses.abc123"
}
```
### Deploy Response
```json
{
  "hostname": "myhost",
  "status": "completed",
  "error": null,
  "message": "Successfully switched to generation 42"
}
```
**Status values:** `accepted`, `rejected`, `started`, `completed`, `failed`
**Error codes:** `invalid_revision`, `invalid_action`, `already_running`, `build_failed`, `timeout`
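A minimal Python sketch of building a request and interpreting responses, using only the fields shown above. The helper names are hypothetical, and treating `completed`, `failed`, and `rejected` as the final statuses is an assumption; the protocol description does not state which statuses end a deployment.

```python
import json

VALID_ACTIONS = {"switch", "boot", "test", "dry-activate"}
# Assumption: these statuses mean no further updates will follow.
FINAL_STATUSES = {"completed", "failed", "rejected"}


def build_request(action, revision, reply_to):
    """Serialize a deploy request with the documented fields."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"invalid action: {action!r}")
    return json.dumps(
        {"action": action, "revision": revision, "reply_to": reply_to}
    )


def is_final(response_json):
    """Return True if the response carries a (presumed) final status."""
    response = json.loads(response_json)
    return response["status"] in FINAL_STATUSES


print(build_request("switch", "main", "deploy.responses.abc123"))
print(is_final('{"hostname": "myhost", "status": "started", "error": null, "message": ""}'))
```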
## NATS Authentication
All connections use NKey authentication. Generate keys with:
```bash
nk -gen user -pubout
```
Configure appropriate publish/subscribe permissions in your NATS server for each credential type.
## NATS Subject Structure
The deployment system uses the following NATS subject hierarchy:
### Deploy Subjects
| Subject Pattern | Purpose |
|-----------------|---------|
| `deploy.<tier>.<hostname>` | Deploy to a specific host |
| `deploy.<tier>.all` | Deploy to all hosts in a tier |
| `deploy.<tier>.role.<role>` | Deploy to hosts with a specific role in a tier |
**Tier values:** `test`, `prod`
**Examples:**
- `deploy.test.myhost` - Deploy to myhost in test tier
- `deploy.prod.all` - Deploy to all production hosts
- `deploy.prod.role.dns` - Deploy to all DNS servers in production
### Response Subjects
| Subject Pattern | Purpose |
|-----------------|---------|
| `deploy.responses.<uuid>` | Unique reply subject for each deployment request |
Deployers create a unique response subject for each request and include it in the `reply_to` field. Listeners publish status updates to this subject.
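For illustration, a per-request reply subject could be generated like this in Python. Using a random UUID4 is an assumption about how uniqueness is achieved, and `new_reply_subject` is a hypothetical helper.

```python
import uuid


def new_reply_subject():
    """Return a fresh subject of the form deploy.responses.<uuid>."""
    # A UUID contains hyphens but no dots, so it stays a single subject token.
    return f"deploy.responses.{uuid.uuid4()}"


reply_to = new_reply_subject()
print(reply_to)
```

The deployer would subscribe to this subject before publishing the request, so that no status update is missed.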
### Discovery Subject
| Subject Pattern | Purpose |
|-----------------|---------|
| `deploy.discover` | Host discovery requests and responses |
This subject is used by the `list_hosts` MCP tool and, more generally, for discovering available deployment targets.
## Example NATS Configuration
Below is an example NATS server configuration implementing tiered authentication. This setup provides:
- **Listeners** - Each host has credentials to subscribe to its own subjects and publish responses
- **Test deployer** - Can deploy to test tier only (suitable for MCP without admin access)
- **Admin deployer** - Can deploy to all tiers (for CLI or MCP with admin access)
```conf
authorization {
  users = [
    # Listener for a test-tier host
    {
      nkey: "UTEST_HOST1_PUBLIC_KEY_HERE"
      permissions: {
        subscribe: [
          "deploy.test.testhost1"
          "deploy.test.all"
          "deploy.test.role.>"
          "deploy.discover"
        ]
        publish: [
          "deploy.responses.>"
          "deploy.discover"
        ]
      }
    }
    # Listener for a prod-tier host with 'dns' role
    {
      nkey: "UPROD_DNS1_PUBLIC_KEY_HERE"
      permissions: {
        subscribe: [
          "deploy.prod.dns1"
          "deploy.prod.all"
          "deploy.prod.role.dns"
          "deploy.discover"
        ]
        publish: [
          "deploy.responses.>"
          "deploy.discover"
        ]
      }
    }
    # Test-tier deployer (MCP without admin)
    {
      nkey: "UTEST_DEPLOYER_PUBLIC_KEY_HERE"
      permissions: {
        publish: [
          "deploy.test.>"
          "deploy.discover"
        ]
        subscribe: [
          "deploy.responses.>"
          "deploy.discover"
        ]
      }
    }
    # Admin deployer (full access to all tiers)
    {
      nkey: "UADMIN_DEPLOYER_PUBLIC_KEY_HERE"
      permissions: {
        publish: [
          "deploy.>"
        ]
        subscribe: [
          "deploy.>"
        ]
      }
    }
  ]
}
```
### Key Permission Patterns
| Credential Type | Publish | Subscribe |
|-----------------|---------|-----------|
| Listener | `deploy.responses.>`, `deploy.discover` | Own subjects, `deploy.discover` |
| Test deployer | `deploy.test.>`, `deploy.discover` | `deploy.responses.>`, `deploy.discover` |
| Admin deployer | `deploy.>` | `deploy.>` |
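To reason about which subjects a credential may publish to, it can help to evaluate the permission patterns by hand. The Python sketch below implements the standard NATS wildcard rules (`*` matches exactly one token, `>` matches one or more trailing tokens); the function names are hypothetical, and this is a simplified model rather than the server's own matching code.

```python
def subject_matches(pattern, subject):
    """Check a concrete subject against a NATS-style permission pattern."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, token in enumerate(p_tokens):
        if token == ">":
            # '>' must be the last token and needs at least one token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if token != "*" and token != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def allowed(patterns, subject):
    """Return True if any pattern in the list permits the subject."""
    return any(subject_matches(p, subject) for p in patterns)


TEST_DEPLOYER_PUBLISH = ["deploy.test.>", "deploy.discover"]
ADMIN_PUBLISH = ["deploy.>"]

print(allowed(TEST_DEPLOYER_PUBLISH, "deploy.test.all"))
print(allowed(TEST_DEPLOYER_PUBLISH, "deploy.prod.all"))
print(allowed(ADMIN_PUBLISH, "deploy.prod.role.dns"))
```

Under these rules the test deployer can reach every test-tier subject but no prod-tier subject, which is the separation the tiered setup relies on.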
### Generating NKeys
```bash
# Generate a seed (private key) and save it to a file; store it securely
nk -gen user > mykey.seed
# Derive the public key from the seed
nk -inkey mykey.seed -pubout > mykey.pub
```
The public key (starting with `U`) goes in the NATS server config. The seed (starting with `SU`) is stored in the file that homelab-deploy reads via `--nkey-file`.
## License
MIT