homelab-deploy
A message-based deployment system for NixOS configurations, built on NATS. It deploys NixOS configurations across a fleet of hosts, with support for tiered access control, role-based targeting, and AI assistant integration.
Overview
The homelab-deploy binary provides three operational modes:
- Listener mode - Runs on each NixOS host as a systemd service, subscribing to NATS subjects and executing `nixos-rebuild` when deployment requests arrive
- MCP mode - Runs as an MCP (Model Context Protocol) server, exposing deployment tools for AI assistants
- CLI mode - Manual deployment commands for administrators
Installation
Using Nix Flakes
# Run directly
nix run github:torjus/homelab-deploy -- --help
# Add to your flake inputs
{
inputs.homelab-deploy.url = "github:torjus/homelab-deploy";
}
Building from source
nix develop
go build ./cmd/homelab-deploy
CLI Usage
Listener Mode
Run on each NixOS host to listen for deployment requests:
homelab-deploy listener \
--hostname myhost \
--tier prod \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/listener.nkey \
--flake-url git+https://git.example.com/user/nixos-configs.git \
--role dns \
--timeout 600
Listener Flags
| Flag | Required | Description |
|---|---|---|
| `--hostname` | Yes | Hostname for this listener |
| `--tier` | Yes | Deployment tier (`test` or `prod`) |
| `--nats-url` | Yes | NATS server URL |
| `--nkey-file` | Yes | Path to NKey seed file |
| `--flake-url` | Yes | Git flake URL for nixos-rebuild |
| `--role` | No | Role for role-based targeting |
| `--timeout` | No | Deployment timeout in seconds (default: 600) |
| `--deploy-subject` | No | NATS subjects to subscribe to (repeatable) |
| `--discover-subject` | No | Discovery subject (default: `deploy.discover`) |
| `--metrics-enabled` | No | Enable Prometheus metrics endpoint |
| `--metrics-addr` | No | Metrics HTTP server address (default: `:9972`) |
| `--heartbeat-interval` | No | Status update interval in seconds during deployment (default: 15) |
| `--debug` | No | Enable debug logging for troubleshooting |
Subject Templates
Deploy subjects support template variables that are expanded at startup:
- `<hostname>` - The listener's hostname
- `<tier>` - The listener's tier
- `<role>` - The listener's role (subjects with `<role>` are skipped if role is not set)
Default subjects:
deploy.<tier>.<hostname>
deploy.<tier>.all
deploy.<tier>.role.<role>
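The expansion rules above can be sketched in Go. This is a minimal illustration, not the project's actual code: the function name `expandSubjects` is hypothetical, and it assumes placeholders are substituted by plain string replacement.

```go
package main

import (
	"fmt"
	"strings"
)

// expandSubjects is a hypothetical helper (not taken from the project).
// It substitutes <hostname>, <tier> and <role> in each template and skips
// templates that reference <role> when no role is configured.
func expandSubjects(templates []string, hostname, tier, role string) []string {
	var subjects []string
	for _, t := range templates {
		if strings.Contains(t, "<role>") && role == "" {
			continue // no role set: skip role-based subjects
		}
		r := strings.NewReplacer("<hostname>", hostname, "<tier>", tier, "<role>", role)
		subjects = append(subjects, r.Replace(t))
	}
	return subjects
}

func main() {
	defaults := []string{
		"deploy.<tier>.<hostname>",
		"deploy.<tier>.all",
		"deploy.<tier>.role.<role>",
	}
	// With a role, all three default subjects are produced.
	fmt.Println(expandSubjects(defaults, "myhost", "prod", "dns"))
	// Without a role, the role-based subject is dropped.
	fmt.Println(expandSubjects(defaults, "myhost", "prod", ""))
}
```

A listener started with `--hostname myhost --tier prod` and no `--role` would, under these assumptions, subscribe only to the host and tier-wide subjects.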
Deploy Command
Deploy to hosts via NATS:
# Deploy to a specific host
homelab-deploy deploy deploy.prod.myhost \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/deployer.nkey \
--branch main \
--action switch
# Deploy to all test hosts
homelab-deploy deploy deploy.test.all \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/deployer.nkey
# Deploy to all prod DNS servers
homelab-deploy deploy deploy.prod.role.dns \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/deployer.nkey
Deploy Flags
| Flag | Required | Env Var | Description |
|---|---|---|---|
| `--nats-url` | Yes | `HOMELAB_DEPLOY_NATS_URL` | NATS server URL |
| `--nkey-file` | Yes | `HOMELAB_DEPLOY_NKEY_FILE` | Path to NKey seed file |
| `--branch` | No | `HOMELAB_DEPLOY_BRANCH` | Git branch or commit (default: `master`) |
| `--action` | No | `HOMELAB_DEPLOY_ACTION` | nixos-rebuild action (default: `switch`) |
| `--timeout` | No | `HOMELAB_DEPLOY_TIMEOUT` | Response timeout in seconds (default: 900) |
Subject Aliases
Configure aliases via environment variables to simplify common deployments:
export HOMELAB_DEPLOY_ALIAS_TEST="deploy.test.all"
export HOMELAB_DEPLOY_ALIAS_PROD="deploy.prod.all"
export HOMELAB_DEPLOY_ALIAS_PROD_DNS="deploy.prod.role.dns"
# Now use short aliases
homelab-deploy deploy test --nats-url ... --nkey-file ...
homelab-deploy deploy prod-dns --nats-url ... --nkey-file ...
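Alias lookup: an alias resolves through the environment variable `HOMELAB_DEPLOY_ALIAS_<NAME>`, where `<NAME>` is the alias uppercased with hyphens replaced by underscores. For example, `prod-dns` reads `HOMELAB_DEPLOY_ALIAS_PROD_DNS`.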
MCP Server Mode
Run as an MCP server for AI assistant integration:
# Test-tier only access
homelab-deploy mcp \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/mcp.nkey
# With admin access to all tiers
homelab-deploy mcp \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/mcp.nkey \
--enable-admin \
--admin-nkey-file /run/secrets/admin.nkey
MCP Tools
| Tool | Description |
|---|---|
| `deploy` | Deploy to test-tier hosts only |
| `deploy_admin` | Deploy to any tier (requires `--enable-admin`) |
| `list_hosts` | Discover available deployment targets |
Tool Parameters
deploy / deploy_admin:
- `hostname` - Target specific host
- `all` - Deploy to all hosts (in tier)
- `role` - Deploy to hosts with this role
- `branch` - Git branch/commit (default: master)
- `action` - switch, boot, test, dry-activate (default: switch)
- `tier` - Required for deploy_admin only

list_hosts:
- `tier` - Filter by tier (optional)
NixOS Module
Add the module to your NixOS configuration:
{
inputs.homelab-deploy.url = "github:torjus/homelab-deploy";
outputs = { self, nixpkgs, homelab-deploy, ... }: {
nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
modules = [
homelab-deploy.nixosModules.default
{
services.homelab-deploy.listener = {
enable = true;
tier = "prod";
role = "dns";
natsUrl = "nats://nats.example.com:4222";
nkeyFile = "/run/secrets/homelab-deploy-nkey";
flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
};
}
];
};
};
}
Module Options
| Option | Type | Default | Description |
|---|---|---|---|
| `enable` | bool | `false` | Enable the listener service |
| `package` | package | from flake | Package to use |
| `hostname` | string | `config.networking.hostName` | Hostname for subject templates |
| `tier` | enum | required | `"test"` or `"prod"` |
| `role` | string | `null` | Role for role-based targeting |
| `natsUrl` | string | required | NATS server URL |
| `nkeyFile` | path | required | Path to NKey seed file |
| `flakeUrl` | string | required | Git flake URL |
| `timeout` | int | `600` | Deployment timeout in seconds |
| `deploySubjects` | list of string | see below | Subjects to subscribe to |
| `discoverSubject` | string | `"deploy.discover"` | Discovery subject |
| `environment` | attrs | `{}` | Additional environment variables |
| `metrics.enable` | bool | `false` | Enable Prometheus metrics endpoint |
| `metrics.address` | string | `":9972"` | Metrics HTTP server address |
| `metrics.openFirewall` | bool | `false` | Open firewall for metrics port |
| `extraArgs` | list of string | `[]` | Extra command line arguments (e.g., `["--debug"]`) |
Default deploySubjects:
[
"deploy.<tier>.<hostname>"
"deploy.<tier>.all"
"deploy.<tier>.role.<role>"
]
Prometheus Metrics
The listener can expose Prometheus metrics for monitoring deployment operations.
Enabling Metrics
CLI:
homelab-deploy listener \
--hostname myhost \
--tier prod \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/listener.nkey \
--flake-url git+https://git.example.com/user/nixos-configs.git \
--metrics-enabled \
--metrics-addr :9972
NixOS module:
services.homelab-deploy.listener = {
enable = true;
tier = "prod";
natsUrl = "nats://nats.example.com:4222";
nkeyFile = "/run/secrets/homelab-deploy-nkey";
flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
metrics = {
enable = true;
address = ":9972";
openFirewall = true; # Optional: open firewall for Prometheus scraping
};
};
Available Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| `homelab_deploy_deployments_total` | Counter | `status`, `action`, `error_code` | Total deployment requests processed |
| `homelab_deploy_deployment_duration_seconds` | Histogram | `action`, `success` | Deployment execution time |
| `homelab_deploy_deployment_in_progress` | Gauge | - | 1 if deployment running, 0 otherwise |
| `homelab_deploy_info` | Gauge | `hostname`, `tier`, `role`, `version` | Static instance metadata |
Label values:
- `status`: `completed`, `failed`, `rejected`
- `action`: `switch`, `boot`, `test`, `dry-activate`
- `error_code`: `invalid_action`, `invalid_revision`, `already_running`, `build_failed`, `timeout`, or empty
- `success`: `true`, `false`
HTTP Endpoints
| Endpoint | Description |
|---|---|
| `/metrics` | Prometheus metrics in text format |
| `/health` | Health check (returns `ok`) |
Example Prometheus Queries
# Average deployment duration (last hour)
rate(homelab_deploy_deployment_duration_seconds_sum[1h]) /
rate(homelab_deploy_deployment_duration_seconds_count[1h])
# Deployment success rate (last 24 hours)
sum(rate(homelab_deploy_deployments_total{status="completed"}[24h])) /
sum(rate(homelab_deploy_deployments_total{status=~"completed|failed"}[24h]))
# 95th percentile deployment time
histogram_quantile(0.95, rate(homelab_deploy_deployment_duration_seconds_bucket[1h]))
# Currently running deployments across all hosts
sum(homelab_deploy_deployment_in_progress)
Troubleshooting
Debug Logging
Enable debug logging to diagnose issues with deployments or metrics:
CLI:
homelab-deploy listener --debug \
--hostname myhost \
--tier prod \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/listener.nkey \
--flake-url git+https://git.example.com/user/nixos-configs.git \
--metrics-enabled
NixOS module:
services.homelab-deploy.listener = {
enable = true;
tier = "prod";
natsUrl = "nats://nats.example.com:4222";
nkeyFile = "/run/secrets/homelab-deploy-nkey";
flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
metrics.enable = true;
extraArgs = [ "--debug" ];
};
With debug logging enabled, the listener outputs detailed information about metrics recording:
{"level":"DEBUG","msg":"recording deployment start metric","metrics_enabled":true}
{"level":"DEBUG","msg":"recording deployment end metric (success)","action":"switch","success":true,"duration_seconds":120.5}
Metrics Showing Zero
If deployment metrics remain at zero after deployments:
- Check metrics are enabled: Verify `--metrics-enabled` is set and the metrics endpoint is accessible at `/metrics`.
- Enable debug logging: Use `--debug` to confirm metrics recording is being called.
- Check deployment status: Duration metrics are recorded only for deployments that run to completion (success or failure). Rejected requests (e.g., already running) increment the counter with `status="rejected"` but record no duration.
- Check after restart: After a successful `switch` deployment, the listener restarts, and metrics reset to zero in the new instance. Before restarting, the listener waits up to 60 seconds for a Prometheus scrape so the final metrics are captured.
- Verify Prometheus scrape timing: Ensure Prometheus scrapes frequently enough to capture metrics before the listener restarts.
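To check a scraped `/metrics` body by hand, a small parser can total one counter across all of its label sets. The sketch below is illustrative only: `sumMetric` is a hypothetical helper, the sample text in `main` is made-up output rather than a capture from a real listener, and the parser assumes sample lines carry no trailing timestamp.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumMetric adds up all samples of one metric in Prometheus text format.
// Hypothetical helper: it assumes each sample line is "name{labels} value"
// or "name value", with no timestamp after the value.
func sumMetric(body, name string) float64 {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, name) {
			continue // comment, blank line, or another metric
		}
		rest := line[len(name):]
		if !strings.HasPrefix(rest, "{") && !strings.HasPrefix(rest, " ") {
			continue // a longer metric name sharing the same prefix
		}
		if i := strings.LastIndex(rest, "}"); i >= 0 {
			rest = rest[i+1:] // drop the label set
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		if v, err := strconv.ParseFloat(fields[0], 64); err == nil {
			total += v
		}
	}
	return total
}

func main() {
	// Made-up sample output, for illustration only.
	sample := `# TYPE homelab_deploy_deployments_total counter
homelab_deploy_deployments_total{action="switch",error_code="",status="completed"} 3
homelab_deploy_deployments_total{action="switch",error_code="already_running",status="rejected"} 1
homelab_deploy_deployment_in_progress 0
`
	fmt.Println(sumMetric(sample, "homelab_deploy_deployments_total"))
}
```

A total of zero right after a successful `switch` is consistent with the restart behavior described above, so compare against what Prometheus scraped before the restart.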
Message Protocol
Deploy Request
{
"action": "switch",
"revision": "main",
"reply_to": "deploy.responses.abc123"
}
Deploy Response
{
"hostname": "myhost",
"status": "completed",
"error": null,
"message": "Successfully switched to generation 42"
}
Status values: accepted, rejected, started, completed, failed
Error codes: invalid_revision, invalid_action, already_running, build_failed, timeout
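The two messages map naturally onto Go structs. The sketch below is a minimal illustration, not the project's own types: the struct and function names are hypothetical, the `error` field is assumed to hold one of the error codes as a string (or null), and treating `completed`, `failed`, and `rejected` as final statuses is also an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DeployRequest mirrors the request JSON shown above (hypothetical type).
type DeployRequest struct {
	Action   string `json:"action"`
	Revision string `json:"revision"`
	ReplyTo  string `json:"reply_to"`
}

// DeployResponse mirrors the response JSON shown above (hypothetical type).
// Error is a pointer so that JSON null decodes to nil.
type DeployResponse struct {
	Hostname string  `json:"hostname"`
	Status   string  `json:"status"`
	Error    *string `json:"error"`
	Message  string  `json:"message"`
}

// isFinal reports whether a status ends a deployment. Assumption: accepted
// and started are intermediate; the other three statuses are terminal.
func isFinal(status string) bool {
	switch status {
	case "completed", "failed", "rejected":
		return true
	}
	return false
}

// decodeResponse parses one response message.
func decodeResponse(data []byte) (DeployResponse, error) {
	var resp DeployResponse
	err := json.Unmarshal(data, &resp)
	return resp, err
}

func main() {
	req := DeployRequest{Action: "switch", Revision: "main", ReplyTo: "deploy.responses.abc123"}
	out, err := json.Marshal(req)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))

	raw := []byte(`{"hostname":"myhost","status":"completed","error":null,"message":"Successfully switched to generation 42"}`)
	resp, err := decodeResponse(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Hostname, resp.Status, resp.Error == nil, isFinal(resp.Status))
}
```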
NATS Authentication
All connections use NKey authentication. Generate keys with:
nk -gen user -pubout
Configure appropriate publish/subscribe permissions in your NATS server for each credential type.
NATS Subject Structure
The deployment system uses the following NATS subject hierarchy:
Deploy Subjects
| Subject Pattern | Purpose |
|---|---|
| `deploy.<tier>.<hostname>` | Deploy to a specific host |
| `deploy.<tier>.all` | Deploy to all hosts in a tier |
| `deploy.<tier>.role.<role>` | Deploy to hosts with a specific role in a tier |
Tier values: test, prod
Examples:
- `deploy.test.myhost` - Deploy to myhost in test tier
- `deploy.prod.all` - Deploy to all production hosts
- `deploy.prod.role.dns` - Deploy to all DNS servers in production
Response Subjects
| Subject Pattern | Purpose |
|---|---|
| `deploy.responses.<uuid>` | Unique reply subject for each deployment request |
Deployers create a unique response subject for each request and include it in the reply_to field. Listeners publish status updates to this subject.
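A deployer could build such a reply subject as sketched below. To stay within the Go standard library, this example swaps the UUID for a random 128-bit hex token; the function name `newReplySubject` is hypothetical and the project may generate its identifiers differently.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newReplySubject returns a fresh subject under deploy.responses.
// Hypothetical helper: it uses 16 random bytes, hex-encoded, in place of
// a UUID so that no third-party package is needed.
func newReplySubject() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return "deploy.responses." + hex.EncodeToString(buf), nil
}

func main() {
	subject, err := newReplySubject()
	if err != nil {
		panic(err)
	}
	// The value would be placed in the request's reply_to field.
	fmt.Println(subject)
}
```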
Discovery Subject
| Subject Pattern | Purpose |
|---|---|
| `deploy.discover` | Host discovery requests and responses |
Used by the list_hosts MCP tool and for discovering available deployment targets.
Example NATS Configuration
Below is an example NATS server configuration implementing tiered authentication. This setup provides:
- Listeners - Each host has credentials to subscribe to its own subjects and publish responses
- Test deployer - Can deploy to test tier only (suitable for MCP without admin access)
- Admin deployer - Can deploy to all tiers (for CLI or MCP with admin access)
authorization {
users = [
# Listener for a test-tier host
{
nkey: "UTEST_HOST1_PUBLIC_KEY_HERE"
permissions: {
subscribe: [
"deploy.test.testhost1"
"deploy.test.all"
"deploy.test.role.>"
"deploy.discover"
]
publish: [
"deploy.responses.>"
"deploy.discover"
]
}
}
# Listener for a prod-tier host with 'dns' role
{
nkey: "UPROD_DNS1_PUBLIC_KEY_HERE"
permissions: {
subscribe: [
"deploy.prod.dns1"
"deploy.prod.all"
"deploy.prod.role.dns"
"deploy.discover"
]
publish: [
"deploy.responses.>"
"deploy.discover"
]
}
}
# Test-tier deployer (MCP without admin)
{
nkey: "UTEST_DEPLOYER_PUBLIC_KEY_HERE"
permissions: {
publish: [
"deploy.test.>"
"deploy.discover"
]
subscribe: [
"deploy.responses.>"
"deploy.discover"
]
}
}
# Admin deployer (full access to all tiers)
{
nkey: "UADMIN_DEPLOYER_PUBLIC_KEY_HERE"
permissions: {
publish: [
"deploy.>"
]
subscribe: [
"deploy.>"
]
}
}
]
}
Key Permission Patterns
| Credential Type | Publish | Subscribe |
|---|---|---|
| Listener | `deploy.responses.>`, `deploy.discover` | Own subjects, `deploy.discover` |
| Test deployer | `deploy.test.>`, `deploy.discover` | `deploy.responses.>`, `deploy.discover` |
| Admin deployer | `deploy.>` | `deploy.>` |
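The permission patterns rely on NATS wildcards: `*` matches exactly one token and `>` matches one or more trailing tokens. The simplified matcher below illustrates why the test deployer can publish to `deploy.test.all` but not to `deploy.prod.all`. It is a sketch for reasoning about patterns, not the NATS server's implementation, and `subjectMatches` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches reports whether subject is covered by pattern under NATS
// wildcard rules: "*" matches exactly one token, ">" matches one or more
// tokens and is only valid as the last token. Simplified sketch.
func subjectMatches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			// ">" must be last and needs at least one token to consume.
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) {
			return false // subject is shorter than the pattern
		}
		if tok != "*" && tok != s[i] {
			return false // literal token mismatch
		}
	}
	return len(p) == len(s)
}

func main() {
	fmt.Println(subjectMatches("deploy.test.>", "deploy.test.all"))         // test deployer, test tier
	fmt.Println(subjectMatches("deploy.test.>", "deploy.prod.all"))         // test deployer, prod tier
	fmt.Println(subjectMatches("deploy.>", "deploy.prod.role.dns"))         // admin deployer
	fmt.Println(subjectMatches("deploy.responses.>", "deploy.responses.x")) // listener reply
}
```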
Generating NKeys
# Generate a keypair (outputs public key, saves seed to file)
nk -gen user -pubout > mykey.pub
# The seed (private key) is printed to stderr - save it securely
# Or generate and save seed directly
nk -gen user > mykey.seed
nk -inkey mykey.seed -pubout # Get public key from seed
The public key (starting with U) goes in the NATS server config. The seed file (starting with SU) is used by homelab-deploy via --nkey-file.
License
MIT