# homelab-deploy

A message-based deployment system for NixOS configurations using NATS for messaging. Deploy NixOS configurations across a fleet of hosts with support for tiered access control, role-based targeting, and AI assistant integration.

## Overview

The `homelab-deploy` binary provides three operational modes:

1. **Listener mode** - Runs on each NixOS host as a systemd service, subscribing to NATS subjects and executing `nixos-rebuild` when deployment requests arrive
2. **MCP mode** - Runs as an MCP (Model Context Protocol) server, exposing deployment tools for AI assistants
3. **CLI mode** - Manual deployment commands for administrators

## Installation

### Using Nix Flakes

```bash
# Run directly
nix run github:torjus/homelab-deploy -- --help

# Add to your flake inputs
{
  inputs.homelab-deploy.url = "github:torjus/homelab-deploy";
}
```

### Building from source

```bash
nix develop
go build ./cmd/homelab-deploy
```

## CLI Usage

### Listener Mode

Run on each NixOS host to listen for deployment requests:

```bash
homelab-deploy listener \
  --hostname myhost \
  --tier prod \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/listener.nkey \
  --flake-url git+https://git.example.com/user/nixos-configs.git \
  --role dns \
  --timeout 600
```

#### Listener Flags

| Flag | Required | Description |
|------|----------|-------------|
| `--hostname` | Yes | Hostname for this listener |
| `--tier` | Yes | Deployment tier (`test` or `prod`) |
| `--nats-url` | Yes | NATS server URL |
| `--nkey-file` | Yes | Path to NKey seed file |
| `--flake-url` | Yes | Git flake URL for nixos-rebuild |
| `--role` | No | Role for role-based targeting |
| `--timeout` | No | Deployment timeout in seconds (default: 600) |
| `--deploy-subject` | No | NATS subjects to subscribe to (repeatable) |
| `--discover-subject` | No | Discovery subject (default: `deploy.discover`) |
| `--metrics-enabled` | No | Enable Prometheus metrics endpoint |
| `--metrics-addr` | No | Metrics HTTP server address (default: `:9972`) |
| `--heartbeat-interval` | No | Status update interval in seconds during deployment (default: 15) |
| `--debug` | No | Enable debug logging for troubleshooting |

#### Subject Templates

Deploy subjects support template variables that are expanded at startup:

- `<hostname>` - The listener's hostname
- `<tier>` - The listener's tier
- `<role>` - The listener's role (subjects with `<role>` are skipped if role is not set)

Default subjects:

```
deploy.<tier>.<hostname>
deploy.<tier>.all
deploy.<tier>.role.<role>
```

### Deploy Command

Deploy to hosts via NATS:

```bash
# Deploy to a specific host
homelab-deploy deploy deploy.prod.myhost \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/deployer.nkey \
  --branch main \
  --action switch

# Deploy to all test hosts
homelab-deploy deploy deploy.test.all \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/deployer.nkey

# Deploy to all prod DNS servers
homelab-deploy deploy deploy.prod.role.dns \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/deployer.nkey
```

#### Deploy Flags

| Flag | Required | Env Var | Description |
|------|----------|---------|-------------|
| `--nats-url` | Yes | `HOMELAB_DEPLOY_NATS_URL` | NATS server URL |
| `--nkey-file` | Yes | `HOMELAB_DEPLOY_NKEY_FILE` | Path to NKey seed file |
| `--branch` | No | `HOMELAB_DEPLOY_BRANCH` | Git branch or commit (default: `master`) |
| `--action` | No | `HOMELAB_DEPLOY_ACTION` | nixos-rebuild action (default: `switch`) |
| `--timeout` | No | `HOMELAB_DEPLOY_TIMEOUT` | Response timeout in seconds (default: 900) |

#### Subject Aliases

Configure aliases via environment variables to simplify common deployments:

```bash
export HOMELAB_DEPLOY_ALIAS_TEST="deploy.test.all"
export HOMELAB_DEPLOY_ALIAS_PROD="deploy.prod.all"
export HOMELAB_DEPLOY_ALIAS_PROD_DNS="deploy.prod.role.dns"

# Now use short aliases
homelab-deploy deploy test --nats-url ... --nkey-file ...
homelab-deploy deploy prod-dns --nats-url ... --nkey-file ...
```

Alias lookup: `HOMELAB_DEPLOY_ALIAS_<NAME>` where name is uppercased and hyphens become underscores.

### MCP Server Mode

Run as an MCP server for AI assistant integration:

```bash
# Test-tier only access
homelab-deploy mcp \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/mcp.nkey

# With admin access to all tiers
homelab-deploy mcp \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/mcp.nkey \
  --enable-admin \
  --admin-nkey-file /run/secrets/admin.nkey
```

#### MCP Tools

| Tool | Description |
|------|-------------|
| `deploy` | Deploy to test-tier hosts only |
| `deploy_admin` | Deploy to any tier (requires `--enable-admin`) |
| `list_hosts` | Discover available deployment targets |

#### Tool Parameters

**deploy / deploy_admin:**

- `hostname` - Target specific host
- `all` - Deploy to all hosts (in tier)
- `role` - Deploy to hosts with this role
- `branch` - Git branch/commit (default: master)
- `action` - switch, boot, test, dry-activate (default: switch)
- `tier` - Required for deploy_admin only

**list_hosts:**

- `tier` - Filter by tier (optional)

## NixOS Module

Add the module to your NixOS configuration:

```nix
{
  inputs.homelab-deploy.url = "github:torjus/homelab-deploy";

  outputs = { self, nixpkgs, homelab-deploy, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      modules = [
        homelab-deploy.nixosModules.default
        {
          services.homelab-deploy.listener = {
            enable = true;
            tier = "prod";
            role = "dns";
            natsUrl = "nats://nats.example.com:4222";
            nkeyFile = "/run/secrets/homelab-deploy-nkey";
            flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
          };
        }
      ];
    };
  };
}
```

### Module Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enable` | bool | `false` | Enable the listener service |
| `package` | package | from flake | Package to use |
| `hostname` | string | `config.networking.hostName` | Hostname for subject templates |
| `tier` | enum | required | `"test"` or `"prod"` |
| `role` | string | `null` | Role for role-based targeting |
| `natsUrl` | string | required | NATS server URL |
| `nkeyFile` | path | required | Path to NKey seed file |
| `flakeUrl` | string | required | Git flake URL |
| `timeout` | int | `600` | Deployment timeout in seconds |
| `deploySubjects` | list of string | see below | Subjects to subscribe to |
| `discoverSubject` | string | `"deploy.discover"` | Discovery subject |
| `environment` | attrs | `{}` | Additional environment variables |
| `metrics.enable` | bool | `false` | Enable Prometheus metrics endpoint |
| `metrics.address` | string | `":9972"` | Metrics HTTP server address |
| `metrics.openFirewall` | bool | `false` | Open firewall for metrics port |
| `extraArgs` | list of string | `[]` | Extra command line arguments (e.g., `["--debug"]`) |

Default `deploySubjects`:

```nix
[
  "deploy.<tier>.<hostname>"
  "deploy.<tier>.all"
  "deploy.<tier>.role.<role>"
]
```

## Prometheus Metrics

The listener can expose Prometheus metrics for monitoring deployment operations.
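For a quick check without a Prometheus server, a single gauge can be read straight from the text output of the `/metrics` endpoint documented under HTTP Endpoints below. The following is a minimal Go sketch, not part of `homelab-deploy` itself: the helper name `gaugeValue` and the sample HELP text are assumptions made for illustration, and only the metric name comes from the Available Metrics table. It handles unlabelled samples only.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// gaugeValue is a hypothetical helper. It scans Prometheus text-format
// output for an unlabelled sample with the given name and returns its value.
func gaugeValue(body, metric string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip blank lines, "# HELP" / "# TYPE" comments, and other metrics.
		if len(fields) < 2 || fields[0] != metric {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return 0, false
		}
		return v, true
	}
	return 0, false
}

func main() {
	// Sample scrape output; the HELP wording here is invented for the example.
	sample := `# HELP homelab_deploy_deployment_in_progress 1 if deployment running
# TYPE homelab_deploy_deployment_in_progress gauge
homelab_deploy_deployment_in_progress 1
`
	v, ok := gaugeValue(sample, "homelab_deploy_deployment_in_progress")
	fmt.Println(v, ok)
}
```

In practice the `sample` string would be replaced by the body of an HTTP GET against the listener's metrics address.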
### Enabling Metrics

**CLI:**

```bash
homelab-deploy listener \
  --hostname myhost \
  --tier prod \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/listener.nkey \
  --flake-url git+https://git.example.com/user/nixos-configs.git \
  --metrics-enabled \
  --metrics-addr :9972
```

**NixOS module:**

```nix
services.homelab-deploy.listener = {
  enable = true;
  tier = "prod";
  natsUrl = "nats://nats.example.com:4222";
  nkeyFile = "/run/secrets/homelab-deploy-nkey";
  flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
  metrics = {
    enable = true;
    address = ":9972";
    openFirewall = true;  # Optional: open firewall for Prometheus scraping
  };
};
```

### Available Metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `homelab_deploy_deployments_total` | Counter | `status`, `action`, `error_code` | Total deployment requests processed |
| `homelab_deploy_deployment_duration_seconds` | Histogram | `action`, `success` | Deployment execution time |
| `homelab_deploy_deployment_in_progress` | Gauge | - | 1 if deployment running, 0 otherwise |
| `homelab_deploy_info` | Gauge | `hostname`, `tier`, `role`, `version` | Static instance metadata |

**Label values:**

- `status`: `completed`, `failed`, `rejected`
- `action`: `switch`, `boot`, `test`, `dry-activate`
- `error_code`: `invalid_action`, `invalid_revision`, `already_running`, `build_failed`, `timeout`, or empty
- `success`: `true`, `false`

### HTTP Endpoints

| Endpoint | Description |
|----------|-------------|
| `/metrics` | Prometheus metrics in text format |
| `/health` | Health check (returns `ok`) |

### Example Prometheus Queries

```promql
# Average deployment duration (last hour)
rate(homelab_deploy_deployment_duration_seconds_sum[1h]) /
rate(homelab_deploy_deployment_duration_seconds_count[1h])

# Deployment success rate (last 24 hours)
sum(rate(homelab_deploy_deployments_total{status="completed"}[24h])) /
sum(rate(homelab_deploy_deployments_total{status=~"completed|failed"}[24h]))

# 95th percentile deployment time
histogram_quantile(0.95, rate(homelab_deploy_deployment_duration_seconds_bucket[1h]))

# Currently running deployments across all hosts
sum(homelab_deploy_deployment_in_progress)
```

## Troubleshooting

### Debug Logging

Enable debug logging to diagnose issues with deployments or metrics:

**CLI:**

```bash
homelab-deploy listener --debug \
  --hostname myhost \
  --tier prod \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/listener.nkey \
  --flake-url git+https://git.example.com/user/nixos-configs.git \
  --metrics-enabled
```

**NixOS module:**

```nix
services.homelab-deploy.listener = {
  enable = true;
  tier = "prod";
  natsUrl = "nats://nats.example.com:4222";
  nkeyFile = "/run/secrets/homelab-deploy-nkey";
  flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
  metrics.enable = true;
  extraArgs = [ "--debug" ];
};
```

With debug logging enabled, the listener outputs detailed information about metrics recording:

```json
{"level":"DEBUG","msg":"recording deployment start metric","metrics_enabled":true}
{"level":"DEBUG","msg":"recording deployment end metric (success)","action":"switch","success":true,"duration_seconds":120.5}
```

### Metrics Showing Zero

If deployment metrics remain at zero after deployments:

1. **Check metrics are enabled**: Verify `--metrics-enabled` is set and the metrics endpoint is accessible at `/metrics`
2. **Enable debug logging**: Use `--debug` to confirm metrics recording is being called
3. **Check deployment status**: Duration is only recorded for deployments that run to completion (success or failure). Rejected requests (e.g., already running) increment the counter with `status="rejected"` but don't record a duration
4. **Check after restart**: After a successful `switch` deployment, the listener restarts and metrics reset to zero in the new instance. Before restarting, the listener waits up to 60 seconds for a Prometheus scrape so that the final metrics can be captured
5. **Verify Prometheus scrape timing**: Ensure Prometheus scrapes frequently enough to capture metrics before the listener restarts

## Message Protocol

### Deploy Request

```json
{
  "action": "switch",
  "revision": "main",
  "reply_to": "deploy.responses.abc123"
}
```

### Deploy Response

```json
{
  "hostname": "myhost",
  "status": "completed",
  "error": null,
  "message": "Successfully switched to generation 42"
}
```

**Status values:** `accepted`, `rejected`, `started`, `completed`, `failed`

**Error codes:** `invalid_revision`, `invalid_action`, `already_running`, `build_failed`, `timeout`

## NATS Authentication

All connections use NKey authentication. Generate keys with:

```bash
nk -gen user -pubout
```

Configure appropriate publish/subscribe permissions in your NATS server for each credential type.

## NATS Subject Structure

The deployment system uses the following NATS subject hierarchy:

### Deploy Subjects

| Subject Pattern | Purpose |
|-----------------|---------|
| `deploy.<tier>.<hostname>` | Deploy to a specific host |
| `deploy.<tier>.all` | Deploy to all hosts in a tier |
| `deploy.<tier>.role.<role>` | Deploy to hosts with a specific role in a tier |

**Tier values:** `test`, `prod`

**Examples:**

- `deploy.test.myhost` - Deploy to myhost in test tier
- `deploy.prod.all` - Deploy to all production hosts
- `deploy.prod.role.dns` - Deploy to all DNS servers in production

### Response Subjects

| Subject Pattern | Purpose |
|-----------------|---------|
| `deploy.responses.<id>` | Unique reply subject for each deployment request |

Deployers create a unique response subject for each request and include it in the `reply_to` field. Listeners publish status updates to this subject.

### Discovery Subject

| Subject Pattern | Purpose |
|-----------------|---------|
| `deploy.discover` | Host discovery requests and responses |

Used by the `list_hosts` MCP tool and for discovering available deployment targets.
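To make the subject hierarchy and the request message concrete, here is a minimal Go sketch of what a deployer might assemble before publishing. It is an illustration under stated assumptions, not the project's actual code: the helper names (`hostSubject`, `tierSubject`, `roleSubject`, `newReplySubject`) are hypothetical, and the random hex suffix is just one possible way to make the response subject unique. The JSON field names follow the Deploy Request example above. The sketch uses only the standard library and stops short of connecting to NATS.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// DeployRequest mirrors the JSON fields shown under "Deploy Request".
type DeployRequest struct {
	Action   string `json:"action"`
	Revision string `json:"revision"`
	ReplyTo  string `json:"reply_to"`
}

// hostSubject builds the subject for a single host (hypothetical helper).
func hostSubject(tier, hostname string) string {
	return "deploy." + tier + "." + hostname
}

// tierSubject builds the subject for every host in a tier (hypothetical helper).
func tierSubject(tier string) string {
	return "deploy." + tier + ".all"
}

// roleSubject builds the subject for all hosts with a role in a tier
// (hypothetical helper).
func roleSubject(tier, role string) string {
	return "deploy." + tier + ".role." + role
}

// newReplySubject returns a per-request response subject. The 16-character
// random hex suffix is an assumption; the only requirement is uniqueness.
func newReplySubject() (string, error) {
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return "deploy.responses." + hex.EncodeToString(buf), nil
}

func main() {
	reply, err := newReplySubject()
	if err != nil {
		panic(err)
	}
	payload, err := json.Marshal(DeployRequest{
		Action:   "switch",
		Revision: "main",
		ReplyTo:  reply,
	})
	if err != nil {
		panic(err)
	}
	// A real deployer would subscribe to reply, then publish payload
	// to the chosen deploy subject over NATS.
	fmt.Println("subject:", roleSubject("prod", "dns"))
	fmt.Println("payload:", string(payload))
}
```

Keeping subject construction in small functions makes it easy to check that a deployer only ever publishes to subjects its NATS credentials allow.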
## Example NATS Configuration

Below is an example NATS server configuration implementing tiered authentication. This setup provides:

- **Listeners** - Each host has credentials to subscribe to its own subjects and publish responses
- **Test deployer** - Can deploy to test tier only (suitable for MCP without admin access)
- **Admin deployer** - Can deploy to all tiers (for CLI or MCP with admin access)

```conf
authorization {
  users = [
    # Listener for a test-tier host
    {
      nkey: "UTEST_HOST1_PUBLIC_KEY_HERE"
      permissions: {
        subscribe: [
          "deploy.test.testhost1"
          "deploy.test.all"
          "deploy.test.role.>"
          "deploy.discover"
        ]
        publish: [
          "deploy.responses.>"
          "deploy.discover"
        ]
      }
    }

    # Listener for a prod-tier host with 'dns' role
    {
      nkey: "UPROD_DNS1_PUBLIC_KEY_HERE"
      permissions: {
        subscribe: [
          "deploy.prod.dns1"
          "deploy.prod.all"
          "deploy.prod.role.dns"
          "deploy.discover"
        ]
        publish: [
          "deploy.responses.>"
          "deploy.discover"
        ]
      }
    }

    # Test-tier deployer (MCP without admin)
    {
      nkey: "UTEST_DEPLOYER_PUBLIC_KEY_HERE"
      permissions: {
        publish: [
          "deploy.test.>"
          "deploy.discover"
        ]
        subscribe: [
          "deploy.responses.>"
          "deploy.discover"
        ]
      }
    }

    # Admin deployer (full access to all tiers)
    {
      nkey: "UADMIN_DEPLOYER_PUBLIC_KEY_HERE"
      permissions: {
        publish: [
          "deploy.>"
        ]
        subscribe: [
          "deploy.>"
        ]
      }
    }
  ]
}
```

### Key Permission Patterns

| Credential Type | Publish | Subscribe |
|-----------------|---------|-----------|
| Listener | `deploy.responses.>`, `deploy.discover` | Own subjects, `deploy.discover` |
| Test deployer | `deploy.test.>`, `deploy.discover` | `deploy.responses.>`, `deploy.discover` |
| Admin deployer | `deploy.>` | `deploy.>` |

### Generating NKeys

```bash
# Generate a keypair (outputs public key, saves seed to file)
nk -gen user -pubout > mykey.pub
# The seed (private key) is printed to stderr - save it securely

# Or generate and save seed directly
nk -gen user > mykey.seed
nk -inkey mykey.seed -pubout  # Get public key from seed
```

The public key (starting with `U`) goes in the NATS server config. The seed file (starting with `SU`) is used by homelab-deploy via `--nkey-file`.

## License

MIT