# homelab-deploy Design Document

A message-based deployment system for NixOS configurations using NATS for messaging. This binary runs in multiple modes to enable on-demand NixOS configuration updates across a fleet of hosts.

## Overview

The `homelab-deploy` binary provides three operational modes:

1. **Listener mode** - Runs on each NixOS host as a systemd service, subscribing to NATS subjects and executing `nixos-rebuild` when deployment requests arrive
2. **MCP mode** - Runs as an MCP (Model Context Protocol) server, exposing deployment tools for AI assistants
3. **CLI mode** - Manual deployment commands for administrators

## Architecture

```
┌─────────────┐                   ┌─────────────┐
│  MCP Tool   │ deploy.test.>     │  Admin CLI  │ deploy.test.> + deploy.prod.>
│             │──────┐      ┌─────│             │
└─────────────┘      │      │     └─────────────┘
                     ▼      ▼
                 ┌──────────────┐
                 │ NATS Server  │
                 │   (authz)    │
                 └──────┬───────┘
                        │
      ┌─────────────────┼─────────────────┐
      │                 │                 │
      ▼                 ▼                 ▼
 ┌──────────┐      ┌──────────┐      ┌──────────┐
 │  host-a  │      │  host-b  │      │  host-c  │
 │ tier=test│      │ tier=prod│      │ tier=prod│
 └──────────┘      └──────────┘      └──────────┘
```

## Repository Structure

```
homelab-deploy/
├── flake.nix               # Nix flake with Go package + NixOS module
├── go.mod
├── go.sum
├── cmd/
│   └── homelab-deploy/
│       └── main.go         # CLI entrypoint with subcommands
├── internal/
│   ├── listener/           # Listener mode logic
│   ├── mcp/                # MCP server mode logic
│   ├── nats/               # NATS client wrapper
│   └── deploy/             # Shared deployment execution logic
└── nixos/
    └── module.nix          # NixOS module for listener service
```

## CLI Interface

```bash
# Listener mode (runs as systemd service on each host)
homelab-deploy listener \
  --hostname <hostname> \
  --tier <test|prod> \
  --nats-url nats://server:4222 \
  --nkey-file /path/to/listener.nkey \
  --flake-url <git+https://...> \
  [--role <role>] \
  [--timeout 600] \
  [--deploy-subject <subject>]... \
  [--discover-subject <subject>]

# --deploy-subject can be repeated; all subject flags accept template variables:
homelab-deploy listener \
  --hostname ns1 \
  --tier prod \
  --role dns \
  --deploy-subject "deploy.<tier>.<hostname>" \
  --deploy-subject "deploy.<tier>.all" \
  --deploy-subject "deploy.<tier>.role.<role>" \
  --discover-subject "deploy.discover" \
  ...

# MCP server mode (for AI assistants)
homelab-deploy mcp \
  --nats-url nats://server:4222 \
  --nkey-file /path/to/mcp.nkey \
  [--enable-admin --admin-nkey-file /path/to/admin.nkey]

# CLI commands for manual use
# Deploy to a specific subject
homelab-deploy deploy <subject> \
  --nats-url nats://server:4222 \
  --nkey-file /path/to/deployer.nkey \
  [--branch <branch>] \
  [--action <switch|boot|test|dry-activate>]

# Examples:
homelab-deploy deploy deploy.prod.ns1        # Deploy to specific host
homelab-deploy deploy deploy.test.all        # Deploy to all test hosts
homelab-deploy deploy deploy.prod.role.dns   # Deploy to all prod DNS hosts

# Using aliases (configured via environment variables)
homelab-deploy deploy test                   # Expands to configured subject
homelab-deploy deploy prod-dns               # Expands to configured subject
```

### CLI Subject Aliases

The CLI supports subject aliases via environment variables. If the `<subject>` argument doesn't look like a NATS subject (it contains no dots), the CLI treats it as an alias name.

**Environment variable format:** `HOMELAB_DEPLOY_ALIAS_<NAME>=<subject>`

```bash
export HOMELAB_DEPLOY_ALIAS_TEST="deploy.test.all"
export HOMELAB_DEPLOY_ALIAS_PROD="deploy.prod.all"
export HOMELAB_DEPLOY_ALIAS_PROD_DNS="deploy.prod.role.dns"

# Now these work:
homelab-deploy deploy test       # -> deploy.test.all
homelab-deploy deploy prod       # -> deploy.prod.all
homelab-deploy deploy prod-dns   # -> deploy.prod.role.dns
```

Alias names are case-insensitive, and hyphens are converted to underscores when looking up the environment variable, so `prod-dns` (or `PROD-DNS`) resolves via `HOMELAB_DEPLOY_ALIAS_PROD_DNS`.

## NATS Subject Structure

Subjects follow the pattern `deploy.<tier>.<target>` by default, but are fully configurable:

| Subject Pattern | Description |
|-----------------|-------------|
| `deploy.<tier>.<hostname>` | Deploy to specific host (e.g., `deploy.prod.ns1`) |
| `deploy.<tier>.all` | Deploy to all hosts in tier (e.g., `deploy.test.all`) |
| `deploy.<tier>.role.<role>` | Deploy to hosts with role in tier (e.g., `deploy.prod.role.dns`) |
| `deploy.responses.<uuid>` | Response subject for request/reply (UUID generated by CLI) |
| `deploy.discover` | Host discovery requests |

### Subject Customization

Listeners can configure custom subject patterns using template variables:

- `<hostname>` - The listener's hostname
- `<tier>` - The listener's tier (test/prod)
- `<role>` - The listener's role (if configured)

This allows prefixing subjects for multi-tenant setups (e.g., `homelab.deploy.<tier>.<hostname>`).

## Listener Mode

### Responsibilities

1. Connect to NATS using NKey authentication
2. Subscribe to configured deploy subjects (with template expansion)
3. Subscribe to the discovery subject and respond with host metadata
4. Validate incoming deployment requests
5. Execute `nixos-rebuild` with the specified parameters
6. Report status back on the subject given in the request's `reply_to` field

### Subject Subscriptions

Listeners subscribe to a configurable list of subjects. The configuration uses template variables that are expanded at runtime:

```yaml
listener:
  hostname: ns1
  tier: prod
  role: dns

  deploy_subjects:
    - "deploy.<tier>.<hostname>"
    - "deploy.<tier>.all"
    - "deploy.<tier>.role.<role>"

  discover_subject: "deploy.discover"
```

Template variables:

- `<hostname>` - Replaced with the configured hostname
- `<tier>` - Replaced with the configured tier
- `<role>` - Replaced with the configured role (subject skipped if role is null)

**Example:** With the above configuration, the listener subscribes to:

- `deploy.prod.ns1`
- `deploy.prod.all`
- `deploy.prod.role.dns`
- `deploy.discover`

**Prefixed example:** For multi-tenant setups:

```yaml
listener:
  hostname: ns1
  tier: prod
  deploy_subjects:
    - "homelab.deploy.<tier>.<hostname>"
    - "homelab.deploy.<tier>.all"
  discover_subject: "homelab.deploy.discover"
```
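
The expansion rules can be sketched in Go as follows. `expandSubjects` is a hypothetical helper, and a `null` role is represented by the empty string.

```go
package main

import (
	"fmt"
	"strings"
)

// expandSubjects (hypothetical helper) replaces <hostname>, <tier> and
// <role> in each template. An empty role stands for a null role: templates
// that mention <role> are skipped instead of producing "deploy.prod.role.".
func expandSubjects(templates []string, hostname, tier, role string) []string {
	r := strings.NewReplacer("<hostname>", hostname, "<tier>", tier, "<role>", role)
	var subjects []string
	for _, t := range templates {
		if role == "" && strings.Contains(t, "<role>") {
			continue
		}
		subjects = append(subjects, r.Replace(t))
	}
	return subjects
}

func main() {
	templates := []string{
		"deploy.<tier>.<hostname>",
		"deploy.<tier>.all",
		"deploy.<tier>.role.<role>",
	}
	fmt.Println(expandSubjects(templates, "ns1", "prod", "dns"))
	// [deploy.prod.ns1 deploy.prod.all deploy.prod.role.dns]
	fmt.Println(expandSubjects(templates, "web1", "test", ""))
	// [deploy.test.web1 deploy.test.all]
}
```

`strings.NewReplacer` substitutes all three variables in a single pass over each template, so the same helper serves both the default and the prefixed layouts.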

### Message Formats

**Request message:**

```json
{
  "action": "switch",
  "revision": "master",
  "reply_to": "deploy.responses.abc123"
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `action` | string | yes | One of: `switch`, `boot`, `test`, `dry-activate` |
| `revision` | string | yes | Git branch name or commit hash |
| `reply_to` | string | yes | Subject to publish responses to |

**Response message:**

```json
{
  "hostname": "ns1",
  "status": "completed",
  "error": null,
  "message": "Successfully switched to generation 42"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `hostname` | string | The responding host's name |
| `status` | string | One of: `accepted`, `rejected`, `started`, `completed`, `failed` |
| `error` | string or null | Error code if status is `rejected` or `failed` |
| `message` | string | Human-readable details |

**Error codes:**

- `invalid_revision` - The specified branch/commit does not exist
- `invalid_action` - The action is not recognized
- `already_running` - A deployment is already in progress on this host
- `build_failed` - nixos-rebuild exited with non-zero status
- `timeout` - Deployment exceeded the configured timeout

### Request/Reply Flow

1. CLI generates a UUID for the request (e.g., `550e8400-e29b-41d4-a716-446655440000`)
2. CLI subscribes to `deploy.responses.<uuid>` before publishing, so no reply can be missed
3. CLI publishes the deploy request to the target subject with `reply_to: "deploy.responses.<uuid>"`
4. Listener validates the request:
   - Checks that the revision exists using `git ls-remote` (this lists refs such as branches and tags, so a commit hash that is not the tip of one will not show up)
   - Checks that no other deployment is running
5. Listener publishes a response to the `reply_to` subject:
   - `{"status": "rejected", ...}` if validation fails, or
   - `{"status": "started", ...}` if deployment begins
6. If started, listener executes `nixos-rebuild`
7. Listener publishes the final response to the same `reply_to` subject:
   - `{"status": "completed", ...}` on success, or
   - `{"status": "failed", ...}` on failure
8. CLI receives responses and displays progress/results
9. CLI unsubscribes after receiving a final status or timing out

### Deployment Execution

The listener executes `nixos-rebuild` with the following command pattern:

```bash
# Branch or tag name
nixos-rebuild <action> --flake '<flake-url>?ref=<revision>#<hostname>'
# Commit hash
nixos-rebuild <action> --flake '<flake-url>?rev=<revision>#<hostname>'
```

Where:

- `<action>` is one of: `switch`, `boot`, `test`, `dry-activate`
- `<flake-url>` is the configured git flake URL (e.g., `git+https://git.example.com/user/nixos-configs.git`)
- `<revision>` is the branch name or commit hash from the request; Nix git flake references take a branch or tag name in `ref` and a commit hash in `rev`
- `<hostname>` is the listener's configured hostname

Quoting keeps an interactive shell from treating `?` as a glob character; the listener itself should pass the flake reference as a single argument rather than through a shell.

**Environment requirements:**

- Must run as root (nixos-rebuild requires root)
- Nix must be configured with proper git credentials if the flake is private
- Network access to the git repository

### Concurrency Control

Only one deployment may run at a time per host. The listener maintains a simple lock:

- Before starting a deployment, acquire lock
- If lock is held, reject with `already_running` error
- Release lock when deployment completes (success or failure)
- Lock should be in-memory (no persistence needed - restarts clear it)

### Logging

All deployment events should be logged to stdout/stderr (captured by the systemd journal):

- Request received (with subject, action, revision)
- Validation result
- Deployment start
- Deployment completion (with exit code)
- Any errors

This enables integration with log aggregation systems (e.g., Loki via Promtail).

## MCP Mode

### Purpose

Exposes deployment functionality as MCP tools for AI assistants (e.g., Claude Code).

### Tools

| Tool | Description | Parameters |
|------|-------------|------------|
| `deploy` | Deploy to test-tier hosts | One of `hostname`, `all`, or `role`; optional `branch`, `action` |
| `deploy_admin` | Deploy to any tier (requires `--enable-admin`) | `tier`; one of `hostname`, `all`, or `role`; optional `branch`, `action` |
| `list_hosts` | List available deployment targets | `tier` (optional) |
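
Both deploy tools have to turn their arguments into one of the subjects from the NATS Subject Structure table. The sketch below assumes the default subject layout and that exactly one of `hostname`, `all` or `role` is given; `targetSubject` and `validToken` are hypothetical helpers, and the `deploy` tool would always call it with tier `test`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validToken rejects values that would change the subject's structure:
// empty strings, token separators ('.'), NATS wildcards ('*', '>') and
// whitespace.
func validToken(s string) bool {
	return s != "" && !strings.ContainsAny(s, ".*> \t\r\n")
}

// targetSubject (hypothetical helper) maps deploy/deploy_admin arguments
// onto the default subject layout.
func targetSubject(tier, hostname string, all bool, role string) (string, error) {
	if tier != "test" && tier != "prod" {
		return "", fmt.Errorf("unknown tier %q", tier)
	}
	targets := 0
	if hostname != "" {
		targets++
	}
	if all {
		targets++
	}
	if role != "" {
		targets++
	}
	if targets != 1 {
		return "", errors.New("specify exactly one of hostname, all or role")
	}
	switch {
	case all:
		return "deploy." + tier + ".all", nil
	case role != "":
		if !validToken(role) {
			return "", fmt.Errorf("invalid role %q", role)
		}
		return "deploy." + tier + ".role." + role, nil
	default:
		// "all" as a hostname would silently become the broadcast subject.
		if !validToken(hostname) || hostname == "all" {
			return "", fmt.Errorf("invalid hostname %q", hostname)
		}
		return "deploy." + tier + "." + hostname, nil
	}
}

func main() {
	fmt.Println(targetSubject("test", "web1", false, ""))
	fmt.Println(targetSubject("prod", "", false, "dns"))
	fmt.Println(targetSubject("test", "", true, "dns"))
}
```

The token check matters because the values come from an AI assistant: without it, a hostname such as `role.dns` would quietly become the broadcast subject `deploy.test.role.dns`.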

### Tool Schemas

**deploy:**

```json
{
  "name": "deploy",
  "description": "Deploy NixOS configuration to test-tier hosts",
  "inputSchema": {
    "type": "object",
    "properties": {
      "hostname": {
        "type": "string",
        "description": "Target hostname, or omit to use 'all' or 'role' targeting"
      },
      "all": {
        "type": "boolean",
        "description": "Deploy to all test-tier hosts"
      },
      "role": {
        "type": "string",
        "description": "Deploy to all test-tier hosts with this role"
      },
      "branch": {
        "type": "string",
        "description": "Git branch or commit to deploy (default: master)"
      },
      "action": {
        "type": "string",
        "enum": ["switch", "boot", "test", "dry-activate"],
        "description": "nixos-rebuild action (default: switch)"
      }
    }
  }
}
```

**deploy_admin:**

```json
{
  "name": "deploy_admin",
  "description": "Deploy NixOS configuration to any host (admin access required)",
  "inputSchema": {
    "type": "object",
    "properties": {
      "tier": {
        "type": "string",
        "enum": ["test", "prod"],
        "description": "Target tier"
      },
      "hostname": {
        "type": "string",
        "description": "Target hostname, or omit to use 'all' or 'role' targeting"
      },
      "all": {
        "type": "boolean",
        "description": "Deploy to all hosts in tier"
      },
      "role": {
        "type": "string",
        "description": "Deploy to all hosts with this role in tier"
      },
      "branch": {
        "type": "string",
        "description": "Git branch or commit to deploy (default: master)"
      },
      "action": {
        "type": "string",
        "enum": ["switch", "boot", "test", "dry-activate"],
        "description": "nixos-rebuild action (default: switch)"
      }
    },
    "required": ["tier"]
  }
}
```

**list_hosts:**

```json
{
  "name": "list_hosts",
  "description": "List available deployment targets",
  "inputSchema": {
    "type": "object",
    "properties": {
      "tier": {
        "type": "string",
        "enum": ["test", "prod"],
        "description": "Filter by tier (optional)"
      }
    }
  }
}
```

### Security Layers

1. **MCP flag**: The `deploy_admin` tool is only registered when `--enable-admin` is passed
2. **NATS authz**: MCP credentials can only publish to authorized subjects
3. **AI assistant permissions**: The assistant's configuration can require confirmation for admin operations

### Multi-Host Deployments

When deploying to multiple hosts (via `all` or `role`), the MCP server should:

1. Publish the request to the appropriate broadcast subject
2. Collect responses from all responding hosts
3. Return aggregated results showing each host's status

**Timeout handling:**

- Set a reasonable timeout for collecting responses (e.g., 30 seconds after the last response, or at most 15 minutes in total)
- Return partial results if some hosts don't respond
- Indicate which hosts did not respond

### Host Discovery

The `list_hosts` tool needs to know which hosts are available. Options:

1. **Static configuration**: Read from a config file or environment variable
2. **NATS request**: Publish to a discovery subject and collect responses from listeners

Option 2 is recommended: listeners subscribe to their configured `discover_subject` and respond with metadata.

**Discovery request:**

```json
{
  "reply_to": "deploy.responses.discover-abc123"
}
```

**Discovery response:**

```json
{
  "hostname": "ns1",
  "tier": "prod",
  "role": "dns",
  "deploy_subjects": [
    "deploy.prod.ns1",
    "deploy.prod.all",
    "deploy.prod.role.dns"
  ]
}
```

The response includes the expanded `deploy_subjects` so clients know exactly which subjects reach this host.
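
On the `list_hosts` side, the discovery replies collected on the response subject can be decoded, filtered by the optional `tier` argument and sorted. The sketch below uses a hypothetical `filterHosts` helper and assumes malformed replies should be skipped rather than failing the whole call.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// DiscoveryResponse mirrors the discovery response message. A JSON null
// role leaves Role as the empty string.
type DiscoveryResponse struct {
	Hostname       string   `json:"hostname"`
	Tier           string   `json:"tier"`
	Role           string   `json:"role"`
	DeploySubjects []string `json:"deploy_subjects"`
}

// filterHosts (hypothetical helper) decodes raw discovery replies, keeps
// hosts in the requested tier (every tier when tier is empty), keeps one
// entry per hostname and sorts the result by hostname.
func filterHosts(replies [][]byte, tier string) []DiscoveryResponse {
	byName := map[string]DiscoveryResponse{}
	for _, raw := range replies {
		var r DiscoveryResponse
		if err := json.Unmarshal(raw, &r); err != nil || r.Hostname == "" {
			continue // skip malformed replies
		}
		if tier == "" || r.Tier == tier {
			byName[r.Hostname] = r
		}
	}
	hosts := make([]DiscoveryResponse, 0, len(byName))
	for _, h := range byName {
		hosts = append(hosts, h)
	}
	sort.Slice(hosts, func(i, j int) bool { return hosts[i].Hostname < hosts[j].Hostname })
	return hosts
}

func main() {
	replies := [][]byte{
		[]byte(`{"hostname":"ns1","tier":"prod","role":"dns","deploy_subjects":["deploy.prod.ns1","deploy.prod.all","deploy.prod.role.dns"]}`),
		[]byte(`{"hostname":"web1","tier":"test","role":null,"deploy_subjects":["deploy.test.web1","deploy.test.all"]}`),
	}
	for _, h := range filterHosts(replies, "") {
		fmt.Printf("%-6s %-5s role=%q subjects=%v\n", h.Hostname, h.Tier, h.Role, h.DeploySubjects)
	}
}
```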

## NixOS Module

The NixOS module configures the listener as a systemd service with appropriate hardening.

### Module Options

```nix
{ config, lib, pkgs, ... }:

{
  options.services.homelab-deploy.listener = {
    enable = lib.mkEnableOption "homelab-deploy listener service";

    package = lib.mkPackageOption pkgs "homelab-deploy" { };

    hostname = lib.mkOption {
      type = lib.types.str;
      default = config.networking.hostName;
      # Needed because the default depends on config (keeps the manual build working)
      defaultText = lib.literalExpression "config.networking.hostName";
      description = "Hostname for this listener (used in subject templates)";
    };

    tier = lib.mkOption {
      type = lib.types.enum [ "test" "prod" ];
      description = "Deployment tier for this host";
    };

    role = lib.mkOption {
      type = lib.types.nullOr lib.types.str;
      default = null;
      description = "Role for role-based deployment targeting";
    };

    natsUrl = lib.mkOption {
      type = lib.types.str;
      description = "NATS server URL";
      example = "nats://nats.example.com:4222";
    };

    nkeyFile = lib.mkOption {
      type = lib.types.path;
      description = "Path to NKey seed file for NATS authentication";
      example = "/run/secrets/homelab-deploy-nkey";
    };

    flakeUrl = lib.mkOption {
      type = lib.types.str;
      description = "Git flake URL for nixos-rebuild";
      example = "git+https://git.example.com/user/nixos-configs.git";
    };

    timeout = lib.mkOption {
      type = lib.types.int;
      default = 600;
      description = "Deployment timeout in seconds";
    };

    deploySubjects = lib.mkOption {
      type = lib.types.listOf lib.types.str;
      default = [
        "deploy.<tier>.<hostname>"
        "deploy.<tier>.all"
        "deploy.<tier>.role.<role>"
      ];
      description = ''
        List of NATS subjects to subscribe to for deployment requests.
        Template variables: <hostname>, <tier>, <role>
      '';
    };

    discoverSubject = lib.mkOption {
      type = lib.types.str;
      default = "deploy.discover";
      description = "NATS subject for host discovery requests";
    };

    environment = lib.mkOption {
      type = lib.types.attrsOf lib.types.str;
      default = { };
      description = "Additional environment variables for the service";
      example = { GIT_SSH_COMMAND = "ssh -i /run/secrets/deploy-key"; };
    };
  };
}
```

### Systemd Service

The module creates a hardened systemd service:

```nix
systemd.services.homelab-deploy-listener = {
  description = "homelab-deploy listener";
  wantedBy = [ "multi-user.target" ];
  after = [ "network-online.target" ];
  wants = [ "network-online.target" ];

  environment = cfg.environment;

  serviceConfig = {
    Type = "simple";
    ExecStart = "${cfg.package}/bin/homelab-deploy listener ...";
    Restart = "always";
    RestartSec = 10;

    # Hardening (compatible with nixos-rebuild requirements)
    NoNewPrivileges = false; # nixos-rebuild may need to spawn privileged processes
    ProtectSystem = false; # activation writes to /boot, /etc, /nix/store and /run
    ProtectHome = "read-only";
    PrivateTmp = true;
    PrivateDevices = true;
    ProtectKernelTunables = true;
    ProtectKernelModules = true;
    ProtectControlGroups = true;
    RestrictAddressFamilies = [ "AF_UNIX" "AF_INET" "AF_INET6" ];
    RestrictNamespaces = false; # nix build uses namespaces
    RestrictSUIDSGID = true;
    LockPersonality = true;
    MemoryDenyWriteExecute = false; # nix may need this
    SystemCallArchitectures = "native";
  };
};
```

**Note:** These settings also apply to every process the listener starts, including `nixos-rebuild` and the activation it runs. Some hardening options are therefore relaxed because `nixos-rebuild` requires:

- Write access to `/nix/store` for building
- Ability to activate system configurations
- Network access for fetching from git/cache
- Namespace support for nix sandbox builds

Some options that remain enabled may still conflict with activation: for example, `PrivateDevices` and `ProtectKernelTunables` (which makes `/sys` read-only) can interfere with bootloader installation during `switch`/`boot`, and `ProtectHome = "read-only"` can block creating user home directories. Verify the final set on a test-tier host before relying on it.

## NATS Authentication

All NATS connections use NKey authentication. NKeys are Ed25519 keypairs where:

- The seed (private key) is stored in a file readable by the service
- The public key is configured in the NATS server's user list

### Credential Types

The permissions below assume the default subject layout. Discovery requests are published by deployers and received by listeners, and all replies travel on `deploy.responses.>`.

| Credential | Purpose | Publish Permissions | Subscribe Permissions |
|------------|---------|---------------------|-----------------------|
| listener | Host listener service | `deploy.responses.>` | `deploy.*.>`, `deploy.discover` |
| mcp-deployer | MCP test-tier access | `deploy.test.>`, `deploy.discover` | `deploy.responses.>` |
| admin-deployer | Full deployment access | `deploy.test.>`, `deploy.prod.>`, `deploy.discover` | `deploy.responses.>` |

`deploy.*.>` needs at least three tokens, so it does not match the two-token `deploy.discover`; listeners need that subject granted separately.

## Flake Structure

The `flake.nix` should provide:

1. **Package**: The Go binary
2. **NixOS module**: The listener service configuration
3. **Development shell**: Go toolchain for development

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
  };

  outputs = { self, nixpkgs }: {
    packages.x86_64-linux.default = /* Go package build */;
    packages.x86_64-linux.homelab-deploy = self.packages.x86_64-linux.default;

    nixosModules.default = import ./nixos/module.nix;
    nixosModules.homelab-deploy = self.nixosModules.default;

    devShells.x86_64-linux.default = /* Go dev shell */;
  };
}
```

## Implementation Notes

### Go Dependencies

Recommended libraries:

- `github.com/urfave/cli/v3` - CLI framework
- `github.com/nats-io/nats.go` - NATS client
- `github.com/mark3labs/mcp-go` - MCP server implementation
- Standard library for JSON, logging, process execution

### Error Handling

- NATS connection errors: Retry with exponential backoff
- nixos-rebuild failures: Capture stdout/stderr, report in response message
- Timeout: Kill the nixos-rebuild process, report timeout error

### Testing

- Unit tests for message parsing and validation
- Integration tests using a local NATS server
- End-to-end tests with a NixOS VM (optional, can be done in consuming repo)

## Security Considerations

- **Privilege**: Listener runs as root to execute nixos-rebuild
- **Input validation**: Strictly validate revision format (alphanumeric, dashes, underscores, dots, slashes for branch names; hex for commit hashes)
- **Command injection**: Never interpolate user input into shell commands without validation
- **Rate limiting**: Consider adding rate limiting to prevent rapid-fire deployments
- **Audit logging**: Log all deployment requests with full context
- **Network isolation**: NATS should only be accessible from trusted networks
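
The revision rules above can be written as a small validator. This is a sketch under assumptions: `validateRevision` is hypothetical, commit hashes are taken to be full 40-character hex SHA-1 values, the 255-character cap is arbitrary, and only a subset of git's ref-name rules is enforced. Requiring a leading letter or digit also ensures a value such as `--upload-pack=...` can never be read as an option by `git ls-remote`.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

var (
	// Full 40-character SHA-1 commit hash.
	commitRe = regexp.MustCompile(`^[0-9a-f]{40}$`)
	// Branch names: letters, digits, '-', '_', '.', '/', starting with a
	// letter or digit so the value is never parsed as a command-line flag.
	branchRe = regexp.MustCompile(`^[A-Za-z0-9][A-Za-z0-9._/-]*$`)
)

// validateRevision (hypothetical helper) accepts a commit hash or a
// conservative branch name and rejects everything else.
func validateRevision(rev string) error {
	if len(rev) == 0 || len(rev) > 255 {
		return errors.New("revision must be 1-255 characters")
	}
	if commitRe.MatchString(rev) {
		return nil
	}
	if !branchRe.MatchString(rev) {
		return fmt.Errorf("revision %q contains disallowed characters", rev)
	}
	// A few of git's own ref-name restrictions.
	if strings.Contains(rev, "..") || strings.Contains(rev, "//") ||
		strings.HasSuffix(rev, "/") || strings.HasSuffix(rev, ".") || strings.HasSuffix(rev, ".lock") {
		return fmt.Errorf("revision %q is not a valid branch name", rev)
	}
	return nil
}

func main() {
	for _, rev := range []string{"master", "feature/dns-v2", "--upload-pack=touch /tmp/x", "a..b"} {
		fmt.Printf("%-28q %v\n", rev, validateRevision(rev))
	}
}
```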

## Future Enhancements

These are not required for initial implementation:

1. **Deployment locking** - Cluster-wide lock to prevent fleet-wide concurrent deploys
2. **Prometheus metrics** - Export deployment count, duration, success/failure rates
3. **Webhook triggers** - HTTP endpoint for CI/CD integration
4. **Scheduled deployments** - Deploy at specific times (though this overlaps with existing auto-upgrade)