Commit Graph

10 Commits

Author SHA1 Message Date
c934d1ba38 feat: add --debug flag for metrics troubleshooting
Add a --debug flag to the listener command that enables debug-level
logging. When enabled, the listener logs detailed information about
metrics recording including:

- When deployment start/end metrics are recorded
- The action, success status, and duration being recorded
- Whether metrics are enabled or disabled (skipped)

This helps troubleshoot issues where deployment metrics appear to
remain at zero after deployments.

Also add extraArgs option to the NixOS module to allow passing
additional arguments like --debug to the service.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 01:26:12 +01:00
79db119d1c feat: add Prometheus metrics to listener service
Add an optional Prometheus metrics HTTP endpoint to the listener for
monitoring deployment operations. Includes four metrics:

- homelab_deploy_deployments_total (counter with status/action/error_code)
- homelab_deploy_deployment_duration_seconds (histogram with action/success)
- homelab_deploy_deployment_in_progress (gauge)
- homelab_deploy_info (gauge with hostname/tier/role/version)

New CLI flags: --metrics-enabled, --metrics-addr (default :9972)
New NixOS options: metrics.enable, metrics.address, metrics.openFirewall

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 07:58:22 +01:00
95b795dcfd fix: remove systemd hardening to allow nix sandbox namespace creation
The previous hardening options (ProtectControlGroups, LockPersonality,
SystemCallArchitectures, etc.) prevented Nix from creating the kernel
namespaces required for build sandboxing. Following the approach of
the NixOS auto-upgrade module which has no hardening since nixos-rebuild
requires broad system access.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 06:52:16 +01:00
71d6aa8b61 fix: disable PrivateDevices to allow nix sandbox namespace creation
The PrivateDevices=true systemd hardening option was preventing Nix
from creating the kernel namespaces required for its build sandbox.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 06:26:53 +01:00
ac3c9c7de6 fix: prevent listener service from restarting during deployment
Add stopIfChanged and restartIfChanged options to prevent the listener
from being interrupted when nixos-rebuild switch activates a new
configuration that changes the service definition.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 06:06:47 +01:00
9f205fee5e fix: add writable cache directory for nix git flake fetching
The listener service had ProtectHome=read-only which prevented Nix
from writing to /root/.cache when fetching git flakes. This adds a
CacheDirectory managed by systemd and sets XDG_CACHE_HOME to use it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 05:57:59 +01:00
5f3cfc3d21 fix: add nixos-rebuild to PATH and fix CLI hanging after deploy failure
- Add nixos-rebuild to listener service PATH in NixOS module
- Fix CLI deploy command hanging after receiving final status by properly
  tracking lastResponse time and exiting when all hosts have responded

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 05:53:22 +01:00
c9b85435ba fix: add git to listener service PATH for revision validation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 05:43:23 +01:00
cf3b1ce2c9 refactor: use flake package directly in NixOS module
Instead of requiring users to provide the package via overlay,
the module now receives `self` from the flake and uses the
package directly from `self.packages`.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 05:08:02 +01:00
fa49e9322a feat: implement NATS-based NixOS deployment system
Implement the complete homelab-deploy system with three operational modes:

- Listener mode: Runs on NixOS hosts as a systemd service, subscribes to
  NATS subjects with configurable templates, executes nixos-rebuild on
  deployment requests with concurrency control

- MCP mode: MCP server exposing deploy, deploy_admin, and list_hosts
  tools for AI assistants with tiered access control

- CLI mode: Manual deployment commands with subject alias support via
  environment variables

Key components:
- internal/messages: Request/response types with validation
- internal/nats: Client wrapper with NKey authentication
- internal/deploy: Executor with timeout and lock for concurrency
- internal/listener: Subject template expansion and request handling
- internal/cli: Deploy logic with alias resolution
- internal/mcp: MCP server with mcp-go integration
- nixos/module.nix: NixOS module with hardened systemd service

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 04:19:47 +01:00