docs: document --debug flag and extraArgs module option

Add documentation for:
- --debug flag in Listener Flags table
- --heartbeat-interval flag (was missing)
- extraArgs NixOS module option
- New Troubleshooting section with debug logging examples
  and guidance for diagnosing metrics issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
2026-02-09 01:28:21 +01:00
parent c934d1ba38
commit c272ce6903

View File

@@ -63,6 +63,8 @@ homelab-deploy listener \
| `--discover-subject` | No | Discovery subject (default: `deploy.discover`) | | `--discover-subject` | No | Discovery subject (default: `deploy.discover`) |
| `--metrics-enabled` | No | Enable Prometheus metrics endpoint | | `--metrics-enabled` | No | Enable Prometheus metrics endpoint |
| `--metrics-addr` | No | Metrics HTTP server address (default: `:9972`) | | `--metrics-addr` | No | Metrics HTTP server address (default: `:9972`) |
| `--heartbeat-interval` | No | Status update interval in seconds during deployment (default: 15) |
| `--debug` | No | Enable debug logging for troubleshooting |
#### Subject Templates #### Subject Templates
@@ -214,6 +216,7 @@ Add the module to your NixOS configuration:
| `metrics.enable` | bool | `false` | Enable Prometheus metrics endpoint | | `metrics.enable` | bool | `false` | Enable Prometheus metrics endpoint |
| `metrics.address` | string | `":9972"` | Metrics HTTP server address | | `metrics.address` | string | `":9972"` | Metrics HTTP server address |
| `metrics.openFirewall` | bool | `false` | Open firewall for metrics port | | `metrics.openFirewall` | bool | `false` | Open firewall for metrics port |
| `extraArgs` | list of string | `[]` | Extra command line arguments (e.g., `["--debug"]`) |
Default `deploySubjects`: Default `deploySubjects`:
```nix ```nix
@@ -298,6 +301,57 @@ histogram_quantile(0.95, rate(homelab_deploy_deployment_duration_seconds_bucket[
sum(homelab_deploy_deployment_in_progress) sum(homelab_deploy_deployment_in_progress)
``` ```
## Troubleshooting
### Debug Logging
Enable debug logging to diagnose issues with deployments or metrics:
**CLI:**
```bash
homelab-deploy listener --debug \
--hostname myhost \
--tier prod \
--nats-url nats://nats.example.com:4222 \
--nkey-file /run/secrets/listener.nkey \
--flake-url git+https://git.example.com/user/nixos-configs.git \
--metrics-enabled
```
**NixOS module:**
```nix
services.homelab-deploy.listener = {
enable = true;
tier = "prod";
natsUrl = "nats://nats.example.com:4222";
nkeyFile = "/run/secrets/homelab-deploy-nkey";
flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
metrics.enable = true;
extraArgs = [ "--debug" ];
};
```
With debug logging enabled, the listener outputs detailed information about metrics recording:
```json
{"level":"DEBUG","msg":"recording deployment start metric","metrics_enabled":true}
{"level":"DEBUG","msg":"recording deployment end metric (success)","action":"switch","success":true,"duration_seconds":120.5}
```
### Metrics Showing Zero
If deployment metrics remain at zero after deployments:
1. **Check metrics are enabled**: Verify `--metrics-enabled` is set and the metrics endpoint is accessible at `/metrics`
2. **Enable debug logging**: Use `--debug` to confirm metrics recording is being called
3. **Check deployment status**: Metrics are only recorded for deployments that complete (success or failure). Rejected requests (e.g., already running) increment the counter with `status="rejected"` but don't record duration
4. **Check after restart**: After a successful `switch` deployment, the listener restarts. Metrics reset to zero in the new instance. The listener waits up to 60 seconds for a Prometheus scrape before restarting to capture the final metrics
5. **Verify Prometheus scrape timing**: Ensure Prometheus scrapes frequently enough to capture metrics before the listener restarts
## Message Protocol ## Message Protocol
### Deploy Request ### Deploy Request