docs: document --debug flag and extraArgs module option

Add documentation for:
- --debug flag in Listener Flags table
- --heartbeat-interval flag (was missing)
- extraArgs NixOS module option
- New Troubleshooting section with debug logging examples and guidance for diagnosing metrics issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
README.md
@@ -63,6 +63,8 @@ homelab-deploy listener \
| `--discover-subject` | No | Discovery subject (default: `deploy.discover`) |
| `--metrics-enabled` | No | Enable Prometheus metrics endpoint |
| `--metrics-addr` | No | Metrics HTTP server address (default: `:9972`) |
| `--heartbeat-interval` | No | Status update interval in seconds during deployment (default: `15`) |
| `--debug` | No | Enable debug logging for troubleshooting |

#### Subject Templates
@@ -214,6 +216,7 @@ Add the module to your NixOS configuration:
| `metrics.enable` | bool | `false` | Enable Prometheus metrics endpoint |
| `metrics.address` | string | `":9972"` | Metrics HTTP server address |
| `metrics.openFirewall` | bool | `false` | Open firewall for metrics port |
| `extraArgs` | list of string | `[]` | Extra command line arguments (e.g., `["--debug"]`) |

Default `deploySubjects`:

```nix
@@ -298,6 +301,57 @@ histogram_quantile(0.95, rate(homelab_deploy_deployment_duration_seconds_bucket[
sum(homelab_deploy_deployment_in_progress)
```

## Troubleshooting

### Debug Logging

Enable debug logging to diagnose issues with deployments or metrics:

**CLI:**
```bash
homelab-deploy listener --debug \
  --hostname myhost \
  --tier prod \
  --nats-url nats://nats.example.com:4222 \
  --nkey-file /run/secrets/listener.nkey \
  --flake-url git+https://git.example.com/user/nixos-configs.git \
  --metrics-enabled
```

**NixOS module:**
```nix
services.homelab-deploy.listener = {
  enable = true;
  tier = "prod";
  natsUrl = "nats://nats.example.com:4222";
  nkeyFile = "/run/secrets/homelab-deploy-nkey";
  flakeUrl = "git+https://git.example.com/user/nixos-configs.git";
  metrics.enable = true;
  extraArgs = [ "--debug" ];
};
```

With debug logging enabled, the listener logs details about metrics recording as JSON lines, for example:

```json
{"level":"DEBUG","msg":"recording deployment start metric","metrics_enabled":true}
{"level":"DEBUG","msg":"recording deployment end metric (success)","action":"switch","success":true,"duration_seconds":120.5}
```
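To pull just these messages out of a busy log, a small filter is enough. The sketch below assumes the listener writes one JSON object per line with no spaces after the colons, exactly as in the sample above; `metric_debug_lines` and `deployment_durations` are hypothetical helper names, not part of homelab-deploy.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for reading the listener's JSON debug log.
# Assumption: one JSON object per line, formatted like the sample above
# (no spaces after colons), so plain fixed-string grep is sufficient.

# Print only the DEBUG lines about deployment metrics recording.
metric_debug_lines() {
  grep -F '"level":"DEBUG"' | grep -F '"msg":"recording deployment'
}

# Print the duration_seconds value from each "deployment end" debug line.
deployment_durations() {
  metric_debug_lines \
    | grep -F 'recording deployment end metric' \
    | sed -n 's/.*"duration_seconds":\([0-9.]*\).*/\1/p'
}
```

For example, on a NixOS host you might run `journalctl -u homelab-deploy-listener -o cat | metric_debug_lines`. The unit name there is an assumption; check the actual name with `systemctl list-units 'homelab-deploy*'`. If the key order or spacing of the log lines differs, `jq` is a more robust choice than `grep`.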

### Metrics Showing Zero

If deployment metrics remain at zero after deployments:

1. **Check that metrics are enabled**: Verify that `--metrics-enabled` is set (or `metrics.enable = true` in the NixOS module) and that the metrics endpoint responds at `/metrics` on the configured address (default `:9972`).

2. **Enable debug logging**: Use `--debug` to confirm that the listener actually reaches the metrics recording code; look for debug messages starting with `recording deployment`.

3. **Check deployment status**: Deployment metrics, including duration, are recorded only for deployments that run to completion (success or failure). Rejected requests (e.g., because a deployment is already running) increment the counter with `status="rejected"` but do not record a duration.

4. **Check after restart**: After a successful `switch` deployment, the listener restarts, and the new instance starts with all metrics at zero. To capture the final values, the listener waits up to 60 seconds for a Prometheus scrape before restarting.

5. **Verify Prometheus scrape timing**: Ensure Prometheus scrapes the listener often enough that a scrape lands within that 60-second window, before the listener restarts.

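The first and last checks in the list above can be scripted. This is a sketch under a few assumptions: the listener is reachable on `localhost` at the default `--metrics-addr` (`:9972`), `curl` is installed, and the helper names are hypothetical. `scrape_interval_ok` encodes the timing rule from steps 4 and 5 conservatively, treating an interval equal to the 60-second wait as too slow.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the "Metrics Showing Zero" checklist.

# Keep only homelab_deploy_* samples from Prometheus text-format input,
# dropping the "# HELP" / "# TYPE" comment lines.
deploy_metrics() {
  grep -E '^homelab_deploy_'
}

# Fetch the listener's /metrics page and show only its deployment metrics.
# Assumption: default --metrics-addr (:9972) on the local host.
check_deploy_metrics() {
  local url="${1:-http://localhost:9972/metrics}"
  curl -fsS "$url" | deploy_metrics
}

# Succeed if a scrape interval (whole seconds) is strictly shorter than the
# listener's 60-second pre-restart wait, so a scrape can land before a
# `switch` deployment restarts the listener.
scrape_interval_ok() {
  local interval="$1"
  [ "$interval" -lt 60 ]
}
```

For example, `check_deploy_metrics http://myhost:9972/metrics` should print the `homelab_deploy_` series; empty output suggests metrics are disabled or the URL points at the wrong endpoint. Prometheus' default global `scrape_interval` is `1m`, which sits exactly at the edge of the 60-second window (`scrape_interval_ok 60` fails), so a shorter per-job interval for the listener targets is the safer choice.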
## Message Protocol

### Deploy Request