docs: update Loki queries from host to hostname label
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Update all LogQL examples, agent instructions, and scripts to use the `hostname` label instead of `host`, matching the Prometheus label naming convention. Also update the pipe-to-loki and bootstrap scripts to push `hostname` instead of `host`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -50,7 +50,7 @@ homelab.host.tier = "test"; # or "prod"
 During the bootstrap process, status updates are sent to Loki. Query bootstrap logs with:

 ```
-{job="bootstrap", host="<hostname>"}
+{job="bootstrap", hostname="<hostname>"}
 ```

 ### Bootstrap Stages
@@ -72,7 +72,7 @@ The bootstrap process reports these stages via the `stage` label:

 ```
 # All bootstrap activity for a host
-{job="bootstrap", host="myhost"}
+{job="bootstrap", hostname="myhost"}

 # Track all failures
 {job="bootstrap", stage="failed"}
@@ -87,7 +87,7 @@ Once the VM reboots with its full configuration, it will start publishing metric

 1. Check bootstrap completed successfully:
 ```
-{job="bootstrap", host="<hostname>", stage="success"}
+{job="bootstrap", hostname="<hostname>", stage="success"}
 ```

 2. Verify the host is up and reporting metrics:
@@ -102,7 +102,7 @@ Once the VM reboots with its full configuration, it will start publishing metric

 4. Check logs are flowing:
 ```
-{host="<hostname>"}
+{hostname="<hostname>"}
 ```

 5. Confirm expected services are running and producing logs
@@ -119,7 +119,7 @@ Once the VM reboots with its full configuration, it will start publishing metric

 1. Check bootstrap logs in Loki - if they never progress past `building`, the rebuild likely consumed all resources:
 ```
-{job="bootstrap", host="<hostname>"}
+{job="bootstrap", hostname="<hostname>"}
 ```

 2. **USER**: SSH into the host and check the bootstrap service:
@@ -149,7 +149,7 @@ Usually caused by running the `create-host` script without proper credentials, o

 2. Check bootstrap logs for vault-related stages:
 ```
-{job="bootstrap", host="<hostname>", stage=~"vault.*"}
+{job="bootstrap", hostname="<hostname>", stage=~"vault.*"}
 ```

 3. **USER**: Regenerate and provision credentials manually:
@@ -86,13 +86,13 @@ These are generous limits that shouldn't affect normal operation but protect aga
 - The `varlog` scrape config uses `hostname` while journal uses `host` (different label name)
 - No `tier` or `role` labels, making it hard to filter logs by deployment tier or host function

-**Recommendations:**
+**Implemented:** Standardized on `hostname` to match Prometheus labels. The journal scrape previously used a relabel from `__journal__hostname` to `host`; now both scrape configs use a static `hostname` label from `config.networking.hostName`. Also updated `pipe-to-loki` and bootstrap scripts to use `hostname` instead of `host`.

-1. **Fix varlog label:** Rename `hostname` to `host` for consistency with journal scrape config
-2. **Add `tier` label:** Static label from `config.homelab.host.tier` (`test`/`prod`) on both scrape configs
-3. **Add `role` label:** Static label from `config.homelab.host.role` on both scrape configs, only when set (10 hosts have no role, so omit to keep labels clean)
+1. **Standardized label:** Both scrape configs use `hostname` (matching Prometheus) via shared `hostLabels`
+2. **Added `tier` label:** Static label from `config.homelab.host.tier` (`test`/`prod`) on both scrape configs
+3. **Added `role` label:** Static label from `config.homelab.host.role` on both scrape configs (conditionally, only when non-null)

-No cardinality impact - `tier` and `role` are 1:1 with `host`, so they add metadata to existing streams without creating new ones.
+No cardinality impact - `tier` and `role` are 1:1 with `hostname`, so they add metadata to existing streams without creating new ones.

 This enables queries like:
 - `{tier="prod"} |= "error"` - all errors on prod hosts
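The shared label set mentioned in the diff could look roughly like this in the NixOS module. This is a sketch, not the repo's actual code: the `hostLabels` name and the `homelab.host.tier`/`homelab.host.role` options are taken from the text above, while the surrounding Promtail structure is an assumption.

```nix
# Sketch only: assumes the homelab.host.* options described above exist.
{ config, lib, ... }:
let
  cfg = config.homelab.host;
  # One label set shared by both scrape configs.
  hostLabels = {
    hostname = config.networking.hostName;  # matches the Prometheus label
    tier = cfg.tier;                        # "test" or "prod"
  } // lib.optionalAttrs (cfg.role != null) {
    role = cfg.role;                        # omitted when no role is set
  };
in {
  services.promtail.configuration.scrape_configs = [
    {
      job_name = "journal";
      journal.labels = { job = "systemd-journal"; } // hostLabels;
    }
    {
      job_name = "varlog";
      static_configs = [{
        targets = [ "localhost" ];
        labels = { job = "varlog"; __path__ = "/var/log/*.log"; } // hostLabels;
      }];
    }
  ];
}
```

Because both scrape configs merge in the same attrset, the label names cannot drift apart again the way `host` and `hostname` did.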
@@ -167,10 +167,10 @@ For each service, check whether it supports a JSON log format option and whether
 1. Add `compactor` section to `services/monitoring/loki.nix`
 2. Add `limits_config` with 30-day retention and basic rate limits
 3. Update `system/monitoring/logs.nix`:
-   - Fix `hostname` → `host` label in varlog scrape config
-   - Add `tier` static label from `config.homelab.host.tier` to both scrape configs
-   - Add `role` static label from `config.homelab.host.role` (conditionally, only when set) to both scrape configs
-   - Add pipeline stages to journal scrape config: `json` to extract PRIORITY, `template` to map to level name, `labels` to attach as `level`
+   - ~~Fix `hostname` → `host` label in varlog scrape config~~ Done: standardized on `hostname` (matching Prometheus)
+   - ~~Add `tier` static label from `config.homelab.host.tier` to both scrape configs~~ Done
+   - ~~Add `role` static label from `config.homelab.host.role` (conditionally, only when set) to both scrape configs~~ Done
+   - ~~Add pipeline stages to journal scrape config: `json` to extract PRIORITY, `template` to map to level name, `labels` to attach as `level`~~ Done
 4. Deploy to monitoring01, verify compactor runs and old data gets cleaned
 5. Verify `level` label works: `{level="error"}` should return results, and match cases where `detected_level="unknown"`
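The `json` → `template` → `labels` pipeline from step 3 might be sketched like this, with Promtail config expressed as Nix. The stage names are Promtail's; the exact PRIORITY-to-level mapping is an assumption about what the template is intended to do (syslog priorities 0-3 → error, 4 → warn, and so on).

```nix
# Sketch of the journal pipeline stages; the priority mapping is assumed.
{
  pipeline_stages = [
    # Extract the syslog priority field from the journal entry JSON.
    { json.expressions = { priority = "PRIORITY"; }; }
    # Map the numeric priority to a level name via a Go template.
    {
      template = {
        source = "priority";
        template = ''{{ if le .Value "3" }}error{{ else if eq .Value "4" }}warn{{ else if eq .Value "5" }}notice{{ else if eq .Value "6" }}info{{ else }}debug{{ end }}'';
      };
    }
    # Attach the mapped value as the `level` label.
    { labels.level = "priority"; }
  ];
}
```

With a `level` label attached at scrape time, `{level="error"}` in step 5 should match entries that Loki's own detection would otherwise leave as `detected_level="unknown"`.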