docs: update Loki queries from host to hostname label
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Update all LogQL examples, agent instructions, and scripts to use the hostname label instead of host, matching the Prometheus label naming convention. Also update pipe-to-loki and bootstrap scripts to push hostname instead of host.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
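As an illustration of the script-side change, the sketch below builds a request body in the JSON shape accepted by Loki's `/loki/api/v1/push` endpoint, using a `hostname` stream label in place of `host`. The helper name `build_push_payload`, its parameters, and the sample label values are hypothetical; the pipe-to-loki and bootstrap scripts themselves are not shown in this diff, and the sketch only builds the payload without sending it.

```python
import json
import time


def build_push_payload(hostname, line, job="bootstrap", extra_labels=None, ts_ns=None):
    """Build a Loki push-API request body with a `hostname` stream label.

    Hypothetical helper for illustration; not taken from the repository's scripts.
    """
    labels = {"job": job, "hostname": hostname}  # `hostname`, not the old `host`
    if extra_labels:
        labels.update(extra_labels)
    if ts_ns is None:
        ts_ns = time.time_ns()
    # Loki expects each entry as [<Unix time in nanoseconds, as a string>, <log line>].
    return {"streams": [{"stream": labels, "values": [[str(ts_ns), line]]}]}


if __name__ == "__main__":
    # Sample values are made up for the example.
    payload = build_push_payload(
        "myhost",
        "bootstrap stage reached",
        extra_labels={"stage": "building", "branch": "master"},
    )
    print(json.dumps(payload, indent=2))
```

A stream pushed this way would be matched by `{job="bootstrap", hostname="myhost"}`, the selector form the updated docs use.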
@@ -30,11 +30,13 @@ Use the `lab-monitoring` MCP server tools:
 ### Label Reference

 Available labels for log queries:
-- `host` - Hostname (e.g., `ns1`, `monitoring01`, `ha1`)
+- `hostname` - Hostname (e.g., `ns1`, `monitoring01`, `ha1`) - matches the Prometheus `hostname` label
 - `systemd_unit` - Systemd unit name (e.g., `nsd.service`, `nixos-upgrade.service`)
 - `job` - Either `systemd-journal` (most logs), `varlog` (file-based logs), or `bootstrap` (VM bootstrap logs)
 - `filename` - For `varlog` job, the log file path
-- `hostname` - Alternative to `host` for some streams
+- `tier` - Deployment tier (`test` or `prod`)
+- `role` - Host role (e.g., `dns`, `vault`, `monitoring`) - matches the Prometheus `role` label
+- `level` - Log level mapped from journal PRIORITY (`critical`, `error`, `warning`, `notice`, `info`, `debug`) - journal scrape only

 ### Log Format

@@ -47,12 +49,12 @@ Journal logs are JSON-formatted. Key fields:

 **Logs from a specific service on a host:**
 ```logql
-{host="ns1", systemd_unit="nsd.service"}
+{hostname="ns1", systemd_unit="nsd.service"}
 ```

 **All logs from a host:**
 ```logql
-{host="monitoring01"}
+{hostname="monitoring01"}
 ```

 **Logs from a service across all hosts:**
@@ -62,12 +64,12 @@ Journal logs are JSON-formatted. Key fields:

 **Substring matching (case-sensitive):**
 ```logql
-{host="ha1"} |= "error"
+{hostname="ha1"} |= "error"
 ```

 **Exclude pattern:**
 ```logql
-{host="ns1"} != "routine"
+{hostname="ns1"} != "routine"
 ```

 **Regex matching:**
@@ -75,6 +77,20 @@ Journal logs are JSON-formatted. Key fields:
 {systemd_unit="prometheus.service"} |~ "scrape.*failed"
 ```

+**Filter by level (journal scrape only):**
+```logql
+{level="error"} # All errors across the fleet
+{level=~"critical|error", tier="prod"} # Prod errors and criticals
+{hostname="ns1", level="warning"} # Warnings from a specific host
+```
+
+**Filter by tier/role:**
+```logql
+{tier="prod"} |= "error" # All errors on prod hosts
+{role="dns"} # All DNS server logs
+{tier="test", job="systemd-journal"} # Journal logs from test hosts
+```
+
 **File-based logs (caddy access logs, etc):**
 ```logql
 {job="varlog", hostname="nix-cache01"}
@@ -106,7 +122,7 @@ Useful systemd units for troubleshooting:

 VMs provisioned from template2 send bootstrap progress directly to Loki via curl (before promtail is available). These logs use `job="bootstrap"` with additional labels:

-- `host` - Target hostname
+- `hostname` - Target hostname
 - `branch` - Git branch being deployed
 - `stage` - Bootstrap stage (see table below)

@@ -127,7 +143,7 @@ VMs provisioned from template2 send bootstrap progress directly to Loki via curl

 ```logql
 {job="bootstrap"} # All bootstrap logs
-{job="bootstrap", host="myhost"} # Specific host
+{job="bootstrap", hostname="myhost"} # Specific host
 {job="bootstrap", stage="failed"} # All failures
 {job="bootstrap", stage=~"building|success"} # Track build progress
 ```
@@ -308,8 +324,8 @@ Current host labels:

 1. Check `up{job="<service>"}` or `up{hostname="<host>"}` for scrape failures
 2. Use `list_targets` to see target health details
-3. Query service logs: `{host="<host>", systemd_unit="<service>.service"}`
-4. Search for errors: `{host="<host>"} |= "error"`
+3. Query service logs: `{hostname="<host>", systemd_unit="<service>.service"}`
+4. Search for errors: `{hostname="<host>"} |= "error"`
 5. Check `list_alerts` for related alerts
 6. Use role filters for group issues: `up{role="dns"}` to check all DNS servers

@@ -324,17 +340,17 @@ Current host labels:

 When provisioning new VMs, track bootstrap progress:

-1. Watch bootstrap logs: `{job="bootstrap", host="<hostname>"}`
-2. Check for failures: `{job="bootstrap", host="<hostname>", stage="failed"}`
+1. Watch bootstrap logs: `{job="bootstrap", hostname="<hostname>"}`
+2. Check for failures: `{job="bootstrap", hostname="<hostname>", stage="failed"}`
 3. After success, verify host appears in metrics: `up{hostname="<hostname>"}`
-4. Check logs are flowing: `{host="<hostname>"}`
+4. Check logs are flowing: `{hostname="<hostname>"}`

 See [docs/host-creation.md](../../../docs/host-creation.md) for the full host creation pipeline.

 ### Debug SSH/Access Issues

 ```logql
-{host="<host>", systemd_unit="sshd.service"}
+{hostname="<host>", systemd_unit="sshd.service"}
 ```

 ### Check Recent Upgrades
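A rename like the one in this diff can be applied mechanically to saved queries. The sketch below is one possible approach and is not the method used for this commit: `rename_host_label` is a hypothetical helper that rewrites the `host` label name to `hostname` wherever it is followed by a label matcher operator (`=`, `!=`, `=~`, `!~`). It is a regex, not a LogQL parser, so it would also rewrite `host=` if that text appeared inside a quoted line-filter string.

```python
import re

# Match the label name `host` only when a matcher operator follows it.
# The word boundaries keep `hostname` and `localhost` from matching.
_HOST_MATCHER = re.compile(r"\bhost\b(?=\s*(?:=~|!~|!=|=))")


def rename_host_label(query):
    """Return `query` with the `host` label name replaced by `hostname`."""
    return _HOST_MATCHER.sub("hostname", query)


if __name__ == "__main__":
    for q in (
        '{host="ns1", systemd_unit="nsd.service"}',
        '{job="bootstrap", host="myhost"}',
        '{hostname="ha1"} |= "error"',  # already migrated, left unchanged
    ):
        print(rename_host_label(q))
```

Queries that already use `hostname` pass through unchanged, so the helper is safe to run more than once over the same files.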