monitoring01: remove host and migrate services to monitoring02
Remove monitoring01 host configuration and unused service modules (prometheus, grafana, loki, tempo, pyroscope). Migrate blackbox, exportarr, and pve exporters to monitoring02 with scrape configs moved to VictoriaMetrics. Update alert rules, terraform vault policies/secrets, http-proxy entries, and documentation to reflect the monitoring02 migration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -30,7 +30,7 @@ Use the `lab-monitoring` MCP server tools:
### Label Reference

Available labels for log queries:
- `hostname` - Hostname (e.g., `ns1`, `monitoring01`, `ha1`) - matches the Prometheus `hostname` label
- `hostname` - Hostname (e.g., `ns1`, `monitoring02`, `ha1`) - matches the Prometheus `hostname` label
- `systemd_unit` - Systemd unit name (e.g., `nsd.service`, `nixos-upgrade.service`)
- `job` - Either `systemd-journal` (most logs), `varlog` (file-based logs), or `bootstrap` (VM bootstrap logs)
- `filename` - For `varlog` job, the log file path
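
The labels listed above combine into a LogQL stream selector. As a minimal sketch, assuming queries are assembled in Python (the helper name `logql_selector` is hypothetical and not part of the documented tooling), a selector can be built from label/value pairs like this:

```python
def logql_selector(**labels: str) -> str:
    """Build a LogQL stream selector such as {hostname="ns1", job="varlog"}.

    Hypothetical helper for illustration only.
    """
    if not labels:
        raise ValueError("a stream selector needs at least one label matcher")
    parts = []
    for name, value in sorted(labels.items()):
        # Escape backslashes and double quotes so the value stays a valid LogQL string.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{name}="{escaped}"')
    return "{" + ", ".join(parts) + "}"


# Labels are emitted in alphabetical order, joined with ", ".
print(logql_selector(hostname="monitoring02", systemd_unit="loki.service"))
```
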
@@ -54,7 +54,7 @@ Journal logs are JSON-formatted. Key fields:

**All logs from a host:**
```logql
{hostname="monitoring01"}
{hostname="monitoring02"}
```

**Logs from a service across all hosts:**
@@ -74,7 +74,7 @@ Journal logs are JSON-formatted. Key fields:

**Regex matching:**
```logql
{systemd_unit="prometheus.service"} |~ "scrape.*failed"
{systemd_unit="victoriametrics.service"} |~ "scrape.*failed"
```

**Filter by level (journal scrape only):**
@@ -109,7 +109,7 @@ Default lookback is 1 hour. Use `start` parameter for older logs:
Useful systemd units for troubleshooting:
- `nixos-upgrade.service` - Daily auto-upgrade logs
- `nsd.service` - DNS server (ns1/ns2)
- `prometheus.service` - Metrics collection
- `victoriametrics.service` - Metrics collection
- `loki.service` - Log aggregation
- `caddy.service` - Reverse proxy
- `home-assistant.service` - Home automation
@@ -152,7 +152,7 @@ VMs provisioned from template2 send bootstrap progress directly to Loki via curl

Parse JSON and filter on fields:
```logql
{systemd_unit="prometheus.service"} | json | PRIORITY="3"
{systemd_unit="victoriametrics.service"} | json | PRIORITY="3"
```

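
The journal `PRIORITY` field carries the standard syslog severity number, so `PRIORITY="3"` in the query above selects error-level entries. A minimal Python sketch of that mapping follows; the helper name `priority_filter` is hypothetical and shown only to illustrate how the number relates to the level name:

```python
# Standard syslog severity levels as stored in the journal PRIORITY field.
SYSLOG_PRIORITIES = {
    "emerg": 0,
    "alert": 1,
    "crit": 2,
    "err": 3,
    "warning": 4,
    "notice": 5,
    "info": 6,
    "debug": 7,
}


def priority_filter(unit: str, level: str) -> str:
    """Return a LogQL query matching one unit at one severity level.

    Hypothetical helper for illustration only.
    """
    number = SYSLOG_PRIORITIES[level]  # raises KeyError for unknown level names
    return f'{{systemd_unit="{unit}"}} | json | PRIORITY="{number}"'


print(priority_filter("victoriametrics.service", "err"))
```
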
---
@@ -242,12 +242,11 @@ All available Prometheus job names:
- `unbound` - DNS resolver metrics (ns1, ns2)
- `wireguard` - VPN tunnel metrics (http-proxy)

**Monitoring stack (localhost on monitoring01):**
- `prometheus` - Prometheus self-metrics
**Monitoring stack (localhost on monitoring02):**
- `victoriametrics` - VictoriaMetrics self-metrics
- `loki` - Loki self-metrics
- `grafana` - Grafana self-metrics
- `alertmanager` - Alertmanager metrics
- `pushgateway` - Push-based metrics gateway

**External/infrastructure:**
- `pve-exporter` - Proxmox hypervisor metrics
@@ -262,7 +261,7 @@ All scrape targets have these labels:
**Standard labels:**
- `instance` - Full target address (`<hostname>.home.2rjus.net:<port>`)
- `job` - Job name (e.g., `node-exporter`, `unbound`, `nixos-exporter`)
- `hostname` - Short hostname (e.g., `ns1`, `monitoring01`) - use this for host filtering
- `hostname` - Short hostname (e.g., `ns1`, `monitoring02`) - use this for host filtering

**Host metadata labels** (when configured in `homelab.host`):
- `role` - Host role (e.g., `dns`, `build-host`, `vault`)
@@ -275,7 +274,7 @@ Use the `hostname` label for easy host filtering across all jobs:

```promql
{hostname="ns1"} # All metrics from ns1
node_load1{hostname="monitoring01"} # Specific metric by hostname
node_load1{hostname="monitoring02"} # Specific metric by hostname
up{hostname="ha1"} # Check if ha1 is up
```

@@ -283,10 +282,10 @@ This is simpler than wildcarding the `instance` label:

```promql
# Old way (still works but verbose)
up{instance=~"monitoring01.*"}
up{instance=~"monitoring02.*"}

# New way (preferred)
up{hostname="monitoring01"}
up{hostname="monitoring02"}
```

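
To run a `hostname`-filtered query outside the MCP tools, one option is the Prometheus-compatible instant-query endpoint `/api/v1/query`, which VictoriaMetrics also serves. The sketch below only builds the selector and the request URL. The base address `http://localhost:8428` and both helper names are assumptions for illustration, not values taken from this repository:

```python
from urllib.parse import urlencode


def hostname_selector(metric: str, hostname: str) -> str:
    """Return a PromQL selector that filters a metric by the `hostname` label."""
    return f'{metric}{{hostname="{hostname}"}}'


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for the Prometheus-compatible instant-query endpoint.

    base_url is an assumed address; adjust it to the actual metrics server.
    """
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


query = hostname_selector("up", "monitoring02")
print(query)
print(instant_query_url("http://localhost:8428", query))
```

`urlencode` percent-encodes the braces, quotes, and `=` in the selector, so the query can be passed to any HTTP client unchanged.
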
### Filtering by Role/Tier