Remove monitoring01 host configuration and unused service modules
(prometheus, grafana, loki, tempo, pyroscope). Migrate blackbox,
exportarr, and pve exporters to monitoring02 with scrape configs
moved to VictoriaMetrics. Update alert rules, terraform vault
policies/secrets, http-proxy entries, and documentation to reflect
the monitoring02 migration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Switch vmalert from blackhole mode to sending alerts to the local
Alertmanager
- Import alerttonotify service so alerts route to NATS notifications
- Move alertmanager and grafana CNAMEs from http-proxy to monitoring02
- Add monitoring CNAME to monitoring02
- Add Caddy reverse proxy entries for alertmanager and grafana
- Remove prometheus, alertmanager, and grafana Caddy entries from
http-proxy (now served directly by monitoring02)
- Move monitoring02 Vault AppRole to hosts-generated.tf with
extra_policies support and prometheus-metrics policy
- Update Promtail to use authenticated loki.home.2rjus.net endpoint
only (remove unauthenticated monitoring01 client)
- Update pipe-to-loki and bootstrap to use loki.home.2rjus.net with
basic auth from Vault secret
- Move migration plan to completed
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add standalone Loki service module (services/loki/) with same config as
monitoring01 and import it on monitoring02. Update Grafana Loki datasource
to localhost. Defer Tempo and Pyroscope migration (not actively used).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add metrics.home.2rjus.net and vmalert.home.2rjus.net CNAMEs with
Caddy TLS termination via internal ACME CA.
Refactor Grafana's Caddy config from configFile to globalConfig +
virtualHosts so both modules can contribute routes to the same
Caddy instance.
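
A minimal sketch of the globalConfig + virtualHosts layout described
above, not the actual module contents. The hostnames come from this
commit; the ACME directory URL and the upstream ports are assumptions
made for illustration.

```nix
# Illustrative only: the ACME URL and upstream ports are assumed values.
{
  services.caddy = {
    enable = true;

    # Shared settings live in globalConfig instead of one monolithic
    # configFile, so no single module owns the whole Caddyfile.
    globalConfig = ''
      acme_ca https://ca.example.internal/acme/acme/directory
    '';

    # Each service module can declare its own virtualHosts entry; the
    # NixOS module system merges them into one Caddy instance.
    virtualHosts."metrics.home.2rjus.net".extraConfig = ''
      reverse_proxy 127.0.0.1:8428
    '';
    virtualHosts."vmalert.home.2rjus.net".extraConfig = ''
      reverse_proxy 127.0.0.1:8880
    '';
  };
}
```

With configFile, only one module can define the Caddy configuration;
virtualHosts is an attribute set, so definitions from separate modules
combine without conflict.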
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Set up the core metrics stack on monitoring02 as Phase 2 of the
monitoring migration. VictoriaMetrics replaces Prometheus with
identical scrape configs (22 jobs including auto-generated targets).
- VictoriaMetrics with 3-month retention and all scrape configs
- vmalert evaluating existing rules.yml (notifier disabled)
- Alertmanager with same routing config (no alerts sent during parallel
operation)
- Grafana datasources updated: local VictoriaMetrics as default
- Static user override for credential file access (OpenBao, Apiary)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Assign roles to hosts for better organization and filtering:
- ha1: home-automation
- monitoring01, monitoring02: monitoring
- jelly01: media
- nats1: messaging
- http-proxy: proxy
- testvm01-03: test
Also promote kanidm01 and monitoring02 from test to prod tier.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Deploy Grafana test instance on monitoring02 with:
- Kanidm OIDC authentication (admins -> Admin role, others -> Viewer)
- PKCE enabled for secure OAuth2 flow (required by Kanidm)
- Declarative datasources for Prometheus and Loki on monitoring01
- Local Caddy for TLS termination via internal ACME CA
- DNS CNAME grafana-test.home.2rjus.net
Terraform changes add OAuth2 client secret and AppRole policies for
kanidm01 and monitoring02.
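
A hedged sketch of how the PKCE and role-mapping settings listed above
might look in the Grafana module. The client ID, secret path, scopes, and
the group name in the role expression are hypothetical placeholders, and
the Kanidm endpoint URLs are omitted because they are not given here.

```nix
# Illustrative only: client_id, the secret path, scopes, and the group
# name "admins" are assumed, not taken from the repository.
{
  services.grafana.settings."auth.generic_oauth" = {
    enabled = true;
    name = "Kanidm";
    client_id = "grafana";
    # Read the OAuth2 client secret from a file at runtime.
    client_secret = "$__file{/run/secrets/grafana-oauth2-client-secret}";
    scopes = "openid profile email groups";
    # Kanidm requires PKCE for the OAuth2 authorization code flow.
    use_pkce = true;
    # Members of the admins group become Admin, everyone else Viewer.
    role_attribute_path = "contains(groups[*], 'admins') && 'Admin' || 'Viewer'";
  };
}
```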
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New test-tier host for monitoring stack expansion with:
- Static IP 10.69.13.24
- 4 CPU cores, 4GB RAM, 20GB disk
- Vault integration and NATS-based deployment enabled
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>