nixos-servers

Author	SHA1	Message	Date
Torjus Håkestad	aff4473a2b	monitoring02: enable alerting and migrate CNAMEs from http-proxy - Switch vmalert from blackhole mode to sending alerts to local Alertmanager - Import alerttonotify service so alerts route to NATS notifications - Move alertmanager and grafana CNAMEs from http-proxy to monitoring02 - Add monitoring CNAME to monitoring02 - Add Caddy reverse proxy entries for alertmanager and grafana Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 21:03:09 +01:00
Torjus Håkestad	74e7c9faa4	monitoring02: add Loki service Add standalone Loki service module (services/loki/) with same config as monitoring01 and import it on monitoring02. Update Grafana Loki datasource to localhost. Defer Tempo and Pyroscope migration (not actively used). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 19:42:19 +01:00
Torjus Håkestad	e329f87b0b	monitoring02: add VictoriaMetrics, vmalert, and Alertmanager Set up the core metrics stack on monitoring02 as Phase 2 of the monitoring migration. VictoriaMetrics replaces Prometheus with identical scrape configs (22 jobs including auto-generated targets). - VictoriaMetrics with 3-month retention and all scrape configs - vmalert evaluating existing rules.yml (notifier disabled) - Alertmanager with same routing config (no alerts during parallel op) - Grafana datasources updated: local VictoriaMetrics as default - Static user override for credential file access (OpenBao, Apiary) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 00:55:08 +01:00
Torjus Håkestad	030e8518c5	grafana: add Grafana on monitoring02 with Kanidm OIDC Deploy Grafana test instance on monitoring02 with: - Kanidm OIDC authentication (admins -> Admin role, others -> Viewer) - PKCE enabled for secure OAuth2 flow (required by Kanidm) - Declarative datasources for Prometheus and Loki on monitoring01 - Local Caddy for TLS termination via internal ACME CA - DNS CNAME grafana-test.home.2rjus.net Terraform changes add OAuth2 client secret and AppRole policies for kanidm01 and monitoring02. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 20:23:26 +01:00
Torjus Håkestad	0b977808ca	hosts: add monitoring02 configuration New test-tier host for monitoring stack expansion with: - Static IP 10.69.13.24 - 4 CPU cores, 4GB RAM, 20GB disk - Vault integration and NATS-based deployment enabled Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 19:19:38 +01:00

5 Commits