nixos-servers

Author	SHA1	Message	Date
Torjus Håkestad	a6013d3950	monitoring02: enable alerting and migrate CNAMEs from http-proxy - Switch vmalert from blackhole mode to sending alerts to local Alertmanager - Import alerttonotify service so alerts route to NATS notifications - Move alertmanager and grafana CNAMEs from http-proxy to monitoring02 - Add monitoring CNAME to monitoring02 - Add Caddy reverse proxy entries for alertmanager and grafana - Remove prometheus, alertmanager, and grafana Caddy entries from http-proxy (now served directly by monitoring02) - Move monitoring02 Vault AppRole to hosts-generated.tf with extra_policies support and prometheus-metrics policy - Update Promtail to use authenticated loki.home.2rjus.net endpoint only (remove unauthenticated monitoring01 client) - Update pipe-to-loki and bootstrap to use loki.home.2rjus.net with basic auth from Vault secret - Move migration plan to completed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 21:23:21 +01:00
Torjus Håkestad	87d8571d62	promtail: fix vault secret ownership for loki auth The secret file needs to be owned by promtail since Promtail runs as a dedicated user and can't read root-owned files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:17:02 +01:00
Torjus Håkestad	c13921d302	loki: add basic auth for log push and dual-ship promtail - Loki bound to localhost, Caddy reverse proxy with basic_auth - Vault secret (shared/loki/push-auth) for password, bcrypt hash generated at boot for Caddy environment - Promtail dual-ships to monitoring01 (direct) and loki.home.2rjus.net (with basic auth), conditional on vault.enable - Terraform: new shared loki-push policy added to all AppRoles Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:00:08 +01:00
Torjus Håkestad	7b804450a3	promtail: add hostname/tier/role labels and journal priority level mapping Align Promtail labels with Prometheus by adding hostname, tier, and role static labels to both journal and varlog scrape configs. Add pipeline stages to map journal PRIORITY field to a level label for reliable severity filtering across the fleet. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 23:40:14 +01:00
Torjus Håkestad	4091e51f41	nixos-exporter: use nkeySeedFile option Use the new nkeySeedFile option instead of credentialsFile for NATS authentication. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 00:34:22 +01:00
Torjus Håkestad	4efc798c38	nixos-exporter: fix nkey file permissions Set owner/group to nixos-exporter so the service can read the NATS credentials file. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 00:18:10 +01:00
Torjus Håkestad	60c04a2052	nixos-exporter: enable NATS cache sharing When one host fetches the latest flake revision, it publishes to NATS and all other hosts receive the update immediately. This reduces redundant nix flake metadata calls across the fleet. - Add nkeys to devshell for key generation - Add nixos-exporter user to NATS HOMELAB account - Add Vault secret for NKey storage - Configure all hosts to use NATS for revision sharing - Update nixos-exporter input to version with NATS support Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 23:57:28 +01:00
Torjus Håkestad	97ff774d3f	monitoring: add nixos-exporter to all hosts Add nixos-exporter prometheus exporter to track NixOS generation metrics and flake revision status across all hosts. Changes: - Add nixos-exporter flake input - Add commonModules list in flake.nix for modules shared by all hosts - Enable nixos-exporter in system/monitoring/metrics.nix - Configure Prometheus to scrape nixos-exporter on all hosts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 23:55:29 +01:00
Torjus Håkestad	1f5b7b13e2	monitoring: enable restart-count and ip-accounting collectors Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 21:30:47 +01:00
Torjus Håkestad	c53e36c3f3	Revert "monitoring: enable additional systemd-exporter collectors" This reverts commit `04a252b857`.	2026-02-06 21:30:05 +01:00
Torjus Håkestad	04a252b857	monitoring: enable additional systemd-exporter collectors Enables restart-count, file-descriptor-size, and ip-accounting collectors. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 21:28:44 +01:00
Torjus Håkestad	5d26f52e0d	Revert "monitoring: enable cpu, memory, io collectors for systemd-exporter" This reverts commit `506a692548`.	2026-02-06 21:26:20 +01:00
Torjus Håkestad	506a692548	monitoring: enable cpu, memory, io collectors for systemd-exporter Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 21:23:19 +01:00
Torjus Håkestad	3cccfc0487	monitoring: implement monitoring gaps coverage Add exporters and scrape targets for services lacking monitoring: - PostgreSQL: postgres-exporter on pgdb1 - Authelia: native telemetry metrics on auth01 - Unbound: unbound-exporter with remote-control on ns1/ns2 - NATS: HTTP monitoring endpoint on nats1 - OpenBao: telemetry config and Prometheus scrape with token auth - Systemd: systemd-exporter on all hosts for per-service metrics Add alert rules for postgres, auth (authelia + lldap), jellyfin, vault (openbao), plus extend existing nats and unbound rules. Add Terraform config for Prometheus metrics policy and token. The token is created via vault_token resource and stored in KV, so no manual token creation is needed. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 21:44:13 +01:00
Torjus Håkestad	4d2fbff6d0	Fix error in journald config	2025-02-07 13:22:50 +01:00
Torjus Håkestad	f29edfe34a	Configure journald storage	2025-02-07 13:21:43 +01:00
Torjus Håkestad	e366a05204	Fix caddy logging	2025-01-28 00:49:22 +01:00
Torjus Håkestad	8545807dd8	Add job label to promtail journald logs	2025-01-23 19:50:25 +01:00
Torjus Håkestad	02ef7e861b	Add qemu guest agent to all VMs	2024-12-05 18:35:06 +01:00
Torjus Håkestad	a4592ffda3	Improve monitoring stuff	2024-12-01 20:51:14 +01:00
Torjus Håkestad	32425807fc	Add promtail for journal	2024-12-01 03:00:07 +01:00

21 Commits