nixos-servers

Author	SHA1	Message	Date
Torjus Håkestad	7ff3d2a09b	docs: move openbao-kanidm-oidc plan to completed All checks were successful Run nix flake check / flake-check (push) Successful in 2m7s Details	2026-02-09 19:44:06 +01:00
Torjus Håkestad	e85f15b73d	vault: add OpenBao OIDC integration with Kanidm All checks were successful Run nix flake check / flake-check (push) Successful in 2m9s Details Enable Kanidm users to authenticate to OpenBao via OIDC for Web UI access. Members of the admins group get full read/write access to secrets. Changes: - Add OIDC auth backend in Terraform (oidc.tf) - Add oidc-admin and oidc-default policies - Add openbao OAuth2 client to Kanidm - Enable legacy crypto (RS256) for OpenBao compatibility - Allow imperative group membership management in Kanidm Limitations: - CLI login not supported (Kanidm requires HTTPS for confidential client redirects) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 19:42:26 +01:00
Torjus Håkestad	2f5a2a4bf1	grafana: use instant queries for fleet dashboard stat panels All checks were successful Run nix flake check / flake-check (push) Successful in 2m6s Details Prevents stat panels from being affected by dashboard time range selection. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 19:00:33 +01:00
Torjus Håkestad	287141c623	hosts: add role metadata to all hosts Some checks failed Run nix flake check / flake-check (push) Failing after 13m51s Details Assign roles to hosts for better organization and filtering: - ha1: home-automation - monitoring01, monitoring02: monitoring - jelly01: media - nats1: messaging - http-proxy: proxy - testvm01-03: test Also promote kanidm01 and monitoring02 from test to prod tier. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 16:21:08 +01:00
Torjus Håkestad	9ed11b712f	home-assistant: fix Jinja2 battery template syntax All checks were successful Run nix flake check / flake-check (push) Successful in 2m13s Details The template used | min(100) | max(0) which is invalid Jinja2 syntax. These filters expect iterables (lists), not scalar arguments. This caused TypeError warnings on every MQTT message and left battery sensors unavailable. Fixed by using proper list-based min/max: [[[value, 100] | min, 0] | max Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 16:12:59 +01:00
Torjus Håkestad	ffad2dd205	monitoring: increase zigbee_sensor_stale threshold to 4 hours The 2-hour threshold was too aggressive for temperature sensors in stable environments. Historical data shows gaps up to 2.75 hours when temperature hasn't changed (Home Assistant only updates last_updated when values change). Increasing to 4 hours avoids false positives while still catching genuine failures. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 16:10:54 +01:00
Torjus Håkestad	ed7d2aa727	grafana: add deployment metrics to nixos-fleet dashboard Some checks failed Run nix flake check / flake-check (push) Has been cancelled Details Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 15:58:28 +01:00
Torjus Håkestad	bf7a025364	flake: update homelab-deploy input Some checks failed Run nix flake check / flake-check (push) Failing after 3m49s Details Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 15:45:30 +01:00
torjus-bot	4ae99dbc89	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs': 'github:nixos/nixpkgs/e576e3c9cf9bad747afcddd9e34f51d18c855b4e?narHash=sha256-tlFqNG/uzz2%2B%2BaAmn4v8J0vAkV3z7XngeIIB3rM3650%3D' (2026-02-03) → 'github:nixos/nixpkgs/23d72dabcb3b12469f57b37170fcbc1789bd7457?narHash=sha256-z5NJPSBwsLf/OfD8WTmh79tlSU8XgIbwmk6qB1/TFzY%3D' (2026-02-07) • Updated input 'nixpkgs-unstable': 'github:nixos/nixpkgs/00c21e4c93d963c50d4c0c89bfa84ed6e0694df2?narHash=sha256-AYqlWrX09%2BHvGs8zM6ebZ1pwUqjkfpnv8mewYwAo%2BiM%3D' (2026-02-04) → 'github:nixos/nixpkgs/d6c71932130818840fc8fe9509cf50be8c64634f?narHash=sha256-ub1gpAONMFsT/GU2hV6ZWJjur8rJ6kKxdm9IlCT0j84%3D' (2026-02-08)	2026-02-09 00:01:58 +00:00
Torjus Håkestad	5c142b1323	flake: update homelab-deploy input Some checks failed Run nix flake check / flake-check (push) Failing after 10m7s Details Periodic flake update / flake-update (push) Successful in 2m51s Details Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 00:42:51 +01:00
Torjus Håkestad	4091e51f41	nixos-exporter: use nkeySeedFile option Some checks failed Run nix flake check / flake-check (push) Failing after 4m26s Details Use the new nkeySeedFile option instead of credentialsFile for NATS authentication. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 00:34:22 +01:00
Torjus Håkestad	a8e558a6b7	flake: update nixos-exporter input Some checks failed Run nix flake check / flake-check (push) Has been cancelled Details Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 00:32:56 +01:00
Torjus Håkestad	4efc798c38	nixos-exporter: fix nkey file permissions All checks were successful Run nix flake check / flake-check (push) Successful in 2m6s Details Set owner/group to nixos-exporter so the service can read the NATS credentials file. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 00:18:10 +01:00
Torjus Håkestad	016f8c9119	terraform: add nixos-exporter shared policy Some checks failed Run nix flake check / flake-check (push) Has been cancelled Details - Create shared policy granting all hosts access to nixos-exporter nkey - Add policy to both manual and generated host AppRoles - Remove duplicate kanidm01/monitoring02 entries from hosts-generated.tf Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 00:04:17 +01:00
Torjus Håkestad	fec2a261ab	Merge pull request 'nixos-exporter: enable NATS cache sharing' (#38) from nixos-exporter-nats-cache into master All checks were successful Run nix flake check / flake-check (push) Successful in 2m18s Details Reviewed-on: #38	2026-02-08 22:58:24 +00:00
Torjus Håkestad	60c04a2052	nixos-exporter: enable NATS cache sharing Some checks failed Run nix flake check / flake-check (pull_request) Successful in 2m17s Details Run nix flake check / flake-check (push) Failing after 5m16s Details When one host fetches the latest flake revision, it publishes to NATS and all other hosts receive the update immediately. This reduces redundant nix flake metadata calls across the fleet. - Add nkeys to devshell for key generation - Add nixos-exporter user to NATS HOMELAB account - Add Vault secret for NKey storage - Configure all hosts to use NATS for revision sharing - Update nixos-exporter input to version with NATS support Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 23:57:28 +01:00
Torjus Håkestad	39e3f37263	flake: update homelab-deploy input Some checks failed Run nix flake check / flake-check (push) Failing after 15m17s Details Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 22:49:44 +01:00
Torjus Håkestad	a2d93baba8	Merge pull request 'grafana: add NixOS operations dashboard' (#37) from grafana-nixos-operations-dashboard into master All checks were successful Run nix flake check / flake-check (push) Successful in 3m54s Details Reviewed-on: #37	2026-02-08 21:04:19 +00:00
Torjus Håkestad	f66dfc753c	grafana: add NixOS operations dashboard All checks were successful Run nix flake check / flake-check (push) Successful in 3m24s Details Run nix flake check / flake-check (pull_request) Successful in 4m5s Details Loki-based dashboard for tracking NixOS operations including: - Upgrade activity and success/failure stats - Build activity during upgrades - Bootstrap logs for new VM deployments - ACME certificate renewal activity Log panels use LogQL json parsing with | keep host to show clean messages with host labels. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 22:03:28 +01:00
Torjus Håkestad	79a6a72719	Merge pull request 'grafana-dashboards-permissions' (#36) from grafana-dashboards-permissions into master All checks were successful Run nix flake check / flake-check (push) Successful in 2m4s Details Reviewed-on: #36	2026-02-08 20:18:22 +00:00
Torjus Håkestad	89d0a6f358	grafana: add systemd services dashboard Some checks failed Run nix flake check / flake-check (push) Failing after 8m30s Details Run nix flake check / flake-check (pull_request) Failing after 16m49s Details Dashboard for monitoring systemd across the fleet: - Summary stats: failed/active/inactive units, restarts, timers - Failed units table (shows any units in failed state) - Service restarts table (top 15 services by restart count) - Active units per host bar chart - NixOS upgrade timer table with last trigger time - Backup timers table (restic jobs) - Service restarts over time chart - Hostname filter to focus on specific hosts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 21:06:59 +01:00
Torjus Håkestad	03ebee4d82	grafana: fix proxmox table __name__ column All checks were successful Run nix flake check / flake-check (push) Successful in 2m9s Details Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 21:04:41 +01:00
Torjus Håkestad	05630eb4d4	grafana: add Proxmox dashboard Some checks failed Run nix flake check / flake-check (push) Has been cancelled Details Dashboard for monitoring Proxmox VMs: - Summary stats: VMs running/stopped, node CPU/memory, uptime - VM status table with name, status, CPU%, memory%, uptime - VM CPU usage over time - VM memory usage over time - Network traffic (RX/TX) per VM - Disk I/O (read/write) per VM - Storage usage gauges and capacity table - VM filter to focus on specific VMs Filters out template VMs, shows only actual guests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 21:02:28 +01:00
Torjus Håkestad	1e52eec02a	monitoring: always include tier label in scrape configs All checks were successful Run nix flake check / flake-check (push) Successful in 2m8s Details Previously tier was only included if non-default (not "prod"), which meant prod hosts had no tier label. This made the Grafana tier filter only show "test" since "prod" never appeared in label_values(). Now tier is always included, so both "prod" and "test" appear in the fleet dashboard tier selector. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 20:58:52 +01:00
Torjus Håkestad	d333aa0164	grafana: fix fleet table __name__ columns All checks were successful Run nix flake check / flake-check (push) Successful in 2m5s Details Exclude the __name__ columns that were leaking through the table transformations. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 20:52:39 +01:00
Torjus Håkestad	a5d5827dcc	grafana: add NixOS fleet dashboard Some checks failed Run nix flake check / flake-check (push) Has been cancelled Details Dashboard for monitoring NixOS deployments across the homelab: - Hosts behind remote / needing reboot stat panels - Fleet status table with revision, behind status, reboot needed, age - Generation age bar chart (shows stale configs) - Generations per host bar chart - Deployment activity time series (see when hosts were updated) - Flake input ages table - Pie charts for hosts by revision and tier - Tier filter variable Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 20:50:08 +01:00
Torjus Håkestad	1c13ec12a4	grafana: add temperature dashboard All checks were successful Run nix flake check / flake-check (push) Successful in 2m5s Details Dashboard includes: - Current temperatures per room (stat panel) - Average home temperature (gauge) - Current humidity (stat panel) - 30-day temperature history with mean/min/max in legend - Temperature trend (rate of change per hour) - 24h min/max/avg table per room - 30-day humidity history Filters out device_temperature (internal sensor) metrics. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 20:45:52 +01:00
Torjus Håkestad	4bf0eeeadb	grafana: add dashboards and fix permissions All checks were successful Run nix flake check / flake-check (push) Successful in 2m3s Details - Change default OIDC role from Viewer to Editor for Explore access - Add declarative dashboard provisioning - Add node-exporter dashboard (CPU, memory, disk, load, network, I/O) - Add Loki logs dashboard with host/job filters Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 20:39:21 +01:00
Torjus Håkestad	304cb117ce	Merge pull request 'grafana-kanidm-oidc' (#35) from grafana-kanidm-oidc into master All checks were successful Run nix flake check / flake-check (push) Successful in 2m7s Details Reviewed-on: #35	2026-02-08 19:30:20 +00:00
Torjus Håkestad	02270a0e4a	docs: update plans with Grafana OIDC progress Some checks failed Run nix flake check / flake-check (pull_request) Successful in 2m7s Details Run nix flake check / flake-check (push) Failing after 16m31s Details - auth-system-replacement.md: Mark OAuth2 client (Grafana) as completed, document key findings (PKCE, attribute paths, user requirements) - monitoring-migration-victoriametrics.md: Note Grafana deployment on monitoring02 with Kanidm OIDC as test instance Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 20:28:10 +01:00
Torjus Håkestad	030e8518c5	grafana: add Grafana on monitoring02 with Kanidm OIDC Some checks failed Run nix flake check / flake-check (push) Failing after 4m3s Details Deploy Grafana test instance on monitoring02 with: - Kanidm OIDC authentication (admins -> Admin role, others -> Viewer) - PKCE enabled for secure OAuth2 flow (required by Kanidm) - Declarative datasources for Prometheus and Loki on monitoring01 - Local Caddy for TLS termination via internal ACME CA - DNS CNAME grafana-test.home.2rjus.net Terraform changes add OAuth2 client secret and AppRole policies for kanidm01 and monitoring02. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 20:23:26 +01:00
Torjus Håkestad	9ffdd4f862	terraform: increase monitoring02 disk to 60G Some checks failed Run nix flake check / flake-check (push) Failing after 11m8s Details Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 19:23:40 +01:00
Torjus Håkestad	0b977808ca	hosts: add monitoring02 configuration Some checks failed Run nix flake check / flake-check (push) Has been cancelled Details New test-tier host for monitoring stack expansion with: - Static IP 10.69.13.24 - 4 CPU cores, 4GB RAM, 20GB disk - Vault integration and NATS-based deployment enabled Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 19:19:38 +01:00
Torjus Håkestad	8786113f8f	docs: add OpenBao + Kanidm OIDC integration plan Some checks failed Run nix flake check / flake-check (push) Failing after 3m10s Details Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 15:45:44 +01:00
Torjus Håkestad	fdb2c31f84	docs: add pipe-to-loki documentation to CLAUDE.md All checks were successful Run nix flake check / flake-check (push) Successful in 2m1s Details Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 15:34:01 +01:00
Torjus Håkestad	78eb04205f	system: add pipe-to-loki helper script Some checks failed Run nix flake check / flake-check (push) Has been cancelled Details Adds a system-wide script for sending command output or interactive sessions to Loki for easy sharing with Claude. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 15:30:53 +01:00
Torjus Håkestad	19cb61ebbc	Merge pull request 'kanidm-pam-client' (#34) from kanidm-pam-client into master All checks were successful Run nix flake check / flake-check (push) Successful in 3m19s Details Reviewed-on: #34	2026-02-08 14:14:53 +00:00
Torjus Håkestad	9ed09c9a9c	docs: add user-management documentation All checks were successful Run nix flake check / flake-check (pull_request) Successful in 3m33s Details Run nix flake check / flake-check (push) Successful in 2m0s Details - CLI workflows for creating users and groups - Troubleshooting guide (nscd, cache invalidation) - Home directory behavior (UUID-based with symlinks) - Update auth-system-replacement plan with progress Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 15:14:21 +01:00
Torjus Håkestad	b31c64f1b9	kanidm: remove declarative user provisioning Keep base groups (admins, users, ssh-users) provisioned declaratively but manage regular users via the kanidm CLI. This allows setting POSIX attributes and passwords in a single workflow. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 15:14:03 +01:00
Torjus Håkestad	54b6e37420	flake: add kanidm to devshell Add kanidm_1_8 CLI for administering the Kanidm server. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 15:12:19 +01:00
Torjus Håkestad	b845a8bb8b	system: add kanidm PAM/NSS client module Add homelab.kanidm.enable option for central authentication via Kanidm. The module configures: - PAM/NSS integration with kanidm-unixd - Client connection to auth.home.2rjus.net - Login authorization for ssh-users group Enable on testvm01-03 for testing. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 15:12:19 +01:00
Torjus Håkestad	bfbf0cea68	template2: enable zram for bootstrap Some checks failed Run nix flake check / flake-check (push) Failing after 3m34s Details Prevents OOM during initial nixos-rebuild on 2GB VMs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 13:34:08 +01:00
Torjus Håkestad	3abe5e83a7	docs: add memory ballooning as fallback option All checks were successful Run nix flake check / flake-check (push) Successful in 2m5s Details Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 13:29:42 +01:00
Torjus Håkestad	67c27555f3	docs: add memory issues follow-up plan All checks were successful Run nix flake check / flake-check (push) Successful in 2m2s Details Track zram change effectiveness for OOM prevention during upgrades. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 13:26:31 +01:00
Torjus Håkestad	1674b6a844	system: enable zram swap for all hosts Some checks failed Run nix flake check / flake-check (push) Failing after 12m6s Details Provides compressed swap in RAM to prevent OOM kills during nixos-rebuild on low-memory VMs (2GB). Removes duplicate zram configs from jelly01 and nix-cache01. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 13:02:58 +01:00
Torjus Håkestad	311be282b6	docs: add security hardening plan Some checks failed Run nix flake check / flake-check (push) Failing after 2s Details Based on security review findings, covering SSH hardening, firewall enablement, log transport TLS, security alerting, and secrets management. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 05:26:15 +01:00
Torjus Håkestad	11cbb64097	claude: make auditor delegation explicit in investigate-alarm Some checks failed Run nix flake check / flake-check (push) Failing after 1s Details - Changed section 4 from "if needed" to always spawn auditor - Added explicit "Do NOT query audit logs yourself" guidance - Listed specific scenarios requiring auditor (service stopped, etc.) - Added manual intervention as first common cause - Updated guidelines to emphasize mandatory delegation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 05:11:09 +01:00
Torjus Håkestad	e2dd21c994	claude: add auditor agent and git-explorer MCP Add new auditor agent for security-focused audit log analysis: - SSH session tracking, command execution, sudo usage - Suspicious activity detection patterns - Can be used standalone or as sub-agent by investigate-alarm Update investigate-alarm to delegate audit analysis to auditor and add git-explorer MCP for configuration drift detection. Add git-explorer to .mcp.json for repository inspection. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 04:48:55 +01:00
Torjus Håkestad	463342133e	kanidm: remove non-functional metrics scrape target All checks were successful Run nix flake check / flake-check (push) Successful in 1m56s Details Kanidm does not expose a Prometheus /metrics endpoint. The scrape target was causing 404 errors after the TLS certificate issue was fixed. Also add SSH command restriction to CLAUDE.md. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 03:34:12 +01:00
Torjus Håkestad	de36b9d016	kanidm: add hostname SAN to ACME certificate Some checks failed Run nix flake check / flake-check (push) Failing after 1s Details Include both auth.home.2rjus.net (CNAME) and kanidm01.home.2rjus.net (A record) as SANs in the TLS certificate. This fixes Prometheus scraping which connects via the hostname, not the CNAME. Fixes: x509: certificate is valid for auth.home.2rjus.net, not kanidm01.home.2rjus.net Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 03:29:54 +01:00

1033 Commits