nixos-servers

Author	SHA1	Message	Date
Torjus Håkestad	d99c82c74c	kanidm: fix service ordering for vault secret Ensure vault-secret-kanidm-idm-admin runs before kanidm.service by adding services dependency. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 00:21:11 +01:00
Torjus Håkestad	ca0e3fd629	kanidm01: add kanidm authentication server - New test-tier VM at 10.69.13.23 with role=auth - Kanidm 1.8 server with HTTPS (443) and LDAPS (636) - ACME certificate from internal CA (auth.home.2rjus.net) - Provisioned groups: admins, users, ssh-users - Provisioned user: torjus - Daily backups at 22:00 (7 versions) - Prometheus monitoring scrape target Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 00:13:59 +01:00
Torjus Håkestad	732e9b8c22	docs: move bootstrap-cache plan to completed Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 23:41:05 +01:00
Torjus Håkestad	3a14ffd6b5	template2: add nix cache configuration New VMs bootstrapped from template2 will now use the local nix cache during initial nixos-rebuild, speeding up bootstrap times. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 23:40:53 +01:00
Torjus Håkestad	f9a3961457	docs: move ns1-recreation plan to completed Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 23:35:04 +01:00
Torjus Håkestad	003d4ccf03	docs: mark ns1 migration to OpenTofu as complete Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 23:34:44 +01:00
Torjus Håkestad	735b8a9ee3	terraform: add dns and homelab-deploy secrets to ns1 policy ns1 needs access to shared/dns/* for zone transfer key and shared/homelab-deploy/* for the NATS listener. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 23:33:36 +01:00
Torjus Håkestad	94feae82a0	ns1: recreate with OpenTofu workflow Old VM had incorrect hardware-configuration.nix with hardcoded UUIDs that didn't match actual disk layout, causing boot failure (emergency mode). Recreated using template2-based configuration for OpenTofu provisioning. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 23:18:08 +01:00
Torjus Håkestad	3f94f7ee95	docs: update pgdb1 decommission progress Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 22:55:55 +01:00
Torjus Håkestad	b7e398c9a7	terraform: remove pgdb1 vault approle Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 22:55:39 +01:00
Torjus Håkestad	8ec2a083bd	pgdb1: decommission postgresql host Remove pgdb1 host configuration and postgres service module. The only consumer (Open WebUI on gunter) has migrated to local PostgreSQL. Removed: - hosts/pgdb1/ - host configuration - services/postgres/ - service module (only used by pgdb1) - postgres_rules from monitoring rules - rebuild-all.sh (obsolete script) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 22:54:50 +01:00
Torjus Håkestad	ec4ac1477e	docs: mark pgdb1 for decommissioning instead of migration Only consumer was Open WebUI on gunter, which will migrate to local PostgreSQL. Removed pgdb1 backup/migration phases and added to decommission list. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 22:49:53 +01:00
Torjus Håkestad	e937c68965	docs: mark auth01, ca, and sops-nix removal as complete - auth01 host and services (authelia, lldap) already removed - ca host and services already removed (PKI migrated to OpenBao) - sops-nix fully removed (secrets/, .sops.yaml gone) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 22:33:18 +01:00
Torjus Håkestad	98e808cd6c	docs: mark jump host decommissioning as complete Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 22:31:14 +01:00
Torjus Håkestad	ba9f47f914	jump: remove unused host configuration Host was decommissioned and not in flake.nix. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 22:30:37 +01:00
Torjus Håkestad	1066e81ba8	docs: update opentofu migration plan with current state - ns2 migrated to OpenTofu - testvm02, testvm03 added to managed hosts - Remove vaulttest01 (no longer exists) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 22:26:49 +01:00
Torjus Håkestad	f0950b33de	docs: add plan for nix-cache01 reprovision Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 20:34:52 +01:00
Torjus Håkestad	bf199bd7c6	ns/resolver: add redundant stub-zone addresses Configure Unbound to query both ns1 and ns2 for the home.2rjus.net zone, in addition to local NSD. This provides redundancy during bootstrap or if local NSD is temporarily unavailable. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 20:10:17 +01:00
Torjus Håkestad	4e8ecb8a99	Merge pull request 'migrate-ns2-opentofu' (#33) from migrate-ns2-opentofu into master Reviewed-on: #33	2026-02-07 19:07:32 +00:00
Torjus Håkestad	38c104ea8c	docs: add plan for configuring template2 with nix cache Bootstrap times can be improved by configuring the base template to use the local nix cache during initial builds. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 20:06:55 +01:00
Torjus Håkestad	536daee4c7	ns2: migrate to OpenTofu management - Remove hosts/template/ (legacy template1) and give each legacy host its own hardware-configuration.nix copy - Recreate ns2 using create-host with template2 base - Add secondary DNS services (NSD + Unbound resolver) - Configure Vault policy for shared DNS secrets - Fix create-host IP uniqueness validator to check CIDR notation (prevents false positives from DNS resolver entries) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 19:28:35 +01:00
Torjus Håkestad	4c1debf0a3	Merge pull request 'decommission-ca-host' (#32) from decommission-ca-host into master Reviewed-on: #32	2026-02-07 17:50:44 +00:00
Torjus Håkestad	f36457ee0d	cleanup: remove legacy secrets directory and move TODO.md to completed plans - Remove secrets/ directory (sops-nix no longer in use, all hosts use Vault) - Move TODO.md to docs/plans/completed/automated-host-deployment-pipeline.md Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 18:49:31 +01:00
Torjus Håkestad	aedccbd9a0	flake: remove sops-nix (no longer used) All secrets are now managed by OpenBao (Vault). Remove the legacy sops-nix infrastructure that is no longer in use. Removed: - sops-nix flake input - system/sops.nix module - .sops.yaml configuration file - Age key generation from template prepare-host scripts Updated: - flake.nix - removed sops-nix references from all hosts - flake.lock - removed sops-nix input - scripts/create-host/ - removed sops references - CLAUDE.md - removed SOPS documentation Note: secrets/ directory should be manually removed by the user. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 18:46:24 +01:00
Torjus Håkestad	bdc6057689	hosts: decommission ca host and remove labmon Remove the step-ca host and labmon flake input now that ACME has been migrated to OpenBao PKI. Removed: - hosts/ca/ - step-ca host configuration - services/ca/ - step-ca service module - labmon flake input and module (no longer used) Updated: - flake.nix - removed ca host and labmon references - flake.lock - removed labmon input - rebuild-all.sh - removed ca from host list - CLAUDE.md - updated documentation Note: secrets/ca/ should be manually removed by the user. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 18:41:49 +01:00
Torjus Håkestad	3a25e3f7bc	Merge pull request 'migrate-to-openbao-pki' (#31) from migrate-to-openbao-pki into master Reviewed-on: #31	2026-02-07 17:33:46 +00:00
Torjus Håkestad	46f03871f1	docs: update CLAUDE.md for PR creation and labmon removal - Add note that gh pr create is not supported - Remove labmon from Prometheus job names list - Remove labmon from flake inputs list Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 18:32:27 +01:00
Torjus Håkestad	9d019f2b9a	testvm01: add nginx with ACME certificate for PKI testing Set up a simple nginx server with an ACME certificate from the new OpenBao PKI infrastructure. This allows testing the ACME migration before deploying to production hosts. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 18:22:28 +01:00
Torjus Håkestad	21db7e9573	acme: migrate from step-ca to OpenBao PKI Switch all ACME certificate issuance from step-ca (ca.home.2rjus.net) to OpenBao PKI (vault.home.2rjus.net:8200/v1/pki_int/acme/directory). - Update default ACME server in system/acme.nix - Update Caddy acme_ca in http-proxy and nix-cache services - Remove labmon service from monitoring01 (step-ca monitoring) - Remove labmon scrape target and certificate_rules alerts - Remove alloy.nix (only used for labmon profiling) - Add docs/plans/cert-monitoring.md for future cert monitoring needs Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 18:20:10 +01:00
Torjus Håkestad	979040aaf7	vault01: enable homelab-deploy listener Enable vault.enable and homelab.deploy.enable on vault01 so it can receive NATS-based remote deployments. Vault fetches secrets from itself using AppRole after auto-unseal. Add systemd ordering to ensure vault-secret services wait for openbao to be unsealed before attempting to fetch secrets. Also adds vault01 AppRole entry to Terraform. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 17:55:09 +01:00
Torjus Håkestad	8791c29402	hosts: enable homelab-deploy listener on pgdb1, nats1, jelly01 Enable vault.enable and homelab.deploy.enable for these hosts to allow NATS-based remote deployments and expose metrics on port 9972. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 17:43:06 +01:00
Torjus Håkestad	c7a067d7b3	flake: update homelab-deploy input Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 17:31:24 +01:00
Torjus Håkestad	c518093578	docs: move prometheus-scrape-target-labels plan to completed Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 17:29:31 +01:00
Torjus Håkestad	0b462f0a96	Merge pull request 'prometheus-scrape-target-labels' (#30) from prometheus-scrape-target-labels into master Reviewed-on: #30	2026-02-07 16:27:38 +00:00
Torjus Håkestad	116abf3bec	CLAUDE.md: document homelab-deploy CLI for prod hosts Add instructions for deploying to prod hosts using the CLI directly, since the MCP server only handles test-tier deployments. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 17:23:10 +01:00
Torjus Håkestad	b794aa89db	skills: update observability with new target labels Document the new hostname and host metadata labels available on all Prometheus scrape targets: - hostname: short hostname for easy filtering - role: host role (dns, build-host, vault) - tier: deployment tier (test for test VMs) - dns_role: primary/secondary for DNS servers Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 17:12:17 +01:00
Torjus Håkestad	50a85daa44	docs: update plan with hostname label documentation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 17:09:46 +01:00
Torjus Håkestad	23e561cf49	monitoring: add hostname label to all scrape targets Add a `hostname` label to all Prometheus scrape targets, making it easy to query all metrics for a host without wildcarding the instance label. Example queries: - {hostname="ns1"} - all metrics from ns1 - node_cpu_seconds_total{hostname="monitoring01"} - specific metric For external targets (like gunter), the hostname is extracted from the target string. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 17:09:19 +01:00
Torjus Håkestad	7d291f85bf	monitoring: propagate host labels to Prometheus scrape targets Extract homelab.host metadata (tier, priority, role, labels) from host configurations and propagate them to Prometheus scrape targets. This enables semantic alert filtering using labels instead of hardcoded instance names. Changes: - lib/monitoring.nix: Extract host metadata, group targets by labels - prometheus.nix: Use structured static_configs with labels - rules.yml: Replace instance filters with role-based filters Example labels in Prometheus: - ns1/ns2: role=dns, dns_role=primary/secondary - nix-cache01: role=build-host - testvm*: tier=test Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 17:04:50 +01:00
Torjus Håkestad	2a842c655a	docs: update plan status and move completed nats-deploy plan - Move nats-deploy-service.md to completed/ folder - Update prometheus-scrape-target-labels.md with implementation status - Add status table showing which steps are complete/partial/not started - Update cross-references to point to new location Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 16:44:00 +01:00
Torjus Håkestad	1f4a5571dc	CLAUDE.md: update documentation from audit - Fix OpenBao CLI name (bao, not vault) - Add vault01, testvm01-03 to hosts list - Document nixos-exporter and homelab-deploy flake inputs - Add vault/ and actions-runner/ services - Document homelab.host and homelab.deploy options - Document automatic Vault credential provisioning via wrapped tokens - Consolidate homelab module options into dedicated section Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 16:37:38 +01:00
Torjus Håkestad	13d6d0ea3a	Merge pull request 'improve-bootstrap-visibility' (#29) from improve-bootstrap-visibility into master Reviewed-on: #29	2026-02-07 15:00:09 +00:00
Torjus Håkestad	eea000b337	CLAUDE.md: document bootstrap logs in Loki Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 15:57:51 +01:00
Torjus Håkestad	f19ba2f4b6	CLAUDE.md: use tofu -chdir instead of cd Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 15:41:59 +01:00
Torjus Håkestad	a90d9c33d5	CLAUDE.md: prefer nix develop -c for devshell commands Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 15:39:56 +01:00
Torjus Håkestad	09c9df1bbe	terraform: regenerate wrapped token for testvm01 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 15:36:25 +01:00
Torjus Håkestad	ae3039af19	template2: send bootstrap status to Loki for remote monitoring Adds log_to_loki function that pushes structured log entries to Loki at key bootstrap stages (starting, network_ok, vault_*, building, success, failed). Enables querying bootstrap state via LogQL without console access. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 15:34:47 +01:00
Torjus Håkestad	11261c4636	template2: revert to journal+console output for bootstrap TTY output was causing nixos-rebuild to fail. Keep the custom greeting line to indicate bootstrap image, but use journal+console for reliable logging. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 15:24:39 +01:00
Torjus Håkestad	4ca3c8890f	terraform: add flake_branch and token for testvm01 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 15:14:57 +01:00
Torjus Håkestad	78e8d7a600	template2: add ncurses for clear command in bootstrap Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-07 15:10:25 +01:00


825 Commits