Commit Graph

694 Commits

Author SHA1 Message Date
2c9d86eaf2 vault-fetch: fix multiline secret values being truncated
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m5s
Run nix flake check / flake-check (push) Failing after 16m11s
The read-based loop split multiline values on newlines, causing only
the first line to be written. Use jq -j to write each key's value
directly to files, preserving multiline content.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 19:36:51 +01:00
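
The failure mode described in the commit above can be sketched as follows. This is an illustration in Python, not the shell and jq code that vault-fetch uses; the secret payload, key names, and both helper functions are hypothetical. It only shows why a line-oriented loop keeps just the first line of a multiline value, while writing each key's value whole preserves it.

```python
import json
from pathlib import Path

# Hypothetical secret payload; the real vault-fetch data is not shown in the log.
SECRET_JSON = json.dumps({"tls_key": "line-1\nline-2\nline-3", "token": "abc123"})


def write_line_based(payload: str, out_dir: Path) -> None:
    """Mimic a read-style loop: flatten to 'key=value' text, then parse per line."""
    data = json.loads(payload)
    flat = "\n".join(f"{key}={value}" for key, value in data.items())
    for line in flat.split("\n"):
        if "=" not in line:
            # Continuation lines of a multiline value carry no key and are lost.
            continue
        key, value = line.split("=", 1)
        (out_dir / key).write_text(value)


def write_per_key(payload: str, out_dir: Path) -> None:
    """Write each key's raw value to its own file, newlines included."""
    for key, value in json.loads(payload).items():
        (out_dir / key).write_text(value)
```

With the sample payload, `write_line_based` leaves only `line-1` in the `tls_key` file, whereas `write_per_key` writes all three lines.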
ccb1c3fe2e terraform: auto-generate backup password instead of manual
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m19s
Remove backup_helper_secret variable and switch shared/backup/password
to auto_generate. New password will be added alongside existing restic
repository key.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 18:58:39 +01:00
0700033c0a secrets: migrate all hosts from sops to OpenBao vault
Replace sops-nix secrets with OpenBao vault secrets across all hosts.
Hardcode root password hash, add extractKey option to vault-secrets
module, update Terraform with secrets/policies for all hosts, and
create AppRole provisioning playbook.

Hosts migrated: ha1, monitoring01, ns1, ns2, http-proxy, nix-cache01
Wave 1 hosts (nats1, jelly01, pgdb1) get AppRole policies only.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 18:43:09 +01:00
4d33018285 docs: add ha1 memory recommendation to migration plan
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m28s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 17:48:45 +01:00
678fd3d6de docs: add systemd-exporter findings to monitoring gaps plan
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 10:19:33 +01:00
9d74aa5c04 docs: add zigbee sensor battery monitoring findings
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 09:21:54 +01:00
fe80ec3576 docs: add monitoring gaps audit plan
Some checks failed
Run nix flake check / flake-check (push) Failing after 20m32s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 03:19:20 +01:00
870fb3e532 docs: add plan for remote access to homelab services
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m4s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 02:53:27 +01:00
e602e8d70b docs: add plan for prometheus scrape target labels
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m7s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 02:36:41 +01:00
28b8d7c115 monitoring: increase high_cpu_load duration for nix-cache01 to 2h
nix-cache01 regularly hits high CPU during nix builds, causing flappy
alerts. Keep the 15m threshold for all other hosts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 02:28:48 +01:00
64f2688349 nix: configure gc to delete generations older than 14d
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m27s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 02:21:19 +01:00
09d9d71e2b docs: note to establish hostname naming conventions before migration
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 02:04:58 +01:00
cc799f5929 docs: note USB passthrough requirement for ha1 migration
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 02:02:14 +01:00
0abdda8e8a docs: add plan for migrating existing hosts to opentofu
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m28s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 01:59:51 +01:00
4076361bf7 Merge pull request 'hosts: remove decommissioned media1, ns3, ns4, nixos-test1' (#18) from host-cleanup into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 3m36s
Reviewed-on: #18
2026-02-05 00:38:56 +00:00
0ef63ad874 hosts: remove decommissioned media1, ns3, ns4, nixos-test1
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m47s
Run nix flake check / flake-check (pull_request) Successful in 3m20s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 01:36:57 +01:00
8f29141dd1 Merge pull request 'monitoring: exclude step-ca serving cert from general expiry alert' (#17) from monitoring-cleanup into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 5m30s
Reviewed-on: #17
2026-02-05 00:22:15 +00:00
3a9a47f1ad monitoring: exclude step-ca serving cert from general expiry alert
Some checks failed
Run nix flake check / flake-check (push) Failing after 6m23s
Run nix flake check / flake-check (pull_request) Failing after 4m46s
The step-ca serving certificate is auto-renewed with a 24h lifetime,
so it always triggers the general < 86400s threshold. Exclude it and
add a dedicated step_ca_serving_cert_expiring alert at < 1h instead.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 01:12:42 +01:00
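
The threshold reasoning in the commit above can be checked with a small sketch. It is written in Python rather than as a Prometheus rule, and the general alert name `certificate_expiring`, the certificate label `step-ca`, and the function itself are assumptions made for illustration. Only the 86400s and 1h thresholds and the `step_ca_serving_cert_expiring` name come from the commit message.

```python
DAY_SECONDS = 86_400   # general expiry threshold from the commit message
HOUR_SECONDS = 3_600   # dedicated step-ca threshold from the commit message


def firing_alerts(cert_name, seconds_left, excluded=frozenset({"step-ca"})):
    """Return the names of the alerts that would fire for one certificate.

    `excluded` holds certificates that skip the general alert and use the
    dedicated short-lifetime alert instead (hypothetical label values).
    """
    alerts = []
    if cert_name in excluded:
        if seconds_left < HOUR_SECONDS:
            alerts.append("step_ca_serving_cert_expiring")
    elif seconds_left < DAY_SECONDS:
        alerts.append("certificate_expiring")  # hypothetical general alert name
    return alerts
```

A certificate issued with a 24h lifetime always has less than 86400 seconds left, so without the exclusion the general alert fires for its entire life; with the exclusion it stays quiet until less than one hour remains.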
fa6380e767 monitoring: fix nix-cache_caddy scrape target TLS error
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m43s
Move nix-cache_caddy back to a manual config in prometheus.nix using the
service CNAME (nix-cache.home.2rjus.net) instead of the hostname. The
auto-generated target used nix-cache01.home.2rjus.net which doesn't
match the TLS certificate SAN.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 01:04:50 +01:00
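
The mismatch behind the commit above can be illustrated with a simplified name check. This is a sketch in Python, not how Prometheus or Caddy validate certificates, and it assumes the certificate lists only the service CNAME as its SAN, which the log implies but does not state. The function `matches_san` is hypothetical.

```python
def matches_san(target, sans):
    """Return True if `target` matches one of the certificate's DNS SANs.

    Simplified rules: case-insensitive exact match, or a wildcard SAN
    ("*.example.org") covering exactly one left-most label.
    """
    target = target.lower().rstrip(".")
    for san in sans:
        san = san.lower().rstrip(".")
        if san == target:
            return True
        if san.startswith("*."):
            _, _, parent = target.partition(".")
            if parent and parent == san[2:]:
                return True
    return False


# Assumed SAN list for the nix-cache certificate (not shown in the log).
CERT_SANS = ["nix-cache.home.2rjus.net"]
```

Under that assumption, scraping `nix-cache.home.2rjus.net` passes the check while the auto-generated `nix-cache01.home.2rjus.net` target does not, which is why the target was moved back to a manual configuration.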
86a077e152 docs: add host cleanup plan for decommissioned hosts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 01:04:50 +01:00
9da57c6a2f flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs-unstable':
    'github:nixos/nixpkgs/e6eae2ee2110f3d31110d5c222cd395303343b08?narHash=sha256-KHFT9UWOF2yRPlAnSXQJh6uVcgNcWlFqqiAZ7OVlHNc%3D' (2026-02-03)
  → 'github:nixos/nixpkgs/bf922a59c5c9998a6584645f7d0de689512e444c?narHash=sha256-ksTL7P9QC1WfZasNlaAdLOzqD8x5EPyods69YBqxSfk%3D' (2026-02-04)
2026-02-05 00:01:37 +00:00
da9dd02d10 Merge pull request 'monitoring: auto-generate Prometheus scrape targets from host configs' (#16) from monitoring-improvements into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 6m32s
Periodic flake update / flake-update (push) Successful in 1m54s
Reviewed-on: #16
2026-02-04 23:53:46 +00:00
e7980978c7 docs: document monitoring auto-generation in CLAUDE.md
Some checks failed
Run nix flake check / flake-check (push) Failing after 5m33s
Run nix flake check / flake-check (pull_request) Successful in 6m48s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:52:39 +01:00
dd1b64de27 monitoring: auto-generate Prometheus scrape targets from host configs
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m49s
Run nix flake check / flake-check (push) Has been cancelled
Add homelab.monitoring NixOS options (enable, scrapeTargets) following
the same pattern as homelab.dns. Prometheus scrape configs are now
auto-generated from flake host configurations and external targets,
replacing hardcoded target lists.

Also cleans up alert rules: snake_case naming, fix zigbee2mqtt typo,
remove duplicate pushgateway alert, add for clauses to monitoring_rules,
remove hardcoded WireGuard public key, and add new alerts for
certificates, proxmox, caddy, smartctl temperature, filesystem
prediction, systemd state, file descriptors, and host reboots.

Fixes grafana scrape target port from 3100 to 3000.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:49:07 +01:00
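
The generation step described in the commit above might look roughly like the sketch below. It is written in Python rather than Nix, and the host data, the `job_name` and `port` field names, and the placement of Grafana on monitoring01 are assumptions; only the `enable` and `scrapeTargets` option names, the Grafana port 3000, and the domain appear in the log. The output follows the standard Prometheus `scrape_configs` layout.

```python
from collections import defaultdict

DOMAIN = "home.2rjus.net"

# Hypothetical per-host monitoring options, standing in for flake host configs.
HOSTS = {
    "monitoring01": {"enable": True,
                     "scrapeTargets": [{"job_name": "grafana", "port": 3000}]},
    "ns1": {"enable": True,
            "scrapeTargets": [{"job_name": "dns", "port": 9153}]},
    "ns2": {"enable": False,
            "scrapeTargets": [{"job_name": "dns", "port": 9153}]},
}


def generate_scrape_configs(hosts, domain=DOMAIN):
    """Group every enabled host's targets by job into Prometheus scrape configs."""
    jobs = defaultdict(list)
    for name in sorted(hosts):
        options = hosts[name]
        if not options.get("enable", True):
            continue  # hosts that opt out contribute no targets
        for target in options.get("scrapeTargets", []):
            jobs[target["job_name"]].append(f"{name}.{domain}:{target['port']}")
    return [
        {"job_name": job, "static_configs": [{"targets": targets}]}
        for job, targets in sorted(jobs.items())
    ]
```

Deriving the list from host options means a host that is added or removed changes the scrape configuration without a separate edit to a hardcoded target list.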
4e8cc124f2 docs: add plan management workflow and lab-monitoring MCP server
Some checks failed
Run nix flake check / flake-check (push) Failing after 11m30s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:21:08 +01:00
a2a55f3955 docs: add docs directory info and nixos options improvement plan
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m12s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 22:27:11 +01:00
c38034ba41 docs: rewrite README with current infrastructure overview
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m41s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 22:20:49 +01:00
d7d4b0846c docs: move dns-automation plan to completed
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m17s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 22:13:38 +01:00
8ca7c4e402 Merge pull request 'dns-automation' (#15) from dns-automation into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m18s
Reviewed-on: #15
2026-02-04 21:02:24 +00:00
106912499b docs: add git workflow note about not committing to master
Some checks failed
Run nix flake check / flake-check (pull_request) Successful in 2m16s
Run nix flake check / flake-check (push) Failing after 17m2s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 21:57:40 +01:00
83af00458b dns: remove defunct external hosts
Remove hosts that no longer respond to ping:
- kube-blue1-10 (entire k8s cluster)
- virt-mini1, mpnzb, inc2, testing
- CNAMEs: rook, git (pointed to removed kube-blue nodes)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 21:50:56 +01:00
67d5de3eb8 docs: update CLAUDE.md for DNS automation
- Add /modules/ and /lib/ to directory structure
- Document homelab.dns options and zone auto-generation
- Update "Adding a New Host" workflow (no manual zone editing)
- Expand DNS Architecture section with auto-generation details

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 21:45:16 +01:00
cee1b264cd dns: auto-generate zone entries from host configurations
Replace static zone file with dynamically generated records:
- Add homelab.dns module with enable/cnames options
- Extract IPs from systemd.network configs (filters VPN interfaces)
- Use git commit timestamp as zone serial number
- Move external hosts to separate external-hosts.nix

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 21:43:44 +01:00
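
The approach listed in the commit above can be sketched as follows. This is an illustration in Python rather than the Nix module itself; the host data, the documentation-range IP addresses, the interface names, and the VPN prefixes used for filtering are all hypothetical. Only the `enable` and `cnames` options, the VPN filtering, and the use of the commit timestamp as serial come from the commit message.

```python
# Hypothetical host data standing in for systemd.network configs.
HOSTS = {
    "ns1": {"enable": True, "cnames": [],
            "interfaces": {"ens18": "192.0.2.10", "wg0": "198.51.100.1"}},
    "nix-cache01": {"enable": True, "cnames": ["nix-cache"],
                    "interfaces": {"ens18": "192.0.2.20"}},
}

VPN_PREFIXES = ("wg", "tun")  # assumed naming convention for VPN interfaces


def zone_records(hosts):
    """Build (name, type, value) records, skipping disabled hosts and VPN addresses."""
    records = []
    for name in sorted(hosts):
        options = hosts[name]
        if not options["enable"]:
            continue
        for iface, address in sorted(options["interfaces"].items()):
            if iface.startswith(VPN_PREFIXES):
                continue  # VPN addresses should not be published in the zone
            records.append((name, "A", address))
        for alias in options["cnames"]:
            records.append((alias, "CNAME", name))
    return records


def zone_serial(commit_timestamp):
    """Use the commit's Unix timestamp as SOA serial (must fit 32 bits unsigned)."""
    if not 0 <= commit_timestamp < 2**32:
        raise ValueError("serial does not fit in an unsigned 32-bit field")
    return commit_timestamp
```

Because a later commit has a later timestamp, the serial increases whenever the zone content changes through a new commit, with no manual bump.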
4ceee04308 docs: update MCP config for nixpkgs-options and add nixpkgs-packages
Some checks failed
Run nix flake check / flake-check (push) Failing after 14m50s
Rename nixos-options to nixpkgs-options and add new nixpkgs-packages
server for package search functionality. Update CLAUDE.md to document
both MCP servers and their available tools.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 20:50:36 +01:00
e3ced5bcda flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/41e216c0ca66c83b12ab7a98cc326b5db01db646?narHash=sha256-I7Lmgj3owOTBGuauy9FL6qdpeK2umDoe07lM4V%2BPnyA%3D' (2026-01-31)
  → 'github:nixos/nixpkgs/e576e3c9cf9bad747afcddd9e34f51d18c855b4e?narHash=sha256-tlFqNG/uzz2%2B%2BaAmn4v8J0vAkV3z7XngeIIB3rM3650%3D' (2026-02-03)
• Updated input 'nixpkgs-unstable':
    'github:nixos/nixpkgs/cb369ef2efd432b3cdf8622b0ffc0a97a02f3137?narHash=sha256-VKS4ZLNx4PNrABoB0L8KUpc1fE7CLpQXQs985tGfaCU%3D' (2026-02-02)
  → 'github:nixos/nixpkgs/e6eae2ee2110f3d31110d5c222cd395303343b08?narHash=sha256-KHFT9UWOF2yRPlAnSXQJh6uVcgNcWlFqqiAZ7OVlHNc%3D' (2026-02-03)
• Updated input 'sops-nix':
    'github:Mic92/sops-nix/1e89149dcfc229e7e2ae24a8030f124a31e4f24f?narHash=sha256-twBMKGQvaztZQxFxbZnkg7y/50BW9yjtCBWwdjtOZew%3D' (2026-02-01)
  → 'github:Mic92/sops-nix/17eea6f3816ba6568b8c81db8a4e6ca438b30b7c?narHash=sha256-ktjWTq%2BD5MTXQcL9N6cDZXUf9kX8JBLLBLT0ZyOTSYY%3D' (2026-02-03)
2026-02-04 00:01:04 +00:00
15459870cd Merge pull request 'backup: migrate to native services.restic.backups' (#14) from migrate-to-native-restic-backups into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 4m4s
Periodic flake update / flake-update (push) Successful in 1m10s
Reviewed-on: #14
2026-02-03 23:47:11 +00:00
d1861eefb5 docs: add clipboard note and update flake inputs
Some checks failed
Run nix flake check / flake-check (push) Successful in 4m10s
Run nix flake check / flake-check (pull_request) Failing after 18m29s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 00:45:37 +01:00
d25fc99e1d backup: migrate to native services.restic.backups
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Run nix flake check / flake-check (pull_request) Successful in 4m0s
Replace custom backup-helper flake input with NixOS native
services.restic.backups module for ha1, monitoring01, and nixos-test1.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 00:41:40 +01:00
b5da9431aa docs: add nixos-options MCP configuration
Some checks failed
Run nix flake check / flake-check (push) Failing after 13m51s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 00:01:00 +01:00
0e5dea635e Merge pull request 'create-host: add delete feature' (#13) from create-host-delete-feature into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m20s
Reviewed-on: #13
2026-02-03 12:06:32 +00:00
86249c466b create-host: add delete feature
Some checks failed
Run nix flake check / flake-check (push) Failing after 21m31s
Run nix flake check / flake-check (pull_request) Failing after 15m17s
2026-02-03 12:11:41 +01:00
5d560267cf Merge pull request 'pki-migration' (#12) from pki-migration into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m21s
Reviewed-on: #12
2026-02-03 05:56:53 +00:00
63662b89e0 docs: update TODO.md
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m51s
Run nix flake check / flake-check (pull_request) Successful in 2m53s
2026-02-03 06:53:59 +01:00
7ae474fd3e pki: add new vault root ca to pki
2026-02-03 06:53:59 +01:00
f0525b5c74 ns: add vaulttest01 to zone
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m19s
2026-02-03 06:42:05 +01:00
42c391b355 ns: add vault cname to zone
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m7s
2026-02-03 06:00:59 +01:00
048536ba70 docs: move dns automation from TODO.md to nixos-improvements.md
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m20s
2026-02-03 04:51:27 +01:00
cccce09406 Merge pull request 'vault: implement bootstrap integration' (#11) from vault-bootstrap-integration into master
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Reviewed-on: #11
2026-02-03 03:46:25 +00:00
01d4812280 vault: implement bootstrap integration
Some checks failed
Run nix flake check / flake-check (push) Successful in 2m31s
Run nix flake check / flake-check (pull_request) Failing after 14m16s
2026-02-03 01:10:36 +01:00
b5364d2ccc flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs-unstable':
    'github:nixos/nixpkgs/62c8382960464ceb98ea593cb8321a2cf8f9e3e5?narHash=sha256-kKB3bqYJU5nzYeIROI82Ef9VtTbu4uA3YydSk/Bioa8%3D' (2026-01-30)
  → 'github:nixos/nixpkgs/cb369ef2efd432b3cdf8622b0ffc0a97a02f3137?narHash=sha256-VKS4ZLNx4PNrABoB0L8KUpc1fE7CLpQXQs985tGfaCU%3D' (2026-02-02)
2026-02-03 00:01:39 +00:00