Commit Graph

27 Commits

Author SHA1 Message Date
787c14c7a6 docs: add dns_role label to scrape target labels plan
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m3s
Add proposed dns_role label to distinguish primary/secondary DNS
resolvers. This addresses the unbound_low_cache_hit_ratio alert
firing on ns2, which has a cold cache due to low traffic.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 01:23:34 +01:00
3dc4422ba0 docs: add NAS integration notes to auth plan
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m4s
Document TrueNAS CORE LDAP integration approach (NFS-only) and
future NixOS NAS migration path with native Kanidm PAM/NSS.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 23:24:37 +01:00
f0963624bc docs: add auth system replacement plan
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Evaluate options for replacing LLDAP+Authelia with a unified auth solution.
Recommends Kanidm for its native NixOS PAM/NSS integration and built-in OIDC.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 23:18:38 +01:00
32968147b5 docs: move zigbee battery plan to completed
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m17s
Run nix flake check / flake-check (pull_request) Successful in 2m19s
Updated plan with:
- Full device inventory from ha1
- Backup verification details
- Branch and commit references

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 22:49:49 +01:00
c515a6b4e1 home-assistant: fix zigbee sensor battery reporting
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
WSDCGQ12LM sensors report battery: 0 due to firmware quirk. Override
battery calculation using voltage via homeassistant value_template.

Also adds zigbee_sensor_stale alert for detecting dead sensors regardless
of battery reporting accuracy (1 hour threshold).

Device configuration moved from external devices.yaml to inline NixOS
config for declarative management.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 22:41:07 +01:00
3cccfc0487 monitoring: implement monitoring gaps coverage
Some checks failed
Run nix flake check / flake-check (push) Failing after 7m36s
Add exporters and scrape targets for services lacking monitoring:
- PostgreSQL: postgres-exporter on pgdb1
- Authelia: native telemetry metrics on auth01
- Unbound: unbound-exporter with remote-control on ns1/ns2
- NATS: HTTP monitoring endpoint on nats1
- OpenBao: telemetry config and Prometheus scrape with token auth
- Systemd: systemd-exporter on all hosts for per-service metrics

Add alert rules for postgres, auth (authelia + lldap), jellyfin,
vault (openbao), plus extend existing nats and unbound rules.

Add Terraform config for Prometheus metrics policy and token. The
token is created via vault_token resource and stored in KV, so no
manual token creation is needed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 21:44:13 +01:00
7d92c55d37 docs: update for sops-to-openbao migration completion
Some checks failed
Run nix flake check / flake-check (push) Failing after 18m17s
Update CLAUDE.md and README.md to reflect that secrets are now managed
by OpenBao, with sops only remaining for ca. Update migration plans
with sops cleanup checklist and auth01 decommission.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 20:06:21 +01:00
6d117d68ca docs: move sops-to-openbao migration plan to completed
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m5s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 19:45:42 +01:00
0700033c0a secrets: migrate all hosts from sops to OpenBao vault
Replace sops-nix secrets with OpenBao vault secrets across all hosts.
Hardcode root password hash, add extractKey option to vault-secrets
module, update Terraform with secrets/policies for all hosts, and
create AppRole provisioning playbook.

Hosts migrated: ha1, monitoring01, ns1, ns2, http-proxy, nix-cache01
Wave 1 hosts (nats1, jelly01, pgdb1) get AppRole policies only.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 18:43:09 +01:00
4d33018285 docs: add ha1 memory recommendation to migration plan
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m28s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 17:48:45 +01:00
678fd3d6de docs: add systemd-exporter findings to monitoring gaps plan
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 10:19:33 +01:00
9d74aa5c04 docs: add zigbee sensor battery monitoring findings
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 09:21:54 +01:00
fe80ec3576 docs: add monitoring gaps audit plan
Some checks failed
Run nix flake check / flake-check (push) Failing after 20m32s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 03:19:20 +01:00
870fb3e532 docs: add plan for remote access to homelab services
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m4s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 02:53:27 +01:00
e602e8d70b docs: add plan for prometheus scrape target labels
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m7s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 02:36:41 +01:00
09d9d71e2b docs: note to establish hostname naming conventions before migration
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 02:04:58 +01:00
cc799f5929 docs: note USB passthrough requirement for ha1 migration
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 02:02:14 +01:00
0abdda8e8a docs: add plan for migrating existing hosts to opentofu
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m28s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 01:59:51 +01:00
0ef63ad874 hosts: remove decommissioned media1, ns3, ns4, nixos-test1
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m47s
Run nix flake check / flake-check (pull_request) Successful in 3m20s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 01:36:57 +01:00
86a077e152 docs: add host cleanup plan for decommissioned hosts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 01:04:50 +01:00
a2a55f3955 docs: add docs directory info and nixos options improvement plan
Some checks failed
Run nix flake check / flake-check (push) Failing after 4m12s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 22:27:11 +01:00
d7d4b0846c docs: move dns-automation plan to completed
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m17s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 22:13:38 +01:00
048536ba70 docs: move dns automation from TODO.md to nixos-improvements.md
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m20s
2026-02-03 04:51:27 +01:00
01d4812280 vault: implement bootstrap integration
Some checks failed
Run nix flake check / flake-check (push) Successful in 2m31s
Run nix flake check / flake-check (pull_request) Failing after 14m16s
2026-02-03 01:10:36 +01:00
7fc69c40a6 docs: add truenas-migration plan
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m18s
Periodic flake update / flake-update (push) Successful in 1m13s
2026-02-02 18:29:11 +01:00
34a2f2ab50 docs: add infrastructure documentation
Some checks failed
Run nix flake check / flake-check (push) Failing after 11m9s
2026-02-02 17:36:55 +01:00
c694b9889a vault: add auto-unseal
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m16s
2026-02-02 00:28:24 +01:00