- Mark PAM/NSS client module as complete
- Mark documentation as complete
- Update provisioning approach (declarative groups, imperative users)
- Add details on client module and verified functionality
- Update next steps
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Keep base groups (admins, users, ssh-users) provisioned declaratively
but manage regular users via the kanidm CLI. This allows setting POSIX
attributes and passwords in a single workflow.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace declarative NixOS provisioning examples with full CLI workflows.
POSIX users and groups are now managed entirely via kanidm CLI, which
allows setting all attributes (including UNIX passwords) in one step.
Declarative provisioning may still be used for OIDC clients later.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Document UUID-based home directories with symlinks
- List currently enabled hosts (testvm01-03)
- Add cache-invalidate command to troubleshooting
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use home_alias instead of home_attr - this creates a symlink from
/home/torjus to the actual home directory, providing a convenient
short path without breaking the underlying storage.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add troubleshooting tips discovered during testing:
- kanidm-unix status command for checking connectivity
- nscd restart required after config changes
- Direct PAM auth test with kanidm-unix auth-test
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Configure uid_attr_map and gid_attr_map to "name" to return short
usernames (torjus) instead of SPN format (torjus@home.2rjus.net).
This fixes "PAM user mismatch" errors with SSH authentication.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Kanidm 1.8 requires:
- version = "2" at top level
- pam_allowed_login_groups inside [kanidm] section
The NixOS module also requires pam_allowed_login_groups at top level,
so we provide it at both places.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Configure uid_attr_map and gid_attr_map to use short names instead of
SPN format. This fixes SSH failing with "PAM user mismatch" because
getent returned "torjus@home.2rjus.net" instead of "torjus".
Also add user-management documentation.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add homelab.kanidm.enable option for central authentication via Kanidm.
The module configures:
- PAM/NSS integration with kanidm-unixd
- Client connection to auth.home.2rjus.net
- Login authorization for ssh-users group
Enable on testvm01-03 for testing.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Provides compressed swap in RAM to prevent OOM kills during
nixos-rebuild on low-memory VMs (2GB). Removes duplicate zram
configs from jelly01 and nix-cache01.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Based on security review findings, covering SSH hardening, firewall
enablement, log transport TLS, security alerting, and secrets management.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Changed section 4 from "if needed" to always spawn auditor
- Added explicit "Do NOT query audit logs yourself" guidance
- Listed specific scenarios requiring auditor (service stopped, etc.)
- Added manual intervention as first common cause
- Updated guidelines to emphasize mandatory delegation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add new auditor agent for security-focused audit log analysis:
- SSH session tracking, command execution, sudo usage
- Suspicious activity detection patterns
- Can be used standalone or as sub-agent by investigate-alarm
Update investigate-alarm to delegate audit analysis to auditor
and add git-explorer MCP for configuration drift detection.
Add git-explorer to .mcp.json for repository inspection.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Kanidm does not expose a Prometheus /metrics endpoint.
The scrape target was causing 404 errors after the TLS
certificate issue was fixed.
Also add SSH command restriction to CLAUDE.md.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Include both auth.home.2rjus.net (CNAME) and kanidm01.home.2rjus.net
(A record) as SANs in the TLS certificate. This fixes Prometheus
scraping which connects via the hostname, not the CNAME.
Fixes: x509: certificate is valid for auth.home.2rjus.net, not kanidm01.home.2rjus.net
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add best practices for querying Loki to avoid overwhelming responses:
- Start with narrow filters and small limits
- Filter audit logs to EXECVE only
- Exclude verbose noise (PATH, PROCTITLE, SYSCALL, BPF)
- Expand queries incrementally if needed
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable Linux audit to log execve syscalls from interactive SSH sessions.
Uses auid filter to exclude system services and nix builds.
Logs forwarded to journald for Loki ingestion. Query with:
{host="testvmXX"} |= "EXECVE"
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sub-agent for investigating system alarms using Prometheus metrics
and Loki logs. Provides root cause analysis with timeline of events.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Move detailed Prometheus/Loki reference from CLAUDE.md to the
observability skill
- Add complete list of Prometheus jobs organized by category
- Add bootstrap log documentation with stages table
- Add kanidm01 to host labels table
- CLAUDE.md now references the skill instead of duplicating info
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Document the end-to-end host creation workflow including:
- Prerequisites and step-by-step process
- Tier specification (test vs prod)
- Bootstrap observability via Loki
- Verification steps
- Troubleshooting guide
- Related files reference
Update CLAUDE.md to reference the new document.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Plan for migrating from Prometheus to VictoriaMetrics on new monitoring02
host with parallel operation, declarative Grafana dashboards, and CNAME-based
cutover.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Mark completed implementation steps
- Document deployed kanidm01 configuration
- Record UID/GID range decision (65,536-69,999)
- Add verified working items (WebUI, LDAP, certs)
- Update next steps and resolved questions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Set owner/group to kanidm so the post-start provisioning
script can read the idm_admin password.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- New test-tier VM at 10.69.13.23 with role=auth
- Kanidm 1.8 server with HTTPS (443) and LDAPS (636)
- ACME certificate from internal CA (auth.home.2rjus.net)
- Provisioned groups: admins, users, ssh-users
- Provisioned user: torjus
- Daily backups at 22:00 (7 versions)
- Prometheus monitoring scrape target
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New VMs bootstrapped from template2 will now use the local nix cache
during initial nixos-rebuild, speeding up bootstrap times.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ns1 needs access to shared/dns/* for zone transfer key and
shared/homelab-deploy/* for the NATS listener.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Old VM had incorrect hardware-configuration.nix with hardcoded UUIDs
that didn't match actual disk layout, causing boot failure (emergency mode).
Recreated using template2-based configuration for OpenTofu provisioning.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove pgdb1 host configuration and postgres service module.
The only consumer (Open WebUI on gunter) has migrated to local PostgreSQL.
Removed:
- hosts/pgdb1/ - host configuration
- services/postgres/ - service module (only used by pgdb1)
- postgres_rules from monitoring rules
- rebuild-all.sh (obsolete script)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Only consumer was Open WebUI on gunter, which will migrate to local
PostgreSQL. Removed pgdb1 backup/migration phases and added to
decommission list.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- ns2 migrated to OpenTofu
- testvm02, testvm03 added to managed hosts
- Remove vaulttest01 (no longer exists)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Configure Unbound to query both ns1 and ns2 for the home.2rjus.net
zone, in addition to local NSD. This provides redundancy during
bootstrap or if local NSD is temporarily unavailable.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>