Compare commits: 11cbb64097...kanidm-pam

9 commits:

- 9ed09c9a9c
- b31c64f1b9
- 54b6e37420
- b845a8bb8b
- bfbf0cea68
- 3abe5e83a7
- 67c27555f3
- 1674b6a844
- 311be282b6
@@ -66,9 +66,9 @@ This future migration path is a strong argument for Kanidm over LDAP-only solutions
   - Vault integration for idm_admin password
   - LDAPS on port 636

2. **Configure declarative provisioning** ✅
   - Groups: `admins`, `users`, `ssh-users`
   - User: `torjus` (member of all groups)
2. **Configure provisioning** ✅
   - Groups provisioned declaratively: `admins`, `users`, `ssh-users`
   - Users managed imperatively via CLI (allows setting POSIX passwords in one step)
   - POSIX attributes enabled (UID/GID range 65,536-69,999)

3. **Test NAS integration** (in progress)
@@ -80,14 +80,16 @@ This future migration path is a strong argument for Kanidm over LDAP-only solutions
   - Grafana
   - Other services as needed

5. **Create client module** in `system/` for PAM/NSS
   - Enable on all hosts that need central auth
   - Configure trusted CA
5. **Create client module** in `system/` for PAM/NSS ✅
   - Module: `system/kanidm-client.nix`
   - `homelab.kanidm.enable = true` enables PAM/NSS
   - Short usernames (not SPN format)
   - Home directory symlinks via `home_alias`
   - Enabled on test tier: testvm01, testvm02, testvm03

6. **Documentation**
   - User management procedures
   - Adding new OAuth2 clients
   - Troubleshooting PAM/NSS issues
6. **Documentation** ✅
   - `docs/user-management.md` - CLI workflows, troubleshooting
   - User/group creation procedures verified working

## Progress

@@ -106,14 +108,37 @@ This future migration path is a strong argument for Kanidm over LDAP-only solutions
- Prometheus monitoring scrape target configured

**Provisioned entities:**
- Groups: `admins`, `users`, `ssh-users`
- User: `torjus` (member of all groups, POSIX enabled with GID 65536)
- Groups: `admins`, `users`, `ssh-users` (declarative)
- Users managed via CLI (imperative)

**Verified working:**
- WebUI login with idm_admin
- LDAP bind and search with POSIX-enabled user
- LDAPS with valid internal CA certificate

### Completed (2026-02-08) - PAM/NSS Client

**Client module deployed (`system/kanidm-client.nix`):**
- `homelab.kanidm.enable = true` enables PAM/NSS integration
- Connects to auth.home.2rjus.net
- Short usernames (`torjus` instead of `torjus@home.2rjus.net`)
- Home directory symlinks (`/home/torjus` → UUID-based dir)
- Login restricted to `ssh-users` group

**Enabled on test tier:**
- testvm01, testvm02, testvm03

**Verified working:**
- User/group resolution via `getent`
- SSH login with Kanidm unix passwords
- Home directory creation with symlinks
- Imperative user/group creation via CLI

**Documentation:**
- `docs/user-management.md` with full CLI workflows
- Password requirements (min 10 chars)
- Troubleshooting guide (nscd, cache invalidation)

### UID/GID Range (Resolved)

**Range: 65,536 - 69,999** (manually allocated)
@@ -128,10 +153,9 @@ Rationale:

### Next Steps

1. Deploy to monitoring01 to enable Prometheus scraping
1. Enable PAM/NSS on production hosts (after test tier validation)
2. Configure TrueNAS LDAP client for NAS integration testing
3. Add OAuth2 clients (Grafana first)
4. Create PAM/NSS client module for other hosts

## References

docs/plans/memory-issues-follow-up.md (new file, 116 lines)
@@ -0,0 +1,116 @@
# Memory Issues Follow-up

Tracking the zram change to verify it resolves OOM issues during nixos-upgrade on low-memory hosts.

## Background

On 2026-02-08, ns2 (2GB RAM) experienced an OOM kill during nixos-upgrade. The Nix evaluation process consumed ~1.6GB before being killed by the kernel. ns1 (manually increased to 4GB) succeeded with the same upgrade.

Root cause: 2GB RAM is insufficient for Nix flake evaluation without swap.

## Fix Applied

**Commit:** `1674b6a` - system: enable zram swap for all hosts

**Merged:** 2026-02-08 ~12:15 UTC

**Change:** Added `zramSwap.enable = true` to `system/zram.nix`, providing ~2GB compressed swap on all hosts.
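The module as merged only sets `zramSwap.enable`. If ~2GB still proves too small, NixOS exposes tuning knobs on the same option set; a hedged sketch (the values here are illustrative, not what is deployed):

```nix
# Sketch only - the deployed system/zram.nix sets just `enable = true`.
{ ... }:
{
  zramSwap = {
    enable = true;
    # Size the zram device as a percentage of RAM (NixOS default: 50).
    memoryPercent = 100;
    # Compression algorithm (NixOS default: zstd).
    algorithm = "zstd";
  };
}
```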
## Timeline

| Time (UTC) | Event |
|------------|-------|
| 05:00:46 | ns2 nixos-upgrade OOM killed |
| 05:01:47 | `nixos_upgrade_failed` alert fired |
| 12:15 | zram commit merged to master |
| 12:19 | ns2 rebooted with zram enabled |
| 12:20 | ns1 rebooted (memory reduced to 2GB via tofu) |

## Hosts Affected

All 2GB VMs that run nixos-upgrade:
- ns1, ns2 (DNS)
- vault01
- testvm01, testvm02, testvm03
- kanidm01

## Metrics to Monitor

Check these in Grafana or via PromQL to verify the fix:

### Swap availability (should be ~2GB after upgrade)
```promql
node_memory_SwapTotal_bytes / 1024 / 1024
```

### Swap usage during upgrades
```promql
(node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes) / 1024 / 1024
```

### Zswap compressed bytes
```promql
node_memory_Zswap_bytes / 1024 / 1024
```

Note: this metric tracks the separate zswap subsystem, not zram. zram usage shows up in the regular swap metrics above, so this is expected to stay at 0 unless zswap is also enabled.

### Upgrade failures (should be 0)
```promql
node_systemd_unit_state{name="nixos-upgrade.service", state="failed"}
```

### Memory available during upgrades
```promql
node_memory_MemAvailable_bytes / 1024 / 1024
```

## Verification Steps

After a few days (allow auto-upgrades to run on all hosts):

1. Check all hosts have swap enabled:
```promql
node_memory_SwapTotal_bytes > 0
```

2. Check for any upgrade failures since the fix:
```promql
count_over_time(ALERTS{alertname="nixos_upgrade_failed"}[7d])
```

3. Review if any hosts used swap during upgrades (check historical graphs)
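For a spot check on a single host, `swapon --show` (or `zramctl`) reports the active device directly. A minimal sketch parsing sample `swapon` output; the sample line is illustrative of what a host should print once zram is active:

```bash
# Sample of what `swapon --show` should print with zram active.
sample='NAME       TYPE      SIZE USED PRIO
/dev/zram0 partition 1.9G   0B  100'

# Extract the size column for zram devices (skip the header row).
echo "$sample" | awk 'NR > 1 && $1 ~ /zram/ {print "zram swap:", $3}'
# -> zram swap: 1.9G
```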
## Success Criteria

- No `nixos_upgrade_failed` alerts due to OOM after 2026-02-08
- All hosts show ~2GB swap available
- Upgrades complete successfully on 2GB VMs

## Fallback Options

If zram is insufficient:

1. **Increase VM memory** - Update `terraform/vms.tf` to 4GB for affected hosts
2. **Enable memory ballooning** - Configure VMs with dynamic memory allocation (see below)
3. **Use remote builds** - Configure `nix.buildMachines` to offload evaluation
4. **Reduce flake size** - Split configurations to reduce evaluation memory

### Memory Ballooning

Proxmox supports memory ballooning, which allows VMs to dynamically grow or shrink their memory allocation based on demand. The balloon driver inside the guest communicates with the hypervisor to release or reclaim memory pages.

Configuration in `terraform/vms.tf`:
```hcl
memory  = 4096 # maximum memory
balloon = 2048 # minimum memory (shrinks to this when idle)
```

Pros:
- VMs get memory on-demand without reboots
- Better host memory utilization
- Solves upgrade OOM without permanently allocating 4GB

Cons:
- Requires QEMU guest agent running in guest
- Guest can experience memory pressure if host is overcommitted

Ballooning and zram are complementary: ballooning provides headroom from the host, zram provides overflow within the guest.
docs/plans/security-hardening.md (new file, 224 lines)
@@ -0,0 +1,224 @@
# Security Hardening Plan

## Overview

Address security gaps identified in infrastructure review. Focus areas: SSH hardening, network security, logging improvements, and secrets management.

## Current State

- SSH allows password auth and unrestricted root login (`system/sshd.nix`)
- Firewall disabled on all hosts (`networking.firewall.enable = false`)
- Promtail ships logs over HTTP to Loki
- Loki has no authentication (`auth_enabled = false`)
- AppRole secret-IDs never expire (`secret_id_ttl = 0`)
- Vault TLS verification disabled by default (`skipTlsVerify = true`)
- Audit logging exists (`common/ssh-audit.nix`) but not applied globally
- Alert rules focus on availability, no security event detection

## Priority Matrix

| Issue | Severity | Effort | Priority |
|-------|----------|--------|----------|
| SSH password auth | High | Low | **P1** |
| Firewall disabled | High | Medium | **P1** |
| Promtail HTTP (no TLS) | High | Medium | **P2** |
| No security alerting | Medium | Low | **P2** |
| Audit logging not global | Low | Low | **P2** |
| Loki no auth | Medium | Medium | **P3** |
| Secret-ID TTL | Medium | Medium | **P3** |
| Vault skipTlsVerify | Medium | Low | **P3** |

## Phase 1: Quick Wins (P1)

### 1.1 SSH Hardening

Edit `system/sshd.nix`:

```nix
services.openssh = {
  enable = true;
  settings = {
    PermitRootLogin = "prohibit-password"; # Key-only root login
    PasswordAuthentication = false;
    KbdInteractiveAuthentication = false;
  };
};
```

**Prerequisite:** Verify all hosts have SSH keys deployed for root.
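Before rolling this out, it is worth confirming what sshd actually ends up with; `sshd -T` dumps the effective configuration in normalized lowercase form. A sketch checking the critical setting, where the sample output is what the hardened config should produce:

```bash
# Sample of `sshd -T` output after hardening (normalized, lowercase).
effective='permitrootlogin prohibit-password
passwordauthentication no
kbdinteractiveauthentication no'

# Fail loudly if password auth is still on.
if echo "$effective" | grep -qx 'passwordauthentication no'; then
  echo "password auth disabled"
else
  echo "WARNING: password auth still enabled" >&2
fi
# -> password auth disabled
```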
### 1.2 Enable Firewall

Create `system/firewall.nix` with default deny policy:

```nix
{ ... }: {
  networking.firewall.enable = true;

  # Use openssh's built-in firewall integration
  services.openssh.openFirewall = true;
}
```

**Useful firewall options:**

| Option | Description |
|--------|-------------|
| `networking.firewall.trustedInterfaces` | Accept all traffic from these interfaces (e.g., `[ "lo" ]`) |
| `networking.firewall.interfaces.<name>.allowedTCPPorts` | Per-interface port rules |
| `networking.firewall.extraInputRules` | Custom nftables rules (for complex filtering) |

**Network range restrictions:** Consider restricting SSH to the infrastructure subnet (`10.69.13.0/24`) using `extraInputRules` for defense in depth. However, this adds complexity and may not be necessary given the trusted network model.

#### Per-Interface Rules (http-proxy WireGuard)

The `http-proxy` host has a WireGuard interface (`wg0`) that may need different rules than the LAN interface. Use `networking.firewall.interfaces` to apply per-interface policies:

```nix
# Example: http-proxy with different rules per interface
networking.firewall = {
  enable = true;

  # Default: only SSH (via openFirewall)
  allowedTCPPorts = [ ];

  # LAN interface: allow HTTP/HTTPS
  interfaces.ens18 = {
    allowedTCPPorts = [ 80 443 ];
  };

  # WireGuard interface: restrict to specific services or trust fully
  interfaces.wg0 = {
    allowedTCPPorts = [ 80 443 ];
    # Or use trustedInterfaces = [ "wg0" ] if fully trusted
  };
};
```

**TODO:** Investigate current WireGuard usage on http-proxy to determine appropriate rules.

Then per-host, open required ports:

| Host | Additional Ports |
|------|------------------|
| ns1/ns2 | 53 (TCP/UDP) |
| vault01 | 8200 |
| monitoring01 | 3100, 9090, 3000, 9093 |
| http-proxy | 80, 443 |
| nats1 | 4222 |
| ha1 | 1883, 8123 |
| jelly01 | 8096 |
| nix-cache01 | 5000 |
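The table above maps onto small per-host additions. A sketch for ns1/ns2, which need DNS on both TCP and UDP (the other hosts follow the same pattern with their ports from the table):

```nix
# Sketch for ns1/ns2 - DNS serves both TCP and UDP on port 53.
{ ... }:
{
  networking.firewall = {
    allowedTCPPorts = [ 53 ];
    allowedUDPPorts = [ 53 ];
  };
}
```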
## Phase 2: Logging & Detection (P2)

### 2.1 Enable TLS for Promtail → Loki

Update `system/monitoring/logs.nix`:

```nix
clients = [{
  url = "https://monitoring01.home.2rjus.net:3100/loki/api/v1/push";
  tls_config = {
    ca_file = "/etc/ssl/certs/homelab-root-ca.pem";
  };
}];
```

Requires:
- Configure Loki with TLS certificate (use internal ACME)
- Ensure all hosts trust root CA (already done via `system/pki/root-ca.nix`)

### 2.2 Security Alert Rules

Add to `services/monitoring/rules.yml`:

```yaml
- name: security_rules
  rules:
    - alert: ssh_auth_failures
      expr: increase(node_logind_sessions_total[5m]) > 20
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: "Unusual login activity on {{ $labels.instance }}"

    - alert: vault_secret_fetch_failure
      expr: increase(vault_secret_failures[5m]) > 5
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: "Vault secret fetch failures on {{ $labels.instance }}"
```

Also add Loki-based alerts for:
- Failed SSH attempts: `{job="systemd-journal"} |= "Failed password"`
- sudo usage: `{job="systemd-journal"} |= "sudo"`
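The Loki-based alerts would live in a ruler rule group evaluating LogQL. A hedged sketch; the group name and threshold are assumptions, not deployed config:

```yaml
# Sketch of a Loki ruler rule group - names and thresholds are illustrative.
groups:
  - name: security_logs
    rules:
      - alert: ssh_failed_passwords
        expr: |
          sum by (host) (
            count_over_time({job="systemd-journal"} |= "Failed password" [5m])
          ) > 10
        labels:
          severity: warning
        annotations:
          summary: "Repeated failed SSH passwords on {{ $labels.host }}"
```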
### 2.3 Global Audit Logging

Add `./common/ssh-audit.nix` import to `system/default.nix`:

```nix
imports = [
  # ... existing imports
  ../common/ssh-audit.nix
];
```

## Phase 3: Defense in Depth (P3)

### 3.1 Loki Authentication

Options:
1. **Basic auth via reverse proxy** - Put Loki behind Caddy with auth
2. **Loki multi-tenancy** - Enable `auth_enabled = true` and use tenant IDs
3. **Network isolation** - Bind Loki only to localhost, expose via authenticated proxy

Recommendation: Option 1 (reverse proxy) is simplest for a homelab.

### 3.2 AppRole Secret Rotation

Update `terraform/vault/approle.tf`:

```hcl
secret_id_ttl = 2592000 # 30 days
```

Add documentation for a manual rotation procedure, or implement automated rotation via the existing `restartTrigger` mechanism in `vault-secrets.nix`.

### 3.3 Enable Vault TLS Verification

Change the default in `system/vault-secrets.nix`:

```nix
skipTlsVerify = mkOption {
  type = types.bool;
  default = false; # Changed from true
};
```

**Prerequisite:** Verify all hosts trust the internal CA that signed the Vault certificate.

## Implementation Order

1. **Test on test-tier first** - Deploy phases 1-2 to testvm01/02/03
2. **Validate SSH access** - Ensure key-based login works before disabling passwords
3. **Document firewall ports** - Create a reference of ports per host before enabling
4. **Phase prod rollout** - Deploy to prod hosts one at a time, verify each

## Open Questions

- [ ] Do all hosts have SSH keys configured for root access?
- [ ] Should firewall rules be per-host or use a central definition with roles?
- [ ] Should Loki authentication use the existing Kanidm setup?

**Resolved:** Password-based SSH access for recovery is not required - most hosts have console access through Proxmox or physical access, which provides an out-of-band recovery path if SSH keys fail.

## Notes

- Firewall changes are the highest risk - test thoroughly on test-tier
- SSH hardening must not lock out access - verify keys first
- Consider creating a "break glass" procedure for emergency access if keys fail
docs/user-management.md (new file, 267 lines)
@@ -0,0 +1,267 @@
# User Management with Kanidm

Central authentication for the homelab using Kanidm.

## Overview

- **Server**: kanidm01.home.2rjus.net (auth.home.2rjus.net)
- **WebUI**: https://auth.home.2rjus.net
- **LDAPS**: port 636

## CLI Setup

The `kanidm` CLI is available in the devshell:

```bash
nix develop

# Login as idm_admin
kanidm login --name idm_admin --url https://auth.home.2rjus.net
```

## User Management

POSIX users are managed imperatively via the `kanidm` CLI. This allows setting
all attributes (including UNIX password) in one workflow.

### Creating a POSIX User

```bash
# Create the person
kanidm person create <username> "<Display Name>"

# Add to groups
kanidm group add-members ssh-users <username>

# Enable POSIX (UID is auto-assigned)
kanidm person posix set <username>

# Set UNIX password (required for SSH login, min 10 characters)
kanidm person posix set-password <username>

# Optionally set login shell
kanidm person posix set <username> --shell /bin/zsh
```

### Example: Full User Creation

```bash
kanidm person create testuser "Test User"
kanidm group add-members ssh-users testuser
kanidm person posix set testuser
kanidm person posix set-password testuser
kanidm person get testuser
```

After creation, verify on a client host:
```bash
getent passwd testuser
ssh testuser@testvm01.home.2rjus.net
```
### Viewing User Details

```bash
kanidm person get <username>
```

### Removing a User

```bash
kanidm person delete <username>
```

## Group Management

Groups for POSIX access are also managed via CLI.

### Creating a POSIX Group

```bash
# Create the group
kanidm group create <group-name>

# Enable POSIX with a specific GID
kanidm group posix set <group-name> --gidnumber <gid>
```

### Adding Members

```bash
kanidm group add-members <group-name> <username>
```

### Viewing Group Details

```bash
kanidm group get <group-name>
kanidm group list-members <group-name>
```

### Example: Full Group Creation

```bash
kanidm group create testgroup
kanidm group posix set testgroup --gidnumber 68010
kanidm group add-members testgroup testuser
kanidm group get testgroup
```

After creation, verify on a client host:
```bash
getent group testgroup
```

### Current Groups

| Group | GID | Purpose |
|-------|-----|---------|
| ssh-users | 68000 | SSH login access |
| admins | 68001 | Administrative access |
| users | 68002 | General users |

### UID/GID Allocation

Kanidm auto-assigns UIDs/GIDs from its configured range. For manually assigned GIDs:

| Range | Purpose |
|-------|---------|
| 65,536+ | Users (auto-assigned) |
| 68,000 - 68,999 | Groups (manually assigned) |
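When hand-picking a GID for a new group, a quick range check avoids collisions with the auto-assigned user range. A minimal sketch:

```bash
# Check that a manually chosen GID falls in the group range (68000-68999).
gid=68010
if [ "$gid" -ge 68000 ] && [ "$gid" -le 68999 ]; then
  echo "GID $gid is in the manual group range"
else
  echo "GID $gid is outside 68000-68999" >&2
fi
# -> GID 68010 is in the manual group range
```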
## PAM/NSS Client Configuration

Enable central authentication on a host:

```nix
homelab.kanidm.enable = true;
```

This configures:
- `services.kanidm.enablePam = true`
- Client connection to auth.home.2rjus.net
- Login authorization for `ssh-users` group
- Short usernames (`torjus` instead of `torjus@home.2rjus.net`)
- Home directory symlinks (`/home/torjus` → UUID-based directory)

### Enabled Hosts

- testvm01, testvm02, testvm03 (test tier)

### Options

```nix
homelab.kanidm = {
  enable = true;
  server = "https://auth.home.2rjus.net"; # default
  allowedLoginGroups = [ "ssh-users" ]; # default
};
```

### Home Directories

Home directories use UUID-based paths for stability (so renaming a user doesn't
require moving their home directory). Symlinks provide convenient access:

```
/home/torjus -> /home/e4f4c56c-4aee-4c20-846f-90cb69807733
```

The symlinks are created by `kanidm-unixd-tasks` on first login.
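The alias layout can be reproduced locally to see how resolution works. This sketch uses a scratch directory and a made-up UUID, not a real Kanidm home:

```bash
# Recreate the alias layout in a scratch directory (UUID is made up).
base=$(mktemp -d)
mkdir "$base/e4f4c56c-4aee-4c20-846f-90cb69807733"
ln -s "$base/e4f4c56c-4aee-4c20-846f-90cb69807733" "$base/torjus"

# The short path resolves through the symlink to the UUID directory.
readlink "$base/torjus"
```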
## Testing

### Verify NSS Resolution

```bash
# Check user resolution
getent passwd <username>

# Check group resolution
getent group <group-name>
```

### Test SSH Login

```bash
ssh <username>@<hostname>.home.2rjus.net
```

## Troubleshooting

### "PAM user mismatch" error

SSH fails with "fatal: PAM user mismatch" in logs. This happens when Kanidm returns
usernames in SPN format (`torjus@home.2rjus.net`) but SSH expects short names (`torjus`).

**Solution**: Configure `uid_attr_map = "name"` in unixSettings (already set in our module).

Check current format:
```bash
getent passwd torjus
# Should show: torjus:x:65536:...
# NOT: torjus@home.2rjus.net:x:65536:...
```
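The same check can be scripted: the first colon-separated field of the `getent` line must be the short name, never an SPN. A sketch against a sample passwd line (UID, GECOS, and shell are illustrative):

```bash
# Sample passwd line as getent should return it (fields are illustrative).
line='torjus:x:65536:65536:Torjus:/home/torjus:/run/current-system/sw/bin/bash'

# First colon-separated field is the username.
user=${line%%:*}
case $user in
  *@*) echo "SPN format - expect PAM user mismatch" >&2 ;;
  *)   echo "short name OK: $user" ;;
esac
# -> short name OK: torjus
```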
### User resolves but SSH fails immediately

The user's login group (e.g., `ssh-users`) likely doesn't have POSIX enabled:

```bash
# Check if group has POSIX
getent group ssh-users

# If empty, enable POSIX on the server
kanidm group posix set ssh-users --gidnumber 68000
```

### User doesn't resolve via getent

1. Check kanidm-unixd service is running:
```bash
systemctl status kanidm-unixd
```

2. Check unixd can reach server:
```bash
kanidm-unix status
# Should show: system: online, Kanidm: online
```

3. Check client can reach server:
```bash
curl -s https://auth.home.2rjus.net/status
```

4. Check user has POSIX enabled on server:
```bash
kanidm person get <username>
```

5. Restart nscd to clear stale cache:
```bash
systemctl restart nscd
```

6. Invalidate kanidm cache:
```bash
kanidm-unix cache-invalidate
```

### Changes not taking effect after deployment

NixOS uses nsncd (a Rust reimplementation of nscd) for NSS caching. After deploying
kanidm-unixd config changes, you may need to restart both services:

```bash
systemctl restart kanidm-unixd
systemctl restart nscd
```

### Test PAM authentication directly

Use the kanidm-unix CLI to test PAM auth without SSH:

```bash
kanidm-unix auth-test --name <username>
```
@@ -207,6 +207,7 @@
    pkgs.ansible
    pkgs.opentofu
    pkgs.openbao
    pkgs.kanidm_1_8
    (pkgs.callPackage ./scripts/create-host { })
    homelab-deploy.packages.${pkgs.system}.default
  ];

@@ -64,9 +64,5 @@
  vault.enable = true;
  homelab.deploy.enable = true;

  zramSwap = {
    enable = true;
  };

  system.stateVersion = "23.11"; # Did you read the comment?
}

@@ -4,6 +4,5 @@
    ./configuration.nix
    ../../services/nix-cache
    ../../services/actions-runner
    ./zram.nix
  ];
}

@@ -1,6 +0,0 @@
{ ... }:
{
  zramSwap = {
    enable = true;
  };
}

@@ -79,5 +79,8 @@
  # Or disable the firewall altogether.
  networking.firewall.enable = false;

  # Compressed swap in RAM - prevents OOM during bootstrap nixos-rebuild
  zramSwap.enable = true;

  system.stateVersion = "25.11";
}

@@ -25,6 +25,9 @@
  # Enable remote deployment via NATS
  homelab.deploy.enable = true;

  # Enable Kanidm PAM/NSS for central authentication
  homelab.kanidm.enable = true;

  nixpkgs.config.allowUnfree = true;
  boot.loader.grub.enable = true;
  boot.loader.grub.device = "/dev/vda";

@@ -25,6 +25,9 @@
  # Enable remote deployment via NATS
  homelab.deploy.enable = true;

  # Enable Kanidm PAM/NSS for central authentication
  homelab.kanidm.enable = true;

  nixpkgs.config.allowUnfree = true;
  boot.loader.grub.enable = true;
  boot.loader.grub.device = "/dev/vda";

@@ -25,6 +25,9 @@
  # Enable remote deployment via NATS
  homelab.deploy.enable = true;

  # Enable Kanidm PAM/NSS for central authentication
  homelab.kanidm.enable = true;

  nixpkgs.config.allowUnfree = true;
  boot.loader.grub.enable = true;
  boot.loader.grub.device = "/dev/vda";

@@ -17,7 +17,8 @@
    };
  };

  # Provisioning - initial users/groups
  # Provision base groups only - users are managed via CLI
  # See docs/user-management.md for details
  provision = {
    enable = true;
    idmAdminPasswordFile = config.vault.secrets.kanidm-idm-admin.outputDir;
@@ -28,10 +29,7 @@
      ssh-users = { };
    };

    persons.torjus = {
      displayName = "Torjus";
      groups = [ "admins" "users" "ssh-users" ];
    };
    # Regular users (persons) are managed imperatively via kanidm CLI
  };
};

@@ -46,7 +44,7 @@
    extraDomainNames = [ "${config.networking.hostName}.home.2rjus.net" ];
  };

  # Vault secret for idm_admin password
  # Vault secret for idm_admin password (used for provisioning)
  vault.secrets.kanidm-idm-admin = {
    secretPath = "kanidm/idm-admin-password";
    extractKey = "password";

@@ -4,6 +4,7 @@
    ./acme.nix
    ./autoupgrade.nix
    ./homelab-deploy.nix
    ./kanidm-client.nix
    ./monitoring
    ./motd.nix
    ./packages.nix
@@ -12,5 +13,6 @@
    ./pki/root-ca.nix
    ./sshd.nix
    ./vault-secrets.nix
    ./zram.nix
  ];
}

system/kanidm-client.nix (new file, 42 lines)
@@ -0,0 +1,42 @@
{ lib, config, pkgs, ... }:
let
  cfg = config.homelab.kanidm;
in
{
  options.homelab.kanidm = {
    enable = lib.mkEnableOption "Kanidm PAM/NSS client for central authentication";

    server = lib.mkOption {
      type = lib.types.str;
      default = "https://auth.home.2rjus.net";
      description = "URI of the Kanidm server";
    };

    allowedLoginGroups = lib.mkOption {
      type = lib.types.listOf lib.types.str;
      default = [ "ssh-users" ];
      description = "Groups allowed to log in via PAM";
    };
  };

  config = lib.mkIf cfg.enable {
    services.kanidm = {
      package = pkgs.kanidm_1_8;
      enablePam = true;

      clientSettings = {
        uri = cfg.server;
      };

      unixSettings = {
        pam_allowed_login_groups = cfg.allowedLoginGroups;
        # Use short names (torjus) instead of SPN format (torjus@home.2rjus.net)
        # This prevents "PAM user mismatch" errors with SSH
        uid_attr_map = "name";
        gid_attr_map = "name";
        # Create symlink /home/torjus -> /home/torjus@home.2rjus.net
        home_alias = "name";
      };
    };
  };
}

system/zram.nix (new file, 8 lines)
@@ -0,0 +1,8 @@
# Compressed swap in RAM
#
# Provides overflow memory during Nix builds and upgrades.
# Prevents OOM kills on low-memory hosts (2GB VMs).
{ ... }:
{
  zramSwap.enable = true;
}