11 Commits

Author SHA1 Message Date
78eb04205f system: add pipe-to-loki helper script
Some checks failed
Run nix flake check / flake-check (push) Has been cancelled
Adds a system-wide script for sending command output or interactive
sessions to Loki for easy sharing with Claude.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 15:30:53 +01:00
19cb61ebbc Merge pull request 'kanidm-pam-client' (#34) from kanidm-pam-client into master
All checks were successful
Run nix flake check / flake-check (push) Successful in 3m19s
Reviewed-on: #34
2026-02-08 14:14:53 +00:00
9ed09c9a9c docs: add user-management documentation
All checks were successful
Run nix flake check / flake-check (pull_request) Successful in 3m33s
Run nix flake check / flake-check (push) Successful in 2m0s
- CLI workflows for creating users and groups
- Troubleshooting guide (nscd, cache invalidation)
- Home directory behavior (UUID-based with symlinks)
- Update auth-system-replacement plan with progress

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 15:14:21 +01:00
b31c64f1b9 kanidm: remove declarative user provisioning
Keep base groups (admins, users, ssh-users) provisioned declaratively
but manage regular users via the kanidm CLI. This allows setting POSIX
attributes and passwords in a single workflow.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 15:14:03 +01:00
54b6e37420 flake: add kanidm to devshell
Add kanidm_1_8 CLI for administering the Kanidm server.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 15:12:19 +01:00
b845a8bb8b system: add kanidm PAM/NSS client module
Add homelab.kanidm.enable option for central authentication via Kanidm.
The module configures:
- PAM/NSS integration with kanidm-unixd
- Client connection to auth.home.2rjus.net
- Login authorization for ssh-users group

Enable on testvm01-03 for testing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 15:12:19 +01:00
bfbf0cea68 template2: enable zram for bootstrap
Some checks failed
Run nix flake check / flake-check (push) Failing after 3m34s
Prevents OOM during initial nixos-rebuild on 2GB VMs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 13:34:08 +01:00
3abe5e83a7 docs: add memory ballooning as fallback option
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m5s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 13:29:42 +01:00
67c27555f3 docs: add memory issues follow-up plan
All checks were successful
Run nix flake check / flake-check (push) Successful in 2m2s
Track zram change effectiveness for OOM prevention during upgrades.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 13:26:31 +01:00
1674b6a844 system: enable zram swap for all hosts
Some checks failed
Run nix flake check / flake-check (push) Failing after 12m6s
Provides compressed swap in RAM to prevent OOM kills during
nixos-rebuild on low-memory VMs (2GB). Removes duplicate zram
configs from jelly01 and nix-cache01.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 13:02:58 +01:00
311be282b6 docs: add security hardening plan
Some checks failed
Run nix flake check / flake-check (push) Failing after 2s
Based on security review findings, covering SSH hardening, firewall
enablement, log transport TLS, security alerting, and secrets management.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 05:26:15 +01:00
17 changed files with 855 additions and 31 deletions

View File

@@ -66,9 +66,9 @@ This future migration path is a strong argument for Kanidm over LDAP-only soluti
- Vault integration for idm_admin password
- LDAPS on port 636
2. **Configure declarative provisioning**
- Groups: `admins`, `users`, `ssh-users`
- User: `torjus` (member of all groups)
2. **Configure provisioning**
- Groups provisioned declaratively: `admins`, `users`, `ssh-users`
- Users managed imperatively via CLI (allows setting POSIX passwords in one step)
- POSIX attributes enabled (UID/GID range 65,536-69,999)
3. **Test NAS integration** (in progress)
@@ -80,14 +80,16 @@ This future migration path is a strong argument for Kanidm over LDAP-only soluti
- Grafana
- Other services as needed
5. **Create client module** in `system/` for PAM/NSS
- Enable on all hosts that need central auth
- Configure trusted CA
5. **Create client module** in `system/` for PAM/NSS
- Module: `system/kanidm-client.nix`
- `homelab.kanidm.enable = true` enables PAM/NSS
- Short usernames (not SPN format)
- Home directory symlinks via `home_alias`
- Enabled on test tier: testvm01, testvm02, testvm03
6. **Documentation**
- User management procedures
- Adding new OAuth2 clients
- Troubleshooting PAM/NSS issues
6. **Documentation**
- `docs/user-management.md` - CLI workflows, troubleshooting
- User/group creation procedures verified working
## Progress
@@ -106,14 +108,37 @@ This future migration path is a strong argument for Kanidm over LDAP-only soluti
- Prometheus monitoring scrape target configured
**Provisioned entities:**
- Groups: `admins`, `users`, `ssh-users`
- User: `torjus` (member of all groups, POSIX enabled with GID 65536)
- Groups: `admins`, `users`, `ssh-users` (declarative)
- Users managed via CLI (imperative)
**Verified working:**
- WebUI login with idm_admin
- LDAP bind and search with POSIX-enabled user
- LDAPS with valid internal CA certificate
### Completed (2026-02-08) - PAM/NSS Client
**Client module deployed (`system/kanidm-client.nix`):**
- `homelab.kanidm.enable = true` enables PAM/NSS integration
- Connects to auth.home.2rjus.net
- Short usernames (`torjus` instead of `torjus@home.2rjus.net`)
- Home directory symlinks (`/home/torjus` → UUID-based dir)
- Login restricted to `ssh-users` group
**Enabled on test tier:**
- testvm01, testvm02, testvm03
**Verified working:**
- User/group resolution via `getent`
- SSH login with Kanidm unix passwords
- Home directory creation with symlinks
- Imperative user/group creation via CLI
**Documentation:**
- `docs/user-management.md` with full CLI workflows
- Password requirements (min 10 chars)
- Troubleshooting guide (nscd, cache invalidation)
### UID/GID Range (Resolved)
**Range: 65,536 - 69,999** (manually allocated)
@@ -128,10 +153,9 @@ Rationale:
### Next Steps
1. Deploy to monitoring01 to enable Prometheus scraping
1. Enable PAM/NSS on production hosts (after test tier validation)
2. Configure TrueNAS LDAP client for NAS integration testing
3. Add OAuth2 clients (Grafana first)
4. Create PAM/NSS client module for other hosts
## References

View File

@@ -0,0 +1,116 @@
# Memory Issues Follow-up
Tracking the zram change to verify it resolves OOM issues during nixos-upgrade on low-memory hosts.
## Background
On 2026-02-08, ns2 (2GB RAM) experienced an OOM kill during nixos-upgrade. The Nix evaluation process consumed ~1.6GB before being killed by the kernel. ns1 (manually increased to 4GB) succeeded with the same upgrade.
Root cause: 2GB RAM is insufficient for Nix flake evaluation without swap.
## Fix Applied
**Commit:** `1674b6a` - system: enable zram swap for all hosts
**Merged:** 2026-02-08 ~12:15 UTC
**Change:** Added `zramSwap.enable = true` to `system/zram.nix`, providing ~2GB compressed swap on all hosts.
## Timeline
| Time (UTC) | Event |
|------------|-------|
| 05:00:46 | ns2 nixos-upgrade OOM killed |
| 05:01:47 | `nixos_upgrade_failed` alert fired |
| 12:15 | zram commit merged to master |
| 12:19 | ns2 rebooted with zram enabled |
| 12:20 | ns1 rebooted (memory reduced to 2GB via tofu) |
## Hosts Affected
All 2GB VMs that run nixos-upgrade:
- ns1, ns2 (DNS)
- vault01
- testvm01, testvm02, testvm03
- kanidm01
## Metrics to Monitor
Check these in Grafana or via PromQL to verify the fix:
### Swap availability (should be ~2GB after upgrade)
```promql
node_memory_SwapTotal_bytes / 1024 / 1024
```
### Swap usage during upgrades
```promql
(node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes) / 1024 / 1024
```
### Zswap compressed bytes (zswap is a separate kernel feature from zram; expect 0 unless zswap is also enabled)
```promql
node_memory_Zswap_bytes / 1024 / 1024
```
### Upgrade failures (should be 0)
```promql
node_systemd_unit_state{name="nixos-upgrade.service", state="failed"}
```
### Memory available during upgrades
```promql
node_memory_MemAvailable_bytes / 1024 / 1024
```
## Verification Steps
After a few days (allow auto-upgrades to run on all hosts):
1. Check all hosts have swap enabled:
```promql
node_memory_SwapTotal_bytes > 0
```
2. Check for any upgrade failures since the fix:
```promql
count_over_time(ALERTS{alertname="nixos_upgrade_failed"}[7d])
```
3. Review if any hosts used swap during upgrades (check historical graphs)
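The swap check in step 1 can also be run directly on a host. This is a minimal sketch, not part of the repository: `swap_total_mib` and `has_swap` are hypothetical helper names, and the threshold passed to `has_swap` is an arbitrary assumption. The size reported depends on `zramSwap.memoryPercent`; assuming the NixOS default of 50 percent applies, a 2GB VM may report closer to 1GB than 2GB, so compare against the value actually observed.

```bash
# Print SwapTotal in MiB from a meminfo-style file (defaults to /proc/meminfo).
swap_total_mib() {
  local meminfo="${1:-/proc/meminfo}"
  # /proc/meminfo reports SwapTotal in kB; integer-divide to get MiB.
  awk '/^SwapTotal:/ { print int($2 / 1024) }' "$meminfo"
}

# Succeed when at least min_mib of swap is configured (hypothetical helper).
has_swap() {
  local min_mib="$1" meminfo="${2:-/proc/meminfo}"
  local total
  total=$(swap_total_mib "$meminfo")
  [ "${total:-0}" -ge "$min_mib" ]
}
```

On a host, `has_swap 512 && echo ok` gives a quick pass/fail, and `zramctl` or `swapon --show` (both from util-linux) list the backing zram device.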
## Success Criteria
- No `nixos_upgrade_failed` alerts due to OOM after 2026-02-08
- All hosts show ~2GB swap available
- Upgrades complete successfully on 2GB VMs
## Fallback Options
If zram is insufficient:
1. **Increase VM memory** - Update `terraform/vms.tf` to 4GB for affected hosts
2. **Enable memory ballooning** - Configure VMs with dynamic memory allocation (see below)
3. **Use remote builds** - Build the system on a larger machine and deploy the result (note that `nix.buildMachines` offloads builds only; flake evaluation still runs locally)
4. **Reduce flake size** - Split configurations to reduce evaluation memory
### Memory Ballooning
Proxmox supports memory ballooning, which allows VMs to dynamically grow/shrink memory allocation based on demand. The balloon driver inside the guest communicates with the hypervisor to release or reclaim memory pages.
Configuration in `terraform/vms.tf`:
```hcl
memory = 4096 # maximum memory
balloon = 2048 # minimum memory (shrinks to this when idle)
```
Pros:
- VMs get memory on-demand without reboots
- Better host memory utilization
- Solves upgrade OOM without permanently allocating 4GB
Cons:
- Requires the virtio balloon driver (`virtio_balloon`) in the guest
- Guest can experience memory pressure if host is overcommitted
Ballooning and zram are complementary - ballooning provides headroom from the host, zram provides overflow within the guest.

View File

@@ -0,0 +1,224 @@
# Security Hardening Plan
## Overview
Address security gaps identified in infrastructure review. Focus areas: SSH hardening, network security, logging improvements, and secrets management.
## Current State
- SSH allows password auth and unrestricted root login (`system/sshd.nix`)
- Firewall disabled on all hosts (`networking.firewall.enable = false`)
- Promtail ships logs over HTTP to Loki
- Loki has no authentication (`auth_enabled = false`)
- AppRole secret-IDs never expire (`secret_id_ttl = 0`)
- Vault TLS verification disabled by default (`skipTlsVerify = true`)
- Audit logging exists (`common/ssh-audit.nix`) but not applied globally
- Alert rules focus on availability, no security event detection
## Priority Matrix
| Issue | Severity | Effort | Priority |
|-------|----------|--------|----------|
| SSH password auth | High | Low | **P1** |
| Firewall disabled | High | Medium | **P1** |
| Promtail HTTP (no TLS) | High | Medium | **P2** |
| No security alerting | Medium | Low | **P2** |
| Audit logging not global | Low | Low | **P2** |
| Loki no auth | Medium | Medium | **P3** |
| Secret-ID TTL | Medium | Medium | **P3** |
| Vault skipTlsVerify | Medium | Low | **P3** |
## Phase 1: Quick Wins (P1)
### 1.1 SSH Hardening
Edit `system/sshd.nix`:
```nix
services.openssh = {
enable = true;
settings = {
PermitRootLogin = "prohibit-password"; # Key-only root login
PasswordAuthentication = false;
KbdInteractiveAuthentication = false;
};
};
```
**Prerequisite:** Verify all hosts have SSH keys deployed for root.
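One way to check this prerequisite before disabling passwords is to attempt a key-only login against each host. This is a sketch under stated assumptions: `check_root_key_login` is a hypothetical helper, the `SSH_BIN` override exists only so the loop can be dry-run with a stub, and the host names in the usage line are examples.

```bash
# Try a key-only root login on each host; prints "<host> ok" or "<host> FAILED".
check_root_key_login() {
  local ssh_bin="${SSH_BIN:-ssh}"
  local host
  for host in "$@"; do
    # BatchMode=yes disables password prompts, so only key auth can succeed.
    if "$ssh_bin" -o BatchMode=yes -o ConnectTimeout=5 "root@${host}" true </dev/null; then
      echo "${host} ok"
    else
      echo "${host} FAILED"
    fi
  done
}
```

Example: `check_root_key_login testvm01.home.2rjus.net testvm02.home.2rjus.net`. Any `FAILED` line means that host would lose root SSH access once `PasswordAuthentication = false` is deployed.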
### 1.2 Enable Firewall
Create `system/firewall.nix` with default deny policy:
```nix
{ ... }: {
networking.firewall.enable = true;
# Use openssh's built-in firewall integration
services.openssh.openFirewall = true;
}
```
**Useful firewall options:**
| Option | Description |
|--------|-------------|
| `networking.firewall.trustedInterfaces` | Accept all traffic from these interfaces (e.g., `[ "lo" ]`) |
| `networking.firewall.interfaces.<name>.allowedTCPPorts` | Per-interface port rules |
| `networking.firewall.extraInputRules` | Custom nftables rules for complex filtering (only applies when `networking.nftables.enable = true`) |
**Network range restrictions:** Consider restricting SSH to the infrastructure subnet (`10.69.13.0/24`) using `extraInputRules` for defense in depth. However, this adds complexity and may not be necessary given the trusted network model.
#### Per-Interface Rules (http-proxy WireGuard)
The `http-proxy` host has a WireGuard interface (`wg0`) that may need different rules than the LAN interface. Use `networking.firewall.interfaces` to apply per-interface policies:
```nix
# Example: http-proxy with different rules per interface
networking.firewall = {
enable = true;
# Default: only SSH (via openFirewall)
allowedTCPPorts = [ ];
# LAN interface: allow HTTP/HTTPS
interfaces.ens18 = {
allowedTCPPorts = [ 80 443 ];
};
# WireGuard interface: restrict to specific services or trust fully
interfaces.wg0 = {
allowedTCPPorts = [ 80 443 ];
# Or use trustedInterfaces = [ "wg0" ] if fully trusted
};
};
```
**TODO:** Investigate current WireGuard usage on http-proxy to determine appropriate rules.
Then per-host, open required ports:
| Host | Additional Ports |
|------|------------------|
| ns1/ns2 | 53 (TCP/UDP) |
| vault01 | 8200 |
| monitoring01 | 3100, 9090, 3000, 9093 |
| http-proxy | 80, 443 |
| nats1 | 4222 |
| ha1 | 1883, 8123 |
| jelly01 | 8096 |
| nix-cache01 | 5000 |
## Phase 2: Logging & Detection (P2)
### 2.1 Enable TLS for Promtail → Loki
Update `system/monitoring/logs.nix`:
```nix
clients = [{
url = "https://monitoring01.home.2rjus.net:3100/loki/api/v1/push";
tls_config = {
ca_file = "/etc/ssl/certs/homelab-root-ca.pem";
};
}];
```
Requires:
- Configure Loki with TLS certificate (use internal ACME)
- Ensure all hosts trust root CA (already done via `system/pki/root-ca.nix`)
### 2.2 Security Alert Rules
Add to `services/monitoring/rules.yml`:
```yaml
- name: security_rules
rules:
- alert: ssh_auth_failures
expr: increase(node_logind_sessions_total[5m]) > 20
for: 0m
labels:
severity: warning
annotations:
summary: "Unusual login activity on {{ $labels.instance }}"
- alert: vault_secret_fetch_failure
expr: increase(vault_secret_failures[5m]) > 5
for: 0m
labels:
severity: warning
annotations:
summary: "Vault secret fetch failures on {{ $labels.instance }}"
```
Also add Loki-based alerts for:
- Failed SSH attempts: `{job="systemd-journal"} |= "Failed password"`
- sudo usage: `{job="systemd-journal"} |= "sudo"`
### 2.3 Global Audit Logging
Add a `../common/ssh-audit.nix` import to `system/default.nix`:
```nix
imports = [
# ... existing imports
../common/ssh-audit.nix
];
```
## Phase 3: Defense in Depth (P3)
### 3.1 Loki Authentication
Options:
1. **Basic auth via reverse proxy** - Put Loki behind Caddy with auth
2. **Loki multi-tenancy** - Enable `auth_enabled = true` and use tenant IDs
3. **Network isolation** - Bind Loki only to localhost, expose via authenticated proxy
Recommendation: Option 1 (reverse proxy) is simplest for homelab.
### 3.2 AppRole Secret Rotation
Update `terraform/vault/approle.tf`:
```hcl
secret_id_ttl = 2592000 # 30 days
```
Add documentation for manual rotation procedure or implement automated rotation via the existing `restartTrigger` mechanism in `vault-secrets.nix`.
### 3.3 Enable Vault TLS Verification
Change default in `system/vault-secrets.nix`:
```nix
skipTlsVerify = mkOption {
type = types.bool;
default = false; # Changed from true
};
```
**Prerequisite:** Verify all hosts trust the internal CA that signed the Vault certificate.
## Implementation Order
1. **Test on test-tier first** - Deploy phases 1-2 to testvm01/02/03
2. **Validate SSH access** - Ensure key-based login works before disabling passwords
3. **Document firewall ports** - Create reference of ports per host before enabling
4. **Phase prod rollout** - Deploy to prod hosts one at a time, verify each
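For step 3, the per-host port reference can be cross-checked against what each host actually listens on. This is a rough sketch, assuming `ss` from iproute2 is available on the host; `listening_ports` is a hypothetical helper that only parses `ss -tlnH` output and does not cover UDP (for example DNS on ns1/ns2).

```bash
# Extract unique listening TCP port numbers from `ss -tlnH` output on stdin.
listening_ports() {
  # Column 4 is the local address:port; keep the part after the last colon.
  awk '{ n = split($4, parts, ":"); print parts[n] }' | sort -un
}
```

Run `ss -tlnH | listening_ports` on each host and compare the result with the table in section 1.2 before enabling the firewall.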
## Open Questions
- [ ] Do all hosts have SSH keys configured for root access?
- [ ] Should firewall rules be per-host or use a central definition with roles?
- [ ] Should Loki authentication use the existing Kanidm setup?
**Resolved:** Password-based SSH access for recovery is not required - most hosts have console access through Proxmox or physical access, which provides an out-of-band recovery path if SSH keys fail.
## Notes
- Firewall changes are the highest risk - test thoroughly on test-tier
- SSH hardening must not lock out access - verify keys first
- Consider creating a "break glass" procedure for emergency access if keys fail

267
docs/user-management.md Normal file
View File

@@ -0,0 +1,267 @@
# User Management with Kanidm
Central authentication for the homelab using Kanidm.
## Overview
- **Server**: kanidm01.home.2rjus.net (auth.home.2rjus.net)
- **WebUI**: https://auth.home.2rjus.net
- **LDAPS**: port 636
## CLI Setup
The `kanidm` CLI is available in the devshell:
```bash
nix develop
# Login as idm_admin
kanidm login --name idm_admin --url https://auth.home.2rjus.net
```
## User Management
POSIX users are managed imperatively via the `kanidm` CLI. This allows setting
all attributes (including UNIX password) in one workflow.
### Creating a POSIX User
```bash
# Create the person
kanidm person create <username> "<Display Name>"
# Add to groups
kanidm group add-members ssh-users <username>
# Enable POSIX (UID is auto-assigned)
kanidm person posix set <username>
# Set UNIX password (required for SSH login, min 10 characters)
kanidm person posix set-password <username>
# Optionally set login shell
kanidm person posix set <username> --shell /bin/zsh
```
### Example: Full User Creation
```bash
kanidm person create testuser "Test User"
kanidm group add-members ssh-users testuser
kanidm person posix set testuser
kanidm person posix set-password testuser
kanidm person get testuser
```
After creation, verify on a client host:
```bash
getent passwd testuser
ssh testuser@testvm01.home.2rjus.net
```
### Viewing User Details
```bash
kanidm person get <username>
```
### Removing a User
```bash
kanidm person delete <username>
```
## Group Management
Groups for POSIX access are also managed via CLI.
### Creating a POSIX Group
```bash
# Create the group
kanidm group create <group-name>
# Enable POSIX with a specific GID
kanidm group posix set <group-name> --gidnumber <gid>
```
### Adding Members
```bash
kanidm group add-members <group-name> <username>
```
### Viewing Group Details
```bash
kanidm group get <group-name>
kanidm group list-members <group-name>
```
### Example: Full Group Creation
```bash
kanidm group create testgroup
kanidm group posix set testgroup --gidnumber 68010
kanidm group add-members testgroup testuser
kanidm group get testgroup
```
After creation, verify on a client host:
```bash
getent group testgroup
```
### Current Groups
| Group | GID | Purpose |
|-------|-----|---------|
| ssh-users | 68000 | SSH login access |
| admins | 68001 | Administrative access |
| users | 68002 | General users |
### UID/GID Allocation
Kanidm auto-assigns UIDs/GIDs from its configured range. For manually assigned GIDs:
| Range | Purpose |
|-------|---------|
| 65,536+ | Users (auto-assigned) |
| 68,000 - 68,999 | Groups (manually assigned) |
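When assigning a GID by hand, picking the lowest unused value in the group range avoids collisions. This is an illustrative sketch: `next_free_gid` is a hypothetical helper, and it assumes the group list on stdin is complete, which `getent group` enumeration may not guarantee for Kanidm-backed groups.

```bash
# Print the lowest unused GID in [min, max], reading group(5)-style lines from stdin.
next_free_gid() {
  local min="${1:-68000}" max="${2:-68999}"
  awk -F: -v min="$min" -v max="$max" '
    { used[$3] = 1 }   # field 3 of a group entry is the GID
    END {
      for (gid = min; gid <= max; gid++)
        if (!(gid in used)) { print gid; exit }
      exit 1           # range exhausted
    }'
}
```

Example: `getent group | next_free_gid` prints a candidate to pass to `kanidm group posix set <group-name> --gidnumber <gid>`.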
## PAM/NSS Client Configuration
Enable central authentication on a host:
```nix
homelab.kanidm.enable = true;
```
This configures:
- `services.kanidm.enablePam = true`
- Client connection to auth.home.2rjus.net
- Login authorization for `ssh-users` group
- Short usernames (`torjus` instead of `torjus@home.2rjus.net`)
- Home directory symlinks (`/home/torjus` → UUID-based directory)
### Enabled Hosts
- testvm01, testvm02, testvm03 (test tier)
### Options
```nix
homelab.kanidm = {
enable = true;
server = "https://auth.home.2rjus.net"; # default
allowedLoginGroups = [ "ssh-users" ]; # default
};
```
### Home Directories
Home directories use UUID-based paths for stability (so renaming a user doesn't
require moving their home directory). Symlinks provide convenient access:
```
/home/torjus -> /home/e4f4c56c-4aee-4c20-846f-90cb69807733
```
The symlinks are created by `kanidm-unixd-tasks` on first login.
## Testing
### Verify NSS Resolution
```bash
# Check user resolution
getent passwd <username>
# Check group resolution
getent group <group-name>
```
### Test SSH Login
```bash
ssh <username>@<hostname>.home.2rjus.net
```
## Troubleshooting
### "PAM user mismatch" error
SSH fails with "fatal: PAM user mismatch" in logs. This happens when Kanidm returns
usernames in SPN format (`torjus@home.2rjus.net`) but SSH expects short names (`torjus`).
**Solution**: Configure `uid_attr_map = "name"` in unixSettings (already set in our module).
Check current format:
```bash
getent passwd torjus
# Should show: torjus:x:65536:...
# NOT: torjus@home.2rjus.net:x:65536:...
```
### User resolves but SSH fails immediately
The user's login group (e.g., `ssh-users`) likely doesn't have POSIX enabled:
```bash
# Check if group has POSIX
getent group ssh-users
# If empty, enable POSIX on the server
kanidm group posix set ssh-users --gidnumber 68000
```
### User doesn't resolve via getent
1. Check kanidm-unixd service is running:
```bash
systemctl status kanidm-unixd
```
2. Check unixd can reach server:
```bash
kanidm-unix status
# Should show: system: online, Kanidm: online
```
3. Check client can reach server:
```bash
curl -s https://auth.home.2rjus.net/status
```
4. Check user has POSIX enabled on server:
```bash
kanidm person get <username>
```
5. Restart nscd to clear stale cache:
```bash
systemctl restart nscd
```
6. Invalidate kanidm cache:
```bash
kanidm-unix cache-invalidate
```
### Changes not taking effect after deployment
NixOS uses nsncd (a non-caching Rust reimplementation of nscd, still run as the `nscd` unit) to serve NSS lookups. After deploying
kanidm-unixd config changes, you may need to restart both services:
```bash
systemctl restart kanidm-unixd
systemctl restart nscd
```
### Test PAM authentication directly
Use the kanidm-unix CLI to test PAM auth without SSH:
```bash
kanidm-unix auth-test --name <username>
```

View File

@@ -207,6 +207,7 @@
pkgs.ansible
pkgs.opentofu
pkgs.openbao
pkgs.kanidm_1_8
(pkgs.callPackage ./scripts/create-host { })
homelab-deploy.packages.${pkgs.system}.default
];

View File

@@ -64,9 +64,5 @@
vault.enable = true;
homelab.deploy.enable = true;
zramSwap = {
enable = true;
};
system.stateVersion = "23.11"; # Did you read the comment?
}

View File

@@ -4,6 +4,5 @@
./configuration.nix
../../services/nix-cache
../../services/actions-runner
./zram.nix
];
}

View File

@@ -1,6 +0,0 @@
{ ... }:
{
zramSwap = {
enable = true;
};
}

View File

@@ -79,5 +79,8 @@
# Or disable the firewall altogether.
networking.firewall.enable = false;
# Compressed swap in RAM - prevents OOM during bootstrap nixos-rebuild
zramSwap.enable = true;
system.stateVersion = "25.11";
}

View File

@@ -25,6 +25,9 @@
# Enable remote deployment via NATS
homelab.deploy.enable = true;
# Enable Kanidm PAM/NSS for central authentication
homelab.kanidm.enable = true;
nixpkgs.config.allowUnfree = true;
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/vda";

View File

@@ -25,6 +25,9 @@
# Enable remote deployment via NATS
homelab.deploy.enable = true;
# Enable Kanidm PAM/NSS for central authentication
homelab.kanidm.enable = true;
nixpkgs.config.allowUnfree = true;
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/vda";

View File

@@ -25,6 +25,9 @@
# Enable remote deployment via NATS
homelab.deploy.enable = true;
# Enable Kanidm PAM/NSS for central authentication
homelab.kanidm.enable = true;
nixpkgs.config.allowUnfree = true;
boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/vda";

View File

@@ -17,7 +17,8 @@
};
};
# Provisioning - initial users/groups
# Provision base groups only - users are managed via CLI
# See docs/user-management.md for details
provision = {
enable = true;
idmAdminPasswordFile = config.vault.secrets.kanidm-idm-admin.outputDir;
@@ -28,10 +29,7 @@
ssh-users = { };
};
persons.torjus = {
displayName = "Torjus";
groups = [ "admins" "users" "ssh-users" ];
};
# Regular users (persons) are managed imperatively via kanidm CLI
};
};
@@ -46,7 +44,7 @@
extraDomainNames = [ "${config.networking.hostName}.home.2rjus.net" ];
};
# Vault secret for idm_admin password
# Vault secret for idm_admin password (used for provisioning)
vault.secrets.kanidm-idm-admin = {
secretPath = "kanidm/idm-admin-password";
extractKey = "password";

View File

@@ -4,13 +4,16 @@
./acme.nix
./autoupgrade.nix
./homelab-deploy.nix
./kanidm-client.nix
./monitoring
./motd.nix
./packages.nix
./nix.nix
./pipe-to-loki.nix
./root-user.nix
./pki/root-ca.nix
./sshd.nix
./vault-secrets.nix
./zram.nix
];
}

42
system/kanidm-client.nix Normal file
View File

@@ -0,0 +1,42 @@
{ lib, config, pkgs, ... }:
let
cfg = config.homelab.kanidm;
in
{
options.homelab.kanidm = {
enable = lib.mkEnableOption "Kanidm PAM/NSS client for central authentication";
server = lib.mkOption {
type = lib.types.str;
default = "https://auth.home.2rjus.net";
description = "URI of the Kanidm server";
};
allowedLoginGroups = lib.mkOption {
type = lib.types.listOf lib.types.str;
default = [ "ssh-users" ];
description = "Groups allowed to log in via PAM";
};
};
config = lib.mkIf cfg.enable {
services.kanidm = {
package = pkgs.kanidm_1_8;
enablePam = true;
clientSettings = {
uri = cfg.server;
};
unixSettings = {
pam_allowed_login_groups = cfg.allowedLoginGroups;
# Use short names (torjus) instead of SPN format (torjus@home.2rjus.net)
# This prevents "PAM user mismatch" errors with SSH
uid_attr_map = "name";
gid_attr_map = "name";
# Create symlink /home/torjus -> /home/<uuid> (the UUID-based home directory)
home_alias = "name";
};
};
};
}

140
system/pipe-to-loki.nix Normal file
View File

@@ -0,0 +1,140 @@
{
config,
pkgs,
lib,
...
}:
let
pipe-to-loki = pkgs.writeShellApplication {
name = "pipe-to-loki";
runtimeInputs = with pkgs; [
curl
jq
util-linux
coreutils
];
text = ''
set -euo pipefail
LOKI_URL="http://monitoring01.home.2rjus.net:3100/loki/api/v1/push"
HOSTNAME=$(hostname)
SESSION_ID=""
RECORD_MODE=false
usage() {
echo "Usage: pipe-to-loki [--id ID] [--record]"
echo ""
echo "Send command output or interactive sessions to Loki."
echo ""
echo "Options:"
echo " --id ID Set custom session ID (default: auto-generated)"
echo " --record Start interactive recording session"
echo ""
echo "Examples:"
echo " command | pipe-to-loki # Pipe command output"
echo " command | pipe-to-loki --id foo # Pipe with custom ID"
echo " pipe-to-loki --record # Start recording session"
exit 1
}
generate_id() {
local random_chars
random_chars=$(head -c 2 /dev/urandom | od -An -tx1 | tr -d ' \n')
echo "''${HOSTNAME}-$(date +%s)-''${random_chars}"
}
send_to_loki() {
local content="$1"
local type="$2"
local timestamp_ns
timestamp_ns=$(date +%s%N)
local payload
payload=$(jq -n \
--arg job "pipe-to-loki" \
--arg host "$HOSTNAME" \
--arg type "$type" \
--arg id "$SESSION_ID" \
--arg ts "$timestamp_ns" \
--arg content "$content" \
'{
streams: [{
stream: {
job: $job,
host: $host,
type: $type,
id: $id
},
values: [[$ts, $content]]
}]
}')
# --fail makes curl return non-zero on HTTP 4xx/5xx, not only on connection errors
if curl -s --fail -X POST "$LOKI_URL" \
-H "Content-Type: application/json" \
-d "$payload" > /dev/null; then
return 0
else
echo "Error: Failed to send to Loki" >&2
return 1
fi
}
# Parse arguments
while [[ $# -gt 0 ]]; do
case $1 in
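--id)
# Guard against a missing value; with nounset, "$2" would otherwise abort unhelpfully
if [[ $# -lt 2 ]]; then
echo "Error: --id requires a value" >&2
usage
fi
SESSION_ID="$2"
shift 2
;;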
--record)
RECORD_MODE=true
shift
;;
--help|-h)
usage
;;
*)
echo "Unknown option: $1" >&2
usage
;;
esac
done
# Generate ID if not provided
if [[ -z "$SESSION_ID" ]]; then
SESSION_ID=$(generate_id)
fi
if $RECORD_MODE; then
# Session recording mode
SCRIPT_FILE=$(mktemp)
trap 'rm -f "$SCRIPT_FILE"' EXIT
echo "Recording session $SESSION_ID... (exit to send)"
# Use script to record the session
script -q "$SCRIPT_FILE"
# Read the transcript and send to Loki
content=$(cat "$SCRIPT_FILE")
if send_to_loki "$content" "session"; then
echo "Session $SESSION_ID sent to Loki"
fi
else
# Pipe mode - read from stdin
if [[ -t 0 ]]; then
echo "Error: No input provided. Pipe a command or use --record for interactive mode." >&2
exit 1
fi
content=$(cat)
if send_to_loki "$content" "command"; then
echo "Sent to Loki with id: $SESSION_ID"
fi
fi
'';
};
in
{
environment.systemPackages = [ pipe-to-loki ];
}
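
The script pushes each entry with the stream labels `job`, `host`, `type` and `id`, so a session can be read back by selecting on `id`. This is a sketch only, not part of this change: `loki_selector` and `fetch_session` are hypothetical helpers, `LOKI_BASE` is an assumed override variable, and no explicit time range is passed, so Loki's default query window applies.

```bash
# Build the LogQL stream selector for a given pipe-to-loki session ID.
loki_selector() {
  printf '{job="pipe-to-loki", id="%s"}' "$1"
}

# Fetch the log lines for a session via Loki's query_range HTTP endpoint.
fetch_session() {
  local base="${LOKI_BASE:-http://monitoring01.home.2rjus.net:3100}"
  # -G sends the URL-encoded parameters as a query string on a GET request.
  curl -s --fail -G "${base}/loki/api/v1/query_range" \
    --data-urlencode "query=$(loki_selector "$1")" |
    jq -r '.data.result[].values[][1]'
}
```

Example: `fetch_session testvm01-1770561053-a1b2`, using whatever ID `pipe-to-loki` printed when the output was sent.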

8
system/zram.nix Normal file
View File

@@ -0,0 +1,8 @@
# Compressed swap in RAM
#
# Provides overflow memory during Nix builds and upgrades.
# Prevents OOM kills on low-memory hosts (2GB VMs).
{ ... }:
{
zramSwap.enable = true;
}