From 311be282b60cb9939d52d1a9ca16037e515d8052 Mon Sep 17 00:00:00 2001
From: Torjus Håkestad
Date: Sun, 8 Feb 2026 05:26:15 +0100
Subject: [PATCH] docs: add security hardening plan

Based on security review findings, covering SSH hardening, firewall
enablement, log transport TLS, security alerting, and secrets management.

Co-Authored-By: Claude Opus 4.5
---
 docs/plans/security-hardening.md | 224 +++++++++++++++++++++++++++++++
 1 file changed, 224 insertions(+)
 create mode 100644 docs/plans/security-hardening.md

diff --git a/docs/plans/security-hardening.md b/docs/plans/security-hardening.md
new file mode 100644
index 0000000..f5a91b4
--- /dev/null
+++ b/docs/plans/security-hardening.md
@@ -0,0 +1,224 @@
+# Security Hardening Plan
+
+## Overview
+
+Address the security gaps identified in the infrastructure review. Focus areas: SSH hardening, network security, logging improvements, and secrets management.
+
+## Current State
+
+- SSH allows password authentication and unrestricted root login (`system/sshd.nix`)
+- Firewall disabled on all hosts (`networking.firewall.enable = false`)
+- Promtail ships logs to Loki over plain HTTP
+- Loki has no authentication (`auth_enabled = false`)
+- AppRole secret-IDs never expire (`secret_id_ttl = 0`)
+- Vault TLS verification disabled by default (`skipTlsVerify = true`)
+- Audit logging exists (`common/ssh-audit.nix`) but is not applied globally
+- Alert rules focus on availability; there is no security event detection
+
+## Priority Matrix
+
+| Issue | Severity | Effort | Priority |
+|-------|----------|--------|----------|
+| SSH password auth | High | Low | **P1** |
+| Firewall disabled | High | Medium | **P1** |
+| Promtail HTTP (no TLS) | High | Medium | **P2** |
+| No security alerting | Medium | Low | **P2** |
+| Audit logging not global | Low | Low | **P2** |
+| Loki no auth | Medium | Medium | **P3** |
+| Secret-ID TTL | Medium | Medium | **P3** |
+| Vault skipTlsVerify | Medium | Low | **P3** |
+
+## Phase 1: Quick Wins (P1)
+
+### 1.1 SSH Hardening
+
+Edit `system/sshd.nix`:
+
+```nix
+services.openssh = {
+  enable = true;
+  settings = {
+    PermitRootLogin = "prohibit-password"; # Key-only root login
+    PasswordAuthentication = false;
+    KbdInteractiveAuthentication = false;
+  };
+};
+```
+
+**Prerequisite:** Verify that all hosts have SSH keys deployed for root.
+
+### 1.2 Enable Firewall
+
+Create `system/firewall.nix` with a default-deny policy:
+
+```nix
+{ ... }: {
+  networking.firewall.enable = true;
+
+  # Use openssh's built-in firewall integration
+  # (already the default when openssh is enabled; set explicitly for clarity)
+  services.openssh.openFirewall = true;
+}
+```
+
+**Useful firewall options:**
+
+| Option | Description |
+|--------|-------------|
+| `networking.firewall.trustedInterfaces` | Accept all traffic from these interfaces (e.g., `[ "wg0" ]`; loopback is always accepted) |
+| `networking.firewall.interfaces.<name>.allowedTCPPorts` | Per-interface port rules |
+| `networking.firewall.extraInputRules` | Custom nftables rules for complex filtering (requires `networking.nftables.enable = true`) |
+
+**Network range restrictions:** Consider restricting SSH to the infrastructure subnet (`10.69.13.0/24`) with `extraInputRules` for defense in depth. However, this adds complexity and may not be necessary given the trusted network model.
+
+#### Per-Interface Rules (http-proxy WireGuard)
+
+The `http-proxy` host has a WireGuard interface (`wg0`) that may need different rules than the LAN interface. Use `networking.firewall.interfaces` to apply per-interface policies:
+
+```nix
+# Example: http-proxy with different rules per interface
+networking.firewall = {
+  enable = true;
+
+  # Default: only SSH (via openFirewall)
+  allowedTCPPorts = [ ];
+
+  # LAN interface: allow HTTP/HTTPS
+  interfaces.ens18 = {
+    allowedTCPPorts = [ 80 443 ];
+  };
+
+  # WireGuard interface: restrict to specific services or trust fully
+  interfaces.wg0 = {
+    allowedTCPPorts = [ 80 443 ];
+    # Or, if wg0 is fully trusted, drop this block and set
+    # networking.firewall.trustedInterfaces = [ "wg0" ];
+  };
+};
+```
+
+**TODO:** Investigate current WireGuard usage on http-proxy to determine appropriate rules.
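+
+The subnet restriction suggested under **Network range restrictions** could look roughly like the sketch below. It assumes three things. First, the host switches to the nftables firewall backend, which `extraInputRules` requires. That switch also means any existing iptables `networking.firewall.extraCommands` would need porting. Second, every admin workstation and jump host sits inside `10.69.13.0/24`. Third, SSH does not need to be reachable from other sources such as `wg0`. If that third assumption is wrong, add those sources before deploying, or SSH access from them will be lost.
+
+```nix
+{ ... }: {
+  # extraInputRules is only applied by the nftables-based firewall
+  networking.nftables.enable = true;
+  networking.firewall.enable = true;
+
+  # Stop openssh from adding port 22 to the global allowedTCPPorts list,
+  # which would accept SSH from any source address
+  services.openssh.openFirewall = false;
+
+  # Accept SSH only from the infrastructure subnet (assumed to cover all admin hosts)
+  networking.firewall.extraInputRules = ''
+    ip saddr 10.69.13.0/24 tcp dport 22 accept
+  '';
+}
+```
+
+The explicit nftables rule is needed because `allowedTCPPorts` has no notion of source address. Any port listed there is open to every sender that can reach the interface.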
+
+Per host, open the additional ports each service needs on top of the shared default:
+
+| Host | Additional Ports (TCP unless noted) | Service |
+|------|-------------------------------------|---------|
+| ns1/ns2 | 53 (TCP/UDP) | DNS |
+| vault01 | 8200 | Vault API |
+| monitoring01 | 3100, 9090, 3000, 9093 | Loki, Prometheus, Grafana, Alertmanager |
+| http-proxy | 80, 443 | HTTP/HTTPS |
+| nats1 | 4222 | NATS clients |
+| ha1 | 1883, 8123 | MQTT, Home Assistant |
+| jelly01 | 8096 | Jellyfin |
+| nix-cache01 | 5000 | Nix binary cache |
+
+If Prometheus scrapes exporters on each host (e.g. node_exporter on 9100), those ports must be opened as well. The NixOS exporter modules provide an `openFirewall` option for this.
+
+## Phase 2: Logging & Detection (P2)
+
+### 2.1 Enable TLS for Promtail → Loki
+
+Update `system/monitoring/logs.nix`:
+
+```nix
+clients = [{
+  url = "https://monitoring01.home.2rjus.net:3100/loki/api/v1/push";
+  tls_config = {
+    ca_file = "/etc/ssl/certs/homelab-root-ca.pem";
+  };
+}];
+```
+
+Requires:
+- Configure Loki with a TLS certificate (use internal ACME)
+- Ensure all hosts trust the root CA (already done via `system/pki/root-ca.nix`)
+
+### 2.2 Security Alert Rules
+
+Add to `services/monitoring/rules.yml`:
+
+```yaml
+- name: security_rules
+  rules:
+    - alert: unusual_login_activity
+      # node_logind_sessions is a gauge, so use delta() rather than increase()
+      expr: sum by (instance) (delta(node_logind_sessions[5m])) > 20
+      for: 0m
+      labels:
+        severity: warning
+      annotations:
+        summary: "Unusual login activity on {{ $labels.instance }}"
+
+    - alert: vault_secret_fetch_failure
+      expr: increase(vault_secret_failures[5m]) > 5
+      for: 0m
+      labels:
+        severity: warning
+      annotations:
+        summary: "Vault secret fetch failures on {{ $labels.instance }}"
+```
+
+The `unusual_login_activity` rule needs node_exporter's logind collector. That collector is disabled by default; enable it via `services.prometheus.exporters.node.enabledCollectors = [ "logind" ];`. The rule counts logind sessions, not failed authentication attempts, which are covered by the Loki queries below.
+
+Also add Loki-based alerts for:
+- Failed SSH attempts: `{job="systemd-journal"} |= "Failed password"`. Once password auth is disabled, sshd no longer logs this string, so also match `"Invalid user"`.
+- sudo usage: `{job="systemd-journal"} |= "sudo"`
+
+### 2.3 Global Audit Logging
+
+Add a `../common/ssh-audit.nix` import to `system/default.nix`:
+
+```nix
+imports = [
+  # ... existing imports
+  ../common/ssh-audit.nix
+];
+```
+
+## Phase 3: Defense in Depth (P3)
+
+### 3.1 Loki Authentication
+
+Options:
+1. **Basic auth via reverse proxy** - Put Loki behind Caddy with auth
+2. **Loki multi-tenancy** - Enable `auth_enabled = true` and use tenant IDs (this only requires an `X-Scope-OrgID` header; Loki does not verify credentials itself)
+3. **Network isolation** - Bind Loki only to localhost and expose it via an authenticated proxy
+
+Recommendation: Option 1 (reverse proxy) is the simplest for a homelab.
+
+### 3.2 AppRole Secret Rotation
+
+Update `terraform/vault/approle.tf`:
+
+```hcl
+secret_id_ttl = 2592000 # 30 days
+```
+
+Document a manual rotation procedure, or implement automated rotation via the existing `restartTrigger` mechanism in `vault-secrets.nix`. Rotation must be in place before the TTL applies, because an expired secret-ID can no longer be used to log in to Vault.
+
+### 3.3 Enable Vault TLS Verification
+
+Change the default in `system/vault-secrets.nix`:
+
+```nix
+skipTlsVerify = mkOption {
+  type = types.bool;
+  default = false; # Changed from true
+};
+```
+
+**Prerequisite:** Verify that all hosts trust the internal CA that signed the Vault certificate.
+
+## Implementation Order
+
+1. **Test on test-tier first** - Deploy phases 1-2 to testvm01/02/03
+2. **Validate SSH access** - Ensure key-based login works before disabling passwords
+3. **Document firewall ports** - Create a reference of ports per host before enabling the firewall
+4. **Phased prod rollout** - Deploy to prod hosts one at a time and verify each
+
+## Open Questions
+
+- [ ] Do all hosts have SSH keys configured for root access?
+- [ ] Should firewall rules be per-host or use a central definition with roles?
+- [ ] Should Loki authentication use the existing Kanidm setup?
+
+**Resolved:** Password-based SSH access for recovery is not required. Most hosts have console access through Proxmox or physical access, which provides an out-of-band recovery path if SSH keys fail.
+
+## Notes
+
+- Firewall changes are the highest risk - test thoroughly on test-tier
+- SSH hardening must not lock out access - verify keys first
+- Consider creating a "break glass" procedure for emergency access if keys fail