# Security Hardening Plan

## Overview

Address security gaps identified in infrastructure review. Focus areas: SSH hardening, network security, logging improvements, and secrets management.

## Current State

- SSH allows password auth and unrestricted root login (`system/sshd.nix`)
- Firewall disabled on all hosts (`networking.firewall.enable = false`)
- Promtail ships logs over HTTP to Loki
- Loki has no authentication (`auth_enabled = false`)
- AppRole secret-IDs never expire (`secret_id_ttl = 0`)
- Vault TLS verification disabled by default (`skipTlsVerify = true`)
- Audit logging exists (`common/ssh-audit.nix`) but not applied globally
- Alert rules focus on availability, no security event detection

## Priority Matrix

| Issue | Severity | Effort | Priority |
|-------|----------|--------|----------|
| SSH password auth | High | Low | **P1** |
| Firewall disabled | High | Medium | **P1** |
| Promtail HTTP (no TLS) | High | Medium | **P2** |
| No security alerting | Medium | Low | **P2** |
| Audit logging not global | Low | Low | **P2** |
| Loki no auth | Medium | Medium | **P3** |
| Secret-ID TTL | Medium | Medium | **P3** |
| Vault skipTlsVerify | Medium | Low | **P3** |

## Phase 1: Quick Wins (P1)

### 1.1 SSH Hardening

Edit `system/sshd.nix`:

```nix
services.openssh = {
  enable = true;
  settings = {
    PermitRootLogin = "prohibit-password";  # Key-only root login
    PasswordAuthentication = false;
    KbdInteractiveAuthentication = false;
  };
};
```

**Prerequisite:** Verify all hosts have SSH keys deployed for root.

### 1.2 Enable Firewall

Create `system/firewall.nix` with default deny policy:

```nix
{ ...
}: {
  networking.firewall.enable = true;

  # Use openssh's built-in firewall integration
  services.openssh.openFirewall = true;
}
```

**Useful firewall options:**

| Option | Description |
|--------|-------------|
| `networking.firewall.trustedInterfaces` | Accept all traffic from these interfaces (e.g., `[ "lo" ]`) |
| `networking.firewall.interfaces.<name>.allowedTCPPorts` | Per-interface port rules |
| `networking.firewall.extraInputRules` | Custom nftables rules (for complex filtering); only honored by the nftables-based firewall (`networking.nftables.enable = true`) |

**Network range restrictions:** Consider restricting SSH to the infrastructure subnet (`10.69.13.0/24`) using `extraInputRules` for defense in depth. However, this adds complexity and may not be necessary given the trusted network model.

#### Per-Interface Rules (http-proxy WireGuard)

The `http-proxy` host has a WireGuard interface (`wg0`) that may need different rules than the LAN interface. Use `networking.firewall.interfaces` to apply per-interface policies:

```nix
# Example: http-proxy with different rules per interface
networking.firewall = {
  enable = true;

  # Default: only SSH (via openFirewall)
  allowedTCPPorts = [ ];

  # LAN interface: allow HTTP/HTTPS
  interfaces.ens18 = {
    allowedTCPPorts = [ 80 443 ];
  };

  # WireGuard interface: restrict to specific services or trust fully
  interfaces.wg0 = {
    allowedTCPPorts = [ 80 443 ];
    # Or drop this block and set networking.firewall.trustedInterfaces = [ "wg0" ] if fully trusted
  };
};
```

**TODO:** Investigate current WireGuard usage on http-proxy to determine appropriate rules.

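
The subnet restriction mentioned under **Network range restrictions** could be sketched as below. This is an illustration under assumptions, not a tested module: it assumes the host can switch to the nftables-based firewall, and it turns off `services.openssh.openFirewall`, which differs from the base `system/firewall.nix` shown earlier.

```nix
{ ... }:
{
  # extraInputRules is only honored by the nftables-based firewall.
  networking.nftables.enable = true;

  networking.firewall = {
    enable = true;
    # Accept SSH only from the infrastructure subnet named in this plan.
    extraInputRules = ''
      ip saddr 10.69.13.0/24 tcp dport 22 accept
    '';
  };

  # Assumption: stop openssh from opening port 22 to every source address,
  # otherwise the rule above has no restricting effect.
  services.openssh.openFirewall = false;
}
```

With this variant, a host reached from outside `10.69.13.0/24` (for example over `wg0`) would lose SSH unless its source range is added to the rule, so it is worth trying on the test tier first.
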
With the base firewall module in place, open the required ports per host:

| Host | Additional Ports |
|------|------------------|
| ns1/ns2 | 53 (TCP/UDP) |
| vault01 | 8200 |
| monitoring01 | 3100, 9090, 3000, 9093 |
| http-proxy | 80, 443 |
| nats1 | 4222 |
| ha1 | 1883, 8123 |
| jelly01 | 8096 |
| nix-cache01 | 5000 |

## Phase 2: Logging & Detection (P2)

### 2.1 Enable TLS for Promtail → Loki

Update `system/monitoring/logs.nix`:

```nix
clients = [{
  url = "https://monitoring01.home.2rjus.net:3100/loki/api/v1/push";
  tls_config = {
    ca_file = "/etc/ssl/certs/homelab-root-ca.pem";
  };
}];
```

Requires:

- Configure Loki with TLS certificate (use internal ACME)
- Ensure all hosts trust root CA (already done via `system/pki/root-ca.nix`)

### 2.2 Security Alert Rules

Add to `services/monitoring/rules.yml`:

```yaml
- name: security_rules
  rules:
    # Uses logind session counts as a proxy for login activity, not failed
    # authentication attempts. Verify the metric name is actually exported.
    - alert: ssh_auth_failures
      expr: increase(node_logind_sessions_total[5m]) > 20
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: "Unusual login activity on {{ $labels.instance }}"

    # Verify the metric name against what the vault-secrets tooling exposes.
    - alert: vault_secret_fetch_failure
      expr: increase(vault_secret_failures[5m]) > 5
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: "Vault secret fetch failures on {{ $labels.instance }}"
```

Also add Loki-based alerts for:

- Failed SSH attempts: `{job="systemd-journal"} |= "Failed password"`
- sudo usage: `{job="systemd-journal"} |= "sudo"`

### 2.3 Global Audit Logging

Import `common/ssh-audit.nix` from `system/default.nix`:

```nix
imports = [
  # ... existing imports
  ../common/ssh-audit.nix
];
```

## Phase 3: Defense in Depth (P3)

### 3.1 Loki Authentication

Options:

1. **Basic auth via reverse proxy** - Put Loki behind Caddy with auth
2. **Loki multi-tenancy** - Enable `auth_enabled = true` and use tenant IDs
3. **Network isolation** - Bind Loki only to localhost, expose via authenticated proxy

Recommendation: Option 1 (reverse proxy) is simplest for homelab.

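
A minimal sketch of Option 1 follows, combined with the localhost binding from Option 3. Everything specific here is an assumption for illustration: the hostname `loki.example.internal`, the `promtail` username, the `LOKI_BASIC_AUTH_HASH` variable, and the `/run/secrets/caddy-env` path are hypothetical, and certificate issuance for the proxy is left out.

```nix
{ ... }:
{
  # Keep Loki off the network; only the local proxy can reach it.
  services.loki.configuration.server.http_listen_address = "127.0.0.1";

  services.caddy = {
    enable = true;
    # Hypothetical hostname; substitute the name Promtail should push to.
    virtualHosts."loki.example.internal".extraConfig = ''
      # This directive is spelled `basicauth` on Caddy releases before 2.8.
      basic_auth {
        # Username, then a hash produced by `caddy hash-password`.
        promtail {$LOKI_BASIC_AUTH_HASH}
      }
      reverse_proxy 127.0.0.1:3100
    '';
  };

  # Hypothetical env file that defines LOKI_BASIC_AUTH_HASH for Caddy.
  systemd.services.caddy.serviceConfig.EnvironmentFile = "/run/secrets/caddy-env";
}
```

If this route is taken, the Promtail `clients` entry would also need a `basic_auth` block with matching credentials, and its `url` would point at the proxy rather than at port 3100 directly.
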
### 3.2 AppRole Secret Rotation

Update `terraform/vault/approle.tf`:

```hcl
secret_id_ttl = 2592000  # 30 days
```

Add documentation for manual rotation procedure or implement automated rotation via the existing `restartTrigger` mechanism in `vault-secrets.nix`.

### 3.3 Enable Vault TLS Verification

Change default in `system/vault-secrets.nix`:

```nix
skipTlsVerify = mkOption {
  type = types.bool;
  default = false;  # Changed from true
};
```

**Prerequisite:** Verify all hosts trust the internal CA that signed the Vault certificate.

## Implementation Order

1. **Test on test-tier first** - Deploy phases 1-2 to testvm01/02/03
2. **Validate SSH access** - Ensure key-based login works before disabling passwords
3. **Document firewall ports** - Create reference of ports per host before enabling
4. **Phase prod rollout** - Deploy to prod hosts one at a time, verify each

## Open Questions

- [ ] Do all hosts have SSH keys configured for root access?
- [ ] Should firewall rules be per-host or use a central definition with roles?
- [ ] Should Loki authentication use the existing Kanidm setup?

**Resolved:** Password-based SSH access for recovery is not required - most hosts have console access through Proxmox or physical access, which provides an out-of-band recovery path if SSH keys fail.

## Notes

- Firewall changes are the highest risk - test thoroughly on test-tier
- SSH hardening must not lock out access - verify keys first
- Consider creating a "break glass" procedure for emergency access if keys fail
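
One way to back the "verify keys first" note with a build-time guard is a NixOS assertion, sketched below. It only sees keys declared in the Nix configuration, so keys placed by hand in `/root/.ssh/authorized_keys` are not counted; treat it as a partial check that complements a manual login test, not a replacement for one.

```nix
{ config, ... }:
let
  sshd = config.services.openssh.settings;
  rootKeys = config.users.users.root.openssh.authorizedKeys;
  # True when at least one key or key file is declared for root in Nix.
  hasRootKeys = rootKeys.keys != [ ] || rootKeys.keyFiles != [ ];
in
{
  assertions = [
    {
      # Fail evaluation if password auth is off and root has no declared keys.
      assertion = sshd.PasswordAuthentication != false || hasRootKeys;
      message = "Password auth is disabled but root has no authorized SSH keys declared; this host could be locked out.";
    }
  ];
}
```

Importing a module like this next to `system/sshd.nix` would make a host without declared root keys fail at evaluation time, before the change is deployed.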