Based on security review findings, covering SSH hardening, firewall enablement, log transport TLS, security alerting, and secrets management. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Security Hardening Plan
Overview
Address security gaps identified in infrastructure review. Focus areas: SSH hardening, network security, logging improvements, and secrets management.
Current State
- SSH allows password auth and unrestricted root login (system/sshd.nix)
- Firewall disabled on all hosts (networking.firewall.enable = false)
- Promtail ships logs over HTTP to Loki
- Loki has no authentication (auth_enabled = false)
- AppRole secret-IDs never expire (secret_id_ttl = 0)
- Vault TLS verification disabled by default (skipTlsVerify = true)
- Audit logging exists (common/ssh-audit.nix) but not applied globally
- Alert rules focus on availability, no security event detection
Priority Matrix
| Issue | Severity | Effort | Priority |
|---|---|---|---|
| SSH password auth | High | Low | P1 |
| Firewall disabled | High | Medium | P1 |
| Promtail HTTP (no TLS) | High | Medium | P2 |
| No security alerting | Medium | Low | P2 |
| Audit logging not global | Low | Low | P2 |
| Loki no auth | Medium | Medium | P3 |
| Secret-ID TTL | Medium | Medium | P3 |
| Vault skipTlsVerify | Medium | Low | P3 |
Phase 1: Quick Wins (P1)
1.1 SSH Hardening
Edit system/sshd.nix:
services.openssh = {
  enable = true;
  settings = {
    PermitRootLogin = "prohibit-password"; # Key-only root login
    PasswordAuthentication = false;
    KbdInteractiveAuthentication = false;
  };
};
Prerequisite: Verify all hosts have SSH keys deployed for root.
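One way to make this prerequisite self-enforcing is a build-time assertion, so a host without a root key fails evaluation instead of locking itself out. The following is a minimal sketch, assuming root keys are declared through users.users.root.openssh.authorizedKeys.keys. Keys deployed another way (keyFiles, or out of band) will not be seen by this check. The key shown is a placeholder.

```nix
{ config, ... }: {
  # Placeholder key for illustration only; substitute the real admin key(s).
  users.users.root.openssh.authorizedKeys.keys = [
    "ssh-ed25519 AAAA...placeholder admin@example"
  ];

  # Fail the build if password auth is off and no root key is declared.
  assertions = [{
    assertion =
      config.services.openssh.settings.PasswordAuthentication
      || config.users.users.root.openssh.authorizedKeys.keys != [ ];
    message = "SSH password auth is disabled but root has no authorized keys.";
  }];
}
```

This only proves that a key is declared, not that the matching private key is still available, so a manual login test per host remains worthwhile.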
1.2 Enable Firewall
Create system/firewall.nix with default deny policy:
{ ... }: {
  networking.firewall.enable = true;
  # Use openssh's built-in firewall integration
  services.openssh.openFirewall = true;
}
Useful firewall options:
| Option | Description |
|---|---|
| networking.firewall.trustedInterfaces | Accept all traffic from these interfaces (e.g., [ "lo" ]) |
| networking.firewall.interfaces.<name>.allowedTCPPorts | Per-interface port rules |
| networking.firewall.extraInputRules | Custom nftables rules for complex filtering (only applies with networking.nftables.enable = true) |
Network range restrictions: Consider restricting SSH to the infrastructure subnet (10.69.13.0/24) using extraInputRules for defense in depth. However, this adds complexity and may not be necessary given the trusted network model.
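If the subnet restriction is adopted, it could look roughly like the sketch below. It assumes sshd listens on the default port 22 and that the hosts can switch to the nftables-based firewall. The mkForce is only needed because system/firewall.nix sets openFirewall = true.

```nix
{ lib, ... }: {
  # extraInputRules only takes effect with the nftables-based firewall.
  networking.nftables.enable = true;

  # Stop opening port 22 to every source; mkForce overrides the
  # openFirewall = true set in system/firewall.nix.
  services.openssh.openFirewall = lib.mkForce false;

  networking.firewall = {
    enable = true;
    # Accept SSH only from the infrastructure subnet.
    extraInputRules = ''
      ip saddr 10.69.13.0/24 tcp dport 22 accept
    '';
  };
}
```

A host managed from outside 10.69.13.0/24 (for example over WireGuard) would need an additional rule, which is part of the complexity noted above.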
Per-Interface Rules (http-proxy WireGuard)
The http-proxy host has a WireGuard interface (wg0) that may need different rules than the LAN interface. Use networking.firewall.interfaces to apply per-interface policies:
# Example: http-proxy with different rules per interface
networking.firewall = {
  enable = true;
  # Default: only SSH (via openFirewall)
  allowedTCPPorts = [ ];
  # LAN interface: allow HTTP/HTTPS
  interfaces.ens18 = {
    allowedTCPPorts = [ 80 443 ];
  };
  # WireGuard interface: restrict to specific services or trust fully
  interfaces.wg0 = {
    allowedTCPPorts = [ 80 443 ];
    # Or, if fully trusted, drop this block and set
    # networking.firewall.trustedInterfaces = [ "wg0" ] instead
  };
};
TODO: Investigate current WireGuard usage on http-proxy to determine appropriate rules.
Then per-host, open required ports:
| Host | Additional Ports |
|---|---|
| ns1/ns2 | 53 (TCP/UDP) |
| vault01 | 8200 |
| monitoring01 | 3100, 9090, 3000, 9093 |
| http-proxy | 80, 443 |
| nats1 | 4222 |
| ha1 | 1883, 8123 |
| jelly01 | 8096 |
| nix-cache01 | 5000 |
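The table can be kept in one place and applied by hostname, which also bears on the open question about central versus per-host rules. This is a sketch only. It assumes each host's networking.hostName matches the name in the table, and that every port other than DNS is TCP (the table does not say). The hostPorts name is made up for illustration.

```nix
{ config, ... }:
let
  # Ports copied from the table above. TCP is assumed unless stated.
  hostPorts = {
    ns1 = { tcp = [ 53 ]; udp = [ 53 ]; };
    ns2 = { tcp = [ 53 ]; udp = [ 53 ]; };
    vault01 = { tcp = [ 8200 ]; };
    monitoring01 = { tcp = [ 3100 9090 3000 9093 ]; };
    "http-proxy" = { tcp = [ 80 443 ]; };
    nats1 = { tcp = [ 4222 ]; };
    ha1 = { tcp = [ 1883 8123 ]; };
    jelly01 = { tcp = [ 8096 ]; };
    "nix-cache01" = { tcp = [ 5000 ]; };
  };

  # Hosts missing from the map get no additional ports.
  ports = hostPorts.${config.networking.hostName} or { };
in {
  networking.firewall.allowedTCPPorts = ports.tcp or [ ];
  networking.firewall.allowedUDPPorts = ports.udp or [ ];
}
```

Setting the ports next to each service definition is the alternative. The central map is easier to audit, while per-service settings are less likely to go stale when a service moves.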
Phase 2: Logging & Detection (P2)
2.1 Enable TLS for Promtail → Loki
Update system/monitoring/logs.nix:
clients = [{
  url = "https://monitoring01.home.2rjus.net:3100/loki/api/v1/push";
  tls_config = {
    ca_file = "/etc/ssl/certs/homelab-root-ca.pem";
  };
}];
Requires:
- Configure Loki with TLS certificate (use internal ACME)
- Ensure all hosts trust root CA (already done via system/pki/root-ca.nix)
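On the Loki side, the first requirement might be met along these lines. This sketch assumes Loki is configured through services.loki.configuration, that the internal ACME setup already issues a certificate named monitoring01.home.2rjus.net via security.acme, and that Loki terminates TLS itself rather than sitting behind a proxy. None of this is confirmed by the current configuration.

```nix
{ config, ... }:
let
  # Assumed certificate name; it must match an existing security.acme.certs entry.
  certName = "monitoring01.home.2rjus.net";
  certDir = config.security.acme.certs.${certName}.directory;
in {
  # Let the Loki service user read the issued key material.
  security.acme.certs.${certName}.group = config.services.loki.group;

  # Serve Loki's HTTP API (including /loki/api/v1/push) over TLS.
  services.loki.configuration.server.http_tls_config = {
    cert_file = "${certDir}/fullchain.pem";
    key_file = "${certDir}/key.pem";
  };
}
```

Anything else that queries Loki on port 3100 over plain HTTP, such as a Grafana datasource, would need its URL switched to https at the same time.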
2.2 Security Alert Rules
Add to services/monitoring/rules.yml:
- name: security_rules
  rules:
    - alert: ssh_auth_failures
      expr: increase(node_logind_sessions_total[5m]) > 20
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: "Unusual login activity on {{ $labels.instance }}"
    - alert: vault_secret_fetch_failure
      expr: increase(vault_secret_failures[5m]) > 5
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: "Vault secret fetch failures on {{ $labels.instance }}"
Also add Loki-based alerts for:
- Failed SSH attempts: {job="systemd-journal"} |= "Failed password"
- sudo usage: {job="systemd-journal"} |= "sudo"
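The two queries above could be turned into Loki ruler alerts roughly as follows. This is a sketch under several assumptions: the thresholds, alert names, and the /etc path are placeholders, and wiring the generated file into Loki's ruler storage depends on how the ruler is configured on monitoring01.

```nix
{ pkgs, ... }:
let
  yaml = pkgs.formats.yaml { };

  # Same structure as Prometheus rule groups, but expr is LogQL.
  securityLogRules = {
    groups = [{
      name = "security_log_rules";
      rules = [
        {
          alert = "ssh_failed_password";
          # Placeholder threshold: more than 20 matching lines in 5 minutes.
          expr = ''sum(count_over_time({job="systemd-journal"} |= "Failed password" [5m])) > 20'';
          for = "0m";
          labels.severity = "warning";
          annotations.summary = "Repeated failed SSH password attempts";
        }
        {
          alert = "sudo_usage";
          # Fires on any log line containing "sudo" in the window.
          expr = ''sum(count_over_time({job="systemd-journal"} |= "sudo" [5m])) > 0'';
          for = "0m";
          labels.severity = "info";
          annotations.summary = "sudo was used";
        }
      ];
    }];
  };
in {
  # Hypothetical location; it must match the ruler's configured rules directory.
  environment.etc."loki/rules/security.yaml".source =
    yaml.generate "security.yaml" securityLogRules;
}
```

The plain "sudo" match is broad and will also catch unrelated lines that mention the word, so a narrower filter is likely needed before this is useful as an alert.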
2.3 Global Audit Logging
Import common/ssh-audit.nix from system/default.nix (the path is relative to system/):
imports = [
  # ... existing imports
  ../common/ssh-audit.nix
];
Phase 3: Defense in Depth (P3)
3.1 Loki Authentication
Options:
1. Basic auth via reverse proxy - Put Loki behind Caddy with auth
2. Loki multi-tenancy - Enable auth_enabled = true and use tenant IDs
3. Network isolation - Bind Loki only to localhost, expose via authenticated proxy
Recommendation: Option 1 (reverse proxy) is simplest for homelab.
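A rough shape for the reverse-proxy option, combined with binding Loki to localhost, is sketched below. The hostname loki.home.2rjus.net, the promtail username, and the hash are all placeholders, and the sketch assumes Caddy runs on the same host as Loki.

```nix
{ ... }: {
  # Keep Loki off the network; only the local proxy can reach it.
  services.loki.configuration.server.http_listen_address = "127.0.0.1";

  services.caddy = {
    enable = true;
    # Hypothetical hostname for the authenticated endpoint.
    virtualHosts."loki.home.2rjus.net".extraConfig = ''
      # Directive is spelled "basicauth" on Caddy releases before 2.8.
      basic_auth {
        # Placeholder: generate the hash with `caddy hash-password`.
        promtail REPLACE_WITH_BCRYPT_HASH
      }
      reverse_proxy 127.0.0.1:3100
    '';
  };
}
```

Promtail would then push to the proxy URL and supply credentials through the basic_auth block of its client config (username plus password_file). With this layout Caddy terminates TLS, so Loki itself would not need its own certificate.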
3.2 AppRole Secret Rotation
Update terraform/vault/approle.tf:
secret_id_ttl = 2592000 # 30 days
Add documentation for manual rotation procedure or implement automated rotation via the existing restartTrigger mechanism in vault-secrets.nix.
3.3 Enable Vault TLS Verification
Change default in system/vault-secrets.nix:
skipTlsVerify = mkOption {
  type = types.bool;
  default = false; # Changed from true
};
Prerequisite: Verify all hosts trust the internal CA that signed the Vault certificate.
Implementation Order
- Test on test-tier first - Deploy phases 1-2 to testvm01/02/03
- Validate SSH access - Ensure key-based login works before disabling passwords
- Document firewall ports - Create reference of ports per host before enabling
- Phased prod rollout - Deploy to prod hosts one at a time, verify each
Open Questions
- Do all hosts have SSH keys configured for root access?
- Should firewall rules be per-host or use a central definition with roles?
- Should Loki authentication use the existing Kanidm setup?
Resolved: Password-based SSH access for recovery is not required - most hosts have console access through Proxmox or physical access, which provides an out-of-band recovery path if SSH keys fail.
Notes
- Firewall changes are the highest risk - test thoroughly on test-tier
- SSH hardening must not lock out access - verify keys first
- Consider creating a "break glass" procedure for emergency access if keys fail