
Based on security review findings, covering SSH hardening, firewall
enablement, log transport TLS, security alerting, and secrets management.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-08 05:26:15 +01:00


Security Hardening Plan

Overview

Address security gaps identified in infrastructure review. Focus areas: SSH hardening, network security, logging improvements, and secrets management.

Current State

SSH allows password auth and unrestricted root login (system/sshd.nix)
Firewall disabled on all hosts (networking.firewall.enable = false)
Promtail ships logs over HTTP to Loki
Loki has no authentication (auth_enabled = false)
AppRole secret-IDs never expire (secret_id_ttl = 0)
Vault TLS verification disabled by default (skipTlsVerify = true)
Audit logging exists (common/ssh-audit.nix) but is not applied globally
Alert rules cover availability only; there is no security event detection

Priority Matrix

Issue	Severity	Effort	Priority
SSH password auth	High	Low	P1
Firewall disabled	High	Medium	P1
Promtail HTTP (no TLS)	High	Medium	P2
No security alerting	Medium	Low	P2
Audit logging not global	Low	Low	P2
Loki no auth	Medium	Medium	P3
Secret-ID TTL	Medium	Medium	P3
Vault skipTlsVerify	Medium	Low	P3

Phase 1: Quick Wins (P1)

1.1 SSH Hardening

Edit system/sshd.nix:

services.openssh = {
  enable = true;
  settings = {
    PermitRootLogin = "prohibit-password";  # Key-only root login
    PasswordAuthentication = false;
    KbdInteractiveAuthentication = false;
  };
};

Prerequisite: Verify all hosts have SSH keys deployed for root.
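
The prerequisite could also be checked at build time rather than by hand. The following is a minimal sketch, assuming root's keys are managed declaratively through `users.users.root.openssh.authorizedKeys.keys`; the key string is a placeholder, and hosts that supply keys another way (for example via `keyFiles`) would need a different check:

```nix
{ config, ... }: {
  users.users.root.openssh.authorizedKeys.keys = [
    # Placeholder, not a real key: replace with the admin public key(s)
    "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA... admin@example"
  ];

  # Fail evaluation if a host would disable passwords without any root key
  assertions = [{
    assertion = config.users.users.root.openssh.authorizedKeys.keys != [ ];
    message = "PasswordAuthentication is disabled but root has no authorized SSH keys";
  }];
}
```

With an assertion like this, a host without a root key fails to evaluate instead of locking out SSH access after deployment.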

1.2 Enable Firewall

Create system/firewall.nix with default deny policy:

{ ... }: {
  networking.firewall.enable = true;

  # Use openssh's built-in firewall integration
  services.openssh.openFirewall = true;
}

Useful firewall options:

Option	Description
`networking.firewall.trustedInterfaces`	Accept all traffic from these interfaces (e.g., `[ "lo" ]`)
`networking.firewall.interfaces.<name>.allowedTCPPorts`	Per-interface port rules
`networking.firewall.extraInputRules`	Custom nftables rules for complex filtering (only takes effect with the nftables backend, i.e. `networking.nftables.enable = true`)

Network range restrictions: Consider restricting SSH to the infrastructure subnet (10.69.13.0/24) using extraInputRules for defense in depth. However, this adds complexity and may not be necessary given the trusted network model.
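
If the subnet restriction is adopted, it might look like the sketch below. It assumes the hosts switch to the nftables firewall backend, which `extraInputRules` requires. It also turns off `openFirewall` so that port 22 is not opened to every source address. Treat it as an illustration, not a tested configuration:

```nix
{ ... }: {
  # extraInputRules is only honoured by the nftables-based firewall
  networking.nftables.enable = true;

  networking.firewall = {
    enable = true;
    # Accept SSH only from the infrastructure subnet
    extraInputRules = ''
      ip saddr 10.69.13.0/24 tcp dport 22 accept
    '';
  };

  # Do not open port 22 globally; the rule above handles it
  services.openssh.openFirewall = false;
}
```

Switching backends affects every firewall rule on the host, so this is best tried on the test tier first.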

Per-Interface Rules (http-proxy WireGuard)

The http-proxy host has a WireGuard interface (wg0) that may need different rules than the LAN interface. Use networking.firewall.interfaces to apply per-interface policies:

# Example: http-proxy with different rules per interface
networking.firewall = {
  enable = true;

  # Default: only SSH (via openFirewall)
  allowedTCPPorts = [ ];

  # LAN interface: allow HTTP/HTTPS
  interfaces.ens18 = {
    allowedTCPPorts = [ 80 443 ];
  };

  # WireGuard interface: restrict to specific services or trust fully
  interfaces.wg0 = {
    allowedTCPPorts = [ 80 443 ];
    # Or, if wg0 is fully trusted, drop this block and set
    # networking.firewall.trustedInterfaces = [ "wg0" ] instead
  };
};

TODO: Investigate current WireGuard usage on http-proxy to determine appropriate rules.

Once the firewall is enabled, open the required ports on each host:

Host	Additional Ports
ns1/ns2	53 (TCP/UDP)
vault01	8200
monitoring01	3100, 9090, 3000, 9093
http-proxy	80, 443
nats1	4222
ha1	1883, 8123
jelly01	8096
nix-cache01	5000
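
As an illustration of a per-host entry from the table, a DNS host needs port 53 open for both protocols. This is a sketch only; where the snippet lives (a per-host module for ns1/ns2) is an assumption about the repository layout:

```nix
# Hypothetical per-host module for ns1/ns2
{ ... }: {
  networking.firewall = {
    allowedTCPPorts = [ 53 ];  # DNS over TCP (zone transfers, large responses)
    allowedUDPPorts = [ 53 ];  # regular DNS queries
  };
}
```

Because NixOS merges list options across modules, these ports are added to whatever system/firewall.nix already opens instead of replacing them.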

Phase 2: Logging & Detection (P2)

2.1 Enable TLS for Promtail → Loki

Update system/monitoring/logs.nix:

clients = [{
  url = "https://monitoring01.home.2rjus.net:3100/loki/api/v1/push";
  tls_config = {
    ca_file = "/etc/ssl/certs/homelab-root-ca.pem";
  };
}];

Requires:

Configure Loki with TLS certificate (use internal ACME)
Ensure all hosts trust root CA (already done via system/pki/root-ca.nix)
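
On the Loki side, the server block accepts an `http_tls_config` section with a certificate and key. The sketch below assumes a certificate for monitoring01.home.2rjus.net is already defined under `security.acme.certs` against the internal ACME CA, and that the Loki user can read the key (for example through the certificate's group); neither is shown here:

```nix
{ config, ... }:
let
  # Assumption: this certificate is configured elsewhere via security.acme
  cert = config.security.acme.certs."monitoring01.home.2rjus.net";
in {
  services.loki.configuration.server.http_tls_config = {
    cert_file = "${cert.directory}/fullchain.pem";
    key_file = "${cert.directory}/key.pem";
  };
}
```

Anything else that talks to Loki over plain HTTP on port 3100, such as a Grafana data source, would need its URL updated at the same time.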

2.2 Security Alert Rules

Add to services/monitoring/rules.yml:

- name: security_rules
  rules:
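    # NOTE: node_exporter's logind collector exports the gauge node_logind_sessions;
    # confirm a counter named node_logind_sessions_total exists before relying on this.
    # The expression tracks login sessions, not failed authentications.
    - alert: ssh_auth_failures
      expr: increase(node_logind_sessions_total[5m]) > 20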
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: "Unusual login activity on {{ $labels.instance }}"

    - alert: vault_secret_fetch_failure
      expr: increase(vault_secret_failures[5m]) > 5
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: "Vault secret fetch failures on {{ $labels.instance }}"

Also add Loki-based alerts for:

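Failed SSH attempts: {job="systemd-journal"} |= "Failed password" (once 1.1 disables password authentication, sshd should stop emitting this message, so consider also matching lines such as "Invalid user")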
sudo usage: {job="systemd-journal"} |= "sudo"

2.3 Global Audit Logging

Import common/ssh-audit.nix from system/default.nix (the path is relative to the system/ directory):

imports = [
  # ... existing imports
  ../common/ssh-audit.nix
];

Phase 3: Defense in Depth (P3)

3.1 Loki Authentication

Options:

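1. Basic auth via reverse proxy - Put Loki behind Caddy with auth
2. Loki multi-tenancy - Enable auth_enabled = true and use tenant IDs (this requires an X-Scope-OrgID header on each request but does not by itself authenticate clients)
3. Network isolation - Bind Loki only to localhost, expose via authenticated proxy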

Recommendation: Option 1 (reverse proxy) is simplest for homelab.
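
A rough sketch of the reverse-proxy option follows. The virtual host name, user name, and hash are placeholders; `basic_auth` is the directive name in recent Caddy releases (older ones spell it `basicauth`); and certificate issuance for the proxy is not shown:

```nix
{ ... }: {
  # Loki listens on loopback only; Caddy is the sole network entry point
  services.loki.configuration.server.http_listen_address = "127.0.0.1";

  services.caddy = {
    enable = true;
    # Hypothetical host name for the authenticated endpoint
    virtualHosts."loki.home.2rjus.net".extraConfig = ''
      basic_auth {
        # Placeholder user and bcrypt hash (see: caddy hash-password)
        promtail REPLACE_WITH_BCRYPT_HASH
      }
      reverse_proxy 127.0.0.1:3100
    '';
  };
}
```

Promtail clients would then need matching `basic_auth` credentials in their `clients` entry, ideally read from a file provisioned through the existing Vault secrets mechanism. In this layout Caddy terminates TLS, so Loki itself can keep serving plain HTTP on loopback.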

3.2 AppRole Secret Rotation

Update terraform/vault/approle.tf:

secret_id_ttl  = 2592000  # 30 days

Once secret-IDs expire, a host loses Vault access after 30 days unless its secret-ID is rotated. Either document a manual rotation procedure or implement automated rotation via the existing restartTrigger mechanism in vault-secrets.nix.

3.3 Enable Vault TLS Verification

Change default in system/vault-secrets.nix:

skipTlsVerify = mkOption {
  type = types.bool;
  default = false;  # Changed from true
};

Prerequisite: Verify all hosts trust the internal CA that signed the Vault certificate.

Implementation Order

1. Test on test-tier first - Deploy phases 1-2 to testvm01/02/03
2. Validate SSH access - Ensure key-based login works before disabling passwords
3. Document firewall ports - Create reference of ports per host before enabling
4. Phased prod rollout - Deploy to prod hosts one at a time, verifying each before moving on

Open Questions

Do all hosts have SSH keys configured for root access?
Should firewall rules be per-host or use a central definition with roles?
Should Loki authentication use the existing Kanidm setup?

Resolved: Password-based SSH access for recovery is not required - most hosts have console access through Proxmox or physical access, which provides an out-of-band recovery path if SSH keys fail.

Notes

Firewall changes are the highest risk - test thoroughly on test-tier
SSH hardening must not lock out access - verify keys first
Consider creating a "break glass" procedure for emergency access if keys fail
