# Homelab Infrastructure

This document describes the physical and virtual infrastructure components that support the NixOS-managed servers in this repository.

## Overview

The homelab consists of several core infrastructure components:
- **Proxmox VE** - Hypervisor hosting all NixOS VMs
- **TrueNAS** - Network storage and backup target
- **Ubiquiti EdgeRouter** - Primary router and gateway
- **Mikrotik Switch** - Core network switching

All NixOS configurations in this repository run as VMs on Proxmox and rely on these underlying infrastructure components.

## Network Topology

### Subnets

VLAN numbers match the third octet of the subnet's IP addresses.

TODO: VLAN naming is currently inconsistent across the router, switch, and Proxmox configurations. Standardize the VLAN names and update all device configs to use them.

- `10.69.8.x` - Kubernetes (no longer in use)
- `10.69.12.x` - Core services
- `10.69.13.x` - NixOS VMs and core services
- `10.69.30.x` - Client network 1
- `10.69.31.x` - Client network 2
- `10.69.99.x` - Management network

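The numbering rule can be sketched as a small helper. This is an illustration only: the function name `vlan_for_address` is hypothetical, and the `10.69.0.0/16` supernet is assumed from the NFS export range used elsewhere in this document. The two client networks on `10.69.30.x` and `10.69.31.x` hang off dedicated router ports, so the number returned for them is a subnet label rather than a tagged VLAN.

```python
import ipaddress

# Assumed supernet covering all internal subnets in this document.
HOMELAB_NET = ipaddress.ip_network("10.69.0.0/16")


def vlan_for_address(addr: str) -> int:
    """Return the VLAN number implied by an internal address (its third octet)."""
    ip = ipaddress.ip_address(addr)
    if ip not in HOMELAB_NET:
        raise ValueError(f"{addr} is outside {HOMELAB_NET}")
    # For an IPv4 address, packed[2] is the third octet as an int.
    return ip.packed[2]


print(vlan_for_address("10.69.13.5"))  # ns1 -> 13
print(vlan_for_address("10.69.99.2"))  # sw1 management IP -> 99
```
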
### Core Network Services

- **Gateway**: Web UI exposed on 10.69.10.1
- **DNS**: ns1 (10.69.13.5), ns2 (10.69.13.6)
- **Primary DNS Domain**: `home.2rjus.net`

## Hardware Components

### Proxmox Hypervisor

**Purpose**: Hosts all NixOS VMs defined in this repository

**Hardware**:
- CPU: AMD Ryzen 9 3900X 12-Core Processor
- RAM: 96GB (94Gi)
- Storage: 1TB NVMe SSD (nvme0n1)

**Management**:
- Web UI: `https://pve1.home.2rjus.net:8006`
- Cluster: Standalone
- Version: Proxmox VE 8.4.16 (kernel 6.8.12-18-pve)

**VM Provisioning**:
- Template VM: ID 9000 (built from `hosts/template2`)
- See `/terraform` directory for automated VM deployment using OpenTofu

**Storage**:
- ZFS pool: `rpool` on NVMe partition (nvme0n1p3)
- Total capacity: ~900GB (232GB used, 667GB available)
- Configuration: Single disk (no RAID)
- Scrub status: Last scrub completed successfully with 0 errors

**Networking**:
- Management interface: `vmbr0` - 10.69.12.75/24 (VLAN 12 - Core services)
- Physical interface: `enp9s0` (primary), `enp4s0` (unused)
- VM bridges:
  - `vmbr0` - Main bridge (bridged to enp9s0)
  - `vmbr0v8` - VLAN 8 (Kubernetes - deprecated)
  - `vmbr0v13` - VLAN 13 (NixOS VMs and core services)

### TrueNAS

**Purpose**: Network storage, backup target, media storage

**Hardware**:
- Model: Custom build
- CPU: AMD Ryzen 5 5600G with Radeon Graphics
- RAM: 32GB (31.2 GiB)
- Disks:
  - 2x Kingston SA400S37 240GB SSD (boot pool, mirrored)
  - 2x Seagate ST16000NE000 16TB HDD (hdd-pool mirror-0)
  - 2x WD WD80EFBX 8TB HDD (hdd-pool mirror-1)
  - 2x Seagate ST8000VN004 8TB HDD (hdd-pool mirror-2)
  - 1x NVMe 2TB (nvme-pool, no redundancy)

**Management**:
- Web UI: `https://nas.home.2rjus.net` (10.69.12.50)
- Hostname: `nas.home.2rjus.net`
- Version: TrueNAS-13.0-U6.1 (Core)

**Networking**:
- Primary interface: `mlxen0` - 10GbE (10Gbase-CX4) connected to sw1
- IP: 10.69.12.50/24 (VLAN 12 - Core services)

**ZFS Pools**:
- `boot-pool`: 206GB (mirrored SSDs) - 4% used
  - Mirror of 2x Kingston 240GB SSDs
  - Last scrub: No errors
- `hdd-pool`: 29.1TB total (three striped 2-way mirrors, 28.4TB used, 658GB free) - 97% capacity
  - mirror-0: 2x 16TB Seagate ST16000NE000
  - mirror-1: 2x 8TB WD WD80EFBX
  - mirror-2: 2x 8TB Seagate ST8000VN004
  - Last scrub: No errors
- `nvme-pool`: 1.81TB (single NVMe, 70.4GB used, 1.74TB free) - 3% capacity
  - Single NVMe drive, no redundancy
  - Last scrub: No errors

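As a rough sanity check on the `hdd-pool` size, the sketch below adds up one disk per mirror and converts from vendor terabytes to binary units. It assumes the reported 29.1TB is really TiB (as ZFS tools typically print) and ignores ZFS metadata and reserved space, so treat the result as an approximation. The function name is illustrative only.

```python
TB = 10**12   # vendor terabyte (decimal)
TIB = 2**40   # tebibyte (binary)


def striped_mirror_capacity_tib(mirrors_tb):
    """Approximate pool size for data striped across mirror vdevs.

    Each mirror contributes the size of its smallest disk; overhead
    such as metadata and reserved space is deliberately ignored.
    """
    usable_bytes = sum(min(disks) * TB for disks in mirrors_tb)
    return usable_bytes / TIB


# mirror-0, mirror-1, mirror-2 disk sizes in TB, as listed above
hdd_pool = [(16, 16), (8, 8), (8, 8)]
print(round(striped_mirror_capacity_tib(hdd_pool), 1))  # 29.1
```
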
**NFS Exports**:
- `/mnt/hdd-pool/media` - Media storage (exported to 10.69.0.0/16, used by Jellyfin)
- `/mnt/hdd-pool/virt/nfs-iso` - ISO storage for Proxmox
- `/mnt/hdd-pool/virt/kube-prod-pvc` - Kubernetes storage (deprecated)

**Jails**:
TrueNAS runs several FreeBSD jails for media management and backups:
- nzbget - Usenet downloader
- restic-rest - Restic REST server for backups
- radarr - Movie management
- sonarr - TV show management

### Ubiquiti EdgeRouter

**Purpose**: Primary router, gateway, firewall, inter-VLAN routing

**Model**: EdgeRouter X 5-Port

**Hardware**:
- Serial: F09FC20E1A4C

**Management**:
- SSH: `ssh ubnt@10.69.10.1`
- Web UI: `https://10.69.10.1`
- Version: EdgeOS v2.0.9-hotfix.6 (build 5574651, 12/30/22)

**WAN Connection**:
- Interface: eth0
- Public IP: 84.213.73.123/20
- Gateway: 84.213.64.1

**Interface Layout**:
- **eth0**: WAN (public IP)
- **eth1**: 10.69.31.1/24 - Client network 2
- **eth2**: Unused (down)
- **eth3**: 10.69.30.1/24 - Client network 1
- **eth4**: Trunk port to Mikrotik switch (carries all VLANs)
  - eth4.8: 10.69.8.1/24 - K8S (deprecated)
  - eth4.10: 10.69.10.1/24 - TRUSTED (management access)
  - eth4.12: 10.69.12.1/24 - SERVER (Proxmox, TrueNAS, core services)
  - eth4.13: 10.69.13.1/24 - SVC (NixOS VMs)
  - eth4.21: 10.69.21.1/24 - CLIENTS
  - eth4.22: 10.69.22.1/24 - WLAN (wireless clients)
  - eth4.23: 10.69.23.1/24 - IOT
  - eth4.99: 10.69.99.1/24 - MGMT (device management)

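Because each `eth4.N` sub-interface should carry subnet `10.69.N.0/24`, the layout above can be checked mechanically. The sketch below is only an illustration: the dictionary is copied by hand from the list above, and the helper name `mismatched_subinterfaces` is hypothetical. It does not read anything from the router.

```python
import ipaddress

# Sub-interface -> address, copied from the eth4 entries above.
SUBINTERFACES = {
    "eth4.8": "10.69.8.1/24",
    "eth4.10": "10.69.10.1/24",
    "eth4.12": "10.69.12.1/24",
    "eth4.13": "10.69.13.1/24",
    "eth4.21": "10.69.21.1/24",
    "eth4.22": "10.69.22.1/24",
    "eth4.23": "10.69.23.1/24",
    "eth4.99": "10.69.99.1/24",
}


def mismatched_subinterfaces(layout):
    """Return sub-interfaces whose VLAN ID differs from the third octet."""
    bad = []
    for name, cidr in layout.items():
        vlan_id = int(name.split(".")[1])
        third_octet = ipaddress.ip_interface(cidr).ip.packed[2]
        if vlan_id != third_octet:
            bad.append(name)
    return bad


print(mismatched_subinterfaces(SUBINTERFACES))  # []
```
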
**Routing**:
- Default route: 0.0.0.0/0 via 84.213.64.1 (WAN gateway)
- Static route: 192.168.100.0/24 via eth0
- All internal VLANs directly connected

**DHCP Servers**:
Active DHCP pools:
- dhcp-8: VLAN 8 (K8S) - 91 addresses
- dhcp-12: VLAN 12 (SERVER) - 51 addresses
- dhcp-13: VLAN 13 (SVC) - 41 addresses
- dhcp-21: VLAN 21 (CLIENTS) - 141 addresses
- dhcp-22: VLAN 22 (WLAN) - 101 addresses
- dhcp-23: VLAN 23 (IOT) - 191 addresses
- dhcp-30: eth3 (Client network 1) - 101 addresses
- dhcp-31: eth1 (Client network 2) - 21 addresses
- dhcp-mgmt: VLAN 99 (MGMT) - 51 addresses

**NAT/Firewall**:
- Masquerading on WAN interface (eth0)

### Mikrotik Switch

**Purpose**: Core Layer 2/3 switching

**Model**: MikroTik CRS326-24G-2S+ (24x 1GbE + 2x 10GbE SFP+)

**Hardware**:
- CPU: ARMv7 @ 800MHz
- RAM: 512MB
- Uptime: 21+ weeks

**Management**:
- Hostname: `sw1.home.2rjus.net`
- SSH access: `ssh admin@sw1.home.2rjus.net` (using gunter SSH key)
- Management IP: 10.69.99.2/24 (VLAN 99)
- Version: RouterOS 6.47.10 (long-term)

**VLANs**:
- VLAN 8: Kubernetes (deprecated)
- VLAN 12: SERVERS - Core services subnet
- VLAN 13: SVC - Services subnet
- VLAN 21: CLIENTS
- VLAN 22: WLAN - Wireless network
- VLAN 23: IOT
- VLAN 99: MGMT - Management network

**Port Layout** (active ports):
- **ether1**: Uplink to EdgeRouter (trunk, carries all VLANs)
- **ether11**: virt-mini1 (VLAN 12 - SERVERS)
- **ether12**: Home Assistant (VLAN 12 - SERVERS)
- **ether24**: Wireless AP (VLAN 22 - WLAN)
- **sfp-sfpplus1**: Media server/Jellyfin (VLAN 12) - 10Gbps, 7m copper DAC
- **sfp-sfpplus2**: TrueNAS (VLAN 12) - 10Gbps, 1m copper DAC

**Bridge Configuration**:
- All ports bridged to main bridge interface
- Hardware offloading enabled
- VLAN filtering enabled on bridge

## Backup & Disaster Recovery

### Backup Strategy

**NixOS VMs**:
- Declarative configurations in this git repository
- Secrets: SOPS-encrypted, backed up with repository
- State/data: Some hosts are backed up to the NAS, but coverage should be improved and expanded to more hosts.

**Proxmox**:
- VM backups: Not currently implemented

**Critical Credentials**:

TODO: Document this

- OpenBao root token and unseal keys: _[offline secure storage location]_
- Proxmox root password: _[secure storage]_
- TrueNAS admin password: _[secure storage]_
- Router admin credentials: _[secure storage]_

### Disaster Recovery Procedures

**Total Infrastructure Loss**:
1. Restore Proxmox from installation media
2. Restore TrueNAS from installation media, import ZFS pools
3. Restore network configuration on EdgeRouter and Mikrotik
4. Rebuild NixOS VMs from this repository using Proxmox template
5. Restore stateful data from TrueNAS backups
6. Re-initialize OpenBao and restore from backup if needed

**Individual VM Loss**:
1. Deploy new VM from template using OpenTofu (`terraform/`)
2. Run `nixos-rebuild` with appropriate flake configuration
3. Restore any stateful data from backups
4. For vault01: follow re-provisioning steps in `docs/vault/auto-unseal.md`

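Steps 1 and 2 of the individual VM procedure might look like the commands generated below. This is a dry-run sketch that only prints commands and runs nothing. It rests on several assumptions not stated in this document: the commands run from the repository root, the flake output is named after the host, root SSH login is available on the new VM, and `example01` is a placeholder hostname. The helper name `recovery_commands` is hypothetical.

```python
import shlex


def recovery_commands(host: str, domain: str = "home.2rjus.net") -> list[str]:
    """Return shell commands for steps 1 and 2 as strings (nothing is executed)."""
    steps = [
        # Step 1: apply the OpenTofu configuration kept under terraform/
        ["tofu", "-chdir=terraform", "apply"],
        # Step 2: build and activate the host's flake configuration remotely
        # (assumes the flake attribute matches the hostname)
        ["nixos-rebuild", "switch", "--flake", f".#{host}",
         "--target-host", f"root@{host}.{domain}"],
    ]
    return [shlex.join(step) for step in steps]


for command in recovery_commands("example01"):
    print(command)
```

Review the OpenTofu plan before confirming the apply, since it may touch more than the VM being rebuilt.
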
**Network Device Failure**:
- EdgeRouter: _[config backup location, restoration procedure]_
- Mikrotik: _[config backup location, restoration procedure]_

## Future Additions

- Additional Proxmox nodes for clustering
- Proxmox Backup Server for VM backups
- Additional TrueNAS for replication

## Maintenance Notes

### Proxmox Updates

- Update schedule: manual
- Pre-update checklist: yolo

### TrueNAS Updates

- Update schedule: manual

### Network Device Updates

- EdgeRouter: manual
- Mikrotik: manual

## Monitoring

**Infrastructure Monitoring**:

- TODO: Improve monitoring for physical hosts (proxmox, nas)
- TODO: Improve monitoring for networking equipment

All NixOS VMs ship metrics to monitoring01 via node-exporter and logs via Promtail. See `/services/monitoring/` for the observability stack configuration.