# NixOS Hypervisor

## Overview

Experiment with running a NixOS-based hypervisor as an alternative/complement to the current Proxmox setup. The goal is better homelab integration — declarative config, monitoring, auto-updates — while retaining the ability to run VMs with a Terraform-like workflow.

## Motivation

- Proxmox works but doesn't integrate with the NixOS-managed homelab (no monitoring, no auto-updates, no vault, no declarative config)
- The PN51 units (once stable) are good candidates for experimentation — test-tier, plenty of RAM (32–64GB), 8C/16T
- Long-term: could reduce reliance on Proxmox or provide a secondary hypervisor pool
- **VM migration**: Currently all VMs (including both nameservers) run on a single Proxmox host. Being able to migrate VMs between hypervisors would allow rebooting a host for kernel updates without downtime for critical services like DNS.

## Hardware Candidates

| | pn01 | pn02 |
|---|---|---|
| **CPU** | Ryzen 7 5700U (8C/16T) | Ryzen 7 5700U (8C/16T) |
| **RAM** | 64GB (2x32GB) | 32GB (1x32GB, second slot available) |
| **Storage** | 1TB NVMe | 1TB SATA SSD (NVMe planned) |
| **Status** | Stability testing | Stability testing |

## Options

### Option 1: Incus

Community fork of LXD (created after Canonical took the project in-house and relicensed it). Supports both containers (LXC) and VMs (QEMU/KVM).
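As a first orientation, enabling Incus on a NixOS host is a small config change. This is an illustrative sketch only — the `admin` user and the firewall choice are assumptions, not part of the current config:

```nix
# Hypothetical host config sketch using the nixpkgs Incus module.
{ config, pkgs, ... }:
{
  virtualisation.incus.enable = true;

  # Incus manages its own firewall rules; nftables is the
  # recommended backend on NixOS (assumption for this sketch).
  networking.nftables.enable = true;

  # Members of incus-admin can talk to the daemon without sudo.
  # "admin" is a placeholder user name.
  users.users.admin.extraGroups = [ "incus-admin" ];
}
```

After a rebuild, `incus admin init` would still be needed to set up the initial storage pool and network.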
**NixOS integration:**

- `virtualisation.incus.enable` module in nixpkgs
- Manages storage pools, networks, and instances
- REST API for automation
- CLI tool (`incus`) for management

**Terraform integration:**

- The `lxd` provider works with Incus (API-compatible)
- A dedicated `incus` Terraform provider also exists
- Can define VMs/containers in OpenTofu, similar to the current Proxmox workflow

**Migration:**

- Built-in live and offline migration via `incus move <instance> --target <member>`
- Clustering makes hosts aware of each other — migration is a first-class operation
- Works with shared storage (NFS, Ceph), or Incus can transfer storage during migration
- Stateful stop-and-move is also supported for offline migration

**Pros:**

- Supports both containers and VMs
- REST API + CLI for automation
- Built-in clustering and migration — closest to the Proxmox experience
- Good NixOS module support
- Image-based workflow (can build NixOS images and import them)
- Active development and community

**Cons:**

- Another abstraction layer on top of QEMU/KVM
- Less mature Terraform provider than libvirt's
- Container networking can be complex
- NixOS guests in Incus VMs need some setup

### Option 2: libvirt/QEMU

Standard Linux virtualization stack. Thin wrapper around QEMU/KVM.
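For comparison, the libvirt route is a similarly small host-side change. A minimal sketch (the `admin` user is a placeholder; virt-manager is optional):

```nix
# Hypothetical host config sketch using the nixpkgs libvirtd module.
{ config, pkgs, ... }:
{
  virtualisation.libvirtd.enable = true;

  # Optional GUI; virsh ships with libvirt itself.
  programs.virt-manager.enable = true;

  # Members of the libvirtd group can manage VMs without sudo.
  users.users.admin.extraGroups = [ "libvirtd" ];
}
```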
**NixOS integration:**

- `virtualisation.libvirtd.enable` module in nixpkgs
- Mature and well-tested
- `virsh` CLI for management

**Terraform integration:**

- `dmacvicar/libvirt` provider — mature, well-maintained
- Supports cloud-init, volume management, network config
- Very similar workflow to the current Proxmox+OpenTofu setup
- Can reuse cloud-init patterns from the existing `terraform/` config

**Migration:**

- Supports live and offline migration via `virsh migrate`
- Requires shared storage (NFS, Ceph, or similar) for live migration
- Requires matching CPU models between hosts (or CPU model masking)
- Works but is manual — no cluster awareness, must specify the target URI
- No built-in orchestration for multi-host scenarios

**Pros:**

- Closest to the current Proxmox+Terraform workflow
- Most mature Terraform provider
- Minimal abstraction — direct QEMU/KVM management
- Well understood, massive community
- Cloud-init works identically to the Proxmox workflow
- Can reuse existing template-building patterns

**Cons:**

- VMs only (no containers without adding LXC separately)
- No built-in REST API (would need to expose the libvirt socket)
- No web UI without adding Cockpit or virt-manager
- Migration works but requires manual setup — no clustering, no orchestration
- Less feature-rich than Incus for multi-host scenarios

### Option 3: microvm.nix

NixOS-native microVM framework. VMs are defined as NixOS modules in the host's flake.
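To make the "VMs as NixOS modules" model concrete, a guest definition lives next to the host config. This is an illustrative sketch following the microvm.nix module conventions — the VM name and resource values are made up, flake wiring and module imports are omitted, and exact option names should be checked against the project's docs:

```nix
# Hypothetical microvm.nix guest, declared on the host
# (assumes the microvm.nix host module is imported).
{
  microvm.vms.test-vm = {
    config = {
      microvm = {
        hypervisor = "cloud-hypervisor";
        vcpu = 2;
        mem = 2048; # MiB
        # Share the host's nix store read-only via virtiofs,
        # so the guest carries no duplicate packages.
        shares = [{
          source = "/nix/store";
          mountPoint = "/nix/.ro-store";
          tag = "ro-store";
          proto = "virtiofs";
        }];
      };
      # Beyond that, the guest is an ordinary NixOS configuration.
      services.openssh.enable = true;
    };
  };
}
```

Note how there is no image, no cloud-init, and no API in between — which is exactly why there is also no migration story.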
**NixOS integration:**

- VMs are NixOS configurations in the same flake
- Supports multiple backends: cloud-hypervisor, QEMU, firecracker, kvmtool
- Lightweight — shares the host's nix store with guests via virtiofs
- Declarative network, storage, and resource allocation

**Terraform integration:**

- None — everything is defined in Nix
- Fundamentally different workflow from the current Proxmox+Terraform approach

**Migration:**

- No migration support — VMs are tied to the host's NixOS config
- Moving a VM means rebuilding it on another host

**Pros:**

- Most NixOS-native approach
- VMs defined right alongside host configs in this repo
- Very lightweight — fast boot, minimal overhead
- Shares the nix store with the host (no duplicate packages)
- No cloud-init needed — guest config is part of the flake

**Cons:**

- Very niche, smaller community
- Different mental model from the current workflow
- Only NixOS guests (no Ubuntu, FreeBSD, etc.)
- No Terraform integration
- No migration support
- Less isolation than full QEMU VMs
- Would need to learn a new deployment pattern

## Comparison

| Criteria | Incus | libvirt | microvm.nix |
|----------|-------|---------|-------------|
| **Workflow similarity** | Medium | High | Low |
| **Terraform support** | Yes (lxd/incus provider) | Yes (mature provider) | No |
| **NixOS module** | Yes | Yes | Yes |
| **Containers + VMs** | Both | VMs only | VMs only |
| **Non-NixOS guests** | Yes | Yes | No |
| **Live migration** | Built-in (first-class) | Yes (manual setup) | No |
| **Offline migration** | Built-in | Yes (manual setup) | No (rebuild) |
| **Clustering** | Built-in | Manual | No |
| **Learning curve** | Medium | Low | Medium |
| **Community/maturity** | Growing | Very mature | Niche |
| **Overhead** | Low | Minimal | Minimal |

## Recommendation

Start with **Incus**.
Migration and clustering are key requirements:

- Built-in clustering makes the two PN51s a proper hypervisor pool
- Live and offline migration are first-class operations, similar to Proxmox
- Can move VMs between hosts for maintenance (kernel updates, hardware work) without downtime
- Supports both containers and VMs — flexibility for future use
- A Terraform provider exists (less mature than libvirt's, but functional)
- The REST API enables automation beyond what Terraform covers

libvirt could achieve similar results but requires significantly more manual setup for migration and has no clustering awareness. For a two-node setup where migration is a priority, Incus provides much more out of the box. **microvm.nix** is off the table given the migration requirement.

## Implementation Plan

### Phase 1: Single-Node Setup (on one PN51)

1. Enable `virtualisation.incus` on pn01 (or whichever unit is stable)
2. Initialize Incus (`incus admin init`) — configure a storage pool (local NVMe) and network bridge
3. Configure bridge networking for VM traffic on VLAN 12
4. Build a NixOS VM image and import it into Incus
5. Create a test VM manually with `incus launch` to validate the setup

### Phase 2: Two-Node Cluster (PN51s only)

1. Enable Incus on the second PN51
2. Form a cluster between both nodes
3. Configure shared storage (NFS from the NAS, or Ceph if warranted)
4. Test offline migration: `incus move <instance> --target <member>`
5. Test live migration with shared storage
6. CPU compatibility is not an issue here — both nodes have identical Ryzen 7 5700U CPUs

### Phase 3: Terraform Integration

1. Add the Incus Terraform provider to `terraform/`
2. Define a test VM in OpenTofu (cloud-init, static IP, vault provisioning)
3. Verify the full pipeline: `tofu apply` -> VM boots -> cloud-init -> vault credentials -> NixOS rebuild
4. Compare the workflow with the existing Proxmox pipeline

### Phase 4: Evaluate and Expand

- Is the workflow comparable to Proxmox?
- Migration reliability — does live migration work cleanly?
- Performance overhead acceptable on the Ryzen 7 5700U?
- Worth migrating some test-tier VMs from Proxmox?
- Could ns1/ns2 run on separate Incus nodes instead of the single Proxmox host?

### Phase 5: Proxmox Replacement (optional)

If Incus works well on the PN51s, consider replacing Proxmox entirely for a three-node cluster.

**CPU compatibility for a mixed cluster:**

| Node | CPU | Architecture | x86-64-v3 |
|------|-----|-------------|-----------|
| Proxmox host | AMD Ryzen 9 3900X (12C/24T) | Zen 2 | Yes |
| pn01 | AMD Ryzen 7 5700U (8C/16T) | Zen 3 | Yes |
| pn02 | AMD Ryzen 7 5700U (8C/16T) | Zen 3 | Yes |

All three CPUs are AMD and support `x86-64-v3`. The 3900X (Zen 2) is the oldest, so it defines the feature ceiling — but `x86-64-v3` is well within its capabilities. VMs configured with an `x86-64-v3` baseline can migrate freely between all three nodes. Being all-AMD also avoids the trickier Intel/AMD cross-vendor migration edge cases (different CPUID layouts, virtualization extensions).

The 3900X (12C/24T) would be the most powerful node, making it the natural home for heavier workloads, with the PN51s (8C/16T each) handling lighter VMs or serving as migration targets during maintenance.

Steps:

1. Install NixOS + Incus on the Proxmox host (or a replacement machine)
2. Join it to the existing Incus cluster with an `x86-64-v3` CPU baseline
3. Migrate VMs from Proxmox to the Incus cluster
4. Decommission Proxmox

## Prerequisites

- [ ] PN51 units pass stability testing (see `pn51-stability.md`)
- [ ] Decide which unit to use first (pn01 preferred — 64GB RAM, NVMe, currently more stable)

## Open Questions

- How to handle VM storage? Local NVMe, NFS from the NAS, or Ceph between the two nodes?
- Network topology: bridge on VLAN 12, or trunk multiple VLANs to the PN51?
- Should VMs be on the same VLAN as the hypervisor host, or a separate one?
- Incus clustering with only two nodes — any quorum issues? Three nodes (with Proxmox replacement) would solve this
- How to handle NixOS guest images?
  Build with nixos-generators, or use the Incus image builder?
- ~~What CPU does the current Proxmox host have?~~ AMD Ryzen 9 3900X (Zen 2) — `x86-64-v3` confirmed, all-AMD cluster
- If replacing Proxmox: migrate VMs first, or fresh start and rebuild?
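For reference, the Phase 3 OpenTofu definition could look roughly like this. It is a sketch only — the provider source, resource name, image alias, and attribute names follow my reading of the community Incus provider and should be verified against its documentation before use:

```hcl
# Hypothetical OpenTofu sketch for a test VM on the Incus cluster.
terraform {
  required_providers {
    incus = {
      source = "lxc/incus" # assumed provider source
    }
  }
}

resource "incus_instance" "test" {
  name  = "test-vm"
  image = "nixos/base"   # hypothetical image alias, imported earlier
  type  = "virtual-machine"

  limits = {
    cpu    = "2"
    memory = "4GiB"
  }
}
```

Cloud-init user data and static IP config would be layered on via instance config keys, mirroring the existing Proxmox pipeline.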