# NixOS Hypervisor

## Overview
Experiment with running a NixOS-based hypervisor as an alternative or complement to the current Proxmox setup. The goal is better homelab integration (declarative config, monitoring, auto-updates) while retaining the ability to run VMs with a Terraform-like workflow.

## Motivation
- Proxmox works but doesn't integrate with the NixOS-managed homelab (no monitoring, no auto-updates, no vault, no declarative config)
- The PN51 units (once stable) are good candidates for experimentation — test-tier, plenty of RAM (32-64GB), 8C/16T
- Long-term: could reduce reliance on Proxmox or provide a secondary hypervisor pool
- VM migration: Currently all VMs (including both nameservers) run on a single Proxmox host. Being able to migrate VMs between hypervisors would allow rebooting a host for kernel updates without downtime for critical services like DNS.
## Hardware Candidates

| | pn01 | pn02 |
|---|---|---|
| CPU | Ryzen 7 5700U (8C/16T) | Ryzen 7 5700U (8C/16T) |
| RAM | 64GB (2x32GB) | 32GB (1x32GB, second slot available) |
| Storage | 1TB NVMe | 1TB SATA SSD (NVMe planned) |
| Status | Stability testing | Stability testing |
## Options

### Option 1: Incus
Fork of LXD (after Canonical made LXD proprietary). Supports both containers (LXC) and VMs (QEMU/KVM).
NixOS integration:
- `virtualisation.incus.enable` module in nixpkgs
- Manages storage pools, networks, and instances
- REST API for automation
- CLI tool (`incus`) for management
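A minimal host-config sketch of the integration above. `virtualisation.incus.enable` comes from the nixpkgs module; the `admin` user name is a placeholder, and the other options should be verified against the module docs:

```nix
# Sketch only — enables the nixpkgs Incus module on a host.
{ config, lib, pkgs, ... }:
{
  virtualisation.incus.enable = true;

  # Incus manages its firewall rules via nftables on NixOS.
  networking.nftables.enable = true;

  # Hypothetical admin user granted access to the Incus socket.
  users.users.admin.extraGroups = [ "incus-admin" ];
}
```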
Terraform integration:
- `lxd` provider works with Incus (API-compatible)
- Dedicated `incus` Terraform provider also exists
- Can define VMs/containers in OpenTofu, similar to current Proxmox workflow
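A sketch of what an OpenTofu definition could look like with the dedicated Incus provider. The provider source, resource name, and config keys are assumptions to check against the registry docs; the image alias is hypothetical:

```hcl
terraform {
  required_providers {
    incus = {
      source = "lxc/incus"
    }
  }
}

resource "incus_instance" "test" {
  name  = "nixos-test"
  type  = "virtual-machine"
  image = "nixos-base"   # hypothetical alias, imported beforehand

  # Incus-style config keys passed through to the instance.
  config = {
    "limits.cpu"    = "2"
    "limits.memory" = "4GiB"
  }
}
```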
Migration:
- Built-in live and offline migration via `incus move <instance> --target <host>`
- Clustering makes hosts aware of each other; migration is a first-class operation
- Shared storage (NFS, Ceph) or Incus can transfer storage during migration
- Stateful stop-and-move also supported for offline migration
Pros:
- Supports both containers and VMs
- REST API + CLI for automation
- Built-in clustering and migration — closest to Proxmox experience
- Good NixOS module support
- Image-based workflow (can build NixOS images and import)
- Active development and community
Cons:
- Another abstraction layer on top of QEMU/KVM
- Less mature Terraform provider than libvirt's
- Container networking can be complex
- NixOS guests in Incus VMs need some setup
### Option 2: libvirt/QEMU
Standard Linux virtualization stack. Thin wrapper around QEMU/KVM.
NixOS integration:
- `virtualisation.libvirtd.enable` module in nixpkgs
- Mature and well-tested
- `virsh` CLI for management
Terraform integration:
- `dmacvicar/libvirt` provider, mature and well-maintained
- Supports cloud-init, volume management, network config
- Very similar workflow to current Proxmox+OpenTofu setup
- Can reuse cloud-init patterns from existing `terraform/` config
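For comparison with the Incus option, a sketch of the `dmacvicar/libvirt` equivalent. The connection URI, image name, and sizes are illustrative:

```hcl
terraform {
  required_providers {
    libvirt = {
      source = "dmacvicar/libvirt"
    }
  }
}

provider "libvirt" {
  uri = "qemu+ssh://root@pn01/system"   # hypothetical host URI
}

# Root disk cloned from a prebuilt base image.
resource "libvirt_volume" "root" {
  name   = "nixos-test.qcow2"
  pool   = "default"
  source = "nixos-base.qcow2"           # hypothetical base image
}

resource "libvirt_domain" "test" {
  name   = "nixos-test"
  vcpu   = 2
  memory = 4096                          # MiB

  disk {
    volume_id = libvirt_volume.root.id
  }
}
```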
Migration:
- Supports live and offline migration via `virsh migrate`
- Requires shared storage (NFS, Ceph, or similar) for live migration
- Requires matching CPU models between hosts (or CPU model masking)
- Works but is manual — no cluster awareness, must specify target URI
- No built-in orchestration for multi-host scenarios
Pros:
- Closest to current Proxmox+Terraform workflow
- Most mature Terraform provider
- Minimal abstraction — direct QEMU/KVM management
- Well-understood, massive community
- Cloud-init works identically to Proxmox workflow
- Can reuse existing template-building patterns
Cons:
- VMs only (no containers without adding LXC separately)
- No built-in REST API (would need to expose libvirt socket)
- No web UI without adding cockpit or virt-manager
- Migration works but requires manual setup — no clustering, no orchestration
- Less feature-rich than Incus for multi-host scenarios
### Option 3: microvm.nix
NixOS-native microVM framework. VMs defined as NixOS modules in the host's flake.
NixOS integration:
- VMs are NixOS configurations in the same flake
- Supports multiple backends: cloud-hypervisor, QEMU, firecracker, kvmtool
- Lightweight — shares host's nix store with guests via virtiofs
- Declarative network, storage, and resource allocation
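A sketch of a guest declared in the host flake. Option names follow the microvm.nix module as documented upstream and should be verified there; the VM name and sizes are illustrative:

```nix
# Sketch only — a microvm.nix guest defined alongside the host config.
{
  microvm.vms.test = {
    config = { pkgs, ... }: {
      networking.hostName = "test";

      microvm = {
        hypervisor = "cloud-hypervisor";
        vcpu = 2;
        mem = 2048;  # MiB

        # Share the host's /nix/store read-only via virtiofs,
        # so guest packages are not duplicated.
        shares = [{
          proto = "virtiofs";
          tag = "ro-store";
          source = "/nix/store";
          mountPoint = "/nix/.ro-store";
        }];
      };
    };
  };
}
```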
Terraform integration:
- None — everything is defined in Nix
- Fundamentally different workflow from current Proxmox+Terraform approach
Migration:
- No migration support — VMs are tied to the host's NixOS config
- Moving a VM means rebuilding it on another host
Pros:
- Most NixOS-native approach
- VMs defined right alongside host configs in this repo
- Very lightweight — fast boot, minimal overhead
- Shares nix store with host (no duplicate packages)
- No cloud-init needed — guest config is part of the flake
Cons:
- Very niche, smaller community
- Different mental model from current workflow
- Only NixOS guests (no Ubuntu, FreeBSD, etc.)
- No Terraform integration
- No migration support
- Less isolation than full QEMU VMs
- Would need to learn a new deployment pattern
## Comparison
| Criteria | Incus | libvirt | microvm.nix |
|---|---|---|---|
| Workflow similarity | Medium | High | Low |
| Terraform support | Yes (lxd/incus provider) | Yes (mature provider) | No |
| NixOS module | Yes | Yes | Yes |
| Containers + VMs | Both | VMs only | VMs only |
| Non-NixOS guests | Yes | Yes | No |
| Live migration | Built-in (first-class) | Yes (manual setup) | No |
| Offline migration | Built-in | Yes (manual setup) | No (rebuild) |
| Clustering | Built-in | Manual | No |
| Learning curve | Medium | Low | Medium |
| Community/maturity | Growing | Very mature | Niche |
| Overhead | Low | Minimal | Minimal |
## Recommendation
Start with Incus. Migration and clustering are key requirements:
- Built-in clustering makes two PN51s a proper hypervisor pool
- Live and offline migration are first-class operations, similar to Proxmox
- Can move VMs between hosts for maintenance (kernel updates, hardware work) without downtime
- Supports both containers and VMs — flexibility for future use
- Terraform provider exists (less mature than libvirt's, but functional)
- REST API enables automation beyond what Terraform covers
libvirt could achieve similar results but requires significantly more manual setup for migration and has no clustering awareness. For a two-node setup where migration is a priority, Incus provides much more out of the box.
microvm.nix is off the table given the migration requirement.
## Implementation Plan

### Phase 1: Single-Node Setup (on one PN51)
- Enable `virtualisation.incus` on pn01 (or whichever is stable)
- Initialize Incus (`incus admin init`): configure storage pool (local NVMe) and network bridge
- Configure bridge networking for VM traffic on VLAN 12
- Build a NixOS VM image and import it into Incus
- Create a test VM manually with `incus launch` to validate the setup
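The Phase 1 steps can be sketched as a command sequence. The image alias and VM names are hypothetical, and the VM image import expects the metadata/disk pair that Incus uses for VM images:

```shell
incus admin init            # interactive: storage pool + network bridge

# Sanity check with a stock image before building NixOS images:
incus launch images:debian/12 smoke-test --vm
incus list

# Import a locally built NixOS VM image and launch it:
incus image import metadata.tar.xz nixos.qcow2 --alias nixos-base
incus launch nixos-base test-vm --vm
```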
### Phase 2: Two-Node Cluster (PN51s only)
- Enable Incus on the second PN51
- Form a cluster between both nodes
- Configure shared storage (NFS from NAS, or Ceph if warranted)
- Test offline migration: `incus move <vm> --target <other-node>`
- Test live migration with shared storage
- CPU compatibility is not an issue here — both nodes have identical Ryzen 7 5700U CPUs
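The clustering steps above, sketched as commands; node names match the hardware table, but the exact join flow should be checked against the Incus clustering docs:

```shell
# On the first node, turn the standalone server into a cluster:
incus cluster enable pn01

# On pn01, generate a join token for the second node:
incus cluster add pn02

# On pn02, run `incus admin init`, choose "join a cluster",
# and paste the token when prompted.

# Verify membership and exercise migration:
incus cluster list
incus move test-vm --target pn02
```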
### Phase 3: Terraform Integration
- Add Incus Terraform provider to `terraform/`
- Define a test VM in OpenTofu (cloud-init, static IP, vault provisioning)
- Verify the full pipeline: `tofu apply` -> VM boots -> cloud-init -> vault credentials -> NixOS rebuild
- Compare workflow with existing Proxmox pipeline
### Phase 4: Evaluate and Expand
- Is the workflow comparable to Proxmox?
- Migration reliability — does live migration work cleanly?
- Performance overhead acceptable on Ryzen 5700U?
- Worth migrating some test-tier VMs from Proxmox?
- Could ns1/ns2 run on separate Incus nodes instead of the single Proxmox host?
### Phase 5: Proxmox Replacement (optional)
If Incus works well on the PN51s, consider replacing Proxmox entirely for a three-node cluster.
CPU compatibility for mixed cluster:
| Node | CPU | Architecture | x86-64-v3 |
|---|---|---|---|
| Proxmox host | AMD Ryzen 9 3900X (12C/24T) | Zen 2 | Yes |
| pn01 | AMD Ryzen 7 5700U (8C/16T) | Zen 3 | Yes |
| pn02 | AMD Ryzen 7 5700U (8C/16T) | Zen 3 | Yes |
All three CPUs are AMD and support x86-64-v3. The 3900X (Zen 2) is the oldest, so it defines the feature ceiling — but x86-64-v3 is well within its capabilities. VMs configured with x86-64-v3 can migrate freely between all three nodes.
Being all-AMD also avoids the trickier Intel/AMD cross-vendor migration edge cases (different CPUID layouts, virtualization extensions).
The 3900X (12C/24T) would be the most powerful node, making it the natural home for heavier workloads, with the PN51s (8C/16T each) handling lighter VMs or serving as migration targets during maintenance.
Steps:
- Install NixOS + Incus on the Proxmox host (or a replacement machine)
- Join it to the existing Incus cluster with the `x86-64-v3` CPU baseline
- Migrate VMs from Proxmox to the Incus cluster
- Decommission Proxmox
## Prerequisites
- PN51 units pass stability testing (see `pn51-stability.md`)
- Decide which unit to use first (pn01 preferred: 64GB RAM, NVMe, currently more stable)
## Open Questions
- How to handle VM storage? Local NVMe, NFS from NAS, or Ceph between the two nodes?
- Network topology: bridge on VLAN 12, or trunk multiple VLANs to the PN51?
- Should VMs be on the same VLAN as the hypervisor host, or separate?
- Incus clustering with only two nodes — any quorum issues? Three nodes (with Proxmox replacement) would solve this
- How to handle NixOS guest images? Build with nixos-generators, or use Incus image builder?
- What CPU does the current Proxmox host have? AMD Ryzen 9 3900X (Zen 2), `x86-64-v3` confirmed, all-AMD cluster
- If replacing Proxmox: migrate VMs first, or fresh start and rebuild?