nixos-servers/docs/plans/nixos-hypervisor.md
Torjus Håkestad bb53b922fa
plans: add NixOS hypervisor plan (Incus on PN51s)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 00:47:09 +01:00


NixOS Hypervisor

Overview

Experiment with running a NixOS-based hypervisor as an alternative/complement to the current Proxmox setup. Goal is better homelab integration — declarative config, monitoring, auto-updates — while retaining the ability to run VMs with a Terraform-like workflow.

Motivation

  • Proxmox works but doesn't integrate with the NixOS-managed homelab (no monitoring, no auto-updates, no vault, no declarative config)
  • The PN51 units (once stable) are good candidates for experimentation — test-tier, plenty of RAM (32-64GB), 8C/16T
  • Long-term: could reduce reliance on Proxmox or provide a secondary hypervisor pool
  • VM migration: Currently all VMs (including both nameservers) run on a single Proxmox host. Being able to migrate VMs between hypervisors would allow rebooting a host for kernel updates without downtime for critical services like DNS.

Hardware Candidates

|         | pn01                   | pn02                                 |
|---------|------------------------|--------------------------------------|
| CPU     | Ryzen 7 5700U (8C/16T) | Ryzen 7 5700U (8C/16T)               |
| RAM     | 64GB (2x32GB)          | 32GB (1x32GB, second slot available) |
| Storage | 1TB NVMe               | 1TB SATA SSD (NVMe planned)          |
| Status  | Stability testing      | Stability testing                    |

Options

Option 1: Incus

Community fork of LXD, created after Canonical took the project in-house and changed its licensing and contribution terms. Supports both containers (LXC) and VMs (QEMU/KVM).

NixOS integration:

  • virtualisation.incus.enable module in nixpkgs
  • Manages storage pools, networks, and instances
  • REST API for automation
  • CLI tool (incus) for management
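
A minimal sketch of what enabling this might look like on one of the PN51s. The user name is a placeholder, and the nftables/UI lines are assumptions about this setup, not requirements taken from this repo:

```nix
# Hedged sketch: Incus host configuration on NixOS.
{
  virtualisation.incus.enable = true;

  # Optional web UI (assumption that it's wanted here).
  virtualisation.incus.ui.enable = true;

  # nftables is commonly recommended for Incus-managed networks on NixOS.
  networking.nftables.enable = true;

  # Let an admin user talk to the Incus socket without sudo.
  # "torjus" is a placeholder, not taken from this repo.
  users.users.torjus.extraGroups = [ "incus-admin" ];
}
```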

Terraform integration:

  • lxd provider works with Incus (API-compatible)
  • Dedicated incus Terraform provider also exists
  • Can define VMs/containers in OpenTofu, similar to current Proxmox workflow

Migration:

  • Built-in live and offline migration via incus move <instance> --target <host>
  • Clustering makes hosts aware of each other — migration is a first-class operation
  • Shared storage (NFS, Ceph) or Incus can transfer storage during migration
  • Stateful stop-and-move also supported for offline migration

Pros:

  • Supports both containers and VMs
  • REST API + CLI for automation
  • Built-in clustering and migration — closest to Proxmox experience
  • Good NixOS module support
  • Image-based workflow (can build NixOS images and import)
  • Active development and community

Cons:

  • Another abstraction layer on top of QEMU/KVM
  • Less mature Terraform provider than libvirt
  • Container networking can be complex
  • NixOS guests in Incus VMs need some setup

Option 2: libvirt/QEMU

Standard Linux virtualization stack. Thin wrapper around QEMU/KVM.

NixOS integration:

  • virtualisation.libvirtd.enable module in nixpkgs
  • Mature and well-tested
  • virsh CLI for management
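
For comparison, a minimal libvirtd host sketch. The OVMF/swtpm lines and the user name are illustrative assumptions:

```nix
# Hedged sketch: libvirtd host configuration on NixOS.
{
  virtualisation.libvirtd.enable = true;

  # UEFI firmware and software TPM for guests that want them
  # (shown explicitly; whether they're needed depends on the guests).
  virtualisation.libvirtd.qemu.ovmf.enable = true;
  virtualisation.libvirtd.qemu.swtpm.enable = true;

  # Placeholder user, granted access to the libvirt socket.
  users.users.torjus.extraGroups = [ "libvirtd" ];
}
```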

Terraform integration:

  • dmacvicar/libvirt provider — mature, well-maintained
  • Supports cloud-init, volume management, network config
  • Very similar workflow to current Proxmox+OpenTofu setup
  • Can reuse cloud-init patterns from existing terraform/ config

Migration:

  • Supports live and offline migration via virsh migrate
  • Requires shared storage (NFS, Ceph, or similar) for live migration
  • Requires matching CPU models between hosts (or CPU model masking)
  • Works but is manual — no cluster awareness, must specify target URI
  • No built-in orchestration for multi-host scenarios

Pros:

  • Closest to current Proxmox+Terraform workflow
  • Most mature Terraform provider
  • Minimal abstraction — direct QEMU/KVM management
  • Well-understood, massive community
  • Cloud-init works identically to Proxmox workflow
  • Can reuse existing template-building patterns

Cons:

  • VMs only (no containers without adding LXC separately)
  • No built-in REST API (would need to expose libvirt socket)
  • No web UI without adding cockpit or virt-manager
  • Migration works but requires manual setup — no clustering, no orchestration
  • Less feature-rich than Incus for multi-host scenarios

Option 3: microvm.nix

NixOS-native microVM framework. VMs defined as NixOS modules in the host's flake.

NixOS integration:

  • VMs are NixOS configurations in the same flake
  • Supports multiple backends: cloud-hypervisor, QEMU, firecracker, kvmtool
  • Lightweight — shares host's nix store with guests via virtiofs
  • Declarative network, storage, and resource allocation
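
To make the model concrete, a hedged sketch of a microvm.nix guest module as it might appear in the host's flake. Hypervisor choice, sizes, interface ID, and MAC are all illustrative:

```nix
# Hedged sketch of a microvm.nix guest definition (names/sizes illustrative).
{
  microvm = {
    hypervisor = "cloud-hypervisor";
    vcpu = 2;
    mem = 2048; # MiB

    # Share the host's /nix/store read-only over virtiofs,
    # so the guest doesn't duplicate packages.
    shares = [{
      proto = "virtiofs";
      tag = "ro-store";
      source = "/nix/store";
      mountPoint = "/nix/.ro-store";
    }];

    interfaces = [{
      type = "tap";
      id = "vm-test";
      mac = "02:00:00:00:00:01";
    }];
  };
}
```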

Terraform integration:

  • None — everything is defined in Nix
  • Fundamentally different workflow from current Proxmox+Terraform approach

Migration:

  • No migration support — VMs are tied to the host's NixOS config
  • Moving a VM means rebuilding it on another host

Pros:

  • Most NixOS-native approach
  • VMs defined right alongside host configs in this repo
  • Very lightweight — fast boot, minimal overhead
  • Shares nix store with host (no duplicate packages)
  • No cloud-init needed — guest config is part of the flake

Cons:

  • Very niche, smaller community
  • Different mental model from current workflow
  • Only NixOS guests (no Ubuntu, FreeBSD, etc.)
  • No Terraform integration
  • No migration support
  • Less isolation than full QEMU VMs
  • Would need to learn a new deployment pattern

Comparison

| Criteria            | Incus                    | libvirt               | microvm.nix  |
|---------------------|--------------------------|-----------------------|--------------|
| Workflow similarity | Medium                   | High                  | Low          |
| Terraform support   | Yes (lxd/incus provider) | Yes (mature provider) | No           |
| NixOS module        | Yes                      | Yes                   | Yes          |
| Containers + VMs    | Both                     | VMs only              | VMs only     |
| Non-NixOS guests    | Yes                      | Yes                   | No           |
| Live migration      | Built-in (first-class)   | Yes (manual setup)    | No           |
| Offline migration   | Built-in                 | Yes (manual setup)    | No (rebuild) |
| Clustering          | Built-in                 | Manual                | No           |
| Learning curve      | Medium                   | Low                   | Medium       |
| Community/maturity  | Growing                  | Very mature           | Niche        |
| Overhead            | Low                      | Minimal               | Minimal      |

Recommendation

Start with Incus. Migration and clustering are key requirements:

  • Built-in clustering makes two PN51s a proper hypervisor pool
  • Live and offline migration are first-class operations, similar to Proxmox
  • Can move VMs between hosts for maintenance (kernel updates, hardware work) without downtime
  • Supports both containers and VMs — flexibility for future use
  • Terraform provider exists (less mature than libvirt's, but functional)
  • REST API enables automation beyond what Terraform covers

libvirt could achieve similar results but requires significantly more manual setup for migration and has no clustering awareness. For a two-node setup where migration is a priority, Incus provides much more out of the box.

microvm.nix is off the table given the migration requirement.

Implementation Plan

Phase 1: Single-Node Setup (on one PN51)

  1. Enable virtualisation.incus on pn01 (or whichever is stable)
  2. Initialize Incus (incus admin init) — configure storage pool (local NVMe) and network bridge
  3. Configure bridge networking for VM traffic on VLAN 12
  4. Build a NixOS VM image and import it into Incus
  5. Create a test VM manually with incus launch to validate the setup
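
Steps 2–3 can also be done declaratively through the NixOS module's preseed support instead of interactive incus admin init. A hedged sketch — the storage driver, dataset, and bridge names are assumptions, and a bridge on VLAN 12 would replace the managed incusbr0 shown here:

```nix
# Hedged sketch: declarative Incus initialization via preseed.
{
  virtualisation.incus.preseed = {
    storage_pools = [{
      name = "default";
      driver = "zfs"; # or "dir"/"btrfs" on the local NVMe
      config.source = "tank/incus"; # hypothetical dataset name
    }];
    networks = [{
      name = "incusbr0";
      type = "bridge";
      config."ipv4.address" = "auto";
    }];
    profiles = [{
      name = "default";
      devices = {
        eth0 = { name = "eth0"; network = "incusbr0"; type = "nic"; };
        root = { path = "/"; pool = "default"; type = "disk"; };
      };
    }];
  };
}
```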

Phase 2: Two-Node Cluster (PN51s only)

  1. Enable Incus on the second PN51
  2. Form a cluster between both nodes
  3. Configure shared storage (NFS from NAS, or Ceph if warranted)
  4. Test offline migration: incus move <vm> --target <other-node>
  5. Test live migration with shared storage
  6. CPU compatibility is not an issue here — both nodes have identical Ryzen 7 5700U CPUs

Phase 3: Terraform Integration

  1. Add Incus Terraform provider to terraform/
  2. Define a test VM in OpenTofu (cloud-init, static IP, vault provisioning)
  3. Verify the full pipeline: tofu apply -> VM boots -> cloud-init -> vault credentials -> NixOS rebuild
  4. Compare workflow with existing Proxmox pipeline

Phase 4: Evaluate and Expand

  • Is the workflow comparable to Proxmox?
  • Migration reliability — does live migration work cleanly?
  • Performance overhead acceptable on Ryzen 5700U?
  • Worth migrating some test-tier VMs from Proxmox?
  • Could ns1/ns2 run on separate Incus nodes instead of the single Proxmox host?

Phase 5: Proxmox Replacement (optional)

If Incus works well on the PN51s, consider replacing Proxmox entirely for a three-node cluster.

CPU compatibility for mixed cluster:

| Node         | CPU                         | Architecture | x86-64-v3 |
|--------------|-----------------------------|--------------|-----------|
| Proxmox host | AMD Ryzen 9 3900X (12C/24T) | Zen 2        | Yes       |
| pn01         | AMD Ryzen 7 5700U (8C/16T)  | Zen 3        | Yes       |
| pn02         | AMD Ryzen 7 5700U (8C/16T)  | Zen 3        | Yes       |

All three CPUs are AMD and support x86-64-v3. The 3900X (Zen 2) is the oldest, so it defines the feature ceiling — but x86-64-v3 is well within its capabilities. VMs configured with x86-64-v3 can migrate freely between all three nodes.

Being all-AMD also avoids the trickier Intel/AMD cross-vendor migration edge cases (different CPUID layouts, virtualization extensions).

The 3900X (12C/24T) would be the most powerful node, making it the natural home for heavier workloads, with the PN51s (8C/16T each) handling lighter VMs or serving as migration targets during maintenance.

Steps:

  1. Install NixOS + Incus on the Proxmox host (or a replacement machine)
  2. Join it to the existing Incus cluster with x86-64-v3 CPU baseline
  3. Migrate VMs from Proxmox to the Incus cluster
  4. Decommission Proxmox

Prerequisites

  • PN51 units pass stability testing (see pn51-stability.md)
  • Decide which unit to use first (pn01 preferred — 64GB RAM, NVMe, currently more stable)

Open Questions

  • How to handle VM storage? Local NVMe, NFS from NAS, or Ceph between the two nodes?
  • Network topology: bridge on VLAN 12, or trunk multiple VLANs to the PN51?
  • Should VMs be on the same VLAN as the hypervisor host, or separate?
  • Incus clustering with only two nodes — the cluster database is Raft-based and needs three voting members for fault tolerance, so a two-node cluster can't survive losing a node. Three nodes (with Proxmox replacement) would solve this
  • How to handle NixOS guest images? Build with nixos-generators, or use Incus image builder?
  • What CPU does the current Proxmox host have? Answered: AMD Ryzen 9 3900X (Zen 2) — x86-64-v3 capable, so an all-AMD cluster is possible
  • If replacing Proxmox: migrate VMs first, or fresh start and rebuild?