# Memory Issues Follow-up

Tracking the zram change to verify it resolves OOM issues during nixos-upgrade on low-memory hosts.

## Background

On 2026-02-08, ns2 (2GB RAM) experienced an OOM kill during nixos-upgrade. The Nix evaluation process consumed ~1.6GB before being killed by the kernel. ns1 (manually increased to 4GB) succeeded with the same upgrade.

Root cause: 2GB RAM is insufficient for Nix flake evaluation without swap.

## Fix Applied

**Commit:** `1674b6a` - system: enable zram swap for all hosts

**Merged:** 2026-02-08 ~12:15 UTC

**Change:** Added `zramSwap.enable = true` to `system/zram.nix`, providing ~2GB compressed swap on all hosts.

## Timeline

All times are on 2026-02-08.

| Time (UTC) | Event |
|------------|-------|
| 05:00:46 | ns2 nixos-upgrade OOM killed |
| 05:01:47 | `nixos_upgrade_failed` alert fired |
| 12:15 | zram commit merged to master |
| 12:19 | ns2 rebooted with zram enabled |
| 12:20 | ns1 rebooted (memory reduced to 2GB via tofu) |

## Hosts Affected

All 2GB VMs that run nixos-upgrade:

- ns1, ns2 (DNS)
- vault01
- testvm01, testvm02, testvm03
- kanidm01

## Metrics to Monitor

Check these in Grafana or via PromQL to verify the fix. All values are in MiB unless noted.

### Swap availability (should be ~2GB after upgrade)

```promql
node_memory_SwapTotal_bytes / 1024 / 1024
```

### Swap usage during upgrades

```promql
(node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes) / 1024 / 1024
```

### Zswap compressed bytes (active compression)

```promql
node_memory_Zswap_bytes / 1024 / 1024
```

Note: this metric reports zswap, which is a separate kernel feature from zram. On hosts that use only zram swap it may stay at 0, so rely on the swap usage query above as the primary signal.

### Upgrade failures (should be 0)

```promql
node_systemd_unit_state{name="nixos-upgrade.service", state="failed"}
```

### Memory available during upgrades

```promql
node_memory_MemAvailable_bytes / 1024 / 1024
```

## Verification Steps

After a few days (allow auto-upgrades to run on all hosts):

1. Check all hosts have swap enabled:
   ```promql
   node_memory_SwapTotal_bytes > 0
   ```

2. Check for any upgrade failures since the fix:
   ```promql
   count_over_time(ALERTS{alertname="nixos_upgrade_failed"}[7d])
   ```

3. Review whether any hosts used swap during upgrades (check historical graphs)

## Success Criteria

- No `nixos_upgrade_failed` alerts due to OOM after 2026-02-08
- All hosts show ~2GB swap available
- Upgrades complete successfully on 2GB VMs

## Fallback Options

If zram is insufficient:

1. **Increase VM memory** - Update `terraform/vms.tf` to 4GB for affected hosts
2. **Use remote builds** - Configure `nix.buildMachines` to offload builds to a larger machine (evaluation still runs on the host, so this may not resolve an evaluation OOM on its own)
3. **Reduce flake size** - Split configurations to reduce evaluation memory
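
Before reaching for a fallback, the swap and upgrade-failure checks above could be scripted against the Prometheus HTTP API. The following is a minimal sketch, not part of the existing tooling: the base URL `http://prometheus.example:9090`, the 1.5 GiB threshold used to approximate "~2GB swap", and all function names are assumptions made for illustration, and it assumes each series carries an `instance` label.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Prometheus base URL; replace with the real endpoint.
PROMETHEUS_URL = "http://prometheus.example:9090"

# Assumed cut-off for "~2GB swap available": anything under 1.5 GiB is flagged.
MIN_SWAP_BYTES = 1.5 * 1024**3


def instant_query(expr, base_url=PROMETHEUS_URL):
    """Run an instant query via /api/v1/query and return the result list."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


def hosts_below_swap(samples, min_bytes=MIN_SWAP_BYTES):
    """Return sorted instance labels whose SwapTotal is below min_bytes."""
    low = []
    for sample in samples:
        instance = sample["metric"].get("instance", "<unknown>")
        # Instant-vector values arrive as [timestamp, "value-as-string"].
        if float(sample["value"][1]) < min_bytes:
            low.append(instance)
    return sorted(low)


def failed_upgrade_hosts(samples):
    """Return sorted instance labels where the failed-state series equals 1."""
    return sorted(
        s["metric"].get("instance", "<unknown>")
        for s in samples
        if float(s["value"][1]) == 1.0
    )


def check(base_url=PROMETHEUS_URL):
    """Query Prometheus and summarize hosts that miss the success criteria."""
    swap = instant_query("node_memory_SwapTotal_bytes", base_url)
    failed = instant_query(
        'node_systemd_unit_state{name="nixos-upgrade.service", state="failed"}',
        base_url,
    )
    return {
        "low_swap": hosts_below_swap(swap),
        "failed_upgrades": failed_upgrade_hosts(failed),
    }
```

Calling `check()` with a reachable Prometheus returns a dictionary with two lists; both being empty is consistent with the success criteria, although it does not by itself show that a past failure was caused by OOM. The parsing helpers are kept separate from the HTTP call so they can be exercised with canned samples.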