# Local NTP with Chrony

## Overview/Goal

Set up pve1 as a local NTP server and switch all NixOS VMs from systemd-timesyncd to chrony, pointing at pve1 as the sole time source. This eliminates clock drift issues that cause false `host_reboot` alerts.

## Current State

- All NixOS hosts use `systemd-timesyncd` with default NixOS pool servers (`0.nixos.pool.ntp.org` etc.)
- No NTP/timesyncd configuration exists in the repo; everything runs on defaults
- pve1 (Proxmox, bare metal) already runs chrony but only as a client
- VMs drift noticeably: ns1 (~19ms) and jelly01 (~39ms) are worst offenders
- Clock step corrections from timesyncd trigger false `host_reboot` alerts via `changes(node_boot_time_seconds[10m]) > 0`
- pve1 itself stays at 0ms offset thanks to chrony

## Why systemd-timesyncd is Insufficient

- Minimal SNTP client, no proper clock discipline or frequency tracking
- Backs off polling interval when it thinks clock is stable, missing drift
- Corrects via step adjustments rather than gradual slewing, causing metric jumps
- Each VM resolves to different pool servers with varying accuracy

## Implementation Steps

### 1. Configure pve1 as NTP Server

Add to pve1's `/etc/chrony/chrony.conf`:

```
# Allow NTP clients from the infrastructure subnet
allow 10.69.13.0/24
```

Restart chrony on pve1.

### 2. Add Chrony to NixOS System Config

Create `system/chrony.nix` (applied to all hosts via system imports):

```nix
{
  # Disable systemd-timesyncd (chrony takes over)
  services.timesyncd.enable = false;

  # Enable chrony pointing at pve1
  services.chrony = {
    enable = true;
    servers = [ "pve1.home.2rjus.net" ];
    serverOption = "iburst";
  };
}
```

### 3. Optional: Add Chrony Exporter

For better visibility into NTP sync quality:

```nix
services.prometheus.exporters.chrony.enable = true;
```

Add chrony exporter scrape targets via `homelab.monitoring.scrapeTargets` and create a Grafana dashboard for NTP offset across all hosts.

### 4. Roll Out

- Deploy to a test-tier host first to verify
- Then deploy to all hosts via auto-upgrade

## Open Questions

- [ ] Does pve1's chrony config need `local stratum 10` as fallback if upstream is unreachable?
- [ ] Should we also enable `enableRTCTrimming` for the VMs?
- [ ] Worth adding a chrony exporter on pve1 as well (manual install like node-exporter)?

## Notes

- No fallback NTP servers needed on VMs: if pve1 is down, all VMs are down too
- The `host_reboot` alert rule (`changes(node_boot_time_seconds[10m]) > 0`) should stop false-firing once clock corrections are slewed instead of stepped
- pn01/pn02 are bare metal but still benefit from syncing to pve1 for consistency
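
The expectation that the alert stops false-firing depends on chrony not stepping the clock during normal operation. A minimal sketch of how that could be made explicit, assuming the `services.chrony.extraConfig` option is used to pass a `makestep` directive alongside the settings in `system/chrony.nix`. The 1 second threshold and the limit of 3 updates are illustrative assumptions, not values decided in this plan:

```nix
{
  # Assumption: permit a step only when the offset exceeds 1 second, and only
  # during the first 3 clock updates after chronyd starts. Later corrections
  # are applied by slewing, so a running host should not see its clock stepped.
  services.chrony.extraConfig = ''
    makestep 1.0 3
  '';
}
```

After deploying to the test-tier host, `chronyc tracking` on that host reports the current offset, and `chronyc sources` should list pve1 as the selected source.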