nixos-servers/docs/plans/local-ntp-chrony.md
Torjus Håkestad 55da459108
docs: add plan for local NTP with chrony
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 19:33:28 +01:00

Local NTP with Chrony

Overview/Goal

Set up pve1 as a local NTP server and switch all NixOS hosts from systemd-timesyncd to chrony, pointing at pve1 as the sole time source. This should eliminate the clock step corrections that currently cause false host_reboot alerts.

Current State

  • All NixOS hosts use systemd-timesyncd with default NixOS pool servers (0.nixos.pool.ntp.org etc.)
  • No NTP/timesyncd configuration exists in the repo — all defaults
  • pve1 (Proxmox, bare metal) already runs chrony but only as a client
  • VMs drift noticeably — ns1 (~19ms) and jelly01 (~39ms) are worst offenders
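  • Clock step corrections from timesyncd trigger false host_reboot alerts via changes(node_boot_time_seconds[10m]) > 0, because the reported boot time is derived from the wall clock and shifts whenever the clock is stepped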
  • pve1 itself stays at 0ms offset thanks to chrony

Why systemd-timesyncd is Insufficient

  • Minimal SNTP client, no proper clock discipline or frequency tracking
  • Backs off polling interval when it thinks clock is stable, missing drift
  • Corrects via step adjustments rather than gradual slewing, causing metric jumps
  • Each VM resolves to different pool servers with varying accuracy

Implementation Steps

1. Configure pve1 as NTP Server

Add to pve1's /etc/chrony/chrony.conf:

```
# Allow NTP clients from the infrastructure subnet
allow 10.69.13.0/24
```

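Restart chrony on pve1 (systemctl restart chrony). If the Proxmox firewall is enabled, UDP port 123 must also be reachable from the subnet.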

2. Add Chrony to NixOS System Config

Create system/chrony.nix (applied to all hosts via system imports):

```nix
{
  # Disable systemd-timesyncd (chrony takes over)
  services.timesyncd.enable = false;

  # Enable chrony pointing at pve1
  services.chrony = {
    enable = true;
    servers = [ "pve1.home.2rjus.net" ];
    serverOption = "iburst";
  };
}
```
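
Since the goal is to have corrections slewed rather than stepped, it may be worth making chrony's stepping policy explicit. The following is a minimal sketch of a variant of system/chrony.nix, not part of the plan as written. It assumes that passing a makestep directive through services.chrony.extraConfig is acceptable here; the threshold of 1.0 seconds and the limit of 3 updates are illustrative values, not measured requirements.

```nix
{
  # Disable systemd-timesyncd (chrony takes over)
  services.timesyncd.enable = false;

  services.chrony = {
    enable = true;
    servers = [ "pve1.home.2rjus.net" ];
    serverOption = "iburst";

    # Assumption: permit a step only when the offset exceeds 1 second, and
    # only during the first 3 clock updates after chronyd starts. After
    # that, corrections are applied by slewing, so the wall clock does
    # not jump.
    extraConfig = ''
      makestep 1.0 3
    '';
  };
}
```

With a policy like this, a VM that boots with a badly wrong clock is still corrected quickly, while steady-state corrections stay gradual. It would be worth inspecting the generated chrony.conf on a test host first, in case the module already emits a makestep line of its own.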

3. Optional: Add Chrony Exporter

For better visibility into NTP sync quality:

```nix
services.prometheus.exporters.chrony.enable = true;
```

Add chrony exporter scrape targets via homelab.monitoring.scrapeTargets and create a Grafana dashboard for NTP offset across all hosts.
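
A slightly fuller sketch of the per-host exporter module, under one assumption: Prometheus scrapes each host over the network, so the exporter port has to be opened in the NixOS firewall. openFirewall is the generic option shared by the NixOS Prometheus exporter modules; whether it is needed depends on how this repo handles firewalling, which is not shown here.

```nix
{
  services.prometheus.exporters.chrony = {
    enable = true;

    # Assumption: the monitoring host reaches this exporter over the
    # network, so its listening port must pass the local firewall.
    openFirewall = true;
  };
}
```

The port to register as a scrape target can be read from config.services.prometheus.exporters.chrony.port rather than hardcoded. The schema of homelab.monitoring.scrapeTargets is specific to this repo, so the target entry itself is left out of the sketch.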

4. Roll Out

  • Deploy to a test-tier host first and verify with chronyc sources and chronyc tracking that pve1 is selected and the offset settles
  • Then deploy to all hosts via auto-upgrade

Open Questions

  • Does pve1's chrony config need local stratum 10 as fallback if upstream is unreachable?
  • Should we also enable enableRTCTrimming for the VMs?
  • Is it worth adding a chrony exporter on pve1 as well (manual install like node-exporter)?

Notes

  • No fallback NTP servers needed on VMs — if pve1 is down, all VMs are down too
  • The host_reboot alert rule (changes(node_boot_time_seconds[10m]) > 0) should stop false-firing once clock corrections are slewed instead of stepped
  • pn01/pn02 are bare metal, so they would keep running without a time source while pve1 is down, but they still benefit from syncing to pve1 for consistency