# Local NTP with Chrony

## Overview/Goal

Set up pve1 as a local NTP server and switch all NixOS VMs from systemd-timesyncd to chrony, pointing at pve1 as the sole time source. This eliminates clock drift issues that cause false `host_reboot` alerts.

## Current State

- All NixOS hosts use `systemd-timesyncd` with the default NixOS pool servers (`0.nixos.pool.ntp.org` etc.)
- No NTP/timesyncd configuration exists in the repo — all defaults
- pve1 (Proxmox, bare metal) already runs chrony, but only as a client
- VMs drift noticeably — ns1 (~19 ms) and jelly01 (~39 ms) are the worst offenders
- Clock-step corrections from timesyncd trigger false `host_reboot` alerts via `changes(node_boot_time_seconds[10m]) > 0`
- pve1 itself stays at ~0 ms offset thanks to chrony

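The per-host offsets above can be watched continuously via node_exporter's `timex` collector (enabled by default), e.g. in a Grafana panel:

```
# Kernel-reported clock offset per host, converted to milliseconds
node_timex_offset_seconds * 1000
```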
## Why systemd-timesyncd is Insufficient

- A minimal SNTP client with no real clock discipline or frequency tracking
- Backs off its polling interval when it thinks the clock is stable, missing drift
- Corrects via step adjustments rather than gradual slewing, causing metric jumps
- Each VM resolves to different pool servers with varying accuracy

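Chrony avoids the stepping problem by design: it slews the clock continuously and only steps when explicitly told to. The standard chrony directive for this (shown as an illustration of the default behavior, not something this plan needs to add) restricts stepping to the first measurements after boot:

```
# Step the clock only if the offset exceeds 1 second, and only during
# the first 3 measurements after chronyd starts; every later
# correction is slewed gradually, so node_boot_time_seconds stays flat
makestep 1.0 3
```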
## Implementation Steps

### 1. Configure pve1 as NTP Server

Add to pve1's `/etc/chrony/chrony.conf`:

```
# Allow NTP clients from the infrastructure subnet
allow 10.69.13.0/24
```

Restart chrony on pve1 (`systemctl restart chrony`) so the `allow` directive takes effect.

### 2. Add Chrony to NixOS System Config

Create `system/chrony.nix` (applied to all hosts via system imports):

```nix
{
  # Disable systemd-timesyncd (chrony takes over)
  services.timesyncd.enable = false;

  # Enable chrony pointing at pve1
  services.chrony = {
    enable = true;
    servers = [ "pve1.home.2rjus.net" ];
    serverOption = "iburst";
  };
}
```
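For reference, the NixOS module should render each entry in `servers` with `serverOption` appended, so the generated `/etc/chrony.conf` on a VM would contain something like:

```
server pve1.home.2rjus.net iburst
```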

### 3. Optional: Add Chrony Exporter

For better visibility into NTP sync quality:

```nix
services.prometheus.exporters.chrony.enable = true;
```

Add chrony exporter scrape targets via `homelab.monitoring.scrapeTargets` and create a Grafana dashboard for NTP offset across all hosts.
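The exact shape of `homelab.monitoring.scrapeTargets` is repo-specific; assuming it takes a job name plus a list of `host:port` targets, and that the exporter listens on its default port (9123, per `services.prometheus.exporters.chrony.port`), the addition might look roughly like:

```nix
# Hypothetical sketch: adjust to the actual scrapeTargets option shape
# and to the real set of hosts.
homelab.monitoring.scrapeTargets = [
  {
    job_name = "chrony";
    targets = [
      "ns1.home.2rjus.net:9123"
      "jelly01.home.2rjus.net:9123"
    ];
  }
];
```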

### 4. Roll Out

- Deploy to a test-tier host first to verify
- Then deploy to all hosts via auto-upgrade

## Open Questions

- [ ] Does pve1's chrony config need `local stratum 10` as a fallback if upstream is unreachable?
- [ ] Should we also enable `enableRTCTrimming` for the VMs?
- [ ] Worth adding a chrony exporter on pve1 as well (manual install, like node-exporter)?
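If the answer to the first question is yes, the fallback is a one-line addition to pve1's chrony.conf (a standard chrony directive):

```
# Keep serving local time at stratum 10 if upstream sources become
# unreachable, so VMs stay mutually consistent on a free-running clock
# instead of losing sync entirely
local stratum 10
```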

## Notes

- No fallback NTP servers needed on VMs — if pve1 is down, all VMs are down too
- The `host_reboot` alert rule (`changes(node_boot_time_seconds[10m]) > 0`) should stop false-firing once clock corrections are slewed instead of stepped
- pn01/pn02 are bare metal but still benefit from syncing to pve1 for consistency