# NixOS Prometheus Exporter

## Overview

Build a generic Prometheus exporter for NixOS-specific metrics. This exporter should be useful for any NixOS deployment, not just our homelab.

## Goal

Provide visibility into NixOS system state that standard exporters don't cover:

- Generation management (count, age, current vs booted)
- Flake input freshness
- Upgrade status

## Metrics

### Core Metrics

| Metric | Description | Source |
|--------|-------------|--------|
| `nixos_generation_count` | Number of system generations | Count entries in `/nix/var/nix/profiles/system-*` |
| `nixos_current_generation` | Active generation number | Parse `readlink /run/current-system` |
| `nixos_booted_generation` | Generation that was booted | Parse `/run/booted-system` |
| `nixos_generation_age_seconds` | Age of current generation | File mtime of current system profile |
| `nixos_config_mismatch` | 1 if booted != current, 0 otherwise | Compare symlink targets |

### Flake Metrics (optional collector)

| Metric | Description | Source |
|--------|-------------|--------|
| `nixos_flake_input_age_seconds` | Age of each flake.lock input | Parse `lastModified` from flake.lock |
| `nixos_flake_input_info` | Info gauge with rev label | Parse `rev` from flake.lock |

Labels: `input` (e.g., "nixpkgs", "home-manager")

### Future Metrics

| Metric | Description | Source |
|--------|-------------|--------|
| `nixos_upgrade_pending` | 1 if remote differs from local | Compare flake refs (expensive) |
| `nixos_store_size_bytes` | Size of /nix/store | `du` or filesystem stats |
| `nixos_store_path_count` | Number of store paths | Count entries |

## Architecture

Single binary with optional collectors enabled via config or flags.

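A minimal sketch of how flag-based collector toggling might look, using only Go's standard `flag` package. The flag names (`--listen`, `--collector.flake`, `--flake.lock-path`) are the ones from the command-line example in the Configuration section. The `options` struct, `parseOptions`, and `enabledCollectors` are hypothetical names for illustration, and treating the generation collector as always on is an assumption.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds the parsed settings (hypothetical struct, not a fixed design).
type options struct {
	listenAddr    string
	flakeEnabled  bool
	flakeLockPath string
}

// parseOptions parses the exporter flags from args.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("nixos-exporter", flag.ContinueOnError)
	fs.StringVar(&o.listenAddr, "listen", ":9971", "address to serve /metrics on")
	fs.BoolVar(&o.flakeEnabled, "collector.flake", false, "enable the flake collector")
	fs.StringVar(&o.flakeLockPath, "flake.lock-path", "/etc/nixos/flake.lock", "path to flake.lock")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return o, nil
}

// enabledCollectors lists the collectors to register.
// Assumption: the generation collector is always enabled.
func enabledCollectors(o options) []string {
	names := []string{"generation"}
	if o.flakeEnabled {
		names = append(names, "flake")
	}
	return names
}

func main() {
	// Fixed arguments for illustration; a real binary would pass os.Args[1:].
	args := []string{"--listen=:9971", "--collector.flake", "--flake.lock-path=/etc/nixos/flake.lock"}
	o, err := parseOptions(args)
	if err != nil {
		fmt.Println("invalid flags:", err)
		return
	}
	fmt.Println(o.listenAddr, enabledCollectors(o))
}
```
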
```
nixos-exporter
├── main.go
├── collector/
│   ├── generation.go   # Core generation metrics
│   └── flake.go        # Flake input metrics
└── config/
    └── config.go
```

## Configuration

```yaml
listen_addr: ":9971"
collectors:
  generation:
    enabled: true
  flake:
    enabled: false
    lock_path: "/etc/nixos/flake.lock"  # or auto-detect from /run/current-system
```

Command-line alternative:

```bash
nixos-exporter --listen=:9971 --collector.flake --flake.lock-path=/etc/nixos/flake.lock
```

## NixOS Module

```nix
services.prometheus.exporters.nixos = {
  enable = true;
  port = 9971;
  collectors = [ "generation" "flake" ];
  flake.lockPath = "/etc/nixos/flake.lock";
};
```

The module should integrate with nixpkgs' existing `services.prometheus.exporters.*` pattern.

## Implementation

### Language

Go - mature prometheus client library, single static binary, easy cross-compilation.

### Phase 1: Core

1. Create git repository
2. Implement generation collector (count, current, booted, age, mismatch)
3. Basic HTTP server with `/metrics` endpoint
4. NixOS module

### Phase 2: Flake Collector

1. Parse flake.lock JSON format
2. Extract lastModified timestamps per input
3. Add input labels

### Phase 3: Packaging

1. Add to nixpkgs or publish as flake
2. Documentation
3. Example Grafana dashboard

## Example Output

```
# HELP nixos_generation_count Total number of system generations
# TYPE nixos_generation_count gauge
nixos_generation_count 47
# HELP nixos_current_generation Currently active generation number
# TYPE nixos_current_generation gauge
nixos_current_generation 47
# HELP nixos_booted_generation Generation that was booted
# TYPE nixos_booted_generation gauge
nixos_booted_generation 46
# HELP nixos_generation_age_seconds Age of current generation in seconds
# TYPE nixos_generation_age_seconds gauge
nixos_generation_age_seconds 3600
# HELP nixos_config_mismatch 1 if booted generation differs from current
# TYPE nixos_config_mismatch gauge
nixos_config_mismatch 1
# HELP nixos_flake_input_age_seconds Age of flake input in seconds
# TYPE nixos_flake_input_age_seconds gauge
nixos_flake_input_age_seconds{input="nixpkgs"} 259200
nixos_flake_input_age_seconds{input="home-manager"} 86400
```

## Alert Examples

```yaml
- alert: NixOSConfigStale
  expr: nixos_generation_age_seconds > 7 * 24 * 3600
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "NixOS config on {{ $labels.instance }} is over 7 days old"

- alert: NixOSRebootRequired
  expr: nixos_config_mismatch == 1
  for: 24h
  labels:
    severity: info
  annotations:
    summary: "{{ $labels.instance }} needs reboot to apply config"

- alert: NixpkgsInputStale
  expr: nixos_flake_input_age_seconds{input="nixpkgs"} > 30 * 24 * 3600
  for: 1d
  labels:
    severity: info
  annotations:
    summary: "nixpkgs input on {{ $labels.instance }} is over 30 days old"
```

## Open Questions

- [ ] How to detect flake.lock path automatically? (check /run/current-system for flake info)
- [ ] Should generation collector need root? (probably not, just reading symlinks)
- [ ] Include in nixpkgs or distribute as standalone flake?

## Notes

- Port 9971 suggested (9970 reserved for homelab-exporter)
- Keep scope focused on NixOS-specific metrics - don't duplicate node-exporter
- Consider submitting to prometheus exporter registry once stable