NixOS Prometheus Exporter

Overview

Build a generic Prometheus exporter for NixOS-specific metrics. This exporter should be useful for any NixOS deployment, not just our homelab.

Goal

Provide visibility into NixOS system state that standard exporters don't cover:

  • Generation management (count, age, current vs booted)
  • Flake input freshness
  • Upgrade status

Metrics

Core Metrics

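| Metric | Description | Source |
|---|---|---|
| nixos_generation_count | Number of system generations | Count the system-*-link entries in /nix/var/nix/profiles |
| nixos_current_generation | Active generation number | Resolve /run/current-system and match it to a system-N-link entry (the store path itself carries no generation number) |
| nixos_booted_generation | Generation that was booted | Resolve /run/booted-system and match it to a system-N-link entry |
| nixos_generation_age_seconds | Age of current generation, in seconds | mtime of the current system profile symlink itself (lstat; store paths do not keep a meaningful mtime) |
| nixos_config_mismatch | 1 if booted != current, 0 otherwise | Compare symlink targets |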
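
The sources in the table above can be sketched in Go roughly as follows. This is a minimal sketch, not the planned collector: the helper names (parseGeneration, generationFor, configMismatch) are hypothetical, and it assumes generation links are named system-N-link under /nix/var/nix/profiles and that a generation is identified by matching resolved store paths.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"strconv"
)

// profileDir is assumed to hold the system-N-link generation symlinks.
const profileDir = "/nix/var/nix/profiles"

var genLinkRe = regexp.MustCompile(`^system-(\d+)-link$`)

// parseGeneration extracts N from a name such as "system-47-link".
// The second result is false when the name is not a generation link.
func parseGeneration(name string) (int, bool) {
	m := genLinkRe.FindStringSubmatch(filepath.Base(name))
	if m == nil {
		return 0, false
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	return n, true
}

// generationFor returns the highest generation whose resolved store path
// equals target, or false when no generation points at it.
func generationFor(targets map[int]string, target string) (int, bool) {
	best, found := 0, false
	for gen, path := range targets {
		if path == target && (!found || gen > best) {
			best, found = gen, true
		}
	}
	return best, found
}

// configMismatch returns 1 when the booted and current store paths differ.
func configMismatch(bootedTarget, currentTarget string) int {
	if bootedTarget != currentTarget {
		return 1
	}
	return 0
}

func main() {
	entries, err := os.ReadDir(profileDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read profiles:", err)
		return
	}
	// Map generation number -> resolved store path; dangling links are skipped.
	targets := map[int]string{}
	for _, e := range entries {
		gen, ok := parseGeneration(e.Name())
		if !ok {
			continue
		}
		if path, err := filepath.EvalSymlinks(filepath.Join(profileDir, e.Name())); err == nil {
			targets[gen] = path
		}
	}
	fmt.Println("nixos_generation_count", len(targets))

	current, errC := filepath.EvalSymlinks("/run/current-system")
	booted, errB := filepath.EvalSymlinks("/run/booted-system")
	if errC != nil || errB != nil {
		return
	}
	if gen, ok := generationFor(targets, current); ok {
		fmt.Println("nixos_current_generation", gen)
	}
	if gen, ok := generationFor(targets, booted); ok {
		fmt.Println("nixos_booted_generation", gen)
	}
	fmt.Println("nixos_config_mismatch", configMismatch(booted, current))
}
```

Keeping the parsing and matching in pure functions means they can be unit-tested on any machine, with only main touching the real filesystem.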

Flake Metrics (optional collector)

| Metric | Description | Source |
|---|---|---|
| nixos_flake_input_age_seconds | Age of each flake.lock input | Parse lastModified from flake.lock |
| nixos_flake_input_info | Info gauge with rev label | Parse rev from flake.lock |

Labels: input (e.g., "nixpkgs", "home-manager"); nixos_flake_input_info additionally carries rev
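
A minimal Go sketch of how the flake collector might derive these values from flake.lock. It assumes the usual lock layout (a top-level "root" key naming the root node, and a "nodes" map whose entries carry "locked.lastModified" and "locked.rev"). The type and function names are hypothetical. Inputs that are expressed as "follows" paths are skipped here for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"
	"time"
)

// lockFile mirrors only the parts of flake.lock this sketch needs.
type lockFile struct {
	Root  string              `json:"root"`
	Nodes map[string]lockNode `json:"nodes"`
}

type lockNode struct {
	// Input values are node names (strings) or "follows" paths (arrays).
	Inputs map[string]any `json:"inputs"`
	Locked *lockedRef     `json:"locked"`
}

type lockedRef struct {
	LastModified int64  `json:"lastModified"`
	Rev          string `json:"rev"`
}

// inputInfo holds the values behind both flake metrics for one input.
type inputInfo struct {
	Name       string
	Rev        string
	AgeSeconds float64
}

// inputAges returns one entry per direct input of the root node,
// sorted by input name. nowUnix is the reference time in Unix seconds.
func inputAges(data []byte, nowUnix int64) ([]inputInfo, error) {
	var lf lockFile
	if err := json.Unmarshal(data, &lf); err != nil {
		return nil, fmt.Errorf("parse flake.lock: %w", err)
	}
	root, ok := lf.Nodes[lf.Root]
	if !ok {
		return nil, fmt.Errorf("root node %q not found", lf.Root)
	}
	var out []inputInfo
	for name, ref := range root.Inputs {
		nodeName, isString := ref.(string)
		if !isString {
			continue // "follows" path; not handled in this sketch
		}
		node, ok := lf.Nodes[nodeName]
		if !ok || node.Locked == nil {
			continue
		}
		out = append(out, inputInfo{
			Name:       name,
			Rev:        node.Locked.Rev,
			AgeSeconds: float64(nowUnix - node.Locked.LastModified),
		})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out, nil
}

func main() {
	data, err := os.ReadFile("/etc/nixos/flake.lock")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read flake.lock:", err)
		return
	}
	inputs, err := inputAges(data, time.Now().Unix())
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, in := range inputs {
		fmt.Printf("nixos_flake_input_age_seconds{input=%q} %.0f\n", in.Name, in.AgeSeconds)
		fmt.Printf("nixos_flake_input_info{input=%q,rev=%q} 1\n", in.Name, in.Rev)
	}
}
```

Labelling by the root node's input name rather than the node key keeps labels stable when the lock file deduplicates nodes under names such as "nixpkgs_2".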

Future Metrics

| Metric | Description | Source |
|---|---|---|
| nixos_upgrade_pending | 1 if remote differs from local | Compare flake refs (expensive) |
| nixos_store_size_bytes | Size of /nix/store | du or filesystem stats |
| nixos_store_path_count | Number of store paths | Count entries |

Architecture

Single binary with optional collectors enabled via config or flags.

nixos-exporter
├── main.go
├── collector/
│   ├── generation.go    # Core generation metrics
│   └── flake.go         # Flake input metrics
└── config/
    └── config.go
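
One way the "optional collectors" idea could be wired together, shown as a dependency-free Go sketch. The Collector interface, Sample type, and gather function are hypothetical names for illustration. A real implementation would more likely build on the collector interface of the Prometheus Go client library.

```go
package main

import "fmt"

// Sample is one metric value (hypothetical type for this sketch).
type Sample struct {
	Name  string
	Value float64
}

// Collector is a hypothetical interface each file under collector/ would implement.
type Collector interface {
	Name() string
	Collect() ([]Sample, error)
}

// staticCollector returns fixed samples; it stands in for real collectors.
type staticCollector struct {
	name    string
	samples []Sample
}

func (c staticCollector) Name() string               { return c.name }
func (c staticCollector) Collect() ([]Sample, error) { return c.samples, nil }

// gather runs only the collectors enabled by name, in registration order.
// A collector that returns an error is skipped so one failure does not
// break the whole scrape.
func gather(all []Collector, enabled map[string]bool) []Sample {
	var out []Sample
	for _, c := range all {
		if !enabled[c.Name()] {
			continue
		}
		samples, err := c.Collect()
		if err != nil {
			continue
		}
		out = append(out, samples...)
	}
	return out
}

func main() {
	all := []Collector{
		staticCollector{name: "generation", samples: []Sample{{Name: "nixos_generation_count", Value: 47}}},
		staticCollector{name: "flake", samples: []Sample{{Name: "nixos_flake_input_age_seconds", Value: 86400}}},
	}
	// Mirrors the default configuration: generation on, flake off.
	enabled := map[string]bool{"generation": true, "flake": false}
	for _, s := range gather(all, enabled) {
		fmt.Println(s.Name, s.Value)
	}
}
```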

Configuration

listen_addr: ":9971"
collectors:
  generation:
    enabled: true
  flake:
    enabled: false
    lock_path: "/etc/nixos/flake.lock"  # or auto-detect from /run/current-system

Command-line alternative:

nixos-exporter --listen=:9971 --collector.flake --flake.lock-path=/etc/nixos/flake.lock
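
The flags in the command line above could be parsed with Go's standard flag package, sketched below. The options struct and parseArgs are hypothetical names, and the defaults are taken from the configuration example. Many exporters use a third-party flag library instead, so treat the choice of package as an assumption.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the parsed command-line settings (hypothetical type).
type options struct {
	Listen        string
	FlakeEnabled  bool
	FlakeLockPath string
}

// parseArgs parses exporter flags from args (without the program name).
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("nixos-exporter", flag.ContinueOnError)
	fs.StringVar(&o.Listen, "listen", ":9971", "address to listen on")
	fs.BoolVar(&o.FlakeEnabled, "collector.flake", false, "enable the flake collector")
	fs.StringVar(&o.FlakeLockPath, "flake.lock-path", "/etc/nixos/flake.lock", "path to flake.lock")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return o, nil
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		// The flag package has already printed the error and usage text.
		return
	}
	fmt.Printf("listen=%s flake=%t lock=%s\n", opts.Listen, opts.FlakeEnabled, opts.FlakeLockPath)
}
```

The standard flag package accepts both the -name and --name forms, so the invocation shown above works unchanged.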

NixOS Module

services.prometheus.exporters.nixos = {
  enable = true;
  port = 9971;
  collectors = [ "generation" "flake" ];
  flake.lockPath = "/etc/nixos/flake.lock";
};

The module should integrate with nixpkgs' existing services.prometheus.exporters.* pattern.

Implementation

Language

Go: mature Prometheus client library, single static binary, easy cross-compilation.

Phase 1: Core

  1. Create git repository
  2. Implement generation collector (count, current, booted, age, mismatch)
  3. Basic HTTP server with /metrics endpoint
  4. NixOS module
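
Step 3 can be illustrated with a standard-library-only sketch that renders gauges in the Prometheus text exposition format and serves them at /metrics. The gauge type, render, and newMux are hypothetical names. The actual exporter would presumably use the Prometheus Go client library's handler rather than formatting the output by hand.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// gauge is a single unlabelled gauge metric (hypothetical type).
type gauge struct {
	Name  string
	Help  string
	Value float64
}

// render writes gauges in the Prometheus text exposition format.
func render(gauges []gauge) string {
	var b strings.Builder
	for _, g := range gauges {
		fmt.Fprintf(&b, "# HELP %s %s\n", g.Name, g.Help)
		fmt.Fprintf(&b, "# TYPE %s gauge\n", g.Name)
		fmt.Fprintf(&b, "%s %s\n", g.Name, strconv.FormatFloat(g.Value, 'f', -1, 64))
	}
	return b.String()
}

// newMux returns a mux that serves the collected gauges at /metrics.
// collect is called on every scrape so values are always fresh.
func newMux(collect func() []gauge) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
		fmt.Fprint(w, render(collect()))
	})
	return mux
}

func main() {
	collect := func() []gauge {
		return []gauge{{
			Name:  "nixos_generation_count",
			Help:  "Total number of system generations",
			Value: 47, // placeholder value for the demo
		}}
	}
	// Exercise the handler in-process; a real binary would instead call
	// http.ListenAndServe(":9971", newMux(collect)).
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/metrics", nil)
	newMux(collect).ServeHTTP(rec, req)
	fmt.Print(rec.Body.String())
}
```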

Phase 2: Flake Collector

  1. Parse flake.lock JSON format
  2. Extract lastModified timestamps per input
  3. Add input labels

Phase 3: Packaging

  1. Add to nixpkgs or publish as flake
  2. Documentation
  3. Example Grafana dashboard

Example Output

# HELP nixos_generation_count Total number of system generations
# TYPE nixos_generation_count gauge
nixos_generation_count 47

# HELP nixos_current_generation Currently active generation number
# TYPE nixos_current_generation gauge
nixos_current_generation 47

# HELP nixos_booted_generation Generation that was booted
# TYPE nixos_booted_generation gauge
nixos_booted_generation 46

# HELP nixos_generation_age_seconds Age of current generation in seconds
# TYPE nixos_generation_age_seconds gauge
nixos_generation_age_seconds 3600

# HELP nixos_config_mismatch 1 if booted generation differs from current
# TYPE nixos_config_mismatch gauge
nixos_config_mismatch 1

# HELP nixos_flake_input_age_seconds Age of flake input in seconds
# TYPE nixos_flake_input_age_seconds gauge
nixos_flake_input_age_seconds{input="nixpkgs"} 259200
nixos_flake_input_age_seconds{input="home-manager"} 86400

Alert Examples

- alert: NixOSConfigStale
  expr: nixos_generation_age_seconds > 7 * 24 * 3600
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "NixOS config on {{ $labels.instance }} is over 7 days old"

- alert: NixOSRebootRequired
  expr: nixos_config_mismatch == 1
  for: 24h
  labels:
    severity: info
  annotations:
    summary: "{{ $labels.instance }} needs reboot to apply config"

- alert: NixpkgsInputStale
  expr: nixos_flake_input_age_seconds{input="nixpkgs"} > 30 * 24 * 3600
  for: 1d
  labels:
    severity: info
  annotations:
    summary: "nixpkgs input on {{ $labels.instance }} is over 30 days old"

Open Questions

  • How can the flake.lock path be detected automatically? (check /run/current-system for flake info)
  • Does the generation collector need root? (probably not; it only reads symlinks)
  • Include in nixpkgs or distribute as a standalone flake?

Notes

  • Port 9971 suggested (9970 reserved for homelab-exporter)
  • Keep scope focused on NixOS-specific metrics; don't duplicate node-exporter
  • Consider submitting to the Prometheus exporter registry once stable