NixOS Prometheus Exporter
Overview
Build a generic Prometheus exporter for NixOS-specific metrics. This exporter should be useful for any NixOS deployment, not just our homelab.
Goal
Provide visibility into NixOS system state that standard exporters don't cover:
- Generation management (count, age, current vs booted)
- Flake input freshness
- Upgrade status
Metrics
Core Metrics
| Metric | Description | Source |
|---|---|---|
| nixos_generation_count | Number of system generations | Count entries in /nix/var/nix/profiles/system-* |
| nixos_current_generation | Active generation number | Parse readlink /run/current-system |
| nixos_booted_generation | Generation that was booted | Parse /run/booted-system |
| nixos_generation_age_seconds | Age of current generation | File mtime of current system profile |
| nixos_config_mismatch | 1 if booted != current, 0 otherwise | Compare symlink targets |
Flake Metrics (optional collector)
| Metric | Description | Source |
|---|---|---|
| nixos_flake_input_age_seconds | Age of each flake.lock input | Parse lastModified from flake.lock |
| nixos_flake_input_info | Info gauge with rev label | Parse rev from flake.lock |
Labels: input (e.g., "nixpkgs", "home-manager")
Future Metrics
| Metric | Description | Source |
|---|---|---|
| nixos_upgrade_pending | 1 if remote differs from local | Compare flake refs (expensive) |
| nixos_store_size_bytes | Size of /nix/store | du or filesystem stats |
| nixos_store_path_count | Number of store paths | Count entries |
Architecture
Single binary with optional collectors enabled via config or flags.
nixos-exporter
├── main.go
├── collector/
│   ├── generation.go   # Core generation metrics
│   └── flake.go        # Flake input metrics
└── config/
    └── config.go
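One way the "optional collectors" idea could be wired up is sketched below. The `Collector` interface, `Sample` type, and `gather` function are hypothetical stand-ins; an implementation built on the Prometheus Go client would use that library's own collector and metric types instead.

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is one metric value (hypothetical, for illustration only).
type Sample struct {
	Name  string
	Value float64
}

// Collector is a hypothetical interface each collector file would implement.
type Collector interface {
	Collect() ([]Sample, error)
}

// staticCollector returns fixed samples; it stands in for the real
// generation and flake collectors.
type staticCollector []Sample

func (s staticCollector) Collect() ([]Sample, error) { return s, nil }

// gather runs every enabled collector in name order and concatenates
// the results. Collectors missing from enabled are treated as disabled.
func gather(all map[string]Collector, enabled map[string]bool) ([]Sample, error) {
	names := make([]string, 0, len(all))
	for name := range all {
		if enabled[name] {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	var out []Sample
	for _, name := range names {
		samples, err := all[name].Collect()
		if err != nil {
			return nil, fmt.Errorf("collector %q: %w", name, err)
		}
		out = append(out, samples...)
	}
	return out, nil
}

func main() {
	all := map[string]Collector{
		"generation": staticCollector{{Name: "nixos_generation_count", Value: 47}},
		"flake":      staticCollector{{Name: "nixos_flake_input_age_seconds", Value: 259200}},
	}
	// Generation enabled, flake disabled.
	enabled := map[string]bool{"generation": true, "flake": false}
	samples, err := gather(all, enabled)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range samples {
		fmt.Printf("%s %g\n", s.Name, s.Value)
	}
}
```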
Configuration
listen_addr: ":9971"
collectors:
  generation:
    enabled: true
  flake:
    enabled: false
    lock_path: "/etc/nixos/flake.lock"  # or auto-detect from /run/current-system
Command-line alternative:
nixos-exporter --listen=:9971 --collector.flake --flake.lock-path=/etc/nixos/flake.lock
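The command line above could be parsed with Go's standard `flag` package, roughly as follows. The `options` struct and `parseFlags` helper are hypothetical, and the defaults are assumed to match the configuration file (flake collector off, lock path /etc/nixos/flake.lock).

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the settings exposed on the command line.
type options struct {
	ListenAddr    string
	FlakeEnabled  bool
	FlakeLockPath string
}

// parseFlags parses args (without the program name) into options.
func parseFlags(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("nixos-exporter", flag.ContinueOnError)
	fs.StringVar(&o.ListenAddr, "listen", ":9971", "address to listen on")
	fs.BoolVar(&o.FlakeEnabled, "collector.flake", false, "enable the flake collector")
	fs.StringVar(&o.FlakeLockPath, "flake.lock-path", "/etc/nixos/flake.lock", "path to flake.lock")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return o, nil
}

func main() {
	opts, err := parseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2) // the flag package has already printed the error
	}
	fmt.Printf("listen=%s flake=%t lock=%s\n", opts.ListenAddr, opts.FlakeEnabled, opts.FlakeLockPath)
}
```

The `flag` package accepts both `-listen` and `--listen`, so the double-dash form shown above works unchanged.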
NixOS Module
services.prometheus.exporters.nixos = {
  enable = true;
  port = 9971;
  collectors = [ "generation" "flake" ];
  flake.lockPath = "/etc/nixos/flake.lock";
};
The module should integrate with nixpkgs' existing services.prometheus.exporters.* pattern.
Implementation
Language
Go: mature Prometheus client library, single static binary, easy cross-compilation.
Phase 1: Core
- Create git repository
- Implement generation collector (count, current, booted, age, mismatch)
- Basic HTTP server with /metrics endpoint
- NixOS module
Phase 2: Flake Collector
- Parse flake.lock JSON format
- Extract lastModified timestamps per input
- Add input labels
Phase 3: Packaging
- Add to nixpkgs or publish as flake
- Documentation
- Example Grafana dashboard
Example Output
# HELP nixos_generation_count Total number of system generations
# TYPE nixos_generation_count gauge
nixos_generation_count 47
# HELP nixos_current_generation Currently active generation number
# TYPE nixos_current_generation gauge
nixos_current_generation 47
# HELP nixos_booted_generation Generation that was booted
# TYPE nixos_booted_generation gauge
nixos_booted_generation 46
# HELP nixos_generation_age_seconds Age of current generation in seconds
# TYPE nixos_generation_age_seconds gauge
nixos_generation_age_seconds 3600
# HELP nixos_config_mismatch 1 if booted generation differs from current
# TYPE nixos_config_mismatch gauge
nixos_config_mismatch 1
# HELP nixos_flake_input_age_seconds Age of flake input in seconds
# TYPE nixos_flake_input_age_seconds gauge
nixos_flake_input_age_seconds{input="nixpkgs"} 259200
nixos_flake_input_age_seconds{input="home-manager"} 86400
Alert Examples
- alert: NixOSConfigStale
  expr: nixos_generation_age_seconds > 7 * 24 * 3600
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "NixOS config on {{ $labels.instance }} is over 7 days old"
- alert: NixOSRebootRequired
  expr: nixos_config_mismatch == 1
  for: 24h
  labels:
    severity: info
  annotations:
    summary: "{{ $labels.instance }} needs reboot to apply config"
- alert: NixpkgsInputStale
  expr: nixos_flake_input_age_seconds{input="nixpkgs"} > 30 * 24 * 3600
  for: 1d
  labels:
    severity: info
  annotations:
    summary: "nixpkgs input on {{ $labels.instance }} is over 30 days old"
Open Questions
- How to detect flake.lock path automatically? (check /run/current-system for flake info)
- Does the generation collector need root? (Probably not; it only reads symlinks.)
- Include in nixpkgs or distribute as standalone flake?
Notes
- Port 9971 suggested (9970 reserved for homelab-exporter)
- Keep scope focused on NixOS-specific metrics - don't duplicate node-exporter
- Consider submitting to the Prometheus exporter registry once stable