CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Repository Overview

This is a Nix Flake-based NixOS configuration repository for managing a homelab infrastructure consisting of 16 server configurations. The repository uses a modular architecture with shared system configurations, reusable service modules, and per-host customization.

Common Commands

Building Configurations

# List all available configurations
nix flake show

# Build a specific host configuration locally (without deploying)
nixos-rebuild build --flake .#<hostname>

# Build and check a configuration
nix build .#nixosConfigurations.<hostname>.config.system.build.toplevel

Deployment

Do not deploy changes automatically. Deployments are normally done by merging changes to the master branch and then triggering the auto-update on the specific host.
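
If asked how an update is triggered, a hedged sketch (assuming root SSH access and the nixos-upgrade.service unit described under Auto-Upgrade System):

# Trigger the auto-update on one host after merging to master
ssh root@<hostname>.home.2rjus.net systemctl start nixos-upgrade.service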

Flake Management

# Check flake for errors
nix flake check

Do not run nix flake update; input updates should only be done manually by the user.

Development Environment

# Enter development shell (provides ansible, python3)
nix develop

Secrets Management

Secrets are managed with SOPS. Do not edit .sops.yaml or any file within secrets/; ask the user to make changes if necessary.

Architecture

Directory Structure

  • /flake.nix - Central flake defining all 16 NixOS configurations
  • /hosts/<hostname>/ - Per-host configurations
    • default.nix - Entry point, imports configuration.nix and services
    • configuration.nix - Host-specific settings (networking, hardware, users)
  • /system/ - Shared system-level configurations applied to ALL hosts
    • Core modules: nix.nix, sshd.nix, sops.nix, acme.nix, autoupgrade.nix
    • Monitoring: node-exporter and promtail on every host
  • /services/ - Reusable service modules, selectively imported by hosts
    • home-assistant/ - Home automation stack
    • monitoring/ - Observability stack (Prometheus, Grafana, Loki, Tempo)
    • ns/ - DNS services (authoritative, resolver)
    • http-proxy/, ca/, postgres/, nats/, jellyfin/, etc.
  • /secrets/ - SOPS-encrypted secrets with age encryption
  • /common/ - Shared configurations (e.g., VM guest agent)
  • /playbooks/ - Ansible playbooks for fleet management
  • /.sops.yaml - SOPS configuration with age keys for all servers

Configuration Inheritance

Each host follows this import pattern:

hosts/<hostname>/default.nix
  └─> configuration.nix (host-specific)
      ├─> ../../system (ALL shared system configs - applied to every host)
      ├─> ../../services/<service> (selective service imports)
      └─> ../../common/vm (if VM)
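
A typical hosts/<hostname>/configuration.nix therefore begins with an import list like the sketch below (the service import is illustrative; real hosts import whichever services they run):

{ config, pkgs, ... }:
{
  imports = [
    ../../system            # shared system configs, applied to every host
    ../../services/jellyfin # example of a selective service import
    ../../common/vm         # only for VM guests
  ];

  networking.hostName = "jelly01"; # host-specific settings follow
}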

All hosts automatically get:

  • Nix binary cache (nix-cache.home.2rjus.net)
  • SSH with root login enabled
  • SOPS secrets management with auto-generated age keys
  • Internal ACME CA integration (ca.home.2rjus.net)
  • Daily auto-upgrades with auto-reboot
  • Prometheus node-exporter + Promtail (logs to monitoring01)
  • Custom root CA trust
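
The ACME integration plausibly amounts to pointing the ACME defaults at the internal CA; a minimal sketch, assuming a step-ca-style directory URL (not verified against /system/acme.nix):

security.acme = {
  acceptTerms = true;
  # Hypothetical directory endpoint on the internal CA
  defaults.server = "https://ca.home.2rjus.net/acme/acme/directory";
};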

Active Hosts

Production servers managed by rebuild-all.sh:

  • ns1, ns2 - Primary/secondary DNS servers (10.69.13.5/6)
  • ca - Internal Certificate Authority
  • ha1 - Home Assistant + Zigbee2MQTT + Mosquitto
  • http-proxy - Reverse proxy
  • monitoring01 - Full observability stack (Prometheus, Grafana, Loki, Tempo, Pyroscope)
  • jelly01 - Jellyfin media server
  • nix-cache01 - Binary cache server
  • pgdb1 - PostgreSQL database
  • nats1 - NATS messaging server
  • auth01 - Authentication service

Template/test hosts:

  • template1 - Base template for cloning new hosts
  • nixos-test1 - Test environment

Flake Inputs

  • nixpkgs - NixOS 25.11 stable (primary)
  • nixpkgs-unstable - Unstable channel (available via overlay as pkgs.unstable.<package>)
  • sops-nix - Secrets management
  • Custom packages from git.t-juice.club:
    • backup-helper - Backup automation module
    • alerttonotify - Alert routing
    • labmon - Lab monitoring

Network Architecture

  • Domain: home.2rjus.net
  • Infrastructure subnet: 10.69.13.x
  • DNS: ns1/ns2 provide authoritative DNS with primary-secondary setup
  • Internal CA for ACME certificates (no Let's Encrypt)
  • Centralized monitoring at monitoring01
  • Static networking via systemd-networkd
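
Per-host static addressing with systemd-networkd typically looks like this sketch (interface name, address, and gateway are placeholders):

systemd.network.enable = true;
systemd.network.networks."10-lan" = {
  matchConfig.Name = "ens18";          # placeholder interface name
  address = [ "10.69.13.42/24" ];      # placeholder address on the infra subnet
  gateway = [ "10.69.13.1" ];          # assumed gateway
  dns = [ "10.69.13.5" "10.69.13.6" ]; # ns1/ns2
};

networking.useDHCP = false;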

Secrets Management

  • Uses SOPS with age encryption
  • Each server has unique age key in .sops.yaml
  • Keys auto-generated at /var/lib/sops-nix/key.txt on first boot
  • Shared secrets: /secrets/secrets.yaml
  • Per-host secrets: /secrets/<hostname>/
  • All production servers can decrypt shared secrets; host-specific secrets require specific host keys
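
Declaring a secret in a host or service module follows the standard sops-nix pattern; a minimal sketch (secret name, owner, and per-host path are illustrative):

sops.defaultSopsFile = ../../secrets/secrets.yaml;
sops.age.keyFile = "/var/lib/sops-nix/key.txt";

sops.secrets."example/token" = {
  sopsFile = ../../secrets/<hostname>/secrets.yaml; # per-host secrets (placeholder path)
  owner = "example";                                # illustrative owner
};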

Auto-Upgrade System

All hosts pull updates daily from:

git+https://git.t-juice.club/torjus/nixos-servers.git

Configured in /system/autoupgrade.nix:

  • Random delay to avoid simultaneous upgrades
  • Auto-reboot after successful upgrade
  • Systemd service: nixos-upgrade.service
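
The underlying NixOS options are roughly as follows; a sketch of what /system/autoupgrade.nix likely sets (schedule and delay values are assumptions):

system.autoUpgrade = {
  enable = true;
  flake = "git+https://git.t-juice.club/torjus/nixos-servers.git";
  dates = "daily";              # assumed schedule
  randomizedDelaySec = "45min"; # assumed delay value
  allowReboot = true;
};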

Adding a New Host

  1. Create /hosts/<hostname>/ directory
  2. Copy structure from template1 or similar host
  3. Add host entry to flake.nix nixosConfigurations (see the sketch after this list)
  4. Add the hostname to the DNS zone files, merge to master, and run the auto-upgrade on the DNS servers.
  5. User clones template host
  6. User runs prepare-host.sh on the new host; this deletes files that must be regenerated (SSH host keys, machine-id, etc.), creates a new age key, and prints the public key
  7. Add the printed public key to .sops.yaml
  8. Create /secrets/<hostname>/ if needed
  9. Configure networking (static IP, DNS servers)
  10. Commit changes and merge to master.
  11. Deploy by running nixos-rebuild boot --flake <flake-url>#<hostname> on the host, using the repository URL listed under Auto-Upgrade System as the flake URL.
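
A flake.nix host entry (step 3) typically looks like this sketch (<hostname> is a placeholder; the module list is simplified relative to the existing entries):

nixosConfigurations."<hostname>" = nixpkgs.lib.nixosSystem {
  system = "x86_64-linux";
  modules = [
    ./hosts/<hostname>
    sops-nix.nixosModules.sops
    # plus the shared modules/overlays used by the existing entries
  ];
};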

Important Patterns

Overlay usage: Access unstable packages via pkgs.unstable.<package> (defined in flake.nix overlay-unstable)
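
A minimal sketch of such an overlay (the actual overlay-unstable in flake.nix may differ in detail):

overlay-unstable = final: prev: {
  unstable = import nixpkgs-unstable { system = prev.system; };
};

# usage in a module: environment.systemPackages = [ pkgs.unstable.<package> ];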

Service composition: Services in /services/ are designed to be imported by multiple hosts. Keep them modular and reusable.

Hardware configuration reuse: Multiple hosts share /hosts/template/hardware-configuration.nix for VM instances.

State version: All hosts use stateVersion "23.11" - do not change this on existing hosts.

Firewall: Disabled on most hosts (trusted network). Enable selectively in host configuration if needed.

Monitoring Stack

All hosts ship metrics and logs to monitoring01:

  • Metrics: Prometheus scrapes node-exporter from all hosts
  • Logs: Promtail ships logs to Loki on monitoring01
  • Access: Grafana at monitoring01 for visualization
  • Tracing: Tempo for distributed tracing
  • Profiling: Pyroscope for continuous profiling
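
On the host side, the agents enabled by /system/ amount to roughly this sketch (the Loki endpoint is an assumption; promtail's server/positions/scrape_configs sections are omitted for brevity):

services.prometheus.exporters.node.enable = true; # scraped by Prometheus on monitoring01

services.promtail = {
  enable = true;
  configuration = {
    # Assumed Loki push endpoint on monitoring01 (default Loki port)
    clients = [ { url = "http://monitoring01.home.2rjus.net:3100/loki/api/v1/push"; } ];
  };
};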

DNS Architecture

  • ns1 (10.69.13.5) - Primary authoritative DNS + resolver
  • ns2 (10.69.13.6) - Secondary authoritative DNS (AXFR from ns1)
  • Zone files managed in /services/ns/
  • All hosts point to ns1/ns2 for DNS resolution