# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Repository Overview

This is a Nix Flake-based NixOS configuration repository for managing a homelab infrastructure consisting of 16 server configurations. The repository uses a modular architecture with shared system configurations, reusable service modules, and per-host customization.

## Common Commands

### Building Configurations

```bash
# List all available configurations
nix flake show

# Build a specific host configuration locally (without deploying)
nixos-rebuild build --flake .#<hostname>

# Build and check a configuration
nix build .#nixosConfigurations.<hostname>.config.system.build.toplevel
```

### Deployment

Do not automatically deploy changes. Deployments are usually done by updating the master branch and then triggering the auto-upgrade on the specific host.

### Flake Management

```bash
# Check flake for errors
nix flake check
```

Do not run `nix flake update`; it should only be done manually by the user.

### Development Environment

```bash
# Enter development shell (provides ansible, python3)
nix develop
```

### Secrets Management

Secrets are handled by sops. Do not edit `.sops.yaml` or any file within `secrets/`. Ask the user to make modifications if necessary.
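When a module needs one of these secrets, the usual sops-nix pattern is to declare it under `sops.secrets` and hand services the decrypted file *path*, never the plaintext. A minimal sketch, assuming a hypothetical service and secret name (neither is taken from this repository):

```nix
# Hypothetical example: consuming a sops secret in a service module.
# "myservice" and the secret key are illustrative assumptions.
{ config, ... }:
{
  # Declare a key from the shared secrets file; sops-nix decrypts it at
  # activation using the host's age key from /var/lib/sops-nix/key.txt.
  sops.secrets."myservice/api-token" = {
    sopsFile = ../../secrets/secrets.yaml;
    owner = "myservice";
  };

  # The service reads the decrypted file at runtime via its path.
  systemd.services.myservice.serviceConfig.LoadCredential =
    [ "api-token:${config.sops.secrets."myservice/api-token".path}" ];
}
```

This keeps secret material out of the Nix store: only the encrypted `secrets.yaml` lives in the repository, and the decrypted file exists solely on the target host.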
## Architecture

### Directory Structure

- `/flake.nix` - Central flake defining all 16 NixOS configurations
- `/hosts/<hostname>/` - Per-host configurations
  - `default.nix` - Entry point, imports configuration.nix and services
  - `configuration.nix` - Host-specific settings (networking, hardware, users)
- `/system/` - Shared system-level configurations applied to ALL hosts
  - Core modules: nix.nix, sshd.nix, sops.nix, acme.nix, autoupgrade.nix
  - Monitoring: node-exporter and promtail on every host
- `/services/` - Reusable service modules, selectively imported by hosts
  - `home-assistant/` - Home automation stack
  - `monitoring/` - Observability stack (Prometheus, Grafana, Loki, Tempo)
  - `ns/` - DNS services (authoritative, resolver)
  - `http-proxy/`, `ca/`, `postgres/`, `nats/`, `jellyfin/`, etc.
- `/secrets/` - SOPS-encrypted secrets with age encryption
- `/common/` - Shared configurations (e.g., VM guest agent)
- `/playbooks/` - Ansible playbooks for fleet management
- `/.sops.yaml` - SOPS configuration with age keys for all servers

### Configuration Inheritance

Each host follows this import pattern:

```
hosts/<hostname>/default.nix
└─> configuration.nix (host-specific)
    ├─> ../../system (ALL shared system configs - applied to every host)
    ├─> ../../services/<service> (selective service imports)
    └─> ../../common/vm (if VM)
```

All hosts automatically get:

- Nix binary cache (nix-cache.home.2rjus.net)
- SSH with root login enabled
- SOPS secrets management with auto-generated age keys
- Internal ACME CA integration (ca.home.2rjus.net)
- Daily auto-upgrades with auto-reboot
- Prometheus node-exporter + Promtail (logs to monitoring01)
- Custom root CA trust

### Active Hosts

Production servers managed by `rebuild-all.sh`:

- `ns1`, `ns2` - Primary/secondary DNS servers (10.69.13.5/6)
- `ca` - Internal Certificate Authority
- `ha1` - Home Assistant + Zigbee2MQTT + Mosquitto
- `http-proxy` - Reverse proxy
- `monitoring01` - Full observability stack (Prometheus, Grafana, Loki, Tempo, Pyroscope)
- `jelly01` - Jellyfin media server
- `nix-cache01` - Binary cache server
- `pgdb1` - PostgreSQL database
- `nats1` - NATS messaging server
- `auth01` - Authentication service

Template/test hosts:

- `template1` - Base template for cloning new hosts
- `nixos-test1` - Test environment

### Flake Inputs

- `nixpkgs` - NixOS 25.11 stable (primary)
- `nixpkgs-unstable` - Unstable channel (available via overlay as `pkgs.unstable.<package>`)
- `sops-nix` - Secrets management
- Custom packages from git.t-juice.club:
  - `backup-helper` - Backup automation module
  - `alerttonotify` - Alert routing
  - `labmon` - Lab monitoring

### Network Architecture

- Domain: `home.2rjus.net`
- Infrastructure subnet: `10.69.13.x`
- DNS: ns1/ns2 provide authoritative DNS with a primary-secondary setup
- Internal CA for ACME certificates (no Let's Encrypt)
- Centralized monitoring at monitoring01
- Static networking via systemd-networkd

### Secrets Management

- Uses SOPS with age encryption
- Each server has a unique age key in `.sops.yaml`
- Keys auto-generated at `/var/lib/sops-nix/key.txt` on first boot
- Shared secrets: `/secrets/secrets.yaml`
- Per-host secrets: `/secrets/<hostname>/`
- All production servers can decrypt shared secrets; host-specific secrets require specific host keys

### Auto-Upgrade System

All hosts pull updates daily from:

```
git+https://git.t-juice.club/torjus/nixos-servers.git
```

Configured in `/system/autoupgrade.nix`:

- Random delay to avoid simultaneous upgrades
- Auto-reboot after successful upgrade
- Systemd service: `nixos-upgrade.service`

### Adding a New Host

1. Create the `/hosts/<hostname>/` directory
2. Copy the structure from `template1` or a similar host
3. Add a host entry to `flake.nix` nixosConfigurations
4. Add the hostname to the DNS zone files. Merge to master. Run the auto-upgrade on the DNS servers.
5. User clones the template host
6. User runs `prepare-host.sh` on the new host. This deletes files that should be regenerated (SSH host keys, machine-id, etc.), creates a new age key, and prints the public key.
7. Add this public key to `.sops.yaml`
8. Create `/secrets/<hostname>/` if needed
9. Configure networking (static IP, DNS servers)
10. Commit the changes and merge to master
11. Deploy by running `nixos-rebuild boot --flake URL#<hostname>` on the host

### Important Patterns

**Overlay usage**: Access unstable packages via `pkgs.unstable.<package>` (defined in the flake.nix overlay-unstable)

**Service composition**: Services in `/services/` are designed to be imported by multiple hosts. Keep them modular and reusable.

**Hardware configuration reuse**: Multiple hosts share `/hosts/template/hardware-configuration.nix` for VM instances.

**State version**: All hosts use stateVersion `"23.11"` - do not change this on existing hosts.

**Firewall**: Disabled on most hosts (trusted network). Enable selectively in the host configuration if needed.

### Monitoring Stack

All hosts ship metrics and logs to `monitoring01`:

- **Metrics**: Prometheus scrapes node-exporter on all hosts
- **Logs**: Promtail ships logs to Loki on monitoring01
- **Access**: Grafana at monitoring01 for visualization
- **Tracing**: Tempo for distributed tracing
- **Profiling**: Pyroscope for continuous profiling

### DNS Architecture

- `ns1` (10.69.13.5) - Primary authoritative DNS + resolver
- `ns2` (10.69.13.6) - Secondary authoritative DNS (AXFR from ns1)
- Zone files managed in `/services/ns/`
- All hosts point to ns1/ns2 for DNS resolution
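The static-networking and DNS conventions above are typically expressed through the systemd-networkd options in a host's `configuration.nix`. A minimal sketch, assuming a hypothetical interface name, host address, and gateway (only the ns1/ns2 addresses come from this document):

```nix
# Illustrative static-network fragment. The interface name (ens18), host
# address (10.69.13.42), and gateway (10.69.13.1) are assumptions, not
# values from this repository.
{
  networking.useDHCP = false;
  systemd.network = {
    enable = true;
    networks."10-lan" = {
      matchConfig.Name = "ens18";
      address = [ "10.69.13.42/24" ];
      gateway = [ "10.69.13.1" ];
      # ns1/ns2 per the DNS architecture above
      dns = [ "10.69.13.5" "10.69.13.6" ];
    };
  };
}
```

With every host resolving through ns1/ns2, a new host becomes reachable by name as soon as its zone-file entry lands on the DNS servers via their auto-upgrade.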