# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Repository Overview

This is a Nix Flake-based NixOS configuration repository for managing a homelab infrastructure consisting of 16 server configurations. The repository uses a modular architecture with shared system configurations, reusable service modules, and per-host customization.

## Common Commands

### Building Configurations

```bash
# List all available configurations
nix flake show

# Build a specific host configuration locally (without deploying)
nixos-rebuild build --flake .#<hostname>

# Build and check a configuration
nix build .#nixosConfigurations.<hostname>.config.system.build.toplevel
```

**Important:** Do NOT pipe `nix build` commands to other commands like `tail` or `head`. Piping can hide errors (`tail` drops most of the output) and make builds appear successful when they actually failed (without `pipefail`, a pipeline's exit status is that of its last command). Always run `nix build` without piping to see the full output.

```bash
# BAD - hides errors
nix build .#create-host 2>&1 | tail -20

# GOOD - shows all output and errors
nix build .#create-host
```
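
A minimal bash demonstration of the exit-status problem, with `false` standing in for a failing `nix build`. Without `pipefail`, the pipeline reports the status of `tail`, which succeeds:

```bash
# `false` plays the role of a failed build; `tail` still exits 0.
masked_status=0
(set +o pipefail; false | tail -n 20) || masked_status=$?

# With pipefail, the pipeline reports the failing command instead.
real_status=0
(set -o pipefail; false | tail -n 20) || real_status=$?

echo "masked=${masked_status} real=${real_status}"   # prints: masked=0 real=1
```

Even with `pipefail`, `tail` still discards most of the build log, so running `nix build` unpiped remains the rule.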

### Deployment

Do not automatically deploy changes. Deployments are usually done by updating the master branch and then triggering the auto-upgrade on the specific host.
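
For reference, triggering the auto-upgrade on one host can be sketched as below. `upgrade_host` is a hypothetical helper, not a script in this repository; it assumes root SSH access and that hosts resolve as `<hostname>.home.2rjus.net`. Only use it when the user explicitly asks for a deployment.

```bash
# Hypothetical helper: run the auto-upgrade on one host after master is updated.
# Assumes root SSH access and <hostname>.home.2rjus.net DNS names.
upgrade_host() {
  local host="$1"
  # nixos-upgrade.service is a oneshot unit, so `systemctl start` waits for it.
  # Auto-reboot is enabled, so the SSH session may drop if the host reboots.
  ssh "root@${host}.home.2rjus.net" systemctl start nixos-upgrade.service
}

# Usage (only on explicit request):
#   upgrade_host ns1
```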

### Flake Management

```bash
# Check flake for errors
nix flake check
```

Do not run `nix flake update`; it should only be run manually by the user.

### Development Environment

```bash
# Enter development shell (provides ansible, python3)
nix develop
```

### Secrets Management

Secrets are handled by sops. Do not edit any `.sops.yaml` or any file within `secrets/`. Ask the user to modify if necessary.

### Git Workflow

**Important:** Never commit directly to `master` unless the user explicitly asks for it. Always create a feature branch for changes.

When starting a new plan or task, the first step should typically be to create and check out a new branch with an appropriate name (e.g., `git checkout -b dns-automation` or `git checkout -b fix-nginx-config`).

### Plan Management

When creating plans for large features, follow this workflow:

1. When implementation begins, save a copy of the plan to `docs/plans/` (e.g., `docs/plans/feature-name.md`)
2. Once the feature is fully implemented, move the plan to `docs/plans/completed/`

### Git Commit Messages

Commit messages should follow the format: `topic: short description`

Examples:
- `flake: add opentofu to devshell`
- `template2: add proxmox image configuration`
- `terraform: add VM deployment configuration`
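
To make the format concrete, here is a hypothetical bash check for a subject line. The allowed topic characters (lowercase letters, digits, `.`, `_`, `/`, `-`) are an assumption inferred from the examples above, not a rule enforced anywhere in the repository.

```bash
# Hypothetical validator for the `topic: short description` subject format.
# Topic charset is an assumption based on examples like "flake" and "template2".
is_valid_commit_subject() {
  local re='^[a-z0-9][a-z0-9._/-]*: [^[:space:]]'
  [[ $1 =~ $re ]]
}

# Usage:
#   is_valid_commit_subject "flake: add opentofu to devshell" && echo ok   # prints ok
```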

### Clipboard

To copy text to the clipboard, pipe to `wl-copy` (Wayland):

```bash
echo "text" | wl-copy
```

### NixOS Options and Packages Lookup

Two MCP servers are available for searching NixOS options and packages:

- **nixpkgs-options** - Search and look up NixOS configuration option documentation
- **nixpkgs-packages** - Search and look up Nix packages from nixpkgs

**Session Setup:** At the start of each session, index the nixpkgs revision from `flake.lock` to ensure documentation matches the project's nixpkgs version:

1. Read `flake.lock` and find the `nixpkgs` node's `locked.rev` field
2. Call `index_revision` with that git hash (both servers share the same index)
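
Step 1 can be done with `jq` (assumed to be installed; it is not listed in the dev shell). This sketch resolves the node name through the root node's inputs, because a lock file may name the node something other than `nixpkgs` (for example `nixpkgs_2`) when several inputs pull in nixpkgs. `nixpkgs_rev` is a hypothetical helper name.

```bash
# Print the locked nixpkgs revision from a flake.lock (defaults to ./flake.lock).
# The root node maps the input name "nixpkgs" to the actual node name, which
# is then looked up under .nodes to read locked.rev.
nixpkgs_rev() {
  local lockfile="${1:-flake.lock}"
  jq -r '.nodes[.nodes.root.inputs.nixpkgs].locked.rev' "$lockfile"
}

# Usage:
#   nixpkgs_rev   # prints the git hash to pass to index_revision
```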

**Options Tools (nixpkgs-options):**

- `search_options` - Search for options by name or description (e.g., query "nginx" or "postgresql")
- `get_option` - Get full details for a specific option (e.g., `services.loki.configuration`)
- `get_file` - Fetch the source file from nixpkgs that declares an option

**Package Tools (nixpkgs-packages):**

- `search_packages` - Search for packages by name or description (e.g., query "nginx" or "python")
- `get_package` - Get full details for a specific package by attribute path (e.g., `firefox`, `python312Packages.requests`)
- `get_file` - Fetch the source file from nixpkgs that defines a package

Indexing the revision from `flake.lock` ensures the documentation matches the exact nixpkgs version (currently NixOS 25.11) used by this flake.

## Architecture

### Directory Structure

- `/flake.nix` - Central flake defining all NixOS configurations
- `/hosts/<hostname>/` - Per-host configurations
  - `default.nix` - Entry point, imports configuration.nix and services
  - `configuration.nix` - Host-specific settings (networking, hardware, users)
- `/system/` - Shared system-level configurations applied to ALL hosts
  - Core modules: nix.nix, sshd.nix, sops.nix, acme.nix, autoupgrade.nix
  - Monitoring: node-exporter and promtail on every host
- `/modules/` - Custom NixOS modules
  - `homelab/` - Homelab-specific options (DNS automation, monitoring scrape targets)
- `/lib/` - Nix library functions
  - `dns-zone.nix` - DNS zone generation functions
  - `monitoring.nix` - Prometheus scrape target generation functions
- `/services/` - Reusable service modules, selectively imported by hosts
  - `home-assistant/` - Home automation stack
  - `monitoring/` - Observability stack (Prometheus, Grafana, Loki, Tempo)
  - `ns/` - DNS services (authoritative, resolver, zone generation)
  - `http-proxy/`, `ca/`, `postgres/`, `nats/`, `jellyfin/`, etc.
- `/secrets/` - SOPS-encrypted secrets with age encryption
- `/common/` - Shared configurations (e.g., VM guest agent)
- `/docs/` - Documentation and plans
  - `plans/` - Future plans and proposals
  - `plans/completed/` - Completed plans (moved here when done)
- `/playbooks/` - Ansible playbooks for fleet management
- `/.sops.yaml` - SOPS configuration with age keys for all servers

### Configuration Inheritance

Each host follows this import pattern:
```
hosts/<hostname>/default.nix
  └─> configuration.nix (host-specific)
        ├─> ../../system (ALL shared system configs - applied to every host)
        ├─> ../../services/<service> (selective service imports)
        └─> ../../common/vm (if VM)
```

All hosts automatically get:
- Nix binary cache (nix-cache.home.2rjus.net)
- SSH with root login enabled
- SOPS secrets management with auto-generated age keys
- Internal ACME CA integration (ca.home.2rjus.net)
- Daily auto-upgrades with auto-reboot
- Prometheus node-exporter + Promtail (logs to monitoring01)
- Monitoring scrape target auto-registration via `homelab.monitoring` options
- Custom root CA trust
- DNS zone auto-registration via `homelab.dns` options
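
To check what a given host actually ends up with (for example its `homelab.dns` or `homelab.monitoring` settings, or `system.stateVersion`), the evaluated option can be read from the flake with `nix eval`. `host_option` is a hypothetical helper:

```bash
# Hypothetical helper: print one evaluated option of a host as JSON.
# Usage: host_option <hostname> <option.path>
host_option() {
  local host="$1" option="$2"
  nix eval --json ".#nixosConfigurations.${host}.config.${option}"
}

# Examples:
#   host_option ns1 homelab.dns.cnames
#   host_option ha1 homelab.monitoring.scrapeTargets
#   host_option pgdb1 system.stateVersion
```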

### Active Hosts

Production servers managed by `rebuild-all.sh`:
- `ns1`, `ns2` - Primary/secondary DNS servers (10.69.13.5/6)
- `ca` - Internal Certificate Authority
- `ha1` - Home Assistant + Zigbee2MQTT + Mosquitto
- `http-proxy` - Reverse proxy
- `monitoring01` - Full observability stack (Prometheus, Grafana, Loki, Tempo, Pyroscope)
- `jelly01` - Jellyfin media server
- `nix-cache01` - Binary cache server
- `pgdb1` - PostgreSQL database
- `nats1` - NATS messaging server
- `auth01` - Authentication service

Template/test hosts:
- `template1` - Base template for cloning new hosts
- `nixos-test1` - Test environment

### Flake Inputs

- `nixpkgs` - NixOS 25.11 stable (primary)
- `nixpkgs-unstable` - Unstable channel (available via overlay as `pkgs.unstable.<package>`)
- `sops-nix` - Secrets management
- Custom packages from git.t-juice.club:
  - `alerttonotify` - Alert routing
  - `labmon` - Lab monitoring

### Network Architecture

- Domain: `home.2rjus.net`
- Infrastructure subnet: `10.69.13.x`
- DNS: ns1/ns2 provide authoritative DNS with primary-secondary setup
- Internal CA for ACME certificates (no Let's Encrypt)
- Centralized monitoring at monitoring01
- Static networking via systemd-networkd

### Secrets Management

- Uses SOPS with age encryption
- Each server has a unique age key in `.sops.yaml`
- Keys auto-generated at `/var/lib/sops-nix/key.txt` on first boot
- Shared secrets: `/secrets/secrets.yaml`
- Per-host secrets: `/secrets/<hostname>/`
- All production servers can decrypt shared secrets; host-specific secrets require that host's key
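
When a host's public key is needed for `.sops.yaml` (which only the user edits), it can be derived from the private key file with `age-keygen -y`. A hypothetical helper, assuming root SSH access and that `age-keygen` is installed on the host (the template includes `age`):

```bash
# Hypothetical helper: print a host's age public key (recipient) for .sops.yaml.
# `age-keygen -y` reads an identity file and prints the matching public key.
host_age_pubkey() {
  local host="$1"
  ssh "root@${host}.home.2rjus.net" age-keygen -y /var/lib/sops-nix/key.txt
}

# Usage:
#   host_age_pubkey pgdb1
```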

### Auto-Upgrade System

All hosts pull updates daily from:
```
git+https://git.t-juice.club/torjus/nixos-servers.git
```

Configured in `/system/autoupgrade.nix`:
- Random delay to avoid simultaneous upgrades
- Auto-reboot after successful upgrade
- Systemd service: `nixos-upgrade.service`
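
To see when a host last upgraded and when the next run is due, standard systemd tools work against the auto-upgrade units. This assumes the NixOS default timer name `nixos-upgrade.timer` alongside the service; `upgrade_status` is a hypothetical helper to run on the host itself:

```bash
# Hypothetical helper, run on a host: show the auto-upgrade schedule and last run.
upgrade_status() {
  # Next and previous trigger times of the timer that starts the upgrade.
  systemctl list-timers --all nixos-upgrade.timer
  # Recent output of the upgrade service, e.g. evaluation or build errors.
  journalctl -u nixos-upgrade.service -n 50 --no-pager
}
```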

### Proxmox VM Provisioning with OpenTofu

The repository includes automated workflows for building Proxmox VM templates and deploying VMs using OpenTofu (Terraform).

#### Building and Deploying Templates

Template VMs are built from `hosts/template2` and deployed to Proxmox using Ansible:

```bash
# Build NixOS image and deploy to Proxmox as template
nix develop -c ansible-playbook -i playbooks/inventory.ini playbooks/build-and-deploy-template.yml
```

This playbook:
1. Builds the Proxmox image using `nixos-rebuild build-image --image-variant proxmox`
2. Uploads the `.vma.zst` image to Proxmox at `/var/lib/vz/dump`
3. Restores it as VM ID 9000
4. Converts it to a template
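
When debugging a failing playbook run, the four steps map roughly onto the commands below. This is an unofficial sketch, not what the playbook literally executes: the `template2` flake attribute, the helper names, and the use of `qmrestore`/`qm template` on the Proxmox node are assumptions. The playbook remains the supported path.

```bash
# Step 1, on the build machine (repository root): build the Proxmox image.
# Assumes the flake exposes the template as nixosConfigurations.template2.
build_template_image() {
  nixos-rebuild build-image --image-variant proxmox --flake .#template2
}

# Steps 3-4, on the Proxmox node, after the .vma.zst has been copied to
# /var/lib/vz/dump (step 2). Assumes VM ID 9000 is not already in use.
restore_template() {
  local image="$1"           # e.g. /var/lib/vz/dump/<image>.vma.zst
  qmrestore "$image" 9000    # restore the backup archive as VM 9000
  qm template 9000           # convert VM 9000 into a template
}
```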

Template configuration (`hosts/template2`):
- Minimal base system with essential packages (age, vim, wget, git)
- Cloud-init configured for NoCloud datasource (no EC2 metadata timeout)
- DHCP networking on ens18
- SSH key-based root login
- `prepare-host.sh` script for cleaning machine-id, SSH keys, and regenerating age keys

#### Deploying VMs with OpenTofu

VMs are deployed from templates using OpenTofu in the `/terraform` directory:

```bash
cd terraform
tofu init    # First time only
tofu apply   # Deploy VMs
```

Configuration files:
- `main.tf` - Proxmox provider configuration
- `variables.tf` - Provider variables (API credentials)
- `vm.tf` - VM resource definitions
- `terraform.tfvars` - Actual credentials (gitignored)

Example VM deployment includes:
- Clone from template VM
- Cloud-init configuration (SSH keys, network, DNS)
- Custom CPU/memory/disk sizing
- VLAN tagging
- QEMU guest agent

OpenTofu outputs the VM's IP address after deployment for easy SSH access.

#### Template Rebuilding and Terraform State

When the Proxmox template is rebuilt (via `build-and-deploy-template.yml`), the template name may change. This would normally cause Terraform to want to recreate all existing VMs, but that's unnecessary since VMs are independent once cloned.

**Solution**: The `terraform/vms.tf` file includes a lifecycle rule to ignore certain attributes that don't need management:

```hcl
lifecycle {
  ignore_changes = [
    clone,            # Template name can change without recreating VMs
    startup_shutdown, # Proxmox sets defaults (-1) that we don't need to manage
  ]
}
```

This means:
- **clone**: Existing VMs are not affected by template name changes; only new VMs use the updated template
- **startup_shutdown**: Proxmox sets default startup order/delay values (-1) that Terraform would otherwise try to remove
- You can safely update `default_template_name` in `terraform/variables.tf` without recreating VMs
- `tofu plan` won't show spurious changes for Proxmox-managed defaults

**When rebuilding the template:**
1. Run `nix develop -c ansible-playbook -i playbooks/inventory.ini playbooks/build-and-deploy-template.yml`
2. Update `default_template_name` in `terraform/variables.tf` if the name changed
3. Run `tofu plan` - should show no VM recreations (only template name in state)
4. Run `tofu apply` - updates state without touching existing VMs
5. New VMs created after this point will use the new template
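
Step 3 can also be checked mechanically instead of by reading the plan. The hypothetical helper below saves a plan, renders it with `tofu show -json`, and uses `jq` (assumed installed) to fail if any resource change includes a `delete` action, which is how replacements appear in the JSON plan format:

```bash
# Hypothetical helper: exit non-zero if the current plan would delete or
# replace any resource. Run from the repository root.
check_no_replacements() {
  (
    cd terraform || exit 1
    tofu plan -out=tfplan >/dev/null || exit 1
    status=0
    # Replacements appear as actions ["delete","create"] or ["create","delete"].
    tofu show -json tfplan |
      jq -e '[.resource_changes[]? | select(.change.actions | any(. == "delete"))] | length == 0' >/dev/null ||
      status=$?
    rm -f tfplan   # saved plans can contain sensitive values
    exit "$status"
  )
}
```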

### Adding a New Host

1. Create `/hosts/<hostname>/` directory
2. Copy structure from `template1` or similar host
3. Add host entry to `flake.nix` nixosConfigurations
4. Configure networking in `configuration.nix` (static IP via `systemd.network.networks`, DNS servers)
5. (Optional) Add `homelab.dns.cnames` if the host needs CNAME aliases
6. The user clones the template host
7. The user runs `prepare-host.sh` on the new host. This deletes files that should be regenerated, such as SSH host keys and machine-id. It also creates a new age key and prints the public key
8. The user adds this public key to `.sops.yaml`
9. Create `/secrets/<hostname>/` if needed
10. Commit the changes and merge to master
11. Deploy by running `nixos-rebuild boot --flake URL#<hostname>` on the host (the new configuration takes effect on the next reboot)
12. Trigger the auto-upgrade on the DNS servers (ns1, ns2) so they pick up the new host's DNS entry
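
Once the DNS servers have upgraded, the new record can be checked against both of them with `dig`. A hypothetical helper, assuming `dig` is installed; the addresses are the ns1/ns2 IPs listed under Active Hosts:

```bash
# Hypothetical helper: query a host's A record on ns1 and ns2.
# Both servers should return the same address once ns2 has transferred the zone.
check_host_dns() {
  local host="$1" server
  for server in 10.69.13.5 10.69.13.6; do
    printf '%s: %s\n' "$server" "$(dig +short "@${server}" "${host}.home.2rjus.net" A)"
  done
}

# Usage:
#   check_host_dns pgdb1
```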

**Note:** DNS A records and Prometheus node-exporter scrape targets are auto-generated from the host's `systemd.network.networks` static IP configuration. No manual zone file or Prometheus config editing is required.

### Important Patterns

**Overlay usage**: Access unstable packages via `pkgs.unstable.<package>` (defined in `flake.nix` as `overlay-unstable`)
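
To decide between a stable and an unstable package, the versions a host would see can be compared with `nix eval`. A hypothetical helper; it relies on the `pkgs` attribute that `nixpkgs.lib.nixosSystem` results expose, which includes the host's overlays:

```bash
# Hypothetical helper: compare a package's stable and unstable versions as seen
# by one host's package set (pkgs.unstable comes from the overlay).
compare_pkg_versions() {
  local host="$1" pkg="$2" base
  base=".#nixosConfigurations.${host}.pkgs"
  printf 'stable:   %s\n' "$(nix eval --raw "${base}.${pkg}.version")"
  printf 'unstable: %s\n' "$(nix eval --raw "${base}.unstable.${pkg}.version")"
}

# Usage:
#   compare_pkg_versions monitoring01 grafana
```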

**Service composition**: Services in `/services/` are designed to be imported by multiple hosts. Keep them modular and reusable.

**Hardware configuration reuse**: Multiple hosts share `/hosts/template/hardware-configuration.nix` for VM instances.

**State version**: All hosts use stateVersion `"23.11"` - do not change this on existing hosts.

**Firewall**: Disabled on most hosts (trusted network). Enable selectively in host configuration if needed.

### Monitoring Stack

All hosts ship metrics and logs to `monitoring01`:
- **Metrics**: Prometheus scrapes node-exporter from all hosts
- **Logs**: Promtail ships logs to Loki on monitoring01
- **Access**: Grafana at monitoring01 for visualization
- **Tracing**: Tempo for distributed tracing
- **Profiling**: Pyroscope for continuous profiling

**Scrape Target Auto-Generation:**

Prometheus scrape targets are automatically generated from host configurations, following the same pattern as DNS zone generation:

- **Node-exporter**: All flake hosts with static IPs are automatically added as node-exporter targets
- **Service targets**: Defined via `homelab.monitoring.scrapeTargets` in service modules
- **External targets**: Non-flake hosts defined in `/services/monitoring/external-targets.nix`
- **Library**: `lib/monitoring.nix` provides `generateNodeExporterTargets` and `generateScrapeConfigs`

Host monitoring options (`homelab.monitoring.*`):
- `enable` (default: `true`) - Include host in Prometheus node-exporter scrape targets
- `scrapeTargets` (default: `[]`) - Additional scrape targets exposed by this host (job_name, port, metrics_path, scheme, scrape_interval, honor_labels)

Service modules declare their scrape targets directly (e.g., `services/ca/default.nix` declares step-ca on port 9000). The Prometheus config on monitoring01 auto-generates scrape configs from all hosts.

To add monitoring targets for non-NixOS hosts, edit `/services/monitoring/external-targets.nix`.
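
To confirm that an auto-generated target was actually picked up, the Prometheus HTTP API lists active targets and their health. A hypothetical sketch: the default URL `http://monitoring01.home.2rjus.net:9090` assumes Prometheus's standard port and may differ in this setup, and `jq` is assumed to be installed.

```bash
# Turn a /api/v1/targets response (read on stdin) into
# "job<TAB>instance<TAB>health" lines.
format_targets() {
  jq -r '.data.activeTargets[] | [.labels.job, .labels.instance, .health] | @tsv'
}

# Hypothetical helper: list active scrape targets on monitoring01.
# PROMETHEUS_URL overrides the assumed default address.
list_scrape_targets() {
  local prom="${PROMETHEUS_URL:-http://monitoring01.home.2rjus.net:9090}"
  curl -fsS "${prom}/api/v1/targets?state=active" | format_targets
}
```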

### DNS Architecture

- `ns1` (10.69.13.5) - Primary authoritative DNS + resolver
- `ns2` (10.69.13.6) - Secondary authoritative DNS (AXFR from ns1)
- All hosts point to ns1/ns2 for DNS resolution

**Zone Auto-Generation:**

DNS zone entries are automatically generated from host configurations:

- **Flake-managed hosts**: A records extracted from `systemd.network.networks` static IPs
- **CNAMEs**: Defined via `homelab.dns.cnames` option in host configs
- **External hosts**: Non-flake hosts defined in `/services/ns/external-hosts.nix`
- **Serial number**: Uses `self.sourceInfo.lastModified` (git commit timestamp)
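
Assuming the serial is the raw `lastModified` value (a Unix timestamp), it can be decoded back into the commit time, which is a quick way to see which commit each DNS server is serving. Hypothetical helpers; `serial_to_date` needs GNU `date` and `zone_serial` needs `dig`:

```bash
# Decode a zone serial that is a Unix timestamp into a UTC date (GNU date).
serial_to_date() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# Fetch the SOA serial (third field of the short SOA answer) from one server.
zone_serial() {
  local server="$1"
  dig +short "@${server}" home.2rjus.net SOA | awk '{print $3}'
}

# Usage:
#   serial_to_date "$(zone_serial 10.69.13.5)"
#   serial_to_date 1700000000   # → 2023-11-14 22:13:20 UTC
```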

Host DNS options (`homelab.dns.*`):
- `enable` (default: `true`) - Include host in DNS zone generation
- `cnames` (default: `[]`) - List of CNAME aliases pointing to this host

Hosts are automatically excluded from DNS if:
- `homelab.dns.enable = false` (e.g., template hosts)
- No static IP configured (e.g., DHCP-only hosts)
- Network interface is a VPN/tunnel (`wg*`, `tun*`, `tap*`)

To add DNS entries for non-NixOS hosts, edit `/services/ns/external-hosts.nix`.