Compare commits

89 commits, 01d4812280...master:

f2c30cc24f, 7e80d2e0bc, 1f5b7b13e2, c53e36c3f3, 04a252b857, 5d26f52e0d, 506a692548, fa8f4f0784,
025570dea1, 15c00393f1, 787c14c7a6, eee3dde04f, 682b07b977, 70661ac3d9, 506e93a5e2, b6c41aa910,
aa6e00a327, 258e350b89, eba195c192, bbb22e588e, 879e7aba60, 39a4ea98ab, 1d90dc2181, e9857afc11,
88e9036cb4, 59e1962d75, 3dc4422ba0, f0963624bc, 7b46f94e48, 32968147b5, c515a6b4e1, 4d8b94ce83,
8b0a4ea33a, 5be1f43c24, b322b1156b, 3cccfc0487, 41d4226812, 351fb6f720, 7d92c55d37, 6d117d68ca,
a46fbdaa70, 2c9d86eaf2, ccb1c3fe2e, 0700033c0a, 4d33018285, 678fd3d6de, 9d74aa5c04, fe80ec3576,
870fb3e532, e602e8d70b, 28b8d7c115, 64f2688349, 09d9d71e2b, cc799f5929, 0abdda8e8a, 4076361bf7,
0ef63ad874, 8f29141dd1, 3a9a47f1ad, fa6380e767, 86a077e152, 9da57c6a2f, da9dd02d10, e7980978c7,
dd1b64de27, 4e8cc124f2, a2a55f3955, c38034ba41, d7d4b0846c, 8ca7c4e402, 106912499b, 83af00458b,
67d5de3eb8, cee1b264cd, 4ceee04308, e3ced5bcda, 15459870cd, d1861eefb5, d25fc99e1d, b5da9431aa,
0e5dea635e, 86249c466b, 5d560267cf, 63662b89e0, 7ae474fd3e, f0525b5c74, 42c391b355, 048536ba70,
cccce09406
.claude/skills/quick-plan/SKILL.md (new file, 89 lines)

@@ -0,0 +1,89 @@

---
name: quick-plan
description: Create a planning document for a future homelab project. Use when the user wants to document ideas for future work without implementing immediately.
argument-hint: [topic or feature to plan]
---

# Quick Plan Generator

Create a planning document for a future homelab infrastructure project. Plans are for documenting ideas and approaches that will be implemented later, not immediately.

## Input

The user provides: $ARGUMENTS

## Process

1. **Understand the topic**: Research the codebase to understand:
   - Current state of related systems
   - Existing patterns and conventions
   - Relevant NixOS options or packages
   - Any constraints or dependencies

2. **Evaluate options**: If there are multiple approaches, research and compare them with pros/cons.

3. **Draft the plan**: Create a markdown document following the structure below.

4. **Save the plan**: Write to `docs/plans/<topic-slug>.md` using a kebab-case filename derived from the topic.

## Plan Structure

Use these sections as appropriate (not all plans need every section):

```markdown
# Title

## Overview/Goal
Brief description of what this plan addresses and why.

## Current State
What exists today that's relevant to this plan.

## Options Evaluated (if multiple approaches)
For each option:
- **Option Name**
  - **Pros:** bullet points
  - **Cons:** bullet points
  - **Verdict:** brief assessment

Or use a comparison table for structured evaluation.

## Recommendation/Decision
What approach is recommended and why. Include rationale.

## Implementation Steps
Numbered phases or steps. Be specific but not overly detailed.
Can use sub-sections for major phases.

## Open Questions
Things still to be determined. Use checkbox format:
- [ ] Question 1?
- [ ] Question 2?

## Notes (optional)
Additional context, caveats, or references.
```

## Style Guidelines

- **Concise**: Use bullet points, avoid verbose paragraphs
- **Technical but accessible**: Include NixOS config snippets when relevant
- **Future-oriented**: These are plans, not specifications
- **Acknowledge uncertainty**: Use "Open Questions" for unresolved decisions
- **Reference existing patterns**: Mention how this fits with existing infrastructure
- **Tables for comparisons**: Use markdown tables when comparing options
- **Practical focus**: Emphasize what needs to happen, not theory

## Examples of Good Plans

Reference these existing plans for style guidance:
- `docs/plans/auth-system-replacement.md` - Good option evaluation with table
- `docs/plans/truenas-migration.md` - Good decision documentation with rationale
- `docs/plans/remote-access.md` - Good multi-option comparison
- `docs/plans/prometheus-scrape-target-labels.md` - Good implementation detail level

## After Creating the Plan

1. Tell the user the plan was saved to `docs/plans/<filename>.md`
2. Summarize the key points
3. Ask if they want any adjustments before committing
.mcp.json (new file, 28 lines)

@@ -0,0 +1,28 @@

{
  "mcpServers": {
    "nixpkgs-options": {
      "command": "nix",
      "args": ["run", "git+https://git.t-juice.club/torjus/labmcp#nixpkgs-search", "--", "options", "serve"],
      "env": {
        "NIXPKGS_SEARCH_DATABASE": "sqlite:///run/user/1000/labmcp/nixpkgs-search.db"
      }
    },
    "nixpkgs-packages": {
      "command": "nix",
      "args": ["run", "git+https://git.t-juice.club/torjus/labmcp#nixpkgs-search", "--", "packages", "serve"],
      "env": {
        "NIXPKGS_SEARCH_DATABASE": "sqlite:///run/user/1000/labmcp/nixpkgs-search.db"
      }
    },
    "lab-monitoring": {
      "command": "nix",
      "args": ["run", "git+https://git.t-juice.club/torjus/labmcp#lab-monitoring", "--", "serve", "--enable-silences"],
      "env": {
        "PROMETHEUS_URL": "https://prometheus.home.2rjus.net",
        "ALERTMANAGER_URL": "https://alertmanager.home.2rjus.net",
        "LOKI_URL": "http://monitoring01.home.2rjus.net:3100"
      }
    }
  }
}
.sops.yaml (20 changed lines)

@@ -2,11 +2,7 @@ keys:
  - &admin_torjus age1lznyk4ee7e7x8n92cq2n87kz9920473ks5u9jlhd3dczfzq4wamqept56u
  - &server_ns1 age1hz2lz4k050ru3shrk5j3zk3f8azxmrp54pktw5a7nzjml4saudesx6jsl0
  - &server_ns2 age1w2q4gm2lrcgdzscq8du3ssyvk6qtzm4fcszc92z9ftclq23yyydqdga5um
  - &server_ns3 age1snmhmpavqy7xddmw4nuny0u4xusqmnqxqarjmghkm5zaluff84eq5xatrd
  - &server_ns4 age12a3nyvjs8jrwmpkf3tgawel3nwcklwsr35ktmytnvhpawqwzrsfqpgcy0q
  - &server_ha1 age1d2w5zece9647qwyq4vas9qyqegg96xwmg6c86440a6eg4uj6dd2qrq0w3l
  - &server_nixos-test1 age1gcyfkxh4fq5zdp0dh484aj82ksz66wrly7qhnpv0r0p576sn9ekse8e9ju
  - &server_inc1 age1g5luz2rtel3surgzuh62rkvtey7lythrvfenyq954vmeyfpxjqkqdj3wt8
  - &server_http-proxy age1gq8434ku0xekqmvnseeunv83e779cg03c06gwrusnymdsr3rpufqx6vr3m
  - &server_ca age1288993th0ge00reg4zqueyvmkrsvk829cs068eekjqfdprsrkeqql7mljk
  - &server_monitoring01 age1vpns76ykll8jgdlu3h05cur4ew2t3k7u03kxdg8y6ypfhsfhq9fqyurjey
@@ -14,7 +10,6 @@ keys:
  - &server_nix-cache01 age1w029fksjv0edrff9p7s03tgk3axecdkppqymfpwfn2nu2gsqqefqc37sxq
  - &server_pgdb1 age1ha34qeksr4jeaecevqvv2afqem67eja2mvawlmrqsudch0e7fe7qtpsekv
  - &server_nats1 age1cxt8kwqzx35yuldazcc49q88qvgy9ajkz30xu0h37uw3ts97jagqgmn2ga
  - &server_auth01 age16prza00sqzuhwwcyakj6z4hvwkruwkqpmmrsn94a5ucgpkelncdq2ldctk
creation_rules:
  - path_regex: secrets/[^/]+\.(yaml|json|env|ini)
    key_groups:
@@ -22,11 +17,7 @@ creation_rules:
          - *admin_torjus
          - *server_ns1
          - *server_ns2
          - *server_ns3
          - *server_ns4
          - *server_ha1
          - *server_nixos-test1
          - *server_inc1
          - *server_http-proxy
          - *server_ca
          - *server_monitoring01
@@ -34,12 +25,6 @@ creation_rules:
          - *server_nix-cache01
          - *server_pgdb1
          - *server_nats1
          - *server_auth01
  - path_regex: secrets/ns3/[^/]+\.(yaml|json|env|ini)
    key_groups:
      - age:
          - *admin_torjus
          - *server_ns3
  - path_regex: secrets/ca/[^/]+\.(yaml|json|env|ini|)
    key_groups:
      - age:
@@ -65,8 +50,3 @@ creation_rules:
      - age:
          - *admin_torjus
          - *server_http-proxy
  - path_regex: secrets/auth01/[^/]+\.(yaml|json|env|ini|)
    key_groups:
      - age:
          - *admin_torjus
          - *server_auth01
CLAUDE.md (238 changed lines)

@@ -35,6 +35,21 @@ nix build .#create-host
Do not automatically deploy changes. Deployments are usually done by updating the master branch and then triggering the auto-upgrade on the specific host.
### Testing Feature Branches on Hosts

All hosts have the `nixos-rebuild-test` helper script for testing feature branches before merging:

```bash
# On the target host, test a feature branch
nixos-rebuild-test boot <branch-name>
nixos-rebuild-test switch <branch-name>

# Additional arguments are passed through to nixos-rebuild
nixos-rebuild-test boot my-feature --show-trace
```

When working on a feature branch that requires testing on a live host, suggest using this command instead of the full flake URL syntax.

### Flake Management

```bash
@@ -52,7 +67,27 @@ nix develop

### Secrets Management

Secrets are handled by sops. Do not edit any `.sops.yaml` or any file within `secrets/`. Ask the user to modify if necessary.
Secrets are managed by OpenBao (Vault) using AppRole authentication. Most hosts use the
`vault.secrets` option defined in `system/vault-secrets.nix` to fetch secrets at boot.
Terraform manages the secrets and AppRole policies in `terraform/vault/`.

Legacy sops-nix is still present but only actively used by the `ca` host. Do not edit any
`.sops.yaml` or any file within `secrets/`. Ask the user to modify if necessary.

### Git Workflow

**Important:** Never commit directly to `master` unless the user explicitly asks for it. Always create a feature branch for changes.

**Important:** Never amend commits to `master` unless the user explicitly asks for it. Amending rewrites history and causes issues for deployed configurations.

When starting a new plan or task, the first step should typically be to create and checkout a new branch with an appropriate name (e.g., `git checkout -b dns-automation` or `git checkout -b fix-nginx-config`).

### Plan Management

When creating plans for large features, follow this workflow:

1. When implementation begins, save a copy of the plan to `docs/plans/` (e.g., `docs/plans/feature-name.md`)
2. Once the feature is fully implemented, move the plan to `docs/plans/completed/`

### Git Commit Messages

@@ -63,26 +98,130 @@ Examples:

- `template2: add proxmox image configuration`
- `terraform: add VM deployment configuration`

### Clipboard

To copy text to the clipboard, pipe to `wl-copy` (Wayland):

```bash
echo "text" | wl-copy
```

### NixOS Options and Packages Lookup

Two MCP servers are available for searching NixOS options and packages:

- **nixpkgs-options** - Search and lookup NixOS configuration option documentation
- **nixpkgs-packages** - Search and lookup Nix packages from nixpkgs

**Session Setup:** At the start of each session, index the nixpkgs revision from `flake.lock` to ensure documentation matches the project's nixpkgs version:

1. Read `flake.lock` and find the `nixpkgs` node's `rev` field
2. Call `index_revision` with that git hash (both servers share the same index)
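The pinned revision can be read straight out of `flake.lock`, for example with `jq` (a sketch; assumes the input is named `nixpkgs`, as it is in this flake):

```bash
# Print the pinned nixpkgs revision; pass the resulting hash to index_revision
jq -r '.nodes.nixpkgs.locked.rev' flake.lock
```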
**Options Tools (nixpkgs-options):**

- `search_options` - Search for options by name or description (e.g., query "nginx" or "postgresql")
- `get_option` - Get full details for a specific option (e.g., `services.loki.configuration`)
- `get_file` - Fetch the source file from nixpkgs that declares an option

**Package Tools (nixpkgs-packages):**

- `search_packages` - Search for packages by name or description (e.g., query "nginx" or "python")
- `get_package` - Get full details for a specific package by attribute path (e.g., `firefox`, `python312Packages.requests`)
- `get_file` - Fetch the source file from nixpkgs that defines a package

This ensures documentation matches the exact nixpkgs version (currently NixOS 25.11) used by this flake.

### Lab Monitoring Log Queries

The **lab-monitoring** MCP server can query logs from Loki. All hosts ship systemd journal logs via Promtail.

**Loki Label Reference:**

- `host` - Hostname (e.g., `ns1`, `ns2`, `monitoring01`, `ha1`). Use this label, not `hostname`.
- `systemd_unit` - Systemd unit name (e.g., `nsd.service`, `prometheus.service`, `nixos-upgrade.service`)
- `job` - Either `systemd-journal` (most logs) or `varlog` (file-based logs like caddy access logs)
- `filename` - For `varlog` job, the log file path (e.g., `/var/log/caddy/nix-cache.log`)

Journal log entries are JSON-formatted with the actual log message in the `MESSAGE` field. Other useful fields include `PRIORITY` and `SYSLOG_IDENTIFIER`.

**Example LogQL queries:**
```
# Logs from a specific service on a host
{host="ns2", systemd_unit="nsd.service"}

# Substring match on log content
{host="ns1", systemd_unit="nsd.service"} |= "error"

# File-based logs (e.g., caddy access logs)
{job="varlog", hostname="nix-cache01"}
```

Default lookback is 1 hour. Use the `start` parameter with relative durations (e.g., `24h`, `168h`) for older logs.

### Lab Monitoring Prometheus Queries

The **lab-monitoring** MCP server can query Prometheus metrics via PromQL. The `instance` label uses the FQDN format `<host>.home.2rjus.net:<port>`.

**Prometheus Job Names:**

- `node-exporter` - System metrics from all hosts (CPU, memory, disk, network)
- `caddy` - Reverse proxy metrics (http-proxy)
- `nix-cache_caddy` - Nix binary cache metrics
- `home-assistant` - Home automation metrics
- `jellyfin` - Media server metrics
- `loki` / `prometheus` / `grafana` - Monitoring stack self-metrics
- `step-ca` - Internal CA metrics
- `pve-exporter` - Proxmox hypervisor metrics
- `smartctl` - Disk SMART health (gunter)
- `wireguard` - VPN metrics (http-proxy)
- `pushgateway` - Push-based metrics (e.g., backup results)
- `restic_rest` - Backup server metrics
- `labmon` / `ghettoptt` / `alertmanager` - Other service metrics

**Example PromQL queries:**
```
# Check all targets are up
up

# CPU usage for a specific host
rate(node_cpu_seconds_total{instance=~"ns1.*", mode!="idle"}[5m])

# Memory usage across all hosts
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes

# Disk space
node_filesystem_avail_bytes{mountpoint="/"}
```

## Architecture

### Directory Structure

- `/flake.nix` - Central flake defining all 16 NixOS configurations
- `/flake.nix` - Central flake defining all NixOS configurations
- `/hosts/<hostname>/` - Per-host configurations
  - `default.nix` - Entry point, imports configuration.nix and services
  - `configuration.nix` - Host-specific settings (networking, hardware, users)
- `/system/` - Shared system-level configurations applied to ALL hosts
  - Core modules: nix.nix, sshd.nix, sops.nix, acme.nix, autoupgrade.nix
  - Core modules: nix.nix, sshd.nix, sops.nix (legacy), vault-secrets.nix, acme.nix, autoupgrade.nix
  - Monitoring: node-exporter and promtail on every host
- `/modules/` - Custom NixOS modules
  - `homelab/` - Homelab-specific options (DNS automation, monitoring scrape targets)
- `/lib/` - Nix library functions
  - `dns-zone.nix` - DNS zone generation functions
  - `monitoring.nix` - Prometheus scrape target generation functions
- `/services/` - Reusable service modules, selectively imported by hosts
  - `home-assistant/` - Home automation stack
  - `monitoring/` - Observability stack (Prometheus, Grafana, Loki, Tempo)
  - `ns/` - DNS services (authoritative, resolver)
  - `ns/` - DNS services (authoritative, resolver, zone generation)
  - `http-proxy/`, `ca/`, `postgres/`, `nats/`, `jellyfin/`, etc.
- `/secrets/` - SOPS-encrypted secrets with age encryption
- `/secrets/` - SOPS-encrypted secrets with age encryption (legacy, only used by ca)
- `/common/` - Shared configurations (e.g., VM guest agent)
- `/docs/` - Documentation and plans
  - `plans/` - Future plans and proposals
  - `plans/completed/` - Completed plans (moved here when done)
- `/playbooks/` - Ansible playbooks for fleet management
- `/.sops.yaml` - SOPS configuration with age keys for all servers
- `/.sops.yaml` - SOPS configuration with age keys (legacy, only used by ca)

### Configuration Inheritance

@@ -98,11 +237,13 @@ hosts/<hostname>/default.nix
All hosts automatically get:
- Nix binary cache (nix-cache.home.2rjus.net)
- SSH with root login enabled
- SOPS secrets management with auto-generated age keys
- OpenBao (Vault) secrets management via AppRole
- Internal ACME CA integration (ca.home.2rjus.net)
- Daily auto-upgrades with auto-reboot
- Prometheus node-exporter + Promtail (logs to monitoring01)
- Monitoring scrape target auto-registration via `homelab.monitoring` options
- Custom root CA trust
- DNS zone auto-registration via `homelab.dns` options

### Active Hosts

@@ -116,19 +257,16 @@ Production servers managed by `rebuild-all.sh`:
- `nix-cache01` - Binary cache server
- `pgdb1` - PostgreSQL database
- `nats1` - NATS messaging server
- `auth01` - Authentication service

Template/test hosts:
- `template1` - Base template for cloning new hosts
- `nixos-test1` - Test environment

### Flake Inputs

- `nixpkgs` - NixOS 25.11 stable (primary)
- `nixpkgs-unstable` - Unstable channel (available via overlay as `pkgs.unstable.<package>`)
- `sops-nix` - Secrets management
- `sops-nix` - Secrets management (legacy, only used by ca)
- Custom packages from git.t-juice.club:
  - `backup-helper` - Backup automation module
  - `alerttonotify` - Alert routing
  - `labmon` - Lab monitoring

@@ -143,12 +281,21 @@ Template/test hosts:

### Secrets Management

- Uses SOPS with age encryption
- Each server has unique age key in `.sops.yaml`
- Keys auto-generated at `/var/lib/sops-nix/key.txt` on first boot
Most hosts use OpenBao (Vault) for secrets:
- Vault server at `vault01.home.2rjus.net:8200`
- AppRole authentication with credentials at `/var/lib/vault/approle/`
- Secrets defined in Terraform (`terraform/vault/secrets.tf`)
- AppRole policies in Terraform (`terraform/vault/approle.tf`)
- NixOS module: `system/vault-secrets.nix` with `vault.secrets.<name>` options
- `extractKey` option extracts a single key from vault JSON as a plain file
- Secrets fetched at boot by `vault-secret-<name>.service` systemd units
- Fallback to cached secrets in `/var/lib/vault/cache/` when Vault is unreachable
- Provision AppRole credentials: `nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=<host>`
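A minimal sketch of a secret declaration using these options (the secret name and key are illustrative; only `vault.secrets.<name>` and `extractKey` are option names confirmed above, so treat the exact attribute shape as an assumption):

```nix
# Hypothetical example: fetch one secret at boot and extract a single key as a plain file.
vault.secrets."grafana-admin" = {
  extractKey = "password";
};
```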
Legacy SOPS (only used by `ca` host):
- SOPS with age encryption, keys in `.sops.yaml`
- Shared secrets: `/secrets/secrets.yaml`
- Per-host secrets: `/secrets/<hostname>/`
- All production servers can decrypt shared secrets; host-specific secrets require specific host keys

### Auto-Upgrade System

@@ -246,14 +393,19 @@ This means:
1. Create `/hosts/<hostname>/` directory
2. Copy structure from `template1` or similar host
3. Add host entry to `flake.nix` nixosConfigurations
4. Add hostname to dns zone files. Merge to master. Run auto-upgrade on dns servers.
5. User clones template host
6. User runs `prepare-host.sh` on new host, this deletes files which should be regenerated, like ssh host keys, machine-id etc. It also creates a new age key, and prints the public key
7. This key is then added to `.sops.yaml`
8. Create `/secrets/<hostname>/` if needed
9. Configure networking (static IP, DNS servers)
10. Commit changes, and merge to master.
11. Deploy by running `nixos-rebuild boot --flake URL#<hostname>` on the host.
4. Configure networking in `configuration.nix` (static IP via `systemd.network.networks`, DNS servers)
5. (Optional) Add `homelab.dns.cnames` if the host needs CNAME aliases
6. Add `vault.enable = true;` to the host configuration
7. Add AppRole policy in `terraform/vault/approle.tf` and any secrets in `secrets.tf`
8. Run `tofu apply` in `terraform/vault/`
9. User clones template host
10. User runs `prepare-host.sh` on new host
11. Provision AppRole credentials: `nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=<host>`
12. Commit changes, and merge to master.
13. Deploy by running `nixos-rebuild boot --flake URL#<hostname>` on the host.
14. Run auto-upgrade on DNS servers (ns1, ns2) to pick up the new host's DNS entry

**Note:** DNS A records and Prometheus node-exporter scrape targets are auto-generated from the host's `systemd.network.networks` static IP configuration. No manual zone file or Prometheus config editing is required.

### Important Patterns

@@ -267,6 +419,8 @@ This means:

**Firewall**: Disabled on most hosts (trusted network). Enable selectively in host configuration if needed.

**Shell scripts**: Use `pkgs.writeShellApplication` instead of `pkgs.writeShellScript` or `pkgs.writeShellScriptBin` for creating shell scripts. `writeShellApplication` provides automatic shellcheck validation, sets strict bash options (`set -euo pipefail`), and allows declaring `runtimeInputs` for dependencies. When referencing the executable path (e.g., in `ExecStart`), use `lib.getExe myScript` to get the proper `bin/` path.
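As an illustration of the pattern (the script name, service, and health-check URL are made up for the example; `nix-cache.home.2rjus.net` is the binary cache named earlier):

```nix
let
  cacheCheck = pkgs.writeShellApplication {
    name = "cache-check";
    runtimeInputs = [ pkgs.curl ];  # dependencies land on PATH and are shellcheck-aware
    text = ''
      curl -fsS https://nix-cache.home.2rjus.net/nix-cache-info
    '';
  };
in
{
  systemd.services.cache-check = {
    serviceConfig.ExecStart = lib.getExe cacheCheck;  # resolves to the proper bin/ path
  };
}
```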
### Monitoring Stack

All hosts ship metrics and logs to `monitoring01`:
@@ -276,9 +430,45 @@ All hosts ship metrics and logs to `monitoring01`:
- **Tracing**: Tempo for distributed tracing
- **Profiling**: Pyroscope for continuous profiling

**Scrape Target Auto-Generation:**

Prometheus scrape targets are automatically generated from host configurations, following the same pattern as DNS zone generation:

- **Node-exporter**: All flake hosts with static IPs are automatically added as node-exporter targets
- **Service targets**: Defined via `homelab.monitoring.scrapeTargets` in service modules
- **External targets**: Non-flake hosts defined in `/services/monitoring/external-targets.nix`
- **Library**: `lib/monitoring.nix` provides `generateNodeExporterTargets` and `generateScrapeConfigs`

Host monitoring options (`homelab.monitoring.*`):
- `enable` (default: `true`) - Include host in Prometheus node-exporter scrape targets
- `scrapeTargets` (default: `[]`) - Additional scrape targets exposed by this host (job_name, port, metrics_path, scheme, scrape_interval, honor_labels)

Service modules declare their scrape targets directly (e.g., `services/ca/default.nix` declares step-ca on port 9000). The Prometheus config on monitoring01 auto-generates scrape configs from all hosts.
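A sketch of what such a declaration might look like, using the fields listed above (the attribute shapes are assumed, not copied from `lib/monitoring.nix`; job name and port echo the step-ca example):

```nix
# Illustrative only: expose an extra scrape target from a service module.
homelab.monitoring.scrapeTargets = [
  {
    job_name = "step-ca";
    port = 9000;
    metrics_path = "/metrics";
    scheme = "https";
  }
];
```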
To add monitoring targets for non-NixOS hosts, edit `/services/monitoring/external-targets.nix`.

### DNS Architecture

- `ns1` (10.69.13.5) - Primary authoritative DNS + resolver
- `ns2` (10.69.13.6) - Secondary authoritative DNS (AXFR from ns1)
- Zone files managed in `/services/ns/`
- All hosts point to ns1/ns2 for DNS resolution

**Zone Auto-Generation:**

DNS zone entries are automatically generated from host configurations:

- **Flake-managed hosts**: A records extracted from `systemd.network.networks` static IPs
- **CNAMEs**: Defined via `homelab.dns.cnames` option in host configs
- **External hosts**: Non-flake hosts defined in `/services/ns/external-hosts.nix`
- **Serial number**: Uses `self.sourceInfo.lastModified` (git commit timestamp)

Host DNS options (`homelab.dns.*`):
- `enable` (default: `true`) - Include host in DNS zone generation
- `cnames` (default: `[]`) - List of CNAME aliases pointing to this host
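For example, in a host configuration (the aliases are placeholders; the same snippet appears in the DNS automation plan further down):

```nix
homelab.dns.cnames = [ "alias1" "alias2" ];
# homelab.dns.enable = false;  # opt a host out of zone generation entirely
```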
Hosts are automatically excluded from DNS if:
- `homelab.dns.enable = false` (e.g., template hosts)
- No static IP configured (e.g., DHCP-only hosts)
- Network interface is a VPN/tunnel (wg*, tun*, tap*)

To add DNS entries for non-NixOS hosts, edit `/services/ns/external-hosts.nix`.
README.md (126 changed lines)

@@ -1,11 +1,125 @@

# nixos-servers

Nixos configs for my homelab servers.
NixOS Flake-based configuration repository for a homelab infrastructure. All hosts run NixOS 25.11 and are managed declaratively through this single repository.

## Configurations in use
## Hosts

* ha1
* ns1
* ns2
* template1
| Host | Role |
|------|------|
| `ns1`, `ns2` | Primary/secondary authoritative DNS |
| `ca` | Internal Certificate Authority |
| `ha1` | Home Assistant + Zigbee2MQTT + Mosquitto |
| `http-proxy` | Reverse proxy |
| `monitoring01` | Prometheus, Grafana, Loki, Tempo, Pyroscope |
| `jelly01` | Jellyfin media server |
| `nix-cache01` | Nix binary cache |
| `pgdb1` | PostgreSQL |
| `nats1` | NATS messaging |
| `vault01` | OpenBao (Vault) secrets management |
| `template1`, `template2` | VM templates for cloning new hosts |

## Directory Structure

```
flake.nix            # Flake entry point, defines all host configurations
hosts/<hostname>/    # Per-host configuration
system/              # Shared modules applied to ALL hosts
services/            # Reusable service modules, selectively imported per host
modules/             # Custom NixOS module definitions
lib/                 # Nix library functions (DNS zone generation, etc.)
secrets/             # SOPS-encrypted secrets (legacy, only used by ca)
common/              # Shared configurations (e.g., VM guest agent)
terraform/           # OpenTofu configs for Proxmox VM provisioning
terraform/vault/     # OpenTofu configs for OpenBao (secrets, PKI, AppRoles)
playbooks/           # Ansible playbooks for template building and fleet ops
scripts/             # Helper scripts (create-host, vault-fetch)
```

## Key Features

**Automatic DNS zone generation** - A records are derived from each host's static IP configuration. CNAME aliases are defined via `homelab.dns.cnames`. No manual zone file editing required.

**OpenBao (Vault) secrets** - Hosts authenticate via AppRole and fetch secrets at boot. Secrets and policies are managed as code in `terraform/vault/`. Legacy SOPS remains only for the `ca` host.

**Daily auto-upgrades** - All hosts pull from the master branch and automatically rebuild and reboot on a randomized schedule.

**Shared base configuration** - Every host automatically gets SSH, monitoring (node-exporter + Promtail), internal ACME certificates, and Nix binary cache access via the `system/` modules.

**Proxmox VM provisioning** - Build VM templates with Ansible and deploy VMs with OpenTofu from `terraform/`.

**OpenBao (Vault) secrets** - Centralized secrets management with AppRole authentication, PKI infrastructure, and automated bootstrap. Managed as code in `terraform/vault/`.

## Usage

```bash
# Enter dev shell (provides ansible, opentofu, openbao, create-host)
nix develop

# Build a host configuration locally
nix build .#nixosConfigurations.<hostname>.config.system.build.toplevel

# List all configurations
nix flake show
```

Deployments are done by merging to master and triggering the auto-upgrade on the target host.

## Provisioning New Hosts

The repository includes an automated pipeline for creating and deploying new hosts on Proxmox.

### 1. Generate host configuration

The `create-host` tool (available in the dev shell) generates all required files for a new host:

```bash
create-host \
  --hostname myhost \
  --ip 10.69.13.50/24 \
  --cpu 4 \
  --memory 4096 \
  --disk 50G
```

This creates:
- `hosts/<hostname>/` - NixOS configuration (networking, imports, hardware)
- Entry in `flake.nix`
- VM definition in `terraform/vms.tf`
- Vault AppRole policy and wrapped bootstrap token

Omit `--ip` for DHCP. Use `--dry-run` to preview changes. Use `--force` to regenerate an existing host's config.

### 2. Build and deploy the VM template

The Proxmox VM template is built from `hosts/template2` and deployed with Ansible:

```bash
nix develop -c ansible-playbook -i playbooks/inventory.ini playbooks/build-and-deploy-template.yml
```

This only needs to be re-run when the base template changes.

### 3. Deploy the VM

```bash
cd terraform && tofu apply
```

### 4. Automatic bootstrap

On first boot, the VM automatically:
1. Receives its hostname and Vault credentials via cloud-init
2. Unwraps the Vault token and stores AppRole credentials
3. Runs `nixos-rebuild boot` against the flake on the master branch
4. Reboots into the host-specific configuration
5. Services fetch their secrets from Vault at startup

No manual intervention is required after `tofu apply`.

## Network

- Domain: `home.2rjus.net`
- Infrastructure subnet: `10.69.13.0/24`
- DNS: ns1/ns2 authoritative with primary-secondary AXFR
- Internal CA for TLS certificates (migrating from step-ca to OpenBao PKI)
- Centralized monitoring at monitoring01
TODO.md (104 changed lines)

@@ -155,7 +155,7 @@ create-host \

### Phase 4: Secrets Management with OpenBao (Vault)

**Status:** 🚧 Phases 4a & 4b Complete, 4c & 4d In Progress
**Status:** 🚧 Phases 4a, 4b, 4c (partial), & 4d Complete

**Challenge:** Current sops-nix approach has chicken-and-egg problem with age keys

@@ -339,6 +339,8 @@ vault01.home.2rjus.net (10.69.13.19)

#### Phase 4c: PKI Migration (Replace step-ca)

**Status:** 🚧 Partially Complete - vault01 and test host migrated, remaining hosts pending

**Goal:** Migrate hosts from step-ca to OpenBao PKI for TLS certificates

**Note:** PKI infrastructure already set up in Phase 4b (root CA, intermediate CA, ACME support)
@@ -349,27 +351,33 @@ vault01.home.2rjus.net (10.69.13.19)
- [x] Intermediate CA (`pki_int/` mount, 5 year TTL, EC P-384)
  - [x] Signed intermediate with root CA
  - [x] Configured CRL, OCSP, and issuing certificate URLs
- [x] Enable ACME support (completed in Phase 4b)
- [x] Enable ACME support (completed in Phase 4b, fixed in Phase 4c)
  - [x] Enabled ACME on intermediate CA
  - [x] Created PKI role for `*.home.2rjus.net`
  - [x] Set certificate TTLs (30 day max) and allowed domains
  - [x] ACME directory: `https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory`
- [ ] Download and distribute root CA certificate
  - [ ] Export root CA: `bao read -field=certificate pki/cert/ca > homelab-root-ca.crt`
  - [ ] Add to NixOS trust store on all hosts via `security.pki.certificateFiles`
  - [ ] Deploy via auto-upgrade
- [ ] Test certificate issuance
  - [ ] Issue test certificate using ACME client (lego/certbot)
  - [ ] Or issue static certificate via OpenBao CLI
  - [ ] Verify certificate chain and trust
- [ ] Migrate vault01's own certificate
  - [ ] Issue new certificate from OpenBao PKI (self-issued)
  - [ ] Replace self-signed bootstrap certificate
  - [ ] Update service configuration
  - [x] Fixed ACME response headers (added Replay-Nonce, Link, Location to allowed_response_headers)
  - [x] Configured cluster path for ACME
- [x] Download and distribute root CA certificate
  - [x] Added root CA to `system/pki/root-ca.nix`
  - [x] Distributed to all hosts via system imports
- [x] Test certificate issuance
  - [x] Tested ACME issuance on vaulttest01 successfully
  - [x] Verified certificate chain and trust
- [x] Migrate vault01's own certificate
  - [x] Created `bootstrap-vault-cert` script for initial certificate issuance via bao CLI
  - [x] Issued certificate with SANs (vault01.home.2rjus.net + vault.home.2rjus.net)
  - [x] Updated service to read certificates from `/var/lib/acme/vault01.home.2rjus.net/`
  - [x] Configured ACME for automatic renewals
- [ ] Migrate hosts from step-ca to OpenBao
  - [x] Tested on vaulttest01 (non-production host)
- [ ] Standardize hostname usage across all configurations
  - [ ] Use `vault.home.2rjus.net` (CNAME) consistently everywhere
  - [ ] Update NixOS configurations to use CNAME instead of vault01
  - [ ] Update Terraform configurations to use CNAME
  - [ ] Audit and fix mixed usage of vault01.home.2rjus.net vs vault.home.2rjus.net
- [ ] Update `system/acme.nix` to use OpenBao ACME endpoint
  - [ ] Change server to `https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory`
  - [ ] Test on one host (non-critical service)
  - [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
  - [ ] Roll out to all hosts via auto-upgrade
- [ ] Configure SSH CA in OpenBao (optional, future work)
  - [ ] Enable SSH secrets engine (`ssh/` mount)
@@ -384,7 +392,39 @@ vault01.home.2rjus.net (10.69.13.19)
- [ ] Archive step-ca configuration for backup
- [ ] Update documentation

**Deliverable:** All TLS certificates issued by OpenBao PKI, step-ca retired
**Implementation Details (2026-02-03):**

**ACME Configuration Fix:**
The key blocker was that OpenBao's PKI mount was filtering out required ACME response headers. The solution was to add `allowed_response_headers` to the Terraform mount configuration:
```hcl
allowed_response_headers = [
  "Replay-Nonce", # Required for ACME nonce generation
  "Link",         # Required for ACME navigation
  "Location"      # Required for ACME resource location
]
```

**Cluster Path Configuration:**
ACME requires the cluster path to include the full API path:
```hcl
path     = "${var.vault_address}/v1/${vault_mount.pki_int.path}"
aia_path = "${var.vault_address}/v1/${vault_mount.pki_int.path}"
```

**Bootstrap Process:**
Since vault01 needed a certificate from its own PKI (chicken-and-egg problem), we created a `bootstrap-vault-cert` script that:
1. Uses the Unix socket (no TLS) to issue a certificate via `bao` CLI
2. Places it in the ACME directory structure
3. Includes both vault01.home.2rjus.net and vault.home.2rjus.net as SANs
4. After restart, ACME manages renewals automatically

**Files Modified:**
- `terraform/vault/pki.tf` - Added allowed_response_headers, cluster config, ACME config
- `services/vault/default.nix` - Updated cert paths, added bootstrap script, configured ACME
- `system/pki/root-ca.nix` - Added OpenBao root CA to trust store
- `hosts/vaulttest01/configuration.nix` - Overrode ACME server for testing

**Deliverable:** ✅ vault01 and vaulttest01 using OpenBao PKI, remaining hosts still on step-ca

---

@@ -484,36 +524,6 @@ vault01.home.2rjus.net (10.69.13.19)

---

### Phase 5: DNS Automation

**Goal:** Automatically generate DNS entries from host configurations

**Approach:** Leverage Nix to generate zone file entries from flake host configurations

Since most hosts use static IPs defined in their NixOS configurations, we can extract this information and automatically generate A records. This keeps DNS in sync with the actual host configs.

**Tasks:**
- [ ] Add optional CNAME field to host configurations
  - [ ] Add `networking.cnames = [ "alias1" "alias2" ]` or similar option
  - [ ] Document in host configuration template
- [ ] Create Nix function to extract DNS records from all hosts
  - [ ] Parse each host's `networking.hostName` and IP configuration
  - [ ] Collect any defined CNAMEs
  - [ ] Generate zone file fragment with A and CNAME records
- [ ] Integrate auto-generated records into zone files
  - [ ] Keep manual entries separate (for non-flake hosts/services)
  - [ ] Include generated fragment in main zone file
  - [ ] Add comments showing which records are auto-generated
- [ ] Update zone file serial number automatically
- [ ] Test zone file validity after generation
- [ ] Either:
  - [ ] Automatically trigger DNS server reload (Ansible)
  - [ ] Or document manual step: merge to master, run upgrade on ns1/ns2

**Deliverable:** DNS A records and CNAMEs automatically generated from host configs

---

### Phase 6: Integration Script

**Goal:** Single command to create and deploy a new host
docs/plans/auth-system-replacement.md (new file, 192 lines)

@@ -0,0 +1,192 @@

# Authentication System Replacement Plan

## Overview

Replace the current auth01 setup (LLDAP + Authelia) with a modern, unified authentication solution. The current setup is not in active use, making this a good time to evaluate alternatives.

## Goals

1. **Central user database** - Manage users across all homelab hosts from a single source
2. **Linux PAM/NSS integration** - Users can SSH into hosts using central credentials
3. **UID/GID consistency** - Proper POSIX attributes for NAS share permissions
4. **OIDC provider** - Single sign-on for homelab web services (Grafana, etc.)

## Options Evaluated

### OpenLDAP (raw)

- **NixOS Support:** Good (`services.openldap` with `declarativeContents`)
- **Pros:** Most widely supported, very flexible
- **Cons:** LDIF format is painful, schema management is complex, no built-in OIDC, requires SSSD on each client
- **Verdict:** Doesn't address LDAP complexity concerns

### LLDAP + Authelia (current)

- **NixOS Support:** Both have good modules
- **Pros:** Already configured, lightweight, nice web UIs
- **Cons:** Two services to manage, limited POSIX attribute support in LLDAP, requires SSSD on every client host
- **Verdict:** Workable but has friction for NAS/UID goals

### FreeIPA

- **NixOS Support:** None
- **Pros:** Full enterprise solution (LDAP + Kerberos + DNS + CA)
- **Cons:** Extremely heavy, wants to own DNS, designed for Red Hat ecosystems, massive overkill for homelab
- **Verdict:** Overkill, no NixOS support

### Keycloak

- **NixOS Support:** None
- **Pros:** Good OIDC/SAML, nice UI
- **Cons:** Primarily an identity broker not a user directory, poor POSIX support, heavy (Java)
- **Verdict:** Wrong tool for Linux user management

### Authentik

- **NixOS Support:** None (would need Docker)
- **Pros:** All-in-one with LDAP outpost and OIDC, modern UI
- **Cons:** Heavy stack (Python + PostgreSQL + Redis), LDAP is a separate component
- **Verdict:** Would work but requires Docker and is heavy

### Kanidm

- **NixOS Support:** Excellent - first-class module with PAM/NSS integration
- **Pros:**
  - Native PAM/NSS module (no SSSD needed)
  - Built-in OIDC provider
  - Optional LDAP interface for legacy services
  - Declarative provisioning via NixOS (users, groups, OAuth2 clients)
  - Modern, written in Rust
  - Single service handles everything
- **Cons:** Newer project, smaller community than LDAP
- **Verdict:** Best fit for requirements

### Pocket-ID

- **NixOS Support:** Unknown
- **Pros:** Very lightweight, passkey-first
- **Cons:** No LDAP, no PAM/NSS integration - purely OIDC for web apps
- **Verdict:** Doesn't solve Linux user management goal

## Recommendation: Kanidm

Kanidm is the recommended solution for the following reasons:

| Requirement | Kanidm Support |
|-------------|----------------|
| Central user database | Native |
| Linux PAM/NSS (host login) | Native NixOS module |
| UID/GID for NAS | POSIX attributes supported |
| OIDC for services | Built-in |
| Declarative config | Excellent NixOS provisioning |
| Simplicity | Modern API, LDAP optional |
| NixOS integration | First-class |

### Key NixOS Features

**Server configuration:**
```nix
services.kanidm.enableServer = true;
services.kanidm.serverSettings = {
  domain = "home.2rjus.net";
  origin = "https://auth.home.2rjus.net";
  ldapbindaddress = "0.0.0.0:636"; # Optional LDAP interface
};
```

**Declarative user provisioning:**
```nix
services.kanidm.provision.enable = true;
services.kanidm.provision.persons.torjus = {
  displayName = "Torjus";
  groups = [ "admins" "nas-users" ];
};
```

**Declarative OAuth2 clients:**
```nix
services.kanidm.provision.systems.oauth2.grafana = {
  displayName = "Grafana";
  originUrl = "https://grafana.home.2rjus.net/login/generic_oauth";
  originLanding = "https://grafana.home.2rjus.net";
};
```

**Client host configuration (add to system/):**
```nix
services.kanidm.enableClient = true;
services.kanidm.enablePam = true;
services.kanidm.clientSettings.uri = "https://auth.home.2rjus.net";
```

## NAS Integration

### Current: TrueNAS CORE (FreeBSD)

TrueNAS CORE has a built-in LDAP client. Kanidm's read-only LDAP interface will work for NFS share permissions:

- **NFS shares**: Only need consistent UID/GID mapping - Kanidm's LDAP provides this
- **No SMB requirement**: SMB would need Samba schema attributes (deprecated in TrueNAS 13.0+), but we're NFS-only

Configuration approach:
1. Enable Kanidm's LDAP interface (`ldapbindaddress = "0.0.0.0:636"`)
2. Import internal CA certificate into TrueNAS
3. Configure TrueNAS LDAP client with Kanidm's Base DN and bind credentials
4. Users/groups appear in TrueNAS permission dropdowns

Note: Kanidm's LDAP is read-only and uses LDAPS only (no StartTLS). This is fine for our use case.

### Future: NixOS NAS

When the NAS is migrated to NixOS, it becomes a first-class citizen:

- Native Kanidm PAM/NSS integration (same as other hosts)
- No LDAP compatibility layer needed
- Full integration with the rest of the homelab

This future migration path is a strong argument for Kanidm over LDAP-only solutions.

## Implementation Steps

1. **Create Kanidm service module** in `services/kanidm/`
   - Server configuration
   - TLS via internal ACME
   - Vault secrets for admin passwords

2. **Configure declarative provisioning**
   - Define initial users and groups
   - Set up POSIX attributes (UID/GID ranges)

3. **Add OIDC clients** for homelab services
   - Grafana
   - Other services as needed

4. **Create client module** in `system/` for PAM/NSS
   - Enable on all hosts that need central auth
   - Configure trusted CA

5. **Test NAS integration**
   - Configure TrueNAS LDAP client to connect to Kanidm
   - Verify UID/GID mapping works with NFS shares

6. **Migrate auth01**
   - Remove LLDAP and Authelia services
   - Deploy Kanidm
   - Update DNS CNAMEs if needed

7. **Documentation**
   - User management procedures
   - Adding new OAuth2 clients
   - Troubleshooting PAM/NSS issues

## Open Questions

- What UID/GID range should be reserved for Kanidm-managed users?
- Which hosts should have PAM/NSS enabled initially?
- What OAuth2 clients are needed at launch?

## References

- [Kanidm Documentation](https://kanidm.github.io/kanidm/stable/)
- [NixOS Kanidm Module](https://search.nixos.org/options?query=services.kanidm)
- [Kanidm PAM/NSS Integration](https://kanidm.github.io/kanidm/stable/pam_and_nsswitch.html)
docs/plans/completed/dns-automation.md (new file, 61 lines)

@@ -0,0 +1,61 @@

# DNS Automation

**Status:** Completed (2026-02-04)

**Goal:** Automatically generate DNS entries from host configurations

**Approach:** Leverage Nix to generate zone file entries from flake host configurations

Since most hosts use static IPs defined in their NixOS configurations, we can extract this information and automatically generate A records. This keeps DNS in sync with the actual host configs.

## Implementation

- [x] Add optional CNAME field to host configurations
  - [x] Added `homelab.dns.cnames` option in `modules/homelab/dns.nix`
  - [x] Added `homelab.dns.enable` to allow opting out (defaults to true)
  - [x] Documented in CLAUDE.md
- [x] Create Nix function to extract DNS records from all hosts
  - [x] Created `lib/dns-zone.nix` with extraction functions
  - [x] Parses each host's `networking.hostName` and `systemd.network.networks` IP configuration
  - [x] Collects CNAMEs from `homelab.dns.cnames`
  - [x] Filters out VPN interfaces (wg*, tun*, tap*, vti*)
  - [x] Generates complete zone file with A and CNAME records
- [x] Integrate auto-generated records into zone files
  - [x] External hosts separated to `services/ns/external-hosts.nix`
  - [x] Zone includes comments showing which records are auto-generated vs external
- [x] Update zone file serial number automatically
  - [x] Uses `self.sourceInfo.lastModified` (git commit timestamp)
- [x] Test zone file validity after generation
  - [x] NSD validates zone at build time via `nsd-checkzone`
- [x] Deploy process documented
  - [x] Merge to master, run auto-upgrade on ns1/ns2

## Files Created/Modified

| File | Purpose |
|------|---------|
| `modules/homelab/dns.nix` | Defines `homelab.dns.*` options |
| `modules/homelab/default.nix` | Module import hub |
| `lib/dns-zone.nix` | Zone generation functions |
| `services/ns/external-hosts.nix` | Non-flake host records |
| `services/ns/master-authorative.nix` | Uses generated zone |
| `services/ns/secondary-authorative.nix` | Uses generated zone |

## Usage

View generated zone:
```bash
nix eval .#nixosConfigurations.ns1.config.services.nsd.zones.'"home.2rjus.net"'.data --raw
```

Add CNAMEs to a host:
```nix
homelab.dns.cnames = [ "alias1" "alias2" ];
```

Exclude a host from DNS:
```nix
homelab.dns.enable = false;
```

Add non-flake hosts: Edit `services/ns/external-hosts.nix`
docs/plans/completed/host-cleanup.md (new file, 23 lines)

@@ -0,0 +1,23 @@

# Host Cleanup

## Overview

Remove decommissioned/unused host configurations that are no longer reachable on the network.

## Hosts to review

The following hosts return "no route to host" from Prometheus scraping and are likely no longer needed:

- `media1` (10.69.12.82)
- `ns3` (10.69.13.7)
- `ns4` (10.69.13.8)
- `nixos-test1` (10.69.13.10)

## Steps

1. Confirm each host is truly decommissioned (not just temporarily powered off)
2. Remove host directory from `hosts/`
3. Remove `nixosConfigurations` entry from `flake.nix`
4. Remove host's age key from `.sops.yaml`
5. Remove per-host secrets from `secrets/<hostname>/` if any
6. Verify DNS zone and Prometheus targets no longer include the removed hosts after rebuild
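A quick way to check the DNS half of step 6 after a rebuild is to evaluate the generated zone and look for a removed host (a sketch; the `nix eval` invocation mirrors the one in the DNS automation plan):

```bash
# Confirm a removed host (e.g. ns3) no longer appears in the generated zone
nix eval .#nixosConfigurations.ns1.config.services.nsd.zones.'"home.2rjus.net"'.data --raw \
  | grep ns3 || echo "no ns3 records"
```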
docs/plans/completed/monitoring-gaps.md (new file, 128 lines)

@@ -0,0 +1,128 @@
# Monitoring Gaps Audit

## Overview

Audit of services running in the homelab that lack monitoring coverage, either missing Prometheus scrape targets, alerting rules, or both.

## Services with No Monitoring

### PostgreSQL (`pgdb1`)

- **Current state:** No scrape targets, no alert rules
- **Risk:** A database outage would go completely unnoticed by Prometheus
- **Recommendation:** Enable `services.prometheus.exporters.postgres` (available in nixpkgs). This exposes connection counts, query throughput, replication lag, table/index stats, and more. Add alerts for at least `postgres_down` (systemd unit state) and connection pool exhaustion.
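A minimal sketch of the exporter side on `pgdb1` (the option path is the nixpkgs one named above; the local-superuser shortcut is an assumption about how access would be granted):

```nix
# Sketch: expose PostgreSQL metrics for Prometheus on pgdb1.
services.prometheus.exporters.postgres = {
  enable = true;
  runAsLocalSuperUser = true;  # simplest local access; a dedicated monitoring role also works
};
```

A matching `homelab.monitoring.scrapeTargets` entry on the host would then register the exporter with Prometheus.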
### Authelia (`auth01`)

- **Current state:** No scrape targets, no alert rules
- **Risk:** The authentication gateway being down blocks access to all proxied services
- **Recommendation:** Authelia exposes Prometheus metrics natively at `/metrics`. Add a scrape target and at minimum an `authelia_down` systemd unit state alert.

### LLDAP (`auth01`)

- **Current state:** No scrape targets, no alert rules
- **Risk:** LLDAP is a dependency of Authelia -- if LDAP is down, authentication breaks even if Authelia is running
- **Recommendation:** Add an `lldap_down` systemd unit state alert. LLDAP does not expose Prometheus metrics natively, so systemd unit monitoring via node-exporter may be sufficient.

### Vault / OpenBao (`vault01`)

- **Current state:** No scrape targets, no alert rules
- **Risk:** Secrets management service failures go undetected
- **Recommendation:** OpenBao supports Prometheus telemetry output natively. Add a scrape target for the telemetry endpoint and alerts for `vault_down` (systemd unit) and seal status.

### Gitea Actions Runner

- **Current state:** No scrape targets, no alert rules
- **Risk:** CI/CD failures go undetected
- **Recommendation:** Add at minimum a systemd unit state alert. The runner itself has limited metrics exposure.

## Services with Partial Monitoring

### Jellyfin (`jelly01`)

- **Current state:** Has scrape targets (port 8096), metrics are being collected, but zero alert rules
- **Metrics available:** 184 metrics, all .NET runtime / ASP.NET Core level. No Jellyfin-specific metrics (active streams, library size, transcoding sessions). Key useful metrics:
  - `microsoft_aspnetcore_hosting_failed_requests` - rate of HTTP errors
  - `microsoft_aspnetcore_hosting_current_requests` - in-flight requests
  - `process_working_set_bytes` - memory usage (~256 MB currently)
  - `dotnet_gc_pause_ratio` - GC pressure
  - `up{job="jellyfin"}` - basic availability
- **Recommendation:** Add a `jellyfin_down` alert using either `up{job="jellyfin"} == 0` or systemd unit state. Consider alerting on sustained `failed_requests` rate increase.

### NATS (`nats1`)

- **Current state:** Has a `nats_down` alert (systemd unit state via node-exporter), but no NATS-specific metrics
- **Metrics available:** NATS has a built-in `/metrics` endpoint exposing connection counts, message throughput, JetStream consumer lag, and more
- **Recommendation:** Add a scrape target for the NATS metrics endpoint. Consider alerts for connection count spikes, slow consumers, and JetStream storage usage.

### DNS - Unbound (`ns1`, `ns2`)

- **Current state:** Has `unbound_down` alert (systemd unit state), but no DNS query metrics
- **Available in nixpkgs:** `services.prometheus.exporters.unbound.enable` (package: `prometheus-unbound-exporter` v0.5.0). Exposes query counts, cache hit ratios, response types (SERVFAIL, NXDOMAIN), upstream latency.
- **Recommendation:** Enable the unbound exporter on ns1/ns2. Add alerts for cache hit ratio drops and SERVFAIL rate spikes.
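The enabling piece is a one-liner (the option is the one named above); wiring the exporter to unbound's control socket may need extra options depending on the module defaults:

```nix
# Sketch: enable the unbound exporter on ns1 and ns2.
services.prometheus.exporters.unbound.enable = true;
```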
### DNS - NSD (`ns1`, `ns2`)

- **Current state:** Has `nsd_down` alert (systemd unit state), no NSD-specific metrics
- **Available in nixpkgs:** Nothing. No exporter package or NixOS module. Community `nsd_exporter` exists but is not packaged.
- **Recommendation:** The existing systemd unit alert is likely sufficient. NSD is a simple authoritative-only server with limited operational metrics. Not worth packaging a custom exporter for now.

## Existing Monitoring (for reference)

These services have adequate alerting and/or scrape targets:

| Service | Scrape Targets | Alert Rules |
|---|---|---|
| Monitoring stack (Prometheus, Grafana, Loki, Tempo, Pyroscope) | Yes | 7 alerts |
| Home Assistant (+ Zigbee2MQTT, Mosquitto) | Yes (port 8123) | 3 alerts |
| HTTP Proxy (Caddy) | Yes (port 80) | 3 alerts |
| Nix Cache (Harmonia, build-flakes) | Via Caddy | 4 alerts |
| CA (step-ca) | Yes (port 9000) | 4 certificate alerts |

## Per-Service Resource Metrics (systemd-exporter)

### Current State

No per-service CPU, memory, or IO metrics are collected. The existing node-exporter systemd collector only provides unit state (active/inactive/failed), socket stats, and timer triggers. While systemd tracks per-unit resource usage via cgroups internally (visible in `systemctl status` and `systemd-cgtop`), this data is not exported to Prometheus.

### Available Solution

The `prometheus-systemd-exporter` package (v0.7.0) is available in nixpkgs with a ready-made NixOS module:

```nix
services.prometheus.exporters.systemd.enable = true;
```

**Options:** `enable`, `port`, `extraFlags`, `user`, `group`

This exporter reads cgroup data and exposes per-unit metrics including:
- CPU seconds consumed per service
- Memory usage per service
- Task/process counts per service
- Restart counts
- IO usage

### Recommendation

Enable on all hosts via the shared `system/` config (same pattern as node-exporter). Add a corresponding scrape job on monitoring01. This would give visibility into resource consumption per service across the fleet, useful for capacity planning and diagnosing noisy-neighbor issues on shared hosts.
|
||||
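As a sketch of the shared-config side, assuming the exporter's default port (9558) and reusing the `homelab.monitoring.scrapeTargets` helper for the scrape job:

```nix
# In system/ (applies to every host) - a sketch, port assumed to be the module default:
services.prometheus.exporters.systemd.enable = true;

homelab.monitoring.scrapeTargets = [{
  job_name = "systemd";
  port = 9558;
}];
```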
|
||||
## Suggested Priority
|
||||
|
||||
1. **PostgreSQL** - Critical infrastructure, easy to add with existing nixpkgs module
|
||||
2. **Authelia + LLDAP** - Auth outage affects all proxied services
|
||||
3. **Unbound exporter** - Ready-to-go NixOS module, just needs enabling
|
||||
4. **Jellyfin alerts** - Metrics already collected, just needs alert rules
|
||||
5. **NATS metrics** - Built-in endpoint, just needs a scrape target
|
||||
6. **Vault/OpenBao** - Native telemetry support
|
||||
7. **Actions Runner** - Lower priority, basic systemd alert sufficient
|
||||
|
||||
## Node-Exporter Targets Currently Down
|
||||
|
||||
Noted during audit -- these node-exporter targets are failing:
|
||||
|
||||
- `nixos-test1.home.2rjus.net:9100` - no route to host
|
||||
- `media1.home.2rjus.net:9100` - no route to host
|
||||
- `ns3.home.2rjus.net:9100` - no route to host
|
||||
- `ns4.home.2rjus.net:9100` - no route to host
|
||||
|
||||
These may be decommissioned or powered-off hosts that should be removed from the scrape config.
|
||||
docs/plans/completed/sops-to-openbao-migration.md (new file, 86 lines)
|
||||
# Sops to OpenBao Secrets Migration Plan
|
||||
|
||||
## Status: Complete (except ca, deferred)
|
||||
|
||||
## Remaining sops cleanup
|
||||
|
||||
The `sops-nix` flake input, `system/sops.nix`, `.sops.yaml`, and `secrets/` directory are
|
||||
still present because `ca` still uses sops for its step-ca secrets (5 secrets in
|
||||
`services/ca/default.nix`). The `services/authelia/` and `services/lldap/` modules also
|
||||
reference sops but are only used by auth01 (decommissioned).
|
||||
|
||||
Once `ca` is migrated to OpenBao PKI (Phase 4c in host-migration-to-opentofu.md), remove:
|
||||
- `sops-nix` input from `flake.nix`
|
||||
- `sops-nix.nixosModules.sops` from all host module lists in `flake.nix`
|
||||
- `inherit sops-nix` from all specialArgs in `flake.nix`
|
||||
- `system/sops.nix` and its import in `system/default.nix`
|
||||
- `.sops.yaml`
|
||||
- `secrets/` directory
|
||||
- All `sops.secrets.*` declarations in `services/ca/`, `services/authelia/`, `services/lldap/`
|
||||
|
||||
## Overview
|
||||
|
||||
Migrate all hosts from sops-nix secrets to OpenBao (vault) secrets management. Pilot with ha1, then roll out to remaining hosts in waves.
|
||||
|
||||
## Pre-requisites (completed)
|
||||
|
||||
1. Hardcoded root password hash in `system/root-user.nix` (removes sops dependency for all hosts)
|
||||
2. Added `extractKey` option to `system/vault-secrets.nix` (extracts single key as file)
|
||||
|
||||
## Deployment Order
|
||||
|
||||
### Pilot: ha1
|
||||
- Terraform: shared/backup/password secret, ha1 AppRole policy
|
||||
- Provision AppRole credentials via `playbooks/provision-approle.yml`
|
||||
- NixOS: vault.enable + backup-helper vault secret
|
||||
|
||||
### Wave 1: nats1, jelly01, pgdb1
|
||||
- No service secrets (only root password, already handled)
|
||||
- Just need AppRole policies + credential provisioning
|
||||
|
||||
### Wave 2: monitoring01
|
||||
- 3 secrets: backup password, nats nkey, pve-exporter config
|
||||
- Updates: alerttonotify.nix, pve.nix, configuration.nix
|
||||
|
||||
### Wave 3: ns1, then ns2 (critical - deploy ns1 first, verify, then ns2)
|
||||
- DNS zone transfer key (shared/dns/xfer-key)
|
||||
|
||||
### Wave 4: http-proxy
|
||||
- WireGuard private key
|
||||
|
||||
### Wave 5: nix-cache01
|
||||
- Cache signing key + Gitea Actions token
|
||||
|
||||
### Wave 6: ca (DEFERRED - waiting for PKI migration)
|
||||
|
||||
### Skipped: auth01 (decommissioned)
|
||||
|
||||
## Terraform variables needed
|
||||
|
||||
User must extract from sops and add to `terraform/vault/terraform.tfvars`:
|
||||
|
||||
| Variable | Source |
|
||||
|----------|--------|
|
||||
| `backup_helper_secret` | `sops -d secrets/secrets.yaml` |
|
||||
| `ns_xfer_key` | `sops -d secrets/secrets.yaml` |
|
||||
| `nats_nkey` | `sops -d secrets/secrets.yaml` |
|
||||
| `pve_exporter_config` | `sops -d secrets/monitoring01/pve-exporter.yaml` |
|
||||
| `wireguard_private_key` | `sops -d secrets/http-proxy/wireguard.yaml` |
|
||||
| `cache_signing_key` | `sops -d secrets/nix-cache01/cache-secret` |
|
||||
| `actions_token_1` | `sops -d secrets/nix-cache01/actions_token_1` |
|
||||
|
||||
## Provisioning AppRole credentials
|
||||
|
||||
```bash
|
||||
export BAO_ADDR='https://vault01.home.2rjus.net:8200'
|
||||
export BAO_TOKEN='<root-token>'
|
||||
nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=<host>
|
||||
```
|
||||
|
||||
## Verification (per host)
|
||||
|
||||
1. `systemctl status vault-secret-*` - all secret fetch services succeeded
|
||||
2. Check secret files exist at expected paths with correct permissions
|
||||
3. Verify dependent services are running
|
||||
4. Check `/var/lib/vault/cache/` is populated (fallback ready)
|
||||
5. Reboot host to verify boot-time secret fetching works
|
||||
docs/plans/completed/zigbee-sensor-battery-monitoring.md (new file, 109 lines)
|
||||
# Zigbee Sensor Battery Monitoring
|
||||
|
||||
**Status:** Completed
|
||||
**Branch:** `zigbee-battery-fix`
|
||||
**Commit:** `c515a6b home-assistant: fix zigbee sensor battery reporting`
|
||||
|
||||
## Problem
|
||||
|
||||
Three Aqara Zigbee temperature sensors report `battery: 0` in their MQTT payload, making the `hass_sensor_battery_percent` Prometheus metric useless for battery monitoring on these devices.
|
||||
|
||||
Affected sensors:
|
||||
- **Temp Living Room** (`0x54ef441000a54d3c`) — WSDCGQ12LM
|
||||
- **Temp Office** (`0x54ef441000a547bd`) — WSDCGQ12LM
|
||||
- **temp_server** (`0x54ef441000a564b6`) — WSDCGQ12LM
|
||||
|
||||
The **Temp Bedroom** sensor (`0x00124b0025495463`) is a SONOFF SNZB-02 and reports battery correctly.
|
||||
|
||||
## Findings
|
||||
|
||||
- All three sensors are actively reporting temperature, humidity, and pressure data — they are not dead.
|
||||
- The Zigbee2MQTT payload includes a `voltage` field (e.g., `2707` = 2.707V), which indicates healthy battery levels (~40-60% for a CR2032 coin cell).
|
||||
- CR2032 voltage reference: ~3.0V fresh, ~2.7V mid-life, ~2.1V dead.
|
||||
- The `voltage` field is not exposed as a Prometheus metric — it exists only in the MQTT payload.
|
||||
- This is a known firmware quirk with some Aqara WSDCGQ12LM sensors that always report 0% battery.
|
||||
|
||||
## Device Inventory
|
||||
|
||||
Full list of Zigbee devices on ha1 (12 total):
|
||||
|
||||
| Device | IEEE Address | Model | Type |
|
||||
|--------|-------------|-------|------|
|
||||
| temp_server | 0x54ef441000a564b6 | WSDCGQ12LM | Temperature sensor (battery fix applied) |
|
||||
| (Temp Living Room) | 0x54ef441000a54d3c | WSDCGQ12LM | Temperature sensor (battery fix applied) |
|
||||
| (Temp Office) | 0x54ef441000a547bd | WSDCGQ12LM | Temperature sensor (battery fix applied) |
|
||||
| (Temp Bedroom) | 0x00124b0025495463 | SNZB-02 | Temperature sensor (battery works) |
|
||||
| (Water leak) | 0x54ef4410009ac117 | SJCGQ12LM | Water leak sensor |
|
||||
| btn_livingroom | 0x54ef441000a1f907 | WXKG13LM | Wireless mini switch |
|
||||
| btn_bedroom | 0x54ef441000a1ee71 | WXKG13LM | Wireless mini switch |
|
||||
| (Hue bulb) | 0x001788010dc35d06 | 9290024688 | Hue E27 1100lm (Router) |
|
||||
| (Hue bulb) | 0x001788010dc5f003 | 9290024688 | Hue E27 1100lm (Router) |
|
||||
| (Hue ceiling) | 0x001788010e371aa4 | 915005997301 | Hue Infuse medium (Router) |
|
||||
| (Hue ceiling) | 0x001788010d253b99 | 915005997301 | Hue Infuse medium (Router) |
|
||||
| (Hue wall) | 0x001788010d1b599a | 929003052901 | Hue Sana wall light (Router, transition=5) |
|
||||
|
||||
## Implementation
|
||||
|
||||
### Solution 1: Calculate battery from voltage in Zigbee2MQTT (Implemented)
|
||||
|
||||
Override the Home Assistant battery entity's `value_template` in Zigbee2MQTT device configuration to calculate battery percentage from voltage.
|
||||
|
||||
**Formula:** `(voltage - 2100) / 9` (maps 2100-3000mV to 0-100%)
|
||||
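As an illustration, a per-device override in the inline Zigbee2MQTT settings might look like the following. The attribute path follows the `homeassistant.sensor_battery.value_template` key listed under the changes below, but the Jinja expression itself is an assumption about the deployed template:

```nix
# Sketch of one device entry (template text is illustrative, not the final config):
services.zigbee2mqtt.settings.devices."0x54ef441000a54d3c" = {
  friendly_name = "Temp Living Room";
  # Map 2100-3000 mV to 0-100% instead of trusting the firmware's battery field
  homeassistant.sensor_battery.value_template =
    "{{ ((value_json.voltage - 2100) / 9) | round(0) }}";
};
```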
|
||||
**Changes in `services/home-assistant/default.nix`:**
|
||||
- Device configuration moved from external `devices.yaml` to inline NixOS config
|
||||
- Three affected sensors have `homeassistant.sensor_battery.value_template` override
|
||||
- All 12 devices now declaratively managed
|
||||
|
||||
**Expected battery values based on current voltages:**
|
||||
| Sensor | Voltage | Expected Battery |
|
||||
|--------|---------|------------------|
|
||||
| Temp Living Room | 2710 mV | ~68% |
|
||||
| Temp Office | 2658 mV | ~62% |
|
||||
| temp_server | 2765 mV | ~74% |
|
||||
|
||||
### Solution 2: Alert on sensor staleness (Implemented)
|
||||
|
||||
Added Prometheus alert `zigbee_sensor_stale` in `services/monitoring/rules.yml` that fires when a Zigbee temperature sensor hasn't updated in over 1 hour. This provides defense-in-depth for detecting dead sensors regardless of battery reporting accuracy.
|
||||
|
||||
**Alert details:**
|
||||
- Expression: `(time() - hass_last_updated_time_seconds{entity=~"sensor\\.(0x[0-9a-f]+|temp_server)_temperature"}) > 3600`
|
||||
- Severity: warning
|
||||
- For: 5m
|
||||
|
||||
## Pre-Deployment Verification
|
||||
|
||||
### Backup Verification
|
||||
|
||||
Before deployment, verified ha1 backup configuration and ran manual backup:
|
||||
|
||||
**Backup paths:**
|
||||
- `/var/lib/hass` ✓
|
||||
- `/var/lib/zigbee2mqtt` ✓
|
||||
- `/var/lib/mosquitto` ✓
|
||||
|
||||
**Manual backup (2026-02-05 22:45:23):**
|
||||
- Snapshot ID: `59704dfa`
|
||||
- Files: 77 total (0 new, 13 changed, 64 unmodified)
|
||||
- Data: 62.635 MiB processed, 6.928 MiB stored (compressed)
|
||||
|
||||
### Other directories reviewed
|
||||
|
||||
- `/var/lib/vault` — Contains AppRole credentials; not backed up (can be re-provisioned via Ansible)
|
||||
- `/var/lib/sops-nix` — Legacy; ha1 uses Vault now
|
||||
|
||||
## Post-Deployment Steps
|
||||
|
||||
After deploying to ha1:
|
||||
|
||||
1. Restart zigbee2mqtt service (automatic on NixOS rebuild)
|
||||
2. In Home Assistant, the battery entities may need to be re-discovered:
|
||||
- Go to Settings → Devices & Services → MQTT
|
||||
- The new `value_template` should take effect after entity re-discovery
|
||||
- If not, try disabling and re-enabling the battery entities
|
||||
|
||||
## Notes
|
||||
|
||||
- Device configuration is now declarative in NixOS. Future device additions via Zigbee2MQTT frontend will need to be added to the NixOS config to persist.
|
||||
- The `devices.yaml` file on ha1 will be overwritten on service start but can be removed after confirming the new config works.
|
||||
- The NixOS zigbee2mqtt module defaults to `devices = "devices.yaml"` but our explicit inline config overrides this.
|
||||
docs/plans/homelab-exporter.md (new file, 179 lines)
|
||||
# Homelab Infrastructure Exporter
|
||||
|
||||
## Overview
|
||||
|
||||
Build a Prometheus exporter for metrics specific to our homelab infrastructure. Unlike the generic nixos-exporter, this covers services and patterns unique to our environment.
|
||||
|
||||
## Current State
|
||||
|
||||
### Existing Exporters
|
||||
- **node-exporter** (all hosts): System metrics
|
||||
- **systemd-exporter** (all hosts): Service restart counts, IP accounting
|
||||
- **labmon** (monitoring01): TLS certificate monitoring, step-ca health
|
||||
- **Service-specific**: unbound, postgres, nats, jellyfin, home-assistant, caddy, step-ca
|
||||
|
||||
### Gaps
|
||||
- No visibility into Vault/OpenBao lease expiry
|
||||
- No ACME certificate expiry from internal CA
|
||||
- No Proxmox guest agent metrics from inside VMs
|
||||
|
||||
## Metrics
|
||||
|
||||
### Vault/OpenBao Metrics
|
||||
|
||||
| Metric | Description | Source |
|
||||
|--------|-------------|--------|
|
||||
| `homelab_vault_token_expiry_seconds` | Seconds until AppRole token expires | Token metadata or lease file |
|
||||
| `homelab_vault_token_renewable` | 1 if token is renewable | Token metadata |
|
||||
|
||||
Labels: `role` (AppRole name)
|
||||
|
||||
### ACME Certificate Metrics
|
||||
|
||||
| Metric | Description | Source |
|
||||
|--------|-------------|--------|
|
||||
| `homelab_acme_cert_expiry_seconds` | Seconds until certificate expires | Parse cert from `/var/lib/acme/*/cert.pem` |
|
||||
| `homelab_acme_cert_not_after` | Unix timestamp of cert expiry | Certificate NotAfter field |
|
||||
|
||||
Labels: `domain`, `issuer`
|
||||
|
||||
Note: labmon already monitors external TLS endpoints. This covers local ACME-managed certs.
|
||||
|
||||
### Proxmox Guest Metrics (future)
|
||||
|
||||
| Metric | Description | Source |
|
||||
|--------|-------------|--------|
|
||||
| `homelab_proxmox_guest_info` | Info gauge with VM ID, name | QEMU guest agent |
|
||||
| `homelab_proxmox_guest_agent_running` | 1 if guest agent is responsive | Agent ping |
|
||||
|
||||
### DNS Zone Metrics (future)
|
||||
|
||||
| Metric | Description | Source |
|
||||
|--------|-------------|--------|
|
||||
| `homelab_dns_zone_serial` | Current zone serial number | DNS AXFR or zone file |
|
||||
|
||||
Labels: `zone`
|
||||
|
||||
## Architecture
|
||||
|
||||
Single binary with collectors enabled via config. Runs on hosts that need specific collectors.
|
||||
|
||||
```
|
||||
homelab-exporter
|
||||
├── main.go
|
||||
├── collector/
|
||||
│ ├── vault.go # Vault/OpenBao token metrics
|
||||
│ ├── acme.go # ACME certificate metrics
|
||||
│ └── proxmox.go # Proxmox guest agent (future)
|
||||
└── config/
|
||||
└── config.go
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
```yaml
|
||||
listen_addr: ":9970"
|
||||
collectors:
|
||||
vault:
|
||||
enabled: true
|
||||
token_path: "/var/lib/vault/token"
|
||||
acme:
|
||||
enabled: true
|
||||
cert_dirs:
|
||||
- "/var/lib/acme"
|
||||
proxmox:
|
||||
enabled: false
|
||||
```
|
||||
|
||||
## NixOS Module
|
||||
|
||||
```nix
|
||||
services.homelab-exporter = {
|
||||
enable = true;
|
||||
port = 9970;
|
||||
collectors = {
|
||||
vault = {
|
||||
enable = true;
|
||||
tokenPath = "/var/lib/vault/token";
|
||||
};
|
||||
acme = {
|
||||
enable = true;
|
||||
certDirs = [ "/var/lib/acme" ];
|
||||
};
|
||||
};
|
||||
};
|
||||
|
||||
# Auto-register scrape target
|
||||
homelab.monitoring.scrapeTargets = [{
|
||||
job_name = "homelab-exporter";
|
||||
port = 9970;
|
||||
}];
|
||||
```
|
||||
|
||||
## Integration
|
||||
|
||||
### Deployment
|
||||
|
||||
Deploy on hosts that have relevant data:
|
||||
- **All hosts with ACME certs**: acme collector
|
||||
- **All hosts with Vault**: vault collector
|
||||
- **Proxmox VMs**: proxmox collector (when implemented)
|
||||
|
||||
### Relationship with nixos-exporter
|
||||
|
||||
These are complementary:
|
||||
- **nixos-exporter** (port 9971): Generic NixOS metrics, deploy everywhere
|
||||
- **homelab-exporter** (port 9970): Infrastructure-specific, deploy selectively
|
||||
|
||||
Both can run on the same host if needed.
|
||||
|
||||
## Implementation
|
||||
|
||||
### Language
|
||||
|
||||
Go - consistent with labmon and nixos-exporter.
|
||||
|
||||
### Phase 1: Core + ACME
|
||||
1. Create git repository (git.t-juice.club/torjus/homelab-exporter)
|
||||
2. Implement ACME certificate collector
|
||||
3. HTTP server with `/metrics`
|
||||
4. NixOS module
|
||||
|
||||
### Phase 2: Vault Collector
|
||||
1. Implement token expiry detection
|
||||
2. Handle missing/expired tokens gracefully
|
||||
|
||||
### Phase 3: Dashboard
|
||||
1. Create Grafana dashboard for infrastructure health
|
||||
2. Add to existing monitoring service module
|
||||
|
||||
## Alert Examples
|
||||
|
||||
```yaml
|
||||
- alert: VaultTokenExpiringSoon
|
||||
expr: homelab_vault_token_expiry_seconds < 3600
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Vault token on {{ $labels.instance }} expires in < 1 hour"
|
||||
|
||||
- alert: ACMECertExpiringSoon
|
||||
expr: homelab_acme_cert_expiry_seconds < 7 * 24 * 3600
|
||||
for: 1h
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "ACME cert {{ $labels.domain }} on {{ $labels.instance }} expires in < 7 days"
|
||||
```
|
||||
|
||||
## Open Questions
|
||||
|
||||
- [ ] How to read Vault token expiry without re-authenticating?
|
||||
- [ ] Should ACME collector also check key/cert match?
|
||||
|
||||
## Notes
|
||||
|
||||
- Port 9970 (labmon uses 9969, nixos-exporter will use 9971)
|
||||
- Keep infrastructure-specific logic here, generic NixOS stuff in nixos-exporter
|
||||
- Consider merging Proxmox metrics with pve-exporter if overlap is significant
|
||||
docs/plans/host-migration-to-opentofu.md (new file, 224 lines)
|
||||
# Host Migration to OpenTofu
|
||||
|
||||
## Overview
|
||||
|
||||
Migrate all existing hosts (provisioned manually before the OpenTofu pipeline) into the new
|
||||
OpenTofu-managed provisioning workflow. Hosts are categorized by their state requirements:
|
||||
stateless hosts are simply recreated, stateful hosts require backup and restore, and some
|
||||
hosts are decommissioned or deferred.
|
||||
|
||||
## Current State
|
||||
|
||||
Hosts already managed by OpenTofu: `vault01`, `testvm01`, `vaulttest01`
|
||||
|
||||
Hosts to migrate:
|
||||
|
||||
| Host | Category | Notes |
|
||||
|------|----------|-------|
|
||||
| ns1 | Stateless | Primary DNS, recreate |
|
||||
| ns2 | Stateless | Secondary DNS, recreate |
|
||||
| nix-cache01 | Stateless | Binary cache, recreate |
|
||||
| http-proxy | Stateless | Reverse proxy, recreate |
|
||||
| nats1 | Stateless | Messaging, recreate |
|
||||
| auth01 | Decommission | No longer in use |
|
||||
| ha1 | Stateful | Home Assistant + Zigbee2MQTT + Mosquitto |
|
||||
| monitoring01 | Stateful | Prometheus, Grafana, Loki |
|
||||
| jelly01 | Stateful | Jellyfin metadata, watch history, config |
|
||||
| pgdb1 | Stateful | PostgreSQL databases |
|
||||
| jump | Decommission | No longer needed |
|
||||
| ca | Deferred | Pending Phase 4c PKI migration to OpenBao |
|
||||
|
||||
## Phase 1: Backup Preparation
|
||||
|
||||
Before migrating any stateful host, ensure restic backups are in place and verified.
|
||||
|
||||
### 1a. Expand monitoring01 Grafana Backup
|
||||
|
||||
The existing backup only covers `/var/lib/grafana/plugins` and a sqlite dump of `grafana.db`.
|
||||
Expand to back up all of `/var/lib/grafana/` to capture config directory and any other state.
|
||||
|
||||
### 1b. Add Jellyfin Backup to jelly01
|
||||
|
||||
No backup currently exists. Add a restic backup job for `/var/lib/jellyfin/` which contains:
|
||||
- `config/` — server settings, library configuration
|
||||
- `data/` — user watch history, playback state, library metadata
|
||||
|
||||
Media files are on the NAS (`nas.home.2rjus.net:/mnt/hdd-pool/media`) and do not need backup.
|
||||
The cache directory (`/var/cache/jellyfin/`) does not need backup — it regenerates.
|
||||
|
||||
### 1c. Add PostgreSQL Backup to pgdb1
|
||||
|
||||
No backup currently exists. Add a restic backup job with a `pg_dumpall` pre-hook to capture
|
||||
all databases and roles. The dump should be piped through restic's stdin backup (similar to
|
||||
the Grafana DB dump pattern on monitoring01).
|
||||
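A minimal sketch of what that job could look like. It uses `backupPrepareCommand` to write the dump to a staging file rather than a true stdin pipe, and the staging path and dump invocation are assumptions:

```nix
# Sketch only - repository/password paths mirror the ha1 job, dump details assumed:
services.restic.backups.pgdb1 = {
  repository = "rest:http://10.69.12.52:8000/backup-nix";
  passwordFile = "/run/secrets/backup_helper_secret";
  # Dump all databases and roles before the snapshot runs; adjust the
  # pg_dumpall invocation to however postgres auth is set up on pgdb1.
  backupPrepareCommand = ''
    mkdir -p /var/backup
    /run/current-system/sw/bin/sudo -u postgres pg_dumpall > /var/backup/pgdb1-dumpall.sql
  '';
  paths = [ "/var/backup/pgdb1-dumpall.sql" ];
  timerConfig = {
    OnCalendar = "daily";
    RandomizedDelaySec = "2h";
  };
};
```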
|
||||
### 1d. Verify Existing ha1 Backup
|
||||
|
||||
ha1 already backs up `/var/lib/hass`, `/var/lib/zigbee2mqtt`, `/var/lib/mosquitto`. Verify
|
||||
these backups are current and restorable before proceeding with migration.
|
||||
|
||||
### 1e. Verify All Backups
|
||||
|
||||
After adding/expanding backup jobs:
|
||||
1. Trigger a manual backup run on each host
|
||||
2. Verify backup integrity with `restic check`
|
||||
3. Test a restore to a temporary location to confirm data is recoverable
|
||||
|
||||
## Phase 2: Declare pgdb1 Databases in Nix
|
||||
|
||||
Before migrating pgdb1, audit the manually-created databases and users on the running
|
||||
instance, then declare them in the Nix configuration using `ensureDatabases` and
|
||||
`ensureUsers`. This makes the PostgreSQL setup reproducible on the new host.
|
||||
|
||||
Steps:
|
||||
1. SSH to pgdb1, run `\l` and `\du` in psql to list databases and roles
|
||||
2. Add `ensureDatabases` and `ensureUsers` to `services/postgres/postgres.nix` (a sketch follows this list)
|
||||
3. Document any non-default PostgreSQL settings or extensions per database
|
||||
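For illustration, the declaration could take roughly this shape; the database and role names are placeholders to be replaced with the results of the audit in step 1:

```nix
# Placeholder names only - fill in from the audit of the running instance:
services.postgresql = {
  ensureDatabases = [ "exampleapp" ];
  ensureUsers = [
    {
      name = "exampleapp";
      ensureDBOwnership = true; # grants ownership of the database with the matching name
    }
  ];
};
```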
|
||||
After reprovisioning, the databases will be created by NixOS, and data restored from the
|
||||
`pg_dumpall` backup.
|
||||
|
||||
## Phase 3: Stateless Host Migration
|
||||
|
||||
These hosts have no meaningful state and can be recreated fresh. For each host:
|
||||
|
||||
1. Add the host definition to `terraform/vms.tf` (using `create-host` or manually)
|
||||
2. Commit and push to master
|
||||
3. Run `tofu apply` to provision the new VM
|
||||
4. Wait for bootstrap to complete (VM pulls config from master and reboots)
|
||||
5. Verify the host is functional
|
||||
6. Decommission the old VM in Proxmox
|
||||
|
||||
### Migration Order
|
||||
|
||||
Migrate stateless hosts in an order that minimizes disruption:
|
||||
|
||||
1. **nix-cache01** — low risk, no downstream dependencies during migration
|
||||
2. **nats1** — low risk, verify no persistent JetStream streams first
|
||||
3. **http-proxy** — brief disruption to proxied services, migrate during low-traffic window
|
||||
4. **ns1, ns2** — migrate one at a time, verify DNS resolution between each
|
||||
|
||||
For ns1/ns2: migrate ns2 first (secondary), verify AXFR works, then migrate ns1. All hosts
|
||||
use both ns1 and ns2 as resolvers, so one being down briefly is tolerable.
|
||||
|
||||
## Phase 4: Stateful Host Migration
|
||||
|
||||
For each stateful host, the procedure is:
|
||||
|
||||
1. Trigger a final restic backup
|
||||
2. Stop services on the old host (to prevent state drift during migration)
|
||||
3. Provision the new VM via `tofu apply`
|
||||
4. Wait for bootstrap to complete
|
||||
5. Stop the relevant services on the new host
|
||||
6. Restore data from restic backup
|
||||
7. Start services and verify functionality
|
||||
8. Decommission the old VM
|
||||
|
||||
### 4a. pgdb1
|
||||
|
||||
1. Run final `pg_dumpall` backup via restic
|
||||
2. Stop PostgreSQL on the old host
|
||||
3. Provision new pgdb1 via OpenTofu
|
||||
4. After bootstrap, NixOS creates the declared databases/users
|
||||
5. Restore data with `pg_restore` or `psql < dumpall.sql`
|
||||
6. Verify database connectivity from gunter (`10.69.30.105`)
|
||||
7. Decommission old VM
|
||||
|
||||
### 4b. monitoring01
|
||||
|
||||
1. Run final Grafana backup
|
||||
2. Provision new monitoring01 via OpenTofu
|
||||
3. After bootstrap, restore `/var/lib/grafana/` from restic
|
||||
4. Restart Grafana, verify dashboards and datasources are intact
|
||||
5. Prometheus and Loki start fresh with empty data (acceptable)
|
||||
6. Verify all scrape targets are being collected
|
||||
7. Decommission old VM
|
||||
|
||||
### 4c. jelly01
|
||||
|
||||
1. Run final Jellyfin backup
|
||||
2. Provision new jelly01 via OpenTofu
|
||||
3. After bootstrap, restore `/var/lib/jellyfin/` from restic
|
||||
4. Verify NFS mount to NAS is working
|
||||
5. Start Jellyfin, verify watch history and library metadata are present
|
||||
6. Decommission old VM
|
||||
|
||||
### 4d. ha1
|
||||
|
||||
1. Verify latest restic backup is current
|
||||
2. Stop Home Assistant, Zigbee2MQTT, and Mosquitto on old host
|
||||
3. Provision new ha1 via OpenTofu
|
||||
4. After bootstrap, restore `/var/lib/hass`, `/var/lib/zigbee2mqtt`, `/var/lib/mosquitto`
|
||||
5. Start services, verify Home Assistant is functional
|
||||
6. Verify Zigbee devices are still paired and communicating
|
||||
7. Decommission old VM
|
||||
|
||||
**Note:** ha1 currently has 2 GB RAM, which is consistently tight. Average memory usage has
|
||||
climbed from ~57% (30-day avg) to ~70% currently, with a 30-day low of only 187 MB free.
|
||||
Consider increasing to 4 GB when reprovisioning to allow headroom for additional integrations.
|
||||
|
||||
**Note:** ha1 is the highest-risk migration due to Zigbee device pairings. The Zigbee
|
||||
coordinator state in `/var/lib/zigbee2mqtt` should preserve pairings, but verify on a
|
||||
non-critical time window.
|
||||
|
||||
**USB Passthrough:** The ha1 VM has a USB device passed through from the Proxmox hypervisor
|
||||
(the Zigbee coordinator). The new VM must be configured with the same USB passthrough in
|
||||
OpenTofu/Proxmox. Verify the USB device ID on the hypervisor and add the appropriate
|
||||
`usb` block to the VM definition in `terraform/vms.tf`. The USB device must be passed
|
||||
through before starting Zigbee2MQTT on the new host.
|
||||
|
||||
## Phase 5: Decommission jump and auth01 Hosts
|
||||
|
||||
### jump
|
||||
1. Verify nothing depends on the jump host (no SSH proxy configs pointing to it, etc.)
|
||||
2. Remove host configuration from `hosts/jump/`
|
||||
3. Remove from `flake.nix`
|
||||
4. Remove any secrets in `secrets/jump/`
|
||||
5. Remove from `.sops.yaml`
|
||||
6. Destroy the VM in Proxmox
|
||||
7. Commit cleanup
|
||||
|
||||
### auth01
|
||||
1. Remove host configuration from `hosts/auth01/`
|
||||
2. Remove from `flake.nix`
|
||||
3. Remove any secrets in `secrets/auth01/`
|
||||
4. Remove from `.sops.yaml`
|
||||
5. Remove `services/authelia/` and `services/lldap/` (only used by auth01)
|
||||
6. Destroy the VM in Proxmox
|
||||
7. Commit cleanup
|
||||
|
||||
## Phase 6: Decommission ca Host (Deferred)
|
||||
|
||||
Deferred until Phase 4c (PKI migration to OpenBao) is complete. Once all hosts use the
|
||||
OpenBao ACME endpoint for certificates, the step-ca host can be decommissioned following
|
||||
the same cleanup steps as the jump host.
|
||||
|
||||
## Phase 7: Remove sops-nix
|
||||
|
||||
Once `ca` is decommissioned (Phase 6), `sops-nix` is no longer used by any host. Remove
|
||||
all remnants:
|
||||
- `sops-nix` input from `flake.nix` and `flake.lock`
|
||||
- `sops-nix.nixosModules.sops` from all host module lists in `flake.nix`
|
||||
- `inherit sops-nix` from all specialArgs in `flake.nix`
|
||||
- `system/sops.nix` and its import in `system/default.nix`
|
||||
- `.sops.yaml`
|
||||
- `secrets/` directory
|
||||
- All `sops.secrets.*` declarations in `services/ca/`, `services/authelia/`, `services/lldap/`
|
||||
- Template scripts that generate age keys for sops (`hosts/template/scripts.nix`,
|
||||
`hosts/template2/scripts.nix`)
|
||||
|
||||
See `docs/plans/completed/sops-to-openbao-migration.md` for full context.
|
||||
|
||||
## Notes
|
||||
|
||||
- Each host migration should be done individually, not in bulk, to limit blast radius
|
||||
- Keep the old VM running until the new one is verified — do not destroy prematurely
|
||||
- The old VMs use IPs that the new VMs need, so the old VM must be shut down before
|
||||
the new one is provisioned (or use a temporary IP and swap after verification)
|
||||
- Stateful migrations should be done during low-usage windows
|
||||
- After all migrations are complete, the only host not in OpenTofu will be ca (deferred)
|
||||
- Since many hosts are being recreated, this is a good opportunity to establish consistent
|
||||
hostname naming conventions before provisioning the new VMs. Current naming is inconsistent
|
||||
(e.g. `ns1` vs `nix-cache01`, `ha1` vs `auth01`, `pgdb1` vs `http-proxy`). Decide on a
|
||||
convention before starting migrations — e.g. whether to always use numeric suffixes, a
|
||||
consistent format like `service-NN`, role-based vs function-based names, etc.
|
||||
docs/plans/nixos-exporter.md (new file, 176 lines)
|
||||
# NixOS Prometheus Exporter
|
||||
|
||||
## Overview
|
||||
|
||||
Build a generic Prometheus exporter for NixOS-specific metrics. This exporter should be useful for any NixOS deployment, not just our homelab.
|
||||
|
||||
## Goal
|
||||
|
||||
Provide visibility into NixOS system state that standard exporters don't cover:
|
||||
- Generation management (count, age, current vs booted)
|
||||
- Flake input freshness
|
||||
- Upgrade status
|
||||
|
||||
## Metrics
|
||||
|
||||
### Core Metrics
|
||||
|
||||
| Metric | Description | Source |
|
||||
|--------|-------------|--------|
|
||||
| `nixos_generation_count` | Number of system generations | Count entries in `/nix/var/nix/profiles/system-*` |
|
||||
| `nixos_current_generation` | Active generation number | Parse `readlink /run/current-system` |
|
||||
| `nixos_booted_generation` | Generation that was booted | Parse `/run/booted-system` |
|
||||
| `nixos_generation_age_seconds` | Age of current generation | File mtime of current system profile |
|
||||
| `nixos_config_mismatch` | 1 if booted != current, 0 otherwise | Compare symlink targets |
|
||||
|
||||
### Flake Metrics (optional collector)
|
||||
|
||||
| Metric | Description | Source |
|
||||
|--------|-------------|--------|
|
||||
| `nixos_flake_input_age_seconds` | Age of each flake.lock input | Parse `lastModified` from flake.lock |
|
||||
| `nixos_flake_input_info` | Info gauge with rev label | Parse `rev` from flake.lock |
|
||||
|
||||
Labels: `input` (e.g., "nixpkgs", "home-manager")
|
||||
|
||||
### Future Metrics
|
||||
|
||||
| Metric | Description | Source |
|
||||
|--------|-------------|--------|
|
||||
| `nixos_upgrade_pending` | 1 if remote differs from local | Compare flake refs (expensive) |
|
||||
| `nixos_store_size_bytes` | Size of /nix/store | `du` or filesystem stats |
|
||||
| `nixos_store_path_count` | Number of store paths | Count entries |
|
||||
|
||||
## Architecture
|
||||
|
||||
Single binary with optional collectors enabled via config or flags.
|
||||
|
||||
```
|
||||
nixos-exporter
|
||||
├── main.go
|
||||
├── collector/
|
||||
│ ├── generation.go # Core generation metrics
|
||||
│ └── flake.go # Flake input metrics
|
||||
└── config/
|
||||
└── config.go
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
```yaml
|
||||
listen_addr: ":9971"
|
||||
collectors:
|
||||
generation:
|
||||
enabled: true
|
||||
flake:
|
||||
enabled: false
|
||||
lock_path: "/etc/nixos/flake.lock" # or auto-detect from /run/current-system
|
||||
```
|
||||
|
||||
Command-line alternative:
|
||||
```bash
|
||||
nixos-exporter --listen=:9971 --collector.flake --flake.lock-path=/etc/nixos/flake.lock
|
||||
```
|
||||
|
||||
## NixOS Module
|
||||
|
||||
```nix
|
||||
services.prometheus.exporters.nixos = {
|
||||
enable = true;
|
||||
port = 9971;
|
||||
collectors = [ "generation" "flake" ];
|
||||
flake.lockPath = "/etc/nixos/flake.lock";
|
||||
};
|
||||
```
|
||||
|
||||
The module should integrate with nixpkgs' existing `services.prometheus.exporters.*` pattern.
|
||||
|
||||
## Implementation
|
||||
|
||||
### Language
|
||||
|
||||
Go - mature prometheus client library, single static binary, easy cross-compilation.
|
||||
|
||||
### Phase 1: Core
|
||||
1. Create git repository
|
||||
2. Implement generation collector (count, current, booted, age, mismatch)
|
||||
3. Basic HTTP server with `/metrics` endpoint
|
||||
4. NixOS module
|
||||
|
||||
### Phase 2: Flake Collector
|
||||
1. Parse flake.lock JSON format
|
||||
2. Extract lastModified timestamps per input
|
||||
3. Add input labels
|
||||
|
||||
### Phase 3: Packaging
|
||||
1. Add to nixpkgs or publish as flake
|
||||
2. Documentation
|
||||
3. Example Grafana dashboard
|
||||
|
||||
## Example Output
|
||||
|
||||
```
|
||||
# HELP nixos_generation_count Total number of system generations
|
||||
# TYPE nixos_generation_count gauge
|
||||
nixos_generation_count 47
|
||||
|
||||
# HELP nixos_current_generation Currently active generation number
|
||||
# TYPE nixos_current_generation gauge
|
||||
nixos_current_generation 47
|
||||
|
||||
# HELP nixos_booted_generation Generation that was booted
|
||||
# TYPE nixos_booted_generation gauge
|
||||
nixos_booted_generation 46
|
||||
|
||||
# HELP nixos_generation_age_seconds Age of current generation in seconds
|
||||
# TYPE nixos_generation_age_seconds gauge
|
||||
nixos_generation_age_seconds 3600
|
||||
|
||||
# HELP nixos_config_mismatch 1 if booted generation differs from current
|
||||
# TYPE nixos_config_mismatch gauge
|
||||
nixos_config_mismatch 1
|
||||
|
||||
# HELP nixos_flake_input_age_seconds Age of flake input in seconds
|
||||
# TYPE nixos_flake_input_age_seconds gauge
|
||||
nixos_flake_input_age_seconds{input="nixpkgs"} 259200
|
||||
nixos_flake_input_age_seconds{input="home-manager"} 86400
|
||||
```
|
||||
|
||||
## Alert Examples
|
||||
|
||||
```yaml
|
||||
- alert: NixOSConfigStale
|
||||
expr: nixos_generation_age_seconds > 7 * 24 * 3600
|
||||
for: 1h
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "NixOS config on {{ $labels.instance }} is over 7 days old"
|
||||
|
||||
- alert: NixOSRebootRequired
|
||||
expr: nixos_config_mismatch == 1
|
||||
for: 24h
|
||||
labels:
|
||||
severity: info
|
||||
annotations:
|
||||
summary: "{{ $labels.instance }} needs reboot to apply config"
|
||||
|
||||
- alert: NixpkgsInputStale
|
||||
expr: nixos_flake_input_age_seconds{input="nixpkgs"} > 30 * 24 * 3600
|
||||
for: 1d
|
||||
labels:
|
||||
severity: info
|
||||
annotations:
|
||||
summary: "nixpkgs input on {{ $labels.instance }} is over 30 days old"
|
||||
```
|
||||
|
||||
## Open Questions
|
||||
|
||||
- [ ] How to detect flake.lock path automatically? (check /run/current-system for flake info)
|
||||
- [ ] Should generation collector need root? (probably not, just reading symlinks)
|
||||
- [ ] Include in nixpkgs or distribute as standalone flake?
|
||||
|
||||
## Notes
|
||||
|
||||
- Port 9971 suggested (9970 reserved for homelab-exporter)
|
||||
- Keep scope focused on NixOS-specific metrics - don't duplicate node-exporter
|
||||
- Consider submitting to prometheus exporter registry once stable
|
||||
docs/plans/nixos-improvements.md (new file, 27 lines)
|
||||
# NixOS Infrastructure Improvements
|
||||
|
||||
This document contains planned improvements to the NixOS infrastructure that are not directly part of the automated deployment pipeline.
|
||||
|
||||
## Planned
|
||||
|
||||
### Custom NixOS Options for Service and System Configuration
|
||||
|
||||
Currently, most service configurations in `services/` and shared system configurations in `system/` are written as plain NixOS module imports without declaring custom options. This means host-specific customization is done by directly setting upstream NixOS options or by duplicating configuration across hosts.
|
||||
|
||||
The `homelab.dns` module (`modules/homelab/dns.nix`) is the first example of defining custom options under a `homelab.*` namespace. This pattern should be extended to more of the repository's configuration.
|
||||
|
||||
**Goals:**
|
||||
|
||||
- Define `homelab.*` options for services and shared configuration where it makes sense, following the pattern established by `homelab.dns` (a sketch follows the candidate areas below)
|
||||
- Allow hosts to enable/configure services declaratively (e.g. `homelab.monitoring.enable`, `homelab.http-proxy.virtualHosts`) rather than importing opaque module files
|
||||
- Keep options simple and focused — wrap only the parts that vary between hosts or that benefit from a clearer interface. Not everything needs a custom option.
|
||||
|
||||
**Candidate areas:**
|
||||
|
||||
- `system/` modules (e.g. auto-upgrade schedule, ACME CA URL, monitoring endpoints)
|
||||
- `services/` modules where multiple hosts use the same service with different parameters
|
||||
- Cross-cutting concerns that are currently implicit (e.g. which Loki endpoint promtail ships to)
|
||||
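As a rough illustration of the pattern, an option for one of the candidate areas (the auto-upgrade schedule) might be declared as below; the option name and default are hypothetical, not existing code:

```nix
# Hypothetical module following the homelab.dns pattern (names/defaults illustrative):
{ config, lib, ... }:
{
  options.homelab.autoUpgrade = {
    enable = lib.mkEnableOption "scheduled nixos-rebuild from the flake";
    schedule = lib.mkOption {
      type = lib.types.str;
      default = "04:00";
      description = "OnCalendar expression for the upgrade timer";
    };
  };

  config = lib.mkIf config.homelab.autoUpgrade.enable {
    system.autoUpgrade = {
      enable = true;
      dates = config.homelab.autoUpgrade.schedule;
    };
  };
}
```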
|
||||
## Completed
|
||||
|
||||
- [DNS Automation](completed/dns-automation.md) - Automatically generate DNS entries from host configurations
|
||||
docs/plans/prometheus-scrape-target-labels.md (new file, 119 lines)
|
||||
# Prometheus Scrape Target Labels
|
||||
|
||||
## Goal
|
||||
|
||||
Add support for custom per-host labels on Prometheus scrape targets, enabling alert rules to reference host metadata (priority, role) instead of hardcoding instance names.
|
||||
|
||||
## Motivation
|
||||
|
||||
Some hosts have workloads that make generic alert thresholds inappropriate. For example, `nix-cache01` regularly hits high CPU during builds, requiring a longer `for` duration on `high_cpu_load`. Currently this is handled by excluding specific instance names in PromQL expressions, which is brittle and doesn't scale.
|
||||
|
||||
With per-host labels, alert rules can use semantic filters like `{priority!="low"}` instead of `{instance!="nix-cache01.home.2rjus.net:9100"}`.
|
||||
|
||||
## Proposed Labels
|
||||
|
||||
### `priority`
|
||||
|
||||
Indicates alerting importance. Hosts with `priority = "low"` can have relaxed thresholds or longer durations in alert rules.
|
||||
|
||||
Values: `"high"` (default), `"low"`
|
||||
|
||||
### `role`
|
||||
|
||||
Describes the function of the host. Useful for grouping in dashboards and targeting role-specific alert rules.
|
||||
|
||||
Values: free-form string, e.g. `"dns"`, `"build-host"`, `"database"`, `"monitoring"`
|
||||
|
||||
**Note on multiple roles:** Prometheus labels are strictly string values, not lists. For hosts that serve multiple roles there are a few options:
|
||||
|
||||
- **Separate boolean labels:** `role_build_host = "true"`, `role_cache_server = "true"` -- flexible but verbose, and requires updating the module when new roles are added.
|
||||
- **Delimited string:** `role = "build-host,cache-server"` -- works with regex matchers (`{role=~".*build-host.*"}`), but regex matching is less clean and more error-prone.
|
||||
- **Pick a primary role:** `role = "build-host"` -- simplest, and probably sufficient since most hosts have one primary role.
|
||||
|
||||
Recommendation: start with a single primary role string. If multi-role matching becomes a real need, switch to separate boolean labels.
|
||||
|
||||
### `dns_role`
|
||||
|
||||
For DNS servers specifically, distinguish between primary and secondary resolvers. The secondary resolver (ns2) receives very little traffic and has a cold cache, making generic cache hit ratio alerts inappropriate.
|
||||
|
||||
Values: `"primary"`, `"secondary"`
|
||||
|
||||
Example use case: The `unbound_low_cache_hit_ratio` alert fires on ns2 because its cache hit ratio (~62%) is lower than ns1 (~90%). This is expected behavior since ns2 gets ~100x less traffic. With a `dns_role` label, the alert can either exclude secondaries or use different thresholds:
|
||||
|
||||
```promql
|
||||
# Only alert on primary DNS
|
||||
unbound_cache_hit_ratio < 0.7 and on(instance) unbound_up{dns_role="primary"}
|
||||
|
||||
# Or use different thresholds
|
||||
(unbound_cache_hit_ratio < 0.7 and on(instance) unbound_up{dns_role="primary"})
|
||||
or
|
||||
(unbound_cache_hit_ratio < 0.5 and on(instance) unbound_up{dns_role="secondary"})
|
||||
```
|
||||
|
||||
## Implementation
|
||||
|
||||
### 1. Add `labels` option to `homelab.monitoring`
|
||||
|
||||
In `modules/homelab/monitoring.nix`, add:
|
||||
|
||||
```nix
|
||||
labels = lib.mkOption {
|
||||
type = lib.types.attrsOf lib.types.str;
|
||||
default = { };
|
||||
description = "Custom labels to attach to this host's scrape targets";
|
||||
};
|
||||
```
|
||||
|
||||
### 2. Update `lib/monitoring.nix`
|
||||
|
||||
- `extractHostMonitoring` should carry `labels` through in its return value.
|
||||
- `generateNodeExporterTargets` currently returns a flat list of target strings. It needs to return structured `static_configs` entries instead, grouping targets by their label sets:
|
||||
|
||||
```nix
|
||||
# Before (flat list):
|
||||
["ns1.home.2rjus.net:9100", "ns2.home.2rjus.net:9100", ...]
|
||||
|
||||
# After (grouped by labels):
|
||||
[
|
||||
{ targets = ["ns1.home.2rjus.net:9100", "ns2.home.2rjus.net:9100", ...]; }
|
||||
{ targets = ["nix-cache01.home.2rjus.net:9100"]; labels = { priority = "low"; role = "build-host"; }; }
|
||||
]
|
||||
```
|
||||
|
||||
This requires grouping hosts by their label attrset and producing one `static_configs` entry per unique label combination. Hosts with no custom labels get grouped together with no extra labels (preserving current behavior).
|
||||
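One way to express the grouping in `lib/monitoring.nix`, sketched under the assumption that each extracted host record carries `fqdn` and `labels` attributes (the names are illustrative, not the current code):

```nix
# Sketch: group hosts by identical label sets, one static_configs entry per group
groupNodeExporterTargets = hosts:
  let
    # Serialize each host's label attrset so it can act as a grouping key
    byLabels = lib.groupBy (h: builtins.toJSON h.labels) hosts;
  in
  lib.mapAttrsToList (
    _: group:
    { targets = map (h: "${h.fqdn}:9100") group; }
    # Only attach a labels attribute when the group actually has custom labels
    // lib.optionalAttrs ((builtins.head group).labels != { }) {
      labels = (builtins.head group).labels;
    }
  ) byLabels;
```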
|
||||
### 3. Update `services/monitoring/prometheus.nix`
|
||||
|
||||
Change the node-exporter scrape config to use the new structured output:
|
||||
|
||||
```nix
|
||||
# Before:
|
||||
static_configs = [{ targets = nodeExporterTargets; }];
|
||||
|
||||
# After:
|
||||
static_configs = nodeExporterTargets;
|
||||
```
|
||||
|
||||
### 4. Set labels on hosts
|
||||
|
||||
Example in `hosts/nix-cache01/configuration.nix` or the relevant service module:
|
||||
|
||||
```nix
|
||||
homelab.monitoring.labels = {
|
||||
priority = "low";
|
||||
role = "build-host";
|
||||
};
|
||||
```
|
||||
|
||||
### 5. Update alert rules
|
||||
|
||||
After implementing labels, review and update `services/monitoring/rules.yml`:
|
||||
|
||||
- Replace instance-name exclusions with label-based filters (e.g. `{priority!="low"}` instead of `{instance!="nix-cache01.home.2rjus.net:9100"}`).
|
||||
- Consider whether any other rules should differentiate by priority or role.
|
||||
|
||||
Specifically, the `high_cpu_load` rule currently has a nix-cache01 exclusion that should be replaced with a `priority`-based filter.
|
||||
|
||||
### 6. Consider labels for `generateScrapeConfigs` (service targets)
|
||||
|
||||
The same label propagation could be applied to service-level scrape targets. This is optional and can be deferred -- service targets are more specialized and less likely to need generic label-based filtering.
|
||||
docs/plans/remote-access.md (new file, 122 lines)
|
||||
# Remote Access to Homelab Services
|
||||
|
||||
## Status: Planning
|
||||
|
||||
## Goal
|
||||
|
||||
Enable remote access to some or all homelab services from outside the internal network, without exposing anything directly to the internet.
|
||||
|
||||
## Current State
|
||||
|
||||
- All services are only accessible from the internal 10.69.13.x network
|
||||
- Exception: jelly01 has a WireGuard link to an external VPS
|
||||
- No services are directly exposed to the public internet
|
||||
|
||||
## Constraints
|
||||
|
||||
- Nothing should be directly accessible from the outside
|
||||
- Must use VPN or overlay network (no port forwarding of services)
|
||||
- Self-hosted solutions preferred over managed services
|
||||
|
||||
## Options
|
||||
|
||||
### 1. WireGuard Gateway (Internal Router)
|
||||
|
||||
A dedicated NixOS host on the internal network with a WireGuard tunnel out to the VPS. The VPS becomes the public entry point, and the gateway routes traffic to internal services. Firewall rules on the gateway control which services are reachable.
|
||||
|
||||
**Pros:**
|
||||
- Simple, well-understood technology
|
||||
- Already running WireGuard for jelly01
|
||||
- Full control over routing and firewall rules
|
||||
- Excellent NixOS module support
|
||||
- No extra dependencies
|
||||
|
||||
**Cons:**
|
||||
- Hub-and-spoke topology (all traffic goes through VPS)
|
||||
- Manual peer management
|
||||
- Adding a new client device means editing configs on both VPS and gateway
|
||||
|
||||
### 2. WireGuard Mesh (No Relay)
|
||||
|
||||
Each client device connects directly to a WireGuard endpoint. Could be on the VPS which forwards to the homelab, or if there is a routable IP at home, directly to an internal host.
|
||||
|
||||
**Pros:**
|
||||
- Simple and fast
|
||||
- No extra software
|
||||
|
||||
**Cons:**
|
||||
- Manual key and endpoint management for every peer
|
||||
- Doesn't scale well
|
||||
- If behind CGNAT, still needs the VPS as intermediary
|
||||
|
||||
### 3. Headscale (Self-Hosted Tailscale)
|
||||
|
||||
Run a Headscale control server (on the VPS or internally) and install the Tailscale client on homelab hosts and personal devices. Gets the Tailscale mesh networking UX without depending on Tailscale's infrastructure.
|
||||
|
||||
**Pros:**
|
||||
- Mesh topology - devices communicate directly via NAT traversal (DERP relay as fallback)
|
||||
- Easy to add/remove devices
|
||||
- ACL support for granular access control
|
||||
- MagicDNS for service discovery
|
||||
- Good NixOS support for both headscale server and tailscale client
|
||||
- Subnet routing lets you expose the entire 10.69.13.x network or specific hosts without installing tailscale on every host
|
||||
|
||||
**Cons:**
|
||||
- More moving parts than plain WireGuard
|
||||
- Headscale is a third-party reimplementation, can lag behind Tailscale features
|
||||
- Need to run and maintain the control server
|
||||
|
||||
### 4. Tailscale (Managed)
|
||||
|
||||
Same as Headscale but using Tailscale's hosted control plane.
|
||||
|
||||
**Pros:**
|
||||
- Zero infrastructure to manage on the control plane side
|
||||
- Polished UX, well-maintained clients
|
||||
- Free tier covers personal use
|
||||
|
||||
**Cons:**
|
||||
- Dependency on Tailscale's service
|
||||
- Less aligned with self-hosting preference
|
||||
- Coordination metadata goes through their servers (data plane is still peer-to-peer)
|
||||
|
||||
### 5. Netbird (Self-Hosted)
|
||||
|
||||
Open-source alternative to Tailscale with a self-hostable management server. WireGuard-based, supports ACLs and NAT traversal.
|
||||
|
||||
**Pros:**
|
||||
- Fully self-hostable
|
||||
- Web UI for management
|
||||
- ACL and peer grouping support
|
||||
|
||||
**Cons:**
|
||||
- Heavier to self-host (needs multiple components: management server, signal server, TURN relay)
|
||||
- Less mature NixOS module support compared to Tailscale/Headscale
|
||||
|
||||
### 6. Nebula (by Defined Networking)
|
||||
|
||||
Certificate-based mesh VPN. Each node gets a certificate from a CA you control. No central coordination server needed at runtime.
|
||||
|
||||
**Pros:**
|
||||
- No always-on control plane
|
||||
- Certificate-based identity
|
||||
- Lightweight
|
||||
|
||||
**Cons:**
|
||||
- Less convenient for ad-hoc device addition (need to issue certs)
|
||||
- NAT traversal less mature than Tailscale's
|
||||
- Smaller community/ecosystem
|
||||
|
||||
## Key Decision Points
|
||||
|
||||
- **Static public IP vs CGNAT?** Determines whether clients can connect directly to home network or need VPS relay.
|
||||
- **Number of client devices?** If just phone and laptop, plain WireGuard via VPS is fine. More devices favors Headscale.
|
||||
- **Per-service vs per-network access?** Gateway with firewall rules gives per-service control. Headscale ACLs can also do this. Plain WireGuard gives network-level access with gateway firewall for finer control.
|
||||
- **Subnet routing vs per-host agents?** With Headscale/Tailscale, can either install client on every host, or use a single subnet router that advertises the 10.69.13.x range. The latter is closer to the gateway approach and avoids touching every host.
|
||||
|
||||
## Leading Candidates
|
||||
|
||||
Based on existing WireGuard experience, self-hosting preference, and NixOS stack:
|
||||
|
||||
1. **Headscale with a subnet router** - Best balance of convenience and self-hosting (a rough sketch of the subnet-router side follows)
|
||||
2. **WireGuard gateway via VPS** - Simplest, most transparent, builds on existing setup
|
||||
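For reference, the subnet-router half of option 1 could look roughly like this on a small internal NixOS host; the headscale URL and auth-key path are placeholders:

```nix
# Hypothetical subnet-router sketch (login server and key path are placeholders):
services.tailscale = {
  enable = true;
  useRoutingFeatures = "server"; # sets up IP forwarding for advertised routes
  authKeyFile = "/run/secrets/headscale_authkey";
  extraUpFlags = [
    "--login-server=https://headscale.example.net"
    "--advertise-routes=10.69.13.0/24"
  ];
};
```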
flake.lock (generated, 40 lines changed)
@@ -21,27 +21,6 @@
|
||||
"url": "https://git.t-juice.club/torjus/alerttonotify"
|
||||
}
|
||||
},
|
||||
"backup-helper": {
|
||||
"inputs": {
|
||||
"nixpkgs": [
|
||||
"nixpkgs-unstable"
|
||||
]
|
||||
},
|
||||
"locked": {
|
||||
"lastModified": 1738015166,
|
||||
"narHash": "sha256-573tR4aXNjILKvYnjZUM5DZZME2H6YTHJkUKs3ZehFU=",
|
||||
"ref": "master",
|
||||
"rev": "f9540cc065692c7ca80735e7b08399459e0ea6d6",
|
||||
"revCount": 35,
|
||||
"type": "git",
|
||||
"url": "https://git.t-juice.club/torjus/backup-helper"
|
||||
},
|
||||
"original": {
|
||||
"ref": "master",
|
||||
"type": "git",
|
||||
"url": "https://git.t-juice.club/torjus/backup-helper"
|
||||
}
|
||||
},
|
||||
"labmon": {
|
||||
"inputs": {
|
||||
"nixpkgs": [
|
||||
@@ -65,11 +44,11 @@
|
||||
},
|
||||
"nixpkgs": {
|
||||
"locked": {
|
||||
"lastModified": 1769900590,
|
||||
"narHash": "sha256-I7Lmgj3owOTBGuauy9FL6qdpeK2umDoe07lM4V+PnyA=",
|
||||
"lastModified": 1770136044,
|
||||
"narHash": "sha256-tlFqNG/uzz2++aAmn4v8J0vAkV3z7XngeIIB3rM3650=",
|
||||
"owner": "nixos",
|
||||
"repo": "nixpkgs",
|
||||
"rev": "41e216c0ca66c83b12ab7a98cc326b5db01db646",
|
||||
"rev": "e576e3c9cf9bad747afcddd9e34f51d18c855b4e",
|
||||
"type": "github"
|
||||
},
|
||||
"original": {
|
||||
@@ -81,11 +60,11 @@
|
||||
},
|
||||
"nixpkgs-unstable": {
|
||||
"locked": {
|
||||
"lastModified": 1770019141,
|
||||
"narHash": "sha256-VKS4ZLNx4PNrABoB0L8KUpc1fE7CLpQXQs985tGfaCU=",
|
||||
"lastModified": 1770197578,
|
||||
"narHash": "sha256-AYqlWrX09+HvGs8zM6ebZ1pwUqjkfpnv8mewYwAo+iM=",
|
||||
"owner": "nixos",
|
||||
"repo": "nixpkgs",
|
||||
"rev": "cb369ef2efd432b3cdf8622b0ffc0a97a02f3137",
|
||||
"rev": "00c21e4c93d963c50d4c0c89bfa84ed6e0694df2",
|
||||
"type": "github"
|
||||
},
|
||||
"original": {
|
||||
@@ -98,7 +77,6 @@
|
||||
"root": {
|
||||
"inputs": {
|
||||
"alerttonotify": "alerttonotify",
|
||||
"backup-helper": "backup-helper",
|
||||
"labmon": "labmon",
|
||||
"nixpkgs": "nixpkgs",
|
||||
"nixpkgs-unstable": "nixpkgs-unstable",
|
||||
@@ -112,11 +90,11 @@
|
||||
]
|
||||
},
|
||||
"locked": {
|
||||
"lastModified": 1769921679,
|
||||
"narHash": "sha256-twBMKGQvaztZQxFxbZnkg7y/50BW9yjtCBWwdjtOZew=",
|
||||
"lastModified": 1770145881,
|
||||
"narHash": "sha256-ktjWTq+D5MTXQcL9N6cDZXUf9kX8JBLLBLT0ZyOTSYY=",
|
||||
"owner": "Mic92",
|
||||
"repo": "sops-nix",
|
||||
"rev": "1e89149dcfc229e7e2ae24a8030f124a31e4f24f",
|
||||
"rev": "17eea6f3816ba6568b8c81db8a4e6ca438b30b7c",
|
||||
"type": "github"
|
||||
},
|
||||
"original": {
|
||||
|
||||
flake.nix (88 lines changed)
@@ -9,10 +9,6 @@
|
||||
url = "github:Mic92/sops-nix";
|
||||
inputs.nixpkgs.follows = "nixpkgs-unstable";
|
||||
};
|
||||
backup-helper = {
|
||||
url = "git+https://git.t-juice.club/torjus/backup-helper?ref=master";
|
||||
inputs.nixpkgs.follows = "nixpkgs-unstable";
|
||||
};
|
||||
alerttonotify = {
|
||||
url = "git+https://git.t-juice.club/torjus/alerttonotify?ref=master";
|
||||
inputs.nixpkgs.follows = "nixpkgs-unstable";
|
||||
@@ -29,7 +25,6 @@
|
||||
nixpkgs,
|
||||
nixpkgs-unstable,
|
||||
sops-nix,
|
||||
backup-helper,
|
||||
alerttonotify,
|
||||
labmon,
|
||||
...
|
||||
@@ -90,55 +85,6 @@
|
||||
sops-nix.nixosModules.sops
|
||||
];
|
||||
};
|
||||
ns3 = nixpkgs.lib.nixosSystem {
|
||||
inherit system;
|
||||
specialArgs = {
|
||||
inherit inputs self sops-nix;
|
||||
};
|
||||
modules = [
|
||||
(
|
||||
{ config, pkgs, ... }:
|
||||
{
|
||||
nixpkgs.overlays = commonOverlays;
|
||||
}
|
||||
)
|
||||
./hosts/ns3
|
||||
sops-nix.nixosModules.sops
|
||||
];
|
||||
};
|
||||
ns4 = nixpkgs.lib.nixosSystem {
|
||||
inherit system;
|
||||
specialArgs = {
|
||||
inherit inputs self sops-nix;
|
||||
};
|
||||
modules = [
|
||||
(
|
||||
{ config, pkgs, ... }:
|
||||
{
|
||||
nixpkgs.overlays = commonOverlays;
|
||||
}
|
||||
)
|
||||
./hosts/ns4
|
||||
sops-nix.nixosModules.sops
|
||||
];
|
||||
};
|
||||
nixos-test1 = nixpkgs.lib.nixosSystem {
|
||||
inherit system;
|
||||
specialArgs = {
|
||||
inherit inputs self sops-nix;
|
||||
};
|
||||
modules = [
|
||||
(
|
||||
{ config, pkgs, ... }:
|
||||
{
|
||||
nixpkgs.overlays = commonOverlays;
|
||||
}
|
||||
)
|
||||
./hosts/nixos-test1
|
||||
sops-nix.nixosModules.sops
|
||||
backup-helper.nixosModules.backup-helper
|
||||
];
|
||||
};
|
||||
ha1 = nixpkgs.lib.nixosSystem {
|
||||
inherit system;
|
||||
specialArgs = {
|
||||
@@ -153,7 +99,6 @@
|
||||
)
|
||||
./hosts/ha1
|
||||
sops-nix.nixosModules.sops
|
||||
backup-helper.nixosModules.backup-helper
|
||||
];
|
||||
};
|
||||
template1 = nixpkgs.lib.nixosSystem {
|
||||
@@ -234,7 +179,6 @@
|
||||
)
|
||||
./hosts/monitoring01
|
||||
sops-nix.nixosModules.sops
|
||||
backup-helper.nixosModules.backup-helper
|
||||
labmon.nixosModules.labmon
|
||||
];
|
||||
};
|
||||
@@ -270,22 +214,6 @@
|
||||
sops-nix.nixosModules.sops
|
||||
];
|
||||
};
|
||||
media1 = nixpkgs.lib.nixosSystem {
|
||||
inherit system;
|
||||
specialArgs = {
|
||||
inherit inputs self sops-nix;
|
||||
};
|
||||
modules = [
|
||||
(
|
||||
{ config, pkgs, ... }:
|
||||
{
|
||||
nixpkgs.overlays = commonOverlays;
|
||||
}
|
||||
)
|
||||
./hosts/media1
|
||||
sops-nix.nixosModules.sops
|
||||
];
|
||||
};
|
||||
pgdb1 = nixpkgs.lib.nixosSystem {
|
||||
inherit system;
|
||||
specialArgs = {
|
||||
@@ -318,22 +246,6 @@
|
||||
sops-nix.nixosModules.sops
|
||||
];
|
||||
};
|
||||
auth01 = nixpkgs.lib.nixosSystem {
|
||||
inherit system;
|
||||
specialArgs = {
|
||||
inherit inputs self sops-nix;
|
||||
};
|
||||
modules = [
|
||||
(
|
||||
{ config, pkgs, ... }:
|
||||
{
|
||||
nixpkgs.overlays = commonOverlays;
|
||||
}
|
||||
)
|
||||
./hosts/auth01
|
||||
sops-nix.nixosModules.sops
|
||||
];
|
||||
};
|
||||
testvm01 = nixpkgs.lib.nixosSystem {
|
||||
inherit system;
|
||||
specialArgs = {
|
||||
|
||||
@@ -1,65 +0,0 @@
|
||||
{
|
||||
pkgs,
|
||||
...
|
||||
}:
|
||||
|
||||
{
|
||||
imports = [
|
||||
../template/hardware-configuration.nix
|
||||
|
||||
../../system
|
||||
../../common/vm
|
||||
];
|
||||
|
||||
nixpkgs.config.allowUnfree = true;
|
||||
# Use the systemd-boot EFI boot loader.
|
||||
boot.loader.grub = {
|
||||
enable = true;
|
||||
device = "/dev/sda";
|
||||
configurationLimit = 3;
|
||||
};
|
||||
|
||||
networking.hostName = "auth01";
|
||||
networking.domain = "home.2rjus.net";
|
||||
networking.useNetworkd = true;
|
||||
networking.useDHCP = false;
|
||||
services.resolved.enable = true;
|
||||
networking.nameservers = [
|
||||
"10.69.13.5"
|
||||
"10.69.13.6"
|
||||
];
|
||||
|
||||
systemd.network.enable = true;
|
||||
systemd.network.networks."ens18" = {
|
||||
matchConfig.Name = "ens18";
|
||||
address = [
|
||||
"10.69.13.18/24"
|
||||
];
|
||||
routes = [
|
||||
{ Gateway = "10.69.13.1"; }
|
||||
];
|
||||
linkConfig.RequiredForOnline = "routable";
|
||||
};
|
||||
time.timeZone = "Europe/Oslo";
|
||||
|
||||
nix.settings.experimental-features = [
|
||||
"nix-command"
|
||||
"flakes"
|
||||
];
|
||||
nix.settings.tarball-ttl = 0;
|
||||
environment.systemPackages = with pkgs; [
|
||||
vim
|
||||
wget
|
||||
git
|
||||
];
|
||||
|
||||
services.qemuGuest.enable = true;
|
||||
|
||||
# Open ports in the firewall.
|
||||
# networking.firewall.allowedTCPPorts = [ ... ];
|
||||
# networking.firewall.allowedUDPPorts = [ ... ];
|
||||
# Or disable the firewall altogether.
|
||||
networking.firewall.enable = false;
|
||||
|
||||
system.stateVersion = "23.11"; # Did you read the comment?
|
||||
}
|
||||
@@ -1,8 +0,0 @@
|
||||
{ ... }:
|
||||
{
|
||||
imports = [
|
||||
./configuration.nix
|
||||
../../services/lldap
|
||||
../../services/authelia
|
||||
];
|
||||
}
|
||||
@@ -55,16 +55,35 @@
|
||||
git
|
||||
];
|
||||
|
||||
# Vault secrets management
|
||||
vault.enable = true;
|
||||
vault.secrets.backup-helper = {
|
||||
secretPath = "shared/backup/password";
|
||||
extractKey = "password";
|
||||
outputDir = "/run/secrets/backup_helper_secret";
|
||||
services = [ "restic-backups-ha1" ];
|
||||
};
|
||||
|
||||
# Backup service dirs
|
||||
sops.secrets."backup_helper_secret" = { };
|
||||
backup-helper = {
|
||||
enable = true;
|
||||
password-file = "/run/secrets/backup_helper_secret";
|
||||
backup-dirs = [
|
||||
services.restic.backups.ha1 = {
|
||||
repository = "rest:http://10.69.12.52:8000/backup-nix";
|
||||
passwordFile = "/run/secrets/backup_helper_secret";
|
||||
paths = [
|
||||
"/var/lib/hass"
|
||||
"/var/lib/zigbee2mqtt"
|
||||
"/var/lib/mosquitto"
|
||||
];
|
||||
timerConfig = {
|
||||
OnCalendar = "daily";
|
||||
Persistent = true;
|
||||
RandomizedDelaySec = "2h";
|
||||
};
|
||||
pruneOpts = [
|
||||
"--keep-daily 7"
|
||||
"--keep-weekly 4"
|
||||
"--keep-monthly 6"
|
||||
"--keep-within 1d"
|
||||
];
|
||||
};
|
||||
|
||||
# Open ports in the firewall.
|
||||
|
||||
@@ -11,6 +11,20 @@
|
||||
../../common/vm
|
||||
];
|
||||
|
||||
homelab.dns.cnames = [
|
||||
"nzbget"
|
||||
"radarr"
|
||||
"sonarr"
|
||||
"ha"
|
||||
"z2m"
|
||||
"grafana"
|
||||
"prometheus"
|
||||
"alertmanager"
|
||||
"jelly"
|
||||
"pyroscope"
|
||||
"pushgw"
|
||||
];
|
||||
|
||||
nixpkgs.config.allowUnfree = true;
|
||||
# Use the systemd-boot EFI boot loader.
|
||||
boot.loader.grub = {
|
||||
@@ -46,6 +60,8 @@
|
||||
"nix-command"
|
||||
"flakes"
|
||||
];
|
||||
vault.enable = true;
|
||||
|
||||
nix.settings.tarball-ttl = 0;
|
||||
environment.systemPackages = with pkgs; [
|
||||
vim
|
||||
|
||||
@@ -1,9 +1,12 @@
|
||||
{ config, ... }:
|
||||
{
|
||||
sops.secrets.wireguard_private_key = {
|
||||
sopsFile = ../../secrets/http-proxy/wireguard.yaml;
|
||||
key = "wg_private_key";
|
||||
vault.secrets.wireguard = {
|
||||
secretPath = "hosts/http-proxy/wireguard";
|
||||
extractKey = "private_key";
|
||||
outputDir = "/run/secrets/wireguard_private_key";
|
||||
services = [ "wireguard-wg0" ];
|
||||
};
|
||||
|
||||
networking.wireguard = {
|
||||
enable = true;
|
||||
useNetworkd = true;
|
||||
@@ -13,7 +16,7 @@
|
||||
ips = [ "10.69.222.3/24" ];
|
||||
mtu = 1384;
|
||||
listenPort = 51820;
|
||||
privateKeyFile = config.sops.secrets.wireguard_private_key.path;
|
||||
privateKeyFile = "/run/secrets/wireguard_private_key";
|
||||
peers = [
|
||||
{
|
||||
name = "docker2.t-juice.club";
|
||||
@@ -26,7 +29,11 @@
|
||||
};
|
||||
};
|
||||
};
|
||||
# monitoring
|
||||
homelab.monitoring.scrapeTargets = [{
|
||||
job_name = "wireguard";
|
||||
port = 9586;
|
||||
}];
|
||||
|
||||
services.prometheus.exporters.wireguard = {
|
||||
enable = true;
|
||||
};
|
||||
|
||||
@@ -1,76 +0,0 @@
|
||||
{
|
||||
pkgs,
|
||||
...
|
||||
}:
|
||||
|
||||
{
|
||||
imports = [
|
||||
./hardware-configuration.nix
|
||||
|
||||
../../system
|
||||
];
|
||||
|
||||
nixpkgs.config.allowUnfree = true;
|
||||
|
||||
# Use the systemd-boot EFI boot loader.
|
||||
boot = {
|
||||
loader.systemd-boot = {
|
||||
enable = true;
|
||||
configurationLimit = 5;
|
||||
memtest86.enable = true;
|
||||
};
|
||||
loader.efi.canTouchEfiVariables = true;
|
||||
supportedFilesystems = [ "nfs" ];
|
||||
};
|
||||
|
||||
networking.hostName = "media1";
|
||||
networking.domain = "home.2rjus.net";
|
||||
networking.useNetworkd = true;
|
||||
networking.useDHCP = false;
|
||||
services.resolved.enable = true;
|
||||
networking.nameservers = [
|
||||
"10.69.13.5"
|
||||
"10.69.13.6"
|
||||
];
|
||||
|
||||
systemd.network.enable = true;
|
||||
systemd.network.networks."enp2s0" = {
|
||||
matchConfig.Name = "enp2s0";
|
||||
address = [
|
||||
"10.69.12.82/24"
|
||||
];
|
||||
routes = [
|
||||
{ Gateway = "10.69.12.1"; }
|
||||
];
|
||||
linkConfig.RequiredForOnline = "routable";
|
||||
};
|
||||
time.timeZone = "Europe/Oslo";
|
||||
|
||||
# Graphics
|
||||
hardware.graphics = {
|
||||
enable = true;
|
||||
extraPackages = with pkgs; [
|
||||
libvdpau-va-gl
|
||||
libva-vdpau-driver
|
||||
];
|
||||
};
|
||||
|
||||
nix.settings.experimental-features = [
|
||||
"nix-command"
|
||||
"flakes"
|
||||
];
|
||||
nix.settings.tarball-ttl = 0;
|
||||
environment.systemPackages = with pkgs; [
|
||||
vim
|
||||
wget
|
||||
git
|
||||
];
|
||||
|
||||
# Open ports in the firewall.
|
||||
# networking.firewall.allowedTCPPorts = [ ... ];
|
||||
# networking.firewall.allowedUDPPorts = [ ... ];
|
||||
# Or disable the firewall altogether.
|
||||
networking.firewall.enable = false;
|
||||
|
||||
system.stateVersion = "23.11"; # Did you read the comment?
|
||||
}
|
||||
@@ -1,7 +0,0 @@
{ ... }:
{
imports = [
./configuration.nix
./kodi.nix
];
}
@@ -1,33 +0,0 @@
|
||||
{ config, lib, pkgs, modulesPath, ... }:
|
||||
|
||||
{
|
||||
imports =
|
||||
[
|
||||
(modulesPath + "/installer/scan/not-detected.nix")
|
||||
];
|
||||
|
||||
boot.initrd.availableKernelModules = [ "xhci_pci" "ahci" "usb_storage" "usbhid" "sd_mod" "rtsx_usb_sdmmc" ];
|
||||
boot.initrd.kernelModules = [ ];
|
||||
boot.kernelModules = [ "kvm-amd" ];
|
||||
boot.extraModulePackages = [ ];
|
||||
|
||||
fileSystems."/" =
|
||||
{
|
||||
device = "/dev/disk/by-uuid/3e7c311c-b1a3-4be7-b8bf-e497cba64302";
|
||||
fsType = "btrfs";
|
||||
};
|
||||
|
||||
fileSystems."/boot" =
|
||||
{
|
||||
device = "/dev/disk/by-uuid/F0D7-E5C1";
|
||||
fsType = "vfat";
|
||||
options = [ "fmask=0022" "dmask=0022" ];
|
||||
};
|
||||
|
||||
swapDevices =
|
||||
[{ device = "/dev/disk/by-uuid/1a06a36f-da61-4d36-b94e-b852836c328a"; }];
|
||||
|
||||
nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
|
||||
hardware.cpu.amd.updateMicrocode = lib.mkDefault config.hardware.enableRedistributableFirmware;
|
||||
}
|
||||
|
||||
@@ -1,29 +0,0 @@
|
||||
{ pkgs, ... }:
|
||||
let
|
||||
kodipkg = pkgs.kodi-wayland.withPackages (
|
||||
p: with p; [
|
||||
jellyfin
|
||||
]
|
||||
);
|
||||
in
|
||||
{
|
||||
users.users.kodi = {
|
||||
isNormalUser = true;
|
||||
description = "Kodi Media Center user";
|
||||
};
|
||||
#services.xserver = {
|
||||
# enable = true;
|
||||
#};
|
||||
services.cage = {
|
||||
enable = true;
|
||||
user = "kodi";
|
||||
environment = {
|
||||
XKB_DEFAULT_LAYOUT = "no";
|
||||
};
|
||||
program = "${kodipkg}/bin/kodi";
|
||||
};
|
||||
|
||||
environment.systemPackages = with pkgs; [
|
||||
firefox
|
||||
];
|
||||
}
|
||||
@@ -56,16 +56,46 @@
|
||||
|
||||
services.qemuGuest.enable = true;
|
||||
|
||||
sops.secrets."backup_helper_secret" = { };
|
||||
backup-helper = {
|
||||
enable = true;
|
||||
password-file = "/run/secrets/backup_helper_secret";
|
||||
backup-dirs = [
|
||||
"/var/lib/grafana/plugins"
|
||||
# Vault secrets management
|
||||
vault.enable = true;
|
||||
vault.secrets.backup-helper = {
|
||||
secretPath = "shared/backup/password";
|
||||
extractKey = "password";
|
||||
outputDir = "/run/secrets/backup_helper_secret";
|
||||
services = [ "restic-backups-grafana" "restic-backups-grafana-db" ];
|
||||
};
|
||||
|
||||
services.restic.backups.grafana = {
|
||||
repository = "rest:http://10.69.12.52:8000/backup-nix";
|
||||
passwordFile = "/run/secrets/backup_helper_secret";
|
||||
paths = [ "/var/lib/grafana/plugins" ];
|
||||
timerConfig = {
|
||||
OnCalendar = "daily";
|
||||
Persistent = true;
|
||||
RandomizedDelaySec = "2h";
|
||||
};
|
||||
pruneOpts = [
|
||||
"--keep-daily 7"
|
||||
"--keep-weekly 4"
|
||||
"--keep-monthly 6"
|
||||
"--keep-within 1d"
|
||||
];
|
||||
backup-commands = [
|
||||
# "grafana.db:${pkgs.sqlite}/bin/sqlite /var/lib/grafana/data/grafana.db .dump"
|
||||
"grafana.db:${pkgs.sqlite}/bin/sqlite3 /var/lib/grafana/data/grafana.db .dump"
|
||||
};
|
||||
|
||||
services.restic.backups.grafana-db = {
|
||||
repository = "rest:http://10.69.12.52:8000/backup-nix";
|
||||
passwordFile = "/run/secrets/backup_helper_secret";
|
||||
command = [ "${pkgs.sqlite}/bin/sqlite3" "/var/lib/grafana/data/grafana.db" ".dump" ];
|
||||
timerConfig = {
|
||||
OnCalendar = "daily";
|
||||
Persistent = true;
|
||||
RandomizedDelaySec = "2h";
|
||||
};
|
||||
pruneOpts = [
|
||||
"--keep-daily 7"
|
||||
"--keep-weekly 4"
|
||||
"--keep-monthly 6"
|
||||
"--keep-within 1d"
|
||||
];
|
||||
};
|
||||
|
||||
|
||||
@@ -11,6 +11,8 @@
|
||||
../../common/vm
|
||||
];
|
||||
|
||||
homelab.dns.cnames = [ "nix-cache" "actions1" ];
|
||||
|
||||
fileSystems."/nix" = {
|
||||
device = "/dev/disk/by-label/nixcache";
|
||||
fsType = "xfs";
|
||||
@@ -50,6 +52,8 @@
|
||||
"nix-command"
|
||||
"flakes"
|
||||
];
|
||||
vault.enable = true;
|
||||
|
||||
nix.settings.tarball-ttl = 0;
|
||||
environment.systemPackages = with pkgs; [
|
||||
vim
|
||||
|
||||
@@ -1,67 +0,0 @@
|
||||
{ config, lib, pkgs, ... }:
|
||||
|
||||
{
|
||||
imports =
|
||||
[
|
||||
../template/hardware-configuration.nix
|
||||
|
||||
../../system
|
||||
];
|
||||
|
||||
nixpkgs.config.allowUnfree = true;
|
||||
# Use the systemd-boot EFI boot loader.
|
||||
boot.loader.grub.enable = true;
|
||||
boot.loader.grub.device = "/dev/sda";
|
||||
|
||||
networking.hostName = "nixos-test1";
|
||||
networking.domain = "home.2rjus.net";
|
||||
networking.useNetworkd = true;
|
||||
networking.useDHCP = false;
|
||||
services.resolved.enable = true;
|
||||
networking.nameservers = [
|
||||
"10.69.13.5"
|
||||
"10.69.13.6"
|
||||
];
|
||||
|
||||
systemd.network.enable = true;
|
||||
systemd.network.networks."ens18" = {
|
||||
matchConfig.Name = "ens18";
|
||||
address = [
|
||||
"10.69.13.10/24"
|
||||
];
|
||||
routes = [
|
||||
{ Gateway = "10.69.13.1"; }
|
||||
];
|
||||
linkConfig.RequiredForOnline = "routable";
|
||||
};
|
||||
time.timeZone = "Europe/Oslo";
|
||||
|
||||
nix.settings.experimental-features = [ "nix-command" "flakes" ];
|
||||
nix.settings.tarball-ttl = 0;
|
||||
environment.systemPackages = with pkgs; [
|
||||
vim
|
||||
wget
|
||||
git
|
||||
];
|
||||
|
||||
# Open ports in the firewall.
|
||||
# networking.firewall.allowedTCPPorts = [ ... ];
|
||||
# networking.firewall.allowedUDPPorts = [ ... ];
|
||||
# Or disable the firewall altogether.
|
||||
networking.firewall.enable = false;
|
||||
|
||||
# Secrets
|
||||
# Backup helper
|
||||
sops.secrets."backup_helper_secret" = { };
|
||||
backup-helper = {
|
||||
enable = true;
|
||||
password-file = "/run/secrets/backup_helper_secret";
|
||||
backup-dirs = [
|
||||
"/etc/machine-id"
|
||||
"/etc/os-release"
|
||||
];
|
||||
};
|
||||
|
||||
system.stateVersion = "23.11"; # Did you read the comment?
|
||||
}
|
||||
|
||||
@@ -1,5 +0,0 @@
{ ... }: {
imports = [
./configuration.nix
];
}
@@ -47,6 +47,8 @@
|
||||
"nix-command"
|
||||
"flakes"
|
||||
];
|
||||
vault.enable = true;
|
||||
|
||||
nix.settings.tarball-ttl = 0;
|
||||
environment.systemPackages = with pkgs; [
|
||||
vim
|
||||
|
||||
@@ -47,6 +47,8 @@
|
||||
"nix-command"
|
||||
"flakes"
|
||||
];
|
||||
vault.enable = true;
|
||||
|
||||
environment.systemPackages = with pkgs; [
|
||||
vim
|
||||
wget
|
||||
|
||||
@@ -1,56 +0,0 @@
|
||||
{ config, lib, pkgs, ... }:
|
||||
|
||||
{
|
||||
imports =
|
||||
[
|
||||
../template/hardware-configuration.nix
|
||||
|
||||
../../system
|
||||
../../services/ns/master-authorative.nix
|
||||
../../services/ns/resolver.nix
|
||||
];
|
||||
|
||||
nixpkgs.config.allowUnfree = true;
|
||||
# Use the systemd-boot EFI boot loader.
|
||||
boot.loader.grub.enable = true;
|
||||
boot.loader.grub.device = "/dev/sda";
|
||||
|
||||
networking.hostName = "ns3";
|
||||
networking.domain = "home.2rjus.net";
|
||||
networking.useNetworkd = true;
|
||||
networking.useDHCP = false;
|
||||
services.resolved.enable = false;
|
||||
networking.nameservers = [
|
||||
"10.69.13.5"
|
||||
"10.69.13.6"
|
||||
];
|
||||
|
||||
systemd.network.enable = true;
|
||||
systemd.network.networks."ens18" = {
|
||||
matchConfig.Name = "ens18";
|
||||
address = [
|
||||
"10.69.13.7/24"
|
||||
];
|
||||
routes = [
|
||||
{ Gateway = "10.69.13.1"; }
|
||||
];
|
||||
linkConfig.RequiredForOnline = "routable";
|
||||
};
|
||||
time.timeZone = "Europe/Oslo";
|
||||
|
||||
nix.settings.experimental-features = [ "nix-command" "flakes" ];
|
||||
environment.systemPackages = with pkgs; [
|
||||
vim
|
||||
wget
|
||||
git
|
||||
];
|
||||
|
||||
# Open ports in the firewall.
|
||||
# networking.firewall.allowedTCPPorts = [ ... ];
|
||||
# networking.firewall.allowedUDPPorts = [ ... ];
|
||||
# Or disable the firewall altogether.
|
||||
networking.firewall.enable = false;
|
||||
|
||||
system.stateVersion = "23.11"; # Did you read the comment?
|
||||
}
|
||||
|
||||
@@ -1,5 +0,0 @@
{ ... }: {
imports = [
./configuration.nix
];
}
@@ -1,36 +0,0 @@
|
||||
{ config, lib, pkgs, modulesPath, ... }:
|
||||
|
||||
{
|
||||
imports =
|
||||
[
|
||||
(modulesPath + "/profiles/qemu-guest.nix")
|
||||
];
|
||||
|
||||
boot.initrd.availableKernelModules = [ "ata_piix" "uhci_hcd" "virtio_pci" "virtio_scsi" "sd_mod" "sr_mod" ];
|
||||
boot.initrd.kernelModules = [ ];
|
||||
# boot.kernelModules = [ ];
|
||||
# boot.extraModulePackages = [ ];
|
||||
|
||||
fileSystems."/" =
|
||||
{
|
||||
device = "/dev/disk/by-uuid/6889aba9-61ed-4687-ab10-e5cf4017ac8d";
|
||||
fsType = "xfs";
|
||||
};
|
||||
|
||||
fileSystems."/boot" =
|
||||
{
|
||||
device = "/dev/disk/by-uuid/BC07-3B7A";
|
||||
fsType = "vfat";
|
||||
};
|
||||
|
||||
swapDevices =
|
||||
[{ device = "/dev/disk/by-uuid/64e5757b-6625-4dd2-aa2a-66ca93444d23"; }];
|
||||
|
||||
# Enables DHCP on each ethernet and wireless interface. In case of scripted networking
|
||||
# (the default) this is the recommended approach. When using systemd-networkd it's
|
||||
# still possible to use this option, but it's recommended to use it in conjunction
|
||||
# with explicit per-interface declarations with `networking.interfaces.<interface>.useDHCP`.
|
||||
# networking.interfaces.ens18.useDHCP = lib.mkDefault true;
|
||||
|
||||
nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
|
||||
}
|
||||
@@ -1,56 +0,0 @@
|
||||
{ config, lib, pkgs, ... }:
|
||||
|
||||
{
|
||||
imports =
|
||||
[
|
||||
../template/hardware-configuration.nix
|
||||
|
||||
../../system
|
||||
../../services/ns/secondary-authorative.nix
|
||||
../../services/ns/resolver.nix
|
||||
];
|
||||
|
||||
nixpkgs.config.allowUnfree = true;
|
||||
# Use the systemd-boot EFI boot loader.
|
||||
boot.loader.grub.enable = true;
|
||||
boot.loader.grub.device = "/dev/sda";
|
||||
|
||||
networking.hostName = "ns4";
|
||||
networking.domain = "home.2rjus.net";
|
||||
networking.useNetworkd = true;
|
||||
networking.useDHCP = false;
|
||||
services.resolved.enable = false;
|
||||
networking.nameservers = [
|
||||
"10.69.13.5"
|
||||
"10.69.13.6"
|
||||
];
|
||||
|
||||
systemd.network.enable = true;
|
||||
systemd.network.networks."ens18" = {
|
||||
matchConfig.Name = "ens18";
|
||||
address = [
|
||||
"10.69.13.8/24"
|
||||
];
|
||||
routes = [
|
||||
{ Gateway = "10.69.13.1"; }
|
||||
];
|
||||
linkConfig.RequiredForOnline = "routable";
|
||||
};
|
||||
time.timeZone = "Europe/Oslo";
|
||||
|
||||
nix.settings.experimental-features = [ "nix-command" "flakes" ];
|
||||
environment.systemPackages = with pkgs; [
|
||||
vim
|
||||
wget
|
||||
git
|
||||
];
|
||||
|
||||
# Open ports in the firewall.
|
||||
# networking.firewall.allowedTCPPorts = [ ... ];
|
||||
# networking.firewall.allowedUDPPorts = [ ... ];
|
||||
# Or disable the firewall altogether.
|
||||
networking.firewall.enable = false;
|
||||
|
||||
system.stateVersion = "23.11"; # Did you read the comment?
|
||||
}
|
||||
|
||||
@@ -1,5 +0,0 @@
{ ... }: {
imports = [
./configuration.nix
];
}
@@ -1,36 +0,0 @@
|
||||
{ config, lib, pkgs, modulesPath, ... }:
|
||||
|
||||
{
|
||||
imports =
|
||||
[
|
||||
(modulesPath + "/profiles/qemu-guest.nix")
|
||||
];
|
||||
|
||||
boot.initrd.availableKernelModules = [ "ata_piix" "uhci_hcd" "virtio_pci" "virtio_scsi" "sd_mod" "sr_mod" ];
|
||||
boot.initrd.kernelModules = [ ];
|
||||
# boot.kernelModules = [ ];
|
||||
# boot.extraModulePackages = [ ];
|
||||
|
||||
fileSystems."/" =
|
||||
{
|
||||
device = "/dev/disk/by-uuid/6889aba9-61ed-4687-ab10-e5cf4017ac8d";
|
||||
fsType = "xfs";
|
||||
};
|
||||
|
||||
fileSystems."/boot" =
|
||||
{
|
||||
device = "/dev/disk/by-uuid/BC07-3B7A";
|
||||
fsType = "vfat";
|
||||
};
|
||||
|
||||
swapDevices =
|
||||
[{ device = "/dev/disk/by-uuid/64e5757b-6625-4dd2-aa2a-66ca93444d23"; }];
|
||||
|
||||
# Enables DHCP on each ethernet and wireless interface. In case of scripted networking
|
||||
# (the default) this is the recommended approach. When using systemd-networkd it's
|
||||
# still possible to use this option, but it's recommended to use it in conjunction
|
||||
# with explicit per-interface declarations with `networking.interfaces.<interface>.useDHCP`.
|
||||
# networking.interfaces.ens18.useDHCP = lib.mkDefault true;
|
||||
|
||||
nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
|
||||
}
|
||||
@@ -8,6 +8,9 @@
|
||||
../../system
|
||||
];
|
||||
|
||||
# Template host - exclude from DNS zone generation
|
||||
homelab.dns.enable = false;
|
||||
|
||||
|
||||
boot.loader.grub.enable = true;
|
||||
boot.loader.grub.device = "/dev/sda";
|
||||
|
||||
@@ -1,7 +1,9 @@
|
||||
{ pkgs, ... }:
|
||||
let
|
||||
prepare-host-script = pkgs.writeShellScriptBin "prepare-host.sh"
|
||||
''
|
||||
prepare-host-script = pkgs.writeShellApplication {
|
||||
name = "prepare-host.sh";
|
||||
runtimeInputs = [ pkgs.age ];
|
||||
text = ''
|
||||
echo "Removing machine-id"
|
||||
rm -f /etc/machine-id || true
|
||||
|
||||
@@ -24,8 +26,9 @@ let
|
||||
echo "Generate age key"
|
||||
rm -rf /var/lib/sops-nix || true
|
||||
mkdir -p /var/lib/sops-nix
|
||||
${pkgs.age}/bin/age-keygen -o /var/lib/sops-nix/key.txt
|
||||
age-keygen -o /var/lib/sops-nix/key.txt
|
||||
'';
|
||||
};
|
||||
in
|
||||
{
|
||||
environment.systemPackages = [ prepare-host-script ];
|
||||
|
||||
@@ -1,7 +1,9 @@
|
||||
{ pkgs, ... }:
|
||||
let
|
||||
prepare-host-script = pkgs.writeShellScriptBin "prepare-host.sh"
|
||||
''
|
||||
prepare-host-script = pkgs.writeShellApplication {
|
||||
name = "prepare-host.sh";
|
||||
runtimeInputs = [ pkgs.age ];
|
||||
text = ''
|
||||
echo "Removing machine-id"
|
||||
rm -f /etc/machine-id || true
|
||||
|
||||
@@ -24,8 +26,9 @@ let
|
||||
echo "Generate age key"
|
||||
rm -rf /var/lib/sops-nix || true
|
||||
mkdir -p /var/lib/sops-nix
|
||||
${pkgs.age}/bin/age-keygen -o /var/lib/sops-nix/key.txt
|
||||
age-keygen -o /var/lib/sops-nix/key.txt
|
||||
'';
|
||||
};
|
||||
in
|
||||
{
|
||||
environment.systemPackages = [ prepare-host-script ];
|
||||
|
||||
@@ -13,6 +13,9 @@
|
||||
../../common/vm
|
||||
];
|
||||
|
||||
# Test VM - exclude from DNS zone generation
|
||||
homelab.dns.enable = false;
|
||||
|
||||
nixpkgs.config.allowUnfree = true;
|
||||
boot.loader.grub.enable = true;
|
||||
boot.loader.grub.device = "/dev/vda";
|
||||
|
||||
@@ -14,6 +14,8 @@
|
||||
../../services/vault
|
||||
];
|
||||
|
||||
homelab.dns.cnames = [ "vault" ];
|
||||
|
||||
nixpkgs.config.allowUnfree = true;
|
||||
boot.loader.grub.enable = true;
|
||||
boot.loader.grub.device = "/dev/vda";
|
||||
|
||||
@@ -5,6 +5,32 @@
|
||||
...
|
||||
}:
|
||||
|
||||
let
|
||||
vault-test-script = pkgs.writeShellApplication {
|
||||
name = "vault-test";
|
||||
text = ''
|
||||
echo "=== Vault Secret Test ==="
|
||||
echo "Secret path: hosts/vaulttest01/test-service"
|
||||
|
||||
if [ -f /run/secrets/test-service/password ]; then
|
||||
echo "✓ Password file exists"
|
||||
echo "Password length: $(wc -c < /run/secrets/test-service/password)"
|
||||
else
|
||||
echo "✗ Password file missing!"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [ -d /var/lib/vault/cache/test-service ]; then
|
||||
echo "✓ Cache directory exists"
|
||||
else
|
||||
echo "✗ Cache directory missing!"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "Test successful!"
|
||||
'';
|
||||
};
|
||||
in
|
||||
{
|
||||
imports = [
|
||||
../template2/hardware-configuration.nix
|
||||
@@ -79,32 +105,23 @@
|
||||
Type = "oneshot";
|
||||
RemainAfterExit = true;
|
||||
|
||||
ExecStart = pkgs.writeShellScript "vault-test" ''
|
||||
echo "=== Vault Secret Test ==="
|
||||
echo "Secret path: hosts/vaulttest01/test-service"
|
||||
|
||||
if [ -f /run/secrets/test-service/password ]; then
|
||||
echo "✓ Password file exists"
|
||||
echo "Password length: $(wc -c < /run/secrets/test-service/password)"
|
||||
else
|
||||
echo "✗ Password file missing!"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [ -d /var/lib/vault/cache/test-service ]; then
|
||||
echo "✓ Cache directory exists"
|
||||
else
|
||||
echo "✗ Cache directory missing!"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "Test successful!"
|
||||
'';
|
||||
ExecStart = lib.getExe vault-test-script;
|
||||
|
||||
StandardOutput = "journal+console";
|
||||
};
|
||||
};
|
||||
|
||||
# Test ACME certificate issuance from OpenBao PKI
|
||||
# Override the global ACME server (from system/acme.nix) to use OpenBao instead of step-ca
|
||||
security.acme.defaults.server = lib.mkForce "https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory";
|
||||
|
||||
# Request a certificate for this host
|
||||
# Using HTTP-01 challenge with standalone listener on port 80
|
||||
security.acme.certs."vaulttest01.home.2rjus.net" = {
|
||||
listenHTTP = ":80";
|
||||
enableDebugLogs = true;
|
||||
};
|
||||
|
||||
system.stateVersion = "25.11"; # Did you read the comment?
|
||||
}
|
||||
|
||||
|
||||
@@ -6,10 +6,6 @@ import subprocess
|
||||
IGNORED_HOSTS = [
|
||||
"inc1",
|
||||
"inc2",
|
||||
"media1",
|
||||
"nixos-test1",
|
||||
"ns3",
|
||||
"ns4",
|
||||
"template1",
|
||||
]
|
||||
|
||||
|
||||
lib/dns-zone.nix (new file, 160 lines)
@@ -0,0 +1,160 @@
|
||||
{ lib }:
|
||||
let
|
||||
# Pad string on the right to reach a fixed width
|
||||
rightPad = width: str:
|
||||
let
|
||||
len = builtins.stringLength str;
|
||||
padding = if len >= width then "" else lib.strings.replicate (width - len) " ";
|
||||
in
|
||||
str + padding;
|
||||
|
||||
# Extract IP address from CIDR notation (e.g., "10.69.13.5/24" -> "10.69.13.5")
|
||||
extractIP = address:
|
||||
let
|
||||
parts = lib.splitString "/" address;
|
||||
in
|
||||
builtins.head parts;
|
||||
|
||||
# Check if a network interface name looks like a VPN/tunnel interface
|
||||
isVpnInterface = ifaceName:
|
||||
lib.hasPrefix "wg" ifaceName ||
|
||||
lib.hasPrefix "tun" ifaceName ||
|
||||
lib.hasPrefix "tap" ifaceName ||
|
||||
lib.hasPrefix "vti" ifaceName;
|
||||
|
||||
# Extract DNS information from a single host configuration
|
||||
# Returns null if host should not be included in DNS
|
||||
extractHostDNS = name: hostConfig:
|
||||
let
|
||||
cfg = hostConfig.config;
|
||||
# Handle cases where homelab module might not be imported
|
||||
dnsConfig = (cfg.homelab or { }).dns or { enable = true; cnames = [ ]; };
|
||||
hostname = cfg.networking.hostName;
|
||||
networks = cfg.systemd.network.networks or { };
|
||||
|
||||
# Filter out VPN interfaces and find networks with static addresses
|
||||
# Check matchConfig.Name instead of network unit name (which can have prefixes like "40-")
|
||||
physicalNetworks = lib.filterAttrs
|
||||
(netName: netCfg:
|
||||
let
|
||||
ifaceName = netCfg.matchConfig.Name or "";
|
||||
in
|
||||
!(isVpnInterface ifaceName) && (netCfg.address or [ ]) != [ ])
|
||||
networks;
|
||||
|
||||
# Get addresses from physical networks only
|
||||
networkAddresses = lib.flatten (
|
||||
lib.mapAttrsToList
|
||||
(netName: netCfg: netCfg.address or [ ])
|
||||
physicalNetworks
|
||||
);
|
||||
|
||||
# Get the first address, if any
|
||||
firstAddress = if networkAddresses != [ ] then builtins.head networkAddresses else null;
|
||||
|
||||
# Check if host uses DHCP (no static address)
|
||||
usesDHCP = firstAddress == null ||
|
||||
lib.any
|
||||
(netName: (networks.${netName}.networkConfig.DHCP or "no") != "no")
|
||||
(lib.attrNames networks);
|
||||
in
|
||||
if !(dnsConfig.enable or true) || firstAddress == null then
|
||||
null
|
||||
else
|
||||
{
|
||||
inherit hostname;
|
||||
ip = extractIP firstAddress;
|
||||
cnames = dnsConfig.cnames or [ ];
|
||||
};
|
||||
|
||||
# Generate A record line
|
||||
generateARecord = hostname: ip:
|
||||
"${rightPad 20 hostname}IN A ${ip}";
|
||||
|
||||
# Generate CNAME record line
|
||||
generateCNAME = alias: target:
|
||||
"${rightPad 20 alias}IN CNAME ${target}";
|
||||
|
||||
# Generate zone file from flake configurations and external hosts
|
||||
generateZone =
|
||||
{ self
|
||||
, externalHosts
|
||||
, serial
|
||||
, domain ? "home.2rjus.net"
|
||||
, ttl ? 1800
|
||||
, refresh ? 3600
|
||||
, retry ? 900
|
||||
, expire ? 1209600
|
||||
, minTtl ? 120
|
||||
, nameservers ? [ "ns1" "ns2" ]
|
||||
, adminEmail ? "admin.test.2rjus.net"
|
||||
}:
|
||||
let
|
||||
# Extract DNS info from all flake hosts
|
||||
nixosConfigs = self.nixosConfigurations or { };
|
||||
hostDNSList = lib.filter (x: x != null) (
|
||||
lib.mapAttrsToList extractHostDNS nixosConfigs
|
||||
);
|
||||
|
||||
# Sort hosts by IP for consistent output
|
||||
sortedHosts = lib.sort (a: b: a.ip < b.ip) hostDNSList;
|
||||
|
||||
# Generate A records for flake hosts
|
||||
flakeARecords = lib.concatMapStringsSep "\n" (host:
|
||||
generateARecord host.hostname host.ip
|
||||
) sortedHosts;
|
||||
|
||||
# Generate CNAMEs for flake hosts
|
||||
flakeCNAMEs = lib.concatMapStringsSep "\n" (host:
|
||||
lib.concatMapStringsSep "\n" (cname:
|
||||
generateCNAME cname host.hostname
|
||||
) host.cnames
|
||||
) (lib.filter (h: h.cnames != [ ]) sortedHosts);
|
||||
|
||||
# Generate A records for external hosts
|
||||
externalARecords = lib.concatStringsSep "\n" (
|
||||
lib.mapAttrsToList (name: ip:
|
||||
generateARecord name ip
|
||||
) (externalHosts.aRecords or { })
|
||||
);
|
||||
|
||||
# Generate CNAMEs for external hosts
|
||||
externalCNAMEs = lib.concatStringsSep "\n" (
|
||||
lib.mapAttrsToList (alias: target:
|
||||
generateCNAME alias target
|
||||
) (externalHosts.cnames or { })
|
||||
);
|
||||
|
||||
# NS records
|
||||
nsRecords = lib.concatMapStringsSep "\n" (ns:
|
||||
" IN NS ${ns}.${domain}."
|
||||
) nameservers;
|
||||
|
||||
# SOA record
|
||||
soa = ''
|
||||
$ORIGIN ${domain}.
|
||||
$TTL ${toString ttl}
|
||||
@ IN SOA ns1.${domain}. ${adminEmail}. (
|
||||
${toString serial} ; serial number
|
||||
${toString refresh} ; refresh
|
||||
${toString retry} ; retry
|
||||
${toString expire} ; expire
|
||||
${toString minTtl} ; ttl
|
||||
)'';
|
||||
in
|
||||
lib.concatStringsSep "\n\n" (lib.filter (s: s != "") [
|
||||
soa
|
||||
nsRecords
|
||||
"; Flake-managed hosts (auto-generated)"
|
||||
flakeARecords
|
||||
(if flakeCNAMEs != "" then "; Flake-managed CNAMEs\n${flakeCNAMEs}" else "")
|
||||
"; External hosts (not managed by this flake)"
|
||||
externalARecords
|
||||
(if externalCNAMEs != "" then "; External CNAMEs\n${externalCNAMEs}" else "")
|
||||
""
|
||||
]);
|
||||
|
||||
in
|
||||
{
|
||||
inherit extractIP extractHostDNS generateARecord generateCNAME generateZone;
|
||||
}
|
||||
lib/monitoring.nix (new file, 145 lines)
@@ -0,0 +1,145 @@
|
||||
{ lib }:
|
||||
let
|
||||
# Extract IP address from CIDR notation (e.g., "10.69.13.5/24" -> "10.69.13.5")
|
||||
extractIP = address:
|
||||
let
|
||||
parts = lib.splitString "/" address;
|
||||
in
|
||||
builtins.head parts;
|
||||
|
||||
# Check if a network interface name looks like a VPN/tunnel interface
|
||||
isVpnInterface = ifaceName:
|
||||
lib.hasPrefix "wg" ifaceName ||
|
||||
lib.hasPrefix "tun" ifaceName ||
|
||||
lib.hasPrefix "tap" ifaceName ||
|
||||
lib.hasPrefix "vti" ifaceName;
|
||||
|
||||
# Extract monitoring info from a single host configuration
|
||||
# Returns null if host should not be included
|
||||
extractHostMonitoring = name: hostConfig:
|
||||
let
|
||||
cfg = hostConfig.config;
|
||||
monConfig = (cfg.homelab or { }).monitoring or { enable = true; scrapeTargets = [ ]; };
|
||||
dnsConfig = (cfg.homelab or { }).dns or { enable = true; };
|
||||
hostname = cfg.networking.hostName;
|
||||
networks = cfg.systemd.network.networks or { };
|
||||
|
||||
# Filter out VPN interfaces and find networks with static addresses
|
||||
physicalNetworks = lib.filterAttrs
|
||||
(netName: netCfg:
|
||||
let
|
||||
ifaceName = netCfg.matchConfig.Name or "";
|
||||
in
|
||||
!(isVpnInterface ifaceName) && (netCfg.address or [ ]) != [ ])
|
||||
networks;
|
||||
|
||||
# Get addresses from physical networks only
|
||||
networkAddresses = lib.flatten (
|
||||
lib.mapAttrsToList
|
||||
(netName: netCfg: netCfg.address or [ ])
|
||||
physicalNetworks
|
||||
);
|
||||
|
||||
firstAddress = if networkAddresses != [ ] then builtins.head networkAddresses else null;
|
||||
in
|
||||
if !(monConfig.enable or true) || !(dnsConfig.enable or true) || firstAddress == null then
|
||||
null
|
||||
else
|
||||
{
|
||||
inherit hostname;
|
||||
ip = extractIP firstAddress;
|
||||
scrapeTargets = monConfig.scrapeTargets or [ ];
|
||||
};
|
||||
|
||||
# Generate node-exporter targets from all flake hosts
|
||||
generateNodeExporterTargets = self: externalTargets:
|
||||
let
|
||||
nixosConfigs = self.nixosConfigurations or { };
|
||||
hostList = lib.filter (x: x != null) (
|
||||
lib.mapAttrsToList extractHostMonitoring nixosConfigs
|
||||
);
|
||||
flakeTargets = map (host: "${host.hostname}.home.2rjus.net:9100") hostList;
|
||||
in
|
||||
flakeTargets ++ (externalTargets.nodeExporter or [ ]);
|
||||
|
||||
# Generate scrape configs from all flake hosts and external targets
|
||||
generateScrapeConfigs = self: externalTargets:
|
||||
let
|
||||
nixosConfigs = self.nixosConfigurations or { };
|
||||
hostList = lib.filter (x: x != null) (
|
||||
lib.mapAttrsToList extractHostMonitoring nixosConfigs
|
||||
);
|
||||
|
||||
# Collect all scrapeTargets from all hosts, grouped by job_name
|
||||
allTargets = lib.flatten (map
|
||||
(host:
|
||||
map
|
||||
(target: {
|
||||
inherit (target) job_name port metrics_path scheme scrape_interval honor_labels;
|
||||
hostname = host.hostname;
|
||||
})
|
||||
host.scrapeTargets
|
||||
)
|
||||
hostList
|
||||
);
|
||||
|
||||
# Group targets by job_name
|
||||
grouped = lib.groupBy (t: t.job_name) allTargets;
|
||||
|
||||
# Generate a scrape config for each job
|
||||
flakeScrapeConfigs = lib.mapAttrsToList
|
||||
(jobName: targets:
|
||||
let
|
||||
first = builtins.head targets;
|
||||
targetAddrs = map
|
||||
(t:
|
||||
let
|
||||
portStr = toString t.port;
|
||||
in
|
||||
"${t.hostname}.home.2rjus.net:${portStr}")
|
||||
targets;
|
||||
config = {
|
||||
job_name = jobName;
|
||||
static_configs = [{
|
||||
targets = targetAddrs;
|
||||
}];
|
||||
}
|
||||
// (lib.optionalAttrs (first.metrics_path != "/metrics") {
|
||||
metrics_path = first.metrics_path;
|
||||
})
|
||||
// (lib.optionalAttrs (first.scheme != "http") {
|
||||
scheme = first.scheme;
|
||||
})
|
||||
// (lib.optionalAttrs (first.scrape_interval != null) {
|
||||
scrape_interval = first.scrape_interval;
|
||||
})
|
||||
// (lib.optionalAttrs first.honor_labels {
|
||||
honor_labels = true;
|
||||
});
|
||||
in
|
||||
config
|
||||
)
|
||||
grouped;
|
||||
|
||||
# External scrape configs
|
||||
externalScrapeConfigs = map
|
||||
(ext: {
|
||||
job_name = ext.job_name;
|
||||
static_configs = [{
|
||||
targets = ext.targets;
|
||||
}];
|
||||
} // (lib.optionalAttrs (ext ? metrics_path) {
|
||||
metrics_path = ext.metrics_path;
|
||||
}) // (lib.optionalAttrs (ext ? scheme) {
|
||||
scheme = ext.scheme;
|
||||
}) // (lib.optionalAttrs (ext ? scrape_interval) {
|
||||
scrape_interval = ext.scrape_interval;
|
||||
}))
|
||||
(externalTargets.scrapeConfigs or [ ]);
|
||||
in
|
||||
flakeScrapeConfigs ++ externalScrapeConfigs;
|
||||
|
||||
in
|
||||
{
|
||||
inherit extractHostMonitoring generateNodeExporterTargets generateScrapeConfigs;
|
||||
}
|
||||
modules/homelab/default.nix (new file, 7 lines)
@@ -0,0 +1,7 @@
|
||||
{ ... }:
|
||||
{
|
||||
imports = [
|
||||
./dns.nix
|
||||
./monitoring.nix
|
||||
];
|
||||
}
|
||||
modules/homelab/dns.nix (new file, 20 lines)
@@ -0,0 +1,20 @@
|
||||
{ config, lib, ... }:
|
||||
let
|
||||
cfg = config.homelab.dns;
|
||||
in
|
||||
{
|
||||
options.homelab.dns = {
|
||||
enable = lib.mkOption {
|
||||
type = lib.types.bool;
|
||||
default = true;
|
||||
description = "Include this host in DNS zone generation";
|
||||
};
|
||||
|
||||
cnames = lib.mkOption {
|
||||
type = lib.types.listOf lib.types.str;
|
||||
default = [ ];
|
||||
description = "CNAME records pointing to this host";
|
||||
example = [ "web" "api" ];
|
||||
};
|
||||
};
|
||||
}
|
||||
modules/homelab/monitoring.nix (new file, 50 lines)
@@ -0,0 +1,50 @@
|
||||
{ config, lib, ... }:
|
||||
let
|
||||
cfg = config.homelab.monitoring;
|
||||
in
|
||||
{
|
||||
options.homelab.monitoring = {
|
||||
enable = lib.mkOption {
|
||||
type = lib.types.bool;
|
||||
default = true;
|
||||
description = "Include this host in Prometheus node-exporter scrape targets";
|
||||
};
|
||||
|
||||
scrapeTargets = lib.mkOption {
|
||||
type = lib.types.listOf (lib.types.submodule {
|
||||
options = {
|
||||
job_name = lib.mkOption {
|
||||
type = lib.types.str;
|
||||
description = "Prometheus scrape job name";
|
||||
};
|
||||
port = lib.mkOption {
|
||||
type = lib.types.port;
|
||||
description = "Port to scrape metrics from";
|
||||
};
|
||||
metrics_path = lib.mkOption {
|
||||
type = lib.types.str;
|
||||
default = "/metrics";
|
||||
description = "HTTP path to scrape metrics from";
|
||||
};
|
||||
scheme = lib.mkOption {
|
||||
type = lib.types.str;
|
||||
default = "http";
|
||||
description = "HTTP scheme (http or https)";
|
||||
};
|
||||
scrape_interval = lib.mkOption {
|
||||
type = lib.types.nullOr lib.types.str;
|
||||
default = null;
|
||||
description = "Override the global scrape interval for this target";
|
||||
};
|
||||
honor_labels = lib.mkOption {
|
||||
type = lib.types.bool;
|
||||
default = false;
|
||||
description = "Whether to honor labels from the scraped target";
|
||||
};
|
||||
};
|
||||
});
|
||||
default = [ ];
|
||||
description = "Additional Prometheus scrape targets exposed by this host";
|
||||
};
|
||||
};
|
||||
}
|
||||
playbooks/provision-approle.yml (new file, 78 lines)
@@ -0,0 +1,78 @@
|
||||
---
|
||||
# Provision OpenBao AppRole credentials to an existing host
|
||||
# Usage: nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=ha1
|
||||
# Requires: BAO_ADDR and BAO_TOKEN environment variables set
|
||||
|
||||
- name: Fetch AppRole credentials from OpenBao
|
||||
hosts: localhost
|
||||
connection: local
|
||||
gather_facts: false
|
||||
|
||||
vars:
|
||||
vault_addr: "{{ lookup('env', 'BAO_ADDR') | default('https://vault01.home.2rjus.net:8200', true) }}"
|
||||
domain: "home.2rjus.net"
|
||||
|
||||
tasks:
|
||||
- name: Validate hostname is provided
|
||||
ansible.builtin.fail:
|
||||
msg: "hostname variable is required. Use: -e hostname=<name>"
|
||||
when: hostname is not defined
|
||||
|
||||
- name: Get role-id for host
|
||||
ansible.builtin.command:
|
||||
cmd: "bao read -field=role_id auth/approle/role/{{ hostname }}/role-id"
|
||||
environment:
|
||||
BAO_ADDR: "{{ vault_addr }}"
|
||||
BAO_SKIP_VERIFY: "1"
|
||||
register: role_id_result
|
||||
changed_when: false
|
||||
|
||||
- name: Generate secret-id for host
|
||||
ansible.builtin.command:
|
||||
cmd: "bao write -field=secret_id -f auth/approle/role/{{ hostname }}/secret-id"
|
||||
environment:
|
||||
BAO_ADDR: "{{ vault_addr }}"
|
||||
BAO_SKIP_VERIFY: "1"
|
||||
register: secret_id_result
|
||||
changed_when: true
|
||||
|
||||
- name: Add target host to inventory
|
||||
ansible.builtin.add_host:
|
||||
name: "{{ hostname }}.{{ domain }}"
|
||||
groups: vault_target
|
||||
ansible_user: root
|
||||
vault_role_id: "{{ role_id_result.stdout }}"
|
||||
vault_secret_id: "{{ secret_id_result.stdout }}"
|
||||
|
||||
- name: Deploy AppRole credentials to host
|
||||
hosts: vault_target
|
||||
gather_facts: false
|
||||
|
||||
tasks:
|
||||
- name: Create AppRole directory
|
||||
ansible.builtin.file:
|
||||
path: /var/lib/vault/approle
|
||||
state: directory
|
||||
mode: "0700"
|
||||
owner: root
|
||||
group: root
|
||||
|
||||
- name: Write role-id
|
||||
ansible.builtin.copy:
|
||||
content: "{{ vault_role_id }}"
|
||||
dest: /var/lib/vault/approle/role-id
|
||||
mode: "0600"
|
||||
owner: root
|
||||
group: root
|
||||
|
||||
- name: Write secret-id
|
||||
ansible.builtin.copy:
|
||||
content: "{{ vault_secret_id }}"
|
||||
dest: /var/lib/vault/approle/secret-id
|
||||
mode: "0600"
|
||||
owner: root
|
||||
group: root
|
||||
|
||||
- name: Display success
|
||||
ansible.builtin.debug:
|
||||
msg: "AppRole credentials provisioned to {{ inventory_hostname }}"
|
||||
@@ -1,5 +1,6 @@
|
||||
"""CLI tool for generating NixOS host configurations."""
|
||||
|
||||
import shutil
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
@@ -10,7 +11,15 @@ from rich.panel import Panel
|
||||
from rich.table import Table
|
||||
|
||||
from generators import generate_host_files, generate_vault_terraform
|
||||
from manipulators import update_flake_nix, update_terraform_vms, add_wrapped_token_to_vm
|
||||
from manipulators import (
|
||||
update_flake_nix,
|
||||
update_terraform_vms,
|
||||
add_wrapped_token_to_vm,
|
||||
remove_from_flake_nix,
|
||||
remove_from_terraform_vms,
|
||||
remove_from_vault_terraform,
|
||||
check_entries_exist,
|
||||
)
|
||||
from models import HostConfig
|
||||
from vault_helper import generate_wrapped_token
|
||||
from validators import (
|
||||
@@ -46,9 +55,10 @@ def main(
|
||||
memory: int = typer.Option(2048, "--memory", help="Memory in MB"),
|
||||
disk: str = typer.Option("20G", "--disk", help="Disk size (e.g., 20G, 50G, 100G)"),
|
||||
dry_run: bool = typer.Option(False, "--dry-run", help="Preview changes without creating files"),
|
||||
force: bool = typer.Option(False, "--force", help="Overwrite existing host configuration"),
|
||||
force: bool = typer.Option(False, "--force", help="Overwrite existing host configuration / skip confirmation for removal"),
|
||||
skip_vault: bool = typer.Option(False, "--skip-vault", help="Skip Vault configuration and token generation"),
|
||||
regenerate_token: bool = typer.Option(False, "--regenerate-token", help="Only regenerate Vault wrapped token (no other changes)"),
|
||||
remove: bool = typer.Option(False, "--remove", help="Remove host configuration and terraform entries"),
|
||||
) -> None:
|
||||
"""
|
||||
Create a new NixOS host configuration.
|
||||
@@ -64,6 +74,11 @@ def main(
|
||||
# Get repository root
|
||||
repo_root = get_repo_root()
|
||||
|
||||
# Handle removal mode
|
||||
if remove:
|
||||
handle_remove(hostname, repo_root, dry_run, force, ip, cpu, memory, disk, skip_vault, regenerate_token)
|
||||
return
|
||||
|
||||
# Handle token regeneration mode
|
||||
if regenerate_token:
|
||||
# Validate that incompatible options aren't used
|
||||
@@ -198,6 +213,166 @@ def main(
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
def handle_remove(
|
||||
hostname: str,
|
||||
repo_root: Path,
|
||||
dry_run: bool,
|
||||
force: bool,
|
||||
ip: Optional[str],
|
||||
cpu: int,
|
||||
memory: int,
|
||||
disk: str,
|
||||
skip_vault: bool,
|
||||
regenerate_token: bool,
|
||||
) -> None:
|
||||
"""Handle the --remove workflow."""
|
||||
# Validate --remove isn't used with create options
|
||||
incompatible_options = []
|
||||
if ip:
|
||||
incompatible_options.append("--ip")
|
||||
if cpu != 2:
|
||||
incompatible_options.append("--cpu")
|
||||
if memory != 2048:
|
||||
incompatible_options.append("--memory")
|
||||
if disk != "20G":
|
||||
incompatible_options.append("--disk")
|
||||
if skip_vault:
|
||||
incompatible_options.append("--skip-vault")
|
||||
if regenerate_token:
|
||||
incompatible_options.append("--regenerate-token")
|
||||
|
||||
if incompatible_options:
|
||||
console.print(
|
||||
f"[bold red]Error:[/bold red] --remove cannot be used with: {', '.join(incompatible_options)}\n"
|
||||
)
|
||||
sys.exit(1)
|
||||
|
||||
# Validate hostname exists (host directory must exist)
|
||||
host_dir = repo_root / "hosts" / hostname
|
||||
if not host_dir.exists():
|
||||
console.print(f"[bold red]Error:[/bold red] Host {hostname} does not exist")
|
||||
console.print(f"Host directory not found: {host_dir}")
|
||||
sys.exit(1)
|
||||
|
||||
# Check what entries exist
|
||||
flake_exists, terraform_exists, vault_exists = check_entries_exist(hostname, repo_root)
|
||||
|
||||
# Collect all files in the host directory recursively
|
||||
files_in_host_dir = sorted([f for f in host_dir.rglob("*") if f.is_file()])
|
||||
|
||||
# Check for secrets directory
|
||||
secrets_dir = repo_root / "secrets" / hostname
|
||||
secrets_exist = secrets_dir.exists()
|
||||
|
||||
# Display summary
|
||||
if dry_run:
|
||||
console.print("\n[yellow][DRY RUN - No changes will be made][/yellow]\n")
|
||||
|
||||
console.print(f"\n[bold blue]Removing host: {hostname}[/bold blue]\n")
|
||||
|
||||
# Show host directory contents
|
||||
console.print("[bold]Directory to be deleted (and all contents):[/bold]")
|
||||
console.print(f" • hosts/{hostname}/")
|
||||
for f in files_in_host_dir:
|
||||
rel_path = f.relative_to(host_dir)
|
||||
console.print(f" - {rel_path}")
|
||||
|
||||
# Show entries to be removed
|
||||
console.print("\n[bold]Entries to be removed:[/bold]")
|
||||
if flake_exists:
|
||||
console.print(f" • flake.nix (nixosConfigurations.{hostname})")
|
||||
else:
|
||||
console.print(f" • flake.nix [dim](not found)[/dim]")
|
||||
|
||||
if terraform_exists:
|
||||
console.print(f' • terraform/vms.tf (locals.vms["{hostname}"])')
|
||||
else:
|
||||
console.print(f" • terraform/vms.tf [dim](not found)[/dim]")
|
||||
|
||||
if vault_exists:
|
||||
console.print(f' • terraform/vault/hosts-generated.tf (generated_host_policies["{hostname}"])')
|
||||
else:
|
||||
console.print(f" • terraform/vault/hosts-generated.tf [dim](not found)[/dim]")
|
||||
|
||||
# Warn about secrets directory
|
||||
if secrets_exist:
|
||||
console.print(f"\n[yellow]⚠️ Warning: secrets/{hostname}/ directory exists and will NOT be deleted[/yellow]")
|
||||
console.print(f" Manually remove if no longer needed: [white]rm -rf secrets/{hostname}/[/white]")
|
||||
console.print(f" Also update .sops.yaml to remove the host's age key")
|
||||
|
||||
# Exit if dry run
|
||||
if dry_run:
|
||||
console.print("\n[yellow][DRY RUN - No changes made][/yellow]\n")
|
||||
return
|
||||
|
||||
# Prompt for confirmation unless --force
|
||||
if not force:
|
||||
console.print("")
|
||||
confirm = typer.confirm("Proceed with removal?", default=False)
|
||||
if not confirm:
|
||||
console.print("\n[yellow]Removal cancelled[/yellow]\n")
|
||||
sys.exit(0)
|
||||
|
||||
# Perform removal
|
||||
console.print("\n[bold blue]Removing host configuration...[/bold blue]")
|
||||
|
||||
# Remove from terraform/vault/hosts-generated.tf
|
||||
if vault_exists:
|
||||
if remove_from_vault_terraform(hostname, repo_root):
|
||||
console.print("[green]✓[/green] Removed from terraform/vault/hosts-generated.tf")
|
||||
else:
|
||||
console.print("[yellow]⚠[/yellow] Could not remove from terraform/vault/hosts-generated.tf")
|
||||
|
||||
# Remove from terraform/vms.tf
|
||||
if terraform_exists:
|
||||
if remove_from_terraform_vms(hostname, repo_root):
|
||||
console.print("[green]✓[/green] Removed from terraform/vms.tf")
|
||||
else:
|
||||
console.print("[yellow]⚠[/yellow] Could not remove from terraform/vms.tf")
|
||||
|
||||
# Remove from flake.nix
|
||||
if flake_exists:
|
||||
if remove_from_flake_nix(hostname, repo_root):
|
||||
console.print("[green]✓[/green] Removed from flake.nix")
|
||||
else:
|
||||
console.print("[yellow]⚠[/yellow] Could not remove from flake.nix")
|
||||
|
||||
# Delete hosts/<hostname>/ directory
|
||||
shutil.rmtree(host_dir)
|
||||
console.print(f"[green]✓[/green] Deleted hosts/{hostname}/")
|
||||
|
||||
# Success message
|
||||
console.print(f"\n[bold green]✓ Host {hostname} removed successfully![/bold green]\n")
|
||||
|
||||
# Display next steps
|
||||
display_removal_next_steps(hostname, vault_exists)
|
||||
|
||||
|
||||
def display_removal_next_steps(hostname: str, had_vault: bool) -> None:
|
||||
"""Display next steps after successful removal."""
|
||||
vault_file = " terraform/vault/hosts-generated.tf" if had_vault else ""
|
||||
vault_apply = ""
|
||||
if had_vault:
|
||||
vault_apply = f"""
|
||||
3. Apply Vault changes:
|
||||
[white]cd terraform/vault && tofu apply[/white]
|
||||
"""
|
||||
|
||||
next_steps = f"""[bold cyan]Next Steps:[/bold cyan]
|
||||
|
||||
1. Review changes:
|
||||
[white]git diff[/white]
|
||||
|
||||
2. If VM exists in Proxmox, destroy it first:
|
||||
[white]cd terraform && tofu destroy -target='proxmox_vm_qemu.vm["{hostname}"]'[/white]
|
||||
{vault_apply}
|
||||
4. Commit changes:
|
||||
[white]git add -u hosts/{hostname} flake.nix terraform/vms.tf{vault_file}
|
||||
git commit -m "hosts: remove {hostname}"[/white]
|
||||
"""
|
||||
console.print(Panel(next_steps, border_style="cyan"))
|
||||
|
||||
|
||||
def display_config_summary(config: HostConfig) -> None:
|
||||
"""Display configuration summary table."""
|
||||
table = Table(title="Host Configuration", show_header=False)
|
||||
|
||||
@@ -2,10 +2,138 @@
|
||||
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import Tuple
|
||||
|
||||
from models import HostConfig
|
||||
|
||||
|
||||
def remove_from_flake_nix(hostname: str, repo_root: Path) -> bool:
|
||||
"""
|
||||
Remove host entry from flake.nix nixosConfigurations.
|
||||
|
||||
Args:
|
||||
hostname: Hostname to remove
|
||||
repo_root: Path to repository root
|
||||
|
||||
Returns:
|
||||
True if found and removed, False if not found
|
||||
"""
|
||||
flake_path = repo_root / "flake.nix"
|
||||
content = flake_path.read_text()
|
||||
|
||||
# Check if hostname exists
|
||||
hostname_pattern = rf"^ {re.escape(hostname)} = nixpkgs\.lib\.nixosSystem"
|
||||
if not re.search(hostname_pattern, content, re.MULTILINE):
|
||||
return False
|
||||
|
||||
# Match the entire block from "hostname = " to "};"
|
||||
replace_pattern = rf"^ {re.escape(hostname)} = nixpkgs\.lib\.nixosSystem \{{.*?^ \}};\n"
|
||||
new_content, count = re.subn(replace_pattern, "", content, flags=re.MULTILINE | re.DOTALL)
|
||||
|
||||
if count == 0:
|
||||
return False
|
||||
|
||||
flake_path.write_text(new_content)
|
||||
return True
|
||||
|
||||
|
||||
def remove_from_terraform_vms(hostname: str, repo_root: Path) -> bool:
|
||||
"""
|
||||
Remove VM entry from terraform/vms.tf locals.vms map.
|
||||
|
||||
Args:
|
||||
hostname: Hostname to remove
|
||||
repo_root: Path to repository root
|
||||
|
||||
Returns:
|
||||
True if found and removed, False if not found
|
||||
"""
|
||||
terraform_path = repo_root / "terraform" / "vms.tf"
|
||||
content = terraform_path.read_text()
|
||||
|
||||
# Check if hostname exists
|
||||
hostname_pattern = rf'^\s+"{re.escape(hostname)}" = \{{'
|
||||
if not re.search(hostname_pattern, content, re.MULTILINE):
|
||||
return False
|
||||
|
||||
# Match the entire block from "hostname" = { to }
|
||||
replace_pattern = rf'^\s+"{re.escape(hostname)}" = \{{.*?^\s+\}}\n'
|
||||
new_content, count = re.subn(replace_pattern, "", content, flags=re.MULTILINE | re.DOTALL)
|
||||
|
||||
if count == 0:
|
||||
return False
|
||||
|
||||
terraform_path.write_text(new_content)
|
||||
return True
|
||||
|
||||
|
||||
def remove_from_vault_terraform(hostname: str, repo_root: Path) -> bool:
|
||||
"""
|
||||
Remove host policy from terraform/vault/hosts-generated.tf.
|
||||
|
||||
Args:
|
||||
hostname: Hostname to remove
|
||||
repo_root: Path to repository root
|
||||
|
||||
Returns:
|
||||
True if found and removed, False if not found
|
||||
"""
|
||||
vault_tf_path = repo_root / "terraform" / "vault" / "hosts-generated.tf"
|
||||
|
||||
if not vault_tf_path.exists():
|
||||
return False
|
||||
|
||||
content = vault_tf_path.read_text()
|
||||
|
||||
# Check if hostname exists in the policies
|
||||
if f'"{hostname}"' not in content:
|
||||
return False
|
||||
|
||||
# Match the host entry block within generated_host_policies
|
||||
# Pattern matches: "hostname" = { ... } with possible trailing newlines
|
||||
replace_pattern = rf'\s*"{re.escape(hostname)}" = \{{\s*paths = \[.*?\]\s*\}}\n?'
|
||||
new_content, count = re.subn(replace_pattern, "", content, flags=re.DOTALL)
|
||||
|
||||
if count == 0:
|
||||
return False
|
||||
|
||||
vault_tf_path.write_text(new_content)
|
||||
return True
|
||||
|
||||
|
||||
def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, bool]:
|
||||
"""
|
||||
Check which entries exist for a hostname.
|
||||
|
||||
Args:
|
||||
hostname: Hostname to check
|
||||
repo_root: Path to repository root
|
||||
|
||||
Returns:
|
||||
Tuple of (flake_exists, terraform_vms_exists, vault_exists)
|
||||
"""
|
||||
# Check flake.nix
|
||||
flake_path = repo_root / "flake.nix"
|
||||
flake_content = flake_path.read_text()
|
||||
flake_pattern = rf"^ {re.escape(hostname)} = nixpkgs\.lib\.nixosSystem"
|
||||
flake_exists = bool(re.search(flake_pattern, flake_content, re.MULTILINE))
|
||||
|
||||
# Check terraform/vms.tf
|
||||
terraform_path = repo_root / "terraform" / "vms.tf"
|
||||
terraform_content = terraform_path.read_text()
|
||||
terraform_pattern = rf'^\s+"{re.escape(hostname)}" = \{{'
|
||||
terraform_exists = bool(re.search(terraform_pattern, terraform_content, re.MULTILINE))
|
||||
|
||||
# Check terraform/vault/hosts-generated.tf
|
||||
vault_tf_path = repo_root / "terraform" / "vault" / "hosts-generated.tf"
|
||||
vault_exists = False
|
||||
if vault_tf_path.exists():
|
||||
vault_content = vault_tf_path.read_text()
|
||||
vault_exists = f'"{hostname}"' in vault_content
|
||||
|
||||
return (flake_exists, terraform_exists, vault_exists)
|
||||
|
||||
|
||||
def update_flake_nix(config: HostConfig, repo_root: Path, force: bool = False) -> None:
|
||||
"""
|
||||
Add or update host entry in flake.nix nixosConfigurations.
|
||||
|
||||
@@ -137,9 +137,9 @@ fetch_from_vault() {
|
||||
|
||||
# Write each secret key to a separate file
|
||||
log "Writing secrets to $OUTPUT_DIR"
|
||||
echo "$SECRET_DATA" | jq -r 'to_entries[] | "\(.key)\n\(.value)"' | while read -r key; read -r value; do
|
||||
echo -n "$value" > "$OUTPUT_DIR/$key"
|
||||
echo -n "$value" > "$CACHE_DIR/$key"
|
||||
for key in $(echo "$SECRET_DATA" | jq -r 'keys[]'); do
|
||||
echo "$SECRET_DATA" | jq -j --arg k "$key" '.[$k]' > "$OUTPUT_DIR/$key"
|
||||
echo "$SECRET_DATA" | jq -j --arg k "$key" '.[$k]' > "$CACHE_DIR/$key"
|
||||
chmod 600 "$OUTPUT_DIR/$key"
|
||||
chmod 600 "$CACHE_DIR/$key"
|
||||
log " - Wrote secret key: $key"
|
||||
|
||||
@@ -1,29 +0,0 @@
|
||||
authelia_ldap_password: ENC[AES256_GCM,data:x2UDMpqQKoRVSlDSmK5XiC9x4/WWzmjk7cwtFA70waAD7xYQfXEOV+AeX1LlFfj0qHYrhyn//TLsa+tJzb7HPEAfl8vYR4MdkVFOm5vjPWWoF5Ul8ZVn8+B1VJLbiXkexv0/hfXL8NMzEcp/pF4H0Yei7xaKezu9OPtGzKufHws=,iv:88RXaOj8Zy9fGeDLAE0ItY7TKCCzxn6F0+kU5+Zy/XU=,tag:yPdCJ9d139iO6J97thVVgA==,type:str]
|
||||
authelia_jwt_secret: ENC[AES256_GCM,data:9ZHkT2o5KZLmml95g8HZce8fNBmaWtRn+175Gaz0KhsndNl3zdgGq3hydRuoZuEgLVsherJImVmb5DQAZpv04lUEsDKCYeFNwAyYl4Go2jCp1fI53fdcRCKlNVZA37pMi4AYaCoe8vIl/cwPOOBDEwK5raOBnklCzVERoO0B8a0=,iv:9CTWCw0ImZR0OSrl2znbhpRHlzAxA5Cpcy98JeH9Z+Y=,tag:L+0xKqiwXTi7XiDYWA1Bcw==,type:str]
|
||||
authelia_storage_encryption_key_file: ENC[AES256_GCM,data:RfbcQK8+rrW/Krd2rbDfgo7YI2YvQKqpLuDtk5DZJNNhw4giBh5nFp/8LNeo8r39/oiJLYTe6FjTLBu72TZz2wWrJFsBqjwQ/3TfATQGdLUsaXXRDr88ezHLTiYvEHIHJhUS5qsr7VMwBam5e7YGWBe5sGZCE/nX41ijyPUjtOY=,iv:sayYcAC38cApAtL+cDhgGNjWaHn+furKRowKL6AmfdU=,tag:1IZpnlpvDWGLLpZyU9iJUw==,type:str]
|
||||
authelia_session_secret: ENC[AES256_GCM,data:4PaLv4RRA7/9Z8QzETXLwo3OctJ0mvzQkYmHsGGF97nq9QeB3eo0xj4FyuCbkJGGZ/huAyRgmFBTyscY3wgxoc4t+8BdlYcSbefEk1/xRFjmG8ooXLKhvGJ5c6t72KJRcqsEGTiC0l9CFJWQ2qYcjM4dPwG8z0tjUZ6j25Zfx4M=,iv:QORJkf0w6iyuRHM/xuql1s7K75Qa49ygq+lwHfrm9rk=,tag:/HZ/qI80fKjmuTRwIwmX8g==,type:str]
|
||||
lldap_user_pass: ENC[AES256_GCM,data:56gF7uqVQ+/J5/lY/N904Q==,iv:qtY1XhHs4WWA4kPY56NigPvX4OslO0koZepgdv947zg=,tag:UDmJs8FPXskp7rUS2Sxinw==,type:str]
|
||||
sops:
|
||||
age:
|
||||
- recipient: age1lznyk4ee7e7x8n92cq2n87kz9920473ks5u9jlhd3dczfzq4wamqept56u
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBlc1dxK3FKU2ZGWTNGUmxZ
|
||||
aWx1NngySjVHclJTd3hXejJRTmVHRExReHcwCk55c0xMbGcyTktySkJZdHRZbzhK
|
||||
bEI3RzBHQkROTU1qWXBoU1RqTXppdVkKLS0tIHkwZ0QyNTMydWRqUlBtTEdhZ05r
|
||||
YVpuT1JadnlyN1hqNnJxYzVPT3pXN1UKDCeIv0xv+5pcoDdtYc+rYjwi8SLrqWth
|
||||
vdWepxmV2edajZRqcwFEC9weOZ1j2lh7Z3hR6RSN/+X3sFpqkpw+Yg==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
- recipient: age16prza00sqzuhwwcyakj6z4hvwkruwkqpmmrsn94a5ucgpkelncdq2ldctk
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSAvbU0wNmFLelRmNmJTRlho
|
||||
dTEwVXZqUVI5NHZkb1QyNUZ4R0pLVFZWVDM4CkhVc00zY2FKaVdNRXdGVk1ranpG
|
||||
MlRWWGJmd2FWeFE1dXU4WHVFL0FHZ3MKLS0tIGt2ZWlaOW5wNkJnQVkrTDZWTnY0
|
||||
RW5HRjA3cERCUU1CVWZhck12SGhTRUkK6k/zQ87TIETYouRBby7ujtwgpqIPKKv+
|
||||
2aLJW6lSWMVzL/f3ZrIeg12tJjHs3f44EXR6j3tfLfSKog2iL8Y57w==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
lastmodified: "2025-12-06T10:03:56Z"
|
||||
mac: ENC[AES256_GCM,data:SRNqx5n+xg/cNGiyze3CGKufox3IuXmOKLqNRDeJhBNMBHC1iYYCjRdHEVXsl7XSiYe51dSwjV0KrJa/SG1pRVkuyT+xyPrTjT2/DyXN7A/CESSAkBIwI7lkZmIf8DkxB3CELF1PgjIr1o2isxlBnkAnhEBTxQ7t8AzpcH7I5yU=,iv:P3FGQurZrL0ed5UuBPRFk11T0VRFtL6xI4iQ4LmYTec=,tag:8gQL08ojjIMyCl5E0Qs/Ww==,type:str]
|
||||
unencrypted_suffix: _unencrypted
|
||||
version: 3.11.0
|
||||
@@ -7,146 +7,101 @@ sops:
|
||||
- recipient: age1lznyk4ee7e7x8n92cq2n87kz9920473ks5u9jlhd3dczfzq4wamqept56u
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBnbC90WWJiRXRPZ1VUVWhO
|
||||
azc5R2lGeDhoRmQydXBnYlltbE81ajFQNW0wClRJNC9iaFV0NDRKRkw2Mm1vOHpN
|
||||
dVhnUm1nbElQRGQ4dmkxQ2FWdEdpdDAKLS0tIG9GNEpuZUFUQkVXbjZPREo0aEh4
|
||||
ZVMyY0Y0Zldvd244eSt2RVZDeUZKWmcKGQ7jq50qiXPLKCHq751Y2SA79vEjbSbt
|
||||
yhRiakVEjwf9A+/iSNvXYAr/tnKaYC+NTA7F6AKmYpBcrzlBGU68KA==
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBuWXhzQWFmeCt1R05jREcz
|
||||
Ui9HZFN5dkxHNVE0RVJGZUJUa3hKK2sxdkhBCktYcGpLeGZIQzZIV3ZZWGs3YzF1
|
||||
T09sUEhPWkRkOWZFWkltQXBlM1lQV1UKLS0tIERRSlRUYW5QeW9TVjJFSmorOWNI
|
||||
ZytmaEhzMjVhRXI1S0hielF0NlBrMmcK4I1PtSf7tSvSIJxWBjTnfBCO8GEFHbuZ
|
||||
BkZskr5fRnWUIs72ZOGoTAVSO5ZNiBglOZ8YChl4Vz1U7bvdOCt0bw==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
- recipient: age1hz2lz4k050ru3shrk5j3zk3f8azxmrp54pktw5a7nzjml4saudesx6jsl0
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBRTWFBRVRKeXR0UUloQ3FK
|
||||
Rmhsak45aFZBVUp4Szk5eHJhZmswV3JUcHh3Cis0N09JaCtOZE1pQUM5blg4WDY5
|
||||
Q0ZGajJSZnJVQzdJK0dxZjJNWHZkbGsKLS0tIEVtRVJROTlWdWl0cFlNZmZkajM5
|
||||
N3FpdU56WlFWaC9QYU5Kc1o2a1VkT0UK2Utr9mvK8If4JhjzD+l06xZxdE3nbvCO
|
||||
NixMiYDhuQ/a55Fu0653jqd35i3CI3HukzEI9G5zLEeCcXxTKR5Bjg==
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBQcXM0RHlGcmZrYW4yNGZs
|
||||
S1ZqQzVaYmQ4MGhGaTFMUVIwOTk5K0tZZjB3ClN0QkhVeHRrNXZHdmZWMzFBRnJ6
|
||||
WTFtaWZyRmx2TitkOXkrVkFiYVd3RncKLS0tIExpeGUvY1VpODNDL2NCaUhtZkp0
|
||||
cGNVZTI3UGxlNWdFWVZMd3FlS3pDR3cKBulaMeonV++pArXOg3ilgKnW/51IyT6Z
|
||||
vH9HOJUix+ryEwDIcjv4aWx9pYDHthPFZUDC25kLYG91WrJFQOo2oA==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
- recipient: age1w2q4gm2lrcgdzscq8du3ssyvk6qtzm4fcszc92z9ftclq23yyydqdga5um
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBFQVk0aUw0aStuOWhFMk5a
|
||||
UVJ5YWg2WjU2eVFUWDlobEIrRDlZV3dxelc0Clo0N3lvOUZNL3YrM2l3Y21VaUQz
|
||||
MTV5djdPWTBIUXFXVDZpZitRTVhMbVEKLS0tIFluV1NFTzd0cFFaR0RwVkhlSmNm
|
||||
VGdZNDlsUGI3cTQ1Tk9XRWtDSE1wNWMKQI226dcROyp/GprVZKtM0R57m5WbJyuR
|
||||
UZO74NqiDr7nxKfw+tHCfDLh94rbC1iP4jRiaQjDgfDDxviafSbGBA==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
- recipient: age1snmhmpavqy7xddmw4nuny0u4xusqmnqxqarjmghkm5zaluff84eq5xatrd
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSA4WVBzazE3VkNDWXUwMk5x
|
||||
NnZtL3N3THVBQytxZzdZNUhCeThURFBLdjBVClBpZjd5L3lKYjRZNVF2Z3hibW5R
|
||||
YTdTR0NzaVp4VEZlTjlaTHVFNXNSSUEKLS0tIDBGbmhGUFNJQ21zeW1SbWtyWWh0
|
||||
QkFXN2g5TlhBbnlmbW1aSUJQL1FOaWMKTv8OoaTxyG8XhKGZNs4aFR/9SXQ+RG6w
|
||||
+fxiUx7xQnOIYag9YQYfuAgoGzOaj/ha+i18WkQnx9LAgrjCTd+ejA==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
- recipient: age12a3nyvjs8jrwmpkf3tgawel3nwcklwsr35ktmytnvhpawqwzrsfqpgcy0q
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSAzcnVxL09JTEdsZ0FUb2VH
|
||||
a3dSY09uRFFCYnJXQno3YUFhMlpueHJreXdFCjQ4UWdRak5yK0VIT2lYUjBVK2h5
|
||||
RFJmMTlyVEpnS3JxdkE4ckp1UHpLM2sKLS0tIHVyZXRTSHQxL1p1dUxMKzkyV0pW
|
||||
a2o0bG9vZUtmckdYTkhLSVZtZVRtNlUKpALeaeaH4/wFUPPGsNArTAIIJOvBWWDp
|
||||
MUYPJjqLqBVmWzIgCexM2jsDOhtcCV26MXjzTXmZhthaGJMSp23kMQ==
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBabTdsZWxZQjV2TGx2YjNM
|
||||
ZTgzWktqTjY0S0M3bFpNZXlDRDk5TSt3V2k0CjdWWTN0TlRlK1RpUm9xYW03MFFG
|
||||
aWN4a3o4VUVnYzBDd2FrelUraWtrMTAKLS0tIE1vTGpKYkhzcWErWDRreml2QmE2
|
||||
ZkNIWERKb1drdVR6MTBSTnVmdm51VEkKVNDYdyBSrUT7dUn6a4eF7ELQ2B2Pk6V9
|
||||
Z5fbT75ibuyX1JO315/gl2P/FhxmlRW1K6e+04gQe2R/t/3H11Q7YQ==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
- recipient: age1d2w5zece9647qwyq4vas9qyqegg96xwmg6c86440a6eg4uj6dd2qrq0w3l
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSA5M0liYUY1UHRHUDdvN3ds
|
||||
TVdiWDlrWFROSVdRTy9nOHFOUTdmTmlHSzE4CjBpU3gzdjdWaHQzNXRMRkxPdVps
|
||||
TEZXbVlYenUwc3o0TXRnaXg4MmVHQmcKLS0tIDlVeWQ4V0hjbWJqRlNUL2hOWVhp
|
||||
WEJvZWZzbWZFeWZVeWJ1c3pVOWI3MFUKN2QfuOaod5IBKkBkYzi3jvPty+8PRGMJ
|
||||
mozL7qydsb0bAZJtAwcL7HWCr1axar/Ertce0yMqhuthJ5bciVD5xQ==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
- recipient: age1gcyfkxh4fq5zdp0dh484aj82ksz66wrly7qhnpv0r0p576sn9ekse8e9ju
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSA5L3NmcFMyUUpLOW9mOW9v
|
||||
VXhMTjl5SEFsZ0pzR3lHb1VJL0IzUUxCckdzCnltZnVySkszVUtwbDdQNHAwVWxl
|
||||
V2xJU1BqSG0yMk5sTkpKRTIvc2JORFUKLS0tIHNydWZjdGg3clNpMDhGSGR6VVVh
|
||||
VU1Rbk9ybGRJOG1ETEh4a1orNUY2Z00KJmdp+wLHd+86RJJ/G0QbLp4BEDPXfE9o
|
||||
VZhPPSC6qtUcFV2z6rqSHSpsHPTlgzbCRqX39iePNhfQ2o0lR2P2zQ==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
- recipient: age1g5luz2rtel3surgzuh62rkvtey7lythrvfenyq954vmeyfpxjqkqdj3wt8
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBBbnhXSG5qdVJHSjNmQ3Qx
|
||||
Yk9zSVBkVTQyb3luYXgwbFJWbG9xK2tWZUdrCkh2MktoWmFOdkRldFNlQW1EMm9t
|
||||
ZHJRa3QrRzh0UElSNGkvSWcyYTUxZzgKLS0tIGdPT2dwWU9LbERYZGxzUTNEUHE1
|
||||
TmlIdWJjbmFvdnVQSURqUTBwbW9EL00Kaiy5ZGgHjKgAGvzbdjbwNExLf4MGDtiE
|
||||
NJEvnmNWkQyEhtx9YzUteY02Tl/D7zBzAWHlV3RjAWTNIwLmm7QgCw==
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBVSFhDOFRVbnZWbVlQaG5G
|
||||
U0NWekU0NzI1SlpRN0NVS1hPN210MXY3Z244CmtFemR5OUpzdlBzMHBUV3g0SFFo
|
||||
eUtqNThXZDJ2b01yVVVuOFdwQVo2Qm8KLS0tIHpXRWd3OEpPRkpaVDNDTEJLMWEv
|
||||
ZlZtaFpBdzF0YXFmdjNkNUR3YkxBZU0KAub+HF/OBZQR9bx/SVadZcL6Ms+NQ7yq
|
||||
21HCcDTWyWHbN4ymUrIYXci1A/0tTOrQL9Mkvaz7IJh4VdHLPZrwwA==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
- recipient: age1gq8434ku0xekqmvnseeunv83e779cg03c06gwrusnymdsr3rpufqx6vr3m
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBVSDFIa1hNZU1BNWxHckk1
|
||||
UEdJT282Y054eVNpb3VOZ2t3S2NndTkycXdNCk1sNk5uL2xpbXk1MG95dVM1OWVD
|
||||
TldUWmsrSmxGeHYweWhGWXpSaE0xRmcKLS0tIFlVbEp2UU1kM0hhbHlSZm96TFl2
|
||||
TkVaK0xHN1NxNzlpUVYyY2RpdisrQVkKG+DlyZVruH64nB9UtCPMbXhmRHj+zpr6
|
||||
CX4JOTXbUsueZIA4J/N93+d2J3V6yauoRYwCSl/JXX/gaSeSxF4z3A==
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBBWkhBL1NTdjFDeEhQcEgv
|
||||
Z3c3Z213L2ZhWGo0Qm5Zd1A1RTBDY3plUkh3CkNWV2ZtNWkrUjB0eWFzUlVtbHlk
|
||||
WTdTQjN4eDIzY0c0dyt6ajVXZ0krd1UKLS0tIHB4aEJqTTRMenV3UkFkTGEySjQ2
|
||||
YVM1a3ZPdUU4T244UU0rc3hVQ3NYczQK10wug4kTjsvv/iOPWi5WrVZMOYUq4/Mf
|
||||
oXS4sikXeUsqH1T2LUBjVnUieSneQVn7puYZlN+cpDQ0XdK/RZ+91A==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
- recipient: age1288993th0ge00reg4zqueyvmkrsvk829cs068eekjqfdprsrkeqql7mljk
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSB3YWxPRTNaVTNLb2tYSzZ5
|
||||
ZmVMYXk2MlVXYzNtZGFJNlJLR2FIVWhKb1RFCmx5bXozeExlbEZBQzhpSHA0T1JE
|
||||
dFpHRm8rcFl1QjZ2anRGYjVxeGJqc0EKLS0tIGVibzRnRTA3Vk5yR3c4QVFsdy95
|
||||
bG1tejcremFiUjZaL3hmc1gwYzJIOGMKFmXmY60vABYlpfop2F020SaOEwV4TNya
|
||||
F0tgrIqbufU1Yw4RhxPdBb9Wv1cQu25lcqQLh1i4VH9BSaWKk6TDEA==
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBYcEtHbjNWRkdodUxYdHRn
|
||||
MDBMU08zWDlKa0Z4cHJvc28rZk5pUjhnMjE0CmdzRmVGWDlYQ052Wm1zWnlYSFV6
|
||||
dURQK3JSbThxQlg3M2ZaL1hGRzVuL0UKLS0tIEI3UGZvbEpvRS9aR2J2Tnc1YmxZ
|
||||
aUY5Q2MrdHNQWDJNaGt5MWx6MVRrRVEKRPxyAekGHFMKs0Z6spVDayBA4EtPk18e
|
||||
jiFc97BGVtC5IoSu4icq3ZpKOdxymnkqKEt0YP/p/JTC+8MKvTJFQw==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
- recipient: age1vpns76ykll8jgdlu3h05cur4ew2t3k7u03kxdg8y6ypfhsfhq9fqyurjey
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSAzRXM1VUJPNm90UUx4UEdZ
|
||||
cDY5czVQaGl0MEdIMStjTnphTmR5ZkFWTDBjClhTd0xmaHNWUXo3NXR6eEUzTkg2
|
||||
L3BqT1N6bTNsYitmTGVpREtiWEpzdlEKLS0tIFUybTczSlRNbDkxRVZjSnFvdmtq
|
||||
MVdRU3RPSHNqUzJzQWl1VVkyczFaencK72ZmWJIcfBTXlezmefvWeCGOC1BhpkXO
|
||||
bm+X+ihzNfktuOCl6ZIMo2n4aJ3hYakrMp4npO10a6s4o/ldqeiATg==
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBQL3ZMUkI1dUV1T2tTSHhn
|
||||
SjhyQ3dKTytoaDBNcit1VHpwVGUzWVNpdjBnCklYZWtBYzBpcGxZSDBvM2tIZm9H
|
||||
bTFjb1ZCaDkrOU1JODVBVTBTbmxFbmcKLS0tIGtGcS9kejZPZlhHRXI5QnI5Wm9Q
|
||||
VjMxTDdWZEltWThKVDl0S24yWHJxZHcKgzH79zT2I7ZgyTbbbvIhLN/rEcfiomJH
|
||||
oSZDFvPiXlhPgy8bRyyq3l47CVpWbUI2Y7DFXRuODpLUirt3K3TmCA==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
- recipient: age1hchvlf3apn8g8jq2743pw53sd6v6ay6xu6lqk0qufrjeccan9vzsc7hdfq
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBOL3F3OWRYVVdxWncwWmlk
|
||||
SnloWFdscE02L3ZRa0JGcFlwSU9tU3JRakhnCjZyTnR3T051Tmt2NGM2dkFaNGJz
|
||||
WVRnNDdNN0ozYXJnK0t4ZW5JRVQ2YzQKLS0tIFk0cFBxcVFETERNTGowMThJcDNR
|
||||
UW0wUUlFeHovSS9qYU5BRkJ6dnNjcWcKh2WcrmxsqMZeQ0/2HsaHeSqGsU3ILynU
|
||||
SHBziWHGlFoNirCVjljh/Mw4DM8v66i0ztIQtWV5cFaFhu4kVda5jA==
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBPcm9zUm1XUkpLWm1Jb3Uw
|
||||
RncveGozOW5SRThEM1Y4SFF5RDdxUEhZTUE4CjVESHE5R3JZK0krOXZDL0RHR0oy
|
||||
Z3JKaEpydjRjeFFHck1ic2JTRU5yZTQKLS0tIGY2ck56eG95YnpDYlNqUDh5RVp1
|
||||
U3dRYkNleUtsQU1LMWpDbitJbnRIem8K+27HRtZihG8+k7ZC33XVfuXDFjC1e8lA
|
||||
kffmxp9kOEShZF3IKmAjVHFBiPXRyGk3fGPyQLmSMK2UOOfCy/a/qA==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
- recipient: age1w029fksjv0edrff9p7s03tgk3axecdkppqymfpwfn2nu2gsqqefqc37sxq
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSB6ZkovUkMzdmhOUGpZUC91
|
||||
d1JFZGk1T2hOS2dlVFNHRGJKVTUwdUhpQmg0CnEybzlRdjBLcjVEckNtR0xzMDVk
|
||||
dURWbFdnTXk1alV5cjRSMkRrZ21vTjAKLS0tIEtDZlFCTGdVMU1PUWdBYTVOcTU4
|
||||
ZkZHYmJiTUdJUGZhTFdLM1EzdU9wNmsK3AqFfycJfrBpvnjccN1srNiVBCv107rt
|
||||
b/O5zcqKGR3Nzey7zAhlxasPCRKARyBTo292ScZ03QMU8p8HIukdzg==
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBTZHlldDdSOEhjTklCSXQr
|
||||
U2pXajFwZnNqQzZOTzY5b3lkMzlyREhXRWo4CmxId2F6NkNqeHNCSWNrcUJIY0Nw
|
||||
cGF6NXJaQnovK1FYSXQ2TkJSTFloTUEKLS0tIHRhWk5aZ0lDVkZaZEJobm9FTDNw
|
||||
a29sZE1GL2ZQSk0vUEc1ZGhkUlpNRkEK9tfe7cNOznSKgxshd5Z6TQiNKp+XW6XH
|
||||
VvPgMqMitgiDYnUPj10bYo3kqhd0xZH2IhLXMnZnqqQ0I23zfPiNaw==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
- recipient: age1ha34qeksr4jeaecevqvv2afqem67eja2mvawlmrqsudch0e7fe7qtpsekv
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBlOVNVNmFzbTE2NmdiM1dP
|
||||
TlhuTGYyQWlWeFlkaVU3Tml2aDNJbmxXVnlZCmJSb001OVJTaGpRcllzN2JSWDFF
|
||||
b1MyYjdKZys4ZHRoUmFhdG1oYTA2RzQKLS0tIEhGeU9YcW9Wc0ZZK3I5UjB0RHFm
|
||||
bW1ucjZtYXFkT1A4bGszamFxaG5IaHMKqHuaWFi/ImnbDOZ9VisIN7jqplAYV8fo
|
||||
y3PeVX34LcYE0d8cxbvH8CTs/Ubirt6P1obrmAL9W9Y0ozpqdqQSjA==
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSB5bk9NVjJNWmMxUGd3cXRx
|
||||
amZ5SWJ3dHpHcnM4UHJxdmh6NnhFVmJQdldzCm95dHN3R21qSkE4Vm9VTnVPREp3
|
||||
dUQyS1B4MWhhdmd3dk5LQ0htZEtpTWMKLS0tIGFaa3MxVExFYk1MY2loOFBvWm1o
|
||||
L0NoRStkeW9VZVdpWlhteC8yTnRmMUkKMYjUdE1rGgVR29FnhJ5OEVjTB1Rh5Mtu
|
||||
M/DvlhW3a7tZU8nDF3IgG2GE5xOXZMDO9QWGdB8zO2RJZAr3Q+YIlA==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
- recipient: age1cxt8kwqzx35yuldazcc49q88qvgy9ajkz30xu0h37uw3ts97jagqgmn2ga
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBXbXo4UWhoMUQxc1lMcnNB
|
||||
VWc1MUJuS3NnVnh4U254TE0wSDJTMzFSM3lrCnhHbmk1N0VqTlViT2dtZndGT1pn
|
||||
NmpPc01iMjk3TXZLU1htZjBvd2NBK2sKLS0tIEN3dGlRZHF5Ykgybjl6MzRBVUJ0
|
||||
Rm92SGdwanFHZlp6U00wMDUzL3MrMzgKtCJqy+BfDMFQMHaIVPlFyzALBsb4Ekls
|
||||
+r7ofZ1ZjSomBljYxVPhKE9XaZJe6bqICEhJBCpODyxavfh8HmxHDQ==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
- recipient: age16prza00sqzuhwwcyakj6z4hvwkruwkqpmmrsn94a5ucgpkelncdq2ldctk
|
||||
enc: |
|
||||
-----BEGIN AGE ENCRYPTED FILE-----
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBiQTRXTHljd2UrbFJOWUI4
|
||||
WGRYcEVrZDJGM3hpVVNmVXlSREYzc1FHRlhFCjZHa2VTTzFHR1RXRmllT1huVDNV
|
||||
UkRKaEQrWjF5eHpiaUg1NExnME5veFkKLS0tIFpZY1RrOVNTTjU0N2Y1dFN6QWpX
|
||||
MTM3NDJrV1JZNE5pWGNLMUg1OFFwYUUKMx0hpB3iunnCbJ/+zWetdp1NI/LsrUTe
|
||||
J84+aDoe7/WJYT0FLMlC0RK80txm6ztVygoyRdN0cRKx1z3KqPmavw==
|
||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBBU0xYMnhqOE0wdXdleStF
|
||||
THcrY2NBQzNoRHdYTXY3ZmM5YXRZZkQ4aUZnCm9ad0IxSWxYT1JBd2RseUdVT1pi
|
||||
UXBuNzFxVlN0OWNTQU5BV2NiVEV0RUUKLS0tIGJHY0dzSDczUzcrV0RpTjE0czEy
|
||||
cWZMNUNlTzBRcEV5MjlRV1BsWGhoaUUKGhYaH8I0oPCfrbs7HbQKVOF/99rg3HXv
|
||||
RRTXUI71/ejKIuxehOvifClQc3nUW73bWkASFQ0guUvO4R+c0xOgUg==
|
||||
-----END AGE ENCRYPTED FILE-----
|
||||
lastmodified: "2025-02-11T21:18:22Z"
|
||||
mac: ENC[AES256_GCM,data:5//boMp1awc/2XAkSASSCuobpkxa0E6IKf3GR8xHpMoCD30FJsCwV7PgX3fR8OuLEhOJ7UguqMNQdNqG37RMacreuDmI1J8oCFKp+3M2j4kCbXaEo8bw7WAtyjUez+SAXKzZWYmBibH0KOy6jdt+v0fdgy5hMBT4IFDofYRsyD0=,iv:6pD+SLwncpmal/FR4U8It2njvaQfUzzpALBCxa0NyME=,tag:4QN8ZFjdqck5ZgulF+FtbA==,type:str]
|
||||
|
||||
@@ -1,8 +1,10 @@
|
||||
{ pkgs, config, ... }:
|
||||
{
|
||||
sops.secrets."actions-token-1" = {
|
||||
sopsFile = ../../secrets/nix-cache01/actions_token_1;
|
||||
format = "binary";
|
||||
vault.secrets.actions-token = {
|
||||
secretPath = "hosts/nix-cache01/actions-token";
|
||||
extractKey = "token";
|
||||
outputDir = "/run/secrets/actions-token-1";
|
||||
services = [ "gitea-runner-actions1" ];
|
||||
};
|
||||
|
||||
virtualisation.podman = {
|
||||
@@ -13,7 +15,7 @@
|
||||
services.gitea-actions-runner.instances = {
|
||||
actions1 = {
|
||||
enable = true;
|
||||
tokenFile = config.sops.secrets.actions-token-1.path;
|
||||
tokenFile = "/run/secrets/actions-token-1";
|
||||
name = "actions1.home.2rjus.net";
|
||||
settings = {
|
||||
log = {
|
||||
|
||||
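The `vault.secrets.<name>` entries introduced in the hunk above (and in several host configs further down) come from the `vault-secrets.nix` module that this comparison adds to the shared imports near the end; the module itself is not shown here. A minimal sketch of the option interface these configs appear to rely on — the submodule shape and defaults are assumptions, not the actual module:

```nix
{ lib, ... }:
{
  options.vault.secrets = lib.mkOption {
    default = { };
    description = "Secrets fetched from OpenBao at runtime and written under /run/secrets.";
    type = lib.types.attrsOf (lib.types.submodule {
      options = {
        secretPath = lib.mkOption { type = lib.types.str; };   # KV path in OpenBao
        extractKey = lib.mkOption { type = lib.types.str; };   # field to extract from the secret
        outputDir  = lib.mkOption { type = lib.types.str; };   # path the extracted value is written to
        mode       = lib.mkOption { type = lib.types.str; default = "0400"; };
        services   = lib.mkOption {                            # units restarted when the value changes
          type = lib.types.listOf lib.types.str;
          default = [ ];
        };
      };
    });
  };
}
```

The visible effect of the migration is that consumers stop referencing `config.sops.secrets.<name>.path` and instead point at the fixed runtime path, e.g. `tokenFile = "/run/secrets/actions-token-1"`.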
@@ -1,87 +0,0 @@
|
||||
{ config, ... }:
|
||||
{
|
||||
sops.secrets.authelia_ldap_password = {
|
||||
format = "yaml";
|
||||
sopsFile = ../../secrets/auth01/secrets.yaml;
|
||||
key = "authelia_ldap_password";
|
||||
restartUnits = [ "authelia-auth.service" ];
|
||||
owner = "authelia-auth";
|
||||
group = "authelia-auth";
|
||||
};
|
||||
sops.secrets.authelia_jwt_secret = {
|
||||
format = "yaml";
|
||||
sopsFile = ../../secrets/auth01/secrets.yaml;
|
||||
key = "authelia_jwt_secret";
|
||||
restartUnits = [ "authelia-auth.service" ];
|
||||
owner = "authelia-auth";
|
||||
group = "authelia-auth";
|
||||
};
|
||||
sops.secrets.authelia_storage_encryption_key_file = {
|
||||
format = "yaml";
|
||||
key = "authelia_storage_encryption_key_file";
|
||||
sopsFile = ../../secrets/auth01/secrets.yaml;
|
||||
restartUnits = [ "authelia-auth.service" ];
|
||||
owner = "authelia-auth";
|
||||
group = "authelia-auth";
|
||||
};
|
||||
sops.secrets.authelia_session_secret = {
|
||||
format = "yaml";
|
||||
key = "authelia_session_secret";
|
||||
sopsFile = ../../secrets/auth01/secrets.yaml;
|
||||
restartUnits = [ "authelia-auth.service" ];
|
||||
owner = "authelia-auth";
|
||||
group = "authelia-auth";
|
||||
};
|
||||
|
||||
services.authelia.instances."auth" = {
|
||||
enable = true;
|
||||
environmentVariables = {
|
||||
AUTHELIA_AUTHENTICATION_BACKEND_LDAP_PASSWORD_FILE =
|
||||
config.sops.secrets.authelia_ldap_password.path;
|
||||
AUTHELIA_SESSION_SECRET_FILE = config.sops.secrets.authelia_session_secret.path;
|
||||
};
|
||||
secrets = {
|
||||
jwtSecretFile = config.sops.secrets.authelia_jwt_secret.path;
|
||||
storageEncryptionKeyFile = config.sops.secrets.authelia_storage_encryption_key_file.path;
|
||||
};
|
||||
settings = {
|
||||
access_control = {
|
||||
default_policy = "two_factor";
|
||||
};
|
||||
session = {
|
||||
# secret = "{{- fileContent \"${config.sops.secrets.authelia_session_secret.path}\" }}";
|
||||
cookies = [
|
||||
{
|
||||
domain = "home.2rjus.net";
|
||||
authelia_url = "https://auth.home.2rjus.net";
|
||||
default_redirection_url = "https://dashboard.home.2rjus.net";
|
||||
name = "authelia_session";
|
||||
same_site = "lax";
|
||||
inactivity = "1h";
|
||||
expiration = "24h";
|
||||
remember_me = "30d";
|
||||
}
|
||||
];
|
||||
};
|
||||
notifier = {
|
||||
filesystem.filename = "/var/lib/authelia-auth/notification.txt";
|
||||
};
|
||||
storage = {
|
||||
local.path = "/var/lib/authelia-auth/db.sqlite3";
|
||||
};
|
||||
authentication_backend = {
|
||||
password_reset = {
|
||||
disable = false;
|
||||
};
|
||||
ldap = {
|
||||
address = "ldap://127.0.0.1:3890";
|
||||
implementation = "lldap";
|
||||
timeout = "5s";
|
||||
base_dn = "dc=home,dc=2rjus,dc=net";
|
||||
user = "uid=authelia_ldap_user,ou=people,dc=home,dc=2rjus,dc=net";
|
||||
# password = "{{- fileContent \"${config.sops.secrets.authelia_ldap_password.path}\" -}}";
|
||||
};
|
||||
};
|
||||
};
|
||||
};
|
||||
}
|
||||
@@ -1,5 +1,9 @@
|
||||
{ pkgs, unstable, ... }:
|
||||
{
|
||||
homelab.monitoring.scrapeTargets = [{
|
||||
job_name = "step-ca";
|
||||
port = 9000;
|
||||
}];
|
||||
sops.secrets."ca_root_pw" = {
|
||||
sopsFile = ../../secrets/ca/secrets.yaml;
|
||||
owner = "step-ca";
|
||||
|
||||
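The `homelab.monitoring.scrapeTargets` list used in the step-ca hunk above, and in most of the service hunks that follow, is declared in the `modules/homelab` tree pulled in by the shared imports; that module is not part of this excerpt. A rough sketch of the declaration implied by the fields actually used (`job_name`, `port`, `metrics_path`, `scrape_interval`) — everything beyond those is an assumption:

```nix
{ lib, ... }:
{
  options.homelab.monitoring.scrapeTargets = lib.mkOption {
    default = [ ];
    description = "Prometheus scrape targets this host exposes; collected centrally by the monitoring host.";
    type = lib.types.listOf (lib.types.submodule {
      options = {
        job_name        = lib.mkOption { type = lib.types.str; };
        port            = lib.mkOption { type = lib.types.port; };
        metrics_path    = lib.mkOption { type = lib.types.str; default = "/metrics"; };
        scrape_interval = lib.mkOption { type = lib.types.nullOr lib.types.str; default = null; };
      };
    });
  };
}
```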
@@ -1,5 +1,11 @@
|
||||
{ pkgs, config, ... }:
|
||||
{
|
||||
homelab.monitoring.scrapeTargets = [{
|
||||
job_name = "home-assistant";
|
||||
port = 8123;
|
||||
metrics_path = "/api/prometheus";
|
||||
scrape_interval = "60s";
|
||||
}];
|
||||
# Enable the Home Assistant service
|
||||
services.home-assistant = {
|
||||
enable = true;
|
||||
@@ -63,6 +69,44 @@
|
||||
frontend = true;
|
||||
permit_join = false;
|
||||
serial.port = "/dev/ttyUSB0";
|
||||
|
||||
# Inline device configuration (replaces devices.yaml)
|
||||
# This allows declarative management and homeassistant overrides
|
||||
devices = {
|
||||
# Temperature sensors with battery fix
|
||||
# WSDCGQ12LM sensors report battery: 0 due to firmware quirk
|
||||
# Override battery calculation using voltage (mV): (voltage - 2100) / 9
|
||||
"0x54ef441000a547bd" = {
|
||||
friendly_name = "0x54ef441000a547bd";
|
||||
homeassistant.battery.value_template = "{{ (((value_json.voltage | float) - 2100) / 9) | round(0) | int | min(100) | max(0) }}";
|
||||
};
|
||||
"0x54ef441000a54d3c" = {
|
||||
friendly_name = "0x54ef441000a54d3c";
|
||||
homeassistant.battery.value_template = "{{ (((value_json.voltage | float) - 2100) / 9) | round(0) | int | min(100) | max(0) }}";
|
||||
};
|
||||
"0x54ef441000a564b6" = {
|
||||
friendly_name = "temp_server";
|
||||
homeassistant.battery.value_template = "{{ (((value_json.voltage | float) - 2100) / 9) | round(0) | int | min(100) | max(0) }}";
|
||||
};
|
||||
|
||||
# Other sensors
|
||||
"0x00124b0025495463".friendly_name = "0x00124b0025495463"; # SONOFF temp sensor (battery works)
|
||||
"0x54ef4410009ac117".friendly_name = "0x54ef4410009ac117"; # Water leak sensor
|
||||
|
||||
# Buttons
|
||||
"0x54ef441000a1f907".friendly_name = "btn_livingroom";
|
||||
"0x54ef441000a1ee71".friendly_name = "btn_bedroom";
|
||||
|
||||
# Philips Hue lights
|
||||
"0x001788010d1b599a" = {
|
||||
friendly_name = "0x001788010d1b599a";
|
||||
transition = 5;
|
||||
};
|
||||
"0x001788010d253b99".friendly_name = "0x001788010d253b99";
|
||||
"0x001788010e371aa4".friendly_name = "0x001788010e371aa4";
|
||||
"0x001788010dc5f003".friendly_name = "0x001788010dc5f003";
|
||||
"0x001788010dc35d06".friendly_name = "0x001788010dc35d06";
|
||||
};
|
||||
};
|
||||
};
|
||||
}
|
||||
|
||||
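As a quick sanity check on the battery override above: a WSDCGQ12LM reporting `voltage` = 2950 mV evaluates to (2950 − 2100) / 9 ≈ 94, and the trailing `min`/`max` filters clamp the result to the 0–100 range, so a fresh cell reads close to 100 % while anything at or below 2100 mV reads 0 %.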
@@ -3,4 +3,9 @@
|
||||
imports = [
|
||||
./proxy.nix
|
||||
];
|
||||
|
||||
homelab.monitoring.scrapeTargets = [{
|
||||
job_name = "caddy";
|
||||
port = 80;
|
||||
}];
|
||||
}
|
||||
|
||||
@@ -86,22 +86,6 @@
|
||||
}
|
||||
reverse_proxy http://jelly01.home.2rjus.net:8096
|
||||
}
|
||||
lldap.home.2rjus.net {
|
||||
log {
|
||||
output file /var/log/caddy/auth.log {
|
||||
mode 644
|
||||
}
|
||||
}
|
||||
reverse_proxy http://auth01.home.2rjus.net:17170
|
||||
}
|
||||
auth.home.2rjus.net {
|
||||
log {
|
||||
output file /var/log/caddy/auth.log {
|
||||
mode 644
|
||||
}
|
||||
}
|
||||
reverse_proxy http://auth01.home.2rjus.net:9091
|
||||
}
|
||||
pyroscope.home.2rjus.net {
|
||||
log {
|
||||
output file /var/log/caddy/pyroscope.log {
|
||||
|
||||
@@ -1,5 +1,9 @@
|
||||
{ pkgs, ... }:
|
||||
{
|
||||
homelab.monitoring.scrapeTargets = [{
|
||||
job_name = "jellyfin";
|
||||
port = 8096;
|
||||
}];
|
||||
services.jellyfin = {
|
||||
enable = true;
|
||||
};
|
||||
|
||||
@@ -1,38 +0,0 @@
|
||||
{ config, ... }:
|
||||
{
|
||||
sops.secrets.lldap_user_pass = {
|
||||
format = "yaml";
|
||||
key = "lldap_user_pass";
|
||||
sopsFile = ../../secrets/auth01/secrets.yaml;
|
||||
restartUnits = [ "lldap.service" ];
|
||||
group = "acme";
|
||||
mode = "0440";
|
||||
};
|
||||
|
||||
services.lldap = {
|
||||
enable = true;
|
||||
settings = {
|
||||
ldap_base_dn = "dc=home,dc=2rjus,dc=net";
|
||||
ldap_user_email = "admin@home.2rjus.net";
|
||||
ldap_user_dn = "admin";
|
||||
ldap_user_pass_file = config.sops.secrets.lldap_user_pass.path;
|
||||
ldaps_options = {
|
||||
enabled = true;
|
||||
port = 6360;
|
||||
cert_file = "/var/lib/acme/auth01.home.2rjus.net/cert.pem";
|
||||
key_file = "/var/lib/acme/auth01.home.2rjus.net/key.pem";
|
||||
};
|
||||
};
|
||||
};
|
||||
systemd.services.lldap = {
|
||||
serviceConfig = {
|
||||
SupplementaryGroups = [ "acme" ];
|
||||
};
|
||||
};
|
||||
security.acme.certs."auth01.home.2rjus.net" = {
|
||||
listenHTTP = ":80";
|
||||
reloadServices = [ "lldap" ];
|
||||
extraDomainNames = [ "ldap.home.2rjus.net" ];
|
||||
enableDebugLogs = true;
|
||||
};
|
||||
}
|
||||
@@ -1,12 +1,18 @@
|
||||
{ pkgs, config, ... }:
|
||||
{
|
||||
sops.secrets."nats_nkey" = { };
|
||||
vault.secrets.nats-nkey = {
|
||||
secretPath = "shared/nats/nkey";
|
||||
extractKey = "nkey";
|
||||
outputDir = "/run/secrets/nats_nkey";
|
||||
services = [ "alerttonotify" ];
|
||||
};
|
||||
|
||||
systemd.services."alerttonotify" = {
|
||||
enable = true;
|
||||
wants = [ "network-online.target" ];
|
||||
after = [
|
||||
"network-online.target"
|
||||
"sops-nix.service"
|
||||
"vault-secret-nats-nkey.service"
|
||||
];
|
||||
wantedBy = [ "multi-user.target" ];
|
||||
restartIfChanged = true;
|
||||
|
||||
12
services/monitoring/external-targets.nix
Normal file
@@ -0,0 +1,12 @@
|
||||
# Monitoring targets for hosts not managed by this flake
|
||||
# These are manually maintained and combined with auto-generated targets
|
||||
{
|
||||
nodeExporter = [
|
||||
"gunter.home.2rjus.net:9100"
|
||||
];
|
||||
scrapeConfigs = [
|
||||
{ job_name = "smartctl"; targets = [ "gunter.home.2rjus.net:9633" ]; }
|
||||
{ job_name = "ghettoptt"; targets = [ "gunter.home.2rjus.net:8989" ]; }
|
||||
{ job_name = "restic_rest"; targets = [ "10.69.12.52:8000" ]; }
|
||||
];
|
||||
}
|
||||
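The prometheus.nix hunk that follows combines these manual entries with targets derived from the flake itself via `lib/monitoring.nix`, which is not included in this comparison. A hedged sketch of roughly what the two helpers could look like, assuming every `nixosConfiguration` exposes node-exporter on port 9100 and contributes its `homelab.monitoring.scrapeTargets` entries (the attribute paths are guesses):

```nix
{ lib }:
{
  # One "<host>.home.2rjus.net:9100" target per flake host, plus the manually listed extras.
  generateNodeExporterTargets = self: external:
    (map (host: "${host}.home.2rjus.net:9100")
      (builtins.attrNames self.nixosConfigurations))
    ++ external.nodeExporter;

  # Collect every host's scrapeTargets entries into Prometheus scrape configs,
  # then append the manually maintained external ones.
  generateScrapeConfigs = self: external:
    (lib.concatLists (lib.mapAttrsToList
      (host: machine:
        map
          (t: {
            inherit (t) job_name;
            static_configs = [{ targets = [ "${host}.home.2rjus.net:${toString t.port}" ]; }];
          })
          (machine.config.homelab.monitoring.scrapeTargets or [ ]))
      self.nixosConfigurations))
    ++ map
      (e: { inherit (e) job_name; static_configs = [{ targets = e.targets; }]; })
      external.scrapeConfigs;
}
```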
@@ -1,7 +1,82 @@
|
||||
{ ... }:
|
||||
{ self, lib, pkgs, ... }:
|
||||
let
|
||||
monLib = import ../../lib/monitoring.nix { inherit lib; };
|
||||
externalTargets = import ./external-targets.nix;
|
||||
|
||||
nodeExporterTargets = monLib.generateNodeExporterTargets self externalTargets;
|
||||
autoScrapeConfigs = monLib.generateScrapeConfigs self externalTargets;
|
||||
|
||||
# Script to fetch AppRole token for Prometheus to use when scraping OpenBao metrics
|
||||
fetchOpenbaoToken = pkgs.writeShellApplication {
|
||||
name = "fetch-openbao-token";
|
||||
runtimeInputs = [ pkgs.curl pkgs.jq ];
|
||||
text = ''
|
||||
VAULT_ADDR="https://vault01.home.2rjus.net:8200"
|
||||
APPROLE_DIR="/var/lib/vault/approle"
|
||||
OUTPUT_FILE="/run/secrets/prometheus/openbao-token"
|
||||
|
||||
# Read AppRole credentials
|
||||
if [ ! -f "$APPROLE_DIR/role-id" ] || [ ! -f "$APPROLE_DIR/secret-id" ]; then
|
||||
echo "AppRole credentials not found at $APPROLE_DIR" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
ROLE_ID=$(cat "$APPROLE_DIR/role-id")
|
||||
SECRET_ID=$(cat "$APPROLE_DIR/secret-id")
|
||||
|
||||
# Authenticate to Vault
|
||||
AUTH_RESPONSE=$(curl -sf -k -X POST \
|
||||
-d "{\"role_id\":\"$ROLE_ID\",\"secret_id\":\"$SECRET_ID\"}" \
|
||||
"$VAULT_ADDR/v1/auth/approle/login")
|
||||
|
||||
# Extract token
|
||||
VAULT_TOKEN=$(echo "$AUTH_RESPONSE" | jq -r '.auth.client_token')
|
||||
if [ -z "$VAULT_TOKEN" ] || [ "$VAULT_TOKEN" = "null" ]; then
|
||||
echo "Failed to extract Vault token from response" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Write token to file
|
||||
mkdir -p "$(dirname "$OUTPUT_FILE")"
|
||||
echo -n "$VAULT_TOKEN" > "$OUTPUT_FILE"
|
||||
chown prometheus:prometheus "$OUTPUT_FILE"
|
||||
chmod 0400 "$OUTPUT_FILE"
|
||||
|
||||
echo "Successfully fetched OpenBao token"
|
||||
'';
|
||||
};
|
||||
in
|
||||
{
|
||||
# Systemd service to fetch AppRole token for Prometheus OpenBao scraping
|
||||
# The token is used to authenticate when scraping /v1/sys/metrics
|
||||
systemd.services.prometheus-openbao-token = {
|
||||
description = "Fetch OpenBao token for Prometheus metrics scraping";
|
||||
after = [ "network-online.target" ];
|
||||
wants = [ "network-online.target" ];
|
||||
before = [ "prometheus.service" ];
|
||||
requiredBy = [ "prometheus.service" ];
|
||||
|
||||
serviceConfig = {
|
||||
Type = "oneshot";
|
||||
ExecStart = lib.getExe fetchOpenbaoToken;
|
||||
};
|
||||
};
|
||||
|
||||
# Timer to periodically refresh the token (AppRole tokens have 1-hour TTL)
|
||||
systemd.timers.prometheus-openbao-token = {
|
||||
description = "Refresh OpenBao token for Prometheus";
|
||||
wantedBy = [ "timers.target" ];
|
||||
timerConfig = {
|
||||
OnBootSec = "5min";
|
||||
OnUnitActiveSec = "30min";
|
||||
RandomizedDelaySec = "5min";
|
||||
};
|
||||
};
|
||||
|
||||
services.prometheus = {
|
||||
enable = true;
|
||||
# syntax-only check because we use external credential files (e.g., openbao-token)
|
||||
checkConfig = "syntax-only";
|
||||
alertmanager = {
|
||||
enable = true;
|
||||
configuration = {
|
||||
@@ -45,26 +120,25 @@
|
||||
];
|
||||
|
||||
scrapeConfigs = [
|
||||
# Auto-generated node-exporter targets from flake hosts + external
|
||||
{
|
||||
job_name = "node-exporter";
|
||||
static_configs = [
|
||||
{
|
||||
targets = [
|
||||
"ca.home.2rjus.net:9100"
|
||||
"gunter.home.2rjus.net:9100"
|
||||
"ha1.home.2rjus.net:9100"
|
||||
"http-proxy.home.2rjus.net:9100"
|
||||
"jelly01.home.2rjus.net:9100"
|
||||
"monitoring01.home.2rjus.net:9100"
|
||||
"nix-cache01.home.2rjus.net:9100"
|
||||
"ns1.home.2rjus.net:9100"
|
||||
"ns2.home.2rjus.net:9100"
|
||||
"pgdb1.home.2rjus.net:9100"
|
||||
"nats1.home.2rjus.net:9100"
|
||||
];
|
||||
targets = nodeExporterTargets;
|
||||
}
|
||||
];
|
||||
}
|
||||
# Systemd exporter on all hosts (same targets, different port)
|
||||
{
|
||||
job_name = "systemd-exporter";
|
||||
static_configs = [
|
||||
{
|
||||
targets = map (t: builtins.replaceStrings [":9100"] [":9558"] t) nodeExporterTargets;
|
||||
}
|
||||
];
|
||||
}
|
||||
# Local monitoring services (not auto-generated)
|
||||
{
|
||||
job_name = "prometheus";
|
||||
static_configs = [
|
||||
@@ -85,7 +159,7 @@
|
||||
job_name = "grafana";
|
||||
static_configs = [
|
||||
{
|
||||
targets = [ "localhost:3100" ];
|
||||
targets = [ "localhost:3000" ];
|
||||
}
|
||||
];
|
||||
}
|
||||
@@ -98,13 +172,35 @@
|
||||
];
|
||||
}
|
||||
{
|
||||
job_name = "restic_rest";
|
||||
job_name = "pushgateway";
|
||||
honor_labels = true;
|
||||
static_configs = [
|
||||
{
|
||||
targets = [ "10.69.12.52:8000" ];
|
||||
targets = [ "localhost:9091" ];
|
||||
}
|
||||
];
|
||||
}
|
||||
{
|
||||
job_name = "labmon";
|
||||
static_configs = [
|
||||
{
|
||||
targets = [ "monitoring01.home.2rjus.net:9969" ];
|
||||
}
|
||||
];
|
||||
}
|
||||
# TODO: nix-cache_caddy can't be auto-generated because the cert is issued
|
||||
# for nix-cache.home.2rjus.net (service CNAME), not nix-cache01 (hostname).
|
||||
# Consider adding a target override to homelab.monitoring.scrapeTargets.
|
||||
{
|
||||
job_name = "nix-cache_caddy";
|
||||
scheme = "https";
|
||||
static_configs = [
|
||||
{
|
||||
targets = [ "nix-cache.home.2rjus.net" ];
|
||||
}
|
||||
];
|
||||
}
|
||||
# pve-exporter with complex relabel config
|
||||
{
|
||||
job_name = "pve-exporter";
|
||||
static_configs = [
|
||||
@@ -133,91 +229,24 @@
|
||||
}
|
||||
];
|
||||
}
|
||||
# OpenBao metrics with bearer token auth
|
||||
{
|
||||
job_name = "caddy";
|
||||
static_configs = [
|
||||
{
|
||||
targets = [ "http-proxy.home.2rjus.net" ];
|
||||
}
|
||||
];
|
||||
}
|
||||
{
|
||||
job_name = "jellyfin";
|
||||
static_configs = [
|
||||
{
|
||||
targets = [ "jelly01.home.2rjus.net:8096" ];
|
||||
}
|
||||
];
|
||||
}
|
||||
{
|
||||
job_name = "smartctl";
|
||||
static_configs = [
|
||||
{
|
||||
targets = [ "gunter.home.2rjus.net:9633" ];
|
||||
}
|
||||
];
|
||||
}
|
||||
{
|
||||
job_name = "wireguard";
|
||||
static_configs = [
|
||||
{
|
||||
targets = [ "http-proxy.home.2rjus.net:9586" ];
|
||||
}
|
||||
];
|
||||
}
|
||||
{
|
||||
job_name = "home-assistant";
|
||||
scrape_interval = "60s";
|
||||
metrics_path = "/api/prometheus";
|
||||
static_configs = [
|
||||
{
|
||||
targets = [ "ha1.home.2rjus.net:8123" ];
|
||||
}
|
||||
];
|
||||
}
|
||||
{
|
||||
job_name = "ghettoptt";
|
||||
static_configs = [
|
||||
{
|
||||
targets = [ "gunter.home.2rjus.net:8989" ];
|
||||
}
|
||||
];
|
||||
}
|
||||
{
|
||||
job_name = "step-ca";
|
||||
static_configs = [
|
||||
{
|
||||
targets = [ "ca.home.2rjus.net:9000" ];
|
||||
}
|
||||
];
|
||||
}
|
||||
{
|
||||
job_name = "labmon";
|
||||
static_configs = [
|
||||
{
|
||||
targets = [ "monitoring01.home.2rjus.net:9969" ];
|
||||
}
|
||||
];
|
||||
}
|
||||
{
|
||||
job_name = "pushgateway";
|
||||
honor_labels = true;
|
||||
static_configs = [
|
||||
{
|
||||
targets = [ "localhost:9091" ];
|
||||
}
|
||||
];
|
||||
}
|
||||
{
|
||||
job_name = "nix-cache_caddy";
|
||||
job_name = "openbao";
|
||||
scheme = "https";
|
||||
static_configs = [
|
||||
{
|
||||
targets = [ "nix-cache.home.2rjus.net" ];
|
||||
}
|
||||
];
|
||||
metrics_path = "/v1/sys/metrics";
|
||||
params = {
|
||||
format = [ "prometheus" ];
|
||||
};
|
||||
static_configs = [{
|
||||
targets = [ "vault01.home.2rjus.net:8200" ];
|
||||
}];
|
||||
authorization = {
|
||||
type = "Bearer";
|
||||
credentials_file = "/run/secrets/prometheus/openbao-token";
|
||||
};
|
||||
}
|
||||
];
|
||||
] ++ autoScrapeConfigs;
|
||||
|
||||
pushgateway = {
|
||||
enable = true;
|
||||
web = {
|
||||
|
||||
@@ -1,14 +1,16 @@
|
||||
{ config, ... }:
|
||||
{
|
||||
sops.secrets.pve_exporter = {
|
||||
format = "yaml";
|
||||
sopsFile = ../../secrets/monitoring01/pve-exporter.yaml;
|
||||
key = "";
|
||||
vault.secrets.pve-exporter = {
|
||||
secretPath = "hosts/monitoring01/pve-exporter";
|
||||
extractKey = "config";
|
||||
outputDir = "/run/secrets/pve_exporter";
|
||||
mode = "0444";
|
||||
services = [ "prometheus-pve-exporter" ];
|
||||
};
|
||||
|
||||
services.prometheus.exporters.pve = {
|
||||
enable = true;
|
||||
configFile = config.sops.secrets.pve_exporter.path;
|
||||
configFile = "/run/secrets/pve_exporter";
|
||||
collectors = {
|
||||
cluster = false;
|
||||
replication = false;
|
||||
|
||||
@@ -18,13 +18,21 @@ groups:
|
||||
summary: "Disk space low on {{ $labels.instance }}"
|
||||
description: "Disk space is low on {{ $labels.instance }}. Please check."
|
||||
- alert: high_cpu_load
|
||||
expr: max(node_load5{}) by (instance) > (count by (instance)(node_cpu_seconds_total{mode="idle"}) * 0.7)
|
||||
expr: max(node_load5{instance!="nix-cache01.home.2rjus.net:9100"}) by (instance) > (count by (instance)(node_cpu_seconds_total{instance!="nix-cache01.home.2rjus.net:9100", mode="idle"}) * 0.7)
|
||||
for: 15m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "High CPU load on {{ $labels.instance }}"
|
||||
description: "CPU load is high on {{ $labels.instance }}. Please check."
|
||||
- alert: high_cpu_load
|
||||
expr: max(node_load5{instance="nix-cache01.home.2rjus.net:9100"}) by (instance) > (count by (instance)(node_cpu_seconds_total{instance="nix-cache01.home.2rjus.net:9100", mode="idle"}) * 0.7)
|
||||
for: 2h
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "High CPU load on {{ $labels.instance }}"
|
||||
description: "CPU load is high on {{ $labels.instance }}. Please check."
|
||||
- alert: low_memory
|
||||
expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
|
||||
for: 2m
|
||||
@@ -57,6 +65,38 @@ groups:
|
||||
annotations:
|
||||
summary: "Promtail service not running on {{ $labels.instance }}"
|
||||
description: "The promtail service has not been active on {{ $labels.instance }} for 5 minutes."
|
||||
- alert: filesystem_filling_up
|
||||
expr: predict_linear(node_filesystem_free_bytes{mountpoint="/"}[6h], 24*3600) < 0
|
||||
for: 1h
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Filesystem predicted to fill within 24h on {{ $labels.instance }}"
|
||||
description: "Based on the last 6h trend, the root filesystem on {{ $labels.instance }} is predicted to run out of space within 24 hours."
|
||||
- alert: systemd_not_running
|
||||
expr: node_systemd_system_running == 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "Systemd not in running state on {{ $labels.instance }}"
|
||||
description: "Systemd is not in running state on {{ $labels.instance }}. The system may be in a degraded state."
|
||||
- alert: high_file_descriptors
|
||||
expr: node_filefd_allocated / node_filefd_maximum > 0.8
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "High file descriptor usage on {{ $labels.instance }}"
|
||||
description: "More than 80% of file descriptors are in use on {{ $labels.instance }}."
|
||||
- alert: host_reboot
|
||||
expr: changes(node_boot_time_seconds[10m]) > 0
|
||||
for: 0m
|
||||
labels:
|
||||
severity: info
|
||||
annotations:
|
||||
summary: "Host {{ $labels.instance }} has rebooted"
|
||||
description: "Host {{ $labels.instance }} has rebooted."
|
||||
- name: nameserver_rules
|
||||
rules:
|
||||
- alert: unbound_down
|
||||
@@ -75,7 +115,15 @@ groups:
|
||||
annotations:
|
||||
summary: "NSD not running on {{ $labels.instance }}"
|
||||
description: "NSD has been down on {{ $labels.instance }} more than 5 minutes."
|
||||
- name: http-proxy_rules
|
||||
- alert: unbound_low_cache_hit_ratio
|
||||
expr: (rate(unbound_cache_hits_total[5m]) / (rate(unbound_cache_hits_total[5m]) + rate(unbound_cache_misses_total[5m]))) < 0.5
|
||||
for: 15m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Low DNS cache hit ratio on {{ $labels.instance }}"
|
||||
description: "Unbound cache hit ratio is below 50% on {{ $labels.instance }}."
|
||||
- name: http_proxy_rules
|
||||
rules:
|
||||
- alert: caddy_down
|
||||
expr: node_systemd_unit_state {instance="http-proxy.home.2rjus.net:9100", name = "caddy.service", state = "active"} == 0
|
||||
@@ -85,6 +133,22 @@ groups:
|
||||
annotations:
|
||||
summary: "Caddy not running on {{ $labels.instance }}"
|
||||
description: "Caddy has been down on {{ $labels.instance }} more than 5 minutes."
|
||||
- alert: caddy_upstream_unhealthy
|
||||
expr: caddy_reverse_proxy_upstreams_healthy == 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Caddy upstream unhealthy for {{ $labels.upstream }}"
|
||||
description: "Caddy reverse proxy upstream {{ $labels.upstream }} is unhealthy on {{ $labels.instance }}."
|
||||
- alert: caddy_high_error_rate
|
||||
expr: rate(caddy_http_request_errors_total[5m]) > 1
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "High HTTP error rate on {{ $labels.instance }}"
|
||||
description: "Caddy is experiencing a high rate of HTTP errors on {{ $labels.instance }}."
|
||||
- name: nats_rules
|
||||
rules:
|
||||
- alert: nats_down
|
||||
@@ -95,9 +159,17 @@ groups:
|
||||
annotations:
|
||||
summary: "NATS not running on {{ $labels.instance }}"
|
||||
description: "NATS has been down on {{ $labels.instance }} more than 5 minutes."
|
||||
- alert: nats_slow_consumers
|
||||
expr: nats_core_slow_consumer_count > 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "NATS has slow consumers on {{ $labels.instance }}"
|
||||
description: "NATS has {{ $value }} slow consumers on {{ $labels.instance }}."
|
||||
- name: nix_cache_rules
|
||||
rules:
|
||||
- alert: build-flakes_service_not_active_recently
|
||||
- alert: build_flakes_service_not_active_recently
|
||||
expr: count_over_time(node_systemd_unit_state{instance="nix-cache01.home.2rjus.net:9100", name="build-flakes.service", state="active"}[1h]) < 1
|
||||
for: 0m
|
||||
labels:
|
||||
@@ -138,7 +210,7 @@ groups:
|
||||
annotations:
|
||||
summary: "Home assistant not running on {{ $labels.instance }}"
|
||||
description: "Home assistant has been down on {{ $labels.instance }} more than 5 minutes."
|
||||
- alert: zigbee2qmtt_down
|
||||
- alert: zigbee2mqtt_down
|
||||
expr: node_systemd_unit_state {instance = "ha1.home.2rjus.net:9100", name = "zigbee2mqtt.service", state = "active"} == 0
|
||||
for: 5m
|
||||
labels:
|
||||
@@ -154,9 +226,17 @@ groups:
|
||||
annotations:
|
||||
summary: "Mosquitto not running on {{ $labels.instance }}"
|
||||
description: "Mosquitto has been down on {{ $labels.instance }} more than 5 minutes."
|
||||
- alert: zigbee_sensor_stale
|
||||
expr: (time() - hass_last_updated_time_seconds{entity=~"sensor\\.(0x[0-9a-f]+|temp_server)_temperature"}) > 7200
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Zigbee sensor {{ $labels.friendly_name }} is stale"
|
||||
description: "Zigbee temperature sensor {{ $labels.entity }} has not reported data for over 2 hours. The sensor may have a dead battery or connectivity issues."
|
||||
- name: smartctl_rules
|
||||
rules:
|
||||
- alert: SmartCriticalWarning
|
||||
- alert: smart_critical_warning
|
||||
expr: smartctl_device_critical_warning > 0
|
||||
for: 0m
|
||||
labels:
|
||||
@@ -164,7 +244,7 @@ groups:
|
||||
annotations:
|
||||
summary: SMART critical warning (instance {{ $labels.instance }})
|
||||
description: "Disk controller has critical warning on {{ $labels.instance }} drive {{ $labels.device }})\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
|
||||
- alert: SmartMediaErrors
|
||||
- alert: smart_media_errors
|
||||
expr: smartctl_device_media_errors > 0
|
||||
for: 0m
|
||||
labels:
|
||||
@@ -172,7 +252,7 @@ groups:
|
||||
annotations:
|
||||
summary: SMART media errors (instance {{ $labels.instance }})
|
||||
description: "Disk controller detected media errors on {{ $labels.instance }} drive {{ $labels.device }})\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
|
||||
- alert: SmartWearoutIndicator
|
||||
- alert: smart_wearout_indicator
|
||||
expr: smartctl_device_available_spare < smartctl_device_available_spare_threshold
|
||||
for: 0m
|
||||
labels:
|
||||
@@ -180,20 +260,29 @@ groups:
|
||||
annotations:
|
||||
summary: SMART Wearout Indicator (instance {{ $labels.instance }})
|
||||
description: "Device is wearing out on {{ $labels.instance }} drive {{ $labels.device }})\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
|
||||
- alert: smartctl_high_temperature
|
||||
expr: smartctl_device_temperature > 60
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Disk temperature above 60C on {{ $labels.instance }}"
|
||||
description: "Disk {{ $labels.device }} on {{ $labels.instance }} has temperature {{ $value }}C."
|
||||
- name: wireguard_rules
|
||||
rules:
|
||||
- alert: WireguardHandshake
|
||||
expr: (time() - wireguard_latest_handshake_seconds{instance="http-proxy.home.2rjus.net:9586",interface="wg0",public_key="32Rb13wExcy8uI92JTnFdiOfkv0mlQ6f181WA741DHs="}) > 300
|
||||
- alert: wireguard_handshake_timeout
|
||||
expr: (time() - wireguard_latest_handshake_seconds{interface="wg0"}) > 300
|
||||
for: 1m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Wireguard handshake timeout on {{ $labels.instance }}"
|
||||
description: "Wireguard handshake timeout on {{ $labels.instance }} for more than 1 minutes."
|
||||
description: "Wireguard handshake timeout on {{ $labels.instance }} for peer {{ $labels.public_key }}."
|
||||
- name: monitoring_rules
|
||||
rules:
|
||||
- alert: prometheus_not_running
|
||||
expr: node_systemd_unit_state{instance="monitoring01.home.2rjus.net:9100", name="prometheus.service", state="active"} == 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
@@ -201,6 +290,7 @@ groups:
|
||||
description: "Prometheus service not running on {{ $labels.instance }}"
|
||||
- alert: alertmanager_not_running
|
||||
expr: node_systemd_unit_state{instance="monitoring01.home.2rjus.net:9100", name="alertmanager.service", state="active"} == 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
@@ -208,13 +298,7 @@ groups:
|
||||
description: "Alertmanager service not running on {{ $labels.instance }}"
|
||||
- alert: pushgateway_not_running
|
||||
expr: node_systemd_unit_state{instance="monitoring01.home.2rjus.net:9100", name="pushgateway.service", state="active"} == 0
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "Pushgateway service not running on {{ $labels.instance }}"
|
||||
description: "Pushgateway service not running on {{ $labels.instance }}"
|
||||
- alert: pushgateway_not_running
|
||||
expr: node_systemd_unit_state{instance="monitoring01.home.2rjus.net:9100", name="pushgateway.service", state="active"} == 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
@@ -222,6 +306,7 @@ groups:
|
||||
description: "Pushgateway service not running on {{ $labels.instance }}"
|
||||
- alert: loki_not_running
|
||||
expr: node_systemd_unit_state{instance="monitoring01.home.2rjus.net:9100", name="loki.service", state="active"} == 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
@@ -229,6 +314,7 @@ groups:
|
||||
description: "Loki service not running on {{ $labels.instance }}"
|
||||
- alert: grafana_not_running
|
||||
expr: node_systemd_unit_state{instance="monitoring01.home.2rjus.net:9100", name="grafana.service", state="active"} == 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
@@ -236,6 +322,7 @@ groups:
|
||||
description: "Grafana service not running on {{ $labels.instance }}"
|
||||
- alert: tempo_not_running
|
||||
expr: node_systemd_unit_state{instance="monitoring01.home.2rjus.net:9100", name="tempo.service", state="active"} == 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
@@ -243,8 +330,123 @@ groups:
|
||||
description: "Tempo service not running on {{ $labels.instance }}"
|
||||
- alert: pyroscope_not_running
|
||||
expr: node_systemd_unit_state{instance="monitoring01.home.2rjus.net:9100", name="podman-pyroscope.service", state="active"} == 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Pyroscope service not running on {{ $labels.instance }}"
|
||||
description: "Pyroscope service not running on {{ $labels.instance }}"
|
||||
- name: certificate_rules
|
||||
rules:
|
||||
- alert: certificate_expiring_soon
|
||||
expr: labmon_tlsconmon_certificate_seconds_left{address!="ca.home.2rjus.net:443"} < 86400
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "TLS certificate expiring soon for {{ $labels.instance }}"
|
||||
description: "TLS certificate for {{ $labels.address }} is expiring within 24 hours."
|
||||
- alert: step_ca_serving_cert_expiring
|
||||
expr: labmon_tlsconmon_certificate_seconds_left{address="ca.home.2rjus.net:443"} < 3600
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "Step-CA serving certificate expiring"
|
||||
description: "The step-ca serving certificate (24h auto-renewed) has less than 1 hour of validity left. Renewal may have failed."
|
||||
- alert: certificate_check_error
|
||||
expr: labmon_tlsconmon_certificate_check_error == 1
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Error checking certificate for {{ $labels.address }}"
|
||||
description: "Certificate check is failing for {{ $labels.address }} on {{ $labels.instance }}."
|
||||
- alert: step_ca_certificate_expiring
|
||||
expr: labmon_stepmon_certificate_seconds_left < 3600
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "Step-CA certificate expiring for {{ $labels.instance }}"
|
||||
description: "Step-CA certificate is expiring within 1 hour on {{ $labels.instance }}."
|
||||
- name: proxmox_rules
|
||||
rules:
|
||||
- alert: pve_node_down
|
||||
expr: pve_up{id=~"node/.*"} == 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "Proxmox node {{ $labels.id }} is down"
|
||||
description: "Proxmox node {{ $labels.id }} has been down for more than 5 minutes."
|
||||
- alert: pve_guest_stopped
|
||||
expr: pve_up{id=~"qemu/.*"} == 0 and pve_onboot_status == 1
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Proxmox VM {{ $labels.id }} is stopped"
|
||||
description: "Proxmox VM {{ $labels.id }} ({{ $labels.name }}) has onboot=1 but is stopped."
|
||||
- name: postgres_rules
|
||||
rules:
|
||||
- alert: postgres_down
|
||||
expr: node_systemd_unit_state{instance="pgdb1.home.2rjus.net:9100", name="postgresql.service", state="active"} == 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "PostgreSQL not running on {{ $labels.instance }}"
|
||||
description: "PostgreSQL has been down on {{ $labels.instance }} more than 5 minutes."
|
||||
- alert: postgres_exporter_down
|
||||
expr: up{job="postgres"} == 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "PostgreSQL exporter down on {{ $labels.instance }}"
|
||||
description: "Cannot scrape PostgreSQL metrics from {{ $labels.instance }}."
|
||||
- alert: postgres_high_connections
|
||||
expr: pg_stat_activity_count / pg_settings_max_connections > 0.8
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "PostgreSQL connection pool near exhaustion on {{ $labels.instance }}"
|
||||
description: "PostgreSQL is using over 80% of max_connections on {{ $labels.instance }}."
|
||||
- name: jellyfin_rules
|
||||
rules:
|
||||
- alert: jellyfin_down
|
||||
expr: up{job="jellyfin"} == 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Jellyfin not responding on {{ $labels.instance }}"
|
||||
description: "Cannot scrape Jellyfin metrics from {{ $labels.instance }} for 5 minutes."
|
||||
- name: vault_rules
|
||||
rules:
|
||||
- alert: openbao_down
|
||||
expr: node_systemd_unit_state{instance="vault01.home.2rjus.net:9100", name="openbao.service", state="active"} == 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "OpenBao not running on {{ $labels.instance }}"
|
||||
description: "OpenBao has been down on {{ $labels.instance }} more than 5 minutes."
|
||||
- alert: openbao_sealed
|
||||
expr: vault_core_unsealed == 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "OpenBao is sealed on {{ $labels.instance }}"
|
||||
description: "OpenBao has been sealed on {{ $labels.instance }} for more than 5 minutes."
|
||||
- alert: openbao_scrape_down
|
||||
expr: up{job="openbao"} == 0
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Cannot scrape OpenBao metrics from {{ $labels.instance }}"
|
||||
description: "OpenBao metrics endpoint is not responding on {{ $labels.instance }}."
|
||||
|
||||
@@ -1,10 +1,26 @@
|
||||
{ ... }:
|
||||
{
|
||||
homelab.monitoring.scrapeTargets = [{
|
||||
job_name = "nats";
|
||||
port = 7777;
|
||||
}];
|
||||
|
||||
services.prometheus.exporters.nats = {
|
||||
enable = true;
|
||||
url = "http://localhost:8222";
|
||||
extraFlags = [
|
||||
"-varz" # General server info
|
||||
"-connz" # Connection info
|
||||
"-jsz=all" # JetStream info
|
||||
];
|
||||
};
|
||||
|
||||
services.nats = {
|
||||
enable = true;
|
||||
jetstream = true;
|
||||
serverName = "nats1";
|
||||
settings = {
|
||||
http_port = 8222;
|
||||
accounts = {
|
||||
ADMIN = {
|
||||
users = [
|
||||
|
||||
@@ -6,4 +6,5 @@
|
||||
./proxy.nix
|
||||
./nix.nix
|
||||
];
|
||||
|
||||
}
|
||||
|
||||
@@ -1,14 +1,16 @@
|
||||
{ pkgs, config, ... }:
|
||||
{
|
||||
sops.secrets."cache-secret" = {
|
||||
sopsFile = ../../secrets/nix-cache01/cache-secret;
|
||||
format = "binary";
|
||||
vault.secrets.cache-secret = {
|
||||
secretPath = "hosts/nix-cache01/cache-secret";
|
||||
extractKey = "key";
|
||||
outputDir = "/run/secrets/cache-secret";
|
||||
services = [ "harmonia" ];
|
||||
};
|
||||
|
||||
services.harmonia = {
|
||||
enable = true;
|
||||
package = pkgs.unstable.harmonia;
|
||||
signKeyPaths = [ config.sops.secrets.cache-secret.path ];
|
||||
signKeyPaths = [ "/run/secrets/cache-secret" ];
|
||||
};
|
||||
systemd.services.harmonia = {
|
||||
environment.RUST_LOG = "info,actix_web=debug";
|
||||
|
||||
33
services/ns/external-hosts.nix
Normal file
@@ -0,0 +1,33 @@
|
||||
# DNS records for hosts not managed by this flake
|
||||
# These are manually maintained and combined with auto-generated records
|
||||
{
|
||||
aRecords = {
|
||||
# 10
|
||||
"gw" = "10.69.10.1";
|
||||
|
||||
# 12_CORE
|
||||
"nas" = "10.69.12.50";
|
||||
"nzbget-jail" = "10.69.12.51";
|
||||
"restic" = "10.69.12.52";
|
||||
"radarr-jail" = "10.69.12.53";
|
||||
"sonarr-jail" = "10.69.12.54";
|
||||
"bazarr" = "10.69.12.55";
|
||||
"pve1" = "10.69.12.75";
|
||||
"inc1" = "10.69.12.80";
|
||||
|
||||
# 22_WLAN
|
||||
"unifi-ctrl" = "10.69.22.5";
|
||||
|
||||
# 30
|
||||
"gunter" = "10.69.30.105";
|
||||
|
||||
# 31
|
||||
"media" = "10.69.31.50";
|
||||
|
||||
# 99_MGMT
|
||||
"sw1" = "10.69.99.2";
|
||||
};
|
||||
|
||||
cnames = {
|
||||
};
|
||||
}
|
||||
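The ns1/ns2 hunks below feed this attrset, together with the flake's own hosts, into `dnsLib.generateZone` from `lib/dns-zone.nix`, which is not part of this excerpt. A rough sketch of what such a generator could look like; how per-host addresses are read out of `self` is an assumption (a hypothetical `config.homelab.ipv4` is used here), and only the record types visible in the old zone file are produced:

```nix
{ lib }:
{
  generateZone = { self, externalHosts, serial, domain }:
    let
      # Hypothetical: assume each host config exposes its address as config.homelab.ipv4.
      flakeAddrs = lib.filterAttrs (_: ip: ip != null)
        (lib.mapAttrs (_: machine: machine.config.homelab.ipv4 or null)
          self.nixosConfigurations);
      aRecords = flakeAddrs // externalHosts.aRecords;
      aLines = lib.mapAttrsToList (name: ip: "${name} IN A ${ip}") aRecords;
      cnameLines = lib.mapAttrsToList (name: target: "${name} IN CNAME ${target}")
        externalHosts.cnames;
    in ''
      $ORIGIN ${domain}.
      $TTL 1800
      @ IN SOA ns1.${domain}. admin.${domain}. (
        ${toString serial} ; serial
        3600 ; refresh
        900 ; retry
        1209600 ; expire
        120 ; ttl
      )
      IN NS ns1.${domain}.
      IN NS ns2.${domain}.
      ${lib.concatStringsSep "\n" (aLines ++ cnameLines)}
    '';
}
```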
@@ -1,7 +1,22 @@
|
||||
{ ... }:
|
||||
{ self, lib, ... }:
|
||||
let
|
||||
dnsLib = import ../../lib/dns-zone.nix { inherit lib; };
|
||||
externalHosts = import ./external-hosts.nix;
|
||||
|
||||
# Generate zone from flake hosts + external hosts
|
||||
# Use lastModified from git commit as serial number
|
||||
zoneData = dnsLib.generateZone {
|
||||
inherit self externalHosts;
|
||||
serial = self.sourceInfo.lastModified;
|
||||
domain = "home.2rjus.net";
|
||||
};
|
||||
in
|
||||
{
|
||||
sops.secrets.ns_xfer_key = {
|
||||
path = "/etc/nsd/xfer.key";
|
||||
vault.secrets.ns-xfer-key = {
|
||||
secretPath = "shared/dns/xfer-key";
|
||||
extractKey = "key";
|
||||
outputDir = "/etc/nsd/xfer.key";
|
||||
services = [ "nsd" ];
|
||||
};
|
||||
|
||||
networking.firewall.allowedTCPPorts = [ 8053 ];
|
||||
@@ -26,7 +41,7 @@
|
||||
"home.2rjus.net" = {
|
||||
provideXFR = [ "10.69.13.6 xferkey" ];
|
||||
notify = [ "10.69.13.6@8053 xferkey" ];
|
||||
data = builtins.readFile ./zones-home-2rjus-net.conf;
|
||||
data = zoneData;
|
||||
};
|
||||
};
|
||||
};
|
||||
|
||||
@@ -1,10 +1,24 @@
|
||||
{ pkgs, ... }: {
|
||||
homelab.monitoring.scrapeTargets = [{
|
||||
job_name = "unbound";
|
||||
port = 9167;
|
||||
}];
|
||||
|
||||
networking.firewall.allowedTCPPorts = [
|
||||
53
|
||||
];
|
||||
networking.firewall.allowedUDPPorts = [
|
||||
53
|
||||
];
|
||||
|
||||
services.prometheus.exporters.unbound = {
|
||||
enable = true;
|
||||
unbound.host = "unix:///run/unbound/unbound.ctl";
|
||||
};
|
||||
|
||||
# Grant exporter access to unbound socket
|
||||
systemd.services.prometheus-unbound-exporter.serviceConfig.SupplementaryGroups = [ "unbound" ];
|
||||
|
||||
services.unbound = {
|
||||
enable = true;
|
||||
|
||||
@@ -23,6 +37,11 @@
|
||||
do-ip6 = "no";
|
||||
do-udp = "yes";
|
||||
do-tcp = "yes";
|
||||
extended-statistics = true;
|
||||
};
|
||||
remote-control = {
|
||||
control-enable = true;
|
||||
control-interface = "/run/unbound/unbound.ctl";
|
||||
};
|
||||
stub-zone = {
|
||||
name = "home.2rjus.net";
|
||||
|
||||
@@ -1,7 +1,22 @@
|
||||
{ ... }:
|
||||
{ self, lib, ... }:
|
||||
let
|
||||
dnsLib = import ../../lib/dns-zone.nix { inherit lib; };
|
||||
externalHosts = import ./external-hosts.nix;
|
||||
|
||||
# Generate zone from flake hosts + external hosts
|
||||
# Used as initial zone data before first AXFR completes
|
||||
zoneData = dnsLib.generateZone {
|
||||
inherit self externalHosts;
|
||||
serial = self.sourceInfo.lastModified;
|
||||
domain = "home.2rjus.net";
|
||||
};
|
||||
in
|
||||
{
|
||||
sops.secrets.ns_xfer_key = {
|
||||
path = "/etc/nsd/xfer.key";
|
||||
vault.secrets.ns-xfer-key = {
|
||||
secretPath = "shared/dns/xfer-key";
|
||||
extractKey = "key";
|
||||
outputDir = "/etc/nsd/xfer.key";
|
||||
services = [ "nsd" ];
|
||||
};
|
||||
networking.firewall.allowedTCPPorts = [ 8053 ];
|
||||
networking.firewall.allowedUDPPorts = [ 8053 ];
|
||||
@@ -24,7 +39,7 @@
|
||||
"home.2rjus.net" = {
|
||||
allowNotify = [ "10.69.13.5 xferkey" ];
|
||||
requestXFR = [ "AXFR 10.69.13.5@8053 xferkey" ];
|
||||
data = builtins.readFile ./zones-home-2rjus-net.conf;
|
||||
data = zoneData;
|
||||
};
|
||||
};
|
||||
};
|
||||
|
||||
@@ -1,97 +0,0 @@
|
||||
$ORIGIN home.2rjus.net.
|
||||
$TTL 1800
|
||||
@ IN SOA ns1.home.2rjus.net. admin.test.2rjus.net. (
|
||||
2064 ; serial number
|
||||
3600 ; refresh
|
||||
900 ; retry
|
||||
1209600 ; expire
|
||||
120 ; ttl
|
||||
)
|
||||
|
||||
IN NS ns1.home.2rjus.net.
|
||||
IN NS ns2.home.2rjus.net.
|
||||
IN NS ns3.home.2rjus.net.
|
||||
|
||||
; 8_k8s
|
||||
kube-blue1 IN A 10.69.8.150
|
||||
kube-blue2 IN A 10.69.8.151
|
||||
kube-blue3 IN A 10.69.8.152
|
||||
|
||||
kube-blue4 IN A 10.69.8.153
|
||||
rook IN CNAME kube-blue4
|
||||
|
||||
kube-blue5 IN A 10.69.8.154
|
||||
git IN CNAME kube-blue5
|
||||
|
||||
kube-blue6 IN A 10.69.8.155
|
||||
kube-blue7 IN A 10.69.8.156
|
||||
kube-blue8 IN A 10.69.8.157
|
||||
kube-blue9 IN A 10.69.8.158
|
||||
kube-blue10 IN A 10.69.8.159
|
||||
|
||||
; 10
|
||||
gw IN A 10.69.10.1
|
||||
|
||||
; 12_CORE
|
||||
virt-mini1 IN A 10.69.12.11
|
||||
nas IN A 10.69.12.50
|
||||
nzbget-jail IN A 10.69.12.51
|
||||
restic IN A 10.69.12.52
|
||||
radarr-jail IN A 10.69.12.53
|
||||
sonarr-jail IN A 10.69.12.54
|
||||
bazarr IN A 10.69.12.55
|
||||
mpnzb IN A 10.69.12.57
|
||||
pve1 IN A 10.69.12.75
|
||||
inc1 IN A 10.69.12.80
|
||||
inc2 IN A 10.69.12.81
|
||||
media1 IN A 10.69.12.82
|
||||
|
||||
; 13_SVC
|
||||
ns1 IN A 10.69.13.5
|
||||
ns2 IN A 10.69.13.6
|
||||
ns3 IN A 10.69.13.7
|
||||
ns4 IN A 10.69.13.8
|
||||
ha1 IN A 10.69.13.9
|
||||
nixos-test1 IN A 10.69.13.10
|
||||
http-proxy IN A 10.69.13.11
|
||||
ca IN A 10.69.13.12
|
||||
monitoring01 IN A 10.69.13.13
|
||||
jelly01 IN A 10.69.13.14
|
||||
nix-cache01 IN A 10.69.13.15
|
||||
nix-cache IN CNAME nix-cache01
|
||||
actions1 IN CNAME nix-cache01
|
||||
pgdb1 IN A 10.69.13.16
|
||||
nats1 IN A 10.69.13.17
|
||||
auth01 IN A 10.69.13.18
|
||||
vault01 IN A 10.69.13.19
|
||||
|
||||
; http-proxy cnames
|
||||
nzbget IN CNAME http-proxy
|
||||
radarr IN CNAME http-proxy
|
||||
sonarr IN CNAME http-proxy
|
||||
ha IN CNAME http-proxy
|
||||
z2m IN CNAME http-proxy
|
||||
grafana IN CNAME http-proxy
|
||||
prometheus IN CNAME http-proxy
|
||||
alertmanager IN CNAME http-proxy
|
||||
jelly IN CNAME http-proxy
|
||||
auth IN CNAME http-proxy
|
||||
lldap IN CNAME http-proxy
|
||||
pyroscope IN CNAME http-proxy
|
||||
pushgw IN CNAME http-proxy
|
||||
|
||||
ldap IN CNAME auth01
|
||||
|
||||
|
||||
; 22_WLAN
|
||||
unifi-ctrl IN A 10.69.22.5
|
||||
|
||||
; 30
|
||||
gunter IN A 10.69.30.105
|
||||
|
||||
; 31
|
||||
media IN A 10.69.31.50
|
||||
|
||||
; 99_MGMT
|
||||
sw1 IN A 10.69.99.2
|
||||
testing IN A 10.69.33.33
|
||||
@@ -1,5 +1,15 @@
|
||||
{ pkgs, ... }:
|
||||
{
|
||||
homelab.monitoring.scrapeTargets = [{
|
||||
job_name = "postgres";
|
||||
port = 9187;
|
||||
}];
|
||||
|
||||
services.prometheus.exporters.postgres = {
|
||||
enable = true;
|
||||
runAsLocalSuperUser = true; # Use peer auth as postgres user
|
||||
};
|
||||
|
||||
services.postgresql = {
|
||||
enable = true;
|
||||
enableJIT = true;
|
||||
|
||||
@@ -77,14 +77,100 @@ let
|
||||
fi
|
||||
'';
|
||||
};
|
||||
|
||||
  bootstrapCertScript = pkgs.writeShellApplication {
    name = "bootstrap-vault-cert";
    runtimeInputs = with pkgs; [
      openbao
      jq
      openssl
      coreutils
    ];
    text = ''
      # Bootstrap vault01 with a proper certificate from its own PKI
      # This solves the chicken-and-egg problem where ACME clients can't trust
      # vault01's self-signed certificate.

      echo "=== Bootstrapping vault01 certificate ==="

      # Use Unix socket to avoid TLS issues
      export BAO_ADDR='unix:///run/openbao/openbao.sock'

      # ACME certificate directory
      CERT_DIR="/var/lib/acme/vault01.home.2rjus.net"

      # Issue certificate for vault01 with vault as SAN
      echo "Issuing certificate for vault01.home.2rjus.net (with SAN: vault.home.2rjus.net)..."
      OUTPUT=$(bao write -format=json pki_int/issue/homelab \
        common_name="vault01.home.2rjus.net" \
        alt_names="vault.home.2rjus.net" \
        ttl="720h")

      # Create ACME directory structure
      echo "Creating ACME certificate directory..."
      mkdir -p "$CERT_DIR"

      # Extract certificate components to temp files
      echo "$OUTPUT" | jq -r '.data.certificate' > /tmp/vault01-cert.pem
      echo "$OUTPUT" | jq -r '.data.private_key' > /tmp/vault01-key.pem
      echo "$OUTPUT" | jq -r '.data.issuing_ca' > /tmp/vault01-ca.pem

      # Create fullchain (cert + CA)
      cat /tmp/vault01-cert.pem /tmp/vault01-ca.pem > /tmp/vault01-fullchain.pem

      # Backup old certificates if they exist
      if [ -f "$CERT_DIR/fullchain.pem" ]; then
        echo "Backing up old certificate..."
        cp "$CERT_DIR/fullchain.pem" "$CERT_DIR/fullchain.pem.backup"
        cp "$CERT_DIR/key.pem" "$CERT_DIR/key.pem.backup"
      fi

      # Install new certificates
      echo "Installing new certificate..."
      mv /tmp/vault01-fullchain.pem "$CERT_DIR/fullchain.pem"
      mv /tmp/vault01-cert.pem "$CERT_DIR/cert.pem"
      mv /tmp/vault01-ca.pem "$CERT_DIR/chain.pem"
      mv /tmp/vault01-key.pem "$CERT_DIR/key.pem"

      # Set proper ownership and permissions (ACME-style)
      chown -R acme:acme "$CERT_DIR"
      chmod 750 "$CERT_DIR"
      chmod 640 "$CERT_DIR"/*.pem

      echo "Certificate installed successfully!"
      echo ""
      echo "Certificate details:"
      openssl x509 -in "$CERT_DIR/cert.pem" -noout -subject -issuer -dates
      echo ""
      echo "Subject Alternative Names:"
      openssl x509 -in "$CERT_DIR/cert.pem" -noout -ext subjectAltName

      echo ""
      echo "Now restart openbao service:"
      echo " systemctl restart openbao"
      echo ""
      echo "After restart, verify ACME endpoint is accessible:"
      echo " curl https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory"
      echo ""
      echo "Once working, ACME will automatically manage certificate renewals."
    '';
  };
in
{
  # Make bootstrap script available as a command
  environment.systemPackages = [ bootstrapCertScript ];

  services.openbao = {
    enable = true;

    settings = {
      ui = true;

      telemetry = {
        prometheus_retention_time = "60s";
        disable_hostname = true;
      };

      storage.file.path = "/var/lib/openbao";
      listener.default = {
        type = "tcp";
@@ -101,8 +187,8 @@ in

  systemd.services.openbao.serviceConfig = {
    LoadCredential = [
      "key.pem:/var/lib/openbao/key.pem"
      "cert.pem:/var/lib/openbao/cert.pem"
      "key.pem:/var/lib/acme/vault01.home.2rjus.net/key.pem"
      "cert.pem:/var/lib/acme/vault01.home.2rjus.net/fullchain.pem"
    ];
    # TPM2-encrypted unseal key (created manually, see setup instructions)
    LoadCredentialEncrypted = [
@@ -110,5 +196,16 @@ in
    ];
    # Auto-unseal on service start
    ExecStartPost = "${unsealScript}/bin/openbao-unseal";
    # Add openbao user to acme group to read certificates
    SupplementaryGroups = [ "acme" ];
  };

  # ACME certificate management
  # Bootstrapped with bootstrap-vault-cert, now managed by ACME
  security.acme.certs."vault01.home.2rjus.net" = {
    server = "https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory";
    listenHTTP = ":80";
    reloadServices = [ "openbao" ];
    extraDomainNames = [ "vault.home.2rjus.net" ];
  };
}
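The bootstrap script is a one-time workaround for the chicken-and-egg problem it describes; after it runs, ACME takes over renewals via the `security.acme` block above. A minimal operator sketch of that sequence on vault01, assuming the script is on PATH via `environment.systemPackages` as configured:

```bash
# On vault01, as root; assumes the OpenBao unix socket is up and pki_int is configured.
bootstrap-vault-cert

# Pick up the newly installed certificate.
systemctl restart openbao

# Verify the ACME directory answers over the new TLS certificate.
curl https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory
```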
@@ -4,12 +4,15 @@
    ./acme.nix
    ./autoupgrade.nix
    ./monitoring
    ./motd.nix
    ./packages.nix
    ./nix.nix
    ./root-user.nix
    ./root-ca.nix
    ./pki/root-ca.nix
    ./sops.nix
    ./sshd.nix
    ./vault-secrets.nix

    ../modules/homelab
  ];
}
@@ -9,4 +9,13 @@
      "processes"
    ];
  };

  services.prometheus.exporters.systemd = {
    enable = true;
    # Default port: 9558
    extraFlags = [
      "--systemd.collector.enable-restart-count"
      "--systemd.collector.enable-ip-accounting"
    ];
  };
}
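A quick way to confirm the exporter is up once deployed, assuming the default port 9558 noted in the comment and the usual `systemd_` metric prefix (the prefix is an assumption, not shown in this diff):

```bash
# Spot-check the systemd exporter locally on its default port.
curl -s http://localhost:9558/metrics | grep -m5 '^systemd_'
```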
28 system/motd.nix Normal file
@@ -0,0 +1,28 @@
{ config, lib, self, ... }:

let
  hostname = config.networking.hostName;
  domain = config.networking.domain or "";
  fqdn = if domain != "" then "${hostname}.${domain}" else hostname;

  # Get commit hash (handles both clean and dirty trees)
  shortRev = self.shortRev or self.dirtyShortRev or "unknown";

  # Format timestamp from lastModified (Unix timestamp)
  # lastModifiedDate is in format "YYYYMMDDHHMMSS"
  dateStr = self.sourceInfo.lastModifiedDate or "unknown";
  formattedDate = if dateStr != "unknown" then
    "${builtins.substring 0 4 dateStr}-${builtins.substring 4 2 dateStr}-${builtins.substring 6 2 dateStr} ${builtins.substring 8 2 dateStr}:${builtins.substring 10 2 dateStr} UTC"
  else
    "unknown";

  banner = ''
    ####################################
    ${fqdn}
    Commit: ${shortRev} (${formattedDate})
    ####################################
  '';
in
{
  users.motd = lib.mkDefault banner;
}
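The substring slicing turns a flake `lastModifiedDate` such as `20240115123045` into `2024-01-15 12:30 UTC`. The same transformation expressed in shell, purely to illustrate the format (the sample timestamp is made up):

```bash
# Illustration only: mirror the Nix substring logic on a sample timestamp.
d="20240115123045"
echo "${d:0:4}-${d:4:2}-${d:6:2} ${d:8:2}:${d:10:2} UTC"
# -> 2024-01-15 12:30 UTC
```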
@@ -1,8 +1,29 @@
{ lib, ... }:
{ lib, pkgs, ... }:
let
  nixos-rebuild-test = pkgs.writeShellApplication {
    name = "nixos-rebuild-test";
    runtimeInputs = [ pkgs.nixos-rebuild ];
    text = ''
      if [ $# -lt 2 ]; then
        echo "Usage: nixos-rebuild-test <action> <branch>"
        echo "Example: nixos-rebuild-test boot my-feature-branch"
        exit 1
      fi

      action="$1"
      branch="$2"
      shift 2

      exec nixos-rebuild "$action" --flake "git+https://git.t-juice.club/torjus/nixos-servers.git?ref=$branch" "$@"
    '';
  };
in
{
  environment.systemPackages = [ nixos-rebuild-test ];
  nix = {
    gc = {
      automatic = true;
      options = "--delete-older-than 14d";
    };

    optimise = {
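Usage follows directly from the script's own usage message; extra arguments after the action and branch are forwarded to nixos-rebuild unchanged:

```bash
# Build and set a feature branch as the next boot generation.
nixos-rebuild-test boot my-feature-branch

# Anything after <action> <branch> is passed straight through to nixos-rebuild.
nixos-rebuild-test test my-feature-branch --show-trace
```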
@@ -4,6 +4,7 @@
    certificateFiles = [
      "${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt"
      ./root-ca.crt
      ./vault-root-ca.crt
    ];
  };
}
14 system/pki/vault-root-ca.crt Normal file
@@ -0,0 +1,14 @@
-----BEGIN CERTIFICATE-----
MIICIjCCAaigAwIBAgIUQ/Bd/4kNvkPjQjgGLUMynIVzGeAwCgYIKoZIzj0EAwMw
QDELMAkGA1UEBhMCTk8xEDAOBgNVBAoTB0hvbWVsYWIxHzAdBgNVBAMTFmhvbWUu
MnJqdXMubmV0IFJvb3QgQ0EwHhcNMjYwMjAxMjIxODA5WhcNMzYwMTMwMjIxODM5
WjBAMQswCQYDVQQGEwJOTzEQMA4GA1UEChMHSG9tZWxhYjEfMB0GA1UEAxMWaG9t
ZS4ycmp1cy5uZXQgUm9vdCBDQTB2MBAGByqGSM49AgEGBSuBBAAiA2IABH8xhIOl
Nd1Yb1OFhgIJQZM+OkwoFenOQiKfuQ4oPMxaF+fnXdKc77qPDVRjeDy61oGS38X3
CjPOZAzS9kjo7FmVbzdqlYK7ut/OylF+8MJkCT8mFO1xvuzIXhufnyAD4aNjMGEw
DgYDVR0PAQH/BAQDAgEGMA8GA1UdEwEB/wQFMAMBAf8wHQYDVR0OBBYEFEimBeAg
3JVeF4BqdC9hMZ8MYKw2MB8GA1UdIwQYMBaAFEimBeAg3JVeF4BqdC9hMZ8MYKw2
MAoGCCqGSM49BAMDA2gAMGUCMQCvhRElHBra/XyT93SKcG6ZzIG+K+DH3J5jm6Xr
zaGj2VtdhBRVmEKaUcjU7htgSxcCMA9qHKYFcUH72W7By763M6sy8OOiGQNDSERY
VgnNv9rLCvCef1C8G2bYh/sKGZTPGQ==
-----END CERTIFICATE-----
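The pinned root CA can be inspected locally before trusting it system-wide; subject and issuer should both read as the homelab root CA:

```bash
# Show subject, issuer and validity of the checked-in OpenBao root CA.
openssl x509 -in system/pki/vault-root-ca.crt -noout -subject -issuer -dates
```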
@@ -1,11 +1,10 @@
{ pkgs, config, ... }: {
{ pkgs, config, ... }:
{
  programs.zsh.enable = true;
  sops.secrets.root_password_hash = { };
  sops.secrets.root_password_hash.neededForUsers = true;

  users.users.root = {
    shell = pkgs.zsh;
    hashedPasswordFile = config.sops.secrets.root_password_hash.path;
    hashedPassword = "$y$j9T$N09APWqKc4//z9BoGyzSb0$3dMUzojSmo3/10nbIfShd6/IpaYoKdI21bfbWER3jl8";
    openssh.authorizedKeys.keys = [
      "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAwfb2jpKrBnCw28aevnH8HbE5YbcMXpdaVv2KmueDu6 torjus@gunter"
    ];
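The inline hash uses the yescrypt format (`$y$...`). If the password ever needs to be rotated, one way to produce a replacement hash is with the mkpasswd tool from the whois package (tool availability and flag support are assumptions, not part of this diff):

```bash
# Generate a yescrypt hash suitable for users.users.root.hashedPassword.
mkpasswd -m yescrypt
```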
@@ -8,6 +8,48 @@ let
  # Import vault-fetch package
  vault-fetch = pkgs.callPackage ../scripts/vault-fetch { };

  # Helper to create fetch scripts using writeShellApplication
  mkFetchScript = name: secretCfg: pkgs.writeShellApplication {
    name = "fetch-${name}";
    runtimeInputs = [ vault-fetch ];
    text = ''
      # Set Vault environment variables
      export VAULT_ADDR="${cfg.vaultAddress}"
      export VAULT_SKIP_VERIFY="${if cfg.skipTlsVerify then "1" else "0"}"
    '' + (if secretCfg.extractKey != null then ''
      # Fetch to temporary directory, then extract single key
      TMPDIR=$(mktemp -d)
      trap 'rm -rf $TMPDIR' EXIT

      vault-fetch \
        "${secretCfg.secretPath}" \
        "$TMPDIR" \
        "${secretCfg.cacheDir}"

      # Extract the specified key and write as a single file
      if [ ! -f "$TMPDIR/${secretCfg.extractKey}" ]; then
        echo "ERROR: Key '${secretCfg.extractKey}' not found in secret" >&2
        exit 1
      fi

      # Ensure parent directory exists
      mkdir -p "$(dirname "${secretCfg.outputDir}")"
      cp "$TMPDIR/${secretCfg.extractKey}" "${secretCfg.outputDir}"
      chown ${secretCfg.owner}:${secretCfg.group} "${secretCfg.outputDir}"
      chmod ${secretCfg.mode} "${secretCfg.outputDir}"
    '' else ''
      # Fetch secret as directory of files
      vault-fetch \
        "${secretCfg.secretPath}" \
        "${secretCfg.outputDir}" \
        "${secretCfg.cacheDir}"

      # Set ownership and permissions
      chown -R ${secretCfg.owner}:${secretCfg.group} "${secretCfg.outputDir}"
      chmod ${secretCfg.mode} "${secretCfg.outputDir}"/*
    '');
  };

  # Secret configuration type
  secretType = types.submodule ({ name, config, ... }: {
    options = {
@@ -73,6 +115,16 @@ let
      '';
    };

    extractKey = mkOption {
      type = types.nullOr types.str;
      default = null;
      description = ''
        Extract a single key from the vault secret JSON and write it as a
        plain file instead of a directory of files. When set, outputDir
        becomes a file path rather than a directory path.
      '';
    };

    services = mkOption {
      type = types.listOf types.str;
      default = [];
@@ -152,23 +204,7 @@ in
      RemainAfterExit = true;

      # Fetch the secret
      ExecStart = pkgs.writeShellScript "fetch-${name}" ''
        set -euo pipefail

        # Set Vault environment variables
        export VAULT_ADDR="${cfg.vaultAddress}"
        export VAULT_SKIP_VERIFY="${if cfg.skipTlsVerify then "1" else "0"}"

        # Fetch secret using vault-fetch
        ${vault-fetch}/bin/vault-fetch \
          "${secretCfg.secretPath}" \
          "${secretCfg.outputDir}" \
          "${secretCfg.cacheDir}"

        # Set ownership and permissions
        chown -R ${secretCfg.owner}:${secretCfg.group} "${secretCfg.outputDir}"
        chmod ${secretCfg.mode} "${secretCfg.outputDir}"/*
      '';
      ExecStart = lib.getExe (mkFetchScript name secretCfg);

      # Logging
      StandardOutput = "journal";
@@ -216,7 +252,10 @@ in
    [ "d /run/secrets 0755 root root -" ] ++
    [ "d /var/lib/vault/cache 0700 root root -" ] ++
    flatten (mapAttrsToList (name: secretCfg: [
      "d ${secretCfg.outputDir} 0755 root root -"
      # When extractKey is set, outputDir is a file path - create parent dir instead
      (if secretCfg.extractKey != null
        then "d ${dirOf secretCfg.outputDir} 0755 root root -"
        else "d ${secretCfg.outputDir} 0755 root root -")
      "d ${secretCfg.cacheDir} 0700 root root -"
    ]) cfg.secrets);
  };
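Both branches of mkFetchScript drive the same vault-fetch CLI with three positional arguments: secret path, output location, cache directory. A rough manual debug invocation, with an address, secret path format, and output paths that are illustrative only (the exact path format is whatever the vault-fetch script expects):

```bash
# Manual debug run mirroring what a generated fetch-<name> unit does.
export VAULT_ADDR="https://vault.home.2rjus.net:8200"
vault-fetch \
  "hosts/monitoring01/pve-exporter" \
  "/run/secrets/pve-exporter" \
  "/var/lib/vault/cache"
```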
@@ -15,6 +15,7 @@ locals {
    #     "secret/data/services/grafana/*",
    #     "secret/data/shared/smtp/*"
    #   ]
    #   extra_policies = ["some-other-policy"] # Optional: additional policies
    # }

    # Example: ha1 host
@@ -25,17 +26,67 @@ locals {
    #   ]
    # }

    # TODO: actually use this policy
    "ha1" = {
      paths = [
        "secret/data/hosts/ha1/*",
        "secret/data/shared/backup/*",
      ]
    }

    # TODO: actually use this policy
    "monitoring01" = {
      paths = [
        "secret/data/hosts/monitoring01/*",
        "secret/data/shared/backup/*",
        "secret/data/shared/nats/*",
      ]
      extra_policies = ["prometheus-metrics"]
    }

    # Wave 1: hosts with no service secrets (only need vault.enable for future use)
    "nats1" = {
      paths = [
        "secret/data/hosts/nats1/*",
      ]
    }

    "jelly01" = {
      paths = [
        "secret/data/hosts/jelly01/*",
      ]
    }

    "pgdb1" = {
      paths = [
        "secret/data/hosts/pgdb1/*",
      ]
    }

    # Wave 3: DNS servers
    "ns1" = {
      paths = [
        "secret/data/hosts/ns1/*",
        "secret/data/shared/dns/*",
      ]
    }

    "ns2" = {
      paths = [
        "secret/data/hosts/ns2/*",
        "secret/data/shared/dns/*",
      ]
    }

    # Wave 4: http-proxy
    "http-proxy" = {
      paths = [
        "secret/data/hosts/http-proxy/*",
      ]
    }

    # Wave 5: nix-cache01
    "nix-cache01" = {
      paths = [
        "secret/data/hosts/nix-cache01/*",
      ]
    }
  }
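Each entry becomes a host-scoped policy over KV v2 paths. A hedged sketch of how a host could verify its access with the bao CLI once it has AppRole credentials (the role-id/secret-id file locations and the secret name are assumptions, not shown in this diff):

```bash
# Authenticate with the host's AppRole, then read a path the policy allows.
export BAO_ADDR="https://vault.home.2rjus.net:8200"
BAO_TOKEN=$(bao write -field=token auth/approle/login \
  role_id="$(cat /var/lib/vault/role-id)" \
  secret_id="$(cat /var/lib/vault/secret-id)")
export BAO_TOKEN

# A policy path of "secret/data/hosts/ns1/*" corresponds to KV v2 reads like:
bao kv get -mount=secret hosts/ns1/some-secret
```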
@@ -60,9 +111,12 @@ EOT
resource "vault_approle_auth_backend_role" "hosts" {
  for_each = local.host_policies

  backend        = vault_auth_backend.approle.path
  role_name      = each.key
  token_policies = ["${each.key}-policy"]
  backend        = vault_auth_backend.approle.path
  role_name      = each.key
  token_policies = concat(
    ["${each.key}-policy"],
    lookup(each.value, "extra_policies", [])
  )

  # Token configuration
  token_ttl = 3600 # 1 hour
@@ -62,6 +62,13 @@ resource "vault_mount" "pki_int" {
  description               = "Intermediate CA"
  default_lease_ttl_seconds = 157680000 # 5 years
  max_lease_ttl_seconds     = 157680000 # 5 years

  # Required for ACME support - allow ACME-specific response headers
  allowed_response_headers = [
    "Replay-Nonce",
    "Link",
    "Location"
  ]
}

resource "vault_pki_secret_backend_intermediate_cert_request" "intermediate" {
@@ -139,6 +146,33 @@ resource "vault_pki_secret_backend_config_urls" "config_urls" {
  ]
}

# Configure cluster path (required for ACME)
resource "vault_pki_secret_backend_config_cluster" "cluster" {
  backend  = vault_mount.pki_int.path
  path     = "${var.vault_address}/v1/${vault_mount.pki_int.path}"
  aia_path = "${var.vault_address}/v1/${vault_mount.pki_int.path}"
}

# Enable ACME support
resource "vault_generic_endpoint" "acme_config" {
  depends_on = [
    vault_pki_secret_backend_config_cluster.cluster,
    vault_pki_secret_backend_role.homelab
  ]

  path                 = "${vault_mount.pki_int.path}/config/acme"
  ignore_absent_fields = true
  disable_read         = true
  disable_delete       = true

  data_json = jsonencode({
    enabled                  = true
    allowed_issuers          = ["*"]
    allowed_roles            = ["*"]
    default_directory_policy = "sign-verbatim"
  })
}

# ============================================================================
# Direct Certificate Issuance (Non-ACME)
# ============================================================================
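With the cluster path set and ACME enabled on the mount, the directory URL consumed by the security.acme config earlier in this diff should start answering. A quick check against the endpoint, which should return the standard ACME directory JSON (newNonce, newAccount, newOrder and friends):

```bash
# Verify the ACME directory on the intermediate PKI mount.
curl -s https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory | jq .
```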
10 terraform/vault/policies.tf Normal file
@@ -0,0 +1,10 @@
# Generic policies for services (not host-specific)

resource "vault_policy" "prometheus_metrics" {
  name   = "prometheus-metrics"
  policy = <<EOT
path "sys/metrics" {
  capabilities = ["read"]
}
EOT
}
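The prometheus-metrics policy only grants read on sys/metrics, matching the telemetry block configured on vault01. A hedged sketch of the scrape it enables, assuming a token issued with this policy attached:

```bash
# Prometheus-format metrics from OpenBao; requires a token carrying prometheus-metrics.
curl -s -H "X-Vault-Token: $METRICS_TOKEN" \
  "https://vault01.home.2rjus.net:8200/v1/sys/metrics?format=prometheus"
```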
@@ -35,22 +35,63 @@ locals {
    #   }
    # }

    # TODO: actually use the secret
    "hosts/monitoring01/grafana-admin" = {
      auto_generate   = true
      password_length = 32
    }

    # TODO: actually use the secret
    "hosts/ha1/mqtt-password" = {
      auto_generate   = true
      password_length = 24
    }

    # TODO: Remove after testing
    "hosts/vaulttest01/test-service" = {
      auto_generate   = true
      password_length = 32
    }

    # Shared backup password (auto-generated, add alongside existing restic key)
    "shared/backup/password" = {
      auto_generate   = true
      password_length = 32
    }

    # NATS NKey for alerttonotify
    "shared/nats/nkey" = {
      auto_generate = false
      data          = { nkey = var.nats_nkey }
    }

    # PVE exporter config for monitoring01
    "hosts/monitoring01/pve-exporter" = {
      auto_generate = false
      data          = { config = var.pve_exporter_config }
    }

    # DNS zone transfer key
    "shared/dns/xfer-key" = {
      auto_generate = false
      data          = { key = var.ns_xfer_key }
    }

    # WireGuard private key for http-proxy
    "hosts/http-proxy/wireguard" = {
      auto_generate = false
      data          = { private_key = var.wireguard_private_key }
    }

    # Nix cache signing key
    "hosts/nix-cache01/cache-secret" = {
      auto_generate = false
      data          = { key = var.cache_signing_key }
    }

    # Gitea Actions runner token
    "hosts/nix-cache01/actions-token" = {
      auto_generate = false
      data          = { token = var.actions_token_1 }
    }
  }
}
@@ -16,11 +16,39 @@ variable "vault_skip_tls_verify" {
  default = true
}

# Example variables for manual secrets
# Uncomment and add to terraform.tfvars as needed
variable "nats_nkey" {
  description = "NATS NKey for alerttonotify"
  type        = string
  sensitive   = true
}

variable "pve_exporter_config" {
  description = "PVE exporter YAML configuration"
  type        = string
  sensitive   = true
}

variable "ns_xfer_key" {
  description = "DNS zone transfer TSIG key"
  type        = string
  sensitive   = true
}

variable "wireguard_private_key" {
  description = "WireGuard private key for http-proxy"
  type        = string
  sensitive   = true
}

variable "cache_signing_key" {
  description = "Nix binary cache signing key"
  type        = string
  sensitive   = true
}

variable "actions_token_1" {
  description = "Gitea Actions runner token"
  type        = string
  sensitive   = true
}

# variable "smtp_password" {
#   description = "SMTP password for notifications"
#   type        = string
#   sensitive   = true
# }
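These variables are marked sensitive and have no defaults, so terraform prompts for them unless they are supplied via terraform.tfvars or the environment. One way, sketched with placeholder values:

```bash
# Terraform reads TF_VAR_<name> for any declared input variable; values below are placeholders.
export TF_VAR_nats_nkey="REDACTED-NKEY"
export TF_VAR_ns_xfer_key="REDACTED-TSIG-KEY"
export TF_VAR_pve_exporter_config="$(cat pve-exporter.yaml)"
terraform plan
```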
@@ -50,8 +50,8 @@ locals {
      cpu_cores = 2
      memory    = 2048
      disk_size = "20G"
      flake_branch        = "vault-bootstrap-integration"
      vault_wrapped_token = "s.HwNenAYvXBsPs8uICh4CbE11"
      flake_branch        = "pki-migration"
      vault_wrapped_token = "s.UCpQCOp7cOKDdtGGBvfRWwAt"
    }
  }
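The vault_wrapped_token is a short-lived, single-use wrapping token handed to the VM at provisioning time. A hedged sketch of how such a token might be minted for a host's AppRole; the role name, wrap TTL, and use of a response-wrapped secret-id are assumptions, not confirmed by this diff:

```bash
# Mint a response-wrapped secret-id for a host role; the printed wrapping token
# is what would go into the locals block above. Role name and TTL are illustrative.
bao write -wrap-ttl=60m -field=wrapping_token \
  auth/approle/role/vaulttest01/secret-id
```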