# Compare commits

81 commits, `fe80ec3576...deploy-tes`.

Commits: 38348c5980, 370cf2b03a, 7bc465b414, 8d7bc50108, 03e70ac094, 3b32c9479f, b0d35f9a99, 26ca6817f0, b03a9b3b64, f805b9f629, f3adf7e77f, f6eca9decc, 6e93b8eae3, c214f8543c, 7933127d77, 13c3897e86, 0643f23281, ad8570f8db, 2f195d26d3, a926d34287, be2421746e, 12bf0683f5, e8a43c6715, eef52bb8c5, c6cdbc6799, 4d724329a6, 881e70df27, b9a269d280, fcf1a66103, 2034004280, af43f88394, a834497fe8, d3de2a1511, 97ff774d3f, f2c30cc24f, 7e80d2e0bc, 1f5b7b13e2, c53e36c3f3, 04a252b857, 5d26f52e0d, 506a692548, fa8f4f0784, 025570dea1, 15c00393f1, 787c14c7a6, eee3dde04f, 682b07b977, 70661ac3d9, 506e93a5e2, b6c41aa910, aa6e00a327, 258e350b89, eba195c192, bbb22e588e, 879e7aba60, 39a4ea98ab, 1d90dc2181, e9857afc11, 88e9036cb4, 59e1962d75, 3dc4422ba0, f0963624bc, 7b46f94e48, 32968147b5, c515a6b4e1, 4d8b94ce83, 8b0a4ea33a, 5be1f43c24, b322b1156b, 3cccfc0487, 41d4226812, 351fb6f720, 7d92c55d37, 6d117d68ca, a46fbdaa70, 2c9d86eaf2, ccb1c3fe2e, 0700033c0a, 4d33018285, 678fd3d6de, 9d74aa5c04
## .claude/skills/observability/SKILL.md (new file, 250 lines)
---
name: observability
description: Reference guide for exploring Prometheus metrics and Loki logs when troubleshooting homelab issues. Use when investigating system state, deployments, service health, or searching logs.
---

# Observability Troubleshooting Guide

Quick reference for exploring Prometheus metrics and Loki logs to troubleshoot homelab issues.

## Available Tools

Use the `lab-monitoring` MCP server tools:

**Metrics:**
- `search_metrics` - Find metrics by name substring
- `get_metric_metadata` - Get type/help for a specific metric
- `query` - Execute PromQL queries
- `list_targets` - Check scrape target health
- `list_alerts` / `get_alert` - View active alerts

**Logs:**
- `query_logs` - Execute LogQL queries against Loki
- `list_labels` - List available log labels
- `list_label_values` - List values for a specific label

---

## Logs Reference

### Label Reference

Available labels for log queries:
- `host` - Hostname (e.g., `ns1`, `monitoring01`, `ha1`)
- `systemd_unit` - Systemd unit name (e.g., `nsd.service`, `nixos-upgrade.service`)
- `job` - Either `systemd-journal` (most logs) or `varlog` (file-based logs)
- `filename` - For the `varlog` job, the log file path
- `hostname` - Alternative to `host` for some streams

### Log Format

Journal logs are JSON-formatted. Key fields:
- `MESSAGE` - The actual log message
- `PRIORITY` - Syslog priority (6=info, 4=warning, 3=error)
- `SYSLOG_IDENTIFIER` - Program name

### Basic LogQL Queries

**Logs from a specific service on a host:**
```logql
{host="ns1", systemd_unit="nsd.service"}
```

**All logs from a host:**
```logql
{host="monitoring01"}
```

**Logs from a service across all hosts:**
```logql
{systemd_unit="nixos-upgrade.service"}
```

**Substring matching (case-sensitive):**
```logql
{host="ha1"} |= "error"
```

**Exclude pattern:**
```logql
{host="ns1"} != "routine"
```

**Regex matching:**
```logql
{systemd_unit="prometheus.service"} |~ "scrape.*failed"
```

**File-based logs (caddy access logs, etc.):**
```logql
{job="varlog", hostname="nix-cache01"}
{job="varlog", filename="/var/log/caddy/nix-cache.log"}
```

### Time Ranges

Default lookback is 1 hour. Use the `start` parameter for older logs:
- `start: "1h"` - Last hour (default)
- `start: "24h"` - Last 24 hours
- `start: "168h"` - Last 7 days
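For quick checks over longer windows, the range can also be embedded in a LogQL metric query (the `start` parameter still needs to cover the range):

```logql
count_over_time({systemd_unit="nixos-upgrade.service"} |= "error" [24h])
```

This returns a per-stream count of matching lines over the last 24 hours instead of the raw log lines.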
### Common Services

Useful systemd units for troubleshooting:
- `nixos-upgrade.service` - Daily auto-upgrade logs
- `nsd.service` - DNS server (ns1/ns2)
- `prometheus.service` - Metrics collection
- `loki.service` - Log aggregation
- `caddy.service` - Reverse proxy
- `home-assistant.service` - Home automation
- `step-ca.service` - Internal CA
- `openbao.service` - Secrets management
- `sshd.service` - SSH daemon
- `nix-gc.service` - Nix garbage collection

### Extracting JSON Fields

Parse JSON and filter on fields:

```logql
{systemd_unit="prometheus.service"} | json | PRIORITY="3"
```

---

## Metrics Reference

### Deployment & Version Status

Check which NixOS revision hosts are running:

```promql
nixos_flake_info
```

Labels:
- `current_rev` - Git commit of the running NixOS configuration
- `remote_rev` - Latest commit on the remote repository
- `nixpkgs_rev` - Nixpkgs revision used to build the system
- `nixos_version` - Full NixOS version string (e.g., `25.11.20260203.e576e3c`)

Check if hosts are behind on updates:

```promql
nixos_flake_revision_behind == 1
```

View flake input versions:

```promql
nixos_flake_input_info
```

Labels: `input` (name), `rev` (revision), `type` (git/github)

Check flake input age:

```promql
nixos_flake_input_age_seconds / 86400
```

Returns age in days for each flake input.

### System Health

Basic host availability:

```promql
up{job="node-exporter"}
```

CPU usage by host:

```promql
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```

Memory usage:

```promql
1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
```

Disk space (root filesystem):

```promql
node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}
```

### Service-Specific Metrics

Common job names:
- `node-exporter` - System metrics (all hosts)
- `nixos-exporter` - NixOS version/generation metrics
- `caddy` - Reverse proxy metrics
- `prometheus` / `loki` / `grafana` - Monitoring stack
- `home-assistant` - Home automation
- `step-ca` - Internal CA

### Instance Label Format

The `instance` label uses FQDN format:

```
<hostname>.home.2rjus.net:<port>
```

Example queries filtering by host:

```promql
up{instance=~"monitoring01.*"}
node_load1{instance=~"ns1.*"}
```

---

## Troubleshooting Workflows

### Check Deployment Status Across Fleet

1. Query `nixos_flake_info` to see all hosts' current revisions
2. Check `nixos_flake_revision_behind` for hosts needing updates
3. Look at upgrade logs: `{systemd_unit="nixos-upgrade.service"}` with `start: "24h"`
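The first two checks can be collapsed into one query; grouping by revision shows at a glance whether the fleet has converged on a single commit:

```promql
count by (current_rev) (nixos_flake_info)
```

A single result row means every host runs the same revision; multiple rows identify stragglers.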
### Investigate Service Issues

1. Check `up{job="<service>"}` for scrape failures
2. Use `list_targets` to see target health details
3. Query service logs: `{host="<host>", systemd_unit="<service>.service"}`
4. Search for errors: `{host="<host>"} |= "error"`
5. Check `list_alerts` for related alerts

### After Deploying Changes

1. Verify `current_rev` updated in `nixos_flake_info`
2. Confirm `nixos_flake_revision_behind == 0`
3. Check service logs for startup issues
4. Check that service metrics are being scraped

### Debug SSH/Access Issues

```logql
{host="<host>", systemd_unit="sshd.service"}
```

### Check Recent Upgrades

```logql
{systemd_unit="nixos-upgrade.service"}
```

Use `start: "24h"` to see the last 24 hours of upgrades across all hosts.

---

## Notes

- Default scrape interval is 15s for most metrics targets
- Default log lookback is 1h; use the `start` parameter for older logs
- Use `rate()` for counter metrics, direct queries for gauges
- The `instance` label includes the port; use regex matching (`=~`) for hostname-only filters
- Journal entries are JSON-formatted; the `MESSAGE` field contains the actual log content
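The counter-versus-gauge rule in practice, using two standard node-exporter metrics:

```promql
# Counter: wrap in rate() over a range to get a per-second value
rate(node_network_receive_bytes_total[5m])

# Gauge: query directly
node_load1
```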
## .claude/skills/quick-plan/SKILL.md (new file, 89 lines)
---
name: quick-plan
description: Create a planning document for a future homelab project. Use when the user wants to document ideas for future work without implementing immediately.
argument-hint: [topic or feature to plan]
---

# Quick Plan Generator

Create a planning document for a future homelab infrastructure project. Plans are for documenting ideas and approaches that will be implemented later, not immediately.

## Input

The user provides: $ARGUMENTS

## Process

1. **Understand the topic**: Research the codebase to understand:
   - Current state of related systems
   - Existing patterns and conventions
   - Relevant NixOS options or packages
   - Any constraints or dependencies

2. **Evaluate options**: If there are multiple approaches, research and compare them with pros/cons.

3. **Draft the plan**: Create a markdown document following the structure below.

4. **Save the plan**: Write to `docs/plans/<topic-slug>.md` using a kebab-case filename derived from the topic.
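The slug derivation in step 4 can be sketched in shell; the topic string here is a made-up example, and the skill itself does this implicitly when choosing the filename:

```shell
# Lowercase the topic and replace spaces with hyphens to build the path.
topic="Prometheus scrape target labels"
slug=$(printf '%s' "$topic" | tr '[:upper:] ' '[:lower:]-')
printf 'docs/plans/%s.md\n' "$slug"   # → docs/plans/prometheus-scrape-target-labels.md
```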
## Plan Structure

Use these sections as appropriate (not all plans need every section):

```markdown
# Title

## Overview/Goal
Brief description of what this plan addresses and why.

## Current State
What exists today that's relevant to this plan.

## Options Evaluated (if multiple approaches)
For each option:
- **Option Name**
  - **Pros:** bullet points
  - **Cons:** bullet points
  - **Verdict:** brief assessment

Or use a comparison table for structured evaluation.

## Recommendation/Decision
What approach is recommended and why. Include rationale.

## Implementation Steps
Numbered phases or steps. Be specific but not overly detailed.
Can use sub-sections for major phases.

## Open Questions
Things still to be determined. Use checkbox format:
- [ ] Question 1?
- [ ] Question 2?

## Notes (optional)
Additional context, caveats, or references.
```

## Style Guidelines

- **Concise**: Use bullet points, avoid verbose paragraphs
- **Technical but accessible**: Include NixOS config snippets when relevant
- **Future-oriented**: These are plans, not specifications
- **Acknowledge uncertainty**: Use "Open Questions" for unresolved decisions
- **Reference existing patterns**: Mention how this fits with existing infrastructure
- **Tables for comparisons**: Use markdown tables when comparing options
- **Practical focus**: Emphasize what needs to happen, not theory

## Examples of Good Plans

Reference these existing plans for style guidance:
- `docs/plans/auth-system-replacement.md` - Good option evaluation with table
- `docs/plans/truenas-migration.md` - Good decision documentation with rationale
- `docs/plans/remote-access.md` - Good multi-option comparison
- `docs/plans/prometheus-scrape-target-labels.md` - Good implementation detail level

## After Creating the Plan

1. Tell the user the plan was saved to `docs/plans/<filename>.md`
2. Summarize the key points
3. Ask if they want any adjustments before committing
## .gitignore (vendored, 1 line added)

```diff
@@ -1,5 +1,6 @@
 .direnv/
 result
+result-*
 
 # Terraform/OpenTofu
 terraform/.terraform/
```
## .mcp.json (14 lines changed)

```diff
@@ -19,8 +19,20 @@
       "args": ["run", "git+https://git.t-juice.club/torjus/labmcp#lab-monitoring", "--", "serve", "--enable-silences"],
       "env": {
         "PROMETHEUS_URL": "https://prometheus.home.2rjus.net",
-        "ALERTMANAGER_URL": "https://alertmanager.home.2rjus.net"
+        "ALERTMANAGER_URL": "https://alertmanager.home.2rjus.net",
+        "LOKI_URL": "http://monitoring01.home.2rjus.net:3100"
       }
+    },
+    "homelab-deploy": {
+      "command": "nix",
+      "args": [
+        "run",
+        "git+https://git.t-juice.club/torjus/homelab-deploy",
+        "--",
+        "mcp",
+        "--nats-url", "nats://nats1.home.2rjus.net:4222",
+        "--nkey-file", "/home/torjus/.config/homelab-deploy/test-deployer.nkey"
+      ]
     }
   }
 }
```
## .sops.yaml

```diff
@@ -10,7 +10,6 @@ keys:
 - &server_nix-cache01 age1w029fksjv0edrff9p7s03tgk3axecdkppqymfpwfn2nu2gsqqefqc37sxq
 - &server_pgdb1 age1ha34qeksr4jeaecevqvv2afqem67eja2mvawlmrqsudch0e7fe7qtpsekv
 - &server_nats1 age1cxt8kwqzx35yuldazcc49q88qvgy9ajkz30xu0h37uw3ts97jagqgmn2ga
-- &server_auth01 age16prza00sqzuhwwcyakj6z4hvwkruwkqpmmrsn94a5ucgpkelncdq2ldctk
 creation_rules:
 - path_regex: secrets/[^/]+\.(yaml|json|env|ini)
   key_groups:
@@ -26,7 +25,6 @@ creation_rules:
     - *server_nix-cache01
     - *server_pgdb1
     - *server_nats1
-    - *server_auth01
 - path_regex: secrets/ca/[^/]+\.(yaml|json|env|ini|)
   key_groups:
   - age:
@@ -52,8 +50,3 @@ creation_rules:
   - age:
     - *admin_torjus
     - *server_http-proxy
-- path_regex: secrets/auth01/[^/]+\.(yaml|json|env|ini|)
-  key_groups:
-  - age:
-    - *admin_torjus
-    - *server_auth01
```
## CLAUDE.md (177 lines changed)

````diff
@@ -35,6 +35,21 @@ nix build .#create-host
 
 Do not automatically deploy changes. Deployments are usually done by updating the master branch, and then triggering the auto update on the specific host.
 
+### Testing Feature Branches on Hosts
+
+All hosts have the `nixos-rebuild-test` helper script for testing feature branches before merging:
+
+```bash
+# On the target host, test a feature branch
+nixos-rebuild-test boot <branch-name>
+nixos-rebuild-test switch <branch-name>
+
+# Additional arguments are passed through to nixos-rebuild
+nixos-rebuild-test boot my-feature --show-trace
+```
+
+When working on a feature branch that requires testing on a live host, suggest using this command instead of the full flake URL syntax.
+
 ### Flake Management
 
 ```bash
````
```diff
@@ -52,12 +67,19 @@ nix develop
 
 ### Secrets Management
 
-Secrets are handled by sops. Do not edit any `.sops.yaml` or any file within `secrets/`. Ask the user to modify if necessary.
+Secrets are managed by OpenBao (Vault) using AppRole authentication. Most hosts use the
+`vault.secrets` option defined in `system/vault-secrets.nix` to fetch secrets at boot.
+Terraform manages the secrets and AppRole policies in `terraform/vault/`.
+
+Legacy sops-nix is still present but only actively used by the `ca` host. Do not edit any
+`.sops.yaml` or any file within `secrets/`. Ask the user to modify if necessary.
 
 ### Git Workflow
 
 **Important:** Never commit directly to `master` unless the user explicitly asks for it. Always create a feature branch for changes.
+
+**Important:** Never amend commits to `master` unless the user explicitly asks for it. Amending rewrites history and causes issues for deployed configurations.
 
 When starting a new plan or task, the first step should typically be to create and checkout a new branch with an appropriate name (e.g., `git checkout -b dns-automation` or `git checkout -b fix-nginx-config`).
 
 ### Plan Management
```
````diff
@@ -110,6 +132,113 @@ Two MCP servers are available for searching NixOS options and packages:
 
 This ensures documentation matches the exact nixpkgs version (currently NixOS 25.11) used by this flake.
 
+### Lab Monitoring Log Queries
+
+The **lab-monitoring** MCP server can query logs from Loki. All hosts ship systemd journal logs via Promtail.
+
+**Loki Label Reference:**
+
+- `host` - Hostname (e.g., `ns1`, `ns2`, `monitoring01`, `ha1`). Use this label, not `hostname`.
+- `systemd_unit` - Systemd unit name (e.g., `nsd.service`, `prometheus.service`, `nixos-upgrade.service`)
+- `job` - Either `systemd-journal` (most logs) or `varlog` (file-based logs like caddy access logs)
+- `filename` - For `varlog` job, the log file path (e.g., `/var/log/caddy/nix-cache.log`)
+
+Journal log entries are JSON-formatted with the actual log message in the `MESSAGE` field. Other useful fields include `PRIORITY` and `SYSLOG_IDENTIFIER`.
+
+**Example LogQL queries:**
+```
+# Logs from a specific service on a host
+{host="ns2", systemd_unit="nsd.service"}
+
+# Substring match on log content
+{host="ns1", systemd_unit="nsd.service"} |= "error"
+
+# File-based logs (e.g., caddy access logs)
+{job="varlog", hostname="nix-cache01"}
+```
+
+Default lookback is 1 hour. Use the `start` parameter with relative durations (e.g., `24h`, `168h`) for older logs.
+
+### Lab Monitoring Prometheus Queries
+
+The **lab-monitoring** MCP server can query Prometheus metrics via PromQL. The `instance` label uses the FQDN format `<host>.home.2rjus.net:<port>`.
+
+**Prometheus Job Names:**
+
+- `node-exporter` - System metrics from all hosts (CPU, memory, disk, network)
+- `caddy` - Reverse proxy metrics (http-proxy)
+- `nix-cache_caddy` - Nix binary cache metrics
+- `home-assistant` - Home automation metrics
+- `jellyfin` - Media server metrics
+- `loki` / `prometheus` / `grafana` - Monitoring stack self-metrics
+- `step-ca` - Internal CA metrics
+- `pve-exporter` - Proxmox hypervisor metrics
+- `smartctl` - Disk SMART health (gunter)
+- `wireguard` - VPN metrics (http-proxy)
+- `pushgateway` - Push-based metrics (e.g., backup results)
+- `restic_rest` - Backup server metrics
+- `labmon` / `ghettoptt` / `alertmanager` - Other service metrics
+
+**Example PromQL queries:**
+```
+# Check all targets are up
+up
+
+# CPU usage for a specific host
+rate(node_cpu_seconds_total{instance=~"ns1.*", mode!="idle"}[5m])
+
+# Memory usage across all hosts
+node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
+
+# Disk space
+node_filesystem_avail_bytes{mountpoint="/"}
+```
+
+### Deploying to Test Hosts
+
+The **homelab-deploy** MCP server enables remote deployments to test-tier hosts via NATS messaging.
+
+**Available Tools:**
+
+- `deploy` - Deploy NixOS configuration to test-tier hosts
+- `list_hosts` - List available deployment targets
+
+**Deploy Parameters:**
+
+- `hostname` - Target a specific host (e.g., `vaulttest01`)
+- `role` - Deploy to all hosts with a specific role (e.g., `vault`)
+- `all` - Deploy to all test-tier hosts
+- `action` - nixos-rebuild action: `switch` (default), `boot`, `test`, `dry-activate`
+- `branch` - Git branch or commit to deploy (default: `master`)
+
+**Examples:**
+
+```
+# List available hosts
+list_hosts()
+
+# Deploy to a specific host
+deploy(hostname="vaulttest01", action="switch")
+
+# Dry-run deployment
+deploy(hostname="vaulttest01", action="dry-activate")
+
+# Deploy to all hosts with a role
+deploy(role="vault", action="switch")
+```
+
+**Note:** Only test-tier hosts with `homelab.deploy.enable = true` and the listener service running will respond to deployments.
+
+**Verifying Deployments:**
+
+After deploying, use the `nixos_flake_info` metric from nixos-exporter to verify the host is running the expected revision:
+
+```promql
+nixos_flake_info{instance=~"vaulttest01.*"}
+```
+
+The `current_rev` label contains the git commit hash of the deployed flake configuration.
+
 ## Architecture
 
 ### Directory Structure
````
```diff
@@ -119,7 +248,7 @@ This ensures documentation matches the exact nixpkgs version (currently NixOS 25
 - `default.nix` - Entry point, imports configuration.nix and services
 - `configuration.nix` - Host-specific settings (networking, hardware, users)
 - `/system/` - Shared system-level configurations applied to ALL hosts
-  - Core modules: nix.nix, sshd.nix, sops.nix, acme.nix, autoupgrade.nix
+  - Core modules: nix.nix, sshd.nix, sops.nix (legacy), vault-secrets.nix, acme.nix, autoupgrade.nix
   - Monitoring: node-exporter and promtail on every host
 - `/modules/` - Custom NixOS modules
   - `homelab/` - Homelab-specific options (DNS automation, monitoring scrape targets)
```
```diff
@@ -131,13 +260,13 @@ This ensures documentation matches the exact nixpkgs version (currently NixOS 25
   - `monitoring/` - Observability stack (Prometheus, Grafana, Loki, Tempo)
   - `ns/` - DNS services (authoritative, resolver, zone generation)
   - `http-proxy/`, `ca/`, `postgres/`, `nats/`, `jellyfin/`, etc.
-- `/secrets/` - SOPS-encrypted secrets with age encryption
+- `/secrets/` - SOPS-encrypted secrets with age encryption (legacy, only used by ca)
 - `/common/` - Shared configurations (e.g., VM guest agent)
 - `/docs/` - Documentation and plans
   - `plans/` - Future plans and proposals
   - `plans/completed/` - Completed plans (moved here when done)
 - `/playbooks/` - Ansible playbooks for fleet management
-- `/.sops.yaml` - SOPS configuration with age keys for all servers
+- `/.sops.yaml` - SOPS configuration with age keys (legacy, only used by ca)
 
 ### Configuration Inheritance
 
```
```diff
@@ -153,7 +282,7 @@ hosts/<hostname>/default.nix
 All hosts automatically get:
 - Nix binary cache (nix-cache.home.2rjus.net)
 - SSH with root login enabled
-- SOPS secrets management with auto-generated age keys
+- OpenBao (Vault) secrets management via AppRole
 - Internal ACME CA integration (ca.home.2rjus.net)
 - Daily auto-upgrades with auto-reboot
 - Prometheus node-exporter + Promtail (logs to monitoring01)
```
```diff
@@ -173,7 +302,6 @@ Production servers managed by `rebuild-all.sh`:
 - `nix-cache01` - Binary cache server
 - `pgdb1` - PostgreSQL database
 - `nats1` - NATS messaging server
-- `auth01` - Authentication service
 
 Template/test hosts:
 - `template1` - Base template for cloning new hosts
```
```diff
@@ -182,7 +310,7 @@ Template/test hosts:
 
 - `nixpkgs` - NixOS 25.11 stable (primary)
 - `nixpkgs-unstable` - Unstable channel (available via overlay as `pkgs.unstable.<package>`)
-- `sops-nix` - Secrets management
+- `sops-nix` - Secrets management (legacy, only used by ca)
 - Custom packages from git.t-juice.club:
   - `alerttonotify` - Alert routing
   - `labmon` - Lab monitoring
```
```diff
@@ -198,12 +326,21 @@ Template/test hosts:
 
 ### Secrets Management
 
-- Uses SOPS with age encryption
-- Each server has unique age key in `.sops.yaml`
-- Keys auto-generated at `/var/lib/sops-nix/key.txt` on first boot
+Most hosts use OpenBao (Vault) for secrets:
+- Vault server at `vault01.home.2rjus.net:8200`
+- AppRole authentication with credentials at `/var/lib/vault/approle/`
+- Secrets defined in Terraform (`terraform/vault/secrets.tf`)
+- AppRole policies in Terraform (`terraform/vault/approle.tf`)
+- NixOS module: `system/vault-secrets.nix` with `vault.secrets.<name>` options
+- `extractKey` option extracts a single key from vault JSON as a plain file
+- Secrets fetched at boot by `vault-secret-<name>.service` systemd units
+- Fallback to cached secrets in `/var/lib/vault/cache/` when Vault is unreachable
+- Provision AppRole credentials: `nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=<host>`
+
+Legacy SOPS (only used by `ca` host):
+- SOPS with age encryption, keys in `.sops.yaml`
 - Shared secrets: `/secrets/secrets.yaml`
 - Per-host secrets: `/secrets/<hostname>/`
-- All production servers can decrypt shared secrets; host-specific secrets require specific host keys
 
 ### Auto-Upgrade System
 
```
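The bullets above suggest a declaration roughly like the following. This is a hypothetical sketch, not taken from `system/vault-secrets.nix` — the secret name `restic-password` and the exact attribute shape around `extractKey` are assumptions:

```nix
{
  # Fetched at boot by vault-secret-restic-password.service; cached under
  # /var/lib/vault/cache/ as a fallback when Vault is unreachable.
  vault.secrets.restic-password = {
    extractKey = "password";  # assumed: extract one key from the Vault JSON as a plain file
  };
}
```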
@@ -303,13 +440,15 @@ This means:
 3. Add host entry to `flake.nix` nixosConfigurations
 4. Configure networking in `configuration.nix` (static IP via `systemd.network.networks`, DNS servers)
 5. (Optional) Add `homelab.dns.cnames` if the host needs CNAME aliases
-6. User clones template host
-7. User runs `prepare-host.sh` on new host, this deletes files which should be regenerated, like ssh host keys, machine-id etc. It also creates a new age key, and prints the public key
-8. This key is then added to `.sops.yaml`
-9. Create `/secrets/<hostname>/` if needed
-10. Commit changes, and merge to master.
-11. Deploy by running `nixos-rebuild boot --flake URL#<hostname>` on the host.
-12. Run auto-upgrade on DNS servers (ns1, ns2) to pick up the new host's DNS entry
+6. Add `vault.enable = true;` to the host configuration
+7. Add AppRole policy in `terraform/vault/approle.tf` and any secrets in `secrets.tf`
+8. Run `tofu apply` in `terraform/vault/`
+9. User clones template host
+10. User runs `prepare-host.sh` on new host
+11. Provision AppRole credentials: `nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=<host>`
+12. Commit changes, and merge to master.
+13. Deploy by running `nixos-rebuild boot --flake URL#<hostname>` on the host.
+14. Run auto-upgrade on DNS servers (ns1, ns2) to pick up the new host's DNS entry
 
 **Note:** DNS A records and Prometheus node-exporter scrape targets are auto-generated from the host's `systemd.network.networks` static IP configuration. No manual zone file or Prometheus config editing is required.
 
@@ -325,6 +464,8 @@ This means:
 
 **Firewall**: Disabled on most hosts (trusted network). Enable selectively in host configuration if needed.
 
+**Shell scripts**: Use `pkgs.writeShellApplication` instead of `pkgs.writeShellScript` or `pkgs.writeShellScriptBin` for creating shell scripts. `writeShellApplication` provides automatic shellcheck validation, sets strict bash options (`set -euo pipefail`), and allows declaring `runtimeInputs` for dependencies. When referencing the executable path (e.g., in `ExecStart`), use `lib.getExe myScript` to get the proper `bin/` path.
+
 ### Monitoring Stack
 
 All hosts ship metrics and logs to `monitoring01`:
@@ -15,7 +15,6 @@ NixOS Flake-based configuration repository for a homelab infrastructure. All hos
 | `nix-cache01` | Nix binary cache |
 | `pgdb1` | PostgreSQL |
 | `nats1` | NATS messaging |
-| `auth01` | Authentication (LLDAP + Authelia) |
 | `vault01` | OpenBao (Vault) secrets management |
 | `template1`, `template2` | VM templates for cloning new hosts |
 
@@ -28,7 +27,7 @@ system/ # Shared modules applied to ALL hosts
 services/ # Reusable service modules, selectively imported per host
 modules/ # Custom NixOS module definitions
 lib/ # Nix library functions (DNS zone generation, etc.)
-secrets/ # SOPS-encrypted secrets (age encryption)
+secrets/ # SOPS-encrypted secrets (legacy, only used by ca)
 common/ # Shared configurations (e.g., VM guest agent)
 terraform/ # OpenTofu configs for Proxmox VM provisioning
 terraform/vault/ # OpenTofu configs for OpenBao (secrets, PKI, AppRoles)
@@ -40,7 +39,7 @@ scripts/ # Helper scripts (create-host, vault-fetch)
 
 **Automatic DNS zone generation** - A records are derived from each host's static IP configuration. CNAME aliases are defined via `homelab.dns.cnames`. No manual zone file editing required.
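The A-record derivation idea can be illustrated with a minimal sketch. The helper functions and example IPs below are made up for illustration; this is not the repo's actual `lib/` zone-generation code:

```python
def a_records(hosts, zone="home.2rjus.net"):
    """Derive BIND-style A records from a host-name -> static-IP mapping."""
    return [f"{name}.{zone}. IN A {ip}" for name, ip in sorted(hosts.items())]

def cname_records(cnames, zone="home.2rjus.net"):
    """Derive CNAME records from an alias -> target-host mapping
    (the homelab.dns.cnames idea)."""
    return [f"{alias}.{zone}. IN CNAME {target}.{zone}."
            for alias, target in sorted(cnames.items())]

# Example inputs are hypothetical:
zone_lines = a_records({"ns1": "10.1.1.2"}) + cname_records({"grafana": "monitoring01"})
```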
 
-**SOPS secrets management** - Each host has a unique age key. Shared secrets live in `secrets/secrets.yaml`, per-host secrets in `secrets/<hostname>/`.
+**OpenBao (Vault) secrets** - Hosts authenticate via AppRole and fetch secrets at boot. Secrets and policies are managed as code in `terraform/vault/`. Legacy SOPS remains only for the `ca` host.
 
 **Daily auto-upgrades** - All hosts pull from the master branch and automatically rebuild and reboot on a randomized schedule.
 
New file: docs/plans/auth-system-replacement.md (192 lines)

# Authentication System Replacement Plan

## Overview

Replace the current auth01 setup (LLDAP + Authelia) with a modern, unified authentication solution. The current setup is not in active use, making this a good time to evaluate alternatives.

## Goals

1. **Central user database** - Manage users across all homelab hosts from a single source
2. **Linux PAM/NSS integration** - Users can SSH into hosts using central credentials
3. **UID/GID consistency** - Proper POSIX attributes for NAS share permissions
4. **OIDC provider** - Single sign-on for homelab web services (Grafana, etc.)

## Options Evaluated

### OpenLDAP (raw)

- **NixOS Support:** Good (`services.openldap` with `declarativeContents`)
- **Pros:** Most widely supported, very flexible
- **Cons:** LDIF format is painful, schema management is complex, no built-in OIDC, requires SSSD on each client
- **Verdict:** Doesn't address LDAP complexity concerns

### LLDAP + Authelia (current)

- **NixOS Support:** Both have good modules
- **Pros:** Already configured, lightweight, nice web UIs
- **Cons:** Two services to manage, limited POSIX attribute support in LLDAP, requires SSSD on every client host
- **Verdict:** Workable but has friction for NAS/UID goals

### FreeIPA

- **NixOS Support:** None
- **Pros:** Full enterprise solution (LDAP + Kerberos + DNS + CA)
- **Cons:** Extremely heavy, wants to own DNS, designed for Red Hat ecosystems, massive overkill for a homelab
- **Verdict:** Overkill, no NixOS support

### Keycloak

- **NixOS Support:** None
- **Pros:** Good OIDC/SAML, nice UI
- **Cons:** Primarily an identity broker, not a user directory; poor POSIX support; heavy (Java)
- **Verdict:** Wrong tool for Linux user management

### Authentik

- **NixOS Support:** None (would need Docker)
- **Pros:** All-in-one with LDAP outpost and OIDC, modern UI
- **Cons:** Heavy stack (Python + PostgreSQL + Redis), LDAP is a separate component
- **Verdict:** Would work but requires Docker and is heavy

### Kanidm

- **NixOS Support:** Excellent - first-class module with PAM/NSS integration
- **Pros:**
  - Native PAM/NSS module (no SSSD needed)
  - Built-in OIDC provider
  - Optional LDAP interface for legacy services
  - Declarative provisioning via NixOS (users, groups, OAuth2 clients)
  - Modern, written in Rust
  - Single service handles everything
- **Cons:** Newer project, smaller community than LDAP
- **Verdict:** Best fit for requirements

### Pocket-ID

- **NixOS Support:** Unknown
- **Pros:** Very lightweight, passkey-first
- **Cons:** No LDAP, no PAM/NSS integration - purely OIDC for web apps
- **Verdict:** Doesn't solve the Linux user management goal

## Recommendation: Kanidm

Kanidm is the recommended solution for the following reasons:

| Requirement | Kanidm Support |
|-------------|----------------|
| Central user database | Native |
| Linux PAM/NSS (host login) | Native NixOS module |
| UID/GID for NAS | POSIX attributes supported |
| OIDC for services | Built-in |
| Declarative config | Excellent NixOS provisioning |
| Simplicity | Modern API, LDAP optional |
| NixOS integration | First-class |

### Key NixOS Features

**Server configuration:**

```nix
services.kanidm.enableServer = true;
services.kanidm.serverSettings = {
  domain = "home.2rjus.net";
  origin = "https://auth.home.2rjus.net";
  ldapbindaddress = "0.0.0.0:636"; # Optional LDAP interface
};
```

**Declarative user provisioning:**

```nix
services.kanidm.provision.enable = true;
services.kanidm.provision.persons.torjus = {
  displayName = "Torjus";
  groups = [ "admins" "nas-users" ];
};
```

**Declarative OAuth2 clients:**

```nix
services.kanidm.provision.systems.oauth2.grafana = {
  displayName = "Grafana";
  originUrl = "https://grafana.home.2rjus.net/login/generic_oauth";
  originLanding = "https://grafana.home.2rjus.net";
};
```

**Client host configuration (add to `system/`):**

```nix
services.kanidm.enableClient = true;
services.kanidm.enablePam = true;
services.kanidm.clientSettings.uri = "https://auth.home.2rjus.net";
```

## NAS Integration

### Current: TrueNAS CORE (FreeBSD)

TrueNAS CORE has a built-in LDAP client. Kanidm's read-only LDAP interface will work for NFS share permissions:

- **NFS shares**: Only need consistent UID/GID mapping - Kanidm's LDAP provides this
- **No SMB requirement**: SMB would need Samba schema attributes (deprecated in TrueNAS 13.0+), but we're NFS-only

Configuration approach:

1. Enable Kanidm's LDAP interface (`ldapbindaddress = "0.0.0.0:636"`)
2. Import the internal CA certificate into TrueNAS
3. Configure the TrueNAS LDAP client with Kanidm's base DN and bind credentials
4. Users/groups appear in TrueNAS permission dropdowns

Note: Kanidm's LDAP is read-only and uses LDAPS only (no StartTLS). This is fine for our use case.

### Future: NixOS NAS

When the NAS is migrated to NixOS, it becomes a first-class citizen:

- Native Kanidm PAM/NSS integration (same as other hosts)
- No LDAP compatibility layer needed
- Full integration with the rest of the homelab

This future migration path is a strong argument for Kanidm over LDAP-only solutions.

## Implementation Steps

1. **Create Kanidm service module** in `services/kanidm/`
   - Server configuration
   - TLS via internal ACME
   - Vault secrets for admin passwords

2. **Configure declarative provisioning**
   - Define initial users and groups
   - Set up POSIX attributes (UID/GID ranges)

3. **Add OIDC clients** for homelab services
   - Grafana
   - Other services as needed

4. **Create client module** in `system/` for PAM/NSS
   - Enable on all hosts that need central auth
   - Configure trusted CA

5. **Test NAS integration**
   - Configure the TrueNAS LDAP client to connect to Kanidm
   - Verify UID/GID mapping works with NFS shares

6. **Migrate auth01**
   - Remove LLDAP and Authelia services
   - Deploy Kanidm
   - Update DNS CNAMEs if needed

7. **Documentation**
   - User management procedures
   - Adding new OAuth2 clients
   - Troubleshooting PAM/NSS issues

## Open Questions

- What UID/GID range should be reserved for Kanidm-managed users?
- Which hosts should have PAM/NSS enabled initially?
- What OAuth2 clients are needed at launch?

## References

- [Kanidm Documentation](https://kanidm.github.io/kanidm/stable/)
- [NixOS Kanidm Module](https://search.nixos.org/options?query=services.kanidm)
- [Kanidm PAM/NSS Integration](https://kanidm.github.io/kanidm/stable/pam_and_nsswitch.html)
@@ -79,6 +79,33 @@ These services have adequate alerting and/or scrape targets:
 | Nix Cache (Harmonia, build-flakes) | Via Caddy | 4 alerts |
 | CA (step-ca) | Yes (port 9000) | 4 certificate alerts |
+
+## Per-Service Resource Metrics (systemd-exporter)
+
+### Current State
+
+No per-service CPU, memory, or IO metrics are collected. The existing node-exporter systemd collector only provides unit state (active/inactive/failed), socket stats, and timer triggers. While systemd tracks per-unit resource usage via cgroups internally (visible in `systemctl status` and `systemd-cgtop`), this data is not exported to Prometheus.
+
+### Available Solution
+
+The `prometheus-systemd-exporter` package (v0.7.0) is available in nixpkgs with a ready-made NixOS module:
+
+```nix
+services.prometheus.exporters.systemd.enable = true;
+```
+
+**Options:** `enable`, `port`, `extraFlags`, `user`, `group`
+
+This exporter reads cgroup data and exposes per-unit metrics including:
+
+- CPU seconds consumed per service
+- Memory usage per service
+- Task/process counts per service
+- Restart counts
+- IO usage
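To make the cgroup source concrete: a simplified sketch of turning a cgroup v2 `cpu.stat` file into CPU-seconds, the kind of reading the exporter does per unit. This is illustrative Python, not the exporter's actual Go code:

```python
def parse_cpu_stat(text):
    """Parse cgroup v2 cpu.stat contents ('key value' per line, microseconds)."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        stats[key] = int(value)
    return stats

# e.g. the contents of /sys/fs/cgroup/system.slice/<unit>/cpu.stat
sample = "usage_usec 1500000\nuser_usec 1000000\nsystem_usec 500000"
cpu_seconds = parse_cpu_stat(sample)["usage_usec"] / 1_000_000  # -> 1.5
```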
+
+### Recommendation
+
+Enable on all hosts via the shared `system/` config (same pattern as node-exporter). Add a corresponding scrape job on monitoring01. This would give visibility into resource consumption per service across the fleet, useful for capacity planning and diagnosing noisy-neighbor issues on shared hosts.
+
 ## Suggested Priority
 
 1. **PostgreSQL** - Critical infrastructure, easy to add with existing nixpkgs module
New file: docs/plans/completed/nixos-exporter.md (176 lines)

# NixOS Prometheus Exporter

## Overview

Build a generic Prometheus exporter for NixOS-specific metrics. This exporter should be useful for any NixOS deployment, not just our homelab.

## Goal

Provide visibility into NixOS system state that standard exporters don't cover:

- Generation management (count, age, current vs booted)
- Flake input freshness
- Upgrade status

## Metrics

### Core Metrics

| Metric | Description | Source |
|--------|-------------|--------|
| `nixos_generation_count` | Number of system generations | Count entries in `/nix/var/nix/profiles/system-*` |
| `nixos_current_generation` | Active generation number | Parse `readlink /run/current-system` |
| `nixos_booted_generation` | Generation that was booted | Parse `/run/booted-system` |
| `nixos_generation_age_seconds` | Age of current generation | File mtime of the current system profile |
| `nixos_config_mismatch` | 1 if booted != current, 0 otherwise | Compare symlink targets |
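A sketch of how the core gauges could be computed from a profile-directory listing. The planned exporter is Go; this is an illustrative Python version, and the helper names are assumptions:

```python
import re

def generation_number(entry):
    """Extract N from a profile entry like 'system-47-link', else None."""
    m = re.fullmatch(r"system-(\d+)-link", entry)
    return int(m.group(1)) if m else None

def generation_metrics(profile_entries, current, booted):
    """Compute the core gauges from /nix/var/nix/profiles entries plus the
    generation numbers resolved from /run/current-system and /run/booted-system."""
    gens = [g for g in map(generation_number, profile_entries) if g is not None]
    return {
        "nixos_generation_count": len(gens),
        "nixos_current_generation": current,
        "nixos_booted_generation": booted,
        "nixos_config_mismatch": int(current != booted),
    }
```

Note that the mismatch gauge needs no root access: it only compares symlink targets.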

### Flake Metrics (optional collector)

| Metric | Description | Source |
|--------|-------------|--------|
| `nixos_flake_input_age_seconds` | Age of each flake.lock input | Parse `lastModified` from flake.lock |
| `nixos_flake_input_info` | Info gauge with rev label | Parse `rev` from flake.lock |

Labels: `input` (e.g., "nixpkgs", "home-manager")
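Extracting `lastModified` from a flake.lock is straightforward. A rough sketch (Python for illustration; it skips `follows` entries, where the input value is a list rather than a node key):

```python
import json
import time

def flake_input_ages(lock_text, now=None):
    """Return input-name -> age in seconds, from flake.lock contents."""
    now = time.time() if now is None else now
    nodes = json.loads(lock_text)["nodes"]
    ages = {}
    for name, node_key in nodes["root"]["inputs"].items():
        if not isinstance(node_key, str):
            continue  # 'follows' indirection: skipped in this sketch
        locked = nodes[node_key].get("locked", {})
        if "lastModified" in locked:
            ages[name] = now - locked["lastModified"]
    return ages
```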

### Future Metrics

| Metric | Description | Source |
|--------|-------------|--------|
| `nixos_upgrade_pending` | 1 if remote differs from local | Compare flake refs (expensive) |
| `nixos_store_size_bytes` | Size of /nix/store | `du` or filesystem stats |
| `nixos_store_path_count` | Number of store paths | Count entries |

## Architecture

Single binary with optional collectors enabled via config or flags.

```
nixos-exporter
├── main.go
├── collector/
│   ├── generation.go   # Core generation metrics
│   └── flake.go        # Flake input metrics
└── config/
    └── config.go
```

## Configuration

```yaml
listen_addr: ":9971"
collectors:
  generation:
    enabled: true
  flake:
    enabled: false
    lock_path: "/etc/nixos/flake.lock" # or auto-detect from /run/current-system
```

Command-line alternative:

```bash
nixos-exporter --listen=:9971 --collector.flake --flake.lock-path=/etc/nixos/flake.lock
```

## NixOS Module

```nix
services.prometheus.exporters.nixos = {
  enable = true;
  port = 9971;
  collectors = [ "generation" "flake" ];
  flake.lockPath = "/etc/nixos/flake.lock";
};
```

The module should integrate with nixpkgs' existing `services.prometheus.exporters.*` pattern.

## Implementation

### Language

Go - mature Prometheus client library, single static binary, easy cross-compilation.

### Phase 1: Core

1. Create git repository
2. Implement generation collector (count, current, booted, age, mismatch)
3. Basic HTTP server with `/metrics` endpoint
4. NixOS module

### Phase 2: Flake Collector

1. Parse the flake.lock JSON format
2. Extract lastModified timestamps per input
3. Add input labels

### Phase 3: Packaging

1. Add to nixpkgs or publish as a flake
2. Documentation
3. Example Grafana dashboard

## Example Output

```
# HELP nixos_generation_count Total number of system generations
# TYPE nixos_generation_count gauge
nixos_generation_count 47

# HELP nixos_current_generation Currently active generation number
# TYPE nixos_current_generation gauge
nixos_current_generation 47

# HELP nixos_booted_generation Generation that was booted
# TYPE nixos_booted_generation gauge
nixos_booted_generation 46

# HELP nixos_generation_age_seconds Age of current generation in seconds
# TYPE nixos_generation_age_seconds gauge
nixos_generation_age_seconds 3600

# HELP nixos_config_mismatch 1 if booted generation differs from current
# TYPE nixos_config_mismatch gauge
nixos_config_mismatch 1

# HELP nixos_flake_input_age_seconds Age of flake input in seconds
# TYPE nixos_flake_input_age_seconds gauge
nixos_flake_input_age_seconds{input="nixpkgs"} 259200
nixos_flake_input_age_seconds{input="home-manager"} 86400
```

## Alert Examples

```yaml
- alert: NixOSConfigStale
  expr: nixos_generation_age_seconds > 7 * 24 * 3600
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "NixOS config on {{ $labels.instance }} is over 7 days old"

- alert: NixOSRebootRequired
  expr: nixos_config_mismatch == 1
  for: 24h
  labels:
    severity: info
  annotations:
    summary: "{{ $labels.instance }} needs reboot to apply config"

- alert: NixpkgsInputStale
  expr: nixos_flake_input_age_seconds{input="nixpkgs"} > 30 * 24 * 3600
  for: 1d
  labels:
    severity: info
  annotations:
    summary: "nixpkgs input on {{ $labels.instance }} is over 30 days old"
```

## Open Questions

- [ ] How to detect the flake.lock path automatically? (check /run/current-system for flake info)
- [ ] Does the generation collector need root? (probably not, it only reads symlinks)
- [ ] Include in nixpkgs or distribute as a standalone flake?

## Notes

- Port 9971 suggested (9970 reserved for homelab-exporter)
- Keep scope focused on NixOS-specific metrics - don't duplicate node-exporter
- Consider submitting to the Prometheus exporter registry once stable
New file: docs/plans/completed/sops-to-openbao-migration.md (86 lines)

# Sops to OpenBao Secrets Migration Plan

## Status: Complete (except ca, deferred)

## Remaining sops cleanup

The `sops-nix` flake input, `system/sops.nix`, `.sops.yaml`, and `secrets/` directory are still present because `ca` still uses sops for its step-ca secrets (5 secrets in `services/ca/default.nix`). The `services/authelia/` and `services/lldap/` modules also reference sops but are only used by auth01 (decommissioned).

Once `ca` is migrated to OpenBao PKI (Phase 4c in host-migration-to-opentofu.md), remove:

- `sops-nix` input from `flake.nix`
- `sops-nix.nixosModules.sops` from all host module lists in `flake.nix`
- `inherit sops-nix` from all specialArgs in `flake.nix`
- `system/sops.nix` and its import in `system/default.nix`
- `.sops.yaml`
- `secrets/` directory
- All `sops.secrets.*` declarations in `services/ca/`, `services/authelia/`, `services/lldap/`

## Overview

Migrate all hosts from sops-nix secrets to OpenBao (vault) secrets management. Pilot with ha1, then roll out to the remaining hosts in waves.

## Pre-requisites (completed)

1. Hardcoded root password hash in `system/root-user.nix` (removes the sops dependency for all hosts)
2. Added `extractKey` option to `system/vault-secrets.nix` (extracts a single key as a file)

## Deployment Order

### Pilot: ha1

- Terraform: shared/backup/password secret, ha1 AppRole policy
- Provision AppRole credentials via `playbooks/provision-approle.yml`
- NixOS: vault.enable + backup-helper vault secret

### Wave 1: nats1, jelly01, pgdb1

- No service secrets (only root password, already handled)
- Just need AppRole policies + credential provisioning

### Wave 2: monitoring01

- 3 secrets: backup password, nats nkey, pve-exporter config
- Updates: alerttonotify.nix, pve.nix, configuration.nix

### Wave 3: ns1, then ns2 (critical - deploy ns1 first, verify, then ns2)

- DNS zone transfer key (shared/dns/xfer-key)

### Wave 4: http-proxy

- WireGuard private key

### Wave 5: nix-cache01

- Cache signing key + Gitea Actions token

### Wave 6: ca (DEFERRED - waiting for PKI migration)

### Skipped: auth01 (decommissioned)

## Terraform variables needed

User must extract from sops and add to `terraform/vault/terraform.tfvars`:

| Variable | Source |
|----------|--------|
| `backup_helper_secret` | `sops -d secrets/secrets.yaml` |
| `ns_xfer_key` | `sops -d secrets/secrets.yaml` |
| `nats_nkey` | `sops -d secrets/secrets.yaml` |
| `pve_exporter_config` | `sops -d secrets/monitoring01/pve-exporter.yaml` |
| `wireguard_private_key` | `sops -d secrets/http-proxy/wireguard.yaml` |
| `cache_signing_key` | `sops -d secrets/nix-cache01/cache-secret` |
| `actions_token_1` | `sops -d secrets/nix-cache01/actions_token_1` |

## Provisioning AppRole credentials

```bash
export BAO_ADDR='https://vault01.home.2rjus.net:8200'
export BAO_TOKEN='<root-token>'
nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=<host>
```

## Verification (per host)

1. `systemctl status vault-secret-*` - all secret fetch services succeeded
2. Check that secret files exist at the expected paths with correct permissions
3. Verify dependent services are running
4. Check that `/var/lib/vault/cache/` is populated (fallback ready)
5. Reboot the host to verify boot-time secret fetching works
New file: docs/plans/completed/zigbee-sensor-battery-monitoring.md (109 lines)

# Zigbee Sensor Battery Monitoring

**Status:** Completed
**Branch:** `zigbee-battery-fix`
**Commit:** `c515a6b home-assistant: fix zigbee sensor battery reporting`

## Problem

Three Aqara Zigbee temperature sensors report `battery: 0` in their MQTT payload, making the `hass_sensor_battery_percent` Prometheus metric useless for battery monitoring on these devices.

Affected sensors:

- **Temp Living Room** (`0x54ef441000a54d3c`) — WSDCGQ12LM
- **Temp Office** (`0x54ef441000a547bd`) — WSDCGQ12LM
- **temp_server** (`0x54ef441000a564b6`) — WSDCGQ12LM

The **Temp Bedroom** sensor (`0x00124b0025495463`) is a SONOFF SNZB-02 and reports battery correctly.

## Findings

- All three sensors are actively reporting temperature, humidity, and pressure data — they are not dead.
- The Zigbee2MQTT payload includes a `voltage` field (e.g., `2707` = 2.707 V), which indicates healthy battery levels (~40-60% for a CR2032 coin cell).
- CR2032 voltage reference: ~3.0 V fresh, ~2.7 V mid-life, ~2.1 V dead.
- The `voltage` field is not exposed as a Prometheus metric — it exists only in the MQTT payload.
- This is a known firmware quirk with some Aqara WSDCGQ12LM sensors that always report 0% battery.

## Device Inventory

Full list of Zigbee devices on ha1 (12 total):

| Device | IEEE Address | Model | Type |
|--------|-------------|-------|------|
| temp_server | 0x54ef441000a564b6 | WSDCGQ12LM | Temperature sensor (battery fix applied) |
| (Temp Living Room) | 0x54ef441000a54d3c | WSDCGQ12LM | Temperature sensor (battery fix applied) |
| (Temp Office) | 0x54ef441000a547bd | WSDCGQ12LM | Temperature sensor (battery fix applied) |
| (Temp Bedroom) | 0x00124b0025495463 | SNZB-02 | Temperature sensor (battery works) |
| (Water leak) | 0x54ef4410009ac117 | SJCGQ12LM | Water leak sensor |
| btn_livingroom | 0x54ef441000a1f907 | WXKG13LM | Wireless mini switch |
| btn_bedroom | 0x54ef441000a1ee71 | WXKG13LM | Wireless mini switch |
| (Hue bulb) | 0x001788010dc35d06 | 9290024688 | Hue E27 1100lm (Router) |
| (Hue bulb) | 0x001788010dc5f003 | 9290024688 | Hue E27 1100lm (Router) |
| (Hue ceiling) | 0x001788010e371aa4 | 915005997301 | Hue Infuse medium (Router) |
| (Hue ceiling) | 0x001788010d253b99 | 915005997301 | Hue Infuse medium (Router) |
| (Hue wall) | 0x001788010d1b599a | 929003052901 | Hue Sana wall light (Router, transition=5) |

## Implementation

### Solution 1: Calculate battery from voltage in Zigbee2MQTT (Implemented)

Override the Home Assistant battery entity's `value_template` in the Zigbee2MQTT device configuration to calculate battery percentage from voltage.

**Formula:** `(voltage - 2100) / 9` (maps 2100-3000 mV to 0-100%)

**Changes in `services/home-assistant/default.nix`:**

- Device configuration moved from external `devices.yaml` to inline NixOS config
- The three affected sensors get a `homeassistant.sensor_battery.value_template` override
- All 12 devices are now declaratively managed

**Expected battery values based on current voltages:**

| Sensor | Voltage | Expected Battery |
|--------|---------|------------------|
| Temp Living Room | 2710 mV | ~68% |
| Temp Office | 2658 mV | ~62% |
| temp_server | 2765 mV | ~74% |
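The mapping and the expected values can be checked with a few lines. This is a quick sanity check of the formula, not the actual Jinja `value_template` used by Zigbee2MQTT:

```python
def battery_percent(voltage_mv):
    """Map CR2032 voltage in mV to a 0-100% estimate: (voltage - 2100) / 9."""
    return max(0, min(100, round((voltage_mv - 2100) / 9)))

# Reproduces the expected values for the three affected sensors:
readings = {"Temp Living Room": 2710, "Temp Office": 2658, "temp_server": 2765}
estimates = {name: battery_percent(mv) for name, mv in readings.items()}
# estimates == {"Temp Living Room": 68, "Temp Office": 62, "temp_server": 74}
```

The clamp keeps a fresh cell (3000 mV) at 100% and a dead one (2100 mV) at 0%.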
|
||||||
|
|
||||||
|
### Solution 2: Alert on sensor staleness (Implemented)

Added a Prometheus alert, `zigbee_sensor_stale`, in `services/monitoring/rules.yml` that fires when a Zigbee temperature sensor has not updated in over an hour. This provides defense in depth for detecting dead sensors regardless of battery-reporting accuracy.

**Alert details:**

- Expression: `(time() - hass_last_updated_time_seconds{entity=~"sensor\\.(0x[0-9a-f]+|temp_server)_temperature"}) > 3600`
- Severity: warning
- For: 5m
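Assembled from the details above, the rule entry would look roughly like this (the annotation text is illustrative, not copied from `rules.yml`):

```yaml
- alert: zigbee_sensor_stale
  expr: (time() - hass_last_updated_time_seconds{entity=~"sensor\\.(0x[0-9a-f]+|temp_server)_temperature"}) > 3600
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Zigbee sensor {{ $labels.entity }} has not updated in over 1 hour"
```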
## Pre-Deployment Verification

### Backup Verification

Before deployment, verified the ha1 backup configuration and ran a manual backup:

**Backup paths:**

- `/var/lib/hass` ✓
- `/var/lib/zigbee2mqtt` ✓
- `/var/lib/mosquitto` ✓

**Manual backup (2026-02-05 22:45:23):**

- Snapshot ID: `59704dfa`
- Files: 77 total (0 new, 13 changed, 64 unmodified)
- Data: 62.635 MiB processed, 6.928 MiB stored (compressed)

### Other directories reviewed

- `/var/lib/vault` — Contains AppRole credentials; not backed up (can be re-provisioned via Ansible)
- `/var/lib/sops-nix` — Legacy; ha1 uses Vault now
## Post-Deployment Steps

After deploying to ha1:

1. Restart the zigbee2mqtt service (automatic on NixOS rebuild)
2. In Home Assistant, the battery entities may need to be re-discovered:
   - Go to Settings → Devices & Services → MQTT
   - The new `value_template` should take effect after entity re-discovery
   - If not, try disabling and re-enabling the battery entities

## Notes

- Device configuration is now declarative in NixOS. Future device additions via the Zigbee2MQTT frontend must also be added to the NixOS config to persist.
- The `devices.yaml` file on ha1 will be overwritten on service start but can be removed after confirming the new config works.
- The NixOS zigbee2mqtt module defaults to `devices = "devices.yaml"`, but our explicit inline config overrides this.
docs/plans/homelab-exporter.md (new file, 179 lines)
# Homelab Infrastructure Exporter

## Overview

Build a Prometheus exporter for metrics specific to our homelab infrastructure. Unlike the generic nixos-exporter, this covers services and patterns unique to our environment.

## Current State

### Existing Exporters

- **node-exporter** (all hosts): System metrics
- **systemd-exporter** (all hosts): Service restart counts, IP accounting
- **labmon** (monitoring01): TLS certificate monitoring, step-ca health
- **Service-specific**: unbound, postgres, nats, jellyfin, home-assistant, caddy, step-ca

### Gaps

- No visibility into Vault/OpenBao lease expiry
- No ACME certificate expiry from the internal CA
- No Proxmox guest agent metrics from inside VMs

## Metrics

### Vault/OpenBao Metrics

| Metric | Description | Source |
|--------|-------------|--------|
| `homelab_vault_token_expiry_seconds` | Seconds until AppRole token expires | Token metadata or lease file |
| `homelab_vault_token_renewable` | 1 if token is renewable | Token metadata |

Labels: `role` (AppRole name)

### ACME Certificate Metrics

| Metric | Description | Source |
|--------|-------------|--------|
| `homelab_acme_cert_expiry_seconds` | Seconds until certificate expires | Parse cert from `/var/lib/acme/*/cert.pem` |
| `homelab_acme_cert_not_after` | Unix timestamp of cert expiry | Certificate NotAfter field |

Labels: `domain`, `issuer`

Note: labmon already monitors external TLS endpoints. This covers local ACME-managed certs.
### Proxmox Guest Metrics (future)

| Metric | Description | Source |
|--------|-------------|--------|
| `homelab_proxmox_guest_info` | Info gauge with VM ID, name | QEMU guest agent |
| `homelab_proxmox_guest_agent_running` | 1 if guest agent is responsive | Agent ping |

### DNS Zone Metrics (future)

| Metric | Description | Source |
|--------|-------------|--------|
| `homelab_dns_zone_serial` | Current zone serial number | DNS AXFR or zone file |

Labels: `zone`

## Architecture

Single binary with collectors enabled via config. Runs on hosts that need specific collectors.

```
homelab-exporter
├── main.go
├── collector/
│   ├── vault.go      # Vault/OpenBao token metrics
│   ├── acme.go       # ACME certificate metrics
│   └── proxmox.go    # Proxmox guest agent (future)
└── config/
    └── config.go
```

## Configuration

```yaml
listen_addr: ":9970"
collectors:
  vault:
    enabled: true
    token_path: "/var/lib/vault/token"
  acme:
    enabled: true
    cert_dirs:
      - "/var/lib/acme"
  proxmox:
    enabled: false
```
## NixOS Module

```nix
services.homelab-exporter = {
  enable = true;
  port = 9970;
  collectors = {
    vault = {
      enable = true;
      tokenPath = "/var/lib/vault/token";
    };
    acme = {
      enable = true;
      certDirs = [ "/var/lib/acme" ];
    };
  };
};

# Auto-register scrape target
homelab.monitoring.scrapeTargets = [{
  job_name = "homelab-exporter";
  port = 9970;
}];
```

## Integration

### Deployment

Deploy on hosts that have relevant data:

- **All hosts with ACME certs**: acme collector
- **All hosts with Vault**: vault collector
- **Proxmox VMs**: proxmox collector (when implemented)

### Relationship with nixos-exporter

These are complementary:

- **nixos-exporter** (port 9971): Generic NixOS metrics, deploy everywhere
- **homelab-exporter** (port 9970): Infrastructure-specific, deploy selectively

Both can run on the same host if needed.
## Implementation

### Language

Go - consistent with labmon and nixos-exporter.

### Phase 1: Core + ACME

1. Create git repository (git.t-juice.club/torjus/homelab-exporter)
2. Implement the ACME certificate collector
3. HTTP server with `/metrics`
4. NixOS module

### Phase 2: Vault Collector

1. Implement token expiry detection
2. Handle missing/expired tokens gracefully

### Phase 3: Dashboard

1. Create a Grafana dashboard for infrastructure health
2. Add it to the existing monitoring service module
## Alert Examples

```yaml
- alert: VaultTokenExpiringSoon
  expr: homelab_vault_token_expiry_seconds < 3600
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Vault token on {{ $labels.instance }} expires in < 1 hour"

- alert: ACMECertExpiringSoon
  expr: homelab_acme_cert_expiry_seconds < 7 * 24 * 3600
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "ACME cert {{ $labels.domain }} on {{ $labels.instance }} expires in < 7 days"
```

## Open Questions

- [ ] How to read Vault token expiry without re-authenticating?
- [ ] Should the ACME collector also check key/cert match?

## Notes

- Port 9970 (labmon uses 9969, nixos-exporter will use 9971)
- Keep infrastructure-specific logic here; generic NixOS stuff goes in nixos-exporter
- Consider merging Proxmox metrics with pve-exporter if overlap is significant
```diff
@@ -20,7 +20,7 @@ Hosts to migrate:
 | nix-cache01 | Stateless | Binary cache, recreate |
 | http-proxy | Stateless | Reverse proxy, recreate |
 | nats1 | Stateless | Messaging, recreate |
-| auth01 | Stateless | Authentication, recreate |
+| auth01 | Decommission | No longer in use |
 | ha1 | Stateful | Home Assistant + Zigbee2MQTT + Mosquitto |
 | monitoring01 | Stateful | Prometheus, Grafana, Loki |
 | jelly01 | Stateful | Jellyfin metadata, watch history, config |
@@ -94,8 +94,7 @@ These hosts have no meaningful state and can be recreated fresh. For each host:
 Migrate stateless hosts in an order that minimizes disruption:

 1. **nix-cache01** — low risk, no downstream dependencies during migration
-2. **auth01** — low risk
-3. **nats1** — low risk, verify no persistent JetStream streams first
+2. **nats1** — low risk, verify no persistent JetStream streams first
 4. **http-proxy** — brief disruption to proxied services, migrate during low-traffic window
 5. **ns1, ns2** — migrate one at a time, verify DNS resolution between each

@@ -154,6 +153,10 @@ For each stateful host, the procedure is:
 6. Verify Zigbee devices are still paired and communicating
 7. Decommission old VM

+**Note:** ha1 currently has 2 GB RAM, which is consistently tight. Average memory usage has
+climbed from ~57% (30-day avg) to ~70% currently, with a 30-day low of only 187 MB free.
+Consider increasing to 4 GB when reprovisioning to allow headroom for additional integrations.
+
 **Note:** ha1 is the highest-risk migration due to Zigbee device pairings. The Zigbee
 coordinator state in `/var/lib/zigbee2mqtt` should preserve pairings, but verify on a
 non-critical time window.
@@ -164,8 +167,9 @@ OpenTofu/Proxmox. Verify the USB device ID on the hypervisor and add the appropr
 `usb` block to the VM definition in `terraform/vms.tf`. The USB device must be passed
 through before starting Zigbee2MQTT on the new host.

-## Phase 5: Decommission jump Host
+## Phase 5: Decommission jump and auth01 Hosts
+
+### jump
 1. Verify nothing depends on the jump host (no SSH proxy configs pointing to it, etc.)
 2. Remove host configuration from `hosts/jump/`
 3. Remove from `flake.nix`
@@ -174,12 +178,37 @@ through before starting Zigbee2MQTT on the new host.
 6. Destroy the VM in Proxmox
 7. Commit cleanup

+### auth01
+1. Remove host configuration from `hosts/auth01/`
+2. Remove from `flake.nix`
+3. Remove any secrets in `secrets/auth01/`
+4. Remove from `.sops.yaml`
+5. Remove `services/authelia/` and `services/lldap/` (only used by auth01)
+6. Destroy the VM in Proxmox
+7. Commit cleanup
+
 ## Phase 6: Decommission ca Host (Deferred)

 Deferred until Phase 4c (PKI migration to OpenBao) is complete. Once all hosts use the
 OpenBao ACME endpoint for certificates, the step-ca host can be decommissioned following
 the same cleanup steps as the jump host.

+## Phase 7: Remove sops-nix
+
+Once `ca` is decommissioned (Phase 6), `sops-nix` is no longer used by any host. Remove
+all remnants:
+
+- `sops-nix` input from `flake.nix` and `flake.lock`
+- `sops-nix.nixosModules.sops` from all host module lists in `flake.nix`
+- `inherit sops-nix` from all specialArgs in `flake.nix`
+- `system/sops.nix` and its import in `system/default.nix`
+- `.sops.yaml`
+- `secrets/` directory
+- All `sops.secrets.*` declarations in `services/ca/`, `services/authelia/`, `services/lldap/`
+- Template scripts that generate age keys for sops (`hosts/template/scripts.nix`,
+  `hosts/template2/scripts.nix`)
+
+See `docs/plans/completed/sops-to-openbao-migration.md` for full context.
+
 ## Notes

 - Each host migration should be done individually, not in bulk, to limit blast radius
```
docs/plans/long-term-metrics-storage.md (new file, 122 lines)
# Long-Term Metrics Storage Options

## Problem Statement

The current Prometheus configuration retains metrics for 30 days (`retentionTime = "30d"`). Extending retention further raises disk-usage concerns on the homelab hypervisor, which has limited local storage.

Prometheus does not support downsampling: it stores all data at full resolution until the retention period expires, then deletes it entirely.

## Current Configuration

Location: `services/monitoring/prometheus.nix`

- **Retention**: 30 days
- **Scrape interval**: 15s
- **Features**: Alertmanager, Pushgateway, auto-generated scrape configs from flake hosts
- **Storage**: Local disk on monitoring01
## Options Evaluated

### Option 1: VictoriaMetrics

VictoriaMetrics is a Prometheus-compatible TSDB with significantly better compression (5-10x smaller storage footprint).

**NixOS options available:**

- `services.victoriametrics.enable`
- `services.victoriametrics.prometheusConfig` - accepts the Prometheus scrape config format
- `services.victoriametrics.retentionPeriod` - e.g., "6m" for 6 months
- `services.vmagent` - dedicated scraping agent
- `services.vmalert` - alerting rules evaluation

**Pros:**

- Simple migration - a single service replacement
- Same PromQL query language - Grafana dashboards work unchanged
- Same scrape config format - existing auto-generated configs work as-is
- 5-10x better compression means 30 days of Prometheus data could become 180+ days
- Lightweight, single binary

**Cons:**

- No automatic downsampling (relies on compression alone)
- Alerting requires switching to vmalert instead of the Prometheus Alertmanager integration
- Would need to migrate existing data or start fresh

**Migration steps:**

1. Replace `services.prometheus` with `services.victoriametrics`
2. Move scrape configs to `prometheusConfig`
3. Set up `services.vmalert` for alerting rules
4. Update the Grafana datasource to the VictoriaMetrics port (8428)
5. Keep Alertmanager for notification routing

### Option 2: Thanos

Thanos extends Prometheus with long-term storage and automatic downsampling by uploading data to object storage.

**NixOS options available:**

- `services.thanos.sidecar` - uploads Prometheus blocks to object storage
- `services.thanos.compact` - compacts and downsamples data
- `services.thanos.query` - unified query gateway
- `services.thanos.query-frontend` - query caching and parallelization
- `services.thanos.downsample` - dedicated downsampling service

**Downsampling behavior:**

- Raw resolution kept for a configurable period (default: indefinite)
- 5-minute resolution created after 40 hours
- 1-hour resolution created after 10 days

**Retention configuration (in the compactor):**

```nix
services.thanos.compact = {
  retention.resolution-raw = "30d";  # Keep raw for 30 days
  retention.resolution-5m = "180d";  # Keep 5m samples for 6 months
  retention.resolution-1h = "2y";    # Keep 1h samples for 2 years
};
```

**Pros:**

- True downsampling - older data uses progressively less storage
- Keep metrics for years with minimal storage impact
- Prometheus continues running unchanged
- Existing Alertmanager integration preserved

**Cons:**

- Requires object storage (MinIO, S3, or local filesystem)
- Multiple services to manage (sidecar, compactor, query)
- More complex architecture
- Additional infrastructure (MinIO) may be needed

**Required components:**

1. Thanos Sidecar (runs alongside Prometheus)
2. Object storage (MinIO or local filesystem)
3. Thanos Compactor (handles downsampling)
4. Thanos Query (provides a unified query endpoint)

**Migration steps:**

1. Deploy object storage (MinIO or configure a filesystem backend)
2. Add the Thanos sidecar pointing to the Prometheus data directory
3. Add the Thanos compactor with retention policies
4. Add the Thanos query gateway
5. Update the Grafana datasource to the Thanos Query port (10902)

## Comparison

| Aspect | VictoriaMetrics | Thanos |
|--------|-----------------|--------|
| Complexity | Low (1 service) | Higher (3-4 services) |
| Downsampling | No | Yes (automatic) |
| Storage savings | 5-10x compression | Compression + downsampling |
| Object storage required | No | Yes |
| Migration effort | Minimal | Moderate |
| Grafana changes | Change port only | Change port only |
| Alerting changes | Need vmalert | Keep existing |

## Recommendation

**Start with VictoriaMetrics** for simplicity. The compression alone may provide 6+ months of retention in the same disk space currently used for 30 days.

If multi-year retention with true downsampling becomes necessary, Thanos can be evaluated later. However, it requires deploying object storage infrastructure (MinIO), which adds operational complexity.

## References

- VictoriaMetrics docs: https://docs.victoriametrics.com/
- Thanos docs: https://thanos.io/tip/thanos/getting-started.md/
- NixOS options searched from nixpkgs revision e576e3c9 (NixOS 25.11)
docs/plans/nats-deploy-service.md (new file, 371 lines)
# NATS-Based Deployment Service

## Overview

Create a message-based deployment system that allows triggering NixOS configuration updates on demand, rather than waiting for the daily auto-upgrade timer. This enables faster iteration when testing changes and immediate fleet-wide deployments.

## Goals

1. **On-demand deployment** - trigger config updates immediately via a NATS message
2. **Targeted deployment** - deploy to specific hosts or all hosts
3. **Branch/revision support** - test feature branches before merging to master
4. **MCP integration** - allow Claude Code to trigger deployments during development

## Current State

- **Auto-upgrade**: All hosts run `nixos-upgrade.service` daily, pulling from master
- **Manual testing**: A `nixos-rebuild-test <action> <branch>` helper exists on all hosts
- **NATS**: Running on nats1 with JetStream enabled, using NKey authentication
- **Accounts**: ADMIN (system) and HOMELAB (user workloads with JetStream)

## Architecture

```
┌─────────────┐                         ┌─────────────┐
│  MCP Tool   │ deploy.test.>           │  Admin CLI  │ deploy.test.> + deploy.prod.>
│  (claude)   │────────────┐      ┌─────│  (torjus)   │
└─────────────┘            │      │     └─────────────┘
                           ▼      ▼
                       ┌──────────────┐
                       │    nats1     │
                       │   (authz)    │
                       └──────┬───────┘
                              │
            ┌─────────────────┼─────────────────┐
            │                 │                 │
            ▼                 ▼                 ▼
      ┌──────────┐      ┌──────────┐      ┌──────────┐
      │ template1│      │   ns1    │      │   ha1    │
      │ tier=test│      │ tier=prod│      │ tier=prod│
      └──────────┘      └──────────┘      └──────────┘
```

## Repository Structure

The project lives in a **separate repository** (e.g., `homelab-deploy`) containing:

```
homelab-deploy/
├── flake.nix              # Nix flake with Go package + NixOS module
├── go.mod
├── go.sum
├── cmd/
│   └── homelab-deploy/
│       └── main.go        # CLI entrypoint with subcommands
├── internal/
│   ├── listener/          # Listener mode logic
│   ├── mcp/               # MCP server mode logic
│   └── deploy/            # Shared deployment logic
└── nixos/
    └── module.nix         # NixOS module for listener service
```

This repo imports the flake as an input and uses the NixOS module.

## Single Binary with Subcommands

The `homelab-deploy` binary supports multiple modes:

```bash
# Run as listener on a host (systemd service)
homelab-deploy listener --hostname ns1 --nats-url nats://nats1:4222

# Run as MCP server (for Claude Code)
homelab-deploy mcp --nats-url nats://nats1:4222

# CLI commands for manual use
homelab-deploy deploy ns1 --branch feature-x --action switch   # single host
homelab-deploy deploy --tier test --all --action boot          # all test hosts
homelab-deploy deploy --tier prod --all --action boot          # all prod hosts (admin only)
homelab-deploy deploy --tier prod --role dns --action switch   # all prod dns hosts
homelab-deploy status
```

## Components

### Listener Mode

A systemd service on each host that:

- Subscribes to multiple subjects for targeted and group deployments
- Validates incoming messages (revision, action)
- Executes `nixos-rebuild` with the specified parameters
- Reports status back via NATS

**Subject structure:**

```
deploy.<tier>.<hostname>     # specific host (e.g., deploy.prod.ns1)
deploy.<tier>.all            # all hosts in tier (e.g., deploy.test.all)
deploy.<tier>.role.<role>    # all hosts with role in tier (e.g., deploy.prod.role.dns)
```

**Listener subscriptions** (based on the `homelab.host` config):

- `deploy.<tier>.<hostname>` - direct messages to this host
- `deploy.<tier>.all` - broadcast to all hosts in the tier
- `deploy.<tier>.role.<role>` - broadcast to hosts with a matching role (if role is set)

Example: ns1 with `tier=prod, role=dns` subscribes to:

- `deploy.prod.ns1`
- `deploy.prod.all`
- `deploy.prod.role.dns`
**NixOS module configuration:**

```nix
services.homelab-deploy.listener = {
  enable = true;
  timeout = 600; # seconds, default 10 minutes
};
```

The listener reads tier and role from `config.homelab.host` (see Host Metadata below).

**Request message format:**

```json
{
  "action": "switch" | "boot" | "test" | "dry-activate",
  "revision": "master" | "feature-branch" | "abc123...",
  "reply_to": "deploy.responses.<request-id>"
}
```

**Response message format:**

```json
{
  "status": "accepted" | "rejected" | "started" | "completed" | "failed",
  "error": "invalid_revision" | "already_running" | "build_failed" | null,
  "message": "human-readable details"
}
```

**Request/reply flow:**

1. MCP/CLI sends a deploy request with a unique `reply_to` subject
2. The listener validates the request (e.g., `git ls-remote` to check the revision exists)
3. The listener sends an immediate response:
   - `{"status": "rejected", "error": "invalid_revision", "message": "branch 'foo' not found"}`, or
   - `{"status": "started", "message": "starting nixos-rebuild switch"}`
4. If started, the listener runs nixos-rebuild
5. The listener sends a final response:
   - `{"status": "completed", "message": "successfully switched to generation 42"}`, or
   - `{"status": "failed", "error": "build_failed", "message": "nixos-rebuild exited with code 1"}`

This provides immediate feedback on validation errors (bad revision, already running) without waiting for the build to fail.
### MCP Mode

Runs as an MCP server providing tools for Claude Code.

**Tools:**

| Tool | Description | Tier Access |
|------|-------------|-------------|
| `deploy` | Deploy to test hosts (individual, all, or by role) | test only |
| `deploy_admin` | Deploy to any host (requires `--enable-admin` flag) | test + prod |
| `deploy_status` | Check deployment status/history | n/a |
| `list_hosts` | List available deployment targets | n/a |

**CLI flags:**

```bash
# Default: only test-tier deployments available
homelab-deploy mcp --nats-url nats://nats1:4222

# Enable admin tool (requires admin NKey to be configured)
homelab-deploy mcp --nats-url nats://nats1:4222 --enable-admin --admin-nkey-file /path/to/admin.nkey
```

**Security layers:**

1. **MCP flag**: the `deploy_admin` tool is only exposed when `--enable-admin` is passed
2. **NATS authz**: even if the tool is exposed, NATS rejects publishes without a valid admin NKey
3. **Claude Code permissions**: `mcp__homelab-deploy__deploy_admin` can be set to `ask` mode for a confirmation popup

By default, the MCP only loads test-tier credentials and exposes the `deploy` tool. Claude can:

- Deploy to individual test hosts
- Deploy to all test hosts at once (`deploy.test.all`)
- Deploy to test hosts by role (`deploy.test.role.<role>`)

### Tiered Permissions

Authorization is enforced at the NATS layer using subject-based permissions. Different deployer credentials have different publish rights:

**NATS user configuration (on nats1):**

```nix
accounts = {
  HOMELAB = {
    users = [
      # MCP/Claude - test tier only
      {
        nkey = "UABC..."; # mcp-deployer
        permissions = {
          publish = [ "deploy.test.>" ];
          subscribe = [ "deploy.responses.>" ];
        };
      }
      # Admin - full access to all tiers
      {
        nkey = "UXYZ..."; # admin-deployer
        permissions = {
          publish = [ "deploy.test.>" "deploy.prod.>" ];
          subscribe = [ "deploy.responses.>" ];
        };
      }
      # Host listeners - subscribe to their tier, publish responses
      {
        nkey = "UDEF..."; # host-listener (one per host)
        permissions = {
          subscribe = [ "deploy.*.>" ];
          publish = [ "deploy.responses.>" ];
        };
      }
    ];
  };
};
```

**Host tier assignments** (via `homelab.host.tier`):

| Tier | Hosts |
|------|-------|
| test | template1, nix-cache01, future test hosts |
| prod | ns1, ns2, ha1, monitoring01, http-proxy, etc. |

**Example deployment scenarios:**

| Command | Subject | MCP | Admin |
|---------|---------|-----|-------|
| Deploy to ns1 | `deploy.prod.ns1` | ❌ | ✅ |
| Deploy to template1 | `deploy.test.template1` | ✅ | ✅ |
| Deploy to all test hosts | `deploy.test.all` | ✅ | ✅ |
| Deploy to all prod hosts | `deploy.prod.all` | ❌ | ✅ |
| Deploy to all DNS servers | `deploy.prod.role.dns` | ❌ | ✅ |

All NKeys are stored in Vault: the MCP gets limited credentials, the admin CLI gets full-access credentials.
### Host Metadata
|
||||||
|
|
||||||
|
Rather than defining `tier` in the listener config, use a central `homelab.host` module that provides host metadata for multiple consumers. This aligns with the approach proposed in `docs/plans/prometheus-scrape-target-labels.md`.
|
||||||
|
|
||||||
|
**Status:** The `homelab.host` module is implemented in `modules/homelab/host.nix`.
|
||||||
|
Hosts can be filtered by tier using `config.homelab.host.tier`.
|
||||||
|
|
||||||
|
**Module definition (in `modules/homelab/host.nix`):**
|
||||||
|
```nix
|
||||||
|
homelab.host = {
|
||||||
|
tier = lib.mkOption {
|
||||||
|
type = lib.types.enum [ "test" "prod" ];
|
||||||
|
default = "prod";
|
||||||
|
description = "Deployment tier - controls which credentials can deploy to this host";
|
||||||
|
};
|
||||||
|
|
||||||
|
priority = lib.mkOption {
|
||||||
|
type = lib.types.enum [ "high" "low" ];
|
||||||
|
default = "high";
|
||||||
|
description = "Alerting priority - low priority hosts have relaxed thresholds";
|
||||||
|
};
|
||||||
|
|
||||||
|
role = lib.mkOption {
|
||||||
|
type = lib.types.nullOr lib.types.str;
|
||||||
|
default = null;
|
||||||
|
description = "Primary role of this host (dns, database, monitoring, etc.)";
|
||||||
|
};
|
||||||
|
|
||||||
|
labels = lib.mkOption {
|
||||||
|
type = lib.types.attrsOf lib.types.str;
|
||||||
|
default = { };
|
||||||
|
description = "Additional free-form labels";
|
||||||
|
};
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
|
**Consumers:**
|
||||||
|
- `homelab-deploy` listener reads `config.homelab.host.tier` for subject subscription
|
||||||
|
- Prometheus scrape config reads `priority`, `role`, `labels` for target labels
|
||||||
|
- Future services can consume the same metadata
|
||||||
|
|
||||||
|
**Example host config:**
|
||||||
|
```nix
|
||||||
|
# hosts/nix-cache01/configuration.nix
|
||||||
|
homelab.host = {
|
||||||
|
tier = "test"; # can be deployed by MCP
|
||||||
|
priority = "low"; # relaxed alerting thresholds
|
||||||
|
role = "build-host";
|
||||||
|
};
|
||||||
|
|
||||||
|
# hosts/ns1/configuration.nix
|
||||||
|
homelab.host = {
|
||||||
|
tier = "prod"; # requires admin credentials
|
||||||
|
priority = "high";
|
||||||
|
role = "dns";
|
||||||
|
labels.dns_role = "primary";
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
|
## Implementation Steps

### Phase 1: Core Binary + Listener

1. **Create homelab-deploy repository**
   - Initialize Go module
   - Set up flake.nix with Go package build

2. **Implement listener mode**
   - NATS subscription logic
   - nixos-rebuild execution
   - Status reporting via NATS reply
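The listener steps above can be sketched as a small handler; this is illustrative only, not the project's actual code. The function names, the revision rule, and the injectable runner are assumptions (the plan only specifies validate-then-rebuild-then-reply):

```go
package main

import (
	"fmt"
	"regexp"
)

// revisionRe accepts a plausible git branch name or commit hash; the
// exact rule is an assumption - the plan only says to strictly
// validate the revision format.
var revisionRe = regexp.MustCompile(`^[A-Za-z0-9._/-]{1,100}$`)

// handleDeploy validates the requested revision, then delegates the
// actual nixos-rebuild invocation to run (injected here so it can be
// stubbed; the real listener would shell out as root).
func handleDeploy(revision string, run func(rev string) error) (string, error) {
	if !revisionRe.MatchString(revision) {
		return "", fmt.Errorf("invalid revision: %q", revision)
	}
	if err := run(revision); err != nil {
		return "", fmt.Errorf("rebuild failed: %w", err)
	}
	return "ok: " + revision, nil
}

func main() {
	status, err := handleDeploy("master", func(rev string) error {
		// stub: a real listener would execute nixos-rebuild here
		return nil
	})
	fmt.Println(status, err) // ok: master <nil>
}
```

Wired into NATS, this handler would sit inside a subscription callback on `deploy.<tier>.<host>` and send the returned status back via the message's reply subject.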
3. **Create NixOS module**
   - Systemd service definition
   - Configuration options (hostname, NATS URL, NKey path)
   - Vault secret integration for NKeys

4. **Create `homelab.host` module** (in nixos-servers)
   - Define `tier`, `priority`, `role`, `labels` options
   - This module is shared with the Prometheus label work (see `docs/plans/prometheus-scrape-target-labels.md`)

5. **Integrate with nixos-servers**
   - Add flake input for homelab-deploy
   - Import listener module in `system/`
   - Set `homelab.host.tier` per host (test vs prod)

6. **Configure NATS tiered permissions**
   - Add deployer users to nats1 config (mcp-deployer, admin-deployer)
   - Set up subject ACLs per user (test-only vs full access)
   - Add deployer NKeys to Vault
   - Create Terraform resources for NKey secrets

### Phase 2: MCP + CLI

7. **Implement MCP mode**
   - MCP server with deploy/status tools
   - Request/reply pattern for deployment feedback

8. **Implement CLI commands**
   - `deploy` command for manual deployments
   - `status` command to check deployment state

9. **Configure Claude Code**
   - Add MCP server to configuration
   - Document usage

### Phase 3: Enhancements

10. Add deployment locking (prevent concurrent deploys)
11. Prometheus metrics for deployment status
## Security Considerations

- **Privilege escalation**: Listener runs as root to execute nixos-rebuild
- **Input validation**: Strictly validate revision format (branch name or commit hash)
- **Rate limiting**: Prevent rapid-fire deployments
- **Audit logging**: Log all deployment requests with source identity
- **Network isolation**: NATS only accessible from internal network
## Decisions

All open questions have been resolved. See Notes section for decision rationale.

## Notes

- The existing `nixos-rebuild-test` helper provides a good reference for the rebuild logic
- Uses NATS request/reply pattern for immediate validation feedback and completion status
- Consider using NATS headers for metadata (request ID, timestamp)
- **Timeout decision**: Metrics show no-change upgrades complete in 5-55 seconds. A 10-minute default provides ample headroom for actual updates with package downloads. Per-host override available for hosts with known longer build times.
- **Rollback**: Not needed as a separate feature - deploying an older commit hash is an effective rollback.
- **Offline hosts**: No message persistence - if a host is offline, the deploy fails. The daily auto-upgrade is the safety net. This avoids the complexity of JetStream deduplication (a host coming online and applying 10 queued updates instead of just the latest).
- **Deploy history**: Use existing Loki - the listener logs deployments to journald, queryable via Loki. No need for separate JetStream persistence.
- **Naming**: `homelab-deploy` - ties it to the infrastructure rather than implementation details.
@@ -4,6 +4,8 @@

Add support for custom per-host labels on Prometheus scrape targets, enabling alert rules to reference host metadata (priority, role) instead of hardcoding instance names.

**Related:** This plan shares the `homelab.host` module with `docs/plans/nats-deploy-service.md`, which uses the same metadata for deployment tier assignment.

## Motivation

Some hosts have workloads that make generic alert thresholds inappropriate. For example, `nix-cache01` regularly hits high CPU during builds, requiring a longer `for` duration on `high_cpu_load`. Currently this is handled by excluding specific instance names in PromQL expressions, which is brittle and doesn't scale.

@@ -32,24 +34,82 @@ Values: free-form string, e.g. `"dns"`, `"build-host"`, `"database"`, `"monitori

Recommendation: start with a single primary role string. If multi-role matching becomes a real need, switch to separate boolean labels.

### `dns_role`

For DNS servers specifically, distinguish between primary and secondary resolvers. The secondary resolver (ns2) receives very little traffic and has a cold cache, making generic cache hit ratio alerts inappropriate.

Values: `"primary"`, `"secondary"`

Example use case: The `unbound_low_cache_hit_ratio` alert fires on ns2 because its cache hit ratio (~62%) is lower than ns1 (~90%). This is expected behavior since ns2 gets ~100x less traffic. With a `dns_role` label, the alert can either exclude secondaries or use different thresholds:

```promql
# Only alert on primary DNS
unbound_cache_hit_ratio < 0.7 and on(instance) unbound_up{dns_role="primary"}

# Or use different thresholds
(unbound_cache_hit_ratio < 0.7 and on(instance) unbound_up{dns_role="primary"})
or
(unbound_cache_hit_ratio < 0.5 and on(instance) unbound_up{dns_role="secondary"})
```

## Implementation

This implementation uses a shared `homelab.host` module that provides host metadata for multiple consumers (Prometheus labels, deployment tiers, etc.). See also `docs/plans/nats-deploy-service.md`, which uses the same module for deployment tier assignment.

### 1. Create `homelab.host` module

**Status:** Step 1 (Create `homelab.host` module) is complete. The module is in `modules/homelab/host.nix` with tier, priority, role, and labels options.

Create `modules/homelab/host.nix` with shared host metadata options:

```nix
{ lib, ... }:
{
  options.homelab.host = {
    tier = lib.mkOption {
      type = lib.types.enum [ "test" "prod" ];
      default = "prod";
      description = "Deployment tier - controls which credentials can deploy to this host";
    };

    priority = lib.mkOption {
      type = lib.types.enum [ "high" "low" ];
      default = "high";
      description = "Alerting priority - low priority hosts have relaxed thresholds";
    };

    role = lib.mkOption {
      type = lib.types.nullOr lib.types.str;
      default = null;
      description = "Primary role of this host (dns, database, monitoring, etc.)";
    };

    labels = lib.mkOption {
      type = lib.types.attrsOf lib.types.str;
      default = { };
      description = "Additional free-form labels (e.g., dns_role = 'primary')";
    };
  };
}
```

Import this module in `modules/homelab/default.nix`.

### 2. Update `lib/monitoring.nix`

- `extractHostMonitoring` should also extract `homelab.host` values (priority, role, labels).
- Build the combined label set from `homelab.host`:

```nix
# Combine structured options + free-form labels
effectiveLabels =
  (lib.optionalAttrs (host.priority != "high") { priority = host.priority; })
  // (lib.optionalAttrs (host.role != null) { role = host.role; })
  // host.labels;
```

- `generateNodeExporterTargets` returns structured `static_configs` entries, grouping targets by their label sets:

```nix
# Before (flat list):
@@ -62,7 +122,7 @@ labels = lib.mkOption {
]
```

This requires grouping hosts by their label attrset and producing one `static_configs` entry per unique label combination. Hosts with default values (priority=high, no role, no labels) get grouped together with no extra labels (preserving current behavior).
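The grouping step described above can be sketched outside Nix as well; here is an illustrative Go version (all names assumed) that buckets targets by a canonical rendering of their label set, yielding one `static_configs` entry per unique combination:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// staticConfig mirrors one Prometheus static_configs entry.
type staticConfig struct {
	Targets []string
	Labels  map[string]string
}

// labelKey renders a label set canonically (keys sorted) so that
// equal sets produce equal map keys.
func labelKey(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s;", k, labels[k])
	}
	return b.String()
}

// groupTargets produces one entry per unique label set; hosts with
// an empty label set end up grouped together with no extra labels.
func groupTargets(targets map[string]map[string]string) []staticConfig {
	groups := map[string]*staticConfig{}
	for target, labels := range targets {
		k := labelKey(labels)
		if groups[k] == nil {
			groups[k] = &staticConfig{Labels: labels}
		}
		groups[k].Targets = append(groups[k].Targets, target)
	}
	out := make([]staticConfig, 0, len(groups))
	for _, g := range groups {
		sort.Strings(g.Targets)
		out = append(out, *g)
	}
	return out
}

func main() {
	cfgs := groupTargets(map[string]map[string]string{
		"ns1:9100":          {"role": "dns", "dns_role": "primary"},
		"ns2:9100":          {"role": "dns", "dns_role": "secondary"},
		"ha1:9100":          {},
		"monitoring01:9100": {},
	})
	fmt.Println(len(cfgs)) // 3
}
```

The same shape - canonical key, bucket, emit one entry per bucket - is what the Nix implementation needs to express over attrsets.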
### 3. Update `services/monitoring/prometheus.nix`

@@ -76,17 +136,29 @@ static_configs = [{ targets = nodeExporterTargets; }];
    static_configs = nodeExporterTargets;

### 4. Set metadata on hosts

Example in `hosts/nix-cache01/configuration.nix`:

```nix
homelab.host = {
  tier = "test"; # can be deployed by MCP (used by homelab-deploy)
  priority = "low"; # relaxed alerting thresholds
  role = "build-host";
};
```

Example in `hosts/ns1/configuration.nix`:

```nix
homelab.host = {
  tier = "prod";
  priority = "high";
  role = "dns";
  labels.dns_role = "primary";
};
```

### 5. Update alert rules

After implementing labels, review and update `services/monitoring/rules.yml`:
flake.lock (generated)

@@ -21,6 +21,27 @@
      "url": "https://git.t-juice.club/torjus/alerttonotify"
    }
  },
  "homelab-deploy": {
    "inputs": {
      "nixpkgs": [
        "nixpkgs-unstable"
      ]
    },
    "locked": {
      "lastModified": 1770447502,
      "narHash": "sha256-xH1PNyE3ydj4udhe1IpK8VQxBPZETGLuORZdSWYRmSU=",
      "ref": "master",
      "rev": "79db119d1ca6630023947ef0a65896cc3307c2ff",
      "revCount": 22,
      "type": "git",
      "url": "https://git.t-juice.club/torjus/homelab-deploy"
    },
    "original": {
      "ref": "master",
      "type": "git",
      "url": "https://git.t-juice.club/torjus/homelab-deploy"
    }
  },
  "labmon": {
    "inputs": {
      "nixpkgs": [
@@ -42,6 +63,26 @@
      "url": "https://git.t-juice.club/torjus/labmon"
    }
  },
  "nixos-exporter": {
    "inputs": {
      "nixpkgs": [
        "nixpkgs-unstable"
      ]
    },
    "locked": {
      "lastModified": 1770422522,
      "narHash": "sha256-WmIFnquu4u58v8S2bOVWmknRwHn4x88CRfBFTzJ1inQ=",
      "ref": "refs/heads/master",
      "rev": "cf0ce858997af4d8dcc2ce10393ff393e17fc911",
      "revCount": 11,
      "type": "git",
      "url": "https://git.t-juice.club/torjus/nixos-exporter"
    },
    "original": {
      "type": "git",
      "url": "https://git.t-juice.club/torjus/nixos-exporter"
    }
  },
  "nixpkgs": {
    "locked": {
      "lastModified": 1770136044,
@@ -60,11 +101,11 @@
  },
  "nixpkgs-unstable": {
    "locked": {
      "lastModified": 1770197578,
      "narHash": "sha256-AYqlWrX09+HvGs8zM6ebZ1pwUqjkfpnv8mewYwAo+iM=",
      "owner": "nixos",
      "repo": "nixpkgs",
      "rev": "00c21e4c93d963c50d4c0c89bfa84ed6e0694df2",
      "type": "github"
    },
    "original": {
@@ -77,7 +118,9 @@
  "root": {
    "inputs": {
      "alerttonotify": "alerttonotify",
      "homelab-deploy": "homelab-deploy",
      "labmon": "labmon",
      "nixos-exporter": "nixos-exporter",
      "nixpkgs": "nixpkgs",
      "nixpkgs-unstable": "nixpkgs-unstable",
      "sops-nix": "sops-nix"
flake.nix

@@ -17,6 +17,14 @@
      url = "git+https://git.t-juice.club/torjus/labmon?ref=master";
      inputs.nixpkgs.follows = "nixpkgs-unstable";
    };
    nixos-exporter = {
      url = "git+https://git.t-juice.club/torjus/nixos-exporter";
      inputs.nixpkgs.follows = "nixpkgs-unstable";
    };
    homelab-deploy = {
      url = "git+https://git.t-juice.club/torjus/homelab-deploy?ref=master";
      inputs.nixpkgs.follows = "nixpkgs-unstable";
    };
  };

  outputs =
@@ -27,6 +35,8 @@
      sops-nix,
      alerttonotify,
      labmon,
      nixos-exporter,
      homelab-deploy,
      ...
    }@inputs:
    let
@@ -42,6 +52,20 @@
        alerttonotify.overlays.default
        labmon.overlays.default
      ];
      # Common modules applied to all hosts
      commonModules = [
        (
          { config, pkgs, ... }:
          {
            nixpkgs.overlays = commonOverlays;
            system.configurationRevision = self.rev or self.dirtyRev or "dirty";
          }
        )
        sops-nix.nixosModules.sops
        nixos-exporter.nixosModules.default
        homelab-deploy.nixosModules.default
        ./modules/homelab
      ];
      allSystems = [
        "x86_64-linux"
        "aarch64-linux"
@@ -58,15 +82,8 @@
          specialArgs = {
            inherit inputs self sops-nix;
          };
          modules = commonModules ++ [
            ./hosts/ns1
          ];
        };
        ns2 = nixpkgs.lib.nixosSystem {
@@ -74,15 +91,8 @@
          specialArgs = {
            inherit inputs self sops-nix;
          };
          modules = commonModules ++ [
            ./hosts/ns2
          ];
        };
        ha1 = nixpkgs.lib.nixosSystem {
@@ -90,15 +100,8 @@
          specialArgs = {
            inherit inputs self sops-nix;
          };
          modules = commonModules ++ [
            ./hosts/ha1
          ];
        };
        template1 = nixpkgs.lib.nixosSystem {
@@ -106,15 +109,8 @@
          specialArgs = {
            inherit inputs self sops-nix;
          };
          modules = commonModules ++ [
            ./hosts/template
          ];
        };
        template2 = nixpkgs.lib.nixosSystem {
@@ -122,15 +118,8 @@
          specialArgs = {
            inherit inputs self sops-nix;
          };
          modules = commonModules ++ [
            ./hosts/template2
          ];
        };
        http-proxy = nixpkgs.lib.nixosSystem {
@@ -138,15 +127,8 @@
          specialArgs = {
            inherit inputs self sops-nix;
          };
          modules = commonModules ++ [
            ./hosts/http-proxy
          ];
        };
        ca = nixpkgs.lib.nixosSystem {
@@ -154,15 +136,8 @@
          specialArgs = {
            inherit inputs self sops-nix;
          };
          modules = commonModules ++ [
            ./hosts/ca
          ];
        };
        monitoring01 = nixpkgs.lib.nixosSystem {
@@ -170,15 +145,8 @@
          specialArgs = {
            inherit inputs self sops-nix;
          };
          modules = commonModules ++ [
            ./hosts/monitoring01
            labmon.nixosModules.labmon
          ];
        };
@@ -187,15 +155,8 @@
          specialArgs = {
            inherit inputs self sops-nix;
          };
          modules = commonModules ++ [
            ./hosts/jelly01
          ];
        };
        nix-cache01 = nixpkgs.lib.nixosSystem {
@@ -203,15 +164,8 @@
          specialArgs = {
            inherit inputs self sops-nix;
          };
          modules = commonModules ++ [
            ./hosts/nix-cache01
          ];
        };
        pgdb1 = nixpkgs.lib.nixosSystem {
@@ -219,15 +173,8 @@
          specialArgs = {
            inherit inputs self sops-nix;
          };
          modules = commonModules ++ [
            ./hosts/pgdb1
          ];
        };
        nats1 = nixpkgs.lib.nixosSystem {
@@ -235,47 +182,8 @@
          specialArgs = {
            inherit inputs self sops-nix;
          };
          modules = commonModules ++ [
            ./hosts/nats1
          ];
        };
        vault01 = nixpkgs.lib.nixosSystem {
@@ -283,31 +191,35 @@
          specialArgs = {
            inherit inputs self sops-nix;
          };
          modules = commonModules ++ [
            ./hosts/vault01
          ];
        };
        testvm01 = nixpkgs.lib.nixosSystem {
          inherit system;
          specialArgs = {
            inherit inputs self sops-nix;
          };
          modules = commonModules ++ [
            ./hosts/testvm01
          ];
        };
        testvm02 = nixpkgs.lib.nixosSystem {
          inherit system;
          specialArgs = {
            inherit inputs self sops-nix;
          };
          modules = commonModules ++ [
            ./hosts/testvm02
          ];
        };
        testvm03 = nixpkgs.lib.nixosSystem {
          inherit system;
          specialArgs = {
            inherit inputs self sops-nix;
          };
          modules = commonModules ++ [
            ./hosts/testvm03
          ];
        };
      };
    };
@@ -322,11 +234,12 @@
      { pkgs }:
      {
        default = pkgs.mkShell {
          packages = [
            pkgs.ansible
            pkgs.opentofu
            pkgs.openbao
            (pkgs.callPackage ./scripts/create-host { })
            homelab-deploy.packages.${pkgs.system}.default
          ];
        };
      }
@@ -1,8 +0,0 @@ (file deleted)
{ ... }:
{
  imports = [
    ./configuration.nix
    ../../services/lldap
    ../../services/authelia
  ];
}
@@ -55,8 +55,17 @@
    git
  ];

  # Vault secrets management
  vault.enable = true;
  homelab.deploy.enable = true;
  vault.secrets.backup-helper = {
    secretPath = "shared/backup/password";
    extractKey = "password";
    outputDir = "/run/secrets/backup_helper_secret";
    services = [ "restic-backups-ha1" ];
  };

  # Backup service dirs
  services.restic.backups.ha1 = {
    repository = "rest:http://10.69.12.52:8000/backup-nix";
    passwordFile = "/run/secrets/backup_helper_secret";
@@ -68,6 +77,7 @@
    timerConfig = {
      OnCalendar = "daily";
      Persistent = true;
      RandomizedDelaySec = "2h";
    };
    pruneOpts = [
      "--keep-daily 7"
@@ -21,8 +21,6 @@
    "prometheus"
    "alertmanager"
    "jelly"
    "pyroscope"
    "pushgw"
  ];
@@ -62,6 +60,9 @@
    "nix-command"
    "flakes"
  ];
  vault.enable = true;
  homelab.deploy.enable = true;

  nix.settings.tarball-ttl = 0;
  environment.systemPackages = with pkgs; [
    vim
@@ -1,9 +1,12 @@
{ config, ... }:
{
  vault.secrets.wireguard = {
    secretPath = "hosts/http-proxy/wireguard";
    extractKey = "private_key";
    outputDir = "/run/secrets/wireguard_private_key";
    services = [ "wireguard-wg0" ];
  };

  networking.wireguard = {
    enable = true;
    useNetworkd = true;
@@ -13,7 +16,7 @@
      ips = [ "10.69.222.3/24" ];
      mtu = 1384;
      listenPort = 51820;
      privateKeyFile = "/run/secrets/wireguard_private_key";
      peers = [
        {
          name = "docker2.t-juice.club";
@@ -8,6 +8,9 @@
  ];

  nixpkgs.config.allowUnfree = true;

  homelab.host.role = "bastion";

  # Use the systemd-boot EFI boot loader.
  boot.loader.grub.enable = true;
  boot.loader.grub.device = "/dev/sda";
@@ -56,7 +56,16 @@

  services.qemuGuest.enable = true;

  # Vault secrets management
  vault.enable = true;
  homelab.deploy.enable = true;
  vault.secrets.backup-helper = {
    secretPath = "shared/backup/password";
    extractKey = "password";
    outputDir = "/run/secrets/backup_helper_secret";
    services = [ "restic-backups-grafana" "restic-backups-grafana-db" ];
  };

  services.restic.backups.grafana = {
    repository = "rest:http://10.69.12.52:8000/backup-nix";
    passwordFile = "/run/secrets/backup_helper_secret";
@@ -64,6 +73,7 @@
    timerConfig = {
      OnCalendar = "daily";
      Persistent = true;
      RandomizedDelaySec = "2h";
    };
    pruneOpts = [
      "--keep-daily 7"
@@ -80,6 +90,7 @@
    timerConfig = {
      OnCalendar = "daily";
      Persistent = true;
      RandomizedDelaySec = "2h";
    };
    pruneOpts = [
      "--keep-daily 7"
@@ -13,6 +13,8 @@
 
   homelab.dns.cnames = [ "nix-cache" "actions1" ];
 
+  homelab.host.role = "build-host";
+
   fileSystems."/nix" = {
     device = "/dev/disk/by-label/nixcache";
     fsType = "xfs";
@@ -52,6 +54,9 @@
     "nix-command"
     "flakes"
   ];
+  vault.enable = true;
+  homelab.deploy.enable = true;
 
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim
@@ -47,6 +47,14 @@
     "nix-command"
     "flakes"
   ];
+  vault.enable = true;
+  homelab.deploy.enable = true;
+
+  homelab.host = {
+    role = "dns";
+    labels.dns_role = "primary";
+  };
 
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim
@@ -47,6 +47,14 @@
     "nix-command"
     "flakes"
   ];
+  vault.enable = true;
+  homelab.deploy.enable = true;
+
+  homelab.host = {
+    role = "dns";
+    labels.dns_role = "secondary";
+  };
 
   environment.systemPackages = with pkgs; [
     vim
     wget
@@ -11,6 +11,11 @@
   # Template host - exclude from DNS zone generation
   homelab.dns.enable = false;
 
+  homelab.host = {
+    tier = "test";
+    priority = "low";
+  };
+
   boot.loader.grub.enable = true;
   boot.loader.grub.device = "/dev/sda";
@@ -1,7 +1,9 @@
 { pkgs, ... }:
 let
-  prepare-host-script = pkgs.writeShellScriptBin "prepare-host.sh"
-    ''
+  prepare-host-script = pkgs.writeShellApplication {
+    name = "prepare-host.sh";
+    runtimeInputs = [ pkgs.age ];
+    text = ''
       echo "Removing machine-id"
       rm -f /etc/machine-id || true
 
@@ -24,8 +26,9 @@ let
       echo "Generate age key"
       rm -rf /var/lib/sops-nix || true
       mkdir -p /var/lib/sops-nix
-      ${pkgs.age}/bin/age-keygen -o /var/lib/sops-nix/key.txt
+      age-keygen -o /var/lib/sops-nix/key.txt
     '';
+  };
 in
 {
   environment.systemPackages = [ prepare-host-script ];
@@ -32,6 +32,11 @@
     datasource_list = [ "ConfigDrive" "NoCloud" ];
   };
 
+  homelab.host = {
+    tier = "test";
+    priority = "low";
+  };
+
   boot.loader.grub.enable = true;
   boot.loader.grub.device = "/dev/vda";
   networking.hostName = "nixos-template2";
@@ -1,7 +1,9 @@
 { pkgs, ... }:
 let
-  prepare-host-script = pkgs.writeShellScriptBin "prepare-host.sh"
-    ''
+  prepare-host-script = pkgs.writeShellApplication {
+    name = "prepare-host.sh";
+    runtimeInputs = [ pkgs.age ];
+    text = ''
       echo "Removing machine-id"
       rm -f /etc/machine-id || true
 
@@ -24,8 +26,9 @@ let
       echo "Generate age key"
       rm -rf /var/lib/sops-nix || true
       mkdir -p /var/lib/sops-nix
-      ${pkgs.age}/bin/age-keygen -o /var/lib/sops-nix/key.txt
+      age-keygen -o /var/lib/sops-nix/key.txt
     '';
+  };
 in
 {
   environment.systemPackages = [ prepare-host-script ];
@@ -13,8 +13,16 @@
     ../../common/vm
   ];
 
-  # Test VM - exclude from DNS zone generation
-  homelab.dns.enable = false;
+  # Host metadata (adjust as needed)
+  homelab.host = {
+    tier = "test"; # Start in test tier, move to prod after validation
+  };
+
+  # Enable Vault integration
+  vault.enable = true;
+
+  # Enable remote deployment via NATS
+  homelab.deploy.enable = true;
 
   nixpkgs.config.allowUnfree = true;
   boot.loader.grub.enable = true;
@@ -24,7 +32,7 @@
   networking.domain = "home.2rjus.net";
   networking.useNetworkd = true;
   networking.useDHCP = false;
-  services.resolved.enable = false;
+  services.resolved.enable = true;
   networking.nameservers = [
     "10.69.13.5"
     "10.69.13.6"
@@ -34,7 +42,7 @@
   systemd.network.networks."ens18" = {
     matchConfig.Name = "ens18";
     address = [
-      "10.69.13.101/24"
+      "10.69.13.20/24"
     ];
     routes = [
       { Gateway = "10.69.13.1"; }
@@ -1,27 +1,34 @@
 {
+  config,
+  lib,
   pkgs,
   ...
 }:
 
 {
   imports = [
-    ../template/hardware-configuration.nix
+    ../template2/hardware-configuration.nix
 
     ../../system
     ../../common/vm
   ];
 
-  homelab.dns.cnames = [ "ldap" ];
+  # Host metadata (adjust as needed)
+  homelab.host = {
+    tier = "test"; # Start in test tier, move to prod after validation
+  };
 
-  nixpkgs.config.allowUnfree = true;
-  # Use the systemd-boot EFI boot loader.
-  boot.loader.grub = {
-    enable = true;
-    device = "/dev/sda";
-    configurationLimit = 3;
-  };
+  # Enable Vault integration
+  vault.enable = true;
 
-  networking.hostName = "auth01";
+  # Enable remote deployment via NATS
+  homelab.deploy.enable = true;
+
+  nixpkgs.config.allowUnfree = true;
+  boot.loader.grub.enable = true;
+  boot.loader.grub.device = "/dev/vda";
+
+  networking.hostName = "testvm02";
   networking.domain = "home.2rjus.net";
   networking.useNetworkd = true;
   networking.useDHCP = false;
@@ -35,7 +42,7 @@
   systemd.network.networks."ens18" = {
     matchConfig.Name = "ens18";
     address = [
-      "10.69.13.18/24"
+      "10.69.13.21/24"
     ];
     routes = [
       { Gateway = "10.69.13.1"; }
@@ -55,13 +62,11 @@
     git
   ];
 
-  services.qemuGuest.enable = true;
-
   # Open ports in the firewall.
   # networking.firewall.allowedTCPPorts = [ ... ];
   # networking.firewall.allowedUDPPorts = [ ... ];
   # Or disable the firewall altogether.
   networking.firewall.enable = false;
 
-  system.stateVersion = "23.11"; # Did you read the comment?
+  system.stateVersion = "25.11"; # Did you read the comment?
 }
hosts/testvm03/configuration.nix (new file, 72 lines)
@@ -0,0 +1,72 @@
+{
+  config,
+  lib,
+  pkgs,
+  ...
+}:
+
+{
+  imports = [
+    ../template2/hardware-configuration.nix
+
+    ../../system
+    ../../common/vm
+  ];
+
+  # Host metadata (adjust as needed)
+  homelab.host = {
+    tier = "test"; # Start in test tier, move to prod after validation
+  };
+
+  # Enable Vault integration
+  vault.enable = true;
+
+  # Enable remote deployment via NATS
+  homelab.deploy.enable = true;
+
+  nixpkgs.config.allowUnfree = true;
+  boot.loader.grub.enable = true;
+  boot.loader.grub.device = "/dev/vda";
+
+  networking.hostName = "testvm03";
+  networking.domain = "home.2rjus.net";
+  networking.useNetworkd = true;
+  networking.useDHCP = false;
+  services.resolved.enable = true;
+  networking.nameservers = [
+    "10.69.13.5"
+    "10.69.13.6"
+  ];
+
+  systemd.network.enable = true;
+  systemd.network.networks."ens18" = {
+    matchConfig.Name = "ens18";
+    address = [
+      "10.69.13.22/24"
+    ];
+    routes = [
+      { Gateway = "10.69.13.1"; }
+    ];
+    linkConfig.RequiredForOnline = "routable";
+  };
+  time.timeZone = "Europe/Oslo";
+
+  nix.settings.experimental-features = [
+    "nix-command"
+    "flakes"
+  ];
+  nix.settings.tarball-ttl = 0;
+  environment.systemPackages = with pkgs; [
+    vim
+    wget
+    git
+  ];
+
+  # Open ports in the firewall.
+  # networking.firewall.allowedTCPPorts = [ ... ];
+  # networking.firewall.allowedUDPPorts = [ ... ];
+  # Or disable the firewall altogether.
+  networking.firewall.enable = false;
+
+  system.stateVersion = "25.11"; # Did you read the comment?
+}

hosts/testvm03/default.nix (new file, 5 lines)
@@ -0,0 +1,5 @@
+{ ... }: {
+  imports = [
+    ./configuration.nix
+  ];
+}
@@ -16,6 +16,8 @@
 
   homelab.dns.cnames = [ "vault" ];
 
+  homelab.host.role = "vault";
+
   nixpkgs.config.allowUnfree = true;
   boot.loader.grub.enable = true;
   boot.loader.grub.device = "/dev/vda";
@@ -1,121 +0,0 @@
-{
-  config,
-  lib,
-  pkgs,
-  ...
-}:
-
-{
-  imports = [
-    ../template2/hardware-configuration.nix
-
-    ../../system
-    ../../common/vm
-  ];
-
-  nixpkgs.config.allowUnfree = true;
-  boot.loader.grub.enable = true;
-  boot.loader.grub.device = "/dev/vda";
-
-  networking.hostName = "vaulttest01";
-  networking.domain = "home.2rjus.net";
-  networking.useNetworkd = true;
-  networking.useDHCP = false;
-  services.resolved.enable = true;
-  networking.nameservers = [
-    "10.69.13.5"
-    "10.69.13.6"
-  ];
-
-  systemd.network.enable = true;
-  systemd.network.networks."ens18" = {
-    matchConfig.Name = "ens18";
-    address = [
-      "10.69.13.150/24"
-    ];
-    routes = [
-      { Gateway = "10.69.13.1"; }
-    ];
-    linkConfig.RequiredForOnline = "routable";
-  };
-  time.timeZone = "Europe/Oslo";
-
-  nix.settings.experimental-features = [
-    "nix-command"
-    "flakes"
-  ];
-  nix.settings.tarball-ttl = 0;
-  environment.systemPackages = with pkgs; [
-    vim
-    wget
-    git
-  ];
-
-  # Open ports in the firewall.
-  # networking.firewall.allowedTCPPorts = [ ... ];
-  # networking.firewall.allowedUDPPorts = [ ... ];
-  # Or disable the firewall altogether.
-  networking.firewall.enable = false;
-
-  # Testing config
-  # Enable Vault secrets management
-  vault.enable = true;
-
-  # Define a test secret
-  vault.secrets.test-service = {
-    secretPath = "hosts/vaulttest01/test-service";
-    restartTrigger = true;
-    restartInterval = "daily";
-    services = [ "vault-test" ];
-  };
-
-  # Create a test service that uses the secret
-  systemd.services.vault-test = {
-    description = "Test Vault secret fetching";
-    wantedBy = [ "multi-user.target" ];
-    after = [ "vault-secret-test-service.service" ];
-
-    serviceConfig = {
-      Type = "oneshot";
-      RemainAfterExit = true;
-
-      ExecStart = pkgs.writeShellScript "vault-test" ''
-        echo "=== Vault Secret Test ==="
-        echo "Secret path: hosts/vaulttest01/test-service"
-
-        if [ -f /run/secrets/test-service/password ]; then
-          echo "✓ Password file exists"
-          echo "Password length: $(wc -c < /run/secrets/test-service/password)"
-        else
-          echo "✗ Password file missing!"
-          exit 1
-        fi
-
-        if [ -d /var/lib/vault/cache/test-service ]; then
-          echo "✓ Cache directory exists"
-        else
-          echo "✗ Cache directory missing!"
-          exit 1
-        fi
-
-        echo "Test successful!"
-      '';
-
-      StandardOutput = "journal+console";
-    };
-  };
-
-  # Test ACME certificate issuance from OpenBao PKI
-  # Override the global ACME server (from system/acme.nix) to use OpenBao instead of step-ca
-  security.acme.defaults.server = lib.mkForce "https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory";
-
-  # Request a certificate for this host
-  # Using HTTP-01 challenge with standalone listener on port 80
-  security.acme.certs."vaulttest01.home.2rjus.net" = {
-    listenHTTP = ":80";
-    enableDebugLogs = true;
-  };
-
-  system.stateVersion = "25.11"; # Did you read the comment?
-}
@@ -1,7 +1,9 @@
 { ... }:
 {
   imports = [
+    ./deploy.nix
     ./dns.nix
+    ./host.nix
     ./monitoring.nix
   ];
 }
modules/homelab/deploy.nix (new file, 16 lines)
@@ -0,0 +1,16 @@
+{ config, lib, ... }:
+
+{
+  options.homelab.deploy = {
+    enable = lib.mkEnableOption "homelab-deploy listener for NATS-based deployments";
+  };
+
+  config = {
+    assertions = [
+      {
+        assertion = config.homelab.deploy.enable -> config.vault.enable;
+        message = "homelab.deploy.enable requires vault.enable to be true (needed for NKey secret)";
+      }
+    ];
+  };
+}
modules/homelab/host.nix (new file, 28 lines)
@@ -0,0 +1,28 @@
+{ lib, ... }:
+{
+  options.homelab.host = {
+    tier = lib.mkOption {
+      type = lib.types.enum [ "test" "prod" ];
+      default = "prod";
+      description = "Deployment tier - controls which credentials can deploy to this host";
+    };
+
+    priority = lib.mkOption {
+      type = lib.types.enum [ "high" "low" ];
+      default = "high";
+      description = "Alerting priority - low priority hosts have relaxed thresholds";
+    };
+
+    role = lib.mkOption {
+      type = lib.types.nullOr lib.types.str;
+      default = null;
+      description = "Primary role of this host (dns, database, monitoring, etc.)";
+    };
+
+    labels = lib.mkOption {
+      type = lib.types.attrsOf lib.types.str;
+      default = { };
+      description = "Additional free-form labels (e.g., dns_role = 'primary')";
+    };
+  };
+}
playbooks/provision-approle.yml (new file, 78 lines)
@@ -0,0 +1,78 @@
+---
+# Provision OpenBao AppRole credentials to an existing host
+# Usage: nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=ha1
+# Requires: BAO_ADDR and BAO_TOKEN environment variables set
+
+- name: Fetch AppRole credentials from OpenBao
+  hosts: localhost
+  connection: local
+  gather_facts: false
+
+  vars:
+    vault_addr: "{{ lookup('env', 'BAO_ADDR') | default('https://vault01.home.2rjus.net:8200', true) }}"
+    domain: "home.2rjus.net"
+
+  tasks:
+    - name: Validate hostname is provided
+      ansible.builtin.fail:
+        msg: "hostname variable is required. Use: -e hostname=<name>"
+      when: hostname is not defined
+
+    - name: Get role-id for host
+      ansible.builtin.command:
+        cmd: "bao read -field=role_id auth/approle/role/{{ hostname }}/role-id"
+      environment:
+        BAO_ADDR: "{{ vault_addr }}"
+        BAO_SKIP_VERIFY: "1"
+      register: role_id_result
+      changed_when: false
+
+    - name: Generate secret-id for host
+      ansible.builtin.command:
+        cmd: "bao write -field=secret_id -f auth/approle/role/{{ hostname }}/secret-id"
+      environment:
+        BAO_ADDR: "{{ vault_addr }}"
+        BAO_SKIP_VERIFY: "1"
+      register: secret_id_result
+      changed_when: true
+
+    - name: Add target host to inventory
+      ansible.builtin.add_host:
+        name: "{{ hostname }}.{{ domain }}"
+        groups: vault_target
+        ansible_user: root
+        vault_role_id: "{{ role_id_result.stdout }}"
+        vault_secret_id: "{{ secret_id_result.stdout }}"
+
+- name: Deploy AppRole credentials to host
+  hosts: vault_target
+  gather_facts: false
+
+  tasks:
+    - name: Create AppRole directory
+      ansible.builtin.file:
+        path: /var/lib/vault/approle
+        state: directory
+        mode: "0700"
+        owner: root
+        group: root
+
+    - name: Write role-id
+      ansible.builtin.copy:
+        content: "{{ vault_role_id }}"
+        dest: /var/lib/vault/approle/role-id
+        mode: "0600"
+        owner: root
+        group: root
+
+    - name: Write secret-id
+      ansible.builtin.copy:
+        content: "{{ vault_secret_id }}"
+        dest: /var/lib/vault/approle/secret-id
+        mode: "0600"
+        owner: root
+        group: root
+
+    - name: Display success
+      ansible.builtin.debug:
+        msg: "AppRole credentials provisioned to {{ inventory_hostname }}"
@@ -18,6 +18,8 @@ from manipulators import (
     remove_from_flake_nix,
     remove_from_terraform_vms,
     remove_from_vault_terraform,
+    remove_from_approle_tf,
+    find_host_secrets,
     check_entries_exist,
 )
 from models import HostConfig
@@ -255,7 +257,10 @@ def handle_remove(
         sys.exit(1)
 
     # Check what entries exist
-    flake_exists, terraform_exists, vault_exists = check_entries_exist(hostname, repo_root)
+    flake_exists, terraform_exists, vault_exists, approle_exists = check_entries_exist(hostname, repo_root)
+
+    # Check for secrets in secrets.tf
+    host_secrets = find_host_secrets(hostname, repo_root)
 
     # Collect all files in the host directory recursively
     files_in_host_dir = sorted([f for f in host_dir.rglob("*") if f.is_file()])
@@ -294,6 +299,21 @@ def handle_remove(
     else:
         console.print(f"  • terraform/vault/hosts-generated.tf [dim](not found)[/dim]")
 
+    if approle_exists:
+        console.print(f'  • terraform/vault/approle.tf (host_policies["{hostname}"])')
+    else:
+        console.print(f"  • terraform/vault/approle.tf [dim](not found)[/dim]")
+
+    # Warn about secrets in secrets.tf
+    if host_secrets:
+        console.print(f"\n[yellow]⚠️  Warning: Found {len(host_secrets)} secret(s) in terraform/vault/secrets.tf:[/yellow]")
+        for secret_path in host_secrets:
+            console.print(f'  • "{secret_path}"')
+        console.print(f"\n  [yellow]These will NOT be removed automatically.[/yellow]")
+        console.print(f"  After removal, manually edit secrets.tf and run:")
+        for secret_path in host_secrets:
+            console.print(f"    [white]vault kv delete secret/{secret_path}[/white]")
+
     # Warn about secrets directory
     if secrets_exist:
         console.print(f"\n[yellow]⚠️  Warning: secrets/{hostname}/ directory exists and will NOT be deleted[/yellow]")
@@ -323,6 +343,13 @@ def handle_remove(
     else:
         console.print("[yellow]⚠[/yellow] Could not remove from terraform/vault/hosts-generated.tf")
 
+    # Remove from terraform/vault/approle.tf
+    if approle_exists:
+        if remove_from_approle_tf(hostname, repo_root):
+            console.print("[green]✓[/green] Removed from terraform/vault/approle.tf")
+        else:
+            console.print("[yellow]⚠[/yellow] Could not remove from terraform/vault/approle.tf")
+
     # Remove from terraform/vms.tf
     if terraform_exists:
         if remove_from_terraform_vms(hostname, repo_root):
@@ -345,19 +372,34 @@ def handle_remove(
     console.print(f"\n[bold green]✓ Host {hostname} removed successfully![/bold green]\n")
 
     # Display next steps
-    display_removal_next_steps(hostname, vault_exists)
+    display_removal_next_steps(hostname, vault_exists, approle_exists, host_secrets)
 
 
-def display_removal_next_steps(hostname: str, had_vault: bool) -> None:
+def display_removal_next_steps(hostname: str, had_vault: bool, had_approle: bool, host_secrets: list) -> None:
     """Display next steps after successful removal."""
-    vault_file = " terraform/vault/hosts-generated.tf" if had_vault else ""
-    vault_apply = ""
+    vault_files = ""
     if had_vault:
+        vault_files += " terraform/vault/hosts-generated.tf"
+    if had_approle:
+        vault_files += " terraform/vault/approle.tf"
+
+    vault_apply = ""
+    if had_vault or had_approle:
         vault_apply = f"""
3. Apply Vault changes:
   [white]cd terraform/vault && tofu apply[/white]
"""
+
+    secrets_cleanup = ""
+    if host_secrets:
+        secrets_cleanup = f"""
+5. Clean up secrets (manual):
+   Edit terraform/vault/secrets.tf to remove entries for {hostname}
+   Then delete from Vault:"""
+        for secret_path in host_secrets:
+            secrets_cleanup += f"\n   [white]vault kv delete secret/{secret_path}[/white]"
+        secrets_cleanup += "\n"
+
     next_steps = f"""[bold cyan]Next Steps:[/bold cyan]
 
1. Review changes:
@@ -367,9 +409,9 @@ def display_removal_next_steps(hostname: str, had_vault: bool) -> None:
   [white]cd terraform && tofu destroy -target='proxmox_vm_qemu.vm["{hostname}"]'[/white]
{vault_apply}
4. Commit changes:
-   [white]git add -u hosts/{hostname} flake.nix terraform/vms.tf{vault_file}
+   [white]git add -u hosts/{hostname} flake.nix terraform/vms.tf{vault_files}
   git commit -m "hosts: remove {hostname}"[/white]
-"""
+{secrets_cleanup}"""
     console.print(Panel(next_steps, border_style="cyan"))
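The `vault_files` / `vault_apply` assembly in `display_removal_next_steps` above can be exercised in isolation. This is a standalone sketch of just that logic (the function name `build_vault_summary` is made up for illustration; the Rich markup and f-string template are dropped):

```python
def build_vault_summary(had_vault: bool, had_approle: bool) -> tuple[str, bool]:
    """Mirror the file-list assembly from display_removal_next_steps."""
    vault_files = ""
    if had_vault:
        vault_files += " terraform/vault/hosts-generated.tf"
    if had_approle:
        vault_files += " terraform/vault/approle.tf"
    # The `tofu apply` step is suggested when either Vault file changed.
    needs_apply = had_vault or had_approle
    return vault_files, needs_apply


files, apply_needed = build_vault_summary(True, True)
print(files)         # → " terraform/vault/hosts-generated.tf terraform/vault/approle.tf"
print(apply_needed)  # → True
```

The string stays empty when neither file had an entry, so the `git add` line in the panel degrades gracefully.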
@@ -144,7 +144,7 @@ resource "vault_approle_auth_backend_role" "generated_hosts" {
 
   backend        = vault_auth_backend.approle.path
   role_name      = each.key
-  token_policies = ["host-\${each.key}"]
+  token_policies = ["host-\${each.key}", "homelab-deploy"]
   secret_id_ttl  = 0 # Never expire (wrapped tokens provide time limit)
   token_ttl      = 3600
   token_max_ttl  = 3600
@@ -101,7 +101,68 @@ def remove_from_vault_terraform(hostname: str, repo_root: Path) -> bool:
|
|||||||
return True
|
return True
|
||||||
|
|
||||||
|
|
||||||
def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, bool]:
|
def remove_from_approle_tf(hostname: str, repo_root: Path) -> bool:
|
||||||
|
"""
|
||||||
|
Remove host entry from terraform/vault/approle.tf locals.host_policies.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
hostname: Hostname to remove
|
||||||
|
repo_root: Path to repository root
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if found and removed, False if not found
|
||||||
|
"""
|
||||||
|
approle_path = repo_root / "terraform" / "vault" / "approle.tf"
|
||||||
|
|
||||||
|
if not approle_path.exists():
|
||||||
|
return False
|
||||||
|
|
||||||
|
content = approle_path.read_text()
|
||||||
|
|
||||||
|
# Check if hostname exists in host_policies
|
||||||
|
hostname_pattern = rf'^\s+"{re.escape(hostname)}" = \{{'
|
||||||
|
if not re.search(hostname_pattern, content, re.MULTILINE):
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Match the entire block from "hostname" = { to closing }
|
||||||
|
# The block contains paths = [ ... ] and possibly extra_policies = [...]
|
||||||
|
replace_pattern = rf'\n?\s+"{re.escape(hostname)}" = \{{[^}}]*\}}\n?'
|
||||||
|
new_content, count = re.subn(replace_pattern, "\n", content, flags=re.DOTALL)
|
||||||
|
|
||||||
|
if count == 0:
|
||||||
|
return False
|
||||||
|
|
||||||
|
approle_path.write_text(new_content)
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
|
def find_host_secrets(hostname: str, repo_root: Path) -> list:
|
||||||
|
"""
|
||||||
|
Find secrets in terraform/vault/secrets.tf that belong to a host.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
hostname: Hostname to search for
|
||||||
|
repo_root: Path to repository root
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of secret paths found (e.g., ["hosts/hostname/test-service"])
|
||||||
|
"""
|
||||||
|
secrets_path = repo_root / "terraform" / "vault" / "secrets.tf"
|
||||||
|
|
||||||
|
if not secrets_path.exists():
|
||||||
|
return []
|
||||||
|
|
||||||
|
content = secrets_path.read_text()
|
||||||
|
|
||||||
|
# Find all secret paths matching hosts/{hostname}/
|
||||||
|
pattern = rf'"(hosts/{re.escape(hostname)}/[^"]+)"'
|
||||||
|
matches = re.findall(pattern, content)
|
||||||
|
|
||||||
|
# Return unique paths, preserving order
|
||||||
|
return list(dict.fromkeys(matches))
|
||||||
|
|
||||||
|
|
||||||
|
def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, bool, bool]:
|
||||||
"""
|
"""
|
||||||
Check which entries exist for a hostname.
|
Check which entries exist for a hostname.
|
||||||
|
|
||||||
@@ -110,7 +171,7 @@ def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, boo
|
|||||||
repo_root: Path to repository root
|
repo_root: Path to repository root
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
Tuple of (flake_exists, terraform_vms_exists, vault_exists)
|
Tuple of (flake_exists, terraform_vms_exists, vault_generated_exists, approle_exists)
|
||||||
"""
|
"""
|
||||||
# Check flake.nix
|
# Check flake.nix
|
||||||
flake_path = repo_root / "flake.nix"
|
flake_path = repo_root / "flake.nix"
|
||||||
@@ -131,7 +192,15 @@ def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, boo
         vault_content = vault_tf_path.read_text()
         vault_exists = f'"{hostname}"' in vault_content
 
-    return (flake_exists, terraform_exists, vault_exists)
+    # Check terraform/vault/approle.tf
+    approle_path = repo_root / "terraform" / "vault" / "approle.tf"
+    approle_exists = False
+    if approle_path.exists():
+        approle_content = approle_path.read_text()
+        approle_pattern = rf'^\s+"{re.escape(hostname)}" = \{{'
+        approle_exists = bool(re.search(approle_pattern, approle_content, re.MULTILINE))
+
+    return (flake_exists, terraform_exists, vault_exists, approle_exists)
 
 
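The AppRole check uses an anchored, `re.MULTILINE` match on the block header so a hostname only matches its own entry. A sketch against a made-up approle.tf fragment (the HCL below is hypothetical):

```python
import re

def approle_entry_exists(content: str, hostname: str) -> bool:
    # Anchored match for an indented  "<hostname>" = {  block header.
    # The trailing quote means "test01" cannot also match "test01-old".
    pattern = rf'^\s+"{re.escape(hostname)}" = \{{'
    return bool(re.search(pattern, content, re.MULTILINE))

tf = '''
locals {
  hosts = {
    "test01" = {
      policies = ["default"]
    }
  }
}
'''
print(approle_entry_exists(tf, "test01"))   # True
print(approle_entry_exists(tf, "test02"))   # False
```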
 def update_flake_nix(config: HostConfig, repo_root: Path, force: bool = False) -> None:
@@ -152,15 +221,8 @@ def update_flake_nix(config: HostConfig, repo_root: Path, force: bool = False) -
       specialArgs = {{
         inherit inputs self sops-nix;
       }};
-      modules = [
-        (
-          {{ config, pkgs, ... }}:
-          {{
-            nixpkgs.overlays = commonOverlays;
-          }}
-        )
+      modules = commonModules ++ [
         ./hosts/{config.hostname}
-        sops-nix.nixosModules.sops
       ];
     }};
 """
@@ -13,6 +13,17 @@
     ../../common/vm
   ];
 
+  # Host metadata (adjust as needed)
+  homelab.host = {
+    tier = "test"; # Start in test tier, move to prod after validation
+  };
+
+  # Enable Vault integration
+  vault.enable = true;
+
+  # Enable remote deployment via NATS
+  homelab.deploy.enable = true;
+
   nixpkgs.config.allowUnfree = true;
   boot.loader.grub.enable = true;
   boot.loader.grub.device = "/dev/vda";
@@ -137,9 +137,9 @@ fetch_from_vault() {
 
     # Write each secret key to a separate file
     log "Writing secrets to $OUTPUT_DIR"
-    echo "$SECRET_DATA" | jq -r 'to_entries[] | "\(.key)\n\(.value)"' | while read -r key; read -r value; do
-        echo -n "$value" > "$OUTPUT_DIR/$key"
-        echo -n "$value" > "$CACHE_DIR/$key"
+    for key in $(echo "$SECRET_DATA" | jq -r 'keys[]'); do
+        echo "$SECRET_DATA" | jq -j --arg k "$key" '.[$k]' > "$OUTPUT_DIR/$key"
+        echo "$SECRET_DATA" | jq -j --arg k "$key" '.[$k]' > "$CACHE_DIR/$key"
         chmod 600 "$OUTPUT_DIR/$key"
         chmod 600 "$CACHE_DIR/$key"
         log " - Wrote secret key: $key"
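The old loop flattened keys and values into one newline-separated stream and read them back two lines at a time, which loses framing as soon as a value itself contains a newline; the replacement extracts each value per key (`jq -j '.[$k]'`), which is byte-exact. A Python sketch of the failure mode, with hypothetical secret data:

```python
import json

secret = {"token": "line1\nline2", "user": "alice"}

# Old approach: alternate key and value lines, then pair them back up.
# The embedded newline in "token" shifts every subsequent pair.
stream = "\n".join(f"{k}\n{v}" for k, v in secret.items()).split("\n")
broken = dict(zip(stream[0::2], stream[1::2]))
print(broken)
# → {'token': 'line1', 'line2': 'user'}

# New approach: look each value up by key from the parsed JSON instead,
# so values round-trip exactly regardless of their contents.
data = json.loads(json.dumps(secret))
fixed = {k: data[k] for k in data}
print(fixed == secret)
# → True
```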
@@ -1,29 +0,0 @@
-authelia_ldap_password: ENC[AES256_GCM,data:x2UDMpqQKoRVSlDSmK5XiC9x4/WWzmjk7cwtFA70waAD7xYQfXEOV+AeX1LlFfj0qHYrhyn//TLsa+tJzb7HPEAfl8vYR4MdkVFOm5vjPWWoF5Ul8ZVn8+B1VJLbiXkexv0/hfXL8NMzEcp/pF4H0Yei7xaKezu9OPtGzKufHws=,iv:88RXaOj8Zy9fGeDLAE0ItY7TKCCzxn6F0+kU5+Zy/XU=,tag:yPdCJ9d139iO6J97thVVgA==,type:str]
-authelia_jwt_secret: ENC[AES256_GCM,data:9ZHkT2o5KZLmml95g8HZce8fNBmaWtRn+175Gaz0KhsndNl3zdgGq3hydRuoZuEgLVsherJImVmb5DQAZpv04lUEsDKCYeFNwAyYl4Go2jCp1fI53fdcRCKlNVZA37pMi4AYaCoe8vIl/cwPOOBDEwK5raOBnklCzVERoO0B8a0=,iv:9CTWCw0ImZR0OSrl2znbhpRHlzAxA5Cpcy98JeH9Z+Y=,tag:L+0xKqiwXTi7XiDYWA1Bcw==,type:str]
-authelia_storage_encryption_key_file: ENC[AES256_GCM,data:RfbcQK8+rrW/Krd2rbDfgo7YI2YvQKqpLuDtk5DZJNNhw4giBh5nFp/8LNeo8r39/oiJLYTe6FjTLBu72TZz2wWrJFsBqjwQ/3TfATQGdLUsaXXRDr88ezHLTiYvEHIHJhUS5qsr7VMwBam5e7YGWBe5sGZCE/nX41ijyPUjtOY=,iv:sayYcAC38cApAtL+cDhgGNjWaHn+furKRowKL6AmfdU=,tag:1IZpnlpvDWGLLpZyU9iJUw==,type:str]
-authelia_session_secret: ENC[AES256_GCM,data:4PaLv4RRA7/9Z8QzETXLwo3OctJ0mvzQkYmHsGGF97nq9QeB3eo0xj4FyuCbkJGGZ/huAyRgmFBTyscY3wgxoc4t+8BdlYcSbefEk1/xRFjmG8ooXLKhvGJ5c6t72KJRcqsEGTiC0l9CFJWQ2qYcjM4dPwG8z0tjUZ6j25Zfx4M=,iv:QORJkf0w6iyuRHM/xuql1s7K75Qa49ygq+lwHfrm9rk=,tag:/HZ/qI80fKjmuTRwIwmX8g==,type:str]
-lldap_user_pass: ENC[AES256_GCM,data:56gF7uqVQ+/J5/lY/N904Q==,iv:qtY1XhHs4WWA4kPY56NigPvX4OslO0koZepgdv947zg=,tag:UDmJs8FPXskp7rUS2Sxinw==,type:str]
-sops:
-    age:
-        - recipient: age1lznyk4ee7e7x8n92cq2n87kz9920473ks5u9jlhd3dczfzq4wamqept56u
-          enc: |
-            -----BEGIN AGE ENCRYPTED FILE-----
-            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBlc1dxK3FKU2ZGWTNGUmxZ
-            aWx1NngySjVHclJTd3hXejJRTmVHRExReHcwCk55c0xMbGcyTktySkJZdHRZbzhK
-            bEI3RzBHQkROTU1qWXBoU1RqTXppdVkKLS0tIHkwZ0QyNTMydWRqUlBtTEdhZ05r
-            YVpuT1JadnlyN1hqNnJxYzVPT3pXN1UKDCeIv0xv+5pcoDdtYc+rYjwi8SLrqWth
-            vdWepxmV2edajZRqcwFEC9weOZ1j2lh7Z3hR6RSN/+X3sFpqkpw+Yg==
-            -----END AGE ENCRYPTED FILE-----
-        - recipient: age16prza00sqzuhwwcyakj6z4hvwkruwkqpmmrsn94a5ucgpkelncdq2ldctk
-          enc: |
-            -----BEGIN AGE ENCRYPTED FILE-----
-            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSAvbU0wNmFLelRmNmJTRlho
-            dTEwVXZqUVI5NHZkb1QyNUZ4R0pLVFZWVDM4CkhVc00zY2FKaVdNRXdGVk1ranpG
-            MlRWWGJmd2FWeFE1dXU4WHVFL0FHZ3MKLS0tIGt2ZWlaOW5wNkJnQVkrTDZWTnY0
-            RW5HRjA3cERCUU1CVWZhck12SGhTRUkK6k/zQ87TIETYouRBby7ujtwgpqIPKKv+
-            2aLJW6lSWMVzL/f3ZrIeg12tJjHs3f44EXR6j3tfLfSKog2iL8Y57w==
-            -----END AGE ENCRYPTED FILE-----
-    lastmodified: "2025-12-06T10:03:56Z"
-    mac: ENC[AES256_GCM,data:SRNqx5n+xg/cNGiyze3CGKufox3IuXmOKLqNRDeJhBNMBHC1iYYCjRdHEVXsl7XSiYe51dSwjV0KrJa/SG1pRVkuyT+xyPrTjT2/DyXN7A/CESSAkBIwI7lkZmIf8DkxB3CELF1PgjIr1o2isxlBnkAnhEBTxQ7t8AzpcH7I5yU=,iv:P3FGQurZrL0ed5UuBPRFk11T0VRFtL6xI4iQ4LmYTec=,tag:8gQL08ojjIMyCl5E0Qs/Ww==,type:str]
-    unencrypted_suffix: _unencrypted
-    version: 3.11.0
@@ -7,110 +7,101 @@ sops:
         - recipient: age1lznyk4ee7e7x8n92cq2n87kz9920473ks5u9jlhd3dczfzq4wamqept56u
           enc: |
             -----BEGIN AGE ENCRYPTED FILE-----
-            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSB0elpybDFQMmlXV21XaTBR
-            TGExNEVPa3N2VzBCRzJpN2lSVzNFN09CWGowCkFUbTA1MmtNelJZZHgwMHpJcEQ1
-            dXNmRy9yODBrU01FYXh4RkJ2MzJmMU0KLS0tIDZMWSthOHovVWhSQ1pSYmcrQXFh
-            R3JBaDM1R2VxcUI4OFhyRUFlZEMxNkkKxTb8QBnxBQ2zfbTEZuQ3QIv9bKwm2c0p
-            wWSxxSI2u3crC17Vb8yVX8p5tZuKxierxOuIVXLxxvU51ldIQquKPw==
+            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBuWXhzQWFmeCt1R05jREcz
+            Ui9HZFN5dkxHNVE0RVJGZUJUa3hKK2sxdkhBCktYcGpLeGZIQzZIV3ZZWGs3YzF1
+            T09sUEhPWkRkOWZFWkltQXBlM1lQV1UKLS0tIERRSlRUYW5QeW9TVjJFSmorOWNI
+            ZytmaEhzMjVhRXI1S0hielF0NlBrMmcK4I1PtSf7tSvSIJxWBjTnfBCO8GEFHbuZ
+            BkZskr5fRnWUIs72ZOGoTAVSO5ZNiBglOZ8YChl4Vz1U7bvdOCt0bw==
             -----END AGE ENCRYPTED FILE-----
         - recipient: age1hz2lz4k050ru3shrk5j3zk3f8azxmrp54pktw5a7nzjml4saudesx6jsl0
           enc: |
             -----BEGIN AGE ENCRYPTED FILE-----
-            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSA5Wk05REFwZSszWWlaZWJV
-            UFNzK3g0TXhGd1N4YjJpSUQvaFJCM21BT1FVCkd6d210cndtVVEyeUFhUXJvR0lM
-            N0p2aHExZlBibW1OTERiQ1JtZ29hbFUKLS0tIHVLYWtIZUFRUDBXK3BZYU9KdUlU
-            bXl0VnVZTEJ6clljeTVnVGxKOXhwYTgKUGw+3Ry03lsYOrM8zBT3Q0lGVFnaQ9Ca
-            nLWJEwZXrqTstBxVtcVO8EbQHIhs0FH1PnvmXZWDS7ADABXlSEjwYQ==
+            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBQcXM0RHlGcmZrYW4yNGZs
+            S1ZqQzVaYmQ4MGhGaTFMUVIwOTk5K0tZZjB3ClN0QkhVeHRrNXZHdmZWMzFBRnJ6
+            WTFtaWZyRmx2TitkOXkrVkFiYVd3RncKLS0tIExpeGUvY1VpODNDL2NCaUhtZkp0
+            cGNVZTI3UGxlNWdFWVZMd3FlS3pDR3cKBulaMeonV++pArXOg3ilgKnW/51IyT6Z
+            vH9HOJUix+ryEwDIcjv4aWx9pYDHthPFZUDC25kLYG91WrJFQOo2oA==
             -----END AGE ENCRYPTED FILE-----
         - recipient: age1w2q4gm2lrcgdzscq8du3ssyvk6qtzm4fcszc92z9ftclq23yyydqdga5um
           enc: |
             -----BEGIN AGE ENCRYPTED FILE-----
-            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBMUytmK2JmMnNPNVdpUE5u
-            RlhJS3JSdm1sSW1CUnVKcXo1STI5WkhsTncwCndua0dzam9VeEY3RnR2S0I4NXg4
-            a1dTNlZ0VmFpdmo1R1hoNzVrRzl4MWsKLS0tIDFvT2JwZWxJMFRwUkFUMFNyaHgy
-            a3hpSDQzaHN2M1JWTG82TU4wOGo4RkEKlF/YdB/l5WqPrWR+gHS4CDnQ2WLD0emV
-            ScxDCgHnFYdKkv4TTaVV6opcB5t7uJECqUqBNxTyvwBrN9+n6m7Edg==
+            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBabTdsZWxZQjV2TGx2YjNM
+            ZTgzWktqTjY0S0M3bFpNZXlDRDk5TSt3V2k0CjdWWTN0TlRlK1RpUm9xYW03MFFG
+            aWN4a3o4VUVnYzBDd2FrelUraWtrMTAKLS0tIE1vTGpKYkhzcWErWDRreml2QmE2
+            ZkNIWERKb1drdVR6MTBSTnVmdm51VEkKVNDYdyBSrUT7dUn6a4eF7ELQ2B2Pk6V9
+            Z5fbT75ibuyX1JO315/gl2P/FhxmlRW1K6e+04gQe2R/t/3H11Q7YQ==
             -----END AGE ENCRYPTED FILE-----
         - recipient: age1d2w5zece9647qwyq4vas9qyqegg96xwmg6c86440a6eg4uj6dd2qrq0w3l
           enc: |
             -----BEGIN AGE ENCRYPTED FILE-----
-            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBlcnNCZmNTRWdDUER3Tlpl
-            S0dMc25qTzRiYlBsWE05OWZGRUJhYnNUWGt3CkNZcGNQaGJDbWdrQUNNa1d0emhI
-            UmtkL2dBbEEzNFp5ZnVFeHV2dDR0QzgKLS0tIG0xVE1LQjBHUUx2bklFVy9lVXBu
-            NzRMb1dnSTU2MlRtVkhLdjVlalFQOUkKYMY2yykgH8Qgmw7xyPf8dYybBuiRxQwy
-            hh2tgikE/90asVQTmW9ioRMy/e4cKnJGi8irGXoK4rkM/+fOVMWQ7Q==
+            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBVSFhDOFRVbnZWbVlQaG5G
+            U0NWekU0NzI1SlpRN0NVS1hPN210MXY3Z244CmtFemR5OUpzdlBzMHBUV3g0SFFo
+            eUtqNThXZDJ2b01yVVVuOFdwQVo2Qm8KLS0tIHpXRWd3OEpPRkpaVDNDTEJLMWEv
+            ZlZtaFpBdzF0YXFmdjNkNUR3YkxBZU0KAub+HF/OBZQR9bx/SVadZcL6Ms+NQ7yq
+            21HCcDTWyWHbN4ymUrIYXci1A/0tTOrQL9Mkvaz7IJh4VdHLPZrwwA==
             -----END AGE ENCRYPTED FILE-----
         - recipient: age1gq8434ku0xekqmvnseeunv83e779cg03c06gwrusnymdsr3rpufqx6vr3m
           enc: |
             -----BEGIN AGE ENCRYPTED FILE-----
-            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBKSFI4bUJXOS9zV082Ykho
-            ZnFYazVyb2hheUVTb0k5czlqRDRIVXJSTjNzClZ6TndTRnRwQ0ZZUkFld2c2WFl4
-            N0l3UHB1SnN4YUx5YTM3bDkrdzFScG8KLS0tIE5jYmVmelcxZGxPRjBIV1dobHF5
-            d2QxRzlRaWZ2ZjB2UEwyNHQrTDNwZDAKyWp3vMfeE1/oT7hRcAdoxnZKPnZYRF5F
-            YrRBIGJdVaC6h9YwlzsQ3Ew3TRg65dq+h4xew/227ZY7Qg9uVuHk5Q==
+            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBBWkhBL1NTdjFDeEhQcEgv
+            Z3c3Z213L2ZhWGo0Qm5Zd1A1RTBDY3plUkh3CkNWV2ZtNWkrUjB0eWFzUlVtbHlk
+            WTdTQjN4eDIzY0c0dyt6ajVXZ0krd1UKLS0tIHB4aEJqTTRMenV3UkFkTGEySjQ2
+            YVM1a3ZPdUU4T244UU0rc3hVQ3NYczQK10wug4kTjsvv/iOPWi5WrVZMOYUq4/Mf
+            oXS4sikXeUsqH1T2LUBjVnUieSneQVn7puYZlN+cpDQ0XdK/RZ+91A==
             -----END AGE ENCRYPTED FILE-----
         - recipient: age1288993th0ge00reg4zqueyvmkrsvk829cs068eekjqfdprsrkeqql7mljk
           enc: |
             -----BEGIN AGE ENCRYPTED FILE-----
-            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBXR2xSd0pTd04wemhqZHNH
-            UVJ1ZjFEWG9OZGtQQUVNUnJBR2dLeXFNM0F3ClhkLzA3cWVTR01XZzNmaUgwdnlR
-            bEExTjluYXpIZmRvdURBdkFIY2VubTAKLS0tIGVsWmlPNCtWbWxMWFQ4Ky9jZVcr
-            VHhlNnV1cTlEd3U4YjV3UGlLYVRWVUEKhjbs9nRhu5s1SD3CJTDkW8s0koPvW6LY
-            jJlw8dPctC1bfWgzca3WxhuBIE14TWoxI2+ec9y6x8yYzdvIQhNIIg==
+            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBYcEtHbjNWRkdodUxYdHRn
+            MDBMU08zWDlKa0Z4cHJvc28rZk5pUjhnMjE0CmdzRmVGWDlYQ052Wm1zWnlYSFV6
+            dURQK3JSbThxQlg3M2ZaL1hGRzVuL0UKLS0tIEI3UGZvbEpvRS9aR2J2Tnc1YmxZ
+            aUY5Q2MrdHNQWDJNaGt5MWx6MVRrRVEKRPxyAekGHFMKs0Z6spVDayBA4EtPk18e
+            jiFc97BGVtC5IoSu4icq3ZpKOdxymnkqKEt0YP/p/JTC+8MKvTJFQw==
             -----END AGE ENCRYPTED FILE-----
         - recipient: age1vpns76ykll8jgdlu3h05cur4ew2t3k7u03kxdg8y6ypfhsfhq9fqyurjey
           enc: |
             -----BEGIN AGE ENCRYPTED FILE-----
-            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBULy91QnFLSmxrNlU1U0RV
-            Mnprc2dBVVRHMzdQTzhHL2d5ejB5cEYxSVZzClp4UXZNbWdJZk5LWnZlSVdEM0Vk
-            MEV3WmlLVlVsWXduSFpVQW9KU1d6WlEKLS0tIE8xYjRxY1ZySlZMbG5acm5RSU1Z
-            c2Y5aXJSMFJNcVp0YS96MGtMTEJHMEEKm2jRWDsdpMnDXPMOhA56Qld3yjlJe246
-            6Xbc4924WparHwPh8YmVKP3IYsrNYw2WxFmLZpDGVQmd5Tz1lD4s9w==
+            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBQL3ZMUkI1dUV1T2tTSHhn
+            SjhyQ3dKTytoaDBNcit1VHpwVGUzWVNpdjBnCklYZWtBYzBpcGxZSDBvM2tIZm9H
+            bTFjb1ZCaDkrOU1JODVBVTBTbmxFbmcKLS0tIGtGcS9kejZPZlhHRXI5QnI5Wm9Q
+            VjMxTDdWZEltWThKVDl0S24yWHJxZHcKgzH79zT2I7ZgyTbbbvIhLN/rEcfiomJH
+            oSZDFvPiXlhPgy8bRyyq3l47CVpWbUI2Y7DFXRuODpLUirt3K3TmCA==
             -----END AGE ENCRYPTED FILE-----
         - recipient: age1hchvlf3apn8g8jq2743pw53sd6v6ay6xu6lqk0qufrjeccan9vzsc7hdfq
           enc: |
             -----BEGIN AGE ENCRYPTED FILE-----
-            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBkUitINVFScFY5R2dKTWtC
-            ai83UmNVbzdWNTNMWUhHc2lRTW1ZVnVHdVc0CjlSVmVOc0FvOUVvZnVuQUVCells
-            eW9uc21sZ0dpTjQ4N2ZvbGsyYVo5dlUKLS0tIDdsSGdZcVZLbXowUzNsYTNlR3VP
-            N1JNQmhDVWdid0pHOEZxM1dBSmRrSjAKP9z3b9b1huO/iFxUVf34W4P/Xnok9It7
-            ENRMctqEmHIp3Je/p/fMWUArSznMpxm0ukmBb9bGn3NCRxG5sEs1lw==
+            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBPcm9zUm1XUkpLWm1Jb3Uw
+            RncveGozOW5SRThEM1Y4SFF5RDdxUEhZTUE4CjVESHE5R3JZK0krOXZDL0RHR0oy
+            Z3JKaEpydjRjeFFHck1ic2JTRU5yZTQKLS0tIGY2ck56eG95YnpDYlNqUDh5RVp1
+            U3dRYkNleUtsQU1LMWpDbitJbnRIem8K+27HRtZihG8+k7ZC33XVfuXDFjC1e8lA
+            kffmxp9kOEShZF3IKmAjVHFBiPXRyGk3fGPyQLmSMK2UOOfCy/a/qA==
             -----END AGE ENCRYPTED FILE-----
         - recipient: age1w029fksjv0edrff9p7s03tgk3axecdkppqymfpwfn2nu2gsqqefqc37sxq
           enc: |
             -----BEGIN AGE ENCRYPTED FILE-----
-            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSB0K0xxVUkyNWJtekFBdW0r
-            YUNBaUlzbmdNbktIUDEzVVlhSUtJTENHRDNFCjJpRHgycGFQbkhTUHRFNGpsNlJU
-            L2puZkhwSlExb3pXTXZMNHFhL0pjZVkKLS0tIHgza01pZ2hzUDlITGlYYnVDTWNF
-            RkpIbUJMRlJ2ZXJPSHRUTlpZYUUxOG8KF27qYEyAyt8kN8H7mFO0wf8IkXH0NcWR
-            w7Y1Nea6yMXHhEIazONJsmAkmLvQA+j7RxcTUI0Ej8qIxnJ0ZtT6RQ==
+            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBTZHlldDdSOEhjTklCSXQr
+            U2pXajFwZnNqQzZOTzY5b3lkMzlyREhXRWo4CmxId2F6NkNqeHNCSWNrcUJIY0Nw
+            cGF6NXJaQnovK1FYSXQ2TkJSTFloTUEKLS0tIHRhWk5aZ0lDVkZaZEJobm9FTDNw
+            a29sZE1GL2ZQSk0vUEc1ZGhkUlpNRkEK9tfe7cNOznSKgxshd5Z6TQiNKp+XW6XH
+            VvPgMqMitgiDYnUPj10bYo3kqhd0xZH2IhLXMnZnqqQ0I23zfPiNaw==
             -----END AGE ENCRYPTED FILE-----
         - recipient: age1ha34qeksr4jeaecevqvv2afqem67eja2mvawlmrqsudch0e7fe7qtpsekv
           enc: |
             -----BEGIN AGE ENCRYPTED FILE-----
-            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBpenhpVHJDajBMaVExeHJD
-            NFhuM2x4Y2xzR2I2S1JybkJVd1pZWDhoUVY0CklEVDRRcFBGeFMrbUwrOVh5ZUt3
-            WW9DTDhMNWUvOFFEYnB1RFNUelg3TjAKLS0tIC9Ed3dVaTZRZjJSMHJIS0M5cmZ3
-            eTlyWlZIS1VxcHlpSnBBaG1aUTVtR1kKE4DLKal6eYRf4N9ni7vd7lUcEJKeaIBJ
-            AOQYspAD8NSNVc1QlVzClb9sipUxoCDLNOaKjlPLMkN0fOQbNmzhlQ==
+            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSB5bk9NVjJNWmMxUGd3cXRx
+            amZ5SWJ3dHpHcnM4UHJxdmh6NnhFVmJQdldzCm95dHN3R21qSkE4Vm9VTnVPREp3
+            dUQyS1B4MWhhdmd3dk5LQ0htZEtpTWMKLS0tIGFaa3MxVExFYk1MY2loOFBvWm1o
+            L0NoRStkeW9VZVdpWlhteC8yTnRmMUkKMYjUdE1rGgVR29FnhJ5OEVjTB1Rh5Mtu
+            M/DvlhW3a7tZU8nDF3IgG2GE5xOXZMDO9QWGdB8zO2RJZAr3Q+YIlA==
             -----END AGE ENCRYPTED FILE-----
         - recipient: age1cxt8kwqzx35yuldazcc49q88qvgy9ajkz30xu0h37uw3ts97jagqgmn2ga
           enc: |
             -----BEGIN AGE ENCRYPTED FILE-----
-            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBGS1JKc092ZmRza0wydklU
-            NUhTVHJtbzBpU1NBb3ZIYXgzMnlLVXBCcFU0Ci9idmJWd2RUaGM2V0VqVjY3SjBW
-            dTZLNHVYUEhvOEx2QzJVN0RzL2RPOWMKLS0tIHlpV3RmR0F1b3BBK3hjWjFHb2pj
-            WnJkUVowU3M0L09CSmxmeFBkUGRvQ3cKDS24pnHugCvkMCbiXd0R4Rk5xqn9IWC6
-            CErAOoAITdfrhoci4SG6LZu28de+OrKnO3W4wWm4DioSQgn3mVRmdg==
+            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBBU0xYMnhqOE0wdXdleStF
+            THcrY2NBQzNoRHdYTXY3ZmM5YXRZZkQ4aUZnCm9ad0IxSWxYT1JBd2RseUdVT1pi
+            UXBuNzFxVlN0OWNTQU5BV2NiVEV0RUUKLS0tIGJHY0dzSDczUzcrV0RpTjE0czEy
+            cWZMNUNlTzBRcEV5MjlRV1BsWGhoaUUKGhYaH8I0oPCfrbs7HbQKVOF/99rg3HXv
+            RRTXUI71/ejKIuxehOvifClQc3nUW73bWkASFQ0guUvO4R+c0xOgUg==
             -----END AGE ENCRYPTED FILE-----
-        - recipient: age16prza00sqzuhwwcyakj6z4hvwkruwkqpmmrsn94a5ucgpkelncdq2ldctk
-          enc: |
-            -----BEGIN AGE ENCRYPTED FILE-----
-            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBSY25GWkVoMk9jaGJlL2lj
-            cjQ1QW9XTTJVanRiS28rbmNMNmVKVTRDblZNCnJZUTNMYWpQOHlEbHI0eXZZQS91
-            bjdsdDFxL2VOYUoyblZhNEp3UXVtTncKLS0tIFFlU3BReWpYaHRjM2hBUlFiR2V5
-            S0t2dFdScW9RY2t6Y0hYN0N3d2dwa3MKNB9nsg3t6T0QzwB0tKk5JMxNGVZXH1cr
-            DJ/D8lE9sSV43oFx19p2ckzHigtFJQeS/bKaiWIR972vaoYmpLetSg==
-            -----END AGE ENCRYPTED FILE-----
     lastmodified: "2025-02-11T21:18:22Z"
     mac: ENC[AES256_GCM,data:5//boMp1awc/2XAkSASSCuobpkxa0E6IKf3GR8xHpMoCD30FJsCwV7PgX3fR8OuLEhOJ7UguqMNQdNqG37RMacreuDmI1J8oCFKp+3M2j4kCbXaEo8bw7WAtyjUez+SAXKzZWYmBibH0KOy6jdt+v0fdgy5hMBT4IFDofYRsyD0=,iv:6pD+SLwncpmal/FR4U8It2njvaQfUzzpALBCxa0NyME=,tag:4QN8ZFjdqck5ZgulF+FtbA==,type:str]
@@ -1,8 +1,10 @@
 { pkgs, config, ... }:
 {
-  sops.secrets."actions-token-1" = {
-    sopsFile = ../../secrets/nix-cache01/actions_token_1;
-    format = "binary";
+  vault.secrets.actions-token = {
+    secretPath = "hosts/nix-cache01/actions-token";
+    extractKey = "token";
+    outputDir = "/run/secrets/actions-token-1";
+    services = [ "gitea-runner-actions1" ];
   };
 
   virtualisation.podman = {
@@ -13,7 +15,7 @@
   services.gitea-actions-runner.instances = {
     actions1 = {
       enable = true;
-      tokenFile = config.sops.secrets.actions-token-1.path;
+      tokenFile = "/run/secrets/actions-token-1";
       name = "actions1.home.2rjus.net";
       settings = {
         log = {
@@ -1,87 +0,0 @@
-{ config, ... }:
-{
-  sops.secrets.authelia_ldap_password = {
-    format = "yaml";
-    sopsFile = ../../secrets/auth01/secrets.yaml;
-    key = "authelia_ldap_password";
-    restartUnits = [ "authelia-auth.service" ];
-    owner = "authelia-auth";
-    group = "authelia-auth";
-  };
-  sops.secrets.authelia_jwt_secret = {
-    format = "yaml";
-    sopsFile = ../../secrets/auth01/secrets.yaml;
-    key = "authelia_jwt_secret";
-    restartUnits = [ "authelia-auth.service" ];
-    owner = "authelia-auth";
-    group = "authelia-auth";
-  };
-  sops.secrets.authelia_storage_encryption_key_file = {
-    format = "yaml";
-    key = "authelia_storage_encryption_key_file";
-    sopsFile = ../../secrets/auth01/secrets.yaml;
-    restartUnits = [ "authelia-auth.service" ];
-    owner = "authelia-auth";
-    group = "authelia-auth";
-  };
-  sops.secrets.authelia_session_secret = {
-    format = "yaml";
-    key = "authelia_session_secret";
-    sopsFile = ../../secrets/auth01/secrets.yaml;
-    restartUnits = [ "authelia-auth.service" ];
-    owner = "authelia-auth";
-    group = "authelia-auth";
-  };
-
-  services.authelia.instances."auth" = {
-    enable = true;
-    environmentVariables = {
-      AUTHELIA_AUTHENTICATION_BACKEND_LDAP_PASSWORD_FILE =
-        config.sops.secrets.authelia_ldap_password.path;
-      AUTHELIA_SESSION_SECRET_FILE = config.sops.secrets.authelia_session_secret.path;
-    };
-    secrets = {
-      jwtSecretFile = config.sops.secrets.authelia_jwt_secret.path;
-      storageEncryptionKeyFile = config.sops.secrets.authelia_storage_encryption_key_file.path;
-    };
-    settings = {
-      access_control = {
-        default_policy = "two_factor";
-      };
-      session = {
-        # secret = "{{- fileContent \"${config.sops.secrets.authelia_session_secret.path}\" }}";
-        cookies = [
-          {
-            domain = "home.2rjus.net";
-            authelia_url = "https://auth.home.2rjus.net";
-            default_redirection_url = "https://dashboard.home.2rjus.net";
-            name = "authelia_session";
-            same_site = "lax";
-            inactivity = "1h";
-            expiration = "24h";
-            remember_me = "30d";
-          }
-        ];
-      };
-      notifier = {
-        filesystem.filename = "/var/lib/authelia-auth/notification.txt";
-      };
-      storage = {
-        local.path = "/var/lib/authelia-auth/db.sqlite3";
-      };
-      authentication_backend = {
-        password_reset = {
-          disable = false;
-        };
-        ldap = {
-          address = "ldap://127.0.0.1:3890";
-          implementation = "lldap";
-          timeout = "5s";
-          base_dn = "dc=home,dc=2rjus,dc=net";
-          user = "uid=authelia_ldap_user,ou=people,dc=home,dc=2rjus,dc=net";
-          # password = "{{- fileContent \"${config.sops.secrets.authelia_ldap_password.path}\" -}}";
-        };
-      };
-    };
-  };
-}
@@ -69,6 +69,44 @@
       frontend = true;
       permit_join = false;
       serial.port = "/dev/ttyUSB0";
+
+      # Inline device configuration (replaces devices.yaml)
+      # This allows declarative management and homeassistant overrides
+      devices = {
+        # Temperature sensors with battery fix
+        # WSDCGQ12LM sensors report battery: 0 due to firmware quirk
+        # Override battery calculation using voltage (mV): (voltage - 2100) / 9
+        "0x54ef441000a547bd" = {
+          friendly_name = "0x54ef441000a547bd";
+          homeassistant.battery.value_template = "{{ (((value_json.voltage | float) - 2100) / 9) | round(0) | int | min(100) | max(0) }}";
+        };
+        "0x54ef441000a54d3c" = {
+          friendly_name = "0x54ef441000a54d3c";
+          homeassistant.battery.value_template = "{{ (((value_json.voltage | float) - 2100) / 9) | round(0) | int | min(100) | max(0) }}";
+        };
+        "0x54ef441000a564b6" = {
+          friendly_name = "temp_server";
+          homeassistant.battery.value_template = "{{ (((value_json.voltage | float) - 2100) / 9) | round(0) | int | min(100) | max(0) }}";
+        };
+
+        # Other sensors
+        "0x00124b0025495463".friendly_name = "0x00124b0025495463"; # SONOFF temp sensor (battery works)
+        "0x54ef4410009ac117".friendly_name = "0x54ef4410009ac117"; # Water leak sensor
+
+        # Buttons
+        "0x54ef441000a1f907".friendly_name = "btn_livingroom";
+        "0x54ef441000a1ee71".friendly_name = "btn_bedroom";
+
+        # Philips Hue lights
+        "0x001788010d1b599a" = {
+          friendly_name = "0x001788010d1b599a";
+          transition = 5;
+        };
+        "0x001788010d253b99".friendly_name = "0x001788010d253b99";
+        "0x001788010e371aa4".friendly_name = "0x001788010e371aa4";
+        "0x001788010dc5f003".friendly_name = "0x001788010dc5f003";
+        "0x001788010dc35d06".friendly_name = "0x001788010dc35d06";
+      };
     };
   };
 }
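The battery override templates above all apply the same voltage-to-percentage mapping, (voltage_mV - 2100) / 9, rounded and clamped to 0..100. A minimal sketch of that arithmetic outside the Jinja template (the function name is ours, not part of zigbee2mqtt):

```python
def battery_percent(voltage_mv: float) -> int:
    # WSDCGQ12LM workaround: derive battery level from reported voltage (mV)
    # instead of the bogus battery field: (voltage - 2100) / 9, clamped 0..100.
    pct = round((voltage_mv - 2100) / 9)
    return max(0, min(100, pct))

print(battery_percent(3000))  # fresh cell
# → 100
print(battery_percent(2550))
# → 50
print(battery_percent(2000))  # below the floor, clamped
# → 0
```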
@@ -86,22 +86,6 @@
 	}
 	reverse_proxy http://jelly01.home.2rjus.net:8096
 }
-lldap.home.2rjus.net {
-	log {
-		output file /var/log/caddy/auth.log {
-			mode 644
-		}
-	}
-	reverse_proxy http://auth01.home.2rjus.net:17170
-}
-auth.home.2rjus.net {
-	log {
-		output file /var/log/caddy/auth.log {
-			mode 644
-		}
-	}
-	reverse_proxy http://auth01.home.2rjus.net:9091
-}
 pyroscope.home.2rjus.net {
 	log {
 		output file /var/log/caddy/pyroscope.log {
@@ -1,38 +0,0 @@
-{ config, ... }:
-{
-  sops.secrets.lldap_user_pass = {
-    format = "yaml";
-    key = "lldap_user_pass";
-    sopsFile = ../../secrets/auth01/secrets.yaml;
-    restartUnits = [ "lldap.service" ];
-    group = "acme";
-    mode = "0440";
-  };
-
-  services.lldap = {
-    enable = true;
-    settings = {
-      ldap_base_dn = "dc=home,dc=2rjus,dc=net";
-      ldap_user_email = "admin@home.2rjus.net";
-      ldap_user_dn = "admin";
-      ldap_user_pass_file = config.sops.secrets.lldap_user_pass.path;
-      ldaps_options = {
-        enabled = true;
-        port = 6360;
-        cert_file = "/var/lib/acme/auth01.home.2rjus.net/cert.pem";
-        key_file = "/var/lib/acme/auth01.home.2rjus.net/key.pem";
-      };
-    };
-  };
-  systemd.services.lldap = {
-    serviceConfig = {
-      SupplementaryGroups = [ "acme" ];
-    };
-  };
-  security.acme.certs."auth01.home.2rjus.net" = {
-    listenHTTP = ":80";
-    reloadServices = [ "lldap" ];
-    extraDomainNames = [ "ldap.home.2rjus.net" ];
-    enableDebugLogs = true;
-  };
-}
@@ -1,12 +1,18 @@
 { pkgs, config, ... }:
 {
-  sops.secrets."nats_nkey" = { };
+  vault.secrets.nats-nkey = {
+    secretPath = "shared/nats/nkey";
+    extractKey = "nkey";
+    outputDir = "/run/secrets/nats_nkey";
+    services = [ "alerttonotify" ];
+  };
+
 
   systemd.services."alerttonotify" = {
     enable = true;
     wants = [ "network-online.target" ];
     after = [
       "network-online.target"
-      "sops-nix.service"
+      "vault-secret-nats-nkey.service"
     ];
     wantedBy = [ "multi-user.target" ];
     restartIfChanged = true;
@@ -1,14 +1,82 @@
-{ self, lib, ... }:
+{ self, lib, pkgs, ... }:
 let
   monLib = import ../../lib/monitoring.nix { inherit lib; };
   externalTargets = import ./external-targets.nix;
 
   nodeExporterTargets = monLib.generateNodeExporterTargets self externalTargets;
   autoScrapeConfigs = monLib.generateScrapeConfigs self externalTargets;
+
+  # Script to fetch AppRole token for Prometheus to use when scraping OpenBao metrics
+  fetchOpenbaoToken = pkgs.writeShellApplication {
+    name = "fetch-openbao-token";
+    runtimeInputs = [ pkgs.curl pkgs.jq ];
+    text = ''
+      VAULT_ADDR="https://vault01.home.2rjus.net:8200"
+      APPROLE_DIR="/var/lib/vault/approle"
+      OUTPUT_FILE="/run/secrets/prometheus/openbao-token"
+
+      # Read AppRole credentials
+      if [ ! -f "$APPROLE_DIR/role-id" ] || [ ! -f "$APPROLE_DIR/secret-id" ]; then
+        echo "AppRole credentials not found at $APPROLE_DIR" >&2
+        exit 1
+      fi
+
+      ROLE_ID=$(cat "$APPROLE_DIR/role-id")
+      SECRET_ID=$(cat "$APPROLE_DIR/secret-id")
+
+      # Authenticate to Vault
+      AUTH_RESPONSE=$(curl -sf -k -X POST \
+        -d "{\"role_id\":\"$ROLE_ID\",\"secret_id\":\"$SECRET_ID\"}" \
+        "$VAULT_ADDR/v1/auth/approle/login")
+
+      # Extract token
+      VAULT_TOKEN=$(echo "$AUTH_RESPONSE" | jq -r '.auth.client_token')
+      if [ -z "$VAULT_TOKEN" ] || [ "$VAULT_TOKEN" = "null" ]; then
+        echo "Failed to extract Vault token from response" >&2
+        exit 1
+      fi
+
+      # Write token to file
+      mkdir -p "$(dirname "$OUTPUT_FILE")"
+      echo -n "$VAULT_TOKEN" > "$OUTPUT_FILE"
+      chown prometheus:prometheus "$OUTPUT_FILE"
+      chmod 0400 "$OUTPUT_FILE"
+
+      echo "Successfully fetched OpenBao token"
+    '';
+  };
 in
 {
+  # Systemd service to fetch AppRole token for Prometheus OpenBao scraping
+  # The token is used to authenticate when scraping /v1/sys/metrics
+  systemd.services.prometheus-openbao-token = {
+    description = "Fetch OpenBao token for Prometheus metrics scraping";
+    after = [ "network-online.target" ];
+    wants = [ "network-online.target" ];
+    before = [ "prometheus.service" ];
+    requiredBy = [ "prometheus.service" ];
+
+    serviceConfig = {
+      Type = "oneshot";
+      ExecStart = lib.getExe fetchOpenbaoToken;
+    };
+  };
+
+  # Timer to periodically refresh the token (AppRole tokens have 1-hour TTL)
+  systemd.timers.prometheus-openbao-token = {
+    description = "Refresh OpenBao token for Prometheus";
+    wantedBy = [ "timers.target" ];
+    timerConfig = {
+      OnBootSec = "5min";
+      OnUnitActiveSec = "30min";
+      RandomizedDelaySec = "5min";
+    };
+  };
+
   services.prometheus = {
     enable = true;
+    # syntax-only check because we use external credential files (e.g., openbao-token)
+    checkConfig = "syntax-only";
     alertmanager = {
       enable = true;
       configuration = {
@@ -61,6 +129,15 @@ in
|
|||||||
}
|
}
|
||||||
];
|
];
|
||||||
}
|
}
|
||||||
|
# Systemd exporter on all hosts (same targets, different port)
|
||||||
|
{
|
||||||
|
job_name = "systemd-exporter";
|
||||||
|
static_configs = [
|
||||||
|
{
|
||||||
|
targets = map (t: builtins.replaceStrings [":9100"] [":9558"] t) nodeExporterTargets;
|
||||||
|
}
|
||||||
|
];
|
||||||
|
}
|
||||||
# Local monitoring services (not auto-generated)
|
# Local monitoring services (not auto-generated)
|
||||||
{
|
{
|
||||||
job_name = "prometheus";
|
job_name = "prometheus";
|
||||||
@@ -152,6 +229,22 @@ in
|
|||||||
}
|
}
|
||||||
];
|
];
|
||||||
}
|
}
|
||||||
|
# OpenBao metrics with bearer token auth
|
||||||
|
{
|
||||||
|
job_name = "openbao";
|
||||||
|
scheme = "https";
|
||||||
|
metrics_path = "/v1/sys/metrics";
|
||||||
|
params = {
|
||||||
|
format = [ "prometheus" ];
|
||||||
|
};
|
||||||
|
static_configs = [{
|
||||||
|
targets = [ "vault01.home.2rjus.net:8200" ];
|
||||||
|
}];
|
||||||
|
authorization = {
|
||||||
|
type = "Bearer";
|
||||||
|
credentials_file = "/run/secrets/prometheus/openbao-token";
|
||||||
|
};
|
||||||
|
}
|
||||||
] ++ autoScrapeConfigs;
|
] ++ autoScrapeConfigs;
|
||||||
|
|
||||||
pushgateway = {
|
pushgateway = {
|
||||||
|
|||||||
@@ -1,14 +1,16 @@
 { config, ... }:
 {
-  sops.secrets.pve_exporter = {
-    format = "yaml";
-    sopsFile = ../../secrets/monitoring01/pve-exporter.yaml;
-    key = "";
+  vault.secrets.pve-exporter = {
+    secretPath = "hosts/monitoring01/pve-exporter";
+    extractKey = "config";
+    outputDir = "/run/secrets/pve_exporter";
     mode = "0444";
+    services = [ "prometheus-pve-exporter" ];
   };

   services.prometheus.exporters.pve = {
     enable = true;
-    configFile = config.sops.secrets.pve_exporter.path;
+    configFile = "/run/secrets/pve_exporter";
     collectors = {
       cluster = false;
       replication = false;
@@ -75,12 +75,12 @@ groups:
       description: "Based on the last 6h trend, the root filesystem on {{ $labels.instance }} is predicted to run out of space within 24 hours."
   - alert: systemd_not_running
     expr: node_systemd_system_running == 0
-    for: 5m
+    for: 10m
     labels:
-      severity: critical
+      severity: warning
     annotations:
       summary: "Systemd not in running state on {{ $labels.instance }}"
-      description: "Systemd is not in running state on {{ $labels.instance }}. The system may be in a degraded state."
+      description: "Systemd is not in running state on {{ $labels.instance }}. The system may be in a degraded state. Note: brief degraded states during nixos-rebuild are normal."
   - alert: high_file_descriptors
     expr: node_filefd_allocated / node_filefd_maximum > 0.8
     for: 5m
@@ -115,6 +115,14 @@ groups:
     annotations:
       summary: "NSD not running on {{ $labels.instance }}"
       description: "NSD has been down on {{ $labels.instance }} more than 5 minutes."
+  - alert: unbound_low_cache_hit_ratio
+    expr: (rate(unbound_cache_hits_total[5m]) / (rate(unbound_cache_hits_total[5m]) + rate(unbound_cache_misses_total[5m]))) < 0.5
+    for: 15m
+    labels:
+      severity: warning
+    annotations:
+      summary: "Low DNS cache hit ratio on {{ $labels.instance }}"
+      description: "Unbound cache hit ratio is below 50% on {{ $labels.instance }}."
 - name: http_proxy_rules
   rules:
   - alert: caddy_down
@@ -151,6 +159,14 @@ groups:
     annotations:
       summary: "NATS not running on {{ $labels.instance }}"
       description: "NATS has been down on {{ $labels.instance }} more than 5 minutes."
+  - alert: nats_slow_consumers
+    expr: nats_core_slow_consumer_count > 0
+    for: 5m
+    labels:
+      severity: warning
+    annotations:
+      summary: "NATS has slow consumers on {{ $labels.instance }}"
+      description: "NATS has {{ $value }} slow consumers on {{ $labels.instance }}."
 - name: nix_cache_rules
   rules:
   - alert: build_flakes_service_not_active_recently
@@ -210,6 +226,14 @@ groups:
     annotations:
       summary: "Mosquitto not running on {{ $labels.instance }}"
       description: "Mosquitto has been down on {{ $labels.instance }} more than 5 minutes."
+  - alert: zigbee_sensor_stale
+    expr: (time() - hass_last_updated_time_seconds{entity=~"sensor\\.(0x[0-9a-f]+|temp_server)_temperature"}) > 7200
+    for: 5m
+    labels:
+      severity: warning
+    annotations:
+      summary: "Zigbee sensor {{ $labels.friendly_name }} is stale"
+      description: "Zigbee temperature sensor {{ $labels.entity }} has not reported data for over 2 hours. The sensor may have a dead battery or connectivity issues."
 - name: smartctl_rules
   rules:
   - alert: smart_critical_warning
@@ -364,3 +388,65 @@ groups:
     annotations:
       summary: "Proxmox VM {{ $labels.id }} is stopped"
       description: "Proxmox VM {{ $labels.id }} ({{ $labels.name }}) has onboot=1 but is stopped."
+- name: postgres_rules
+  rules:
+  - alert: postgres_down
+    expr: node_systemd_unit_state{instance="pgdb1.home.2rjus.net:9100", name="postgresql.service", state="active"} == 0
+    for: 5m
+    labels:
+      severity: critical
+    annotations:
+      summary: "PostgreSQL not running on {{ $labels.instance }}"
+      description: "PostgreSQL has been down on {{ $labels.instance }} more than 5 minutes."
+  - alert: postgres_exporter_down
+    expr: up{job="postgres"} == 0
+    for: 5m
+    labels:
+      severity: warning
+    annotations:
+      summary: "PostgreSQL exporter down on {{ $labels.instance }}"
+      description: "Cannot scrape PostgreSQL metrics from {{ $labels.instance }}."
+  - alert: postgres_high_connections
+    expr: pg_stat_activity_count / pg_settings_max_connections > 0.8
+    for: 5m
+    labels:
+      severity: warning
+    annotations:
+      summary: "PostgreSQL connection pool near exhaustion on {{ $labels.instance }}"
+      description: "PostgreSQL is using over 80% of max_connections on {{ $labels.instance }}."
+- name: jellyfin_rules
+  rules:
+  - alert: jellyfin_down
+    expr: up{job="jellyfin"} == 0
+    for: 5m
+    labels:
+      severity: warning
+    annotations:
+      summary: "Jellyfin not responding on {{ $labels.instance }}"
+      description: "Cannot scrape Jellyfin metrics from {{ $labels.instance }} for 5 minutes."
+- name: vault_rules
+  rules:
+  - alert: openbao_down
+    expr: node_systemd_unit_state{instance="vault01.home.2rjus.net:9100", name="openbao.service", state="active"} == 0
+    for: 5m
+    labels:
+      severity: critical
+    annotations:
+      summary: "OpenBao not running on {{ $labels.instance }}"
+      description: "OpenBao has been down on {{ $labels.instance }} more than 5 minutes."
+  - alert: openbao_sealed
+    expr: vault_core_unsealed == 0
+    for: 5m
+    labels:
+      severity: critical
+    annotations:
+      summary: "OpenBao is sealed on {{ $labels.instance }}"
+      description: "OpenBao has been sealed on {{ $labels.instance }} for more than 5 minutes."
+  - alert: openbao_scrape_down
+    expr: up{job="openbao"} == 0
+    for: 5m
+    labels:
+      severity: warning
+    annotations:
+      summary: "Cannot scrape OpenBao metrics from {{ $labels.instance }}"
+      description: "OpenBao metrics endpoint is not responding on {{ $labels.instance }}."
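The unbound_low_cache_hit_ratio expression above reduces to hits / (hits + misses) over a 5-minute rate window. A minimal sketch of the threshold logic, with hypothetical sample rates (not taken from any real scrape):

```python
def cache_hit_ratio(hits_rate: float, misses_rate: float) -> float:
    # Mirrors the PromQL: rate(hits) / (rate(hits) + rate(misses))
    return hits_rate / (hits_rate + misses_rate)

# A 30%-hit workload is below the 0.5 threshold and would fire once
# sustained for the 15m "for" duration; a 90%-hit workload would not.
assert cache_hit_ratio(30.0, 70.0) < 0.5
assert cache_hit_ratio(90.0, 10.0) >= 0.5
```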
@@ -1,10 +1,28 @@
 { ... }:
 {
+  homelab.monitoring.scrapeTargets = [
+    {
+      job_name = "nats";
+      port = 7777;
+    }
+  ];
+
+  services.prometheus.exporters.nats = {
+    enable = true;
+    url = "http://localhost:8222";
+    extraFlags = [
+      "-varz" # General server info
+      "-connz" # Connection info
+      "-jsz=all" # JetStream info
+    ];
+  };
+
   services.nats = {
     enable = true;
     jetstream = true;
     serverName = "nats1";
     settings = {
+      http_port = 8222;
       accounts = {
         ADMIN = {
           users = [
@@ -22,6 +40,48 @@
             }
           ];
         };
+
+        DEPLOY = {
+          users = [
+            # Shared listener (all hosts use this)
+            {
+              nkey = "UCCZJSUGLCSLBBKHBPL4QA66TUMQUGIXGLIFTWDEH43MGWM3LDD232X4";
+              permissions = {
+                subscribe = [
+                  "deploy.test.>"
+                  "deploy.prod.>"
+                  "deploy.discover"
+                ];
+                publish = [
+                  "deploy.responses.>"
+                  "deploy.discover"
+                ];
+              };
+            }
+            # Test deployer (MCP without admin)
+            {
+              nkey = "UBR66CX2ZNY5XNVQF5VBG4WFAF54LSGUYCUNNCEYRILDQ4NXDAD2THZU";
+              permissions = {
+                publish = [
+                  "deploy.test.>"
+                  "deploy.discover"
+                ];
+                subscribe = [
+                  "deploy.responses.>"
+                  "deploy.discover"
+                ];
+              };
+            }
+            # Admin deployer (full access)
+            {
+              nkey = "UD2BFB7DLM67P5UUVCKBUJMCHADIZLGGVUNSRLZE2ZC66FW2XT44P73Y";
+              permissions = {
+                publish = [ "deploy.>" ];
+                subscribe = [ "deploy.>" ];
+              };
+            }
+          ];
+        };
       };
       system_account = "ADMIN";
       jetstream = {
@@ -1,14 +1,16 @@
 { pkgs, config, ... }:
 {
-  sops.secrets."cache-secret" = {
-    sopsFile = ../../secrets/nix-cache01/cache-secret;
-    format = "binary";
+  vault.secrets.cache-secret = {
+    secretPath = "hosts/nix-cache01/cache-secret";
+    extractKey = "key";
+    outputDir = "/run/secrets/cache-secret";
+    services = [ "harmonia" ];
   };

   services.harmonia = {
     enable = true;
     package = pkgs.unstable.harmonia;
-    signKeyPaths = [ config.sops.secrets.cache-secret.path ];
+    signKeyPaths = [ "/run/secrets/cache-secret" ];
   };
   systemd.services.harmonia = {
     environment.RUST_LOG = "info,actix_web=debug";
@@ -12,8 +12,11 @@ let
   };
 in
 {
-  sops.secrets.ns_xfer_key = {
-    path = "/etc/nsd/xfer.key";
+  vault.secrets.ns-xfer-key = {
+    secretPath = "shared/dns/xfer-key";
+    extractKey = "key";
+    outputDir = "/etc/nsd/xfer.key";
+    services = [ "nsd" ];
   };

   networking.firewall.allowedTCPPorts = [ 8053 ];
@@ -1,10 +1,24 @@
 { pkgs, ... }: {
+  homelab.monitoring.scrapeTargets = [{
+    job_name = "unbound";
+    port = 9167;
+  }];
+
   networking.firewall.allowedTCPPorts = [
     53
   ];
   networking.firewall.allowedUDPPorts = [
     53
   ];

+  services.prometheus.exporters.unbound = {
+    enable = true;
+    unbound.host = "unix:///run/unbound/unbound.ctl";
+  };
+
+  # Grant exporter access to unbound socket
+  systemd.services.prometheus-unbound-exporter.serviceConfig.SupplementaryGroups = [ "unbound" ];
+
   services.unbound = {
     enable = true;

@@ -23,6 +37,11 @@
       do-ip6 = "no";
       do-udp = "yes";
       do-tcp = "yes";
+      extended-statistics = true;
+    };
+    remote-control = {
+      control-enable = true;
+      control-interface = "/run/unbound/unbound.ctl";
     };
     stub-zone = {
       name = "home.2rjus.net";
@@ -12,8 +12,11 @@ let
   };
 in
 {
-  sops.secrets.ns_xfer_key = {
-    path = "/etc/nsd/xfer.key";
+  vault.secrets.ns-xfer-key = {
+    secretPath = "shared/dns/xfer-key";
+    extractKey = "key";
+    outputDir = "/etc/nsd/xfer.key";
+    services = [ "nsd" ];
   };
   networking.firewall.allowedTCPPorts = [ 8053 ];
   networking.firewall.allowedUDPPorts = [ 8053 ];
@@ -1,5 +1,15 @@
 { pkgs, ... }:
 {
+  homelab.monitoring.scrapeTargets = [{
+    job_name = "postgres";
+    port = 9187;
+  }];
+
+  services.prometheus.exporters.postgres = {
+    enable = true;
+    runAsLocalSuperUser = true; # Use peer auth as postgres user
+  };
+
   services.postgresql = {
     enable = true;
     enableJIT = true;
@@ -166,6 +166,11 @@ in
     settings = {
       ui = true;

+      telemetry = {
+        prometheus_retention_time = "60s";
+        disable_hostname = true;
+      };
+
       storage.file.path = "/var/lib/openbao";
       listener.default = {
         type = "tcp";
@@ -3,7 +3,9 @@
   imports = [
     ./acme.nix
     ./autoupgrade.nix
+    ./homelab-deploy.nix
     ./monitoring
+    ./motd.nix
     ./packages.nix
     ./nix.nix
     ./root-user.nix
@@ -11,7 +13,5 @@
     ./sops.nix
     ./sshd.nix
     ./vault-secrets.nix
-
-    ../modules/homelab
   ];
 }
system/homelab-deploy.nix (new file)
@@ -0,0 +1,37 @@
+{ config, lib, ... }:
+
+let
+  hostCfg = config.homelab.host;
+in
+{
+  config = lib.mkIf config.homelab.deploy.enable {
+    # Fetch listener NKey from Vault
+    vault.secrets.homelab-deploy-nkey = {
+      secretPath = "shared/homelab-deploy/listener-nkey";
+      extractKey = "nkey";
+    };
+
+    # Enable homelab-deploy listener
+    services.homelab-deploy.listener = {
+      enable = true;
+      tier = hostCfg.tier;
+      role = hostCfg.role;
+      natsUrl = "nats://nats1.home.2rjus.net:4222";
+      nkeyFile = "/run/secrets/homelab-deploy-nkey";
+      flakeUrl = "git+https://git.t-juice.club/torjus/nixos-servers.git";
+      metrics.enable = true;
+    };
+
+    # Expose metrics for Prometheus scraping
+    homelab.monitoring.scrapeTargets = [{
+      job_name = "homelab-deploy";
+      port = 9972;
+    }];
+
+    # Ensure listener starts after vault secret is available
+    systemd.services.homelab-deploy-listener = {
+      after = [ "vault-secret-homelab-deploy-nkey.service" ];
+      requires = [ "vault-secret-homelab-deploy-nkey.service" ];
+    };
+  };
+}
@@ -9,4 +9,30 @@
       "processes"
     ];
   };
+
+  services.prometheus.exporters.systemd = {
+    enable = true;
+    # Default port: 9558
+    extraFlags = [
+      "--systemd.collector.enable-restart-count"
+      "--systemd.collector.enable-ip-accounting"
+    ];
+  };
+
+  services.prometheus.exporters.nixos = {
+    enable = true;
+    # Default port: 9971
+    flake = {
+      enable = true;
+      url = "git+https://git.t-juice.club/torjus/nixos-servers.git";
+    };
+  };
+
+  # Register nixos-exporter as a Prometheus scrape target
+  homelab.monitoring.scrapeTargets = [
+    {
+      job_name = "nixos-exporter";
+      port = 9971;
+    }
+  ];
 }
system/motd.nix (new file)
@@ -0,0 +1,28 @@
+{ config, lib, self, ... }:
+
+let
+  hostname = config.networking.hostName;
+  domain = config.networking.domain or "";
+  fqdn = if domain != "" then "${hostname}.${domain}" else hostname;
+
+  # Get commit hash (handles both clean and dirty trees)
+  shortRev = self.shortRev or self.dirtyShortRev or "unknown";
+
+  # Format timestamp from lastModified (Unix timestamp)
+  # lastModifiedDate is in format "YYYYMMDDHHMMSS"
+  dateStr = self.sourceInfo.lastModifiedDate or "unknown";
+  formattedDate = if dateStr != "unknown" then
+    "${builtins.substring 0 4 dateStr}-${builtins.substring 4 2 dateStr}-${builtins.substring 6 2 dateStr} ${builtins.substring 8 2 dateStr}:${builtins.substring 10 2 dateStr} UTC"
+  else
+    "unknown";
+
+  banner = ''
+    ####################################
+    ${fqdn}
+    Commit: ${shortRev} (${formattedDate})
+    ####################################
+  '';
+in
+{
+  users.motd = lib.mkDefault banner;
+}
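The substring arithmetic in motd.nix can be hard to eyeball; this is a sketch of the same slicing in Python, assuming a well-formed "YYYYMMDDHHMMSS" input as the comment in the file describes:

```python
def format_last_modified(date_str: str) -> str:
    # Same slices as the builtins.substring calls:
    # 0-4 year, 4-6 month, 6-8 day, 8-10 hour, 10-12 minute
    if date_str == "unknown" or len(date_str) < 12:
        return "unknown"
    return (f"{date_str[0:4]}-{date_str[4:6]}-{date_str[6:8]} "
            f"{date_str[8:10]}:{date_str[10:12]} UTC")

print(format_last_modified("20240131154500"))  # 2024-01-31 15:45 UTC
```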
@@ -1,5 +1,25 @@
-{ lib, ... }:
+{ lib, pkgs, ... }:
+let
+  nixos-rebuild-test = pkgs.writeShellApplication {
+    name = "nixos-rebuild-test";
+    runtimeInputs = [ pkgs.nixos-rebuild ];
+    text = ''
+      if [ $# -lt 2 ]; then
+        echo "Usage: nixos-rebuild-test <action> <branch>"
+        echo "Example: nixos-rebuild-test boot my-feature-branch"
+        exit 1
+      fi
+
+      action="$1"
+      branch="$2"
+      shift 2
+
+      exec nixos-rebuild "$action" --flake "git+https://git.t-juice.club/torjus/nixos-servers.git?ref=$branch" "$@"
+    '';
+  };
+in
 {
+  environment.systemPackages = [ nixos-rebuild-test ];
   nix = {
     gc = {
       automatic = true;
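The wrapper's only real logic is constructing the flake reference from the branch name; a quick sketch of that construction (the repo URL is the one hard-coded in the script):

```python
REPO = "git+https://git.t-juice.club/torjus/nixos-servers.git"

def flake_ref(branch: str) -> str:
    # nixos-rebuild-test passes this as: nixos-rebuild <action> --flake <ref>
    return f"{REPO}?ref={branch}"

print(flake_ref("my-feature-branch"))
# git+https://git.t-juice.club/torjus/nixos-servers.git?ref=my-feature-branch
```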
@@ -1,11 +1,10 @@
-{ pkgs, config, ... }: {
+{ pkgs, config, ... }:
+{
   programs.zsh.enable = true;
-  sops.secrets.root_password_hash = { };
-  sops.secrets.root_password_hash.neededForUsers = true;

   users.users.root = {
     shell = pkgs.zsh;
-    hashedPasswordFile = config.sops.secrets.root_password_hash.path;
+    hashedPassword = "$y$j9T$N09APWqKc4//z9BoGyzSb0$3dMUzojSmo3/10nbIfShd6/IpaYoKdI21bfbWER3jl8";
     openssh.authorizedKeys.keys = [
       "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAwfb2jpKrBnCw28aevnH8HbE5YbcMXpdaVv2KmueDu6 torjus@gunter"
     ];
@@ -8,6 +8,48 @@ let
   # Import vault-fetch package
   vault-fetch = pkgs.callPackage ../scripts/vault-fetch { };

+  # Helper to create fetch scripts using writeShellApplication
+  mkFetchScript = name: secretCfg: pkgs.writeShellApplication {
+    name = "fetch-${name}";
+    runtimeInputs = [ vault-fetch ];
+    text = ''
+      # Set Vault environment variables
+      export VAULT_ADDR="${cfg.vaultAddress}"
+      export VAULT_SKIP_VERIFY="${if cfg.skipTlsVerify then "1" else "0"}"
+    '' + (if secretCfg.extractKey != null then ''
+      # Fetch to temporary directory, then extract single key
+      TMPDIR=$(mktemp -d)
+      trap 'rm -rf $TMPDIR' EXIT
+
+      vault-fetch \
+        "${secretCfg.secretPath}" \
+        "$TMPDIR" \
+        "${secretCfg.cacheDir}"
+
+      # Extract the specified key and write as a single file
+      if [ ! -f "$TMPDIR/${secretCfg.extractKey}" ]; then
+        echo "ERROR: Key '${secretCfg.extractKey}' not found in secret" >&2
+        exit 1
+      fi
+
+      # Ensure parent directory exists
+      mkdir -p "$(dirname "${secretCfg.outputDir}")"
+      cp "$TMPDIR/${secretCfg.extractKey}" "${secretCfg.outputDir}"
+      chown ${secretCfg.owner}:${secretCfg.group} "${secretCfg.outputDir}"
+      chmod ${secretCfg.mode} "${secretCfg.outputDir}"
+    '' else ''
+      # Fetch secret as directory of files
+      vault-fetch \
+        "${secretCfg.secretPath}" \
+        "${secretCfg.outputDir}" \
+        "${secretCfg.cacheDir}"
+
+      # Set ownership and permissions
+      chown -R ${secretCfg.owner}:${secretCfg.group} "${secretCfg.outputDir}"
+      chmod ${secretCfg.mode} "${secretCfg.outputDir}"/*
+    '');
+  };
+
   # Secret configuration type
   secretType = types.submodule ({ name, config, ... }: {
     options = {
@@ -73,6 +115,16 @@ let
       '';
     };
+
+      extractKey = mkOption {
+        type = types.nullOr types.str;
+        default = null;
+        description = ''
+          Extract a single key from the vault secret JSON and write it as a
+          plain file instead of a directory of files. When set, outputDir
+          becomes a file path rather than a directory path.
+        '';
+      };
+
       services = mkOption {
         type = types.listOf types.str;
         default = [];
@@ -152,23 +204,7 @@ in
         RemainAfterExit = true;

         # Fetch the secret
-        ExecStart = pkgs.writeShellScript "fetch-${name}" ''
-          set -euo pipefail
-
-          # Set Vault environment variables
-          export VAULT_ADDR="${cfg.vaultAddress}"
-          export VAULT_SKIP_VERIFY="${if cfg.skipTlsVerify then "1" else "0"}"
-
-          # Fetch secret using vault-fetch
-          ${vault-fetch}/bin/vault-fetch \
-            "${secretCfg.secretPath}" \
-            "${secretCfg.outputDir}" \
-            "${secretCfg.cacheDir}"
-
-          # Set ownership and permissions
-          chown -R ${secretCfg.owner}:${secretCfg.group} "${secretCfg.outputDir}"
-          chmod ${secretCfg.mode} "${secretCfg.outputDir}"/*
-        '';
+        ExecStart = lib.getExe (mkFetchScript name secretCfg);

         # Logging
         StandardOutput = "journal";
@@ -216,7 +252,10 @@ in
     [ "d /run/secrets 0755 root root -" ] ++
     [ "d /var/lib/vault/cache 0700 root root -" ] ++
     flatten (mapAttrsToList (name: secretCfg: [
-      "d ${secretCfg.outputDir} 0755 root root -"
+      # When extractKey is set, outputDir is a file path - create parent dir instead
+      (if secretCfg.extractKey != null
+       then "d ${dirOf secretCfg.outputDir} 0755 root root -"
+       else "d ${secretCfg.outputDir} 0755 root root -")
       "d ${secretCfg.cacheDir} 0700 root root -"
     ]) cfg.secrets);
   };
@@ -4,6 +4,17 @@ resource "vault_auth_backend" "approle" {
   path = "approle"
 }

+# Shared policy for homelab-deploy (all hosts need this for NATS-based deployments)
+resource "vault_policy" "homelab_deploy" {
+  name = "homelab-deploy"
+
+  policy = <<EOT
+path "secret/data/shared/homelab-deploy/*" {
+  capabilities = ["read", "list"]
+}
+EOT
+}
+
 # Define host access policies
 locals {
   host_policies = {
@@ -15,6 +26,7 @@ locals {
     #   "secret/data/services/grafana/*",
     #   "secret/data/shared/smtp/*"
     # ]
+    #   extra_policies = ["some-other-policy"] # Optional: additional policies
     # }

     # Example: ha1 host
@@ -25,19 +37,70 @@ locals {
     # ]
     # }

-    # TODO: actually use this policy
     "ha1" = {
       paths = [
         "secret/data/hosts/ha1/*",
+        "secret/data/shared/backup/*",
       ]
     }

-    # TODO: actually use this policy
     "monitoring01" = {
       paths = [
         "secret/data/hosts/monitoring01/*",
+        "secret/data/shared/backup/*",
+        "secret/data/shared/nats/*",
+      ]
+      extra_policies = ["prometheus-metrics"]
+    }
+
+    # Wave 1: hosts with no service secrets (only need vault.enable for future use)
+    "nats1" = {
+      paths = [
+        "secret/data/hosts/nats1/*",
       ]
     }
+
+    "jelly01" = {
+      paths = [
+        "secret/data/hosts/jelly01/*",
+      ]
+    }
+
+    "pgdb1" = {
+      paths = [
+        "secret/data/hosts/pgdb1/*",
+      ]
+    }
+
+    # Wave 3: DNS servers
+    "ns1" = {
+      paths = [
+        "secret/data/hosts/ns1/*",
+        "secret/data/shared/dns/*",
+      ]
+    }
+
+    "ns2" = {
+      paths = [
+        "secret/data/hosts/ns2/*",
+        "secret/data/shared/dns/*",
+      ]
+    }
+
+    # Wave 4: http-proxy
+    "http-proxy" = {
+      paths = [
+        "secret/data/hosts/http-proxy/*",
+      ]
+    }
+
+    # Wave 5: nix-cache01
+    "nix-cache01" = {
+      paths = [
+        "secret/data/hosts/nix-cache01/*",
+      ]
+    }
   }
 }

@@ -62,7 +125,10 @@ resource "vault_approle_auth_backend_role" "hosts" {

   backend   = vault_auth_backend.approle.path
   role_name = each.key
-  token_policies = ["${each.key}-policy"]
+  token_policies = concat(
+    ["${each.key}-policy", "homelab-deploy"],
+    lookup(each.value, "extra_policies", [])
+  )

   # Token configuration
   token_ttl = 3600 # 1 hour
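The concat/lookup expression above gives every AppRole its own host policy plus the shared homelab-deploy policy, with optional extras appended. A sketch of the resulting policy list (hypothetical host data, mirroring the Terraform semantics):

```python
def token_policies(host: str, host_cfg: dict) -> list:
    # Mirrors: concat(["<host>-policy", "homelab-deploy"],
    #                 lookup(each.value, "extra_policies", []))
    return [f"{host}-policy", "homelab-deploy"] + host_cfg.get("extra_policies", [])

print(token_policies("monitoring01", {"extra_policies": ["prometheus-metrics"]}))
# ['monitoring01-policy', 'homelab-deploy', 'prometheus-metrics']
print(token_policies("nats1", {}))
# ['nats1-policy', 'homelab-deploy']
```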
@@ -5,9 +5,19 @@
 # Each host gets access to its own secrets under hosts/<hostname>/*
 locals {
   generated_host_policies = {
-    "vaulttest01" = {
+    "testvm01" = {
       paths = [
-        "secret/data/hosts/vaulttest01/*",
+        "secret/data/hosts/testvm01/*",
+      ]
+    }
+    "testvm02" = {
+      paths = [
+        "secret/data/hosts/testvm02/*",
+      ]
+    }
+    "testvm03" = {
+      paths = [
+        "secret/data/hosts/testvm03/*",
       ]
     }

@@ -40,7 +50,7 @@ resource "vault_approle_auth_backend_role" "generated_hosts" {

   backend   = vault_auth_backend.approle.path
   role_name = each.key
-  token_policies = ["host-${each.key}"]
+  token_policies = ["host-${each.key}", "homelab-deploy"]
   secret_id_ttl = 0 # Never expire (wrapped tokens provide time limit)
   token_ttl     = 3600
   token_max_ttl = 3600
terraform/vault/policies.tf (new file, +10)
@@ -0,0 +1,10 @@
+# Generic policies for services (not host-specific)
+
+resource "vault_policy" "prometheus_metrics" {
+  name = "prometheus-metrics"
+  policy = <<EOT
+path "sys/metrics" {
+  capabilities = ["read"]
+}
+EOT
+}
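The new `prometheus-metrics` policy grants only `read` on `sys/metrics`. It still has to be attached to some auth role before a scraper can use it; this diff does not show that wiring, so the attachment below is only a hypothetical sketch (the role name and the choice of the token auth method are assumptions):

```hcl
# Hypothetical: let a Prometheus scraper obtain tokens carrying the
# prometheus-metrics policy. Names here are illustrative, not from the diff.
resource "vault_token_auth_backend_role" "prometheus_scraper" {
  role_name        = "prometheus-scraper"
  allowed_policies = ["prometheus-metrics"]
  orphan           = true
  token_period     = "3600"
}
```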
@@ -35,22 +35,73 @@ locals {
   # }
   # }
 
-  # TODO: actually use the secret
   "hosts/monitoring01/grafana-admin" = {
     auto_generate = true
     password_length = 32
   }
 
-  # TODO: actually use the secret
   "hosts/ha1/mqtt-password" = {
     auto_generate = true
     password_length = 24
   }
-  # TODO: Remove after testing
-  "hosts/vaulttest01/test-service" = {
+
+  # Shared backup password (auto-generated, add alongside existing restic key)
+  "shared/backup/password" = {
     auto_generate = true
     password_length = 32
   }
+
+  # NATS NKey for alerttonotify
+  "shared/nats/nkey" = {
+    auto_generate = false
+    data = { nkey = var.nats_nkey }
+  }
+
+  # PVE exporter config for monitoring01
+  "hosts/monitoring01/pve-exporter" = {
+    auto_generate = false
+    data = { config = var.pve_exporter_config }
+  }
+
+  # DNS zone transfer key
+  "shared/dns/xfer-key" = {
+    auto_generate = false
+    data = { key = var.ns_xfer_key }
+  }
+
+  # WireGuard private key for http-proxy
+  "hosts/http-proxy/wireguard" = {
+    auto_generate = false
+    data = { private_key = var.wireguard_private_key }
+  }
+
+  # Nix cache signing key
+  "hosts/nix-cache01/cache-secret" = {
+    auto_generate = false
+    data = { key = var.cache_signing_key }
+  }
+
+  # Gitea Actions runner token
+  "hosts/nix-cache01/actions-token" = {
+    auto_generate = false
+    data = { token = var.actions_token_1 }
+  }
+
+  # Homelab-deploy NKeys
+  "shared/homelab-deploy/listener-nkey" = {
+    auto_generate = false
+    data = { nkey = var.homelab_deploy_listener_nkey }
+  }
+
+  "shared/homelab-deploy/test-deployer-nkey" = {
+    auto_generate = false
+    data = { nkey = var.homelab_deploy_test_deployer_nkey }
+  }
+
+  "shared/homelab-deploy/admin-deployer-nkey" = {
+    auto_generate = false
+    data = { nkey = var.homelab_deploy_admin_deployer_nkey }
+  }
   }
 }
 
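In the map above, `auto_generate = true` entries presumably have a random password minted for them, while `auto_generate = false` entries store the `data` payload fed in from variables. The consuming module is not part of this diff, so the following is only a guess at what its internals could look like (resource names and the `var.secrets` shape are assumptions):

```hcl
# Hypothetical module internals for the secrets map (not from this diff).
resource "random_password" "generated" {
  for_each = { for k, v in var.secrets : k => v if try(v.auto_generate, false) }
  length   = each.value.password_length
}

resource "vault_kv_secret_v2" "secret" {
  for_each = var.secrets
  mount    = "secret"
  name     = each.key
  data_json = jsonencode(
    try(each.value.auto_generate, false)
      ? { password = random_password.generated[each.key].result }
      : each.value.data
  )
}
```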
@@ -16,11 +16,60 @@ variable "vault_skip_tls_verify" {
   default = true
 }
 
-# Example variables for manual secrets
-# Uncomment and add to terraform.tfvars as needed
-# variable "smtp_password" {
-#   description = "SMTP password for notifications"
-#   type = string
-#   sensitive = true
-# }
+variable "nats_nkey" {
+  description = "NATS NKey for alerttonotify"
+  type = string
+  sensitive = true
+}
+
+variable "pve_exporter_config" {
+  description = "PVE exporter YAML configuration"
+  type = string
+  sensitive = true
+}
+
+variable "ns_xfer_key" {
+  description = "DNS zone transfer TSIG key"
+  type = string
+  sensitive = true
+}
+
+variable "wireguard_private_key" {
+  description = "WireGuard private key for http-proxy"
+  type = string
+  sensitive = true
+}
+
+variable "cache_signing_key" {
+  description = "Nix binary cache signing key"
+  type = string
+  sensitive = true
+}
+
+variable "actions_token_1" {
+  description = "Gitea Actions runner token"
+  type = string
+  sensitive = true
+}
+
+variable "homelab_deploy_listener_nkey" {
+  description = "NKey seed for homelab-deploy listeners"
+  type = string
+  default = "PLACEHOLDER"
+  sensitive = true
+}
+
+variable "homelab_deploy_test_deployer_nkey" {
+  description = "NKey seed for test-tier deployer"
+  type = string
+  default = "PLACEHOLDER"
+  sensitive = true
+}
+
+variable "homelab_deploy_admin_deployer_nkey" {
+  description = "NKey seed for admin deployer"
+  type = string
+  default = "PLACEHOLDER"
+  sensitive = true
+}
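None of the new `sensitive` variables (except the three NKey placeholders) has a default, so `terraform plan` will prompt for them unless values arrive via `terraform.tfvars` or `TF_VAR_*` environment variables. A `terraform.tfvars` sketch with dummy values (every value below is an invented placeholder; never commit real secrets):

```hcl
# terraform.tfvars -- dummy values for illustration only
nats_nkey             = "SUAEXAMPLESEEDONLY"
pve_exporter_config   = <<EOT
default:
  user: monitor@pve
  token_name: exporter
EOT
ns_xfer_key           = "hmac-sha256:ZXhhbXBsZQ=="
wireguard_private_key = "wg-private-key-placeholder"
cache_signing_key     = "cache-1:signing-key-placeholder"
actions_token_1       = "runner-registration-placeholder"
```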
@@ -31,13 +31,6 @@ locals {
     # Example Minimal VM using all defaults (uncomment to deploy):
     # "minimal-vm" = {}
     # "bootstrap-verify-test" = {}
-    "testvm01" = {
-      ip = "10.69.13.101/24"
-      cpu_cores = 2
-      memory = 2048
-      disk_size = "20G"
-      flake_branch = "pipeline-testing-improvements"
-    }
     "vault01" = {
       ip = "10.69.13.19/24"
       cpu_cores = 2
@@ -45,13 +38,29 @@ locals {
       disk_size = "20G"
       flake_branch = "vault-setup" # Bootstrap from this branch instead of master
     }
-    "vaulttest01" = {
-      ip = "10.69.13.150/24"
+    "testvm01" = {
+      ip = "10.69.13.20/24"
       cpu_cores = 2
       memory = 2048
       disk_size = "20G"
-      flake_branch = "pki-migration"
-      vault_wrapped_token = "s.UCpQCOp7cOKDdtGGBvfRWwAt"
+      flake_branch = "deploy-test-hosts"
+      vault_wrapped_token = "s.YRGRpAZVVtSYEa3wOYOqFmjt"
+    }
+    "testvm02" = {
+      ip = "10.69.13.21/24"
+      cpu_cores = 2
+      memory = 2048
+      disk_size = "20G"
+      flake_branch = "deploy-test-hosts"
+      vault_wrapped_token = "s.tvs8yhJOkLjBs548STs6DBw7"
+    }
+    "testvm03" = {
+      ip = "10.69.13.22/24"
+      cpu_cores = 2
+      memory = 2048
+      disk_size = "20G"
+      flake_branch = "deploy-test-hosts"
+      vault_wrapped_token = "s.sQ80FZGeG3z6jgrsuh74IopC"
     }
   }
 