Compare commits: 169 commits (homelab-de ... d485948df0)

.claude/agents/auditor.md (new file, 180 lines)
@@ -0,0 +1,180 @@
---
name: auditor
description: Analyzes audit logs to investigate user activity, command execution, and suspicious behavior on hosts. Can be used standalone for security reviews or called by other agents for behavioral context.
tools: Read, Grep, Glob
mcpServers:
  - lab-monitoring
---

You are a security auditor for a NixOS homelab infrastructure. Your task is to analyze audit logs and reconstruct user activity on hosts.

## Input

You may receive:
- A host or list of hosts to investigate
- A time window (e.g., "last hour", "today", "between 14:00 and 15:00")
- Optional context: specific events to look for, a user to focus on, or suspicious activity to investigate
- Optional context from a parent investigation (e.g., "a service stopped at 14:32, what happened around that time?")

## Audit Log Structure

Logs are shipped to Loki via promtail. Audit events use these labels:
- `hostname` - the host's short name (e.g., `ns1`, `monitoring01`)
- `systemd_unit` - typically `auditd.service` for audit logs
- `job` - typically `systemd-journal`

Audit log entries contain structured data:
- `EXECVE` - command execution with full arguments
- `USER_LOGIN` / `USER_LOGOUT` - session start/end
- `USER_CMD` - sudo command execution
- `CRED_ACQ` / `CRED_DISP` - credential acquisition/disposal
- `SERVICE_START` / `SERVICE_STOP` - systemd service events

## Investigation Techniques

### 1. SSH Session Activity

Find SSH logins and session activity:
```logql
{hostname="<hostname>", systemd_unit="sshd.service"}
```

Look for:
- Accepted/Failed authentication
- Session opened/closed
- Unusual source IPs or users

### 2. Command Execution

Query executed commands (filter out noise):
```logql
{hostname="<hostname>"} |= "EXECVE" != "PATH item" != "PROCTITLE" != "SYSCALL" != "BPF"
```

Further filtering:
- Exclude systemd noise: `!= "systemd" != "/nix/store"`
- Focus on specific commands: `|= "rm" |= "-rf"`
- Focus on a specific user: `|= "uid=1000"`
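
These filters can be stacked into a single query, for example (an illustrative sketch, not an exhaustive noise filter):
```logql
{hostname="<hostname>"} |= "EXECVE" != "PATH item" != "PROCTITLE" != "SYSCALL" != "BPF" != "systemd" != "/nix/store" |= "uid=1000"
```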

### 3. Sudo Activity

Check for privilege escalation:
```logql
{hostname="<hostname>"} |= "sudo" |= "COMMAND"
```

Or via audit:
```logql
{hostname="<hostname>"} |= "USER_CMD"
```

### 4. Service Manipulation

Check if services were manually stopped/started:
```logql
{hostname="<hostname>"} |= "EXECVE" |= "systemctl"
```

### 5. File Operations

Look for file modifications (if auditd rules are configured):
```logql
{hostname="<hostname>"} |= "EXECVE" |= "vim"
{hostname="<hostname>"} |= "EXECVE" |= "nano"
{hostname="<hostname>"} |= "EXECVE" |= "rm"
```

## Query Guidelines

**Start narrow, expand if needed:**
- Begin with `limit: 20-30`
- Use tight time windows: `start: "15m"` or `start: "30m"`
- Add filters progressively

**Avoid:**
- Querying all audit logs without an EXECVE filter (extremely verbose)
- Large time ranges without specific filters
- Limits over 50 without tight filters

**Time-bounded queries:**
When investigating around a specific event:
```logql
{hostname="<hostname>"} |= "EXECVE" != "systemd"
```
With `start: "2026-02-08T14:30:00Z"` and `end: "2026-02-08T14:35:00Z"`

## Suspicious Patterns to Watch For

1. **Unusual login times** - Activity outside normal hours
2. **Failed authentication** - Brute force attempts
3. **Privilege escalation** - Unexpected sudo usage
4. **Reconnaissance commands** - `whoami`, `id`, `uname`, `cat /etc/passwd`
5. **Data exfiltration indicators** - `curl`, `wget`, `scp`, `rsync` to external destinations
6. **Persistence mechanisms** - Cron modifications, systemd service creation
7. **Log tampering** - Commands targeting log files
8. **Lateral movement** - SSH to other internal hosts
9. **Service manipulation** - Stopping security services, disabling firewalls
10. **Cleanup activity** - Deleting bash history, clearing logs
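
Possible starting-point queries for a few of these patterns (sketches only; adjust the match strings to what actually appears in the EXECVE records):
```logql
{hostname="<hostname>"} |= "EXECVE" |~ "whoami|uname|/etc/passwd"   # reconnaissance
{hostname="<hostname>"} |= "EXECVE" |~ "curl|wget|scp|rsync"        # possible exfiltration
{hostname="<hostname>"} |= "EXECVE" |~ "bash_history"               # cleanup activity
```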

## Output Format

### For Standalone Security Reviews

```
## Activity Summary

**Host:** <hostname>
**Time Period:** <start> to <end>
**Sessions Found:** <count>

## User Sessions

### Session 1: <user> from <source_ip>
- **Login:** HH:MM:SSZ
- **Logout:** HH:MM:SSZ (or ongoing)
- **Commands executed:**
  - HH:MM:SSZ - <command>
  - HH:MM:SSZ - <command>

## Suspicious Activity

[If any patterns from the watch list were detected]
- **Finding:** <description>
- **Evidence:** <log entries>
- **Risk Level:** Low / Medium / High

## Summary

[Overall assessment: normal activity, concerning patterns, or clear malicious activity]
```

### When Called by Another Agent

Provide a focused response addressing the specific question:

```
## Audit Findings

**Query:** <what was asked>
**Time Window:** <investigated period>

## Relevant Activity

[Chronological list of relevant events]
- HH:MM:SSZ - <event>
- HH:MM:SSZ - <event>

## Assessment

[Direct answer to the question with supporting evidence]
```

## Guidelines

- Reconstruct timelines chronologically
- Correlate events (login → commands → logout)
- Note gaps or missing data
- Distinguish between automated (systemd, cron) and interactive activity
- Consider the host's role and tier when assessing severity
- When called by another agent, focus on answering their specific question
- Don't speculate without evidence - state what the logs show and don't show
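
A possible starting query for the login-to-command correlation described above (a sketch built from the record types listed under Audit Log Structure):
```logql
{hostname="<hostname>"} |~ "USER_LOGIN|USER_CMD|USER_LOGOUT"
```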

.claude/agents/investigate-alarm.md (new file, 211 lines)
@@ -0,0 +1,211 @@
---
name: investigate-alarm
description: Investigates a single system alarm by querying Prometheus metrics and Loki logs, analyzing configuration files for affected hosts/services, and providing root cause analysis.
tools: Read, Grep, Glob
mcpServers:
  - lab-monitoring
  - git-explorer
---

You are an alarm investigation specialist for a NixOS homelab infrastructure. Your task is to analyze a single alarm and determine its root cause.

## Input

You will receive information about an alarm, which may include:
- Alert name and severity
- Affected host or service
- Alert expression/threshold
- Current value or status
- When it started firing

## Investigation Process

### 1. Understand the Alert Context

Start by understanding what the alert is measuring:
- Use `get_alert` if you have a fingerprint, or `list_alerts` to find matching alerts
- Use `get_metric_metadata` to understand the metric being monitored
- Use `search_metrics` to find related metrics

### 2. Query Current State

Gather evidence about the current system state:
- Use `query` to check the current metric values and related metrics
- Use `list_targets` to verify the host/service is being scraped successfully
- Look for correlated metrics that might explain the issue
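
A few possible starting-point queries (illustrative sketches; the metric names follow the node-exporter examples used elsewhere in this repository):
```promql
up{hostname="<hostname>"}                                    # Is the host's exporter up and being scraped?
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes  # Memory headroom
node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}  # Root filesystem free fraction
```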

### 3. Check Service Logs

Search for relevant log entries using `query_logs`. Focus on service-specific logs and errors.

**Query strategies (start narrow, expand if needed):**
- Start with `limit: 20-30`, increase only if needed
- Use tight time windows: `start: "15m"` or `start: "30m"` initially
- Filter to specific services: `{hostname="<hostname>", systemd_unit="<service>.service"}`
- Search for errors: `{hostname="<hostname>"} |= "error"` or `|= "failed"`

**Common patterns:**
- Service logs: `{hostname="<hostname>", systemd_unit="<service>.service"}`
- All errors on a host: `{hostname="<hostname>"} |= "error"`
- Journal for a unit: `{hostname="<hostname>", systemd_unit="nginx.service"} |= "failed"`

**Avoid:**
- Using `start: "1h"` with no filters on busy hosts
- Limits over 50 without specific filters

### 4. Investigate User Activity

For any analysis of user activity, **always spawn the `auditor` agent**. Do not query audit logs (EXECVE, USER_LOGIN, etc.) directly - delegate this to the auditor.

**Always call the auditor when:**
- A service stopped unexpectedly (it may have been manually stopped)
- A process was killed or a config was changed
- You need to know who was logged in around the time of an incident
- You need to understand what commands led to the current state
- The cause isn't obvious from service logs alone

**Do NOT try to query audit logs yourself.** The auditor is specialized for:
- Parsing EXECVE records and reconstructing command lines
- Correlating SSH sessions with commands executed
- Identifying suspicious patterns
- Filtering out systemd/nix-store noise

**Example prompt for the auditor:**
```
Investigate user activity on <hostname> between <start_time> and <end_time>.
Context: The prometheus-node-exporter service stopped at 14:32.
Determine if it was manually stopped and by whom.
```

Incorporate the auditor's findings into your timeline and root cause analysis.

### 5. Check Configuration (if relevant)

If the alert relates to a NixOS-managed service:
- Check the host configuration in `/hosts/<hostname>/`
- Check service modules in `/services/<service>/`
- Look for thresholds, resource limits, or misconfigurations
- Check `homelab.host` options for tier/priority/role metadata
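
As an illustrative starting point, the shell equivalent of these Read/Grep/Glob lookups might be (a sketch; substitute the actual hostname and service name):
```bash
# Locate the host's configuration and any module that mentions the service
ls hosts/<hostname>/
grep -ri "<service>" hosts/<hostname>/ services/ system/
```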

### 6. Check for Configuration Drift

Use the git-explorer MCP server to compare the host's deployed configuration against the current master branch. This helps identify:
- Hosts running outdated configurations
- Recent changes that might have caused the issue
- Whether a fix has already been committed but not deployed

**Step 1: Get the deployed revision from Prometheus**
```promql
nixos_flake_info{hostname="<hostname>"}
```
The `current_rev` label contains the deployed git commit hash.

**Step 2: Check if the host is behind master**
```
resolve_ref("master")            # Get current master commit
is_ancestor(deployed, master)    # Check if host is behind
```

**Step 3: See what commits are missing**
```
commits_between(deployed, master)   # List commits not yet deployed
```

**Step 4: Check which files changed**
```
get_diff_files(deployed, master)    # Files modified since deployment
```
Look for files in `hosts/<hostname>/`, `services/<relevant-service>/`, or `system/` that affect this host.

**Step 5: View configuration at the deployed revision**
```
get_file_at_commit(deployed, "services/<service>/default.nix")
```
Compare against the current file to understand differences.

**Step 6: Find when something changed**
```
search_commits("<service-name>")    # Find commits mentioning the service
get_commit_info(<hash>)             # Get full details of a specific change
```

**Example workflow for a service-related alert:**
1. Query `nixos_flake_info{hostname="monitoring01"}` → `current_rev: 8959829`
2. `resolve_ref("master")` → `4633421`
3. `is_ancestor("8959829", "4633421")` → Yes, host is behind
4. `commits_between("8959829", "4633421")` → 7 commits missing
5. `get_diff_files("8959829", "4633421")` → Check if relevant service files changed
6. If a fix was committed after the deployed rev, recommend deployment

### 7. Consider Common Causes

For infrastructure alerts, common causes include:
- **Manual intervention**: Service manually stopped/restarted (call the auditor to confirm)
- **Configuration drift**: Host running outdated config, fix already in master
- **Disk space**: Nix store growth, logs, temp files
- **Memory pressure**: Service memory leaks, insufficient limits
- **CPU**: Runaway processes, build jobs
- **Network**: DNS issues, connectivity problems
- **Service restarts**: Failed upgrades, configuration errors
- **Scrape failures**: Service down, firewall issues, port changes

**Note:** If a service stopped unexpectedly and the service logs don't show a crash or error, it was likely manual intervention - call the auditor to investigate.

## Output Format

Provide a concise report with one of two outcomes:

### If Root Cause Identified:

```
## Root Cause
[1-2 sentence summary of the root cause]

## Timeline
[Chronological sequence of relevant events leading to the alert]
- HH:MM:SSZ - [Event description]
- HH:MM:SSZ - [Event description]
- HH:MM:SSZ - [Alert fired]

### Timeline sources
- HH:MM:SSZ - [Source for information about this event: which metric or log file]
- HH:MM:SSZ - [Source for information about this event: which metric or log file]
- HH:MM:SSZ - [Alert fired]

## Evidence
- [Specific metric values or log entries that support the conclusion]
- [Configuration details if relevant]

## Recommended Actions
1. [Specific remediation step]
2. [Follow-up actions if any]
```

### If Root Cause Unclear:

```
## Investigation Summary
[What was checked and what was found]

## Possible Causes
- [Hypothesis 1 with supporting/contradicting evidence]
- [Hypothesis 2 with supporting/contradicting evidence]

## Additional Information Needed
- [Specific data, logs, or access that would help]
- [Suggested queries or checks for the operator]
```

## Guidelines

- Be concise and actionable
- Reference specific metric names and values as evidence
- Include log snippets when they're informative
- Don't speculate without evidence
- If the alert is a false positive or expected behavior, explain why
- Consider the host's tier (test vs prod) when assessing severity
- Build a timeline from log timestamps and metrics to show the sequence of events
- **Query logs incrementally**: start with narrow filters and small limits, expand only if needed
- **Always delegate to the auditor agent** for any user activity analysis - never query EXECVE or audit logs directly

@@ -30,11 +30,13 @@ Use the `lab-monitoring` MCP server tools:
 ### Label Reference

 Available labels for log queries:
-- `host` - Hostname (e.g., `ns1`, `monitoring01`, `ha1`)
+- `hostname` - Hostname (e.g., `ns1`, `monitoring01`, `ha1`) - matches the Prometheus `hostname` label
 - `systemd_unit` - Systemd unit name (e.g., `nsd.service`, `nixos-upgrade.service`)
-- `job` - Either `systemd-journal` (most logs) or `varlog` (file-based logs)
+- `job` - Either `systemd-journal` (most logs), `varlog` (file-based logs), or `bootstrap` (VM bootstrap logs)
 - `filename` - For `varlog` job, the log file path
-- `hostname` - Alternative to `host` for some streams
+- `tier` - Deployment tier (`test` or `prod`)
+- `role` - Host role (e.g., `dns`, `vault`, `monitoring`) - matches the Prometheus `role` label
+- `level` - Log level mapped from journal PRIORITY (`critical`, `error`, `warning`, `notice`, `info`, `debug`) - journal scrape only

 ### Log Format

@@ -47,12 +49,12 @@ Journal logs are JSON-formatted. Key fields:
 **Logs from a specific service on a host:**
 ```logql
-{host="ns1", systemd_unit="nsd.service"}
+{hostname="ns1", systemd_unit="nsd.service"}
 ```

 **All logs from a host:**
 ```logql
-{host="monitoring01"}
+{hostname="monitoring01"}
 ```

 **Logs from a service across all hosts:**

@@ -62,12 +64,12 @@ Journal logs are JSON-formatted. Key fields:
 **Substring matching (case-sensitive):**
 ```logql
-{host="ha1"} |= "error"
+{hostname="ha1"} |= "error"
 ```

 **Exclude pattern:**
 ```logql
-{host="ns1"} != "routine"
+{hostname="ns1"} != "routine"
 ```

 **Regex matching:**

@@ -75,6 +77,20 @@ Journal logs are JSON-formatted. Key fields:
 {systemd_unit="prometheus.service"} |~ "scrape.*failed"
 ```

+**Filter by level (journal scrape only):**
+```logql
+{level="error"}                          # All errors across the fleet
+{level=~"critical|error", tier="prod"}   # Prod errors and criticals
+{hostname="ns1", level="warning"}        # Warnings from a specific host
+```
+
+**Filter by tier/role:**
+```logql
+{tier="prod"} |= "error"                 # All errors on prod hosts
+{role="dns"}                             # All DNS server logs
+{tier="test", job="systemd-journal"}     # Journal logs from test hosts
+```
+
 **File-based logs (caddy access logs, etc):**
 ```logql
 {job="varlog", hostname="nix-cache01"}

@@ -102,6 +118,36 @@ Useful systemd units for troubleshooting:
 - `sshd.service` - SSH daemon
 - `nix-gc.service` - Nix garbage collection
+
+### Bootstrap Logs
+
+VMs provisioned from template2 send bootstrap progress directly to Loki via curl (before promtail is available). These logs use `job="bootstrap"` with additional labels:
+
+- `hostname` - Target hostname
+- `branch` - Git branch being deployed
+- `stage` - Bootstrap stage (see table below)
+
+**Bootstrap stages:**
+
+| Stage | Message | Meaning |
+|-------|---------|---------|
+| `starting` | Bootstrap starting for \<host\> (branch: \<branch\>) | Bootstrap service has started |
+| `network_ok` | Network connectivity confirmed | Can reach git server |
+| `vault_ok` | Vault credentials unwrapped and stored | AppRole credentials provisioned |
+| `vault_skip` | No Vault token provided - skipping credential setup | No wrapped token was provided |
+| `vault_warn` | Failed to unwrap Vault token - continuing without secrets | Token unwrap failed (expired/used) |
+| `building` | Starting nixos-rebuild boot | NixOS build starting |
+| `success` | Build successful - rebooting into new configuration | Build complete, rebooting |
+| `failed` | nixos-rebuild failed - manual intervention required | Build failed |
+
+**Bootstrap queries:**
+
+```logql
+{job="bootstrap"}                             # All bootstrap logs
+{job="bootstrap", hostname="myhost"}          # Specific host
+{job="bootstrap", stage="failed"}             # All failures
+{job="bootstrap", stage=~"building|success"}  # Track build progress
+```

 ### Extracting JSON Fields

 Parse JSON and filter on fields:

@@ -175,31 +221,95 @@ Disk space (root filesystem):
 node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}
 ```

-### Service-Specific Metrics
+### Prometheus Jobs

-Common job names:
-- `node-exporter` - System metrics (all hosts)
-- `nixos-exporter` - NixOS version/generation metrics
-- `caddy` - Reverse proxy metrics
-- `prometheus` / `loki` / `grafana` - Monitoring stack
-- `home-assistant` - Home automation
-- `step-ca` - Internal CA
-
-### Instance Label Format
-
-The `instance` label uses FQDN format:
-```
-<hostname>.home.2rjus.net:<port>
-```
-
-Example queries filtering by host:
+All available Prometheus job names:
+
+**System exporters (on all/most hosts):**
+- `node-exporter` - System metrics (CPU, memory, disk, network)
+- `nixos-exporter` - NixOS flake revision and generation info
+- `systemd-exporter` - Systemd unit status metrics
+- `homelab-deploy` - Deployment listener metrics
+
+**Service-specific exporters:**
+- `caddy` - Reverse proxy metrics (http-proxy)
+- `nix-cache_caddy` - Nix binary cache metrics
+- `home-assistant` - Home automation metrics (ha1)
+- `jellyfin` - Media server metrics (jelly01)
+- `kanidm` - Authentication server metrics (kanidm01)
+- `nats` - NATS messaging metrics (nats1)
+- `openbao` - Secrets management metrics (vault01)
+- `unbound` - DNS resolver metrics (ns1, ns2)
+- `wireguard` - VPN tunnel metrics (http-proxy)
+
+**Monitoring stack (localhost on monitoring01):**
+- `prometheus` - Prometheus self-metrics
+- `loki` - Loki self-metrics
+- `grafana` - Grafana self-metrics
+- `alertmanager` - Alertmanager metrics
+- `pushgateway` - Push-based metrics gateway
+
+**External/infrastructure:**
+- `pve-exporter` - Proxmox hypervisor metrics
+- `smartctl` - Disk SMART health (gunter)
+- `restic_rest` - Backup server metrics
+- `ghettoptt` - PTT service metrics (gunter)
+
+### Target Labels
+
+All scrape targets have these labels:
+
+**Standard labels:**
+- `instance` - Full target address (`<hostname>.home.2rjus.net:<port>`)
+- `job` - Job name (e.g., `node-exporter`, `unbound`, `nixos-exporter`)
+- `hostname` - Short hostname (e.g., `ns1`, `monitoring01`) - use this for host filtering
+
+**Host metadata labels** (when configured in `homelab.host`):
+- `role` - Host role (e.g., `dns`, `build-host`, `vault`)
+- `tier` - Deployment tier (`test` for test VMs, absent for prod)
+- `dns_role` - DNS-specific role (`primary` or `secondary` for ns1/ns2)
+
+### Filtering by Host
+
+Use the `hostname` label for easy host filtering across all jobs:
+
 ```promql
-up{instance=~"monitoring01.*"}
-node_load1{instance=~"ns1.*"}
+{hostname="ns1"}                      # All metrics from ns1
+node_load1{hostname="monitoring01"}   # Specific metric by hostname
+up{hostname="ha1"}                    # Check if ha1 is up
 ```
+
+This is simpler than wildcarding the `instance` label:
+
+```promql
+# Old way (still works but verbose)
+up{instance=~"monitoring01.*"}
+
+# New way (preferred)
+up{hostname="monitoring01"}
+```
+
+### Filtering by Role/Tier
+
+Filter hosts by their role or tier:
+
+```promql
+up{role="dns"}                              # All DNS servers (ns1, ns2)
+node_cpu_seconds_total{role="build-host"}   # Build hosts only (nix-cache01)
+up{tier="test"}                             # All test-tier VMs
+up{dns_role="primary"}                      # Primary DNS only (ns1)
+```
+
+Current host labels:
+
+| Host | Labels |
+|------|--------|
+| ns1 | `role=dns`, `dns_role=primary` |
+| ns2 | `role=dns`, `dns_role=secondary` |
+| nix-cache01 | `role=build-host` |
+| vault01 | `role=vault` |
+| kanidm01 | `role=auth`, `tier=test` |
+| testvm01/02/03 | `tier=test` |

 ---

 ## Troubleshooting Workflows

@@ -212,11 +322,12 @@ node_load1{instance=~"ns1.*"}
 ### Investigate Service Issues

-1. Check `up{job="<service>"}` for scrape failures
+1. Check `up{job="<service>"}` or `up{hostname="<host>"}` for scrape failures
 2. Use `list_targets` to see target health details
-3. Query service logs: `{host="<host>", systemd_unit="<service>.service"}`
-4. Search for errors: `{host="<host>"} |= "error"`
+3. Query service logs: `{hostname="<host>", systemd_unit="<service>.service"}`
+4. Search for errors: `{hostname="<host>"} |= "error"`
 5. Check `list_alerts` for related alerts
+6. Use role filters for group issues: `up{role="dns"}` to check all DNS servers

 ### After Deploying Changes

@@ -225,10 +336,21 @@ node_load1{instance=~"ns1.*"}
 3. Check service logs for startup issues
 4. Check service metrics are being scraped
+
+### Monitor VM Bootstrap
+
+When provisioning new VMs, track bootstrap progress:
+
+1. Watch bootstrap logs: `{job="bootstrap", hostname="<hostname>"}`
+2. Check for failures: `{job="bootstrap", hostname="<hostname>", stage="failed"}`
+3. After success, verify host appears in metrics: `up{hostname="<hostname>"}`
+4. Check logs are flowing: `{hostname="<hostname>"}`
+
+See [docs/host-creation.md](../../../docs/host-creation.md) for the full host creation pipeline.

 ### Debug SSH/Access Issues

 ```logql
-{host="<host>", systemd_unit="sshd.service"}
+{hostname="<host>", systemd_unit="sshd.service"}
 ```

 ### Check Recent Upgrades

@@ -246,5 +368,6 @@ With `start: "24h"` to see last 24 hours of upgrades across all hosts.
 - Default scrape interval is 15s for most metrics targets
 - Default log lookback is 1h - use `start` parameter for older logs
 - Use `rate()` for counter metrics, direct queries for gauges
-- The `instance` label includes the port, use regex matching (`=~`) for hostname-only filters
+- Use the `hostname` label to filter metrics by host (simpler than regex on `instance`)
+- Host metadata labels (`role`, `tier`, `dns_role`) are propagated to all scrape targets
 - Log `MESSAGE` field contains the actual log content in JSON format

.mcp.json (10 lines changed)
@@ -31,8 +31,16 @@
         "--",
         "mcp",
         "--nats-url", "nats://nats1.home.2rjus.net:4222",
-        "--nkey-file", "/home/torjus/.config/homelab-deploy/test-deployer.nkey"
+        "--nkey-file", "/home/torjus/.config/homelab-deploy/test-deployer.nkey",
+        "--enable-builds"
       ]
+    },
+    "git-explorer": {
+      "command": "nix",
+      "args": ["run", "git+https://git.t-juice.club/torjus/labmcp#git-explorer", "--", "serve"],
+      "env": {
+        "GIT_REPO_PATH": "/home/torjus/git/nixos-servers"
+      }
     }
   }
 }

.sops.yaml (deleted, 52 lines)
@@ -1,52 +0,0 @@
-keys:
-  - &admin_torjus age1lznyk4ee7e7x8n92cq2n87kz9920473ks5u9jlhd3dczfzq4wamqept56u
-  - &server_ns1 age1hz2lz4k050ru3shrk5j3zk3f8azxmrp54pktw5a7nzjml4saudesx6jsl0
-  - &server_ns2 age1w2q4gm2lrcgdzscq8du3ssyvk6qtzm4fcszc92z9ftclq23yyydqdga5um
-  - &server_ha1 age1d2w5zece9647qwyq4vas9qyqegg96xwmg6c86440a6eg4uj6dd2qrq0w3l
-  - &server_http-proxy age1gq8434ku0xekqmvnseeunv83e779cg03c06gwrusnymdsr3rpufqx6vr3m
-  - &server_ca age1288993th0ge00reg4zqueyvmkrsvk829cs068eekjqfdprsrkeqql7mljk
-  - &server_monitoring01 age1vpns76ykll8jgdlu3h05cur4ew2t3k7u03kxdg8y6ypfhsfhq9fqyurjey
-  - &server_jelly01 age1hchvlf3apn8g8jq2743pw53sd6v6ay6xu6lqk0qufrjeccan9vzsc7hdfq
-  - &server_nix-cache01 age1w029fksjv0edrff9p7s03tgk3axecdkppqymfpwfn2nu2gsqqefqc37sxq
-  - &server_pgdb1 age1ha34qeksr4jeaecevqvv2afqem67eja2mvawlmrqsudch0e7fe7qtpsekv
-  - &server_nats1 age1cxt8kwqzx35yuldazcc49q88qvgy9ajkz30xu0h37uw3ts97jagqgmn2ga
-creation_rules:
-  - path_regex: secrets/[^/]+\.(yaml|json|env|ini)
-    key_groups:
-      - age:
-          - *admin_torjus
-          - *server_ns1
-          - *server_ns2
-          - *server_ha1
-          - *server_http-proxy
-          - *server_ca
-          - *server_monitoring01
-          - *server_jelly01
-          - *server_nix-cache01
-          - *server_pgdb1
-          - *server_nats1
-  - path_regex: secrets/ca/[^/]+\.(yaml|json|env|ini|)
-    key_groups:
-      - age:
-          - *admin_torjus
-          - *server_ca
-  - path_regex: secrets/monitoring01/[^/]+\.(yaml|json|env|ini)
-    key_groups:
-      - age:
-          - *admin_torjus
-          - *server_monitoring01
-  - path_regex: secrets/ca/keys/.+
-    key_groups:
-      - age:
-          - *admin_torjus
-          - *server_ca
-  - path_regex: secrets/nix-cache01/.+
-    key_groups:
-      - age:
-          - *admin_torjus
-          - *server_nix-cache01
-  - path_regex: secrets/http-proxy/.+
-    key_groups:
-      - age:
-          - *admin_torjus
-          - *server_http-proxy

CLAUDE.md (256 lines changed)
@@ -35,6 +35,34 @@ nix build .#create-host
 Do not automatically deploy changes. Deployments are usually done by updating the master branch, and then triggering the auto update on the specific host.
+
+### SSH Commands
+
+Do not run SSH commands directly. If a command needs to be run on a remote host, provide the command to the user and ask them to run it manually.
+
+### Sharing Command Output via Loki
+
+All hosts have the `pipe-to-loki` script for sending command output or terminal sessions to Loki, allowing users to share output with Claude without copy-pasting.
+
+**Pipe mode** - send command output:
+```bash
+command | pipe-to-loki                # Auto-generated ID
+command | pipe-to-loki --id my-test   # Custom ID
+```
+
+**Session mode** - record interactive terminal session:
+```bash
+pipe-to-loki --record                   # Start recording, exit to send
+pipe-to-loki --record --id my-session   # With custom ID
+```
+
+The script prints the session ID which the user can share. Query results with:
+```logql
+{job="pipe-to-loki"}                      # All entries
+{job="pipe-to-loki", id="my-test"}        # Specific ID
+{job="pipe-to-loki", hostname="testvm01"} # From specific host
+{job="pipe-to-loki", type="session"}      # Only sessions
+```
+
 ### Testing Feature Branches on Hosts

 All hosts have the `nixos-rebuild-test` helper script for testing feature branches before merging:

@@ -61,25 +89,53 @@ Do not run `nix flake update`. Should only be done manually by user.
 ### Development Environment

 ```bash
-# Enter development shell (provides ansible, python3)
+# Enter development shell
 nix develop
 ```
+
+The devshell provides: `ansible`, `tofu` (OpenTofu), `bao` (OpenBao CLI), `create-host`, and `homelab-deploy`.
+
+**Important:** When suggesting commands that use devshell tools, always use `nix develop -c <command>` syntax rather than assuming the user is already in a devshell. For example:
+```bash
+# Good - works regardless of current shell
+nix develop -c tofu plan
+
+# Avoid - requires user to be in devshell
+tofu plan
+```
+
+**OpenTofu:** Use the `-chdir` option instead of `cd` when running tofu commands in subdirectories:
+```bash
+# Good - uses -chdir option
+nix develop -c tofu -chdir=terraform plan
+nix develop -c tofu -chdir=terraform/vault apply

+# Avoid - changing directories
+cd terraform && tofu plan
+```
+
+### Ansible
+
+Ansible configuration and playbooks are in `/ansible/`. See [ansible/README.md](ansible/README.md) for inventory groups, available playbooks, and usage examples.
+
+The devshell sets `ANSIBLE_CONFIG` automatically, so no `-i` flag is needed.

 ### Secrets Management

 Secrets are managed by OpenBao (Vault) using AppRole authentication. Most hosts use the
 `vault.secrets` option defined in `system/vault-secrets.nix` to fetch secrets at boot.
 Terraform manages the secrets and AppRole policies in `terraform/vault/`.

-Legacy sops-nix is still present but only actively used by the `ca` host. Do not edit any
-`.sops.yaml` or any file within `secrets/`. Ask the user to modify if necessary.
-
 ### Git Workflow

 **Important:** Never commit directly to `master` unless the user explicitly asks for it. Always create a feature branch for changes.

 **Important:** Never amend commits to `master` unless the user explicitly asks for it. Amending rewrites history and causes issues for deployed configurations.
+
+**Important:** Never force push to `master`. If a commit on master has an error, fix it with a new commit rather than rewriting history.
+
+**Important:** Do not use `gh pr create` to create pull requests. The git server does not support GitHub CLI for PR creation. Instead, push the branch and let the user create the PR manually via the web interface.

 When starting a new plan or task, the first step should typically be to create and checkout a new branch with an appropriate name (e.g., `git checkout -b dns-automation` or `git checkout -b fix-nginx-config`).

 ### Plan Management

@@ -132,67 +188,16 @@ Two MCP servers are available for searching NixOS options and packages:
 This ensures documentation matches the exact nixpkgs version (currently NixOS 25.11) used by this flake.

-### Lab Monitoring Log Queries
+### Lab Monitoring

-The **lab-monitoring** MCP server can query logs from Loki. All hosts ship systemd journal logs via Promtail.
+The **lab-monitoring** MCP server provides access to Prometheus metrics and Loki logs. Use the `/observability` skill for detailed reference on:

-**Loki Label Reference:**
-
-- `host` - Hostname (e.g., `ns1`, `ns2`, `monitoring01`, `ha1`). Use this label, not `hostname`.
-- `systemd_unit` - Systemd unit name (e.g., `nsd.service`, `prometheus.service`, `nixos-upgrade.service`)
-- `job` - Either `systemd-journal` (most logs) or `varlog` (file-based logs like caddy access logs)
-- `filename` - For `varlog` job, the log file path (e.g., `/var/log/caddy/nix-cache.log`)
-
-Journal log entries are JSON-formatted with the actual log message in the `MESSAGE` field. Other useful fields include `PRIORITY` and `SYSLOG_IDENTIFIER`.
-
-**Example LogQL queries:**
-```
-# Logs from a specific service on a host
-{host="ns2", systemd_unit="nsd.service"}
-
-# Substring match on log content
-{host="ns1", systemd_unit="nsd.service"} |= "error"
-
-# File-based logs (e.g., caddy access logs)
-{job="varlog", hostname="nix-cache01"}
-```
-
-Default lookback is 1 hour. Use the `start` parameter with relative durations (e.g., `24h`, `168h`) for older logs.
-
-### Lab Monitoring Prometheus Queries
-
-The **lab-monitoring** MCP server can query Prometheus metrics via PromQL. The `instance` label uses the FQDN format `<host>.home.2rjus.net:<port>`.
-
-**Prometheus Job Names:**
-
-- `node-exporter` - System metrics from all hosts (CPU, memory, disk, network)
-- `caddy` - Reverse proxy metrics (http-proxy)
-- `nix-cache_caddy` - Nix binary cache metrics
-- `home-assistant` - Home automation metrics
-- `jellyfin` - Media server metrics
-- `loki` / `prometheus` / `grafana` - Monitoring stack self-metrics
-- `step-ca` - Internal CA metrics
-- `pve-exporter` - Proxmox hypervisor metrics
-- `smartctl` - Disk SMART health (gunter)
-- `wireguard` - VPN metrics (http-proxy)
-- `pushgateway` - Push-based metrics (e.g., backup results)
-- `restic_rest` - Backup server metrics
-- `labmon` / `ghettoptt` / `alertmanager` - Other service metrics
-
-**Example PromQL queries:**
-```
-# Check all targets are up
-up
-
-# CPU usage for a specific host
-rate(node_cpu_seconds_total{instance=~"ns1.*", mode!="idle"}[5m])
-
-# Memory usage across all hosts
-node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
-
-# Disk space
-node_filesystem_avail_bytes{mountpoint="/"}
-```
+- Available Prometheus jobs and exporters
+- Loki labels and LogQL query syntax
+- Bootstrap log monitoring for new VMs
+- Common troubleshooting workflows
+
+The skill contains up-to-date information about all scrape targets, host labels, and example queries.

 ### Deploying to Test Hosts

@@ -229,6 +234,21 @@ deploy(role="vault", action="switch")
 **Note:** Only test-tier hosts with `homelab.deploy.enable = true` and the listener service running will respond to deployments.
+
+**Deploying to Prod Hosts:**
+
+The MCP server only deploys to test-tier hosts. For prod hosts, use the CLI directly:
+
+```bash
+nix develop -c homelab-deploy -- deploy \
+  --nats-url nats://nats1.home.2rjus.net:4222 \
+  --nkey-file ~/.config/homelab-deploy/admin-deployer.nkey \
+  --branch <branch-name> \
+  --action switch \
+  deploy.prod.<hostname>
+```
+
+Subject format: `deploy.<tier>.<hostname>` (e.g., `deploy.prod.monitoring01`, `deploy.test.testvm01`)
+
 **Verifying Deployments:**

 After deploying, use the `nixos_flake_info` metric from nixos-exporter to verify the host is running the expected revision:
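
For reference, this is the same `nixos_flake_info` lookup used by the investigate-alarm agent above, e.g.:
```promql
nixos_flake_info{hostname="<hostname>"}
```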
@@ -248,10 +268,11 @@ The `current_rev` label contains the git commit hash of the deployed flake confi
|
|||||||
- `default.nix` - Entry point, imports configuration.nix and services
|
- `default.nix` - Entry point, imports configuration.nix and services
|
||||||
- `configuration.nix` - Host-specific settings (networking, hardware, users)
|
- `configuration.nix` - Host-specific settings (networking, hardware, users)
|
||||||
- `/system/` - Shared system-level configurations applied to ALL hosts
|
- `/system/` - Shared system-level configurations applied to ALL hosts
|
||||||
- Core modules: nix.nix, sshd.nix, sops.nix (legacy), vault-secrets.nix, acme.nix, autoupgrade.nix
|
- Core modules: nix.nix, sshd.nix, vault-secrets.nix, acme.nix, autoupgrade.nix
|
||||||
|
- Additional modules: motd.nix (dynamic MOTD), packages.nix (base packages), root-user.nix (root config), homelab-deploy.nix (NATS listener)
|
||||||
- Monitoring: node-exporter and promtail on every host
|
- Monitoring: node-exporter and promtail on every host
|
||||||
- `/modules/` - Custom NixOS modules
|
- `/modules/` - Custom NixOS modules
|
||||||
- `homelab/` - Homelab-specific options (DNS automation, monitoring scrape targets)
|
- `homelab/` - Homelab-specific options (see "Homelab Module Options" section below)
|
||||||
- `/lib/` - Nix library functions
|
- `/lib/` - Nix library functions
|
||||||
- `dns-zone.nix` - DNS zone generation functions
|
- `dns-zone.nix` - DNS zone generation functions
|
||||||
- `monitoring.nix` - Prometheus scrape target generation functions
|
- `monitoring.nix` - Prometheus scrape target generation functions
|
||||||
@@ -259,14 +280,17 @@ The `current_rev` label contains the git commit hash of the deployed flake confi
|
|||||||
- `home-assistant/` - Home automation stack
|
- `home-assistant/` - Home automation stack
|
||||||
- `monitoring/` - Observability stack (Prometheus, Grafana, Loki, Tempo)
|
- `monitoring/` - Observability stack (Prometheus, Grafana, Loki, Tempo)
|
||||||
- `ns/` - DNS services (authoritative, resolver, zone generation)
|
- `ns/` - DNS services (authoritative, resolver, zone generation)
|
||||||
- `http-proxy/`, `ca/`, `postgres/`, `nats/`, `jellyfin/`, etc.
|
- `vault/` - OpenBao (Vault) secrets server
|
||||||
- `/secrets/` - SOPS-encrypted secrets with age encryption (legacy, only used by ca)
|
- `actions-runner/` - GitHub Actions runner
|
||||||
|
- `http-proxy/`, `postgres/`, `nats/`, `jellyfin/`, etc.
|
||||||
- `/common/` - Shared configurations (e.g., VM guest agent)
|
- `/common/` - Shared configurations (e.g., VM guest agent)
|
||||||
- `/docs/` - Documentation and plans
|
- `/docs/` - Documentation and plans
|
||||||
- `plans/` - Future plans and proposals
|
- `plans/` - Future plans and proposals
|
||||||
- `plans/completed/` - Completed plans (moved here when done)
|
- `plans/completed/` - Completed plans (moved here when done)
|
||||||
- `/playbooks/` - Ansible playbooks for fleet management
|
- `/ansible/` - Ansible configuration and playbooks
|
||||||
- `/.sops.yaml` - SOPS configuration with age keys (legacy, only used by ca)
|
- `ansible.cfg` - Ansible configuration (inventory path, defaults)
|
||||||
|
- `inventory/` - Dynamic and static inventory sources
|
||||||
|
- `playbooks/` - Ansible playbooks for fleet management
|
||||||
|
|
||||||
### Configuration Inheritance
|
### Configuration Inheritance
|
||||||
|
|
||||||
@@ -283,37 +307,27 @@ All hosts automatically get:
|
|||||||
- Nix binary cache (nix-cache.home.2rjus.net)
|
- Nix binary cache (nix-cache.home.2rjus.net)
|
||||||
- SSH with root login enabled
|
- SSH with root login enabled
|
||||||
- OpenBao (Vault) secrets management via AppRole
|
- OpenBao (Vault) secrets management via AppRole
|
||||||
- Internal ACME CA integration (ca.home.2rjus.net)
|
- Internal ACME CA integration (OpenBao PKI at vault.home.2rjus.net)
|
||||||
- Daily auto-upgrades with auto-reboot
|
- Daily auto-upgrades with auto-reboot
|
||||||
- Prometheus node-exporter + Promtail (logs to monitoring01)
|
- Prometheus node-exporter + Promtail (logs to monitoring01)
|
||||||
- Monitoring scrape target auto-registration via `homelab.monitoring` options
|
- Monitoring scrape target auto-registration via `homelab.monitoring` options
|
||||||
- Custom root CA trust
|
- Custom root CA trust
|
||||||
- DNS zone auto-registration via `homelab.dns` options
|
- DNS zone auto-registration via `homelab.dns` options
|
||||||
|
|
||||||
### Active Hosts
|
### Hosts
|
||||||
|
|
||||||
Production servers managed by `rebuild-all.sh`:
|
Host configurations are in `/hosts/<hostname>/`. See `flake.nix` for the complete list of `nixosConfigurations`.
|
||||||
- `ns1`, `ns2` - Primary/secondary DNS servers (10.69.13.5/6)
|
|
||||||
- `ca` - Internal Certificate Authority
|
|
||||||
- `ha1` - Home Assistant + Zigbee2MQTT + Mosquitto
|
|
||||||
- `http-proxy` - Reverse proxy
|
|
||||||
- `monitoring01` - Full observability stack (Prometheus, Grafana, Loki, Tempo, Pyroscope)
|
|
||||||
- `jelly01` - Jellyfin media server
|
|
||||||
- `nix-cache01` - Binary cache server
|
|
||||||
- `pgdb1` - PostgreSQL database
|
|
||||||
- `nats1` - NATS messaging server
|
|
||||||
|
|
||||||
Template/test hosts:
|
Use `nix flake show` or `nix develop -c ansible-inventory --graph` to list all hosts.
|
||||||
- `template1` - Base template for cloning new hosts
|
|
||||||
|
|
||||||
### Flake Inputs
|
### Flake Inputs
|
||||||
|
|
||||||
- `nixpkgs` - NixOS 25.11 stable (primary)
|
- `nixpkgs` - NixOS 25.11 stable (primary)
|
||||||
- `nixpkgs-unstable` - Unstable channel (available via overlay as `pkgs.unstable.<package>`)
|
- `nixpkgs-unstable` - Unstable channel (available via overlay as `pkgs.unstable.<package>`)
|
||||||
- `sops-nix` - Secrets management (legacy, only used by ca)
|
- `nixos-exporter` - NixOS module for exposing flake revision metrics (used to verify deployments)
|
||||||
|
- `homelab-deploy` - NATS-based remote deployment tool for test-tier hosts
|
||||||
- Custom packages from git.t-juice.club:
|
- Custom packages from git.t-juice.club:
|
||||||
- `alerttonotify` - Alert routing
|
- `alerttonotify` - Alert routing
|
||||||
- `labmon` - Lab monitoring
|
|
||||||
|
|
||||||
### Network Architecture
|
### Network Architecture
|
||||||
|
|
||||||
@@ -335,12 +349,7 @@ Most hosts use OpenBao (Vault) for secrets:
|
|||||||
- `extractKey` option extracts a single key from vault JSON as a plain file
|
- `extractKey` option extracts a single key from vault JSON as a plain file
|
||||||
- Secrets fetched at boot by `vault-secret-<name>.service` systemd units
|
- Secrets fetched at boot by `vault-secret-<name>.service` systemd units
|
||||||
- Fallback to cached secrets in `/var/lib/vault/cache/` when Vault is unreachable
|
- Fallback to cached secrets in `/var/lib/vault/cache/` when Vault is unreachable
|
||||||
- Provision AppRole credentials: `nix develop -c ansible-playbook playbooks/provision-approle.yml -e hostname=<host>`
|
- Provision AppRole credentials: `nix develop -c ansible-playbook ansible/playbooks/provision-approle.yml -l <hostname>`
|
||||||
|
|
||||||
Legacy SOPS (only used by `ca` host):
|
|
||||||
- SOPS with age encryption, keys in `.sops.yaml`
|
|
||||||
- Shared secrets: `/secrets/secrets.yaml`
|
|
||||||
- Per-host secrets: `/secrets/<hostname>/`
|
|
||||||
|
|
||||||
### Auto-Upgrade System
|
### Auto-Upgrade System
|
||||||
|
|
||||||
@@ -364,7 +373,7 @@ Template VMs are built from `hosts/template2` and deployed to Proxmox using Ansi
|
|||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Build NixOS image and deploy to Proxmox as template
|
# Build NixOS image and deploy to Proxmox as template
|
||||||
nix develop -c ansible-playbook -i playbooks/inventory.ini playbooks/build-and-deploy-template.yml
|
nix develop -c ansible-playbook ansible/playbooks/build-and-deploy-template.yml
|
||||||
```
|
```
|
||||||
|
|
||||||
This playbook:
|
This playbook:
|
||||||
@@ -402,9 +411,21 @@ Example VM deployment includes:

- Custom CPU/memory/disk sizing
- VLAN tagging
- QEMU guest agent
- Automatic Vault credential provisioning via `vault_wrapped_token`

OpenTofu outputs the VM's IP address after deployment for easy SSH access.

**Automatic Vault Credential Provisioning:**

VMs can receive Vault (OpenBao) credentials automatically during bootstrap:

1. OpenTofu generates a wrapped token via `terraform/vault/` and stores it in the VM configuration
2. Cloud-init passes `VAULT_WRAPPED_TOKEN` and `NIXOS_FLAKE_BRANCH` to the bootstrap script
3. The bootstrap script unwraps the token to obtain AppRole credentials
4. Credentials are written to `/var/lib/vault/approle/` before the NixOS rebuild

This eliminates the need for manual `provision-approle.yml` playbook runs on new VMs. Bootstrap progress is logged to Loki with `job="bootstrap"` labels.

#### Template Rebuilding and Terraform State

When the Proxmox template is rebuilt (via `build-and-deploy-template.yml`), the template name may change. This would normally cause Terraform to want to recreate all existing VMs, but that's unnecessary since VMs are independent once cloned.

@@ -427,7 +448,7 @@ This means:

- `tofu plan` won't show spurious changes for Proxmox-managed defaults

**When rebuilding the template:**

1. Run `nix develop -c ansible-playbook ansible/playbooks/build-and-deploy-template.yml`
2. Update `default_template_name` in `terraform/variables.tf` if the name changed
3. Run `tofu plan` - it should show no VM recreations (only the template name changes in state)
4. Run `tofu apply` - updates state without touching existing VMs
@@ -435,20 +456,11 @@ This means:

### Adding a New Host

See [docs/host-creation.md](docs/host-creation.md) for the complete host creation pipeline, including:

- Using the `create-host` script to generate host configurations
- Deploying VMs and secrets with OpenTofu
- Monitoring the bootstrap process via Loki
- Verification and troubleshooting steps

**Note:** DNS A records and Prometheus node-exporter scrape targets are auto-generated from the host's `systemd.network.networks` static IP configuration (a minimal sketch follows below). No manual zone file or Prometheus config editing is required.
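
For orientation, this is roughly what such a static-IP block looks like; the interface name and all addresses below are hypothetical placeholders, not values from this repository:

```nix
# Hypothetical static-IP configuration - adjust interface, addresses, and DNS.
# The DNS zone and Prometheus target generators read the address from here.
systemd.network.networks."10-lan" = {
  matchConfig.Name = "ens18";            # assumed interface name
  address = [ "10.69.13.50/24" ];        # placeholder static IP
  gateway = [ "10.69.13.1" ];            # placeholder gateway
  dns = [ "10.69.13.5" "10.69.13.6" ];   # placeholder resolvers
};
```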

@@ -484,11 +496,7 @@ Prometheus scrape targets are automatically generated from host configurations,

- **External targets**: Non-flake hosts defined in `/services/monitoring/external-targets.nix`
- **Library**: `lib/monitoring.nix` provides `generateNodeExporterTargets` and `generateScrapeConfigs`

Service modules declare their scrape targets directly via `homelab.monitoring.scrapeTargets`. The Prometheus config on monitoring01 auto-generates scrape configs from all hosts. See the "Homelab Module Options" section for available options.
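
A sketch of what such a declaration might look like in a service module - the field names mirror the documented option list (job_name, port, metrics_path, scheme, scrape_interval, honor_labels), but the exact attrset shape is an assumption:

```nix
# Illustrative scrape-target declaration; the attrset shape is assumed.
homelab.monitoring.scrapeTargets = [
  {
    job_name = "my-service";     # hypothetical job name
    port = 9000;
    metrics_path = "/metrics";
    scheme = "https";
    scrape_interval = "30s";
  }
];
```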

To add monitoring targets for non-NixOS hosts, edit `/services/monitoring/external-targets.nix`.

@@ -507,13 +515,31 @@ DNS zone entries are automatically generated from host configurations:

- **External hosts**: Non-flake hosts defined in `/services/ns/external-hosts.nix`
- **Serial number**: Uses `self.sourceInfo.lastModified` (git commit timestamp)

Hosts are automatically excluded from DNS if:

- `homelab.dns.enable = false` (e.g., template hosts)
- No static IP configured (e.g., DHCP-only hosts)
- Network interface is a VPN/tunnel (wg*, tun*, tap*)

To add DNS entries for non-NixOS hosts, edit `/services/ns/external-hosts.nix`.
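
For example, a host serving several web UIs might declare aliases like this (the alias names are hypothetical):

```nix
# CNAME aliases resolving to this host's auto-generated A record.
homelab.dns.cnames = [ "grafana" "alerts" ];
```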

### Homelab Module Options

The `modules/homelab/` directory defines custom options used across hosts for automation and metadata.

**Host options (`homelab.host.*`):**

- `tier` - Deployment tier: `test` or `prod`. Test-tier hosts can receive remote deployments and have different credential access.
- `priority` - Alerting priority: `high` or `low`. Controls alerting thresholds for the host.
- `role` - Primary role designation (e.g., `dns`, `database`, `bastion`, `vault`)
- `labels` - Free-form key-value metadata for host categorization
- `labels.ansible = "false"` - Exclude the host from the Ansible dynamic inventory

**DNS options (`homelab.dns.*`):**

- `enable` (default: `true`) - Include host in DNS zone generation
- `cnames` (default: `[]`) - List of CNAME aliases pointing to this host

**Monitoring options (`homelab.monitoring.*`):**

- `enable` (default: `true`) - Include host in Prometheus node-exporter scrape targets
- `scrapeTargets` (default: `[]`) - Additional scrape targets exposed by this host

**Deploy options (`homelab.deploy.*`):**

- `enable` (default: `false`) - Enable the NATS-based remote deployment listener. When enabled, the host listens for deployment commands via NATS and can be targeted by the `homelab-deploy` MCP server.
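
Taken together, a typical host configuration using these options might look like the following sketch (all values are illustrative):

```nix
# Illustrative host metadata; pick values that match the host's purpose.
homelab.host = {
  tier = "test";                 # or "prod"
  priority = "low";              # or "high"
  role = "database";             # e.g. dns, database, bastion, vault
  labels.ansible = "false";      # optional: hide from the Ansible inventory
};

homelab.monitoring.enable = true;
homelab.deploy.enable = false;   # test-tier hosts may opt in to NATS deploys
```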

@@ -12,8 +12,7 @@ NixOS Flake-based configuration repository for a homelab infrastructure. All hos

| `http-proxy` | Reverse proxy |
| `monitoring01` | Prometheus, Grafana, Loki, Tempo, Pyroscope |
| `jelly01` | Jellyfin media server |
| `nix-cache02` | Nix binary cache + NATS-based build service |
| `nats1` | NATS messaging |
| `vault01` | OpenBao (Vault) secrets management |
| `template1`, `template2` | VM templates for cloning new hosts |
ansible/README.md (new file, 120 lines)

# Ansible Configuration

This directory contains Ansible configuration for fleet management tasks.

## Structure

```
ansible/
├── ansible.cfg              # Ansible configuration
├── inventory/
│   ├── dynamic_flake.py     # Dynamic inventory from NixOS flake
│   ├── static.yml           # Non-flake hosts (Proxmox, etc.)
│   └── group_vars/
│       └── all.yml          # Common variables
└── playbooks/
    ├── build-and-deploy-template.yml
    ├── provision-approle.yml
    ├── reboot.yml
    ├── restart-service.yml
    └── run-upgrade.yml
```

## Usage

The devshell automatically configures `ANSIBLE_CONFIG`, so commands work without extra flags:

```bash
# List inventory groups
nix develop -c ansible-inventory --graph

# List hosts in a specific group
nix develop -c ansible-inventory --list | jq '.role_dns'

# Run a playbook
nix develop -c ansible-playbook ansible/playbooks/run-upgrade.yml -l tier_test
```

## Inventory

The inventory combines dynamic and static sources automatically.

### Dynamic Inventory (from flake)

The `dynamic_flake.py` script extracts hosts from the NixOS flake using `homelab.host.*` options.

**Groups generated:**

- `flake_hosts` - All NixOS hosts from the flake
- `tier_test`, `tier_prod` - By `homelab.host.tier`
- `role_dns`, `role_vault`, `role_monitoring`, etc. - By `homelab.host.role`

**Host variables set:**

- `tier` - Deployment tier (test/prod)
- `role` - Host role
- `short_hostname` - Hostname without domain

### Static Inventory

Non-flake hosts are defined in `inventory/static.yml`:

- `proxmox` - Proxmox hypervisors

## Playbooks

| Playbook | Description | Example |
|----------|-------------|---------|
| `run-upgrade.yml` | Trigger nixos-upgrade on hosts | `-l tier_prod` |
| `restart-service.yml` | Restart a systemd service | `-l role_dns -e service=unbound` |
| `reboot.yml` | Rolling reboot (one host at a time) | `-l tier_test` |
| `provision-approle.yml` | Deploy Vault credentials (single host only) | `-l testvm01` |
| `build-and-deploy-template.yml` | Build and deploy Proxmox template | (no limit needed) |

### Examples

```bash
# Restart unbound on all DNS servers
nix develop -c ansible-playbook ansible/playbooks/restart-service.yml \
  -l role_dns -e service=unbound

# Trigger upgrade on all test hosts
nix develop -c ansible-playbook ansible/playbooks/run-upgrade.yml -l tier_test

# Provision Vault credentials for a specific host
nix develop -c ansible-playbook ansible/playbooks/provision-approle.yml -l testvm01

# Build and deploy Proxmox template
nix develop -c ansible-playbook ansible/playbooks/build-and-deploy-template.yml

# Rolling reboot of test hosts (one at a time, waits for each to come back)
nix develop -c ansible-playbook ansible/playbooks/reboot.yml -l tier_test
```

## Excluding Flake Hosts

To exclude a flake host from the dynamic inventory, add the `ansible = "false"` label in the host's configuration:

```nix
homelab.host.labels.ansible = "false";
```

Hosts with `homelab.dns.enable = false` are also excluded automatically.

## Adding Non-Flake Hosts

Edit `inventory/static.yml` to add hosts not managed by the NixOS flake:

```yaml
all:
  children:
    my_group:
      hosts:
        host1.example.com:
          ansible_user: admin
```

## Common Variables

Variables in `inventory/group_vars/all.yml` apply to all hosts:

- `ansible_user` - Default SSH user (root)
- `domain` - Domain name (home.2rjus.net)
- `vault_addr` - Vault server URL
ansible/ansible.cfg (new file, 17 lines)

[defaults]
inventory = inventory/
remote_user = root
host_key_checking = False

# Reduce SSH connection overhead
forks = 10
pipelining = True

# Output formatting (YAML output via builtin default callback)
stdout_callback = default
callbacks_enabled = profile_tasks
result_format = yaml

[ssh_connection]
# Reuse SSH connections
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
ansible/inventory/dynamic_flake.py (new executable file, 162 lines)

#!/usr/bin/env python3
"""
Dynamic Ansible inventory script that extracts host information from the NixOS flake.

Generates groups:
- flake_hosts: All hosts defined in the flake
- tier_test, tier_prod: Hosts by deployment tier
- role_<name>: Hosts by role (dns, vault, monitoring, etc.)

Usage:
    ./dynamic_flake.py --list    # Return full inventory
    ./dynamic_flake.py --host X  # Return host vars (not used, but required by Ansible)
"""

import json
import subprocess
import sys
from pathlib import Path


def get_flake_dir() -> Path:
    """Find the flake root directory."""
    script_dir = Path(__file__).resolve().parent
    # ansible/inventory/dynamic_flake.py -> repo root
    return script_dir.parent.parent


def evaluate_flake() -> dict:
    """Evaluate the flake and extract host metadata."""
    flake_dir = get_flake_dir()

    # Nix expression to extract relevant config from each host
    nix_expr = """
    configs: builtins.mapAttrs (name: cfg: {
      hostname = cfg.config.networking.hostName;
      domain = cfg.config.networking.domain or "home.2rjus.net";
      tier = cfg.config.homelab.host.tier;
      role = cfg.config.homelab.host.role;
      labels = cfg.config.homelab.host.labels;
      dns_enabled = cfg.config.homelab.dns.enable;
    }) configs
    """

    try:
        result = subprocess.run(
            [
                "nix",
                "eval",
                "--json",
                f"{flake_dir}#nixosConfigurations",
                "--apply",
                nix_expr,
            ],
            capture_output=True,
            text=True,
            check=True,
            cwd=flake_dir,
        )
        return json.loads(result.stdout)
    except subprocess.CalledProcessError as e:
        print(f"Error evaluating flake: {e.stderr}", file=sys.stderr)
        sys.exit(1)
    except json.JSONDecodeError as e:
        print(f"Error parsing nix output: {e}", file=sys.stderr)
        sys.exit(1)


def sanitize_group_name(name: str) -> str:
    """Sanitize a string for use as an Ansible group name.

    Ansible group names should contain only alphanumeric characters and underscores.
    """
    return name.replace("-", "_")


def build_inventory(hosts_data: dict) -> dict:
    """Build Ansible inventory structure from host data."""
    inventory = {
        "_meta": {"hostvars": {}},
        "flake_hosts": {"hosts": []},
    }

    # Track groups we need to create
    tier_groups: dict[str, list[str]] = {}
    role_groups: dict[str, list[str]] = {}

    for _config_name, host_info in hosts_data.items():
        hostname = host_info["hostname"]
        domain = host_info["domain"]
        tier = host_info["tier"]
        role = host_info["role"]
        labels = host_info["labels"]
        dns_enabled = host_info["dns_enabled"]

        # Skip hosts that have DNS disabled (like templates)
        if not dns_enabled:
            continue

        # Skip hosts with ansible = "false" label
        if labels.get("ansible") == "false":
            continue

        fqdn = f"{hostname}.{domain}"

        # Use short hostname as inventory name, FQDN for connection
        inventory_name = hostname

        # Add to flake_hosts group
        inventory["flake_hosts"]["hosts"].append(inventory_name)

        # Add host variables
        inventory["_meta"]["hostvars"][inventory_name] = {
            "ansible_host": fqdn,  # Connect using FQDN
            "fqdn": fqdn,
            "short_hostname": hostname,  # documented in ansible/README.md; consumed by provision-approle.yml
            "tier": tier,
            "role": role,
        }

        # Group by tier
        tier_group = f"tier_{sanitize_group_name(tier)}"
        if tier_group not in tier_groups:
            tier_groups[tier_group] = []
        tier_groups[tier_group].append(inventory_name)

        # Group by role (if set)
        if role:
            role_group = f"role_{sanitize_group_name(role)}"
            if role_group not in role_groups:
                role_groups[role_group] = []
            role_groups[role_group].append(inventory_name)

    # Add tier groups to inventory
    for group_name, hosts in tier_groups.items():
        inventory[group_name] = {"hosts": hosts}

    # Add role groups to inventory
    for group_name, hosts in role_groups.items():
        inventory[group_name] = {"hosts": hosts}

    return inventory


def main():
    if len(sys.argv) < 2:
        print("Usage: dynamic_flake.py --list | --host <hostname>", file=sys.stderr)
        sys.exit(1)

    if sys.argv[1] == "--list":
        hosts_data = evaluate_flake()
        inventory = build_inventory(hosts_data)
        print(json.dumps(inventory, indent=2))
    elif sys.argv[1] == "--host":
        # Ansible calls this to get vars for a specific host.
        # We provide all vars in _meta.hostvars, so just return empty.
        print(json.dumps({}))
    else:
        print(f"Unknown option: {sys.argv[1]}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
ansible/inventory/group_vars/all.yml (new file, 5 lines)

# Common variables for all hosts

ansible_user: root
domain: home.2rjus.net
vault_addr: https://vault01.home.2rjus.net:8200
ansible/inventory/static.yml (new file, 13 lines)

# Static inventory for non-flake hosts
#
# Hosts defined here are merged with the dynamic flake inventory.
# Use this for infrastructure that isn't managed by NixOS.
#
# Use short hostnames as inventory names with ansible_host for FQDN.

all:
  children:
    proxmox:
      hosts:
        pve1:
          ansible_host: pve1.home.2rjus.net
ansible/playbooks/build-and-deploy-template.yml:

@@ -15,13 +15,13 @@

    - name: Build NixOS image
      ansible.builtin.command:
        cmd: "nixos-rebuild build-image --image-variant proxmox --flake .#template2"
        chdir: "{{ playbook_dir }}/../.."
      register: build_result
      changed_when: true

    - name: Find built image file
      ansible.builtin.find:
        paths: "{{ playbook_dir }}/../../result"
        patterns: "*.vma.zst"
        recurse: true
      register: image_files
@@ -99,3 +99,48 @@

    - name: Display success message
      ansible.builtin.debug:
        msg: "Template VM {{ template_vmid }} created successfully on {{ storage }}"

- name: Update Terraform template name
  hosts: localhost
  gather_facts: false

  vars:
    terraform_dir: "{{ playbook_dir }}/../../terraform"

  tasks:
    - name: Get image filename from earlier play
      ansible.builtin.set_fact:
        image_filename: "{{ hostvars['localhost']['image_filename'] }}"

    - name: Extract template name from image filename
      ansible.builtin.set_fact:
        new_template_name: "{{ image_filename | regex_replace('\\.vma\\.zst$', '') | regex_replace('^vzdump-qemu-', '') }}"

    - name: Read current Terraform variables file
      ansible.builtin.slurp:
        src: "{{ terraform_dir }}/variables.tf"
      register: variables_tf_content

    - name: Extract current template name from variables.tf
      ansible.builtin.set_fact:
        current_template_name: "{{ (variables_tf_content.content | b64decode) | regex_search('variable \"default_template_name\"[^}]+default\\s*=\\s*\"([^\"]+)\"', '\\1') | first }}"

    - name: Check if template name has changed
      ansible.builtin.set_fact:
        template_name_changed: "{{ current_template_name != new_template_name }}"

    - name: Display template name status
      ansible.builtin.debug:
        msg: "Template name: {{ current_template_name }} -> {{ new_template_name }} ({{ 'changed' if template_name_changed else 'unchanged' }})"

    - name: Update default_template_name in variables.tf
      ansible.builtin.replace:
        path: "{{ terraform_dir }}/variables.tf"
        regexp: '(variable "default_template_name"[^}]+default\s*=\s*)"[^"]+"'
        replace: '\1"{{ new_template_name }}"'
      when: template_name_changed

    - name: Display update result
      ansible.builtin.debug:
        msg: "Updated terraform/variables.tf with new template name: {{ new_template_name }}"
      when: template_name_changed
ansible/playbooks/provision-approle.yml:

@@ -1,7 +1,27 @@

---
# Provision OpenBao AppRole credentials to a host
#
# Usage: ansible-playbook ansible/playbooks/provision-approle.yml -l <hostname>
# Requires: BAO_ADDR and BAO_TOKEN environment variables set
#
# IMPORTANT: This playbook must target exactly one host to prevent
# accidentally regenerating credentials for multiple hosts.

- name: Validate single host target
  hosts: all
  gather_facts: false

  tasks:
    - name: Fail if targeting multiple hosts
      ansible.builtin.fail:
        msg: |
          This playbook must target exactly one host.
          Use: ansible-playbook provision-approle.yml -l <hostname>

          Targeting multiple hosts would regenerate credentials for all of them,
          potentially breaking existing services.
      when: ansible_play_hosts | length != 1
      run_once: true

- name: Fetch AppRole credentials from OpenBao
  hosts: localhost

@@ -9,18 +29,17 @@

  gather_facts: false

  vars:
    target_host: "{{ groups['all'] | first }}"
    target_hostname: "{{ hostvars[target_host]['short_hostname'] | default(target_host.split('.')[0]) }}"

  tasks:
    - name: Display target host
      ansible.builtin.debug:
        msg: "Provisioning AppRole credentials for: {{ target_hostname }}"

    - name: Get role-id for host
      ansible.builtin.command:
        cmd: "bao read -field=role_id auth/approle/role/{{ target_hostname }}/role-id"
      environment:
        BAO_ADDR: "{{ vault_addr }}"
        BAO_SKIP_VERIFY: "1"

@@ -29,25 +48,26 @@

    - name: Generate secret-id for host
      ansible.builtin.command:
        cmd: "bao write -field=secret_id -f auth/approle/role/{{ target_hostname }}/secret-id"
      environment:
        BAO_ADDR: "{{ vault_addr }}"
        BAO_SKIP_VERIFY: "1"
      register: secret_id_result
      changed_when: true

    - name: Store credentials for next play
      ansible.builtin.set_fact:
        vault_role_id: "{{ role_id_result.stdout }}"
        vault_secret_id: "{{ secret_id_result.stdout }}"

- name: Deploy AppRole credentials to host
  hosts: all
  gather_facts: false

  vars:
    vault_role_id: "{{ hostvars['localhost']['vault_role_id'] }}"
    vault_secret_id: "{{ hostvars['localhost']['vault_secret_id'] }}"

  tasks:
    - name: Create AppRole directory
      ansible.builtin.file:
ansible/playbooks/reboot.yml (new file, 48 lines)

---
# Reboot hosts with a rolling strategy to avoid taking down redundant services
#
# Usage examples:
#   # Reboot a single host
#   ansible-playbook reboot.yml -l testvm01
#
#   # Reboot all test hosts (one at a time)
#   ansible-playbook reboot.yml -l tier_test
#
#   # Reboot all DNS servers safely (one at a time)
#   ansible-playbook reboot.yml -l role_dns
#
# Safety features:
# - serial: 1 ensures only one host reboots at a time
# - Waits for the host to come back online before proceeding
# - Shuffles host order to avoid rebooting same-role hosts consecutively

- name: Reboot hosts (rolling)
  hosts: all
  serial: 1
  order: shuffle  # Randomize to spread out same-role hosts
  gather_facts: false

  vars:
    reboot_timeout: 300  # 5 minutes to wait for the host to come back

  tasks:
    - name: Display reboot target
      ansible.builtin.debug:
        msg: "Rebooting {{ inventory_hostname }} (role: {{ role | default('none') }})"

    - name: Reboot the host
      ansible.builtin.systemd:
        name: reboot.target
        state: started
      async: 1
      poll: 0
      ignore_errors: true

    - name: Wait for host to come back online
      ansible.builtin.wait_for_connection:
        delay: 5
        timeout: "{{ reboot_timeout }}"

    - name: Display reboot result
      ansible.builtin.debug:
        msg: "{{ inventory_hostname }} rebooted successfully"
ansible/playbooks/restart-service.yml (new file, 40 lines)

---
# Restart a systemd service on target hosts
#
# Usage examples:
#   # Restart unbound on all DNS servers
#   ansible-playbook restart-service.yml -l role_dns -e service=unbound
#
#   # Restart nginx on a specific host
#   ansible-playbook restart-service.yml -l http-proxy.home.2rjus.net -e service=nginx
#
#   # Restart promtail on all prod hosts
#   ansible-playbook restart-service.yml -l tier_prod -e service=promtail

- name: Restart systemd service
  hosts: all
  gather_facts: false

  tasks:
    - name: Validate service name provided
      ansible.builtin.fail:
        msg: |
          The 'service' variable is required.
          Usage: ansible-playbook restart-service.yml -l <target> -e service=<name>

          Examples:
            -e service=nginx
            -e service=unbound
            -e service=promtail
      when: service is not defined
      run_once: true

    - name: Restart {{ service }}
      ansible.builtin.systemd:
        name: "{{ service }}"
        state: restarted
      register: restart_result

    - name: Display result
      ansible.builtin.debug:
        msg: "Service {{ service }} restarted on {{ inventory_hostname }}"
common/ssh-audit.nix (new file, 21 lines)

# SSH session command auditing
#
# Logs all commands executed by users who logged in interactively (SSH).
# System services and nix builds are excluded via an auid filter.
#
# Logs are sent to journald and forwarded to Loki via promtail.
# Query with: {host="<hostname>"} |= "EXECVE"
{
  # Enable the Linux audit subsystem
  security.audit.enable = true;
  security.auditd.enable = true;

  # Log execve syscalls only from interactive login sessions.
  # auid!=4294967295 means "audit login uid is set" (excludes system services, nix builds).
  security.audit.rules = [
    "-a exit,always -F arch=b64 -S execve -F auid!=4294967295"
  ];

  # Forward audit logs to journald (so promtail ships them to Loki)
  services.journald.audit = true;
}
docs/host-creation.md (new file, 217 lines)

# Host Creation Pipeline

This document describes the process for creating new hosts in the homelab infrastructure.

## Overview

We use the `create-host` script to create new hosts, which generates default configurations from a template. We then use OpenTofu to deploy both secrets and VMs. The VMs boot using a template image (built from `hosts/template2`), which starts a bootstrap process. This bootstrap process applies the host's NixOS configuration and then reboots into the new config.

## Prerequisites

All tools are available in the devshell: `create-host`, `bao` (OpenBao CLI), `tofu`.

```bash
nix develop
```

## Steps

Steps marked with **USER** must be performed by the user due to credential requirements.

1. **USER**: Run `create-host --hostname <name> --ip <ip/prefix>`
2. Edit the auto-generated configurations in `hosts/<hostname>/` to import whatever modules are needed for its purpose (a sketch follows after this list)
3. Add any secrets needed to `terraform/vault/`
4. Edit the VM specs in `terraform/vms.tf` if needed. To deploy from a branch other than master, add `flake_branch = "<branch>"` to the VM definition
5. Push the configuration to master (or the branch specified by `flake_branch`)
6. **USER**: Apply terraform:

   ```bash
   nix develop -c tofu -chdir=terraform/vault apply
   nix develop -c tofu -chdir=terraform apply
   ```

7. Once terraform completes, a VM boots in Proxmox using the template image
8. The VM runs the `nixos-bootstrap` service, which applies the host config and reboots
9. After the reboot, the host should be operational
10. Trigger auto-upgrade on `ns1` and `ns2` to propagate DNS records for the new host
11. Trigger auto-upgrade on `monitoring01` to add the host to Prometheus scrape targets
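
For step 2, the generated files are typically extended along these lines; the import path below is a hypothetical example, while `homelab.host.tier` and `vault.enable` are the options this repository documents:

```nix
# hosts/<hostname>/configuration.nix - illustrative sketch.
{
  imports = [
    ../../services/my-service    # hypothetical service module for this host's purpose
  ];

  homelab.host.tier = "test";    # see "Tier Specification" below
  vault.enable = true;           # fetch AppRole-gated secrets at boot
}
```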

## Tier Specification

New hosts should set `homelab.host.tier` in their configuration:

```nix
homelab.host.tier = "test"; # or "prod"
```

- **test** - Test-tier hosts can receive remote deployments via the `homelab-deploy` MCP server and have different credential access. Use for staging/testing.
- **prod** - Production hosts. Deployments require direct access or the CLI with appropriate credentials.

## Observability

During the bootstrap process, status updates are sent to Loki. Query bootstrap logs with:

```
{job="bootstrap", hostname="<hostname>"}
```

### Bootstrap Stages

The bootstrap process reports these stages via the `stage` label:

| Stage | Message | Meaning |
|-------|---------|---------|
| `starting` | Bootstrap starting for \<host\> (branch: \<branch\>) | Bootstrap service has started |
| `network_ok` | Network connectivity confirmed | Can reach git server |
| `vault_ok` | Vault credentials unwrapped and stored | AppRole credentials provisioned |
| `vault_skip` | No Vault token provided - skipping credential setup | No wrapped token was provided |
| `vault_warn` | Failed to unwrap Vault token - continuing without secrets | Token unwrap failed (expired/used) |
| `building` | Starting nixos-rebuild boot | NixOS build starting |
| `success` | Build successful - rebooting into new configuration | Build complete, rebooting |
| `failed` | nixos-rebuild failed - manual intervention required | Build failed |

### Useful Queries

```
# All bootstrap activity for a host
{job="bootstrap", hostname="myhost"}

# Track all failures
{job="bootstrap", stage="failed"}

# Monitor builds in progress
{job="bootstrap", stage=~"building|success"}
```

Once the VM reboots with its full configuration, it will start publishing metrics to Prometheus and shipping logs to Loki via Promtail.

## Verification

1. Check that bootstrap completed successfully:

   ```
   {job="bootstrap", hostname="<hostname>", stage="success"}
   ```

2. Verify the host is up and reporting metrics:

   ```promql
   up{instance=~"<hostname>.*"}
   ```

3. Verify the correct flake revision is deployed:

   ```promql
   nixos_flake_info{instance=~"<hostname>.*"}
   ```

4. Check that logs are flowing:

   ```
   {hostname="<hostname>"}
   ```

5. Confirm expected services are running and producing logs

## Troubleshooting

### Bootstrap Failed

#### Common Issues

* The VM has trouble running the initial nixos-rebuild. This is usually caused by having to compile packages from scratch when they are not available in the local nix-cache.

#### Troubleshooting

1. Check bootstrap logs in Loki - if they never progress past `building`, the rebuild likely consumed all resources:

   ```
   {job="bootstrap", hostname="<hostname>"}
   ```

2. **USER**: SSH into the host and check the bootstrap service:

   ```bash
   ssh root@<hostname>
   journalctl -u nixos-bootstrap.service
   ```

3. If the build failed due to resource constraints, increase the VM specs in `terraform/vms.tf` and redeploy, or manually run the rebuild:

   ```bash
   nixos-rebuild boot --flake git+https://git.t-juice.club/torjus/nixos-servers.git#<hostname>
   ```

4. If the host config doesn't exist in the flake, ensure step 5 was completed (config pushed to the correct branch).

### Vault Credentials Not Working

Usually caused by running the `create-host` script without proper credentials, or by the wrapped token having expired or already been used.

#### Troubleshooting

1. Check whether credentials exist on the host:

   ```bash
   ssh root@<hostname>
   ls -la /var/lib/vault/approle/
   ```

2. Check bootstrap logs for vault-related stages:

   ```
   {job="bootstrap", hostname="<hostname>", stage=~"vault.*"}
   ```

3. **USER**: Regenerate and provision credentials manually:

   ```bash
   nix develop -c ansible-playbook ansible/playbooks/provision-approle.yml -l <hostname>
   ```

### Host Not Appearing in DNS

Usually caused by the commit with the new host not yet being deployed to ns1/ns2.

#### Troubleshooting

1. Verify the host config has a static IP configured in `systemd.network.networks`

2. Check that `homelab.dns.enable` is not set to `false`

3. **USER**: Trigger auto-upgrade on the DNS servers:

   ```bash
   ssh root@ns1 systemctl start nixos-upgrade.service
   ssh root@ns2 systemctl start nixos-upgrade.service
   ```

4. Verify DNS resolution after the upgrade completes:

   ```bash
   dig @ns1.home.2rjus.net <hostname>.home.2rjus.net
   ```

### Host Not Being Scraped by Prometheus

Usually caused by the commit with the new host not yet being deployed to the monitoring host.

#### Troubleshooting

1. Check that `homelab.monitoring.enable` is not set to `false`

2. **USER**: Trigger auto-upgrade on monitoring01:

   ```bash
   ssh root@monitoring01 systemctl start nixos-upgrade.service
   ```

3. Verify the target appears in Prometheus:

   ```promql
   up{instance=~"<hostname>.*"}
   ```

4. If the target is down, check that node-exporter is running on the host:

   ```bash
   ssh root@<hostname> systemctl status prometheus-node-exporter.service
   ```

## Related Files

| Path | Description |
|------|-------------|
| `scripts/create-host/` | The `create-host` script that generates host configurations |
| `hosts/template2/` | Template VM configuration (base image for new VMs) |
| `hosts/template2/bootstrap.nix` | Bootstrap service that applies NixOS config on first boot |
| `terraform/vms.tf` | VM definitions (specs, IPs, branch overrides) |
| `terraform/cloud-init.tf` | Cloud-init configuration (passes hostname, branch, vault token) |
| `terraform/vault/approle.tf` | AppRole policies for each host |
| `terraform/vault/secrets.tf` | Secret definitions in Vault |
| `terraform/vault/hosts-generated.tf` | Auto-generated wrapped tokens for VM bootstrap |
| `ansible/playbooks/provision-approle.yml` | Ansible playbook for manual credential provisioning |
| `flake.nix` | Flake with all host configurations (add new hosts here) |
@@ -1,192 +0,0 @@ (deleted file)

# Authentication System Replacement Plan

## Overview

Replace the current auth01 setup (LLDAP + Authelia) with a modern, unified authentication solution. The current setup is not in active use, making this a good time to evaluate alternatives.

## Goals

1. **Central user database** - Manage users across all homelab hosts from a single source
2. **Linux PAM/NSS integration** - Users can SSH into hosts using central credentials
3. **UID/GID consistency** - Proper POSIX attributes for NAS share permissions
4. **OIDC provider** - Single sign-on for homelab web services (Grafana, etc.)

## Options Evaluated

### OpenLDAP (raw)

- **NixOS Support:** Good (`services.openldap` with `declarativeContents`)
- **Pros:** Most widely supported, very flexible
- **Cons:** LDIF format is painful, schema management is complex, no built-in OIDC, requires SSSD on each client
- **Verdict:** Doesn't address LDAP complexity concerns

### LLDAP + Authelia (current)

- **NixOS Support:** Both have good modules
- **Pros:** Already configured, lightweight, nice web UIs
- **Cons:** Two services to manage, limited POSIX attribute support in LLDAP, requires SSSD on every client host
- **Verdict:** Workable but has friction for NAS/UID goals

### FreeIPA

- **NixOS Support:** None
- **Pros:** Full enterprise solution (LDAP + Kerberos + DNS + CA)
- **Cons:** Extremely heavy, wants to own DNS, designed for Red Hat ecosystems, massive overkill for a homelab
- **Verdict:** Overkill, no NixOS support

### Keycloak

- **NixOS Support:** None
- **Pros:** Good OIDC/SAML, nice UI
- **Cons:** Primarily an identity broker, not a user directory; poor POSIX support; heavy (Java)
- **Verdict:** Wrong tool for Linux user management

### Authentik

- **NixOS Support:** None (would need Docker)
- **Pros:** All-in-one with LDAP outpost and OIDC, modern UI
- **Cons:** Heavy stack (Python + PostgreSQL + Redis), LDAP is a separate component
- **Verdict:** Would work but requires Docker and is heavy

### Kanidm

- **NixOS Support:** Excellent - first-class module with PAM/NSS integration
- **Pros:**
  - Native PAM/NSS module (no SSSD needed)
  - Built-in OIDC provider
  - Optional LDAP interface for legacy services
  - Declarative provisioning via NixOS (users, groups, OAuth2 clients)
  - Modern, written in Rust
  - Single service handles everything
- **Cons:** Newer project, smaller community than LDAP
- **Verdict:** Best fit for requirements

### Pocket-ID

- **NixOS Support:** Unknown
- **Pros:** Very lightweight, passkey-first
- **Cons:** No LDAP, no PAM/NSS integration - purely OIDC for web apps
- **Verdict:** Doesn't solve the Linux user management goal

## Recommendation: Kanidm

Kanidm is the recommended solution for the following reasons:

| Requirement | Kanidm Support |
|-------------|----------------|
| Central user database | Native |
| Linux PAM/NSS (host login) | Native NixOS module |
| UID/GID for NAS | POSIX attributes supported |
| OIDC for services | Built-in |
| Declarative config | Excellent NixOS provisioning |
| Simplicity | Modern API, LDAP optional |
| NixOS integration | First-class |

### Key NixOS Features

**Server configuration:**

```nix
services.kanidm.enableServer = true;
services.kanidm.serverSettings = {
  domain = "home.2rjus.net";
  origin = "https://auth.home.2rjus.net";
  ldapbindaddress = "0.0.0.0:636"; # Optional LDAP interface
};
```

**Declarative user provisioning:**

```nix
services.kanidm.provision.enable = true;
services.kanidm.provision.persons.torjus = {
  displayName = "Torjus";
  groups = [ "admins" "nas-users" ];
};
```

**Declarative OAuth2 clients:**

```nix
services.kanidm.provision.systems.oauth2.grafana = {
  displayName = "Grafana";
  originUrl = "https://grafana.home.2rjus.net/login/generic_oauth";
  originLanding = "https://grafana.home.2rjus.net";
};
```

**Client host configuration (add to system/):**

```nix
services.kanidm.enableClient = true;
services.kanidm.enablePam = true;
services.kanidm.clientSettings.uri = "https://auth.home.2rjus.net";
```

## NAS Integration

### Current: TrueNAS CORE (FreeBSD)

TrueNAS CORE has a built-in LDAP client. Kanidm's read-only LDAP interface will work for NFS share permissions:

- **NFS shares**: Only need consistent UID/GID mapping - Kanidm's LDAP provides this
- **No SMB requirement**: SMB would need Samba schema attributes (deprecated in TrueNAS 13.0+), but we're NFS-only

Configuration approach:

1. Enable Kanidm's LDAP interface (`ldapbindaddress = "0.0.0.0:636"`)
2. Import the internal CA certificate into TrueNAS
3. Configure the TrueNAS LDAP client with Kanidm's Base DN and bind credentials
4. Users/groups appear in TrueNAS permission dropdowns

Note: Kanidm's LDAP is read-only and uses LDAPS only (no StartTLS). This is fine for our use case.

### Future: NixOS NAS

When the NAS is migrated to NixOS, it becomes a first-class citizen:

- Native Kanidm PAM/NSS integration (same as other hosts)
- No LDAP compatibility layer needed
- Full integration with the rest of the homelab

This future migration path is a strong argument for Kanidm over LDAP-only solutions.

## Implementation Steps

1. **Create Kanidm service module** in `services/kanidm/`
   - Server configuration
   - TLS via internal ACME
   - Vault secrets for admin passwords

2. **Configure declarative provisioning**
   - Define initial users and groups
   - Set up POSIX attributes (UID/GID ranges)

3. **Add OIDC clients** for homelab services
   - Grafana
   - Other services as needed

4. **Create client module** in `system/` for PAM/NSS
   - Enable on all hosts that need central auth
   - Configure trusted CA

5. **Test NAS integration**
   - Configure the TrueNAS LDAP client to connect to Kanidm
   - Verify UID/GID mapping works with NFS shares

6. **Migrate auth01**
   - Remove the LLDAP and Authelia services
   - Deploy Kanidm
   - Update DNS CNAMEs if needed

7. **Documentation**
   - User management procedures
   - Adding new OAuth2 clients
   - Troubleshooting PAM/NSS issues

## Open Questions

- What UID/GID range should be reserved for Kanidm-managed users?
- Which hosts should have PAM/NSS enabled initially?
- What OAuth2 clients are needed at launch?

## References

- [Kanidm Documentation](https://kanidm.github.io/kanidm/stable/)
- [NixOS Kanidm Module](https://search.nixos.org/options?query=services.kanidm)
- [Kanidm PAM/NSS Integration](https://kanidm.github.io/kanidm/stable/pam_and_nsswitch.html)
183
docs/plans/completed/auth-system-replacement.md
Normal file
183
docs/plans/completed/auth-system-replacement.md
Normal file
@@ -0,0 +1,183 @@
|
|||||||
|
# Authentication System Replacement Plan
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Deploy a modern, unified authentication solution for the homelab. Provides central user management, SSO for web services, and consistent UID/GID mapping for NAS permissions.
|
||||||
|
|
||||||
|
## Goals
|
||||||
|
|
||||||
|
1. **Central user database** - Manage users across all homelab hosts from a single source
|
||||||
|
2. **Linux PAM/NSS integration** - Users can SSH into hosts using central credentials
|
||||||
|
3. **UID/GID consistency** - Proper POSIX attributes for NAS share permissions
|
||||||
|
4. **OIDC provider** - Single sign-on for homelab web services (Grafana, etc.)
|
||||||
|
|
||||||
|
## Solution: Kanidm
|
||||||
|
|
||||||
|
Kanidm was chosen for the following reasons:
|
||||||
|
|
||||||
|
| Requirement | Kanidm Support |
|
||||||
|
|-------------|----------------|
|
||||||
|
| Central user database | Native |
|
||||||
|
| Linux PAM/NSS (host login) | Native NixOS module |
|
||||||
|
| UID/GID for NAS | POSIX attributes supported |
|
||||||
|
| OIDC for services | Built-in |
|
||||||
|
| Declarative config | Excellent NixOS provisioning |
|
||||||
|
| Simplicity | Modern API, LDAP optional |
|
||||||
|
| NixOS integration | First-class |
|
||||||
|
|
||||||
|
### Configuration Files
|
||||||
|
|
||||||
|
- **Host configuration:** `hosts/kanidm01/`
|
||||||
|
- **Service module:** `services/kanidm/default.nix`
|
||||||
|
|
||||||
|
## NAS Integration
|
||||||
|
|
||||||
|
### Current: TrueNAS CORE (FreeBSD)
|
||||||
|
|
||||||
|
TrueNAS CORE has a built-in LDAP client. Kanidm's read-only LDAP interface will work for NFS share permissions:
|
||||||
|
|
||||||
|
- **NFS shares**: Only need consistent UID/GID mapping - Kanidm's LDAP provides this
|
||||||
|
- **No SMB requirement**: SMB would need Samba schema attributes (deprecated in TrueNAS 13.0+), but we're NFS-only
|
||||||
|
|
||||||
|
Configuration approach:
|
||||||
|
1. Enable Kanidm's LDAP interface (`ldapbindaddress = "0.0.0.0:636"`)
|
||||||
|
2. Import internal CA certificate into TrueNAS
|
||||||
|
3. Configure TrueNAS LDAP client with Kanidm's Base DN and bind credentials
|
||||||
|
4. Users/groups appear in TrueNAS permission dropdowns
|
||||||
|
|
||||||
|
Note: Kanidm's LDAP is read-only and uses LDAPS only (no StartTLS). This is fine for our use case.
|
||||||
|
|
||||||
|
### Future: NixOS NAS
|
||||||
|
|
||||||
|
When the NAS is migrated to NixOS, it becomes a first-class citizen:
|
||||||
|
|
||||||
|
- Native Kanidm PAM/NSS integration (same as other hosts)
|
||||||
|
- No LDAP compatibility layer needed
|
||||||
|
- Full integration with the rest of the homelab
|
||||||
|
|
||||||
|
This future migration path is a strong argument for Kanidm over LDAP-only solutions.
|
||||||
|
|
||||||
|
## Implementation Steps
|
||||||
|
|
||||||
|
1. **Create kanidm01 host and service module** ✅
|
||||||
|
- Host: `kanidm01.home.2rjus.net` (10.69.13.23, test tier)
|
||||||
|
- Service module: `services/kanidm/`
|
||||||
|
- TLS via internal ACME (`auth.home.2rjus.net`)
|
||||||
|
- Vault integration for idm_admin password
|
||||||
|
- LDAPS on port 636
|
||||||
|
|
||||||
|
2. **Configure provisioning** ✅
|
||||||
|
- Groups provisioned declaratively: `admins`, `users`, `ssh-users`
|
||||||
|
- Users managed imperatively via CLI (allows setting POSIX passwords in one step)
|
||||||
|
- POSIX attributes enabled (UID/GID range 65,536-69,999)
|
||||||
|
|
||||||
|
3. **Test NAS integration** (in progress)
|
||||||
|
- ✅ LDAP interface verified working
|
||||||
|
- Configure TrueNAS LDAP client to connect to Kanidm
|
||||||
|
- Verify UID/GID mapping works with NFS shares
|
||||||
|
|
||||||
|
4. **Add OIDC clients** for homelab services
|
||||||
|
- Grafana
|
||||||
|
- Other services as needed
|
||||||
|
|
||||||
|
5. **Create client module** in `system/` for PAM/NSS ✅
|
||||||
|
- Module: `system/kanidm-client.nix`
|
||||||
|
- `homelab.kanidm.enable = true` enables PAM/NSS
|
||||||
|
- Short usernames (not SPN format)
|
||||||
|
- Home directory symlinks via `home_alias`
|
||||||
|
- Enabled on test tier: testvm01, testvm02, testvm03
|
||||||
|
|
||||||
|
6. **Documentation** ✅
|
||||||
|
- `docs/user-management.md` - CLI workflows, troubleshooting
|
||||||
|
- User/group creation procedures verified working

## Progress

### Completed (2026-02-08)

**Kanidm server deployed on kanidm01 (test tier):**
- Host: `kanidm01.home.2rjus.net` (10.69.13.23)
- WebUI: `https://auth.home.2rjus.net`
- LDAPS: port 636
- Valid certificate from internal CA

**Configuration:**
- Kanidm 1.8 with secret provisioning support
- Daily backups at 22:00 (7 versions retained)
- Vault integration for idm_admin password
- Prometheus monitoring scrape target configured

**Provisioned entities:**
- Groups: `admins`, `users`, `ssh-users` (declarative)
- Users managed via CLI (imperative)

**Verified working:**
- WebUI login with idm_admin
- LDAP bind and search with POSIX-enabled user
- LDAPS with valid internal CA certificate

### Completed (2026-02-08) - PAM/NSS Client

**Client module deployed (`system/kanidm-client.nix`):**
- `homelab.kanidm.enable = true` enables PAM/NSS integration
- Connects to auth.home.2rjus.net
- Short usernames (`torjus` instead of `torjus@home.2rjus.net`)
- Home directory symlinks (`/home/torjus` → UUID-based dir)
- Login restricted to `ssh-users` group
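
Roughly, the module wires up the upstream client options like this. This is a sketch: the `homelab.kanidm.enable` option declaration is omitted, and the kanidm-unixd setting names are assumptions from its documentation, not a copy of `system/kanidm-client.nix`:

```nix
# system/kanidm-client.nix (sketch)
{ config, lib, ... }:
{
  config = lib.mkIf config.homelab.kanidm.enable {
    services.kanidm = {
      enableClient = true;
      clientSettings.uri = "https://auth.home.2rjus.net";
      enablePam = true;
      unixSettings = {
        # Resolve short usernames instead of user@home.2rjus.net.
        uid_attr_map = "name";
        gid_attr_map = "name";
        # /home/<name> becomes a symlink to the UUID-based home directory.
        home_alias = "name";
        pam_allowed_services = [ "sshd" "login" ];
      };
    };
  };
}
```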

**Enabled on test tier:**
- testvm01, testvm02, testvm03

**Verified working:**
- User/group resolution via `getent`
- SSH login with Kanidm unix passwords
- Home directory creation with symlinks
- Imperative user/group creation via CLI

**Documentation:**
- `docs/user-management.md` with full CLI workflows
- Password requirements (min 10 chars)
- Troubleshooting guide (nscd, cache invalidation)

### UID/GID Range (Resolved)

**Range: 65,536 - 69,999** (manually allocated)

- Users: 65,536 - 67,999 (up to ~2500 users)
- Groups: 68,000 - 69,999 (up to ~2000 groups)

Rationale:
- Starts at Kanidm's recommended minimum (65,536)
- Well above NixOS system users (typically <1000)
- Avoids Podman/container issues with very high GIDs

### Completed (2026-02-08) - OAuth2/OIDC for Grafana

**OAuth2 client deployed for Grafana on monitoring02:**
- Client ID: `grafana`
- Redirect URL: `https://grafana-test.home.2rjus.net/login/generic_oauth`
- Scope maps: `openid`, `profile`, `email`, `groups` for `users` group
- Role mapping: `admins` group → Grafana Admin, others → Viewer

**Configuration locations:**
- Kanidm OAuth2 client: `services/kanidm/default.nix`
- Grafana OIDC config: `services/grafana/default.nix`
- Vault secret: `services/grafana/oauth2-client-secret`

**Key findings:**
- PKCE is required by Kanidm - enable `use_pkce = true` in Grafana
- Must set `email_attribute_path`, `login_attribute_path`, `name_attribute_path` to extract from userinfo
- Users need: primary credential (password + TOTP for MFA), membership in `users` group, email address set
- Unix password is separate from primary credential (web login requires primary credential)
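
Taken together, the Grafana side of these findings might look like the following sketch of the `auth.generic_oauth` settings. The Kanidm endpoint URLs and the role-mapping expression are assumptions based on Kanidm's documented OAuth2 layout, and the client secret wiring via Vault is elided:

```nix
# services/grafana/default.nix (sketch of the OIDC block)
{
  services.grafana.settings."auth.generic_oauth" = {
    enabled = true;
    name = "Kanidm";
    client_id = "grafana";
    auth_url = "https://auth.home.2rjus.net/ui/oauth2";
    token_url = "https://auth.home.2rjus.net/oauth2/token";
    api_url = "https://auth.home.2rjus.net/oauth2/openid/grafana/userinfo";
    scopes = "openid profile email groups";
    use_pkce = true; # Kanidm requires PKCE
    # Extract identity fields from the userinfo response.
    email_attribute_path = "email";
    login_attribute_path = "preferred_username";
    name_attribute_path = "name";
    # admins -> Grafana Admin, everyone else -> Viewer.
    role_attribute_path = "contains(groups[*], 'admins@home.2rjus.net') && 'Admin' || 'Viewer'";
  };
}
```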

### Next Steps

1. Enable PAM/NSS on production hosts (after test tier validation)
2. Configure TrueNAS LDAP client for NAS integration testing
3. Add OAuth2 clients for other services as needed

## References

- [Kanidm Documentation](https://kanidm.github.io/kanidm/stable/)
- [NixOS Kanidm Module](https://search.nixos.org/options?query=services.kanidm)
- [Kanidm PAM/NSS Integration](https://kanidm.github.io/kanidm/stable/pam_and_nsswitch.html)

docs/plans/completed/bootstrap-cache.md (Normal file, 35 lines)
@@ -0,0 +1,35 @@

# Plan: Configure Template2 to Use Nix Cache

## Problem

New VMs bootstrapped from template2 don't use our local nix cache (nix-cache.home.2rjus.net) during the initial `nixos-rebuild boot`. This means the first build downloads everything from cache.nixos.org, which is slower and uses more bandwidth.

## Solution

Update the template2 base image to include the nix cache configuration, so new VMs immediately benefit from cached builds during bootstrap.

## Implementation

1. Add nix cache configuration to `hosts/template2/configuration.nix`:

   ```nix
   nix.settings = {
     substituters = [ "https://nix-cache.home.2rjus.net" "https://cache.nixos.org" ];
     trusted-public-keys = [
       "nix-cache.home.2rjus.net:..." # Add the cache's public key
       "cache.nixos.org-1:..."
     ];
   };
   ```

2. Rebuild and redeploy the Proxmox template:

   ```bash
   nix develop -c ansible-playbook -i playbooks/inventory.ini playbooks/build-and-deploy-template.yml
   ```

3. Update `default_template_name` in `terraform/variables.tf` if the template name changed

## Benefits

- Faster VM bootstrap times
- Reduced bandwidth to external cache
- Most derivations will already be cached from other hosts

docs/plans/completed/cert-monitoring.md (Normal file, 72 lines)
@@ -0,0 +1,72 @@

# Certificate Monitoring Plan

## Summary

This document describes the removal of labmon certificate monitoring and outlines future needs for certificate monitoring in the homelab.

## What Was Removed

### labmon Service

The `labmon` service was a custom Go application that provided:

1. **StepMonitor**: Monitoring for step-ca (Smallstep CA) certificate provisioning and health
2. **TLSConnectionMonitor**: Periodic TLS connection checks to verify certificate validity and expiration

The service exposed Prometheus metrics at `:9969` including:
- `labmon_tlsconmon_certificate_seconds_left` - Time until certificate expiration
- `labmon_tlsconmon_certificate_check_error` - Whether the TLS check failed
- `labmon_stepmon_certificate_seconds_left` - Step-CA internal certificate expiration

### Affected Files

- `hosts/monitoring01/configuration.nix` - Removed labmon configuration block
- `services/monitoring/prometheus.nix` - Removed labmon scrape target
- `services/monitoring/rules.yml` - Removed `certificate_rules` alert group
- `services/monitoring/alloy.nix` - Deleted (was only used for labmon profiling)
- `services/monitoring/default.nix` - Removed alloy.nix import

### Removed Alerts

- `certificate_expiring_soon` - Warned when any monitored TLS cert had < 24h validity
- `step_ca_serving_cert_expiring` - Critical alert for step-ca's own serving certificate
- `certificate_check_error` - Warned when TLS connection check failed
- `step_ca_certificate_expiring` - Critical alert for step-ca issued certificates

## Why It Was Removed

1. **step-ca decommissioned**: The primary monitoring target (step-ca) is no longer in use
2. **Outdated codebase**: labmon was a custom tool that required maintenance
3. **Limited value**: With ACME auto-renewal, certificates should renew automatically

## Current State

ACME certificates are now issued by OpenBao PKI at `vault.home.2rjus.net:8200`. The ACME protocol handles automatic renewal, and certificates are typically renewed well before expiration.

## Future Needs

While ACME handles renewal automatically, we should consider monitoring for:

1. **ACME renewal failures**: Alert when a certificate fails to renew
   - Could monitor ACME client logs (via Loki queries)
   - Could check certificate file modification times

2. **Certificate expiration as backup**: Even with auto-renewal, a last-resort alert for certificates approaching expiration would catch renewal failures

3. **Certificate transparency**: Monitor for unexpected certificate issuance

### Potential Solutions

1. **Prometheus blackbox_exporter**: Can probe TLS endpoints and export certificate expiration metrics (see the sketch after this list)
   - `probe_ssl_earliest_cert_expiry` metric
   - Already a standard tool, well-maintained

2. **Custom Loki alerting**: Query ACME service logs for renewal failures
   - Works with existing infrastructure
   - No additional services needed

3. **Node-exporter textfile collector**: Script that checks local certificate files and writes expiration metrics
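
If option 1 is pursued, the blackbox_exporter half could be as small as the sketch below, assuming the `services.prometheus.exporters.blackbox` module. The module name `tls_connect` is illustrative:

```nix
# Sketch: a TCP+TLS probe module that yields probe_ssl_earliest_cert_expiry.
{ pkgs, ... }:
{
  services.prometheus.exporters.blackbox = {
    enable = true;
    configFile = pkgs.writeText "blackbox.yml" ''
      modules:
        tls_connect:
          prober: tcp
          tcp:
            tls: true
    '';
  };
  # Prometheus would then scrape TLS endpoints through this exporter and
  # alert on e.g. probe_ssl_earliest_cert_expiry - time() < 7 * 86400.
}
```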

## Status

**Not yet implemented.** This document serves as a placeholder for future work on certificate monitoring.

docs/plans/completed/garage-s3-storage.md (Normal file, 46 lines)
@@ -0,0 +1,46 @@

# Garage S3 Storage Server

## Overview

Deploy a Garage instance for self-hosted S3-compatible object storage.

## Garage Basics

- S3-compatible distributed object storage designed for self-hosting
- Supports per-key, per-bucket permissions (read/write/owner)
- Keys without explicit grants have no access

## NixOS Module

Available as `services.garage` with these key options:

- `services.garage.enable` - Enable the service
- `services.garage.package` - Must be set explicitly
- `services.garage.settings` - Freeform TOML config (replication mode, ports, RPC, etc.)
- `services.garage.settings.metadata_dir` - Metadata storage (SSD recommended)
- `services.garage.settings.data_dir` - Data block storage (supports multiple dirs since v0.9)
- `services.garage.environmentFile` - For secrets like `GARAGE_RPC_SECRET`
- `services.garage.logLevel` - error/warn/info/debug/trace

The NixOS module only manages the server daemon. Buckets and keys are managed externally.
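
A minimal single-node deployment using the options above might look like this sketch; the replication setting, paths, and ports are assumptions to revisit once the open questions below are settled:

```nix
# Sketch: single-node Garage using the options listed above.
{ pkgs, ... }:
{
  services.garage = {
    enable = true;
    package = pkgs.garage; # must be set explicitly; pin a version in practice
    environmentFile = "/run/secrets/garage.env"; # provides GARAGE_RPC_SECRET
    settings = {
      replication_mode = "none"; # single node (newer releases use replication_factor)
      metadata_dir = "/var/lib/garage/meta"; # SSD recommended
      data_dir = "/var/lib/garage/data";
      rpc_bind_addr = "[::]:3901";
      s3_api = {
        s3_region = "garage";
        api_bind_addr = "[::]:3900";
      };
    };
  };
}
```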

## Bucket/Key Management

No declarative NixOS options for buckets or keys. Two options:

1. **Terraform provider** - `jkossis/terraform-provider-garage` manages buckets, keys, and permissions via the Garage Admin API v2. Could live in `terraform/garage/` similar to `terraform/vault/`.
2. **CLI** - `garage key create`, `garage bucket create`, `garage bucket allow`

## Integration Ideas

- Store Garage API keys in Vault, fetch via `vault.secrets` on consuming hosts
- Terraform manages both Vault secrets and Garage buckets/keys
- Enable admin API with token for Terraform provider access
- Add Prometheus metrics scraping (Garage exposes a metrics endpoint)

## Open Questions

- Single-node or multi-node replication?
- Which host to deploy on?
- What to store? (backups, media, app data)
- Expose via HTTP proxy or direct S3 API only?

docs/plans/completed/monitoring02-reboot-alert-investigation.md (Normal file, 135 lines)
@@ -0,0 +1,135 @@

# monitoring02 Reboot Alert Investigation

**Date:** 2026-02-10
**Status:** Completed - False positive identified

## Summary

A `host_reboot` alert fired for monitoring02 at 16:27:36 UTC. Investigation determined this was a **false positive** caused by NTP clock adjustments, not an actual reboot.

## Alert Details

- **Alert:** `host_reboot`
- **Rule:** `changes(node_boot_time_seconds[10m]) > 0`
- **Host:** monitoring02
- **Time:** 2026-02-10T16:27:36Z

## Investigation Findings

### Evidence Against Actual Reboot

1. **Uptime:** System had been up for ~40 hours (143,751 seconds) at time of alert
2. **Consistent BOOT_ID:** All logs showed the same systemd BOOT_ID (`fd26e7f3d86f4cd688d1b1d7af62f2ad`) from Feb 9 through the alert time
3. **No log gaps:** Logs were continuous - no shutdown/restart cycle visible
4. **Prometheus metrics:** `node_boot_time_seconds` showed a 1-second fluctuation, then returned to normal

### Root Cause: NTP Clock Adjustment

The `node_boot_time_seconds` metric fluctuated by 1 second due to how Linux calculates boot time:

```
btime = current_wall_clock_time - monotonic_uptime
```

When NTP adjusts the wall clock, `btime` shifts by the same amount. The `node_timex_*` metrics confirmed this:

| Metric | Value |
|--------|-------|
| `node_timex_maxerror_seconds` (max in 3h) | 1.02 seconds |
| `node_timex_maxerror_seconds` (max in 24h) | 2.05 seconds |
| `node_timex_sync_status` | 1 (synced) |
| Current `node_timex_offset_seconds` | ~9ms (normal) |

The kernel's estimated maximum clock error spiked to over 1 second, causing the boot time calculation to drift momentarily.

Additionally, `systemd-resolved` logged "Clock change detected. Flushing caches." at 16:26:53Z, corroborating the NTP adjustment.

## Current Time Sync Configuration

### NixOS Guests
- **NTP client:** systemd-timesyncd (NixOS default)
- **No explicit configuration** in the codebase
- Uses default NixOS NTP server pool

### Proxmox VMs
- **Clocksource:** `kvm-clock` (optimal for KVM VMs)
- **QEMU guest agent:** Enabled
- **No additional QEMU timing args** configured

## Potential Improvements

### 1. Improve Alert Rule (Recommended)

Add tolerance to filter out small NTP adjustments:

```yaml
# Current rule (triggers on any change)
expr: changes(node_boot_time_seconds[10m]) > 0

# Improved rule (requires >60 second shift)
expr: changes(node_boot_time_seconds[10m]) > 0 and abs(delta(node_boot_time_seconds[10m])) > 60
```

### 2. Switch to Chrony (Optional)

Chrony handles time adjustments more gracefully than systemd-timesyncd:

```nix
# In common/vm/qemu-guest.nix
{
  services.qemuGuest.enable = true;

  services.timesyncd.enable = false;
  services.chrony = {
    enable = true;
    extraConfig = ''
      makestep 1 3
      rtcsync
    '';
  };
}
```

### 3. Add QEMU Timing Args (Optional)

In `terraform/vms.tf`:

```hcl
args = "-global kvm-pit.lost_tick_policy=delay -rtc driftfix=slew"
```

### 4. Local NTP Server (Optional)

Running a local NTP server (e.g., on ns1/ns2) would reduce latency and improve sync stability across all hosts.
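
If this is ever implemented, serving NTP from the DNS hosts could be a small chrony addition. A sketch, with the subnet and the choice of ns1/ns2 as assumptions:

```nix
# Sketch: chrony answering NTP queries from the homelab networks.
{
  services.chrony = {
    enable = true;
    extraConfig = ''
      # Allow LAN clients to sync against this host.
      allow 10.69.0.0/16
    '';
  };
  networking.firewall.allowedUDPPorts = [ 123 ];
}
```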

## Monitoring NTP Health

The `node_timex_*` metrics from node_exporter provide visibility into NTP health:

```promql
# Clock offset from reference
node_timex_offset_seconds

# Sync status (1 = synced)
node_timex_sync_status

# Maximum estimated error - useful for alerting
node_timex_maxerror_seconds
```

A potential alert for NTP issues:

```yaml
- alert: ntp_clock_drift
  expr: node_timex_maxerror_seconds > 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "High clock drift on {{ $labels.hostname }}"
    description: "NTP max error is {{ $value }}s on {{ $labels.hostname }}"
```

## Conclusion

No action required for the alert itself - the system was healthy. Consider implementing the improved alert rule to prevent future false positives from NTP adjustments.

docs/plans/completed/nix-cache-reprovision.md (Normal file, 156 lines)
@@ -0,0 +1,156 @@

# Nix Cache Host Reprovision

## Overview

Reprovision `nix-cache01` using the OpenTofu workflow, and improve the build/cache system with:
1. NATS-based remote build triggering (replacing the current bash script)
2. Safer flake update workflow that validates builds before pushing to master

## Status

- **Phase 1: New Build Host** - COMPLETE
- **Phase 2: NATS Build Triggering** - COMPLETE
- **Phase 3: Safe Flake Update Workflow** - NOT STARTED
- **Phase 4: Complete Migration** - COMPLETE
- **Phase 5: Scheduled Builds** - COMPLETE

## Completed Work

### New Build Host (nix-cache02)

Instead of reprovisioning nix-cache01 in-place, we created a new host `nix-cache02` at 10.69.13.25:

- **Specs**: 8 CPU cores, 16GB RAM (temporarily; will increase to 24GB after nix-cache01 is decommissioned), 200GB disk
- **Provisioned via OpenTofu** with automatic Vault credential bootstrapping
- **Builder service** configured with two repos:
  - `nixos-servers` → `git+https://git.t-juice.club/torjus/nixos-servers.git`
  - `nixos` (gunter) → `git+https://git.t-juice.club/torjus/nixos.git`

### NATS-Based Build Triggering

The `homelab-deploy` tool was extended with a builder mode:

**NATS Subjects:**
- `build.<repo>.<target>` - e.g., `build.nixos-servers.all` or `build.nixos-servers.ns1`

**NATS Permissions (in DEPLOY account):**

| User | Publish | Subscribe |
|------|---------|-----------|
| Builder | `build.responses.>` | `build.>` |
| Test deployer | `deploy.test.>`, `deploy.discover`, `build.>` | `deploy.responses.>`, `deploy.discover`, `build.responses.>` |
| Admin deployer | `deploy.>`, `build.>` | `deploy.>`, `build.responses.>` |

**Vault Secrets:**
- `shared/homelab-deploy/builder-nkey` - NKey seed for builder authentication

**NixOS Configuration:**
- `hosts/nix-cache02/builder.nix` - Builder service configuration
- `services/nats/default.nix` - Updated with builder NATS user

**MCP Integration:**
- `.mcp.json` updated with `--enable-builds` flag
- Build tool available via MCP for Claude Code

**Tested:**
- Single host build: `build nixos-servers testvm01` (~30s)
- All hosts build: `build nixos-servers all` (16 hosts in ~226s)

### Harmonia Binary Cache

- Parameterized `services/nix-cache/harmonia.nix` to use hostname-based Vault paths
- Parameterized `services/nix-cache/proxy.nix` for hostname-based domain
- New signing key: `nix-cache02.home.2rjus.net-1`
- Vault secret: `hosts/nix-cache02/cache-secret`
- Removed unused Gitea Actions runner from nix-cache01

## Current State

### nix-cache02 (Active)
- Running at 10.69.13.25
- Serving `https://nix-cache.home.2rjus.net` (canonical URL)
- Builder service active, responding to NATS build requests
- Metrics exposed on port 9973 (`homelab-deploy-builder` job)
- Harmonia binary cache server running
- Signing key: `nix-cache02.home.2rjus.net-1`
- Prod tier with `build-host` role

### nix-cache01 (Decommissioned)
- VM deleted from Proxmox
- Host configuration removed from repo
- Vault AppRole and secrets removed
- Old signing key removed from trusted-public-keys

## Remaining Work

### Phase 3: Safe Flake Update Workflow

1. Create `.github/workflows/flake-update-safe.yaml`
2. Disable or remove the old `flake-update.yaml`
3. Test manually with `workflow_dispatch`
4. Monitor the first automated run

### Phase 4: Complete Migration ✅

1. ~~**Add Harmonia to nix-cache02**~~ ✅ Done - new signing key, parameterized service
2. ~~**Add trusted public key to all hosts**~~ ✅ Done - `system/nix.nix` updated
3. ~~**Test cache from other hosts**~~ ✅ Done - verified from testvm01
4. ~~**Update proxy and DNS**~~ ✅ Done - `nix-cache.home.2rjus.net` CNAME now points to nix-cache02
5. ~~**Deploy to all hosts**~~ ✅ Done - all hosts have the new trusted key
6. ~~**Decommission nix-cache01**~~ ✅ Done - 2026-02-10:
   - Removed `hosts/nix-cache01/` directory
   - Removed `services/nix-cache/build-flakes.{nix,sh}`
   - Removed Vault AppRole and secrets
   - Removed old signing key from `system/nix.nix`
   - Removed from `flake.nix`
   - Deleted VM from Proxmox

### Phase 5: Scheduled Builds ✅

Implemented a systemd timer on nix-cache02 that triggers builds every 2 hours (see the sketch after this list):

- **Timer**: `scheduled-build.timer` runs every 2 hours with 5m random jitter
- **Service**: `scheduled-build.service` calls `homelab-deploy build` for both repos
- **Authentication**: Dedicated scheduler NKey stored in Vault
- **NATS user**: Added to DEPLOY account with publish `build.>` and subscribe `build.responses.>`
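
The timer/service pair might look like this sketch. Unit names come from this plan, but the exact invocation and NKey credential wiring are assumptions, not a copy of `hosts/nix-cache02/scheduler.nix`:

```nix
# Sketch: trigger builds for both repos every 2 hours with jitter.
{
  systemd.timers.scheduled-build = {
    wantedBy = [ "timers.target" ];
    timerConfig = {
      OnBootSec = "15m";
      OnUnitActiveSec = "2h";
      RandomizedDelaySec = "5m";
    };
  };
  systemd.services.scheduled-build = {
    serviceConfig.Type = "oneshot";
    # Credentials (scheduler NKey from Vault) omitted here.
    script = ''
      homelab-deploy build nixos-servers all
      homelab-deploy build nixos all
    '';
  };
}
```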

Files:
- `hosts/nix-cache02/scheduler.nix` - Timer and service configuration
- `services/nats/default.nix` - Scheduler NATS user
- `terraform/vault/secrets.tf` - Scheduler NKey secret
- `terraform/vault/variables.tf` - Variable for scheduler NKey

## Resolved Questions

- **Parallel vs sequential builds?** Sequential - hosts share packages, subsequent builds are fast after the first
- **What about gunter?** Configured as `nixos` repo in builder settings
- **Disk size?** 200GB for new host
- **Build host specs?** 8 cores, 16-24GB RAM matches current nix-cache01

### Phase 6: Observability

1. **Alerting rules** for build failures:

   ```promql
   # Alert if any build fails
   increase(homelab_deploy_build_host_total{status="failure"}[1h]) > 0

   # Alert if no successful builds in 24h (scheduled builds stopped)
   time() - homelab_deploy_build_last_success_timestamp > 86400
   ```

2. **Grafana dashboard** for build metrics:
   - Build success/failure rate over time
   - Average build duration per host (histogram)
   - Build frequency (builds per hour/day)
   - Last successful build timestamp per repo

Available metrics:
- `homelab_deploy_builds_total{repo, status}` - total builds by repo and status
- `homelab_deploy_build_host_total{repo, host, status}` - per-host build counts
- `homelab_deploy_build_duration_seconds_{bucket,sum,count}` - build duration histogram
- `homelab_deploy_build_last_timestamp{repo}` - last build attempt
- `homelab_deploy_build_last_success_timestamp{repo}` - last successful build

## Open Questions

- [x] ~~When to cut over DNS from nix-cache01 to nix-cache02?~~ Done - 2026-02-10
- [ ] Implement safe flake update workflow before or after full migration?

docs/plans/completed/ns1-recreation.md (Normal file, 107 lines)
@@ -0,0 +1,107 @@

# ns1 Recreation Plan

## Overview

Recreate ns1 using the OpenTofu workflow after the existing VM entered emergency mode due to an incorrect hardware-configuration.nix (hardcoded UUIDs that don't match the actual disk layout).

## Current ns1 Configuration to Preserve

- **IP:** 10.69.13.5/24
- **Gateway:** 10.69.13.1
- **Role:** Primary DNS (authoritative + resolver)
- **Services:**
  - `../../services/ns/master-authorative.nix`
  - `../../services/ns/resolver.nix`
- **Metadata:**
  - `homelab.host.role = "dns"`
  - `homelab.host.labels.dns_role = "primary"`
- **Vault:** enabled
- **Deploy:** enabled

## Execution Steps

### Phase 1: Remove Old Configuration

```bash
nix develop -c create-host --remove --hostname ns1 --force
```

This removes:
- `hosts/ns1/` directory
- Entry from `flake.nix`
- Any terraform entries (none exist currently)

### Phase 2: Create New Configuration

```bash
nix develop -c create-host --hostname ns1 --ip 10.69.13.5/24
```

This creates:
- `hosts/ns1/` with template2-based configuration
- Entry in `flake.nix`
- Entry in `terraform/vms.tf`
- Vault wrapped token for bootstrap

### Phase 3: Customize Configuration

After create-host, manually update `hosts/ns1/configuration.nix` to add:

1. DNS service imports:

   ```nix
   ../../services/ns/master-authorative.nix
   ../../services/ns/resolver.nix
   ```

2. Host metadata:

   ```nix
   homelab.host = {
     tier = "prod";
     role = "dns";
     labels.dns_role = "primary";
   };
   ```

3. Disable resolved (conflicts with Unbound):

   ```nix
   services.resolved.enable = false;
   ```

### Phase 4: Commit Changes

```bash
git add -A
git commit -m "ns1: recreate with OpenTofu workflow

Old VM had incorrect hardware-configuration.nix with hardcoded UUIDs
that didn't match actual disk layout, causing boot failure.

Recreated using template2-based configuration for OpenTofu provisioning."
```

### Phase 5: Infrastructure

1. Delete the old ns1 VM in Proxmox (it's broken anyway)
2. Run `nix develop -c tofu -chdir=terraform apply`
3. Wait for bootstrap to complete
4. Verify ns1 is functional:
   - DNS resolution working
   - Zone transfer to ns2 working
   - All exporters responding

### Phase 6: Finalize

- Push to master
- Move this plan to `docs/plans/completed/`

## Rollback

If the new VM fails:
1. ns2 is still operational as secondary DNS
2. Can recreate with different settings if needed

## Notes

- ns2 will continue serving DNS during the migration
- Zone data is generated from the flake, so there is no data loss
- The old VM's disk can be kept briefly in Proxmox as a backup if desired

docs/plans/completed/openbao-kanidm-oidc.md (Normal file, 87 lines)
@@ -0,0 +1,87 @@

# OpenBao + Kanidm OIDC Integration

## Status: Completed

Implemented 2026-02-09.

## Overview

Enable Kanidm users to authenticate to OpenBao (Vault) using OIDC for Web UI access. Members of the `admins` group get full read/write access to secrets.

## Implementation

### Files Modified

| File | Changes |
|------|---------|
| `terraform/vault/oidc.tf` | New - OIDC auth backend and roles |
| `terraform/vault/policies.tf` | Added oidc-admin and oidc-default policies |
| `terraform/vault/secrets.tf` | Added OAuth2 client secret |
| `terraform/vault/approle.tf` | Granted kanidm01 access to openbao secrets |
| `services/kanidm/default.nix` | Added openbao OAuth2 client, enabled imperative group membership |

### Kanidm Configuration

OAuth2 client `openbao` with:
- Confidential client (uses client secret)
- Web UI callback only: `https://vault.home.2rjus.net:8200/ui/vault/auth/oidc/oidc/callback`
- Legacy crypto enabled (RS256 for OpenBao compatibility)
- Scope maps for `admins` and `users` groups

Group membership is now managed imperatively (`overwriteMembers = false`) to prevent provisioning from resetting group memberships on service restart.
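
For reference, the provisioned client might look like the sketch below, assuming the `services.kanidm.provision.systems.oauth2` options; the secret file path is illustrative and the landing URL is a guess:

```nix
# Sketch: the openbao OAuth2 client in services/kanidm/default.nix.
{
  services.kanidm.provision.systems.oauth2.openbao = {
    displayName = "OpenBao";
    originLanding = "https://vault.home.2rjus.net:8200/ui/";
    originUrl = "https://vault.home.2rjus.net:8200/ui/vault/auth/oidc/oidc/callback";
    basicSecretFile = "/run/secrets/openbao-oauth2-secret";
    # OpenBao only supports RS256, so fall back from Kanidm's ES256 default.
    enableLegacyCrypto = true;
    scopeMaps = {
      admins = [ "openid" "profile" "email" "groups" ];
      users = [ "openid" "profile" "email" "groups" ];
    };
  };
}
```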

### OpenBao Configuration

OIDC auth backend at `/oidc` with two roles:

| Role | Bound Claims | Policy | Access |
|------|--------------|--------|--------|
| `admin` | `groups = admins@home.2rjus.net` | `oidc-admin` | Full read/write to secrets, system health/metrics |
| `default` | (none) | `oidc-default` | Token lookup-self, system health |

Both roles request scopes: `openid`, `profile`, `email`, `groups`

### Policies

**oidc-admin:**
- `secret/*` - create, read, update, delete, list
- `sys/health` - read
- `sys/metrics` - read
- `sys/auth` - read
- `sys/mounts` - read

**oidc-default:**
- `auth/token/lookup-self` - read
- `sys/health` - read

## Usage

### Web UI Login

1. Navigate to https://vault.home.2rjus.net:8200
2. Select "OIDC" authentication method
3. Enter role: `admin` (for admins) or `default` (for any user)
4. Click "Sign in with OIDC"
5. Authenticate with Kanidm

### Group Management

Add users to the admins group for full access:

```bash
kanidm group add-members admins <username>
```

## Limitations

**CLI login not supported:** Kanidm requires HTTPS for all redirect URIs on confidential (non-public) OAuth2 clients. The OpenBao CLI uses `http://localhost:8250/oidc/callback`, which Kanidm rejects. Public clients would allow localhost redirects, but OpenBao requires a client secret for OIDC auth.

## Lessons Learned

1. **Kanidm group names:** Groups are returned as `groupname@domain` (e.g., `admins@home.2rjus.net`), not just the short name
2. **RS256 required:** OpenBao only supports RS256 for JWT signing; Kanidm defaults to ES256, requiring `enableLegacyCrypto = true`
3. **Scope request:** OIDC roles must explicitly request the `groups` scope via `oidc_scopes`
4. **Provisioning resets:** Kanidm provisioning with the default `overwriteMembers = true` resets group memberships on restart
5. **Two-phase Terraform:** The secret must exist before the OIDC backend can validate the discovery URL

## References

- [OpenBao JWT/OIDC Auth Method](https://openbao.org/docs/auth/jwt/)
- [Kanidm OAuth2 Documentation](https://kanidm.github.io/kanidm/stable/integrations/oauth2.html)

docs/plans/completed/pgdb1-decommission.md (Normal file, 113 lines)
@@ -0,0 +1,113 @@

# pgdb1 Decommissioning Plan

## Overview

Decommission the pgdb1 PostgreSQL server. The only consumer was Open WebUI on gunter, which has been migrated to use a local PostgreSQL instance.

## Pre-flight Verification

Before proceeding, verify that gunter is no longer using pgdb1:

1. Check Open WebUI on gunter is configured for local PostgreSQL (not 10.69.13.16)
2. Optionally: Check pgdb1 for recent connection activity:

   ```bash
   ssh pgdb1 'sudo -u postgres psql -c "SELECT * FROM pg_stat_activity WHERE datname IS NOT NULL;"'
   ```

## Files to Remove

### Host Configuration
- `hosts/pgdb1/default.nix`
- `hosts/pgdb1/configuration.nix`
- `hosts/pgdb1/hardware-configuration.nix`
- `hosts/pgdb1/` (directory)

### Service Module
- `services/postgres/postgres.nix`
- `services/postgres/default.nix`
- `services/postgres/` (directory)

Note: This service module is only used by pgdb1, so it can be removed entirely.

### Flake Entry

Remove from `flake.nix` (lines 131-138):

```nix
pgdb1 = nixpkgs.lib.nixosSystem {
  inherit system;
  specialArgs = {
    inherit inputs self;
  };
  modules = commonModules ++ [
    ./hosts/pgdb1
  ];
};
```

### Vault AppRole

Remove from `terraform/vault/approle.tf` (lines 69-73):

```hcl
"pgdb1" = {
  paths = [
    "secret/data/hosts/pgdb1/*",
  ]
}
```

### Monitoring Rules

Remove the `postgres_down` alert from `services/monitoring/rules.yml` (lines 359-365):

```yaml
- name: postgres_rules
  rules:
    - alert: postgres_down
      expr: node_systemd_unit_state{instance="pgdb1.home.2rjus.net:9100", name="postgresql.service", state="active"} == 0
      for: 5m
      labels:
        severity: critical
```

### Utility Scripts

Delete `rebuild-all.sh` entirely (obsolete script).

## Execution Steps

### Phase 1: Verification
- [ ] Confirm Open WebUI on gunter uses local PostgreSQL
- [ ] Verify no active connections to pgdb1

### Phase 2: Code Cleanup
- [ ] Create feature branch: `git checkout -b decommission-pgdb1`
- [ ] Remove `hosts/pgdb1/` directory
- [ ] Remove `services/postgres/` directory
- [ ] Remove pgdb1 entry from `flake.nix`
- [ ] Remove postgres alert from `services/monitoring/rules.yml`
- [ ] Delete `rebuild-all.sh` (obsolete)
- [ ] Run `nix flake check` to verify no broken references
- [ ] Commit changes

### Phase 3: Terraform Cleanup
- [ ] Remove pgdb1 from `terraform/vault/approle.tf`
- [ ] Run `tofu plan` in `terraform/vault/` to preview changes
- [ ] Run `tofu apply` to remove the AppRole
- [ ] Commit terraform changes

### Phase 4: Infrastructure Cleanup
- [ ] Shut down the pgdb1 VM in Proxmox
- [ ] Delete the VM from Proxmox
- [ ] (Optional) Remove any DNS entries if not auto-generated

### Phase 5: Finalize
- [ ] Merge feature branch to master
- [ ] Trigger auto-upgrade on DNS servers (ns1, ns2) to remove the DNS entry
- [ ] Move this plan to `docs/plans/completed/`

## Rollback

If issues arise after decommissioning:
1. The VM can be recreated from the template using the git history
2. Database data would need to be restored from backup (if any exists)

## Notes

- pgdb1 IP: 10.69.13.16
- The postgres service allowed connections from gunter (10.69.30.105)
- No restic backup was configured for this host
@@ -1,10 +1,38 @@
 # Prometheus Scrape Target Labels
+
+## Implementation Status
+
+| Step | Status | Notes |
+|------|--------|-------|
+| 1. Create `homelab.host` module | ✅ Complete | `modules/homelab/host.nix` |
+| 2. Update `lib/monitoring.nix` | ✅ Complete | Labels extracted and propagated |
+| 3. Update Prometheus config | ✅ Complete | Uses structured static_configs |
+| 4. Set metadata on hosts | ✅ Complete | All relevant hosts configured |
+| 5. Update alert rules | ✅ Complete | Role-based filtering implemented |
+| 6. Labels for service targets | ✅ Complete | Host labels propagated to all services |
+| 7. Add hostname label | ✅ Complete | All targets have `hostname` label for easy filtering |
+
+**Hosts with metadata configured:**
+- `ns1`, `ns2`: `role = "dns"`, `labels.dns_role = "primary"/"secondary"`
+- `nix-cache01`: `role = "build-host"`
+- `vault01`: `role = "vault"`
+- `testvm01/02/03`: `tier = "test"`
+
+**Implementation complete.** Branch: `prometheus-scrape-target-labels`
+
+**Query examples:**
+- `{hostname="ns1"}` - all metrics from ns1 (any job/port)
+- `node_cpu_seconds_total{hostname="monitoring01"}` - specific metric by hostname
+- `up{role="dns"}` - all DNS servers
+- `up{tier="test"}` - all test-tier hosts
+
+---
+
 ## Goal

 Add support for custom per-host labels on Prometheus scrape targets, enabling alert rules to reference host metadata (priority, role) instead of hardcoding instance names.

-**Related:** This plan shares the `homelab.host` module with `docs/plans/nats-deploy-service.md`, which uses the same metadata for deployment tier assignment.
+**Related:** This plan shares the `homelab.host` module with `docs/plans/completed/nats-deploy-service.md`, which uses the same metadata for deployment tier assignment.

 ## Motivation

@@ -54,12 +82,11 @@ or

 ## Implementation

-This implementation uses a shared `homelab.host` module that provides host metadata for multiple consumers (Prometheus labels, deployment tiers, etc.). See also `docs/plans/nats-deploy-service.md` which uses the same module for deployment tier assignment.
+This implementation uses a shared `homelab.host` module that provides host metadata for multiple consumers (Prometheus labels, deployment tiers, etc.). See also `docs/plans/completed/nats-deploy-service.md` which uses the same module for deployment tier assignment.

 ### 1. Create `homelab.host` module

-**Status:** Step 1 (Create `homelab.host` module) is complete. The module is in
-`modules/homelab/host.nix` with tier, priority, role, and labels options.
+✅ **Complete.** The module is in `modules/homelab/host.nix`.

 Create `modules/homelab/host.nix` with shared host metadata options:

@@ -98,6 +125,8 @@ Import this module in `modules/homelab/default.nix`.

 ### 2. Update `lib/monitoring.nix`

+✅ **Complete.** Labels are now extracted and propagated.
+
 - `extractHostMonitoring` should also extract `homelab.host` values (priority, role, labels).
 - Build the combined label set from `homelab.host`:

@@ -126,6 +155,8 @@ This requires grouping hosts by their label attrset and producing one `static_co

 ### 3. Update `services/monitoring/prometheus.nix`

+✅ **Complete.** Now uses structured static_configs output.
+
 Change the node-exporter scrape config to use the new structured output:

 ```nix
@@ -138,36 +169,37 @@ static_configs = nodeExporterTargets;

 ### 4. Set metadata on hosts

+✅ **Complete.** All relevant hosts have metadata configured. Note: The implementation filters by `role` rather than `priority`, which matches the existing nix-cache01 configuration.
+
 Example in `hosts/nix-cache01/configuration.nix`:

 ```nix
 homelab.host = {
-  tier = "test"; # can be deployed by MCP (used by homelab-deploy)
   priority = "low"; # relaxed alerting thresholds
   role = "build-host";
 };
 ```

+**Note:** Current implementation only sets `role = "build-host"`. Consider adding `priority = "low"` when label propagation is implemented.
+
 Example in `hosts/ns1/configuration.nix`:

 ```nix
 homelab.host = {
-  tier = "prod";
-  priority = "high";
   role = "dns";
   labels.dns_role = "primary";
 };
 ```

+**Note:** `tier` and `priority` use defaults ("prod" and "high"), which is the intended behavior. The current ns1/ns2 configurations match this pattern.
+
 ### 5. Update alert rules

-After implementing labels, review and update `services/monitoring/rules.yml`:
+✅ **Complete.** Updated `services/monitoring/rules.yml`:

-- Replace instance-name exclusions with label-based filters (e.g. `{priority!="low"}` instead of `{instance!="nix-cache01.home.2rjus.net:9100"}`).
-- Consider whether any other rules should differentiate by priority or role.
+- `high_cpu_load`: Replaced `instance!="nix-cache01..."` with `role!="build-host"` for standard hosts (15m duration) and `role="build-host"` for build hosts (2h duration).
+- `unbound_low_cache_hit_ratio`: Added `dns_role="primary"` filter to only alert on the primary DNS resolver (secondary has a cold cache).

-Specifically, the `high_cpu_load` rule currently has a nix-cache01 exclusion that should be replaced with a `priority`-based filter.
+### 6. Labels for `generateScrapeConfigs` (service targets)

-### 6. Consider labels for `generateScrapeConfigs` (service targets)
-
-The same label propagation could be applied to service-level scrape targets. This is optional and can be deferred -- service targets are more specialized and less likely to need generic label-based filtering.
+✅ **Complete.** Host labels are now propagated to all auto-generated service scrape targets (unbound, homelab-deploy, nixos-exporter, etc.). This enables semantic filtering on any service metric, such as using `dns_role="primary"` with the unbound job.
@@ -9,24 +9,23 @@ hosts are decommissioned or deferred.

 ## Current State

-Hosts already managed by OpenTofu: `vault01`, `testvm01`, `vaulttest01`
+Hosts already managed by OpenTofu: `vault01`, `testvm01`, `testvm02`, `testvm03`, `ns2`, `ns1`

 Hosts to migrate:

 | Host | Category | Notes |
 |------|----------|-------|
-| ns1 | Stateless | Primary DNS, recreate |
-| ns2 | Stateless | Secondary DNS, recreate |
+| ~~ns1~~ | ~~Stateless~~ | ✓ Complete |
 | nix-cache01 | Stateless | Binary cache, recreate |
 | http-proxy | Stateless | Reverse proxy, recreate |
 | nats1 | Stateless | Messaging, recreate |
-| auth01 | Decommission | No longer in use |
 | ha1 | Stateful | Home Assistant + Zigbee2MQTT + Mosquitto |
 | monitoring01 | Stateful | Prometheus, Grafana, Loki |
 | jelly01 | Stateful | Jellyfin metadata, watch history, config |
-| pgdb1 | Stateful | PostgreSQL databases |
-| jump | Decommission | No longer needed |
-| ca | Deferred | Pending Phase 4c PKI migration to OpenBao |
+| pgdb1 | Decommission | Only used by Open WebUI on gunter, migrating to local postgres |
+| ~~jump~~ | ~~Decommission~~ | ✓ Complete |
+| ~~auth01~~ | ~~Decommission~~ | ✓ Complete |
+| ~~ca~~ | ~~Deferred~~ | ✓ Complete |

 ## Phase 1: Backup Preparation

@@ -46,39 +45,19 @@ No backup currently exists. Add a restic backup job for `/var/lib/jellyfin/` whi
 Media files are on the NAS (`nas.home.2rjus.net:/mnt/hdd-pool/media`) and do not need backup.
 The cache directory (`/var/cache/jellyfin/`) does not need backup — it regenerates.

-### 1c. Add PostgreSQL Backup to pgdb1
+### 1c. Verify Existing ha1 Backup

-No backup currently exists. Add a restic backup job with a `pg_dumpall` pre-hook to capture
-all databases and roles. The dump should be piped through restic's stdin backup (similar to
-the Grafana DB dump pattern on monitoring01).
-
-### 1d. Verify Existing ha1 Backup
-
 ha1 already backs up `/var/lib/hass`, `/var/lib/zigbee2mqtt`, `/var/lib/mosquitto`. Verify
 these backups are current and restorable before proceeding with migration.

-### 1e. Verify All Backups
+### 1d. Verify All Backups

 After adding/expanding backup jobs:
 1. Trigger a manual backup run on each host
 2. Verify backup integrity with `restic check`
 3. Test a restore to a temporary location to confirm data is recoverable

-## Phase 2: Declare pgdb1 Databases in Nix
-
-Before migrating pgdb1, audit the manually-created databases and users on the running
-instance, then declare them in the Nix configuration using `ensureDatabases` and
-`ensureUsers`. This makes the PostgreSQL setup reproducible on the new host.
-
-Steps:
-1. SSH to pgdb1, run `\l` and `\du` in psql to list databases and roles
-2. Add `ensureDatabases` and `ensureUsers` to `services/postgres/postgres.nix`
-3. Document any non-default PostgreSQL settings or extensions per database
-
-After reprovisioning, the databases will be created by NixOS, and data restored from the
-`pg_dumpall` backup.
-
-## Phase 3: Stateless Host Migration
+## Phase 2: Stateless Host Migration

 These hosts have no meaningful state and can be recreated fresh. For each host:

@@ -95,13 +74,14 @@ Migrate stateless hosts in an order that minimizes disruption:

 1. **nix-cache01** — low risk, no downstream dependencies during migration
 2. **nats1** — low risk, verify no persistent JetStream streams first
-4. **http-proxy** — brief disruption to proxied services, migrate during low-traffic window
-5. **ns1, ns2** — migrate one at a time, verify DNS resolution between each
+3. **http-proxy** — brief disruption to proxied services, migrate during low-traffic window
+4. ~~**ns1** — ns2 already migrated, verify AXFR works after ns1 migration~~ ✓ Complete

-For ns1/ns2: migrate ns2 first (secondary), verify AXFR works, then migrate ns1. All hosts
-use both ns1 and ns2 as resolvers, so one being down briefly is tolerable.
+~~For ns1/ns2: migrate ns2 first (secondary), verify AXFR works, then migrate ns1.~~ Both ns1
+and ns2 migration complete. Zone transfer (AXFR) verified working between ns1 (primary) and
+ns2 (secondary).

-## Phase 4: Stateful Host Migration
+## Phase 3: Stateful Host Migration

 For each stateful host, the procedure is:

@@ -114,17 +94,7 @@ For each stateful host, the procedure is:
 7. Start services and verify functionality
 8. Decommission the old VM

-### 4a. pgdb1
-
-1. Run final `pg_dumpall` backup via restic
-2. Stop PostgreSQL on the old host
-3. Provision new pgdb1 via OpenTofu
-4. After bootstrap, NixOS creates the declared databases/users
-5. Restore data with `pg_restore` or `psql < dumpall.sql`
-6. Verify database connectivity from gunter (`10.69.30.105`)
-7. Decommission old VM
-
-### 4b. monitoring01
+### 3a. monitoring01

 1. Run final Grafana backup
 2. Provision new monitoring01 via OpenTofu
@@ -134,7 +104,7 @@ For each stateful host, the procedure is:
 6. Verify all scrape targets are being collected
 7. Decommission old VM

-### 4c. jelly01
+### 3b. jelly01

 1. Run final Jellyfin backup
 2. Provision new jelly01 via OpenTofu
@@ -143,7 +113,7 @@ For each stateful host, the procedure is:
 5. Start Jellyfin, verify watch history and library metadata are present
 6. Decommission old VM

-### 4d. ha1
+### 3c. ha1

 1. Verify latest restic backup is current
 2. Stop Home Assistant, Zigbee2MQTT, and Mosquitto on old host
@@ -167,47 +137,69 @@ OpenTofu/Proxmox. Verify the USB device ID on the hypervisor and add the appropr
 `usb` block to the VM definition in `terraform/vms.tf`. The USB device must be passed
 through before starting Zigbee2MQTT on the new host.

-## Phase 5: Decommission jump and auth01 Hosts
+## Phase 4: Decommission Hosts

-### jump
-1. Verify nothing depends on the jump host (no SSH proxy configs pointing to it, etc.)
-2. Remove host configuration from `hosts/jump/`
-3. Remove from `flake.nix`
-4. Remove any secrets in `secrets/jump/`
-5. Remove from `.sops.yaml`
+### jump ✓ COMPLETE
+
+~~1. Verify nothing depends on the jump host (no SSH proxy configs pointing to it, etc.)~~
+~~2. Remove host configuration from `hosts/jump/`~~
+~~3. Remove from `flake.nix`~~
+~~4. Remove any secrets in `secrets/jump/`~~
+~~5. Remove from `.sops.yaml`~~
+~~6. Destroy the VM in Proxmox~~
+~~7. Commit cleanup~~
+
+Host was already removed from flake.nix and VM destroyed. Configuration cleaned up in ba9f47f.
+
+### auth01 ✓ COMPLETE
+
+~~1. Remove host configuration from `hosts/auth01/`~~
+~~2. Remove from `flake.nix`~~
+~~3. Remove any secrets in `secrets/auth01/`~~
+~~4. Remove from `.sops.yaml`~~
+~~5. Remove `services/authelia/` and `services/lldap/` (only used by auth01)~~
+~~6. Destroy the VM in Proxmox~~
+~~7. Commit cleanup~~
+
+Host configuration, services, and VM already removed.
+
+### pgdb1 (in progress)
+
+Only consumer was Open WebUI on gunter, which has been migrated to use local PostgreSQL.
+
+1. ~~Verify Open WebUI on gunter is using local PostgreSQL (not pgdb1)~~ ✓
+2. ~~Remove host configuration from `hosts/pgdb1/`~~ ✓
+3. ~~Remove `services/postgres/` (only used by pgdb1)~~ ✓
+4. ~~Remove from `flake.nix`~~ ✓
+5. ~~Remove Vault AppRole from `terraform/vault/approle.tf`~~ ✓
 6. Destroy the VM in Proxmox
-7. Commit cleanup
+7. ~~Commit cleanup~~ ✓

-### auth01
-1. Remove host configuration from `hosts/auth01/`
-2. Remove from `flake.nix`
-3. Remove any secrets in `secrets/auth01/`
-4. Remove from `.sops.yaml`
-5. Remove `services/authelia/` and `services/lldap/` (only used by auth01)
-6. Destroy the VM in Proxmox
-7. Commit cleanup
+See `docs/plans/pgdb1-decommission.md` for detailed plan.

-## Phase 6: Decommission ca Host (Deferred)
+## Phase 5: Decommission ca Host ✓ COMPLETE

-Deferred until Phase 4c (PKI migration to OpenBao) is complete. Once all hosts use the
-OpenBao ACME endpoint for certificates, the step-ca host can be decommissioned following
-the same cleanup steps as the jump host.
+~~Deferred until Phase 4c (PKI migration to OpenBao) is complete. Once all hosts use the
+OpenBao ACME endpoint for certificates, the step-ca host can be decommissioned following
+the same cleanup steps as the jump host.~~

-## Phase 7: Remove sops-nix
+PKI migration to OpenBao complete. Host configuration, `services/ca/`, and VM removed.

-Once `ca` is decommissioned (Phase 6), `sops-nix` is no longer used by any host. Remove
-all remnants:
-- `sops-nix` input from `flake.nix` and `flake.lock`
-- `sops-nix.nixosModules.sops` from all host module lists in `flake.nix`
-- `inherit sops-nix` from all specialArgs in `flake.nix`
-- `system/sops.nix` and its import in `system/default.nix`
-- `.sops.yaml`
-- `secrets/` directory
-- All `sops.secrets.*` declarations in `services/ca/`, `services/authelia/`, `services/lldap/`
-- Template scripts that generate age keys for sops (`hosts/template/scripts.nix`,
-`hosts/template2/scripts.nix`)
+## Phase 6: Remove sops-nix ✓ COMPLETE

-See `docs/plans/completed/sops-to-openbao-migration.md` for full context.
+~~Once `ca` is decommissioned (Phase 6), `sops-nix` is no longer used by any host. Remove
+all remnants:~~
+~~- `sops-nix` input from `flake.nix` and `flake.lock`~~
+~~- `sops-nix.nixosModules.sops` from all host module lists in `flake.nix`~~
+~~- `inherit sops-nix` from all specialArgs in `flake.nix`~~
+~~- `system/sops.nix` and its import in `system/default.nix`~~
+~~- `.sops.yaml`~~
+~~- `secrets/` directory~~
+~~- All `sops.secrets.*` declarations in `services/ca/`, `services/authelia/`, `services/lldap/`~~
+~~- Template scripts that generate age keys for sops (`hosts/template/scripts.nix`,
+`hosts/template2/scripts.nix`)~~
+
+All sops-nix remnants removed. See `docs/plans/completed/sops-to-openbao-migration.md` for context.

 ## Notes

@@ -216,7 +208,7 @@ See `docs/plans/completed/sops-to-openbao-migration.md` for full context.
 - The old VMs use IPs that the new VMs need, so the old VM must be shut down before
   the new one is provisioned (or use a temporary IP and swap after verification)
 - Stateful migrations should be done during low-usage windows
-- After all migrations are complete, the only hosts not in OpenTofu will be ca (deferred)
+- After all migrations are complete, all decommissioned hosts (jump, auth01, ca) have been removed
 - Since many hosts are being recreated, this is a good opportunity to establish consistent
   hostname naming conventions before provisioning the new VMs. Current naming is inconsistent
   (e.g. `ns1` vs `nix-cache01`, `ha1` vs `auth01`, `pgdb1` vs `http-proxy`). Decide on a
|
||||||
|
|||||||
190
docs/plans/loki-improvements.md
Normal file
190
docs/plans/loki-improvements.md
Normal file
@@ -0,0 +1,190 @@
|
|||||||
|
# Loki Setup Improvements
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
The current Loki deployment on monitoring01 is functional but minimal: it has no retention policy, no rate limiting, and uses local filesystem storage. This plan evaluates improvement options across several dimensions: retention management, storage backend, resource limits, and operational improvements.
|
||||||
|
|
||||||
|
## Current State
|
||||||
|
|
||||||
|
**Loki** on monitoring01 (`services/monitoring/loki.nix`):
|
||||||
|
- Single-node deployment, no HA
|
||||||
|
- Filesystem storage at `/var/lib/loki/chunks`
|
||||||
|
- TSDB index (v13 schema, 24h period)
|
||||||
|
- No retention policy configured (logs grow indefinitely)
|
||||||
|
- No `limits_config` (no rate limiting, stream limits, or query guards)
|
||||||
|
- No caching layer
|
||||||
|
- Auth disabled (trusted network)
|
||||||
|
|
||||||
|
**Promtail** on all 16 hosts (`system/monitoring/logs.nix`):
|
||||||
|
- Ships systemd journal (JSON) + `/var/log/**/*.log`
|
||||||
|
- Labels: `host`, `job` (systemd-journal/varlog), `systemd_unit`
|
||||||
|
- Hardcoded to `http://monitoring01.home.2rjus.net:3100`
|
||||||
|
|
||||||
|
**Additional log sources:**
|
||||||
|
- `pipe-to-loki` script (manual log submission, `job=pipe-to-loki`)
|
||||||
|
- Bootstrap logs from template2 (`job=bootstrap`)
|
||||||
|
|
||||||
|
**Context:** The VictoriaMetrics migration plan (`docs/plans/monitoring-migration-victoriametrics.md`) includes moving Loki to monitoring02 with "same configuration as current". These improvements could be applied either before or after that migration.
|
||||||
|
|
||||||
|
## Improvement Areas
|
||||||
|
|
||||||
|
### 1. Retention Policy
|
||||||
|
|
||||||
|
**Problem:** No retention configured. Logs accumulate until disk fills up.
|
||||||
|
|
||||||
|
**Options:**
|
||||||
|
|
||||||
|
| Approach | Config Location | How It Works |
|
||||||
|
|----------|----------------|--------------|
|
||||||
|
| **Compactor retention** | `compactor` + `limits_config` | Compactor runs periodic retention sweeps, deleting chunks older than threshold |
|
||||||
|
| **Table manager** | `table_manager` | Legacy approach, not recommended for TSDB |
|
||||||
|
|
||||||
|
**Recommendation:** Use compactor-based retention (the modern approach for TSDB/filesystem):
|
||||||
|
|
||||||
|
```nix
|
||||||
|
compactor = {
|
||||||
|
working_directory = "/var/lib/loki/compactor";
|
||||||
|
compaction_interval = "10m";
|
||||||
|
retention_enabled = true;
|
||||||
|
retention_delete_delay = "2h";
|
||||||
|
retention_delete_worker_count = 150;
|
||||||
|
};
|
||||||
|
|
||||||
|
limits_config = {
|
||||||
|
retention_period = "30d"; # Default retention for all tenants
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
|
30 days aligns with the Prometheus retention and is reasonable for a homelab. Older logs are rarely useful, and anything important can be found in journal archives on the hosts themselves.
|
||||||
|
|
||||||
|
### 2. Storage Backend
|
||||||
|
|
||||||
|
**Decision:** Stay with filesystem storage for now. Garage S3 was considered but ruled out - the current single-node Garage (replication_factor=1) offers no real durability benefit over local disk. S3 storage can be revisited after the NAS migration, when a more robust S3-compatible solution will likely be available.
|
||||||
|
|
||||||
|
### 3. Limits Configuration
|
||||||
|
|
||||||
|
**Problem:** No rate limiting or stream cardinality protection. A misbehaving service could generate excessive logs and overwhelm Loki.
|
||||||
|
|
||||||
|
**Recommendation:** Add basic guardrails:
|
||||||
|
|
||||||
|
```nix
|
||||||
|
limits_config = {
|
||||||
|
retention_period = "30d";
|
||||||
|
ingestion_rate_mb = 10; # MB/s per tenant
|
||||||
|
ingestion_burst_size_mb = 20; # Burst allowance
|
||||||
|
max_streams_per_user = 10000; # Prevent label explosion
|
||||||
|
max_query_series = 500; # Limit query resource usage
|
||||||
|
max_query_parallelism = 8;
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
|
These are generous limits that shouldn't affect normal operation but protect against runaway log generators.
|
||||||
|
|
||||||
|
### 4. Promtail Label Improvements
|
||||||
|
|
||||||
|
**Problem:** Label inconsistencies and missing useful metadata:
|
||||||
|
- The `varlog` scrape config uses `hostname` while journal uses `host` (different label name)
|
||||||
|
- No `tier` or `role` labels, making it hard to filter logs by deployment tier or host function
|
||||||
|
|
||||||
|
**Implemented:** Standardized on `hostname` to match Prometheus labels. The journal scrape previously used a relabel from `__journal__hostname` to `host`; now both scrape configs use a static `hostname` label from `config.networking.hostName`. Also updated `pipe-to-loki` and bootstrap scripts to use `hostname` instead of `host`.
|
||||||
|
|
||||||
|
1. **Standardized label:** Both scrape configs use `hostname` (matching Prometheus) via shared `hostLabels`
|
||||||
|
2. **Added `tier` label:** Static label from `config.homelab.host.tier` (`test`/`prod`) on both scrape configs
|
||||||
|
3. **Added `role` label:** Static label from `config.homelab.host.role` on both scrape configs (conditionally, only when non-null)
|
||||||
|
|
||||||
|
No cardinality impact - `tier` and `role` are 1:1 with `hostname`, so they add metadata to existing streams without creating new ones.
|
||||||
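A minimal sketch of what this looks like in `system/monitoring/logs.nix` (simplified; the real module also carries the journal pipeline stages discussed in section 5, and the exact attribute layout may differ):

```nix
{ config, lib, ... }:
let
  # Shared labels attached to every stream from this host
  hostLabels = {
    hostname = config.networking.hostName;
    tier = config.homelab.host.tier;
  } // lib.optionalAttrs (config.homelab.host.role != null) {
    role = config.homelab.host.role;
  };
in
{
  services.promtail.configuration.scrape_configs = [
    {
      job_name = "journal";
      journal = {
        json = true;  # ship full journal entries as JSON
        labels = { job = "systemd-journal"; } // hostLabels;
      };
      relabel_configs = [
        { source_labels = [ "__journal__systemd_unit" ]; target_label = "systemd_unit"; }
      ];
    }
    {
      job_name = "varlog";
      static_configs = [
        {
          targets = [ "localhost" ];
          labels = { job = "varlog"; __path__ = "/var/log/**/*.log"; } // hostLabels;
        }
      ];
    }
  ];
}
```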
|
|
||||||
|
This enables queries like:
|
||||||
|
- `{tier="prod"} |= "error"` - all errors on prod hosts
|
||||||
|
- `{role="dns"}` - all DNS server logs
|
||||||
|
- `{tier="test", job="systemd-journal"}` - journal logs from test hosts
|
||||||
|
|
||||||
|
### 5. Journal Priority → Level Label
|
||||||
|
|
||||||
|
**Problem:** Loki 3.6.3 auto-detects a `detected_level` label by parsing log message text for keywords like "INFO", "ERROR", etc. This works for applications that embed level strings in messages (Go apps, Loki itself), but **fails for traditional Unix services** that use the journal `PRIORITY` field without level text in the message.
|
||||||
|
|
||||||
|
Example: NSD logs `"signal received, shutting down..."` with `PRIORITY="4"` (warning), but Loki sets `detected_level="unknown"` because the message has no level keyword. Querying `{detected_level="warn"}` misses these entirely.
|
||||||
|
|
||||||
|
**Recommendation:** Add a Promtail pipeline stage to the journal scrape config that maps the `PRIORITY` field to a `level` label:
|
||||||
|
|
||||||
|
| PRIORITY | level |
|
||||||
|
|----------|-------|
|
||||||
|
| 0-2 | critical |
|
||||||
|
| 3 | error |
|
||||||
|
| 4 | warning |
|
||||||
|
| 5 | notice |
|
||||||
|
| 6 | info |
|
||||||
|
| 7 | debug |
|
||||||
|
|
||||||
|
This can be done with a `json` stage to extract PRIORITY, then a `template` + `labels` stage to map and attach it. The journal `PRIORITY` field is always present, so this gives reliable level filtering for all journal logs.
|
||||||
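A sketch of how the proposed `json` → `template` → `labels` pipeline could be expressed for the journal scrape config (the template-stage semantics should be verified against the Promtail docs; the string comparison relies on `PRIORITY` always being a single digit 0-7):

```nix
# Added to the journal scrape config in system/monitoring/logs.nix
pipeline_stages = [
  # Extract the numeric syslog priority from the JSON journal entry
  { json.expressions.priority = "PRIORITY"; }
  # Map the priority to a level name and store it in the extracted map as `level`
  {
    template = {
      source = "level";
      template = ''{{ if le .priority "2" }}critical{{ else if eq .priority "3" }}error{{ else if eq .priority "4" }}warning{{ else if eq .priority "5" }}notice{{ else if eq .priority "6" }}info{{ else }}debug{{ end }}'';
    };
  }
  # Attach the mapped value as a `level` label on the stream
  { labels.level = "level"; }
];
```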
|
|
||||||
|
**Cardinality impact:** Moderate. Adds up to ~6 label values per host+unit combination. In practice most services log at 1-2 levels, so the stream count increase is manageable for 16 hosts. The filtering benefit (e.g., `{level="error"}` to find all errors across the fleet) outweighs the cost.
|
||||||
|
|
||||||
|
This enables queries like:
|
||||||
|
- `{level="error"}` - all errors across the fleet
|
||||||
|
- `{level=~"critical|error", tier="prod"}` - prod errors and criticals
|
||||||
|
- `{level="warning", role="dns"}` - warnings from DNS servers
|
||||||
|
|
||||||
|
### 6. Enable JSON Logging on Services
|
||||||
|
|
||||||
|
**Problem:** Many services support structured JSON log output but may be using plain text by default. JSON logs are significantly easier to query in Loki - `| json` cleanly extracts all fields, whereas plain text requires fragile regex or pattern matching.
|
||||||
|
|
||||||
|
**Recommendation:** Audit all configured services and enable JSON logging where supported. Candidates to check include:
|
||||||
|
- Caddy (already JSON by default)
|
||||||
|
- Prometheus / Alertmanager / Loki / Tempo
|
||||||
|
- Grafana
|
||||||
|
- NSD / Unbound
|
||||||
|
- Home Assistant
|
||||||
|
- NATS
|
||||||
|
- Jellyfin
|
||||||
|
- OpenBao (Vault)
|
||||||
|
- Kanidm
|
||||||
|
- Garage
|
||||||
|
|
||||||
|
For each service, check whether it supports a JSON log format option and whether enabling it would break anything (e.g., log volume increase from verbose JSON, or dashboards that parse text format).
|
||||||
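As a concrete example, Grafana is one of the easier candidates: its console log format can be switched to JSON through the freeform `services.grafana.settings` options (a hedged sketch; double-check the `[log.console]` section name against the docs for the deployed Grafana version):

```nix
{
  # Emit Grafana's console logs as JSON so `| json` in Loki extracts fields cleanly
  services.grafana.settings."log.console".format = "json";
}
```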
|
|
||||||
|
### 7. Monitoring CNAME for Promtail Target
|
||||||
|
|
||||||
|
**Problem:** Promtail hardcodes `monitoring01.home.2rjus.net:3100`. The VictoriaMetrics migration plan already addresses this by switching to a `monitoring` CNAME.
|
||||||
|
|
||||||
|
**Recommendation:** This should happen as part of the monitoring02 migration, not independently. If we do Loki improvements before that migration, keep pointing to monitoring01.
|
||||||
|
|
||||||
|
## Priority Ranking
|
||||||
|
|
||||||
|
| # | Improvement | Effort | Impact | Recommendation |
|
||||||
|
|---|-------------|--------|--------|----------------|
|
||||||
|
| 1 | **Retention policy** | Low | High | Do first - prevents disk exhaustion |
|
||||||
|
| 2 | **Limits config** | Low | Medium | Do with retention - minimal additional effort |
|
||||||
|
| 3 | **Promtail label fix** | Trivial | Low | Quick fix, do with other label changes |
|
||||||
|
| 4 | **Journal priority → level** | Low-medium | Medium | Reliable level filtering across the fleet |
|
||||||
|
| 5 | **JSON logging audit** | Low-medium | Medium | Audit services, enable JSON where supported |
|
||||||
|
| 6 | **Monitoring CNAME** | Low | Medium | Part of monitoring02 migration |
|
||||||
|
|
||||||
|
## Implementation Steps
|
||||||
|
|
||||||
|
### Phase 1: Retention + Limits (quick win)
|
||||||
|
|
||||||
|
1. Add `compactor` section to `services/monitoring/loki.nix`
|
||||||
|
2. Add `limits_config` with 30-day retention and basic rate limits
|
||||||
|
3. Update `system/monitoring/logs.nix`:
|
||||||
|
- ~~Fix `hostname` → `host` label in varlog scrape config~~ Done: standardized on `hostname` (matching Prometheus)
|
||||||
|
- ~~Add `tier` static label from `config.homelab.host.tier` to both scrape configs~~ Done
|
||||||
|
- ~~Add `role` static label from `config.homelab.host.role` (conditionally, only when set) to both scrape configs~~ Done
|
||||||
|
- ~~Add pipeline stages to journal scrape config: `json` to extract PRIORITY, `template` to map to level name, `labels` to attach as `level`~~ Done
|
||||||
|
4. Deploy to monitoring01, verify compactor runs and old data gets cleaned
|
||||||
|
5. Verify `level` label works: `{level="error"}` should return results, and match cases where `detected_level="unknown"`
|
||||||
|
|
||||||
|
### Phase 2 (future): S3 Storage Migration
|
||||||
|
|
||||||
|
Revisit after NAS migration when a proper S3-compatible storage solution is available. At that point, add a new schema period with `object_store = "s3"` - the old filesystem period will continue serving historical data until it ages out past retention.
|
||||||
|
|
||||||
|
## Open Questions
|
||||||
|
|
||||||
|
- [ ] What retention period makes sense? 30 days suggested, but could be 14d or 60d depending on disk/storage budget
|
||||||
|
- [ ] Do we want per-stream retention (e.g., keep bootstrap/pipe-to-loki longer)?
|
||||||
|
|
||||||
|
## Notes
|
||||||
|
|
||||||
|
- Loki schema changes require adding a new period entry (not modifying existing ones). The old period continues serving historical data.
|
||||||
|
- The compactor is already part of single-process Loki in recent versions - it just needs to be configured.
|
||||||
|
- S3 storage deferred until post-NAS migration when a proper solution is available.
|
||||||
116
docs/plans/memory-issues-follow-up.md
Normal file
116
docs/plans/memory-issues-follow-up.md
Normal file
@@ -0,0 +1,116 @@
|
|||||||
|
# Memory Issues Follow-up
|
||||||
|
|
||||||
|
Tracking the zram change to verify it resolves OOM issues during nixos-upgrade on low-memory hosts.
|
||||||
|
|
||||||
|
## Background
|
||||||
|
|
||||||
|
On 2026-02-08, ns2 (2GB RAM) experienced an OOM kill during nixos-upgrade. The Nix evaluation process consumed ~1.6GB before being killed by the kernel. ns1 (manually increased to 4GB) succeeded with the same upgrade.
|
||||||
|
|
||||||
|
Root cause: 2GB RAM is insufficient for Nix flake evaluation without swap.
|
||||||
|
|
||||||
|
## Fix Applied
|
||||||
|
|
||||||
|
**Commit:** `1674b6a` - system: enable zram swap for all hosts
|
||||||
|
|
||||||
|
**Merged:** 2026-02-08 ~12:15 UTC
|
||||||
|
|
||||||
|
**Change:** Added `zramSwap.enable = true` to `system/zram.nix`, providing ~2GB compressed swap on all hosts.
|
||||||
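The change itself is a one-liner (sketch; any `memoryPercent` tuning is not shown here - NixOS defaults to sizing the zram device at 50% of RAM):

```nix
# system/zram.nix
{ ... }:
{
  # Compressed swap in RAM; gives Nix evaluation headroom on 2GB hosts
  zramSwap.enable = true;
}
```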
|
|
||||||
|
## Timeline
|
||||||
|
|
||||||
|
| Time (UTC) | Event |
|
||||||
|
|------------|-------|
|
||||||
|
| 05:00:46 | ns2 nixos-upgrade OOM killed |
|
||||||
|
| 05:01:47 | `nixos_upgrade_failed` alert fired |
|
||||||
|
| 12:15 | zram commit merged to master |
|
||||||
|
| 12:19 | ns2 rebooted with zram enabled |
|
||||||
|
| 12:20 | ns1 rebooted (memory reduced to 2GB via tofu) |
|
||||||
|
|
||||||
|
## Hosts Affected
|
||||||
|
|
||||||
|
All 2GB VMs that run nixos-upgrade:
|
||||||
|
- ns1, ns2 (DNS)
|
||||||
|
- vault01
|
||||||
|
- testvm01, testvm02, testvm03
|
||||||
|
- kanidm01
|
||||||
|
|
||||||
|
## Metrics to Monitor
|
||||||
|
|
||||||
|
Check these in Grafana or via PromQL to verify the fix:
|
||||||
|
|
||||||
|
### Swap availability (should be ~2GB after upgrade)
|
||||||
|
```promql
|
||||||
|
node_memory_SwapTotal_bytes / 1024 / 1024
|
||||||
|
```
|
||||||
|
|
||||||
|
### Swap usage during upgrades
|
||||||
|
```promql
|
||||||
|
(node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes) / 1024 / 1024
|
||||||
|
```
|
||||||
|
|
||||||
|
### Zswap compressed bytes (active compression)
|
||||||
|
```promql
|
||||||
|
node_memory_Zswap_bytes / 1024 / 1024
|
||||||
|
```
|
||||||
|
|
||||||
|
### Upgrade failures (should be 0)
|
||||||
|
```promql
|
||||||
|
node_systemd_unit_state{name="nixos-upgrade.service", state="failed"}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Memory available during upgrades
|
||||||
|
```promql
|
||||||
|
node_memory_MemAvailable_bytes / 1024 / 1024
|
||||||
|
```
|
||||||
|
|
||||||
|
## Verification Steps
|
||||||
|
|
||||||
|
After a few days (allow auto-upgrades to run on all hosts):
|
||||||
|
|
||||||
|
1. Check all hosts have swap enabled:
|
||||||
|
```promql
|
||||||
|
node_memory_SwapTotal_bytes > 0
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Check for any upgrade failures since the fix:
|
||||||
|
```promql
|
||||||
|
count_over_time(ALERTS{alertname="nixos_upgrade_failed"}[7d])
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Review if any hosts used swap during upgrades (check historical graphs)
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
- No `nixos_upgrade_failed` alerts due to OOM after 2026-02-08
|
||||||
|
- All hosts show ~2GB swap available
|
||||||
|
- Upgrades complete successfully on 2GB VMs
|
||||||
|
|
||||||
|
## Fallback Options
|
||||||
|
|
||||||
|
If zram is insufficient:
|
||||||
|
|
||||||
|
1. **Increase VM memory** - Update `terraform/vms.tf` to 4GB for affected hosts
|
||||||
|
2. **Enable memory ballooning** - Configure VMs with dynamic memory allocation (see below)
|
||||||
|
3. **Use remote builds** - Configure `nix.buildMachines` to offload builds to a larger machine (note: flake evaluation itself still runs locally, so this helps less with evaluation memory)
|
||||||
|
4. **Reduce flake size** - Split configurations to reduce evaluation memory
|
||||||
|
|
||||||
|
### Memory Ballooning
|
||||||
|
|
||||||
|
Proxmox supports memory ballooning, which allows VMs to dynamically grow/shrink memory allocation based on demand. The balloon driver inside the guest communicates with the hypervisor to release or reclaim memory pages.
|
||||||
|
|
||||||
|
Configuration in `terraform/vms.tf`:
|
||||||
|
```hcl
|
||||||
|
memory = 4096 # maximum memory
|
||||||
|
balloon = 2048 # minimum memory (shrinks to this when idle)
|
||||||
|
```
|
||||||
|
|
||||||
|
Pros:
|
||||||
|
- VMs get memory on-demand without reboots
|
||||||
|
- Better host memory utilization
|
||||||
|
- Solves upgrade OOM without permanently allocating 4GB
|
||||||
|
|
||||||
|
Cons:
|
||||||
|
- Requires QEMU guest agent running in guest
|
||||||
|
- Guest can experience memory pressure if host is overcommitted
|
||||||
|
|
||||||
|
Ballooning and zram are complementary - ballooning provides headroom from the host, zram provides overflow within the guest.
|
||||||
241
docs/plans/monitoring-migration-victoriametrics.md
Normal file
241
docs/plans/monitoring-migration-victoriametrics.md
Normal file
@@ -0,0 +1,241 @@
|
|||||||
|
# Monitoring Stack Migration to VictoriaMetrics
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Migrate from Prometheus to VictoriaMetrics on a new host (monitoring02) to gain better compression
|
||||||
|
and longer retention. Run in parallel with monitoring01 until validated, then switch over using
|
||||||
|
a `monitoring` CNAME for seamless transition.
|
||||||
|
|
||||||
|
## Current State
|
||||||
|
|
||||||
|
**monitoring01** (10.69.13.13):
|
||||||
|
- 4 CPU cores, 4GB RAM, 33GB disk
|
||||||
|
- Prometheus with 30-day retention (15s scrape interval)
|
||||||
|
- Alertmanager (routes to alerttonotify webhook)
|
||||||
|
- Grafana (dashboards, datasources)
|
||||||
|
- Loki (log aggregation from all hosts via Promtail)
|
||||||
|
- Tempo (distributed tracing)
|
||||||
|
- Pyroscope (continuous profiling)
|
||||||
|
|
||||||
|
**Hardcoded References to monitoring01:**
|
||||||
|
- `system/monitoring/logs.nix` - Promtail sends logs to `http://monitoring01.home.2rjus.net:3100`
|
||||||
|
- `hosts/template2/bootstrap.nix` - Bootstrap logs to Loki (keep as-is until decommission)
|
||||||
|
- `services/http-proxy/proxy.nix` - Caddy proxies Prometheus, Alertmanager, Grafana, Pyroscope, Pushgateway
|
||||||
|
|
||||||
|
**Auto-generated:**
|
||||||
|
- Prometheus scrape targets (from `lib/monitoring.nix` + `homelab.monitoring.scrapeTargets`)
|
||||||
|
- Node-exporter targets (from all hosts with static IPs)
|
||||||
|
|
||||||
|
## Decision: VictoriaMetrics
|
||||||
|
|
||||||
|
Per `docs/plans/long-term-metrics-storage.md`, VictoriaMetrics is the recommended starting point:
|
||||||
|
- Single binary replacement for Prometheus
|
||||||
|
- 5-10x better compression (30 days could become 180+ days in same space)
|
||||||
|
- Same PromQL query language (Grafana dashboards work unchanged)
|
||||||
|
- Same scrape config format (existing auto-generated configs work)
|
||||||
|
|
||||||
|
If multi-year retention with downsampling becomes necessary later, Thanos can be evaluated.
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────────┐
|
||||||
|
│ monitoring02 │
|
||||||
|
│ VictoriaMetrics│
|
||||||
|
│ + Grafana │
|
||||||
|
monitoring │ + Loki │
|
||||||
|
CNAME ──────────│ + Tempo │
|
||||||
|
│ + Pyroscope │
|
||||||
|
│ + Alertmanager │
|
||||||
|
│ (vmalert) │
|
||||||
|
└─────────────────┘
|
||||||
|
▲
|
||||||
|
│ scrapes
|
||||||
|
┌───────────────┼───────────────┐
|
||||||
|
│ │ │
|
||||||
|
┌────┴────┐ ┌─────┴────┐ ┌─────┴────┐
|
||||||
|
│ ns1 │ │ ha1 │ │ ... │
|
||||||
|
│ :9100 │ │ :9100 │ │ :9100 │
|
||||||
|
└─────────┘ └──────────┘ └──────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
## Implementation Plan
|
||||||
|
|
||||||
|
### Phase 1: Create monitoring02 Host
|
||||||
|
|
||||||
|
Use the `create-host` script, which handles `flake.nix` and `terraform/vms.tf` automatically.
|
||||||
|
|
||||||
|
1. **Run create-host**: `nix develop -c create-host monitoring02 10.69.13.24`
|
||||||
|
2. **Update VM resources** in `terraform/vms.tf`:
|
||||||
|
- 4 cores (same as monitoring01)
|
||||||
|
- 8GB RAM (double, for VictoriaMetrics headroom)
|
||||||
|
- 100GB disk (for 3+ months retention with compression)
|
||||||
|
3. **Update host configuration**: Import monitoring services
|
||||||
|
4. **Create Vault AppRole**: Add to `terraform/vault/approle.tf`
|
||||||
|
|
||||||
|
### Phase 2: Set Up VictoriaMetrics Stack
|
||||||
|
|
||||||
|
Create new service module at `services/monitoring/victoriametrics/` for testing alongside existing
|
||||||
|
Prometheus config. Once validated, this can replace the Prometheus module.
|
||||||
|
|
||||||
|
1. **VictoriaMetrics** (port 8428):
|
||||||
|
- `services.victoriametrics.enable = true`
|
||||||
|
- `services.victoriametrics.retentionPeriod = "3m"` (3 months, increase later based on disk usage)
|
||||||
|
- Migrate scrape configs via `prometheusConfig`
|
||||||
|
- Use native push support (replaces Pushgateway)
|
||||||
|
|
||||||
|
2. **vmalert** for alerting rules:
|
||||||
|
- `services.vmalert.enable = true`
|
||||||
|
- Point to VictoriaMetrics for metrics evaluation
|
||||||
|
- Keep rules in separate `rules.yml` file (same format as Prometheus)
|
||||||
|
- No receiver configured during parallel operation (prevents duplicate alerts)
|
||||||
|
|
||||||
|
3. **Alertmanager** (port 9093):
|
||||||
|
- Keep existing configuration (alerttonotify webhook routing)
|
||||||
|
- Only enable receiver after cutover from monitoring01
|
||||||
|
|
||||||
|
4. **Loki** (port 3100):
|
||||||
|
- Same configuration as current
|
||||||
|
|
||||||
|
5. **Grafana** (port 3000):
|
||||||
|
- Define dashboards declaratively via NixOS options (not imported from monitoring01)
|
||||||
|
- Reference existing dashboards on monitoring01 for content inspiration
|
||||||
|
- Configure VictoriaMetrics datasource (port 8428)
|
||||||
|
- Configure Loki datasource
|
||||||
|
|
||||||
|
6. **Tempo** (ports 3200, 3201):
|
||||||
|
- Same configuration
|
||||||
|
|
||||||
|
7. **Pyroscope** (port 4040):
|
||||||
|
- Same Docker-based deployment
|
||||||
|
|
||||||
|
### Phase 3: Parallel Operation
|
||||||
|
|
||||||
|
Run both monitoring01 and monitoring02 simultaneously:
|
||||||
|
|
||||||
|
1. **Dual scraping**: Both hosts scrape the same targets
|
||||||
|
- Validates VictoriaMetrics is collecting data correctly
|
||||||
|
|
||||||
|
2. **Dual log shipping**: Configure Promtail to send logs to both Loki instances
|
||||||
|
- Add a second client in `system/monitoring/logs.nix` pointing to monitoring02 (see the sketch after this list)
|
||||||
|
|
||||||
|
3. **Validate dashboards**: Access Grafana on monitoring02, verify dashboards work
|
||||||
|
|
||||||
|
4. **Validate alerts**: Verify vmalert evaluates rules correctly (no receiver = no notifications)
|
||||||
|
|
||||||
|
5. **Compare resource usage**: Monitor disk/memory consumption between hosts
|
||||||
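A sketch of the dual-shipping step, assuming the Promtail client list lives in `system/monitoring/logs.nix` (the monitoring02 client is removed again in Phase 5):

```nix
services.promtail.configuration.clients = [
  { url = "http://monitoring01.home.2rjus.net:3100/loki/api/v1/push"; }
  { url = "http://monitoring02.home.2rjus.net:3100/loki/api/v1/push"; } # temporary, parallel operation
];
```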
|
|
||||||
|
### Phase 4: Add monitoring CNAME
|
||||||
|
|
||||||
|
Add CNAME to monitoring02 once validated:
|
||||||
|
|
||||||
|
```nix
|
||||||
|
# hosts/monitoring02/configuration.nix
|
||||||
|
homelab.dns.cnames = [ "monitoring" ];
|
||||||
|
```
|
||||||
|
|
||||||
|
This creates `monitoring.home.2rjus.net` pointing to monitoring02.
|
||||||
|
|
||||||
|
### Phase 5: Update References
|
||||||
|
|
||||||
|
Update hardcoded references to use the CNAME:
|
||||||
|
|
||||||
|
1. **system/monitoring/logs.nix**:
|
||||||
|
- Remove dual-shipping, point only to `http://monitoring.home.2rjus.net:3100`
|
||||||
|
|
||||||
|
2. **services/http-proxy/proxy.nix**: Update reverse proxy backends (a sketch follows this list):
|
||||||
|
- prometheus.home.2rjus.net -> monitoring.home.2rjus.net:8428
|
||||||
|
- alertmanager.home.2rjus.net -> monitoring.home.2rjus.net:9093
|
||||||
|
- grafana.home.2rjus.net -> monitoring.home.2rjus.net:3000
|
||||||
|
- pyroscope.home.2rjus.net -> monitoring.home.2rjus.net:4040
|
||||||
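A hypothetical sketch of one backend swap, assuming the http-proxy vhosts are defined via `services.caddy.virtualHosts` (the actual layout in `services/http-proxy/proxy.nix` may differ):

```nix
services.caddy.virtualHosts."grafana.home.2rjus.net".extraConfig = ''
  reverse_proxy monitoring.home.2rjus.net:3000
'';
```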
|
|
||||||
|
Note: `hosts/template2/bootstrap.nix` stays pointed at monitoring01 until decommission.
|
||||||
|
|
||||||
|
### Phase 6: Enable Alerting
|
||||||
|
|
||||||
|
Once ready to cut over:
|
||||||
|
1. Enable Alertmanager receiver on monitoring02
|
||||||
|
2. Verify test alerts route correctly
|
||||||
|
|
||||||
|
### Phase 7: Cutover and Decommission
|
||||||
|
|
||||||
|
1. **Stop monitoring01**: Prevent duplicate alerts during transition
|
||||||
|
2. **Update bootstrap.nix**: Point to `monitoring.home.2rjus.net`
|
||||||
|
3. **Verify all targets scraped**: Check VictoriaMetrics UI
|
||||||
|
4. **Verify logs flowing**: Check Loki on monitoring02
|
||||||
|
5. **Decommission monitoring01**:
|
||||||
|
- Remove from flake.nix
|
||||||
|
- Remove host configuration
|
||||||
|
- Destroy VM in Proxmox
|
||||||
|
- Remove from terraform state
|
||||||
|
|
||||||
|
## Current Progress
|
||||||
|
|
||||||
|
### monitoring02 Host Created (2026-02-08)
|
||||||
|
|
||||||
|
Host deployed at 10.69.13.24 (test tier) with:
|
||||||
|
- 4 CPU cores, 8GB RAM, 60GB disk
|
||||||
|
- Vault integration enabled
|
||||||
|
- NATS-based remote deployment enabled
|
||||||
|
|
||||||
|
### Grafana with Kanidm OIDC (2026-02-08)
|
||||||
|
|
||||||
|
Grafana deployed on monitoring02 as a test instance (`grafana-test.home.2rjus.net`):
|
||||||
|
- Kanidm OIDC authentication (PKCE enabled)
|
||||||
|
- Role mapping: `admins` → Admin, others → Viewer
|
||||||
|
- Declarative datasources pointing to monitoring01 (Prometheus, Loki)
|
||||||
|
- Local Caddy for TLS termination via internal ACME CA
|
||||||
|
|
||||||
|
This validates the Grafana + OIDC pattern before the full VictoriaMetrics migration. The existing
|
||||||
|
`services/monitoring/grafana.nix` on monitoring01 can be replaced with the new `services/grafana/`
|
||||||
|
module once monitoring02 becomes the primary monitoring host.
|
||||||
|
|
||||||
|
## Open Questions
|
||||||
|
|
||||||
|
- [ ] What disk size for monitoring02? Current 60GB may need expansion for 3+ months with VictoriaMetrics
|
||||||
|
- [ ] Which dashboards to recreate declaratively? (Review monitoring01 Grafana for current set)
|
||||||
|
- [ ] Consider replacing Promtail with Grafana Alloy (`services.alloy`, v1.12.2 in nixpkgs). Promtail is in maintenance mode and Grafana recommends Alloy as the successor. Alloy is a unified collector (logs, metrics, traces, profiles) but uses its own "River" config format instead of YAML, so it is less ergonomic to manage from Nix. The Alloy migration could be bundled with the monitoring02 move to consolidate disruption.
|
||||||
|
|
||||||
|
## VictoriaMetrics Service Configuration
|
||||||
|
|
||||||
|
Example NixOS configuration for monitoring02:
|
||||||
|
|
||||||
|
```nix
|
||||||
|
# VictoriaMetrics replaces Prometheus
|
||||||
|
services.victoriametrics = {
|
||||||
|
enable = true;
|
||||||
|
retentionPeriod = "3m"; # 3 months, increase based on disk usage
|
||||||
|
prometheusConfig = {
|
||||||
|
global.scrape_interval = "15s";
|
||||||
|
scrape_configs = [
|
||||||
|
# Auto-generated node-exporter targets
|
||||||
|
# Service-specific scrape targets
|
||||||
|
# External targets
|
||||||
|
];
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
# vmalert for alerting rules (no receiver during parallel operation)
|
||||||
|
services.vmalert = {
|
||||||
|
enable = true;
|
||||||
|
datasource.url = "http://localhost:8428";
|
||||||
|
# notifier.alertmanager.url = "http://localhost:9093"; # Enable after cutover
|
||||||
|
rule = [ ./rules.yml ];
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
|
## Rollback Plan
|
||||||
|
|
||||||
|
If issues arise after cutover:
|
||||||
|
1. Move `monitoring` CNAME back to monitoring01
|
||||||
|
2. Restart monitoring01 services
|
||||||
|
3. Revert Promtail config to point only to monitoring01
|
||||||
|
4. Revert http-proxy backends
|
||||||
|
|
||||||
|
## Notes
|
||||||
|
|
||||||
|
- VictoriaMetrics uses port 8428 vs Prometheus 9090
|
||||||
|
- PromQL compatibility is excellent: VictoriaMetrics' MetricsQL accepts PromQL queries, with only minor edge-case differences
|
||||||
|
- VictoriaMetrics native push replaces Pushgateway (remove from http-proxy if not needed)
|
||||||
|
- monitoring02 deployed via OpenTofu using `create-host` script
|
||||||
|
- Grafana dashboards defined declaratively via NixOS, not imported from monitoring01 state
|
||||||
224
docs/plans/security-hardening.md
Normal file
224
docs/plans/security-hardening.md
Normal file
@@ -0,0 +1,224 @@
|
|||||||
|
# Security Hardening Plan
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Address security gaps identified in infrastructure review. Focus areas: SSH hardening, network security, logging improvements, and secrets management.
|
||||||
|
|
||||||
|
## Current State
|
||||||
|
|
||||||
|
- SSH allows password auth and unrestricted root login (`system/sshd.nix`)
|
||||||
|
- Firewall disabled on all hosts (`networking.firewall.enable = false`)
|
||||||
|
- Promtail ships logs over HTTP to Loki
|
||||||
|
- Loki has no authentication (`auth_enabled = false`)
|
||||||
|
- AppRole secret-IDs never expire (`secret_id_ttl = 0`)
|
||||||
|
- Vault TLS verification disabled by default (`skipTlsVerify = true`)
|
||||||
|
- Audit logging exists (`common/ssh-audit.nix`) but not applied globally
|
||||||
|
- Alert rules focus on availability, no security event detection
|
||||||
|
|
||||||
|
## Priority Matrix
|
||||||
|
|
||||||
|
| Issue | Severity | Effort | Priority |
|
||||||
|
|-------|----------|--------|----------|
|
||||||
|
| SSH password auth | High | Low | **P1** |
|
||||||
|
| Firewall disabled | High | Medium | **P1** |
|
||||||
|
| Promtail HTTP (no TLS) | High | Medium | **P2** |
|
||||||
|
| No security alerting | Medium | Low | **P2** |
|
||||||
|
| Audit logging not global | Low | Low | **P2** |
|
||||||
|
| Loki no auth | Medium | Medium | **P3** |
|
||||||
|
| Secret-ID TTL | Medium | Medium | **P3** |
|
||||||
|
| Vault skipTlsVerify | Medium | Low | **P3** |
|
||||||
|
|
||||||
|
## Phase 1: Quick Wins (P1)
|
||||||
|
|
||||||
|
### 1.1 SSH Hardening
|
||||||
|
|
||||||
|
Edit `system/sshd.nix`:
|
||||||
|
|
||||||
|
```nix
|
||||||
|
services.openssh = {
|
||||||
|
enable = true;
|
||||||
|
settings = {
|
||||||
|
PermitRootLogin = "prohibit-password"; # Key-only root login
|
||||||
|
PasswordAuthentication = false;
|
||||||
|
KbdInteractiveAuthentication = false;
|
||||||
|
};
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
|
**Prerequisite:** Verify all hosts have SSH keys deployed for root.
|
||||||
|
|
||||||
|
### 1.2 Enable Firewall
|
||||||
|
|
||||||
|
Create `system/firewall.nix` with default deny policy:
|
||||||
|
|
||||||
|
```nix
|
||||||
|
{ ... }: {
|
||||||
|
networking.firewall.enable = true;
|
||||||
|
|
||||||
|
# Use openssh's built-in firewall integration
|
||||||
|
services.openssh.openFirewall = true;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Useful firewall options:**
|
||||||
|
|
||||||
|
| Option | Description |
|
||||||
|
|--------|-------------|
|
||||||
|
| `networking.firewall.trustedInterfaces` | Accept all traffic from these interfaces (e.g., `[ "lo" ]`) |
|
||||||
|
| `networking.firewall.interfaces.<name>.allowedTCPPorts` | Per-interface port rules |
|
||||||
|
| `networking.firewall.extraInputRules` | Custom nftables rules (for complex filtering) |
|
||||||
|
|
||||||
|
**Network range restrictions:** Consider restricting SSH to the infrastructure subnet (`10.69.13.0/24`) using `extraInputRules` for defense in depth. However, this adds complexity and may not be necessary given the trusted network model.
|
||||||
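A hedged sketch of what the subnet restriction could look like (requires the nftables firewall backend, and only has an effect if port 22 is not also opened globally via `services.openssh.openFirewall`):

```nix
{
  networking.nftables.enable = true;
  networking.firewall.extraInputRules = ''
    # Only accept SSH from the infrastructure subnet
    ip saddr 10.69.13.0/24 tcp dport 22 accept
  '';
}
```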
|
|
||||||
|
#### Per-Interface Rules (http-proxy WireGuard)
|
||||||
|
|
||||||
|
The `http-proxy` host has a WireGuard interface (`wg0`) that may need different rules than the LAN interface. Use `networking.firewall.interfaces` to apply per-interface policies:
|
||||||
|
|
||||||
|
```nix
|
||||||
|
# Example: http-proxy with different rules per interface
|
||||||
|
networking.firewall = {
|
||||||
|
enable = true;
|
||||||
|
|
||||||
|
# Default: only SSH (via openFirewall)
|
||||||
|
allowedTCPPorts = [ ];
|
||||||
|
|
||||||
|
# LAN interface: allow HTTP/HTTPS
|
||||||
|
interfaces.ens18 = {
|
||||||
|
allowedTCPPorts = [ 80 443 ];
|
||||||
|
};
|
||||||
|
|
||||||
|
# WireGuard interface: restrict to specific services or trust fully
|
||||||
|
interfaces.wg0 = {
|
||||||
|
allowedTCPPorts = [ 80 443 ];
|
||||||
|
# Or use trustedInterfaces = [ "wg0" ] if fully trusted
|
||||||
|
};
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
|
**TODO:** Investigate current WireGuard usage on http-proxy to determine appropriate rules.
|
||||||
|
|
||||||
|
Then open the required ports per host (a sketch for ns1/ns2 follows the table):
|
||||||
|
|
||||||
|
| Host | Additional Ports |
|
||||||
|
|------|------------------|
|
||||||
|
| ns1/ns2 | 53 (TCP/UDP) |
|
||||||
|
| vault01 | 8200 |
|
||||||
|
| monitoring01 | 3100, 9090, 3000, 9093 |
|
||||||
|
| http-proxy | 80, 443 |
|
||||||
|
| nats1 | 4222 |
|
||||||
|
| ha1 | 1883, 8123 |
|
||||||
|
| jelly01 | 8096 |
|
||||||
|
| nix-cache01 | 5000 |
|
||||||
|
|
||||||
|
## Phase 2: Logging & Detection (P2)
|
||||||
|
|
||||||
|
### 2.1 Enable TLS for Promtail → Loki
|
||||||
|
|
||||||
|
Update `system/monitoring/logs.nix`:
|
||||||
|
|
||||||
|
```nix
|
||||||
|
clients = [{
|
||||||
|
url = "https://monitoring01.home.2rjus.net:3100/loki/api/v1/push";
|
||||||
|
tls_config = {
|
||||||
|
ca_file = "/etc/ssl/certs/homelab-root-ca.pem";
|
||||||
|
};
|
||||||
|
}];
|
||||||
|
```
|
||||||
|
|
||||||
|
Requires:
|
||||||
|
- Configure Loki with TLS certificate (use internal ACME)
|
||||||
|
- Ensure all hosts trust root CA (already done via `system/pki/root-ca.nix`)
|
||||||
|
|
||||||
|
### 2.2 Security Alert Rules
|
||||||
|
|
||||||
|
Add to `services/monitoring/rules.yml`:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
- name: security_rules
|
||||||
|
rules:
|
||||||
|
- alert: ssh_auth_failures
|
||||||
|
expr: increase(node_logind_sessions_total[5m]) > 20
|
||||||
|
for: 0m
|
||||||
|
labels:
|
||||||
|
severity: warning
|
||||||
|
annotations:
|
||||||
|
summary: "Unusual login activity on {{ $labels.instance }}"
|
||||||
|
|
||||||
|
- alert: vault_secret_fetch_failure
|
||||||
|
expr: increase(vault_secret_failures[5m]) > 5
|
||||||
|
for: 0m
|
||||||
|
labels:
|
||||||
|
severity: warning
|
||||||
|
annotations:
|
||||||
|
summary: "Vault secret fetch failures on {{ $labels.instance }}"
|
||||||
|
```
|
||||||
|
|
||||||
|
Also add Loki-based alerts for:
|
||||||
|
- Failed SSH attempts: `{job="systemd-journal"} |= "Failed password"`
|
||||||
|
- sudo usage: `{job="systemd-journal"} |= "sudo"`
|
||||||
|
|
||||||
|
### 2.3 Global Audit Logging
|
||||||
|
|
||||||
|
Add `./common/ssh-audit.nix` import to `system/default.nix`:
|
||||||
|
|
||||||
|
```nix
|
||||||
|
imports = [
|
||||||
|
# ... existing imports
|
||||||
|
../common/ssh-audit.nix
|
||||||
|
];
|
||||||
|
```
|
||||||
|
|
||||||
|
## Phase 3: Defense in Depth (P3)
|
||||||
|
|
||||||
|
### 3.1 Loki Authentication
|
||||||
|
|
||||||
|
Options:
|
||||||
|
1. **Basic auth via reverse proxy** - Put Loki behind Caddy with auth
|
||||||
|
2. **Loki multi-tenancy** - Enable `auth_enabled = true` and use tenant IDs
|
||||||
|
3. **Network isolation** - Bind Loki only to localhost, expose via authenticated proxy
|
||||||
|
|
||||||
|
Recommendation: Option 1 (reverse proxy) is simplest for homelab.
|
||||||
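A hypothetical sketch of option 1, with Loki bound to localhost and Caddy terminating TLS and enforcing basic auth (the credential handling and hostname are placeholders; Promtail clients would need matching `basic_auth` settings in their client config):

```nix
services.caddy.virtualHosts."loki.home.2rjus.net".extraConfig = ''
  basic_auth {
    # hash generated with `caddy hash-password`, injected via Caddy's environment
    promtail {$LOKI_BASIC_AUTH_HASH}
  }
  reverse_proxy localhost:3100
'';
```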
|
|
||||||
|
### 3.2 AppRole Secret Rotation
|
||||||
|
|
||||||
|
Update `terraform/vault/approle.tf`:
|
||||||
|
|
||||||
|
```hcl
|
||||||
|
secret_id_ttl = 2592000 # 30 days
|
||||||
|
```
|
||||||
|
|
||||||
|
Add documentation for manual rotation procedure or implement automated rotation via the existing `restartTrigger` mechanism in `vault-secrets.nix`.
|
||||||
|
|
||||||
|
### 3.3 Enable Vault TLS Verification
|
||||||
|
|
||||||
|
Change default in `system/vault-secrets.nix`:
|
||||||
|
|
||||||
|
```nix
|
||||||
|
skipTlsVerify = mkOption {
|
||||||
|
type = types.bool;
|
||||||
|
default = false; # Changed from true
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
|
**Prerequisite:** Verify all hosts trust the internal CA that signed the Vault certificate.
|
||||||
|
|
||||||
|
## Implementation Order
|
||||||
|
|
||||||
|
1. **Test on test-tier first** - Deploy phases 1-2 to testvm01/02/03
|
||||||
|
2. **Validate SSH access** - Ensure key-based login works before disabling passwords
|
||||||
|
3. **Document firewall ports** - Create reference of ports per host before enabling
|
||||||
|
4. **Phase prod rollout** - Deploy to prod hosts one at a time, verify each
|
||||||
|
|
||||||
|
## Open Questions
|
||||||
|
|
||||||
|
- [ ] Do all hosts have SSH keys configured for root access?
|
||||||
|
- [ ] Should firewall rules be per-host or use a central definition with roles?
|
||||||
|
- [ ] Should Loki authentication use the existing Kanidm setup?
|
||||||
|
|
||||||
|
**Resolved:** Password-based SSH access for recovery is not required - most hosts have console access through Proxmox or physical access, which provides an out-of-band recovery path if SSH keys fail.
|
||||||
|
|
||||||
|
## Notes
|
||||||
|
|
||||||
|
- Firewall changes are the highest risk - test thoroughly on test-tier
|
||||||
|
- SSH hardening must not lock out access - verify keys first
|
||||||
|
- Consider creating a "break glass" procedure for emergency access if keys fail
|
||||||
311
docs/user-management.md
Normal file
311
docs/user-management.md
Normal file
@@ -0,0 +1,311 @@
|
|||||||
|
# User Management with Kanidm
|
||||||
|
|
||||||
|
Central authentication for the homelab using Kanidm.
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
- **Server**: kanidm01.home.2rjus.net (auth.home.2rjus.net)
|
||||||
|
- **WebUI**: https://auth.home.2rjus.net
|
||||||
|
- **LDAPS**: port 636
|
||||||
|
|
||||||
|
## CLI Setup
|
||||||
|
|
||||||
|
The `kanidm` CLI is available in the devshell:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
nix develop
|
||||||
|
|
||||||
|
# Login as idm_admin
|
||||||
|
kanidm login --name idm_admin --url https://auth.home.2rjus.net
|
||||||
|
```
|
||||||
|
|
||||||
|
## User Management
|
||||||
|
|
||||||
|
POSIX users are managed imperatively via the `kanidm` CLI. This allows setting
|
||||||
|
all attributes (including UNIX password) in one workflow.
|
||||||
|
|
||||||
|
### Creating a POSIX User
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Create the person
|
||||||
|
kanidm person create <username> "<Display Name>"
|
||||||
|
|
||||||
|
# Add to groups
|
||||||
|
kanidm group add-members ssh-users <username>
|
||||||
|
|
||||||
|
# Enable POSIX (UID is auto-assigned)
|
||||||
|
kanidm person posix set <username>
|
||||||
|
|
||||||
|
# Set UNIX password (required for SSH login, min 10 characters)
|
||||||
|
kanidm person posix set-password <username>
|
||||||
|
|
||||||
|
# Optionally set login shell
|
||||||
|
kanidm person posix set <username> --shell /bin/zsh
|
||||||
|
```
|
||||||
|
|
||||||
|
### Setting Email Address
|
||||||
|
|
||||||
|
Email is required for OAuth2/OIDC login (e.g., Grafana):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kanidm person update <username> --mail <email>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Example: Full User Creation
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kanidm person create testuser "Test User"
|
||||||
|
kanidm person update testuser --mail testuser@home.2rjus.net
|
||||||
|
kanidm group add-members ssh-users testuser
|
||||||
|
kanidm group add-members users testuser # Required for OAuth2 scopes
|
||||||
|
kanidm person posix set testuser
|
||||||
|
kanidm person posix set-password testuser
|
||||||
|
kanidm person get testuser
|
||||||
|
```
|
||||||
|
|
||||||
|
After creation, verify on a client host:
|
||||||
|
```bash
|
||||||
|
getent passwd testuser
|
||||||
|
ssh testuser@testvm01.home.2rjus.net
|
||||||
|
```
|
||||||
|
|
||||||
|
### Viewing User Details
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kanidm person get <username>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Removing a User
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kanidm person delete <username>
|
||||||
|
```
|
||||||
|
|
||||||
|
## Group Management
|
||||||
|
|
||||||
|
Groups for POSIX access are also managed via CLI.
|
||||||
|
|
||||||
|
### Creating a POSIX Group
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Create the group
|
||||||
|
kanidm group create <group-name>
|
||||||
|
|
||||||
|
# Enable POSIX with a specific GID
|
||||||
|
kanidm group posix set <group-name> --gidnumber <gid>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Adding Members
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kanidm group add-members <group-name> <username>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Viewing Group Details
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kanidm group get <group-name>
|
||||||
|
kanidm group list-members <group-name>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Example: Full Group Creation
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kanidm group create testgroup
|
||||||
|
kanidm group posix set testgroup --gidnumber 68010
|
||||||
|
kanidm group add-members testgroup testuser
|
||||||
|
kanidm group get testgroup
|
||||||
|
```
|
||||||
|
|
||||||
|
After creation, verify on a client host:
|
||||||
|
```bash
|
||||||
|
getent group testgroup
|
||||||
|
```
|
||||||
|
|
||||||
|
### Current Groups
|
||||||
|
|
||||||
|
| Group | GID | Purpose |
|
||||||
|
|-------|-----|---------|
|
||||||
|
| ssh-users | 68000 | SSH login access |
|
||||||
|
| admins | 68001 | Administrative access |
|
||||||
|
| users | 68002 | General users |
|
||||||
|
|
||||||
|
### UID/GID Allocation
|
||||||
|
|
||||||
|
Kanidm auto-assigns UIDs/GIDs from its configured range. For manually assigned GIDs:
|
||||||
|
|
||||||
|
| Range | Purpose |
|
||||||
|
|-------|---------|
|
||||||
|
| 65,536+ | Users (auto-assigned) |
|
||||||
|
| 68,000 - 68,999 | Groups (manually assigned) |
|
||||||
|
|
||||||
|
## OAuth2/OIDC Login (Web Services)
|
||||||
|
|
||||||
|
For OAuth2/OIDC login to web services like Grafana, users need:
|
||||||
|
|
||||||
|
1. **Primary credential** - Password set via `credential update` (separate from unix password)
|
||||||
|
2. **MFA** - TOTP or passkey (Kanidm requires MFA for primary credentials)
|
||||||
|
3. **Group membership** - Member of `users` group (for OAuth2 scope mapping)
|
||||||
|
4. **Email address** - Set via `person update --mail`
|
||||||
|
|
||||||
|
### Setting Up Primary Credential (Web Login)
|
||||||
|
|
||||||
|
The primary credential is different from the unix/POSIX password:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Interactive credential setup
|
||||||
|
kanidm person credential update <username>
|
||||||
|
|
||||||
|
# In the interactive prompt:
|
||||||
|
# 1. Type 'password' to set a password
|
||||||
|
# 2. Type 'totp' to add TOTP (scan QR with authenticator app)
|
||||||
|
# 3. Type 'commit' to save
|
||||||
|
```
|
||||||
|
|
||||||
|
### Verifying OAuth2 Readiness
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kanidm person get <username>
|
||||||
|
```
|
||||||
|
|
||||||
|
Check for:
|
||||||
|
- `mail:` - Email address set
|
||||||
|
- `memberof:` - Includes `users@home.2rjus.net`
|
||||||
|
- Primary credential status (check via `credential update` → `status`)
|
||||||
|
|
||||||
|
## PAM/NSS Client Configuration
|
||||||
|
|
||||||
|
Enable central authentication on a host:
|
||||||
|
|
||||||
|
```nix
|
||||||
|
homelab.kanidm.enable = true;
|
||||||
|
```
|
||||||
|
|
||||||
|
This configures:
|
||||||
|
- `services.kanidm.enablePam = true`
|
||||||
|
- Client connection to auth.home.2rjus.net
|
||||||
|
- Login authorization for `ssh-users` group
|
||||||
|
- Short usernames (`torjus` instead of `torjus@home.2rjus.net`)
|
||||||
|
- Home directory symlinks (`/home/torjus` → UUID-based directory)
|
||||||
|
|
||||||
|
### Enabled Hosts
|
||||||
|
|
||||||
|
- testvm01, testvm02, testvm03 (test tier)
|
||||||
|
|
||||||
|
### Options
|
||||||
|
|
||||||
|
```nix
|
||||||
|
homelab.kanidm = {
|
||||||
|
enable = true;
|
||||||
|
server = "https://auth.home.2rjus.net"; # default
|
||||||
|
allowedLoginGroups = [ "ssh-users" ]; # default
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
|
### Home Directories
|
||||||
|
|
||||||
|
Home directories use UUID-based paths for stability (so renaming a user doesn't
|
||||||
|
require moving their home directory). Symlinks provide convenient access:
|
||||||
|
|
||||||
|
```
|
||||||
|
/home/torjus -> /home/e4f4c56c-4aee-4c20-846f-90cb69807733
|
||||||
|
```
|
||||||
|
|
||||||
|
The symlinks are created by `kanidm-unixd-tasks` on first login.
|
||||||
|
|
||||||
|
## Testing
|
||||||
|
|
||||||
|
### Verify NSS Resolution
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check user resolution
|
||||||
|
getent passwd <username>
|
||||||
|
|
||||||
|
# Check group resolution
|
||||||
|
getent group <group-name>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Test SSH Login
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh <username>@<hostname>.home.2rjus.net
|
||||||
|
```
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### "PAM user mismatch" error
|
||||||
|
|
||||||
|
SSH fails with "fatal: PAM user mismatch" in logs. This happens when Kanidm returns
|
||||||
|
usernames in SPN format (`torjus@home.2rjus.net`) but SSH expects short names (`torjus`).
|
||||||
|
|
||||||
|
**Solution**: Configure `uid_attr_map = "name"` in unixSettings (already set in our module).
|
||||||
|
|
||||||
|
Check current format:
|
||||||
|
```bash
|
||||||
|
getent passwd torjus
|
||||||
|
# Should show: torjus:x:65536:...
|
||||||
|
# NOT: torjus@home.2rjus.net:x:65536:...
|
||||||
|
```
|
||||||
|
|
||||||
|
### User resolves but SSH fails immediately
|
||||||
|
|
||||||
|
The user's login group (e.g., `ssh-users`) likely doesn't have POSIX enabled:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check if group has POSIX
|
||||||
|
getent group ssh-users
|
||||||
|
|
||||||
|
# If empty, enable POSIX on the server
|
||||||
|
kanidm group posix set ssh-users --gidnumber 68000
|
||||||
|
```
|
||||||
|
|
||||||
|
### User doesn't resolve via getent
|
||||||
|
|
||||||
|
1. Check kanidm-unixd service is running:
|
||||||
|
```bash
|
||||||
|
systemctl status kanidm-unixd
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Check unixd can reach server:
|
||||||
|
```bash
|
||||||
|
kanidm-unix status
|
||||||
|
# Should show: system: online, Kanidm: online
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Check client can reach server:
|
||||||
|
```bash
|
||||||
|
curl -s https://auth.home.2rjus.net/status
|
||||||
|
```
|
||||||
|
|
||||||
|
4. Check user has POSIX enabled on server:
|
||||||
|
```bash
|
||||||
|
kanidm person get <username>
|
||||||
|
```
|
||||||
|
|
||||||
|
5. Restart nscd to clear stale cache:
|
||||||
|
```bash
|
||||||
|
systemctl restart nscd
|
||||||
|
```
|
||||||
|
|
||||||
|
6. Invalidate kanidm cache:
|
||||||
|
```bash
|
||||||
|
kanidm-unix cache-invalidate
|
||||||
|
```
|
||||||
|
|
||||||
|
### Changes not taking effect after deployment
|
||||||
|
|
||||||
|
NixOS uses nsncd (a Rust reimplementation of nscd) for NSS caching. After deploying
|
||||||
|
kanidm-unixd config changes, you may need to restart both services:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
systemctl restart kanidm-unixd
|
||||||
|
systemctl restart nscd
|
||||||
|
```
|
||||||
|
|
||||||
|
### Test PAM authentication directly
|
||||||
|
|
||||||
|
Use the kanidm-unix CLI to test PAM auth without SSH:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kanidm-unix auth-test --name <username>
|
||||||
|
```
|
||||||
73
flake.lock
generated
73
flake.lock
generated
@@ -28,11 +28,11 @@
|
|||||||
]
|
]
|
||||||
},
|
},
|
||||||
"locked": {
|
"locked": {
|
||||||
"lastModified": 1770447502,
|
"lastModified": 1771004123,
|
||||||
"narHash": "sha256-xH1PNyE3ydj4udhe1IpK8VQxBPZETGLuORZdSWYRmSU=",
|
"narHash": "sha256-Jw36EzL4IGIc2TmeZGphAAUrJXoWqfvCbybF8bTHgMA=",
|
||||||
"ref": "master",
|
"ref": "master",
|
||||||
"rev": "79db119d1ca6630023947ef0a65896cc3307c2ff",
|
"rev": "e5e8be86ecdcae8a5962ba3bddddfe91b574792b",
|
||||||
"revCount": 22,
|
"revCount": 36,
|
||||||
"type": "git",
|
"type": "git",
|
||||||
"url": "https://git.t-juice.club/torjus/homelab-deploy"
|
"url": "https://git.t-juice.club/torjus/homelab-deploy"
|
||||||
},
|
},
|
||||||
@@ -42,27 +42,6 @@
|
|||||||
"url": "https://git.t-juice.club/torjus/homelab-deploy"
|
"url": "https://git.t-juice.club/torjus/homelab-deploy"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"labmon": {
|
|
||||||
"inputs": {
|
|
||||||
"nixpkgs": [
|
|
||||||
"nixpkgs-unstable"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"locked": {
|
|
||||||
"lastModified": 1748983975,
|
|
||||||
"narHash": "sha256-DA5mOqxwLMj/XLb4hvBU1WtE6cuVej7PjUr8N0EZsCE=",
|
|
||||||
"ref": "master",
|
|
||||||
"rev": "040a73e891a70ff06ec7ab31d7167914129dbf7d",
|
|
||||||
"revCount": 17,
|
|
||||||
"type": "git",
|
|
||||||
"url": "https://git.t-juice.club/torjus/labmon"
|
|
||||||
},
|
|
||||||
"original": {
|
|
||||||
"ref": "master",
|
|
||||||
"type": "git",
|
|
||||||
"url": "https://git.t-juice.club/torjus/labmon"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nixos-exporter": {
|
"nixos-exporter": {
|
||||||
"inputs": {
|
"inputs": {
|
||||||
"nixpkgs": [
|
"nixpkgs": [
|
||||||
@@ -70,11 +49,11 @@
|
|||||||
]
|
]
|
||||||
},
|
},
|
||||||
"locked": {
|
"locked": {
|
||||||
"lastModified": 1770422522,
|
"lastModified": 1770593543,
|
||||||
"narHash": "sha256-WmIFnquu4u58v8S2bOVWmknRwHn4x88CRfBFTzJ1inQ=",
|
"narHash": "sha256-hT8Rj6JAwGDFvcxWEcUzTCrWSiupCfBa57pBDnM2C5g=",
|
||||||
"ref": "refs/heads/master",
|
"ref": "refs/heads/master",
|
||||||
"rev": "cf0ce858997af4d8dcc2ce10393ff393e17fc911",
|
"rev": "5aa5f7275b7a08015816171ba06d2cbdc2e02d3e",
|
||||||
"revCount": 11,
|
"revCount": 15,
|
||||||
"type": "git",
|
"type": "git",
|
||||||
"url": "https://git.t-juice.club/torjus/nixos-exporter"
|
"url": "https://git.t-juice.club/torjus/nixos-exporter"
|
||||||
},
|
},
|
||||||
@@ -85,11 +64,11 @@
|
|||||||
},
|
},
|
||||||
"nixpkgs": {
|
"nixpkgs": {
|
||||||
"locked": {
|
"locked": {
|
||||||
"lastModified": 1770136044,
|
"lastModified": 1770770419,
|
||||||
"narHash": "sha256-tlFqNG/uzz2++aAmn4v8J0vAkV3z7XngeIIB3rM3650=",
|
"narHash": "sha256-iKZMkr6Cm9JzWlRYW/VPoL0A9jVKtZYiU4zSrVeetIs=",
|
||||||
"owner": "nixos",
|
"owner": "nixos",
|
||||||
"repo": "nixpkgs",
|
"repo": "nixpkgs",
|
||||||
"rev": "e576e3c9cf9bad747afcddd9e34f51d18c855b4e",
|
"rev": "6c5e707c6b5339359a9a9e215c5e66d6d802fd7a",
|
||||||
"type": "github"
|
"type": "github"
|
||||||
},
|
},
|
||||||
"original": {
|
"original": {
|
||||||
@@ -101,11 +80,11 @@
|
|||||||
},
|
},
|
||||||
"nixpkgs-unstable": {
|
"nixpkgs-unstable": {
|
||||||
"locked": {
|
"locked": {
|
||||||
"lastModified": 1770197578,
|
"lastModified": 1770562336,
|
||||||
"narHash": "sha256-AYqlWrX09+HvGs8zM6ebZ1pwUqjkfpnv8mewYwAo+iM=",
|
"narHash": "sha256-ub1gpAONMFsT/GU2hV6ZWJjur8rJ6kKxdm9IlCT0j84=",
|
||||||
"owner": "nixos",
|
"owner": "nixos",
|
||||||
"repo": "nixpkgs",
|
"repo": "nixpkgs",
|
||||||
"rev": "00c21e4c93d963c50d4c0c89bfa84ed6e0694df2",
|
"rev": "d6c71932130818840fc8fe9509cf50be8c64634f",
|
||||||
"type": "github"
|
"type": "github"
|
||||||
},
|
},
|
||||||
"original": {
|
"original": {
|
||||||
@@ -119,31 +98,9 @@
|
|||||||
"inputs": {
|
"inputs": {
|
||||||
"alerttonotify": "alerttonotify",
|
"alerttonotify": "alerttonotify",
|
||||||
"homelab-deploy": "homelab-deploy",
|
"homelab-deploy": "homelab-deploy",
|
||||||
"labmon": "labmon",
|
|
||||||
"nixos-exporter": "nixos-exporter",
|
"nixos-exporter": "nixos-exporter",
|
||||||
"nixpkgs": "nixpkgs",
|
"nixpkgs": "nixpkgs",
|
||||||
"nixpkgs-unstable": "nixpkgs-unstable",
|
"nixpkgs-unstable": "nixpkgs-unstable"
|
||||||
"sops-nix": "sops-nix"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"sops-nix": {
|
|
||||||
"inputs": {
|
|
||||||
"nixpkgs": [
|
|
||||||
"nixpkgs-unstable"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"locked": {
|
|
||||||
"lastModified": 1770145881,
|
|
||||||
"narHash": "sha256-ktjWTq+D5MTXQcL9N6cDZXUf9kX8JBLLBLT0ZyOTSYY=",
|
|
||||||
"owner": "Mic92",
|
|
||||||
"repo": "sops-nix",
|
|
||||||
"rev": "17eea6f3816ba6568b8c81db8a4e6ca438b30b7c",
|
|
||||||
"type": "github"
|
|
||||||
},
|
|
||||||
"original": {
|
|
||||||
"owner": "Mic92",
|
|
||||||
"repo": "sops-nix",
|
|
||||||
"type": "github"
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
|||||||
flake.nix (171 lines changed)
@@ -5,18 +5,10 @@
     nixpkgs.url = "github:nixos/nixpkgs?ref=nixos-25.11";
     nixpkgs-unstable.url = "github:nixos/nixpkgs?ref=nixos-unstable";
 
-    sops-nix = {
-      url = "github:Mic92/sops-nix";
-      inputs.nixpkgs.follows = "nixpkgs-unstable";
-    };
     alerttonotify = {
       url = "git+https://git.t-juice.club/torjus/alerttonotify?ref=master";
       inputs.nixpkgs.follows = "nixpkgs-unstable";
     };
-    labmon = {
-      url = "git+https://git.t-juice.club/torjus/labmon?ref=master";
-      inputs.nixpkgs.follows = "nixpkgs-unstable";
-    };
     nixos-exporter = {
       url = "git+https://git.t-juice.club/torjus/nixos-exporter";
       inputs.nixpkgs.follows = "nixpkgs-unstable";
@@ -32,9 +24,7 @@
       self,
       nixpkgs,
       nixpkgs-unstable,
-      sops-nix,
       alerttonotify,
-      labmon,
       nixos-exporter,
       homelab-deploy,
       ...
@@ -50,7 +40,6 @@
       commonOverlays = [
         overlay-unstable
         alerttonotify.overlays.default
-        labmon.overlays.default
       ];
       # Common modules applied to all hosts
       commonModules = [
@@ -61,7 +50,6 @@
             system.configurationRevision = self.rev or self.dirtyRev or "dirty";
           }
         )
-        sops-nix.nixosModules.sops
         nixos-exporter.nixosModules.default
         homelab-deploy.nixosModules.default
         ./modules/homelab
@@ -77,46 +65,19 @@
     in
     {
       nixosConfigurations = {
-        ns1 = nixpkgs.lib.nixosSystem {
-          inherit system;
-          specialArgs = {
-            inherit inputs self sops-nix;
-          };
-          modules = commonModules ++ [
-            ./hosts/ns1
-          ];
-        };
-        ns2 = nixpkgs.lib.nixosSystem {
-          inherit system;
-          specialArgs = {
-            inherit inputs self sops-nix;
-          };
-          modules = commonModules ++ [
-            ./hosts/ns2
-          ];
-        };
         ha1 = nixpkgs.lib.nixosSystem {
           inherit system;
           specialArgs = {
-            inherit inputs self sops-nix;
+            inherit inputs self;
           };
           modules = commonModules ++ [
             ./hosts/ha1
           ];
         };
-        template1 = nixpkgs.lib.nixosSystem {
-          inherit system;
-          specialArgs = {
-            inherit inputs self sops-nix;
-          };
-          modules = commonModules ++ [
-            ./hosts/template
-          ];
-        };
         template2 = nixpkgs.lib.nixosSystem {
           inherit system;
           specialArgs = {
-            inherit inputs self sops-nix;
+            inherit inputs self;
           };
           modules = commonModules ++ [
             ./hosts/template2
@@ -125,92 +86,127 @@
         http-proxy = nixpkgs.lib.nixosSystem {
           inherit system;
           specialArgs = {
-            inherit inputs self sops-nix;
+            inherit inputs self;
           };
           modules = commonModules ++ [
             ./hosts/http-proxy
           ];
         };
-        ca = nixpkgs.lib.nixosSystem {
-          inherit system;
-          specialArgs = {
-            inherit inputs self sops-nix;
-          };
-          modules = commonModules ++ [
-            ./hosts/ca
-          ];
-        };
         monitoring01 = nixpkgs.lib.nixosSystem {
           inherit system;
           specialArgs = {
-            inherit inputs self sops-nix;
+            inherit inputs self;
           };
           modules = commonModules ++ [
             ./hosts/monitoring01
-            labmon.nixosModules.labmon
           ];
         };
         jelly01 = nixpkgs.lib.nixosSystem {
           inherit system;
           specialArgs = {
-            inherit inputs self sops-nix;
+            inherit inputs self;
           };
           modules = commonModules ++ [
             ./hosts/jelly01
           ];
         };
-        nix-cache01 = nixpkgs.lib.nixosSystem {
-          inherit system;
-          specialArgs = {
-            inherit inputs self sops-nix;
-          };
-          modules = commonModules ++ [
-            ./hosts/nix-cache01
-          ];
-        };
-        pgdb1 = nixpkgs.lib.nixosSystem {
-          inherit system;
-          specialArgs = {
-            inherit inputs self sops-nix;
-          };
-          modules = commonModules ++ [
-            ./hosts/pgdb1
-          ];
-        };
         nats1 = nixpkgs.lib.nixosSystem {
           inherit system;
           specialArgs = {
-            inherit inputs self sops-nix;
+            inherit inputs self;
           };
           modules = commonModules ++ [
             ./hosts/nats1
           ];
         };
-        testvm01 = nixpkgs.lib.nixosSystem {
-          inherit system;
-          specialArgs = {
-            inherit inputs self sops-nix;
-          };
-          modules = commonModules ++ [
-            ./hosts/testvm01
-          ];
-        };
         vault01 = nixpkgs.lib.nixosSystem {
           inherit system;
           specialArgs = {
-            inherit inputs self sops-nix;
+            inherit inputs self;
           };
           modules = commonModules ++ [
             ./hosts/vault01
           ];
         };
-        vaulttest01 = nixpkgs.lib.nixosSystem {
+        testvm01 = nixpkgs.lib.nixosSystem {
           inherit system;
           specialArgs = {
-            inherit inputs self sops-nix;
+            inherit inputs self;
           };
           modules = commonModules ++ [
-            ./hosts/vaulttest01
+            ./hosts/testvm01
+          ];
+        };
+        testvm02 = nixpkgs.lib.nixosSystem {
+          inherit system;
+          specialArgs = {
+            inherit inputs self;
+          };
+          modules = commonModules ++ [
+            ./hosts/testvm02
+          ];
+        };
+        testvm03 = nixpkgs.lib.nixosSystem {
+          inherit system;
+          specialArgs = {
+            inherit inputs self;
+          };
+          modules = commonModules ++ [
+            ./hosts/testvm03
+          ];
+        };
+        ns2 = nixpkgs.lib.nixosSystem {
+          inherit system;
+          specialArgs = {
+            inherit inputs self;
+          };
+          modules = commonModules ++ [
+            ./hosts/ns2
+          ];
+        };
+        ns1 = nixpkgs.lib.nixosSystem {
+          inherit system;
+          specialArgs = {
+            inherit inputs self;
+          };
+          modules = commonModules ++ [
+            ./hosts/ns1
+          ];
+        };
+        kanidm01 = nixpkgs.lib.nixosSystem {
+          inherit system;
+          specialArgs = {
+            inherit inputs self;
+          };
+          modules = commonModules ++ [
+            ./hosts/kanidm01
+          ];
+        };
+        monitoring02 = nixpkgs.lib.nixosSystem {
+          inherit system;
+          specialArgs = {
+            inherit inputs self;
+          };
+          modules = commonModules ++ [
+            ./hosts/monitoring02
+          ];
+        };
+        nix-cache02 = nixpkgs.lib.nixosSystem {
+          inherit system;
+          specialArgs = {
+            inherit inputs self;
+          };
+          modules = commonModules ++ [
+            ./hosts/nix-cache02
+          ];
+        };
+        garage01 = nixpkgs.lib.nixosSystem {
+          inherit system;
+          specialArgs = {
+            inherit inputs self;
+          };
+          modules = commonModules ++ [
+            ./hosts/garage01
           ];
         };
       };
@@ -229,9 +225,12 @@
             pkgs.ansible
             pkgs.opentofu
             pkgs.openbao
+            pkgs.kanidm_1_8
+            pkgs.nkeys
             (pkgs.callPackage ./scripts/create-host { })
             homelab-deploy.packages.${pkgs.system}.default
           ];
+          ANSIBLE_CONFIG = "./ansible/ansible.cfg";
         };
       }
     );
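Aside: every `nixosConfigurations` entry in the new version of this flake is now identical up to its name, since `specialArgs` no longer varies per host. A possible follow-up refactor (a sketch, not part of this change set; `lib.genAttrs` is standard nixpkgs) could generate the whole set from a list of host names:

# Sketch only - not part of this diff. Collapses the repeated
# nixpkgs.lib.nixosSystem blocks above into one helper.
let
  mkHost = name: nixpkgs.lib.nixosSystem {
    inherit system;
    specialArgs = { inherit inputs self; };
    modules = commonModules ++ [ ./hosts/${name} ];
  };
in
{
  nixosConfigurations = nixpkgs.lib.genAttrs [
    "ha1" "template2" "http-proxy" "monitoring01" "jelly01" "nats1"
    "vault01" "testvm01" "testvm02" "testvm03" "ns1" "ns2"
    "kanidm01" "monitoring02" "nix-cache02" "garage01"
  ] mkHost;
}

Hosts that need extra per-host modules would still want the explicit form, which may be why the repetition was kept here.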
@@ -1,33 +1,37 @@
 {
+  config,
+  lib,
   pkgs,
   ...
 }:
 
 {
   imports = [
-    ../template/hardware-configuration.nix
+    ../template2/hardware-configuration.nix
 
     ../../system
     ../../common/vm
   ];
 
-  homelab.dns.cnames = [ "nix-cache" "actions1" ];
-
-  homelab.host.role = "build-host";
-
-  fileSystems."/nix" = {
-    device = "/dev/disk/by-label/nixcache";
-    fsType = "xfs";
-  };
+  # Host metadata (adjust as needed)
+  homelab.host = {
+    tier = "test"; # Start in test tier, move to prod after validation
+    role = "storage";
+  };
+
+  homelab.dns.cnames = [ "s3" ];
+
+  # Enable Vault integration
+  vault.enable = true;
+
+  # Enable remote deployment via NATS
+  homelab.deploy.enable = true;
 
   nixpkgs.config.allowUnfree = true;
-  # Use the systemd-boot EFI boot loader.
-  boot.loader.grub = {
-    enable = true;
-    device = "/dev/sda";
-    configurationLimit = 3;
-  };
+  boot.loader.grub.enable = true;
+  boot.loader.grub.device = "/dev/vda";
 
-  networking.hostName = "nix-cache01";
+  networking.hostName = "garage01";
   networking.domain = "home.2rjus.net";
   networking.useNetworkd = true;
   networking.useDHCP = false;
@@ -41,7 +45,7 @@
   systemd.network.networks."ens18" = {
     matchConfig.Name = "ens18";
     address = [
-      "10.69.13.15/24"
+      "10.69.13.26/24"
     ];
     routes = [
       { Gateway = "10.69.13.1"; }
@@ -54,9 +58,6 @@
     "nix-command"
     "flakes"
   ];
-  vault.enable = true;
-  homelab.deploy.enable = true;
-
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim
@@ -64,13 +65,11 @@
     git
   ];
 
-  services.qemuGuest.enable = true;
-
   # Open ports in the firewall.
   # networking.firewall.allowedTCPPorts = [ ... ];
   # networking.firewall.allowedUDPPorts = [ ... ];
   # Or disable the firewall altogether.
   networking.firewall.enable = false;
 
-  system.stateVersion = "24.05"; # Did you read the comment?
+  system.stateVersion = "25.11"; # Did you read the comment?
 }
@@ -1,7 +1,6 @@
-{ ... }:
-{
+{ ... }: {
   imports = [
     ./configuration.nix
-    ../../services/postgres
+    ../../services/garage
   ];
 }
@@ -7,12 +7,14 @@
 
 {
   imports = [
-    ../template/hardware-configuration.nix
+    ./hardware-configuration.nix
 
     ../../system
     ../../common/vm
   ];
 
+  homelab.host.role = "home-automation";
+
   nixpkgs.config.allowUnfree = true;
   # Use the systemd-boot EFI boot loader.
   boot.loader.grub = {
@@ -85,6 +87,7 @@
       "--keep-monthly 6"
      "--keep-within 1d"
     ];
+    extraOptions = [ "--retry-lock=5m" ];
   };
 
   # Open ports in the firewall.
@@ -5,12 +5,13 @@
 
 {
   imports = [
-    ../template/hardware-configuration.nix
+    ./hardware-configuration.nix
 
     ../../system
     ../../common/vm
   ];
 
+  homelab.host.role = "proxy";
   homelab.dns.cnames = [
     "nzbget"
     "radarr"
hosts/http-proxy/hardware-configuration.nix (new file, 42 lines)
@@ -0,0 +1,42 @@
+{
+  config,
+  lib,
+  pkgs,
+  modulesPath,
+  ...
+}:
+
+{
+  imports = [
+    (modulesPath + "/profiles/qemu-guest.nix")
+  ];
+  boot.initrd.availableKernelModules = [
+    "ata_piix"
+    "uhci_hcd"
+    "virtio_pci"
+    "virtio_scsi"
+    "sd_mod"
+    "sr_mod"
+  ];
+  boot.initrd.kernelModules = [ "dm-snapshot" ];
+  boot.kernelModules = [
+    "ptp_kvm"
+  ];
+  boot.extraModulePackages = [ ];
+
+  fileSystems."/" = {
+    device = "/dev/disk/by-label/root";
+    fsType = "xfs";
+  };
+
+  swapDevices = [ { device = "/dev/disk/by-label/swap"; } ];
+
+  # Enables DHCP on each ethernet and wireless interface. In case of scripted networking
+  # (the default) this is the recommended approach. When using systemd-networkd it's
+  # still possible to use this option, but it's recommended to use it in conjunction
+  # with explicit per-interface declarations with `networking.interfaces.<interface>.useDHCP`.
+  networking.useDHCP = lib.mkDefault true;
+  # networking.interfaces.ens18.useDHCP = lib.mkDefault true;
+
+  nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
+}
@@ -5,12 +5,14 @@
 
 {
   imports = [
-    ../template/hardware-configuration.nix
+    ./hardware-configuration.nix
 
     ../../system
     ../../common/vm
   ];
 
+  homelab.host.role = "media";
+
   nixpkgs.config.allowUnfree = true;
   # Use the systemd-boot EFI boot loader.
   boot.loader.grub = {
@@ -61,9 +63,8 @@
   # Or disable the firewall altogether.
   networking.firewall.enable = false;
 
-  zramSwap = {
-    enable = true;
-  };
+  vault.enable = true;
+  homelab.deploy.enable = true;
 
   system.stateVersion = "23.11"; # Did you read the comment?
 }
hosts/jelly01/hardware-configuration.nix (new file, 42 lines)
@@ -0,0 +1,42 @@
+{
+  config,
+  lib,
+  pkgs,
+  modulesPath,
+  ...
+}:
+
+{
+  imports = [
+    (modulesPath + "/profiles/qemu-guest.nix")
+  ];
+  boot.initrd.availableKernelModules = [
+    "ata_piix"
+    "uhci_hcd"
+    "virtio_pci"
+    "virtio_scsi"
+    "sd_mod"
+    "sr_mod"
+  ];
+  boot.initrd.kernelModules = [ "dm-snapshot" ];
+  boot.kernelModules = [
+    "ptp_kvm"
+  ];
+  boot.extraModulePackages = [ ];
+
+  fileSystems."/" = {
+    device = "/dev/disk/by-label/root";
+    fsType = "xfs";
+  };
+
+  swapDevices = [ { device = "/dev/disk/by-label/swap"; } ];
+
+  # Enables DHCP on each ethernet and wireless interface. In case of scripted networking
+  # (the default) this is the recommended approach. When using systemd-networkd it's
+  # still possible to use this option, but it's recommended to use it in conjunction
+  # with explicit per-interface declarations with `networking.interfaces.<interface>.useDHCP`.
+  networking.useDHCP = lib.mkDefault true;
+  # networking.interfaces.ens18.useDHCP = lib.mkDefault true;
+
+  nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
+}
@@ -1,56 +0,0 @@
-{ config, lib, pkgs, ... }:
-
-{
-  imports =
-    [
-      ../template/hardware-configuration.nix
-      ../../system
-    ];
-
-  nixpkgs.config.allowUnfree = true;
-
-  homelab.host.role = "bastion";
-
-  # Use the systemd-boot EFI boot loader.
-  boot.loader.grub.enable = true;
-  boot.loader.grub.device = "/dev/sda";
-
-  networking.hostName = "jump";
-  networking.domain = "home.2rjus.net";
-  networking.useNetworkd = true;
-  networking.useDHCP = false;
-  services.resolved.enable = false;
-  networking.nameservers = [
-    "10.69.13.5"
-    "10.69.13.6"
-  ];
-
-  systemd.network.enable = true;
-  systemd.network.networks."ens18" = {
-    matchConfig.Name = "ens18";
-    address = [
-      "10.69.13.10/24"
-    ];
-    routes = [
-      { Gateway = "10.69.13.1"; }
-    ];
-    linkConfig.RequiredForOnline = "routable";
-  };
-
-  time.timeZone = "Europe/Oslo";
-
-  nix.settings.experimental-features = [ "nix-command" "flakes" ];
-  environment.systemPackages = with pkgs; [
-    vim
-    wget
-    git
-  ];
-
-  # Open ports in the firewall.
-  # networking.firewall.allowedTCPPorts = [ ... ];
-  # networking.firewall.allowedUDPPorts = [ ... ];
-  # Or disable the firewall altogether.
-  networking.firewall.enable = false;
-
-  system.stateVersion = "23.11"; # Did you read the comment?
-}
@@ -1,36 +0,0 @@
-{ config, lib, pkgs, modulesPath, ... }:
-
-{
-  imports =
-    [
-      (modulesPath + "/profiles/qemu-guest.nix")
-    ];
-
-  boot.initrd.availableKernelModules = [ "ata_piix" "uhci_hcd" "virtio_pci" "virtio_scsi" "sd_mod" "sr_mod" ];
-  boot.initrd.kernelModules = [ ];
-  # boot.kernelModules = [ ];
-  # boot.extraModulePackages = [ ];
-
-  fileSystems."/" =
-    {
-      device = "/dev/disk/by-uuid/6889aba9-61ed-4687-ab10-e5cf4017ac8d";
-      fsType = "xfs";
-    };
-
-  fileSystems."/boot" =
-    {
-      device = "/dev/disk/by-uuid/BC07-3B7A";
-      fsType = "vfat";
-    };
-
-  swapDevices =
-    [{ device = "/dev/disk/by-uuid/64e5757b-6625-4dd2-aa2a-66ca93444d23"; }];
-
-  # Enables DHCP on each ethernet and wireless interface. In case of scripted networking
-  # (the default) this is the recommended approach. When using systemd-networkd it's
-  # still possible to use this option, but it's recommended to use it in conjunction
-  # with explicit per-interface declarations with `networking.interfaces.<interface>.useDHCP`.
-  # networking.interfaces.ens18.useDHCP = lib.mkDefault true;
-
-  nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
-}
hosts/kanidm01/configuration.nix (new file, 76 lines)
@@ -0,0 +1,76 @@
+{
+  config,
+  lib,
+  pkgs,
+  ...
+}:
+
+{
+  imports = [
+    ../template2/hardware-configuration.nix
+
+    ../../system
+    ../../common/vm
+    ../../services/kanidm
+  ];
+
+  homelab.host = {
+    tier = "prod";
+    role = "auth";
+  };
+
+  # DNS CNAME for auth.home.2rjus.net
+  homelab.dns.cnames = [ "auth" ];
+
+  # Enable Vault integration
+  vault.enable = true;
+
+  # Enable remote deployment via NATS
+  homelab.deploy.enable = true;
+
+  nixpkgs.config.allowUnfree = true;
+  boot.loader.grub.enable = true;
+  boot.loader.grub.device = "/dev/vda";
+
+  networking.hostName = "kanidm01";
+  networking.domain = "home.2rjus.net";
+  networking.useNetworkd = true;
+  networking.useDHCP = false;
+  services.resolved.enable = true;
+  networking.nameservers = [
+    "10.69.13.5"
+    "10.69.13.6"
+  ];
+
+  systemd.network.enable = true;
+  systemd.network.networks."ens18" = {
+    matchConfig.Name = "ens18";
+    address = [
+      "10.69.13.23/24"
+    ];
+    routes = [
+      { Gateway = "10.69.13.1"; }
+    ];
+    linkConfig.RequiredForOnline = "routable";
+  };
+  time.timeZone = "Europe/Oslo";
+
+  nix.settings.experimental-features = [
+    "nix-command"
+    "flakes"
+  ];
+  nix.settings.tarball-ttl = 0;
+  environment.systemPackages = with pkgs; [
+    vim
+    wget
+    git
+  ];
+
+  # Open ports in the firewall.
+  # networking.firewall.allowedTCPPorts = [ ... ];
+  # networking.firewall.allowedUDPPorts = [ ... ];
+  # Or disable the firewall altogether.
+  networking.firewall.enable = false;
+
+  system.stateVersion = "25.11"; # Did you read the comment?
+}
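Aside: the `../../services/kanidm` module imported above is not part of this compare view. For orientation, a minimal sketch of what it might contain, using the stock NixOS Kanidm module (the option names are from nixpkgs; the bind address and TLS paths are assumptions):

# Sketch only - the actual services/kanidm module is not shown in this diff.
{ pkgs, ... }:
{
  services.kanidm = {
    enableServer = true;
    package = pkgs.kanidm_1_8; # same package the devShell pins
    serverSettings = {
      domain = "auth.home.2rjus.net";          # matches the CNAME above
      origin = "https://auth.home.2rjus.net";
      bindaddress = "[::]:443";                # assumption
      tls_chain = "/var/lib/kanidm/chain.pem"; # assumption
      tls_key = "/var/lib/kanidm/key.pem";     # assumption
    };
  };
}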
@@ -5,12 +5,14 @@
 
 {
   imports = [
-    ../template/hardware-configuration.nix
+    ./hardware-configuration.nix
 
     ../../system
     ../../common/vm
   ];
 
+  homelab.host.role = "monitoring";
+
   nixpkgs.config.allowUnfree = true;
   # Use the systemd-boot EFI boot loader.
   boot.loader.grub = {
@@ -81,6 +83,7 @@
       "--keep-monthly 6"
       "--keep-within 1d"
     ];
+    extraOptions = [ "--retry-lock=5m" ];
   };
 
   services.restic.backups.grafana-db = {
@@ -98,61 +101,7 @@
       "--keep-monthly 6"
       "--keep-within 1d"
     ];
-  };
-
-  labmon = {
-    enable = true;
-
-    settings = {
-      ListenAddr = ":9969";
-      Profiling = true;
-      StepMonitors = [
-        {
-          Enabled = true;
-          BaseURL = "https://ca.home.2rjus.net";
-          RootID = "3381bda8015a86b9a3cd1851439d1091890a79005e0f1f7c4301fe4bccc29d80";
-        }
-      ];
-
-      TLSConnectionMonitors = [
-        {
-          Enabled = true;
-          Address = "ca.home.2rjus.net:443";
-          Verify = true;
-          Duration = "12h";
-        }
-        {
-          Enabled = true;
-          Address = "jelly.home.2rjus.net:443";
-          Verify = true;
-          Duration = "12h";
-        }
-        {
-          Enabled = true;
-          Address = "grafana.home.2rjus.net:443";
-          Verify = true;
-          Duration = "12h";
-        }
-        {
-          Enabled = true;
-          Address = "prometheus.home.2rjus.net:443";
-          Verify = true;
-          Duration = "12h";
-        }
-        {
-          Enabled = true;
-          Address = "alertmanager.home.2rjus.net:443";
-          Verify = true;
-          Duration = "12h";
-        }
-        {
-          Enabled = true;
-          Address = "pyroscope.home.2rjus.net:443";
-          Verify = true;
-          Duration = "12h";
-        }
-      ];
-    };
+    extraOptions = [ "--retry-lock=5m" ];
   };
 
   # Open ports in the firewall.
hosts/monitoring01/hardware-configuration.nix (new file, 42 lines)
@@ -0,0 +1,42 @@
+{
+  config,
+  lib,
+  pkgs,
+  modulesPath,
+  ...
+}:
+
+{
+  imports = [
+    (modulesPath + "/profiles/qemu-guest.nix")
+  ];
+  boot.initrd.availableKernelModules = [
+    "ata_piix"
+    "uhci_hcd"
+    "virtio_pci"
+    "virtio_scsi"
+    "sd_mod"
+    "sr_mod"
+  ];
+  boot.initrd.kernelModules = [ "dm-snapshot" ];
+  boot.kernelModules = [
+    "ptp_kvm"
+  ];
+  boot.extraModulePackages = [ ];
+
+  fileSystems."/" = {
+    device = "/dev/disk/by-label/root";
+    fsType = "xfs";
+  };
+
+  swapDevices = [ { device = "/dev/disk/by-label/swap"; } ];
+
+  # Enables DHCP on each ethernet and wireless interface. In case of scripted networking
+  # (the default) this is the recommended approach. When using systemd-networkd it's
+  # still possible to use this option, but it's recommended to use it in conjunction
+  # with explicit per-interface declarations with `networking.interfaces.<interface>.useDHCP`.
+  networking.useDHCP = lib.mkDefault true;
+  # networking.interfaces.ens18.useDHCP = lib.mkDefault true;
+
+  nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
+}
@@ -1,25 +1,37 @@
 {
+  config,
+  lib,
   pkgs,
   ...
 }:
 
 {
   imports = [
-    ../template/hardware-configuration.nix
+    ../template2/hardware-configuration.nix
 
     ../../system
     ../../common/vm
   ];
 
-  nixpkgs.config.allowUnfree = true;
-  # Use the systemd-boot EFI boot loader.
-  boot.loader.grub = {
-    enable = true;
-    device = "/dev/sda";
-    configurationLimit = 3;
-  };
+  homelab.host = {
+    tier = "prod";
+    role = "monitoring";
+  };
 
-  networking.hostName = "ca";
+  # DNS CNAME for Grafana test instance
+  homelab.dns.cnames = [ "grafana-test" ];
+
+  # Enable Vault integration
+  vault.enable = true;
+
+  # Enable remote deployment via NATS
+  homelab.deploy.enable = true;
+
+  nixpkgs.config.allowUnfree = true;
+  boot.loader.grub.enable = true;
+  boot.loader.grub.device = "/dev/vda";
+
+  networking.hostName = "monitoring02";
   networking.domain = "home.2rjus.net";
   networking.useNetworkd = true;
   networking.useDHCP = false;
@@ -33,7 +45,7 @@
   systemd.network.networks."ens18" = {
     matchConfig.Name = "ens18";
     address = [
-      "10.69.13.12/24"
+      "10.69.13.24/24"
     ];
     routes = [
       { Gateway = "10.69.13.1"; }
@@ -59,5 +71,5 @@
   # Or disable the firewall altogether.
   networking.firewall.enable = false;
 
-  system.stateVersion = "23.11"; # Did you read the comment?
+  system.stateVersion = "25.11"; # Did you read the comment?
 }
@@ -1,7 +1,6 @@
 { ... }: {
   imports = [
-    ./hardware-configuration.nix
     ./configuration.nix
-    ./scripts.nix
+    ../../services/grafana
   ];
 }
@@ -5,12 +5,14 @@
 
 {
   imports = [
-    ../template/hardware-configuration.nix
+    ./hardware-configuration.nix
 
     ../../system
     ../../common/vm
   ];
 
+  homelab.host.role = "messaging";
+
   nixpkgs.config.allowUnfree = true;
   # Use the systemd-boot EFI boot loader.
   boot.loader.grub = {
@@ -59,5 +61,8 @@
   # Or disable the firewall altogether.
   networking.firewall.enable = false;
 
+  vault.enable = true;
+  homelab.deploy.enable = true;
+
   system.stateVersion = "23.11"; # Did you read the comment?
 }
hosts/nats1/hardware-configuration.nix (new file, 42 lines)
@@ -0,0 +1,42 @@
+{
+  config,
+  lib,
+  pkgs,
+  modulesPath,
+  ...
+}:
+
+{
+  imports = [
+    (modulesPath + "/profiles/qemu-guest.nix")
+  ];
+  boot.initrd.availableKernelModules = [
+    "ata_piix"
+    "uhci_hcd"
+    "virtio_pci"
+    "virtio_scsi"
+    "sd_mod"
+    "sr_mod"
+  ];
+  boot.initrd.kernelModules = [ "dm-snapshot" ];
+  boot.kernelModules = [
+    "ptp_kvm"
+  ];
+  boot.extraModulePackages = [ ];
+
+  fileSystems."/" = {
+    device = "/dev/disk/by-label/root";
+    fsType = "xfs";
+  };
+
+  swapDevices = [ { device = "/dev/disk/by-label/swap"; } ];
+
+  # Enables DHCP on each ethernet and wireless interface. In case of scripted networking
+  # (the default) this is the recommended approach. When using systemd-networkd it's
+  # still possible to use this option, but it's recommended to use it in conjunction
+  # with explicit per-interface declarations with `networking.interfaces.<interface>.useDHCP`.
+  networking.useDHCP = lib.mkDefault true;
+  # networking.interfaces.ens18.useDHCP = lib.mkDefault true;
+
+  nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
+}
@@ -1,6 +0,0 @@
-{ ... }:
-{
-  zramSwap = {
-    enable = true;
-  };
-}
hosts/nix-cache02/builder.nix (new file, 45 lines)
@@ -0,0 +1,45 @@
+{ config, ... }:
+{
+  # Fetch builder NKey from Vault
+  vault.secrets.builder-nkey = {
+    secretPath = "shared/homelab-deploy/builder-nkey";
+    extractKey = "nkey";
+    outputDir = "/run/secrets/builder-nkey";
+    services = [ "homelab-deploy-builder" ];
+  };
+
+  # Configure the builder service
+  services.homelab-deploy.builder = {
+    enable = true;
+    natsUrl = "nats://nats1.home.2rjus.net:4222";
+    nkeyFile = "/run/secrets/builder-nkey";
+
+    settings.repos = {
+      nixos-servers = {
+        url = "git+https://git.t-juice.club/torjus/nixos-servers.git";
+        defaultBranch = "master";
+      };
+      nixos = {
+        url = "git+https://git.t-juice.club/torjus/nixos.git";
+        defaultBranch = "master";
+      };
+    };
+
+    timeout = 7200;
+    metrics.enable = true;
+  };
+
+  # Expose builder metrics for Prometheus scraping
+  homelab.monitoring.scrapeTargets = [
+    {
+      job_name = "homelab-deploy-builder";
+      port = 9973;
+    }
+  ];
+
+  # Ensure builder starts after vault secret is available
+  systemd.services.homelab-deploy-builder = {
+    after = [ "vault-secret-builder-nkey.service" ];
+    requires = [ "vault-secret-builder-nkey.service" ];
+  };
+}
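Aside: `vault.secrets.*` and `homelab.monitoring.scrapeTargets` are options from this repo's own modules (the Vault module and `./modules/homelab`), neither of which is in this compare view. Purely as an assumption about the consuming side, the monitoring host might flatten scrape targets into the stock `services.prometheus.scrapeConfigs` option roughly like this:

# Sketch (assumption, not from this diff): how scrapeTargets declared on a
# host could be rendered into Prometheus jobs on the monitoring host.
{ config, lib, ... }:
{
  services.prometheus.scrapeConfigs = map (target: {
    inherit (target) job_name;
    static_configs = [
      { targets = [ "nix-cache02.home.2rjus.net:${toString target.port}" ]; }
    ];
  }) config.homelab.monitoring.scrapeTargets;
}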
@@ -1,25 +1,36 @@
 {
+  config,
+  lib,
   pkgs,
   ...
 }:
 
 {
   imports = [
-    ../template/hardware-configuration.nix
+    ../template2/hardware-configuration.nix
 
     ../../system
     ../../common/vm
   ];
 
-  nixpkgs.config.allowUnfree = true;
-  # Use the systemd-boot EFI boot loader.
-  boot.loader.grub = {
-    enable = true;
-    device = "/dev/sda";
-    configurationLimit = 3;
-  };
+  homelab.host = {
+    tier = "prod";
+    role = "build-host";
+  };
 
-  networking.hostName = "pgdb1";
+  homelab.dns.cnames = [ "nix-cache" ];
+
+  # Enable Vault integration
+  vault.enable = true;
+
+  # Enable remote deployment via NATS
+  homelab.deploy.enable = true;
+
+  nixpkgs.config.allowUnfree = true;
+  boot.loader.grub.enable = true;
+  boot.loader.grub.device = "/dev/vda";
+
+  networking.hostName = "nix-cache02";
   networking.domain = "home.2rjus.net";
   networking.useNetworkd = true;
   networking.useDHCP = false;
@@ -33,7 +44,7 @@
   systemd.network.networks."ens18" = {
     matchConfig.Name = "ens18";
     address = [
-      "10.69.13.16/24"
+      "10.69.13.25/24"
     ];
     routes = [
       { Gateway = "10.69.13.1"; }
@@ -59,5 +70,5 @@
   # Or disable the firewall altogether.
   networking.firewall.enable = false;
 
-  system.stateVersion = "23.11"; # Did you read the comment?
+  system.stateVersion = "25.11"; # Did you read the comment?
 }
@@ -1,9 +1,8 @@
-{ ... }:
-{
+{ ... }: {
   imports = [
     ./configuration.nix
+    ./builder.nix
+    ./scheduler.nix
     ../../services/nix-cache
-    ../../services/actions-runner
-    ./zram.nix
   ];
 }
hosts/nix-cache02/scheduler.nix (new file, 61 lines)
@@ -0,0 +1,61 @@
+{ config, pkgs, lib, inputs, ... }:
+let
+  homelab-deploy = inputs.homelab-deploy.packages.${pkgs.system}.default;
+
+  scheduledBuildScript = pkgs.writeShellApplication {
+    name = "scheduled-build";
+    runtimeInputs = [ homelab-deploy ];
+    text = ''
+      NATS_URL="nats://nats1.home.2rjus.net:4222"
+      NKEY_FILE="/run/secrets/scheduler-nkey"
+
+      echo "Starting scheduled builds at $(date)"
+
+      # Build all nixos-servers hosts
+      homelab-deploy build \
+        --nats-url "$NATS_URL" \
+        --nkey-file "$NKEY_FILE" \
+        nixos-servers --all
+
+      # Build all nixos (gunter) hosts
+      homelab-deploy build \
+        --nats-url "$NATS_URL" \
+        --nkey-file "$NKEY_FILE" \
+        nixos --all
+
+      echo "Scheduled builds completed at $(date)"
+    '';
+  };
+in
+{
+  # Fetch scheduler NKey from Vault
+  vault.secrets.scheduler-nkey = {
+    secretPath = "shared/homelab-deploy/scheduler-nkey";
+    extractKey = "nkey";
+    outputDir = "/run/secrets/scheduler-nkey";
+    services = [ "scheduled-build" ];
+  };
+
+  # Timer: every 2 hours
+  systemd.timers.scheduled-build = {
+    description = "Trigger scheduled Nix builds";
+    wantedBy = [ "timers.target" ];
+    timerConfig = {
+      OnCalendar = "*-*-* 00/2:00:00"; # Every 2 hours at :00
+      Persistent = true; # Run missed builds on boot
+      RandomizedDelaySec = "5m"; # Slight jitter
+    };
+  };
+
+  # Service: oneshot that triggers builds
+  systemd.services.scheduled-build = {
+    description = "Trigger builds for all hosts via NATS";
+    after = [ "network-online.target" "vault-secret-scheduler-nkey.service" ];
+    requires = [ "vault-secret-scheduler-nkey.service" ];
+    wants = [ "network-online.target" ];
+    serviceConfig = {
+      Type = "oneshot";
+      ExecStart = lib.getExe scheduledBuildScript;
+    };
+  };
+}
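Aside on the timer: `OnCalendar = "*-*-* 00/2:00:00"` is the fully expanded systemd calendar form. Assuming standard systemd calendar semantics (checkable with `systemd-analyze calendar`), the shorthand below should normalize to the same schedule:

# Sketch: equivalent shorthand for "minute 0 of every second hour".
# `systemd-analyze calendar "00/2:00"` prints the normalized form and the
# next elapse times, which is a quick way to verify the expression.
{
  systemd.timers.scheduled-build.timerConfig.OnCalendar = "00/2:00";
}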
@@ -7,23 +7,38 @@
 
 {
   imports = [
-    ../template/hardware-configuration.nix
+    ../template2/hardware-configuration.nix
 
     ../../system
+    ../../common/vm
+
+    # DNS services
     ../../services/ns/master-authorative.nix
     ../../services/ns/resolver.nix
-    ../../common/vm
   ];
 
+  # Host metadata
+  homelab.host = {
+    tier = "prod";
+    role = "dns";
+    labels.dns_role = "primary";
+  };
+
+  # Enable Vault integration
+  vault.enable = true;
+
+  # Enable remote deployment via NATS
+  homelab.deploy.enable = true;
+
   nixpkgs.config.allowUnfree = true;
-  # Use the systemd-boot EFI boot loader.
   boot.loader.grub.enable = true;
-  boot.loader.grub.device = "/dev/sda";
+  boot.loader.grub.device = "/dev/vda";
 
   networking.hostName = "ns1";
   networking.domain = "home.2rjus.net";
   networking.useNetworkd = true;
   networking.useDHCP = false;
+  # Disable resolved - conflicts with Unbound resolver
   services.resolved.enable = false;
   networking.nameservers = [
     "10.69.13.5"
@@ -47,14 +62,6 @@
     "nix-command"
     "flakes"
   ];
-  vault.enable = true;
-  homelab.deploy.enable = true;
-
-  homelab.host = {
-    role = "dns";
-    labels.dns_role = "primary";
-  };
-
   nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim
@@ -68,5 +75,5 @@
   # Or disable the firewall altogether.
   networking.firewall.enable = false;
 
-  system.stateVersion = "23.11"; # Did you read the comment?
+  system.stateVersion = "25.11"; # Did you read the comment?
 }
@@ -1,36 +0,0 @@
-{ config, lib, pkgs, modulesPath, ... }:
-
-{
-  imports =
-    [
-      (modulesPath + "/profiles/qemu-guest.nix")
-    ];
-
-  boot.initrd.availableKernelModules = [ "ata_piix" "uhci_hcd" "virtio_pci" "virtio_scsi" "sd_mod" "sr_mod" ];
-  boot.initrd.kernelModules = [ ];
-  # boot.kernelModules = [ ];
-  # boot.extraModulePackages = [ ];
-
-  fileSystems."/" =
-    {
-      device = "/dev/disk/by-uuid/6889aba9-61ed-4687-ab10-e5cf4017ac8d";
-      fsType = "xfs";
-    };
-
-  fileSystems."/boot" =
-    {
-      device = "/dev/disk/by-uuid/BC07-3B7A";
-      fsType = "vfat";
-    };
-
-  swapDevices =
-    [{ device = "/dev/disk/by-uuid/64e5757b-6625-4dd2-aa2a-66ca93444d23"; }];
-
-  # Enables DHCP on each ethernet and wireless interface. In case of scripted networking
-  # (the default) this is the recommended approach. When using systemd-networkd it's
-  # still possible to use this option, but it's recommended to use it in conjunction
-  # with explicit per-interface declarations with `networking.interfaces.<interface>.useDHCP`.
-  # networking.interfaces.ens18.useDHCP = lib.mkDefault true;
-
-  nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
-}
@@ -7,23 +7,38 @@
 
 {
   imports = [
-    ../template/hardware-configuration.nix
+    ../template2/hardware-configuration.nix
 
     ../../system
+    ../../common/vm
+
+    # DNS services
     ../../services/ns/secondary-authorative.nix
     ../../services/ns/resolver.nix
-    ../../common/vm
   ];
 
+  # Host metadata
+  homelab.host = {
+    tier = "prod";
+    role = "dns";
+    labels.dns_role = "secondary";
+  };
+
+  # Enable Vault integration
+  vault.enable = true;
+
+  # Enable remote deployment via NATS
+  homelab.deploy.enable = true;
+
   nixpkgs.config.allowUnfree = true;
-  # Use the systemd-boot EFI boot loader.
   boot.loader.grub.enable = true;
-  boot.loader.grub.device = "/dev/sda";
+  boot.loader.grub.device = "/dev/vda";
 
   networking.hostName = "ns2";
   networking.domain = "home.2rjus.net";
   networking.useNetworkd = true;
   networking.useDHCP = false;
+  # Disable resolved - conflicts with Unbound resolver
   services.resolved.enable = false;
   networking.nameservers = [
     "10.69.13.5"
@@ -47,14 +62,7 @@
     "nix-command"
     "flakes"
   ];
-  vault.enable = true;
-  homelab.deploy.enable = true;
-
-  homelab.host = {
-    role = "dns";
-    labels.dns_role = "secondary";
-  };
-
+  nix.settings.tarball-ttl = 0;
   environment.systemPackages = with pkgs; [
     vim
     wget
@@ -67,5 +75,5 @@
   # Or disable the firewall altogether.
   networking.firewall.enable = false;
 
-  system.stateVersion = "23.11"; # Did you read the comment?
+  system.stateVersion = "25.11"; # Did you read the comment?
 }
@@ -1,36 +0,0 @@
-{ config, lib, pkgs, modulesPath, ... }:
-
-{
-  imports =
-    [
-      (modulesPath + "/profiles/qemu-guest.nix")
-    ];
-
-  boot.initrd.availableKernelModules = [ "ata_piix" "uhci_hcd" "virtio_pci" "virtio_scsi" "sd_mod" "sr_mod" ];
-  boot.initrd.kernelModules = [ ];
-  # boot.kernelModules = [ ];
-  # boot.extraModulePackages = [ ];
-
-  fileSystems."/" =
-    {
-      device = "/dev/disk/by-uuid/6889aba9-61ed-4687-ab10-e5cf4017ac8d";
-      fsType = "xfs";
-    };
-
-  fileSystems."/boot" =
-    {
-      device = "/dev/disk/by-uuid/BC07-3B7A";
-      fsType = "vfat";
-    };
-
-  swapDevices =
-    [{ device = "/dev/disk/by-uuid/64e5757b-6625-4dd2-aa2a-66ca93444d23"; }];
-
-  # Enables DHCP on each ethernet and wireless interface. In case of scripted networking
-  # (the default) this is the recommended approach. When using systemd-networkd it's
-  # still possible to use this option, but it's recommended to use it in conjunction
-  # with explicit per-interface declarations with `networking.interfaces.<interface>.useDHCP`.
-  # networking.interfaces.ens18.useDHCP = lib.mkDefault true;
-
-  nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
-}
@@ -1,36 +0,0 @@
-{ pkgs, ... }:
-let
-  prepare-host-script = pkgs.writeShellApplication {
-    name = "prepare-host.sh";
-    runtimeInputs = [ pkgs.age ];
-    text = ''
-      echo "Removing machine-id"
-      rm -f /etc/machine-id || true
-
-      echo "Removing SSH host keys"
-      rm -f /etc/ssh/ssh_host_* || true
-
-      echo "Restarting SSH"
-      systemctl restart sshd
-
-      echo "Removing temporary files"
-      rm -rf /tmp/* || true
-
-      echo "Removing logs"
-      journalctl --rotate || true
-      journalctl --vacuum-time=1s || true
-
-      echo "Removing cache"
-      rm -rf /var/cache/* || true
-
-      echo "Generate age key"
-      rm -rf /var/lib/sops-nix || true
-      mkdir -p /var/lib/sops-nix
-      age-keygen -o /var/lib/sops-nix/key.txt
-    '';
-  };
-in
-{
-  environment.systemPackages = [ prepare-host-script ];
-  users.motd = "Prepare host by running 'prepare-host.sh'.";
-}
@@ -6,22 +6,72 @@ let
|
|||||||
text = ''
|
text = ''
|
||||||
set -euo pipefail
|
set -euo pipefail
|
||||||
|
|
||||||
|
LOKI_URL="http://monitoring01.home.2rjus.net:3100/loki/api/v1/push"
|
||||||
|
|
||||||
|
# Send a log entry to Loki with bootstrap status
|
||||||
|
# Usage: log_to_loki <stage> <message>
|
||||||
|
# Fails silently if Loki is unreachable
|
||||||
|
log_to_loki() {
|
||||||
|
local stage="$1"
|
||||||
|
local message="$2"
|
||||||
|
local timestamp_ns
|
||||||
|
timestamp_ns="$(date +%s)000000000"
|
||||||
|
|
||||||
|
local payload
|
||||||
|
payload=$(jq -n \
|
||||||
|
--arg host "$HOSTNAME" \
|
||||||
|
--arg stage "$stage" \
|
||||||
|
--arg branch "''${BRANCH:-master}" \
|
||||||
|
--arg ts "$timestamp_ns" \
|
||||||
|
--arg msg "$message" \
|
||||||
|
'{
|
||||||
|
streams: [{
|
||||||
|
stream: {
|
||||||
|
job: "bootstrap",
|
||||||
|
hostname: $host,
|
||||||
|
stage: $stage,
|
||||||
|
branch: $branch
|
||||||
|
},
|
||||||
|
values: [[$ts, $msg]]
|
||||||
|
}]
|
||||||
|
}')
|
||||||
|
|
||||||
|
curl -s --connect-timeout 2 --max-time 5 \
|
||||||
|
-X POST \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d "$payload" \
|
||||||
|
"$LOKI_URL" >/dev/null 2>&1 || true
|
||||||
|
}
|
||||||
|
|
||||||
|
echo "================================================================================"
|
||||||
|
echo " NIXOS BOOTSTRAP IN PROGRESS"
|
||||||
|
echo "================================================================================"
|
||||||
|
echo ""
|
||||||
|
|
||||||
# Read hostname set by cloud-init (from Terraform VM name via user-data)
|
# Read hostname set by cloud-init (from Terraform VM name via user-data)
|
||||||
# Cloud-init sets the system hostname from user-data.txt, so we read it from hostnamectl
|
# Cloud-init sets the system hostname from user-data.txt, so we read it from hostnamectl
|
||||||
HOSTNAME=$(hostnamectl hostname)
|
HOSTNAME=$(hostnamectl hostname)
|
||||||
echo "DEBUG: Hostname from hostnamectl: '$HOSTNAME'"
|
# Read git branch from environment, default to master
|
||||||
|
BRANCH="''${NIXOS_FLAKE_BRANCH:-master}"
|
||||||
|
|
||||||
|
echo "Hostname: $HOSTNAME"
|
||||||
|
echo ""
|
||||||
echo "Starting NixOS bootstrap for host: $HOSTNAME"
|
echo "Starting NixOS bootstrap for host: $HOSTNAME"
|
||||||
|
|
||||||
|
log_to_loki "starting" "Bootstrap starting for $HOSTNAME (branch: $BRANCH)"
|
||||||
|
|
||||||
echo "Waiting for network connectivity..."
|
echo "Waiting for network connectivity..."
|
||||||
|
|
||||||
# Verify we can reach the git server via HTTPS (doesn't respond to ping)
|
# Verify we can reach the git server via HTTPS (doesn't respond to ping)
|
||||||
if ! curl -s --connect-timeout 5 --max-time 10 https://git.t-juice.club >/dev/null 2>&1; then
|
if ! curl -s --connect-timeout 5 --max-time 10 https://git.t-juice.club >/dev/null 2>&1; then
|
||||||
echo "ERROR: Cannot reach git.t-juice.club via HTTPS"
|
echo "ERROR: Cannot reach git.t-juice.club via HTTPS"
|
||||||
echo "Check network configuration and DNS settings"
|
echo "Check network configuration and DNS settings"
|
||||||
|
log_to_loki "failed" "Network check failed - cannot reach git.t-juice.club"
|
||||||
exit 1
|
exit 1
|
||||||
fi
|
fi
|
||||||
|
|
||||||
echo "Network connectivity confirmed"
|
echo "Network connectivity confirmed"
|
||||||
|
log_to_loki "network_ok" "Network connectivity confirmed"
|
||||||
|
|
||||||
# Unwrap Vault token and store AppRole credentials (if provided)
|
# Unwrap Vault token and store AppRole credentials (if provided)
|
||||||
if [ -n "''${VAULT_WRAPPED_TOKEN:-}" ]; then
|
if [ -n "''${VAULT_WRAPPED_TOKEN:-}" ]; then
|
||||||
@@ -50,6 +100,7 @@ let
|
|||||||
chmod 600 /var/lib/vault/approle/secret-id
|
chmod 600 /var/lib/vault/approle/secret-id
|
||||||
|
|
||||||
echo "Vault credentials unwrapped and stored successfully"
|
echo "Vault credentials unwrapped and stored successfully"
|
||||||
|
log_to_loki "vault_ok" "Vault credentials unwrapped and stored"
|
||||||
else
|
else
|
||||||
echo "WARNING: Failed to unwrap Vault token"
|
echo "WARNING: Failed to unwrap Vault token"
|
||||||
if [ -n "$UNWRAP_RESPONSE" ]; then
|
if [ -n "$UNWRAP_RESPONSE" ]; then
|
||||||
@@ -63,17 +114,17 @@ let
|
|||||||
echo "To regenerate token, run: create-host --hostname $HOSTNAME --force"
|
echo "To regenerate token, run: create-host --hostname $HOSTNAME --force"
|
||||||
echo ""
|
echo ""
|
||||||
echo "Vault secrets will not be available, but continuing bootstrap..."
|
echo "Vault secrets will not be available, but continuing bootstrap..."
|
||||||
|
log_to_loki "vault_warn" "Failed to unwrap Vault token - continuing without secrets"
|
||||||
fi
|
fi
|
||||||
else
|
else
|
||||||
echo "No Vault wrapped token provided (VAULT_WRAPPED_TOKEN not set)"
|
echo "No Vault wrapped token provided (VAULT_WRAPPED_TOKEN not set)"
|
||||||
echo "Skipping Vault credential setup"
|
echo "Skipping Vault credential setup"
|
||||||
|
log_to_loki "vault_skip" "No Vault token provided - skipping credential setup"
|
||||||
fi
|
fi
|
||||||
|
|
||||||
echo "Fetching and building NixOS configuration from flake..."
|
echo "Fetching and building NixOS configuration from flake..."
|
||||||
|
|
||||||
# Read git branch from environment, default to master
|
|
||||||
BRANCH="''${NIXOS_FLAKE_BRANCH:-master}"
|
|
||||||
echo "Using git branch: $BRANCH"
|
echo "Using git branch: $BRANCH"
|
||||||
|
log_to_loki "building" "Starting nixos-rebuild boot"
|
||||||
|
|
||||||
# Build and activate the host-specific configuration
|
# Build and activate the host-specific configuration
|
||||||
FLAKE_URL="git+https://git.t-juice.club/torjus/nixos-servers.git?ref=$BRANCH#''${HOSTNAME}"
|
FLAKE_URL="git+https://git.t-juice.club/torjus/nixos-servers.git?ref=$BRANCH#''${HOSTNAME}"
|
||||||
@@ -81,18 +132,30 @@ let
|
|||||||
if nixos-rebuild boot --flake "$FLAKE_URL"; then
|
if nixos-rebuild boot --flake "$FLAKE_URL"; then
|
||||||
echo "Successfully built configuration for $HOSTNAME"
|
echo "Successfully built configuration for $HOSTNAME"
|
||||||
echo "Rebooting into new configuration..."
|
echo "Rebooting into new configuration..."
|
||||||
|
log_to_loki "success" "Build successful - rebooting into new configuration"
|
||||||
sleep 2
|
sleep 2
|
||||||
systemctl reboot
|
systemctl reboot
|
||||||
else
|
else
|
||||||
echo "ERROR: nixos-rebuild failed for $HOSTNAME"
|
echo "ERROR: nixos-rebuild failed for $HOSTNAME"
|
||||||
echo "Check that flake has configuration for this hostname"
|
echo "Check that flake has configuration for this hostname"
|
||||||
echo "Manual intervention required - system will not reboot"
|
echo "Manual intervention required - system will not reboot"
|
||||||
|
log_to_loki "failed" "nixos-rebuild failed - manual intervention required"
|
||||||
exit 1
|
exit 1
|
||||||
fi
|
fi
|
||||||
'';
|
'';
|
||||||
};
|
};
|
||||||
in
|
in
|
||||||
{
|
{
|
||||||
|
# Custom greeting line to indicate this is a bootstrap image
|
||||||
|
services.getty.greetingLine = lib.mkForce ''
|
||||||
|
================================================================================
|
||||||
|
BOOTSTRAP IMAGE - NixOS \V (\l)
|
||||||
|
================================================================================
|
||||||
|
|
||||||
|
Bootstrap service is running. Logs are displayed on tty1.
|
||||||
|
Check status: journalctl -fu nixos-bootstrap
|
||||||
|
'';
|
||||||
|
|
||||||
systemd.services."nixos-bootstrap" = {
|
systemd.services."nixos-bootstrap" = {
|
||||||
description = "Bootstrap NixOS configuration from flake on first boot";
|
description = "Bootstrap NixOS configuration from flake on first boot";
|
||||||
|
|
||||||
@@ -107,12 +170,12 @@ in
|
|||||||
serviceConfig = {
|
serviceConfig = {
|
||||||
Type = "oneshot";
|
Type = "oneshot";
|
||||||
RemainAfterExit = true;
|
RemainAfterExit = true;
|
||||||
ExecStart = "${bootstrap-script}/bin/nixos-bootstrap";
|
ExecStart = lib.getExe bootstrap-script;
|
||||||
|
|
||||||
# Read environment variables from cloud-init (set by cloud-init write_files)
|
# Read environment variables from cloud-init (set by cloud-init write_files)
|
||||||
EnvironmentFile = "-/run/cloud-init-env";
|
EnvironmentFile = "-/run/cloud-init-env";
|
||||||
|
|
||||||
# Logging to journald
|
# Log to journal and console
|
||||||
StandardOutput = "journal+console";
|
StandardOutput = "journal+console";
|
||||||
StandardError = "journal+console";
|
StandardError = "journal+console";
|
||||||
};
|
};
|
||||||
|
|||||||
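The bootstrap script reports each phase ("vault_skip", "building", "success", "failed") through a log_to_loki helper that is called here but defined earlier in the script, outside this hunk. As a rough sketch of what such a helper would send, the snippet below pushes one line to Loki's standard /loki/api/v1/push endpoint; the Loki hostname and label names are assumptions for illustration, not taken from the repo.

# Hedged sketch: roughly what a log_to_loki helper is expected to POST to Loki.
# The Loki URL and label names below are assumptions, not values from this repo.
import json
import time
import urllib.request

LOKI_URL = "http://loki.home.2rjus.net:3100/loki/api/v1/push"  # assumed endpoint

def log_to_loki(phase: str, message: str, hostname: str = "bootstrap-host") -> None:
    """Push a single log line to Loki, labelled with the bootstrap phase."""
    payload = {
        "streams": [
            {
                "stream": {"job": "nixos-bootstrap", "host": hostname, "phase": phase},
                # Loki expects values as [[timestamp_in_ns_as_string, line], ...]
                "values": [[str(time.time_ns()), message]],
            }
        ]
    }
    req = urllib.request.Request(
        LOKI_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=5)

# Example, mirroring one of the calls added in the diff:
# log_to_loki("building", "Starting nixos-rebuild boot")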
@@ -35,6 +35,7 @@
   homelab.host = {
     tier = "test";
     priority = "low";
+    labels.ansible = "false"; # Exclude from Ansible inventory
   };

   boot.loader.grub.enable = true;
@@ -58,6 +59,14 @@
     "flakes"
   ];
   nix.settings.tarball-ttl = 0;
+  nix.settings.substituters = [
+    "https://nix-cache.home.2rjus.net"
+    "https://cache.nixos.org"
+  ];
+  nix.settings.trusted-public-keys = [
+    "nix-cache.home.2rjus.net-1:2kowZOG6pvhoK4AHVO3alBlvcghH20wchzoR0V86UWI="
+    "cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY="
+  ];
   environment.systemPackages = with pkgs; [
     age
     vim
@@ -71,5 +80,8 @@
   # Or disable the firewall altogether.
   networking.firewall.enable = false;

+  # Compressed swap in RAM - prevents OOM during bootstrap nixos-rebuild
+  zramSwap.enable = true;
+
   system.stateVersion = "25.11";
 }
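The bootstrap image now prefers the internal binary cache before cache.nixos.org. Whether a given store path is actually served by a substituter can be checked with the standard narinfo lookup that all Nix binary caches expose; a minimal sketch follows, with the store-path hash as a placeholder.

# Hedged sketch: query a Nix binary cache for a store path's .narinfo.
# A cache serves "<hash>.narinfo" for /nix/store/<hash>-<name>; HTTP 200 means
# the path can be substituted from that cache. The hash below is a placeholder.
import urllib.error
import urllib.request

def cache_has_path(cache_url: str, store_hash: str) -> bool:
    try:
        with urllib.request.urlopen(f"{cache_url}/{store_hash}.narinfo", timeout=5) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False

# Example (placeholder hash, not a real path from this repo):
# cache_has_path("https://nix-cache.home.2rjus.net", "abcd1234...")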
@@ -2,7 +2,6 @@
 let
   prepare-host-script = pkgs.writeShellApplication {
     name = "prepare-host.sh";
-    runtimeInputs = [ pkgs.age ];
     text = ''
       echo "Removing machine-id"
       rm -f /etc/machine-id || true
@@ -22,11 +21,6 @@ let

       echo "Removing cache"
       rm -rf /var/cache/* || true

-      echo "Generate age key"
-      rm -rf /var/lib/sops-nix || true
-      mkdir -p /var/lib/sops-nix
-      age-keygen -o /var/lib/sops-nix/key.txt
     '';
   };
 in
@@ -11,16 +11,23 @@
|
|||||||
|
|
||||||
../../system
|
../../system
|
||||||
../../common/vm
|
../../common/vm
|
||||||
|
../../common/ssh-audit.nix
|
||||||
];
|
];
|
||||||
|
|
||||||
# Test VM - exclude from DNS zone generation
|
|
||||||
homelab.dns.enable = false;
|
|
||||||
|
|
||||||
homelab.host = {
|
homelab.host = {
|
||||||
tier = "test";
|
tier = "test";
|
||||||
priority = "low";
|
role = "test";
|
||||||
};
|
};
|
||||||
|
|
||||||
|
# Enable Vault integration
|
||||||
|
vault.enable = true;
|
||||||
|
|
||||||
|
# Enable remote deployment via NATS
|
||||||
|
homelab.deploy.enable = true;
|
||||||
|
|
||||||
|
# Enable Kanidm PAM/NSS for central authentication
|
||||||
|
homelab.kanidm.enable = true;
|
||||||
|
|
||||||
nixpkgs.config.allowUnfree = true;
|
nixpkgs.config.allowUnfree = true;
|
||||||
boot.loader.grub.enable = true;
|
boot.loader.grub.enable = true;
|
||||||
boot.loader.grub.device = "/dev/vda";
|
boot.loader.grub.device = "/dev/vda";
|
||||||
@@ -29,7 +36,7 @@
|
|||||||
networking.domain = "home.2rjus.net";
|
networking.domain = "home.2rjus.net";
|
||||||
networking.useNetworkd = true;
|
networking.useNetworkd = true;
|
||||||
networking.useDHCP = false;
|
networking.useDHCP = false;
|
||||||
services.resolved.enable = false;
|
services.resolved.enable = true;
|
||||||
networking.nameservers = [
|
networking.nameservers = [
|
||||||
"10.69.13.5"
|
"10.69.13.5"
|
||||||
"10.69.13.6"
|
"10.69.13.6"
|
||||||
@@ -39,7 +46,7 @@
|
|||||||
systemd.network.networks."ens18" = {
|
systemd.network.networks."ens18" = {
|
||||||
matchConfig.Name = "ens18";
|
matchConfig.Name = "ens18";
|
||||||
address = [
|
address = [
|
||||||
"10.69.13.101/24"
|
"10.69.13.20/24"
|
||||||
];
|
];
|
||||||
routes = [
|
routes = [
|
||||||
{ Gateway = "10.69.13.1"; }
|
{ Gateway = "10.69.13.1"; }
|
||||||
@@ -59,6 +66,39 @@
|
|||||||
git
|
git
|
||||||
];
|
];
|
||||||
|
|
||||||
|
# Test nginx with ACME certificate from OpenBao PKI
|
||||||
|
services.nginx = {
|
||||||
|
enable = true;
|
||||||
|
virtualHosts."testvm01.home.2rjus.net" = {
|
||||||
|
forceSSL = true;
|
||||||
|
enableACME = true;
|
||||||
|
locations."/" = {
|
||||||
|
root = pkgs.writeTextDir "index.html" ''
|
||||||
|
<!DOCTYPE html>
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
<title>testvm01 - ACME Test</title>
|
||||||
|
<style>
|
||||||
|
body { font-family: monospace; max-width: 600px; margin: 50px auto; padding: 20px; }
|
||||||
|
.joke { background: #f0f0f0; padding: 20px; border-radius: 8px; margin: 20px 0; }
|
||||||
|
.punchline { margin-top: 15px; font-weight: bold; }
|
||||||
|
</style>
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
<h1>OpenBao PKI ACME Test</h1>
|
||||||
|
<p>If you're seeing this over HTTPS, the migration worked!</p>
|
||||||
|
<div class="joke">
|
||||||
|
<p>Why do programmers prefer dark mode?</p>
|
||||||
|
<p class="punchline">Because light attracts bugs.</p>
|
||||||
|
</div>
|
||||||
|
<p><small>Certificate issued by: vault.home.2rjus.net</small></p>
|
||||||
|
</body>
|
||||||
|
</html>
|
||||||
|
'';
|
||||||
|
};
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
# Open ports in the firewall.
|
# Open ports in the firewall.
|
||||||
# networking.firewall.allowedTCPPorts = [ ... ];
|
# networking.firewall.allowedTCPPorts = [ ... ];
|
||||||
# networking.firewall.allowedUDPPorts = [ ... ];
|
# networking.firewall.allowedUDPPorts = [ ... ];
|
||||||
|
|||||||
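The testvm01 hunk above enables forceSSL and enableACME against the OpenBao PKI and serves a small test page. One quick way to confirm the issued chain, besides loading the page, is to read the leaf certificate's issuer straight off the live endpoint; the sketch below uses only the hostname from the diff, and the cryptography dependency is an assumption.

# Hedged sketch: check which CA issued testvm01's certificate.
# Verification is disabled because the internal CA is not in the default trust store.
import socket
import ssl

def cert_issuer(host: str, port: int = 443) -> dict:
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    # Decoding the issuer needs the third-party "cryptography" package (assumed available).
    from cryptography import x509
    cert = x509.load_der_x509_certificate(der)
    return {"subject": cert.subject.rfc4514_string(), "issuer": cert.issuer.rfc4514_string()}

# print(cert_issuer("testvm01.home.2rjus.net"))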
@@ -1,25 +1,38 @@
|
|||||||
{ config, lib, pkgs, ... }:
|
{
|
||||||
|
config,
|
||||||
|
lib,
|
||||||
|
pkgs,
|
||||||
|
...
|
||||||
|
}:
|
||||||
|
|
||||||
{
|
{
|
||||||
imports =
|
imports = [
|
||||||
[
|
../template2/hardware-configuration.nix
|
||||||
./hardware-configuration.nix
|
|
||||||
|
|
||||||
../../system
|
../../system
|
||||||
];
|
../../common/vm
|
||||||
|
../../common/ssh-audit.nix
|
||||||
# Template host - exclude from DNS zone generation
|
];
|
||||||
homelab.dns.enable = false;
|
|
||||||
|
|
||||||
homelab.host = {
|
homelab.host = {
|
||||||
tier = "test";
|
tier = "test";
|
||||||
priority = "low";
|
role = "test";
|
||||||
};
|
};
|
||||||
|
|
||||||
|
# Enable Vault integration
|
||||||
|
vault.enable = true;
|
||||||
|
|
||||||
|
# Enable remote deployment via NATS
|
||||||
|
homelab.deploy.enable = true;
|
||||||
|
|
||||||
|
# Enable Kanidm PAM/NSS for central authentication
|
||||||
|
homelab.kanidm.enable = true;
|
||||||
|
|
||||||
|
nixpkgs.config.allowUnfree = true;
|
||||||
boot.loader.grub.enable = true;
|
boot.loader.grub.enable = true;
|
||||||
boot.loader.grub.device = "/dev/sda";
|
boot.loader.grub.device = "/dev/vda";
|
||||||
networking.hostName = "nixos-template";
|
|
||||||
|
networking.hostName = "testvm02";
|
||||||
networking.domain = "home.2rjus.net";
|
networking.domain = "home.2rjus.net";
|
||||||
networking.useNetworkd = true;
|
networking.useNetworkd = true;
|
||||||
networking.useDHCP = false;
|
networking.useDHCP = false;
|
||||||
@@ -33,19 +46,21 @@
|
|||||||
systemd.network.networks."ens18" = {
|
systemd.network.networks."ens18" = {
|
||||||
matchConfig.Name = "ens18";
|
matchConfig.Name = "ens18";
|
||||||
address = [
|
address = [
|
||||||
"10.69.8.250/24"
|
"10.69.13.21/24"
|
||||||
];
|
];
|
||||||
routes = [
|
routes = [
|
||||||
{ Gateway = "10.69.8.1"; }
|
{ Gateway = "10.69.13.1"; }
|
||||||
];
|
];
|
||||||
linkConfig.RequiredForOnline = "routable";
|
linkConfig.RequiredForOnline = "routable";
|
||||||
};
|
};
|
||||||
time.timeZone = "Europe/Oslo";
|
time.timeZone = "Europe/Oslo";
|
||||||
|
|
||||||
nix.settings.experimental-features = [ "nix-command" "flakes" ];
|
nix.settings.experimental-features = [
|
||||||
|
"nix-command"
|
||||||
|
"flakes"
|
||||||
|
];
|
||||||
nix.settings.tarball-ttl = 0;
|
nix.settings.tarball-ttl = 0;
|
||||||
environment.systemPackages = with pkgs; [
|
environment.systemPackages = with pkgs; [
|
||||||
age
|
|
||||||
vim
|
vim
|
||||||
wget
|
wget
|
||||||
git
|
git
|
||||||
@@ -57,6 +72,5 @@
|
|||||||
# Or disable the firewall altogether.
|
# Or disable the firewall altogether.
|
||||||
networking.firewall.enable = false;
|
networking.firewall.enable = false;
|
||||||
|
|
||||||
system.stateVersion = "23.11"; # Did you read the comment?
|
system.stateVersion = "25.11"; # Did you read the comment?
|
||||||
}
|
}
|
||||||
|
|
||||||
76
hosts/testvm03/configuration.nix
Normal file
76
hosts/testvm03/configuration.nix
Normal file
@@ -0,0 +1,76 @@
|
|||||||
|
{
|
||||||
|
config,
|
||||||
|
lib,
|
||||||
|
pkgs,
|
||||||
|
...
|
||||||
|
}:
|
||||||
|
|
||||||
|
{
|
||||||
|
imports = [
|
||||||
|
../template2/hardware-configuration.nix
|
||||||
|
|
||||||
|
../../system
|
||||||
|
../../common/vm
|
||||||
|
../../common/ssh-audit.nix
|
||||||
|
];
|
||||||
|
|
||||||
|
homelab.host = {
|
||||||
|
tier = "test";
|
||||||
|
role = "test";
|
||||||
|
};
|
||||||
|
|
||||||
|
# Enable Vault integration
|
||||||
|
vault.enable = true;
|
||||||
|
|
||||||
|
# Enable remote deployment via NATS
|
||||||
|
homelab.deploy.enable = true;
|
||||||
|
|
||||||
|
# Enable Kanidm PAM/NSS for central authentication
|
||||||
|
homelab.kanidm.enable = true;
|
||||||
|
|
||||||
|
nixpkgs.config.allowUnfree = true;
|
||||||
|
boot.loader.grub.enable = true;
|
||||||
|
boot.loader.grub.device = "/dev/vda";
|
||||||
|
|
||||||
|
networking.hostName = "testvm03";
|
||||||
|
networking.domain = "home.2rjus.net";
|
||||||
|
networking.useNetworkd = true;
|
||||||
|
networking.useDHCP = false;
|
||||||
|
services.resolved.enable = true;
|
||||||
|
networking.nameservers = [
|
||||||
|
"10.69.13.5"
|
||||||
|
"10.69.13.6"
|
||||||
|
];
|
||||||
|
|
||||||
|
systemd.network.enable = true;
|
||||||
|
systemd.network.networks."ens18" = {
|
||||||
|
matchConfig.Name = "ens18";
|
||||||
|
address = [
|
||||||
|
"10.69.13.22/24"
|
||||||
|
];
|
||||||
|
routes = [
|
||||||
|
{ Gateway = "10.69.13.1"; }
|
||||||
|
];
|
||||||
|
linkConfig.RequiredForOnline = "routable";
|
||||||
|
};
|
||||||
|
time.timeZone = "Europe/Oslo";
|
||||||
|
|
||||||
|
nix.settings.experimental-features = [
|
||||||
|
"nix-command"
|
||||||
|
"flakes"
|
||||||
|
];
|
||||||
|
nix.settings.tarball-ttl = 0;
|
||||||
|
environment.systemPackages = with pkgs; [
|
||||||
|
vim
|
||||||
|
wget
|
||||||
|
git
|
||||||
|
];
|
||||||
|
|
||||||
|
# Open ports in the firewall.
|
||||||
|
# networking.firewall.allowedTCPPorts = [ ... ];
|
||||||
|
# networking.firewall.allowedUDPPorts = [ ... ];
|
||||||
|
# Or disable the firewall altogether.
|
||||||
|
networking.firewall.enable = false;
|
||||||
|
|
||||||
|
system.stateVersion = "25.11"; # Did you read the comment?
|
||||||
|
}
|
||||||
@@ -1,7 +1,5 @@
|
|||||||
{ ... }:
|
{ ... }: {
|
||||||
{
|
|
||||||
imports = [
|
imports = [
|
||||||
./configuration.nix
|
./configuration.nix
|
||||||
../../services/ca
|
|
||||||
];
|
];
|
||||||
}
|
}
|
||||||
@@ -62,6 +62,16 @@
|
|||||||
# Or disable the firewall altogether.
|
# Or disable the firewall altogether.
|
||||||
networking.firewall.enable = false;
|
networking.firewall.enable = false;
|
||||||
|
|
||||||
|
# Vault fetches secrets from itself (after unseal)
|
||||||
|
vault.enable = true;
|
||||||
|
homelab.deploy.enable = true;
|
||||||
|
|
||||||
|
# Ensure vault-secret services wait for openbao to be unsealed
|
||||||
|
systemd.services.vault-secret-homelab-deploy-nkey = {
|
||||||
|
after = [ "openbao.service" ];
|
||||||
|
wants = [ "openbao.service" ];
|
||||||
|
};
|
||||||
|
|
||||||
system.stateVersion = "25.11"; # Did you read the comment?
|
system.stateVersion = "25.11"; # Did you read the comment?
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -1,135 +0,0 @@
|
|||||||
{
|
|
||||||
config,
|
|
||||||
lib,
|
|
||||||
pkgs,
|
|
||||||
...
|
|
||||||
}:
|
|
||||||
|
|
||||||
let
|
|
||||||
vault-test-script = pkgs.writeShellApplication {
|
|
||||||
name = "vault-test";
|
|
||||||
text = ''
|
|
||||||
echo "=== Vault Secret Test ==="
|
|
||||||
echo "Secret path: hosts/vaulttest01/test-service"
|
|
||||||
|
|
||||||
if [ -f /run/secrets/test-service/password ]; then
|
|
||||||
echo "✓ Password file exists"
|
|
||||||
echo "Password length: $(wc -c < /run/secrets/test-service/password)"
|
|
||||||
else
|
|
||||||
echo "✗ Password file missing!"
|
|
||||||
exit 1
|
|
||||||
fi
|
|
||||||
|
|
||||||
if [ -d /var/lib/vault/cache/test-service ]; then
|
|
||||||
echo "✓ Cache directory exists"
|
|
||||||
else
|
|
||||||
echo "✗ Cache directory missing!"
|
|
||||||
exit 1
|
|
||||||
fi
|
|
||||||
|
|
||||||
echo "Test successful!"
|
|
||||||
'';
|
|
||||||
};
|
|
||||||
in
|
|
||||||
{
|
|
||||||
imports = [
|
|
||||||
../template2/hardware-configuration.nix
|
|
||||||
|
|
||||||
../../system
|
|
||||||
../../common/vm
|
|
||||||
];
|
|
||||||
|
|
||||||
homelab.host = {
|
|
||||||
tier = "test";
|
|
||||||
priority = "low";
|
|
||||||
role = "vault";
|
|
||||||
};
|
|
||||||
|
|
||||||
nixpkgs.config.allowUnfree = true;
|
|
||||||
boot.loader.grub.enable = true;
|
|
||||||
boot.loader.grub.device = "/dev/vda";
|
|
||||||
|
|
||||||
networking.hostName = "vaulttest01";
|
|
||||||
networking.domain = "home.2rjus.net";
|
|
||||||
networking.useNetworkd = true;
|
|
||||||
networking.useDHCP = false;
|
|
||||||
services.resolved.enable = true;
|
|
||||||
networking.nameservers = [
|
|
||||||
"10.69.13.5"
|
|
||||||
"10.69.13.6"
|
|
||||||
];
|
|
||||||
|
|
||||||
systemd.network.enable = true;
|
|
||||||
systemd.network.networks."ens18" = {
|
|
||||||
matchConfig.Name = "ens18";
|
|
||||||
address = [
|
|
||||||
"10.69.13.150/24"
|
|
||||||
];
|
|
||||||
routes = [
|
|
||||||
{ Gateway = "10.69.13.1"; }
|
|
||||||
];
|
|
||||||
linkConfig.RequiredForOnline = "routable";
|
|
||||||
};
|
|
||||||
time.timeZone = "Europe/Oslo";
|
|
||||||
|
|
||||||
nix.settings.experimental-features = [
|
|
||||||
"nix-command"
|
|
||||||
"flakes"
|
|
||||||
];
|
|
||||||
nix.settings.tarball-ttl = 0;
|
|
||||||
environment.systemPackages = with pkgs; [
|
|
||||||
vim
|
|
||||||
wget
|
|
||||||
git
|
|
||||||
htop # test deploy verification
|
|
||||||
];
|
|
||||||
|
|
||||||
# Open ports in the firewall.
|
|
||||||
# networking.firewall.allowedTCPPorts = [ ... ];
|
|
||||||
# networking.firewall.allowedUDPPorts = [ ... ];
|
|
||||||
# Or disable the firewall altogether.
|
|
||||||
networking.firewall.enable = false;
|
|
||||||
|
|
||||||
# Testing config
|
|
||||||
# Enable Vault secrets management
|
|
||||||
vault.enable = true;
|
|
||||||
homelab.deploy.enable = true;
|
|
||||||
|
|
||||||
# Define a test secret
|
|
||||||
vault.secrets.test-service = {
|
|
||||||
secretPath = "hosts/vaulttest01/test-service";
|
|
||||||
restartTrigger = true;
|
|
||||||
restartInterval = "daily";
|
|
||||||
services = [ "vault-test" ];
|
|
||||||
};
|
|
||||||
|
|
||||||
# Create a test service that uses the secret
|
|
||||||
systemd.services.vault-test = {
|
|
||||||
description = "Test Vault secret fetching";
|
|
||||||
wantedBy = [ "multi-user.target" ];
|
|
||||||
after = [ "vault-secret-test-service.service" ];
|
|
||||||
|
|
||||||
serviceConfig = {
|
|
||||||
Type = "oneshot";
|
|
||||||
RemainAfterExit = true;
|
|
||||||
|
|
||||||
ExecStart = lib.getExe vault-test-script;
|
|
||||||
|
|
||||||
StandardOutput = "journal+console";
|
|
||||||
};
|
|
||||||
};
|
|
||||||
|
|
||||||
# Test ACME certificate issuance from OpenBao PKI
|
|
||||||
# Override the global ACME server (from system/acme.nix) to use OpenBao instead of step-ca
|
|
||||||
security.acme.defaults.server = lib.mkForce "https://vault01.home.2rjus.net:8200/v1/pki_int/acme/directory";
|
|
||||||
|
|
||||||
# Request a certificate for this host
|
|
||||||
# Using HTTP-01 challenge with standalone listener on port 80
|
|
||||||
security.acme.certs."vaulttest01.home.2rjus.net" = {
|
|
||||||
listenHTTP = ":80";
|
|
||||||
enableDebugLogs = true;
|
|
||||||
};
|
|
||||||
|
|
||||||
system.stateVersion = "25.11"; # Did you read the comment?
|
|
||||||
}
|
|
||||||
|
|
||||||
@@ -21,6 +21,7 @@ let
   cfg = hostConfig.config;
   monConfig = (cfg.homelab or { }).monitoring or { enable = true; scrapeTargets = [ ]; };
   dnsConfig = (cfg.homelab or { }).dns or { enable = true; };
+  hostConfig' = (cfg.homelab or { }).host or { };
   hostname = cfg.networking.hostName;
   networks = cfg.systemd.network.networks or { };

@@ -49,20 +50,72 @@ let
       inherit hostname;
       ip = extractIP firstAddress;
       scrapeTargets = monConfig.scrapeTargets or [ ];
+      # Host metadata for label propagation
+      tier = hostConfig'.tier or "prod";
+      priority = hostConfig'.priority or "high";
+      role = hostConfig'.role or null;
+      labels = hostConfig'.labels or { };
     };

+  # Build effective labels for a host
+  # Always includes hostname and tier; only includes priority/role if non-default
+  buildEffectiveLabels = host:
+    { hostname = host.hostname; tier = host.tier; }
+    // (lib.optionalAttrs (host.priority != "high") { priority = host.priority; })
+    // (lib.optionalAttrs (host.role != null) { role = host.role; })
+    // host.labels;
+
   # Generate node-exporter targets from all flake hosts
+  # Returns a list of static_configs entries with labels
   generateNodeExporterTargets = self: externalTargets:
     let
       nixosConfigs = self.nixosConfigurations or { };
       hostList = lib.filter (x: x != null) (
         lib.mapAttrsToList extractHostMonitoring nixosConfigs
       );
-      flakeTargets = map (host: "${host.hostname}.home.2rjus.net:9100") hostList;
+      # Extract hostname from a target string like "gunter.home.2rjus.net:9100"
+      extractHostnameFromTarget = target:
+        builtins.head (lib.splitString "." target);
+
+      # Build target entries with labels for each host
+      flakeEntries = map
+        (host: {
+          target = "${host.hostname}.home.2rjus.net:9100";
+          labels = buildEffectiveLabels host;
+        })
+        hostList;
+
+      # External targets get hostname extracted from the target string
+      externalEntries = map
+        (target: {
+          inherit target;
+          labels = { hostname = extractHostnameFromTarget target; };
+        })
+        (externalTargets.nodeExporter or [ ]);
+
+      allEntries = flakeEntries ++ externalEntries;
+
+      # Group entries by their label set for efficient static_configs
+      # Convert labels attrset to a string key for grouping
+      labelKey = entry: builtins.toJSON entry.labels;
+      grouped = lib.groupBy labelKey allEntries;
+
+      # Convert groups to static_configs format
+      # Every flake host now has at least a hostname label
+      staticConfigs = lib.mapAttrsToList
+        (key: entries:
+          let
+            labels = (builtins.head entries).labels;
+          in
+          { targets = map (e: e.target) entries; labels = labels; }
+        )
+        grouped;
     in
-    flakeTargets ++ (externalTargets.nodeExporter or [ ]);
+    staticConfigs;

   # Generate scrape configs from all flake hosts and external targets
+  # Host labels are propagated to service targets for semantic alert filtering
   generateScrapeConfigs = self: externalTargets:
     let
       nixosConfigs = self.nixosConfigurations or { };
@@ -70,13 +123,14 @@ let
         lib.mapAttrsToList extractHostMonitoring nixosConfigs
       );

-      # Collect all scrapeTargets from all hosts, grouped by job_name
+      # Collect all scrapeTargets from all hosts, including host labels
       allTargets = lib.flatten (map
         (host:
           map
             (target: {
               inherit (target) job_name port metrics_path scheme scrape_interval honor_labels;
               hostname = host.hostname;
+              hostLabels = buildEffectiveLabels host;
             })
             host.scrapeTargets
         )
@@ -87,22 +141,32 @@ let
       grouped = lib.groupBy (t: t.job_name) allTargets;

       # Generate a scrape config for each job
+      # Within each job, group targets by their host labels for efficient static_configs
       flakeScrapeConfigs = lib.mapAttrsToList
         (jobName: targets:
           let
             first = builtins.head targets;
-            targetAddrs = map
-              (t:
+            # Group targets within this job by their host labels
+            labelKey = t: builtins.toJSON t.hostLabels;
+            groupedByLabels = lib.groupBy labelKey targets;
+
+            # Every flake host now has at least a hostname label
+            staticConfigs = lib.mapAttrsToList
+              (key: labelTargets:
                 let
-                  portStr = toString t.port;
+                  labels = (builtins.head labelTargets).hostLabels;
+                  targetAddrs = map
+                    (t: "${t.hostname}.home.2rjus.net:${toString t.port}")
+                    labelTargets;
                 in
-                "${t.hostname}.home.2rjus.net:${portStr}")
-              targets;
+                { targets = targetAddrs; labels = labels; }
+              )
+              groupedByLabels;

             config = {
               job_name = jobName;
-              static_configs = [{
-                targets = targetAddrs;
-              }];
+              static_configs = staticConfigs;
             }
             // (lib.optionalAttrs (first.metrics_path != "/metrics") {
               metrics_path = first.metrics_path;
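The effect of the grouping above is easier to see outside Nix: entries that share an identical label set collapse into one static_config, and since every flake host carries its own hostname label each flake host usually ends up in a group of its own, so the merging mainly matters for targets with repeated labels. A small Python sketch of the same group-by-JSON-key idea, with made-up hosts purely to show the resulting shape:

# Hedged sketch: the "group targets by identical label set" idea in Python.
# Hosts and labels are invented; only the static_configs shape is the point.
import json
from collections import defaultdict

entries = [
    {"target": "ns1.home.2rjus.net:9100", "labels": {"hostname": "ns1", "tier": "prod"}},
    {"target": "ns2.home.2rjus.net:9100", "labels": {"hostname": "ns2", "tier": "prod"}},
    {"target": "testvm01.home.2rjus.net:9100",
     "labels": {"hostname": "testvm01", "tier": "test", "role": "test"}},
]

groups = defaultdict(list)
for entry in entries:
    # Serialise the label set so it can be used as a grouping key,
    # mirroring builtins.toJSON in the Nix code.
    groups[json.dumps(entry["labels"], sort_keys=True)].append(entry)

static_configs = [
    {"targets": [e["target"] for e in group], "labels": group[0]["labels"]}
    for group in groups.values()
]
print(json.dumps(static_configs, indent=2))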
@@ -1,5 +0,0 @@
|
|||||||
[proxmox]
|
|
||||||
pve1.home.2rjus.net
|
|
||||||
|
|
||||||
[proxmox:vars]
|
|
||||||
ansible_user=root
|
|
||||||
@@ -1,20 +0,0 @@
|
|||||||
#!/usr/bin/env bash
|
|
||||||
set -euo pipefail
|
|
||||||
|
|
||||||
# array of hosts
|
|
||||||
HOSTS=(
|
|
||||||
"ns1"
|
|
||||||
"ns2"
|
|
||||||
"ca"
|
|
||||||
"ha1"
|
|
||||||
"http-proxy"
|
|
||||||
"jelly01"
|
|
||||||
"monitoring01"
|
|
||||||
"nix-cache01"
|
|
||||||
"pgdb1"
|
|
||||||
)
|
|
||||||
|
|
||||||
for host in "${HOSTS[@]}"; do
|
|
||||||
echo "Rebuilding $host"
|
|
||||||
nixos-rebuild boot --flake .#${host} --target-host root@${host}
|
|
||||||
done
|
|
||||||
@@ -18,6 +18,8 @@ from manipulators import (
|
|||||||
remove_from_flake_nix,
|
remove_from_flake_nix,
|
||||||
remove_from_terraform_vms,
|
remove_from_terraform_vms,
|
||||||
remove_from_vault_terraform,
|
remove_from_vault_terraform,
|
||||||
|
remove_from_approle_tf,
|
||||||
|
find_host_secrets,
|
||||||
check_entries_exist,
|
check_entries_exist,
|
||||||
)
|
)
|
||||||
from models import HostConfig
|
from models import HostConfig
|
||||||
@@ -255,7 +257,10 @@ def handle_remove(
|
|||||||
sys.exit(1)
|
sys.exit(1)
|
||||||
|
|
||||||
# Check what entries exist
|
# Check what entries exist
|
||||||
flake_exists, terraform_exists, vault_exists = check_entries_exist(hostname, repo_root)
|
flake_exists, terraform_exists, vault_exists, approle_exists = check_entries_exist(hostname, repo_root)
|
||||||
|
|
||||||
|
# Check for secrets in secrets.tf
|
||||||
|
host_secrets = find_host_secrets(hostname, repo_root)
|
||||||
|
|
||||||
# Collect all files in the host directory recursively
|
# Collect all files in the host directory recursively
|
||||||
files_in_host_dir = sorted([f for f in host_dir.rglob("*") if f.is_file()])
|
files_in_host_dir = sorted([f for f in host_dir.rglob("*") if f.is_file()])
|
||||||
@@ -294,11 +299,25 @@ def handle_remove(
|
|||||||
else:
|
else:
|
||||||
console.print(f" • terraform/vault/hosts-generated.tf [dim](not found)[/dim]")
|
console.print(f" • terraform/vault/hosts-generated.tf [dim](not found)[/dim]")
|
||||||
|
|
||||||
# Warn about secrets directory
|
if approle_exists:
|
||||||
|
console.print(f' • terraform/vault/approle.tf (host_policies["{hostname}"])')
|
||||||
|
else:
|
||||||
|
console.print(f" • terraform/vault/approle.tf [dim](not found)[/dim]")
|
||||||
|
|
||||||
|
# Warn about secrets in secrets.tf
|
||||||
|
if host_secrets:
|
||||||
|
console.print(f"\n[yellow]⚠️ Warning: Found {len(host_secrets)} secret(s) in terraform/vault/secrets.tf:[/yellow]")
|
||||||
|
for secret_path in host_secrets:
|
||||||
|
console.print(f' • "{secret_path}"')
|
||||||
|
console.print(f"\n [yellow]These will NOT be removed automatically.[/yellow]")
|
||||||
|
console.print(f" After removal, manually edit secrets.tf and run:")
|
||||||
|
for secret_path in host_secrets:
|
||||||
|
console.print(f" [white]vault kv delete secret/{secret_path}[/white]")
|
||||||
|
|
||||||
|
# Warn about legacy secrets directory
|
||||||
if secrets_exist:
|
if secrets_exist:
|
||||||
console.print(f"\n[yellow]⚠️ Warning: secrets/{hostname}/ directory exists and will NOT be deleted[/yellow]")
|
console.print(f"\n[yellow]⚠️ Warning: secrets/{hostname}/ directory exists (legacy SOPS)[/yellow]")
|
||||||
console.print(f" Manually remove if no longer needed: [white]rm -rf secrets/{hostname}/[/white]")
|
console.print(f" Manually remove if no longer needed: [white]rm -rf secrets/{hostname}/[/white]")
|
||||||
console.print(f" Also update .sops.yaml to remove the host's age key")
|
|
||||||
|
|
||||||
# Exit if dry run
|
# Exit if dry run
|
||||||
if dry_run:
|
if dry_run:
|
||||||
@@ -323,6 +342,13 @@ def handle_remove(
|
|||||||
else:
|
else:
|
||||||
console.print("[yellow]⚠[/yellow] Could not remove from terraform/vault/hosts-generated.tf")
|
console.print("[yellow]⚠[/yellow] Could not remove from terraform/vault/hosts-generated.tf")
|
||||||
|
|
||||||
|
# Remove from terraform/vault/approle.tf
|
||||||
|
if approle_exists:
|
||||||
|
if remove_from_approle_tf(hostname, repo_root):
|
||||||
|
console.print("[green]✓[/green] Removed from terraform/vault/approle.tf")
|
||||||
|
else:
|
||||||
|
console.print("[yellow]⚠[/yellow] Could not remove from terraform/vault/approle.tf")
|
||||||
|
|
||||||
# Remove from terraform/vms.tf
|
# Remove from terraform/vms.tf
|
||||||
if terraform_exists:
|
if terraform_exists:
|
||||||
if remove_from_terraform_vms(hostname, repo_root):
|
if remove_from_terraform_vms(hostname, repo_root):
|
||||||
@@ -345,19 +371,34 @@ def handle_remove(
|
|||||||
console.print(f"\n[bold green]✓ Host {hostname} removed successfully![/bold green]\n")
|
console.print(f"\n[bold green]✓ Host {hostname} removed successfully![/bold green]\n")
|
||||||
|
|
||||||
# Display next steps
|
# Display next steps
|
||||||
display_removal_next_steps(hostname, vault_exists)
|
display_removal_next_steps(hostname, vault_exists, approle_exists, host_secrets)
|
||||||
|
|
||||||
|
|
||||||
def display_removal_next_steps(hostname: str, had_vault: bool) -> None:
|
def display_removal_next_steps(hostname: str, had_vault: bool, had_approle: bool, host_secrets: list) -> None:
|
||||||
"""Display next steps after successful removal."""
|
"""Display next steps after successful removal."""
|
||||||
vault_file = " terraform/vault/hosts-generated.tf" if had_vault else ""
|
vault_files = ""
|
||||||
vault_apply = ""
|
|
||||||
if had_vault:
|
if had_vault:
|
||||||
|
vault_files += " terraform/vault/hosts-generated.tf"
|
||||||
|
if had_approle:
|
||||||
|
vault_files += " terraform/vault/approle.tf"
|
||||||
|
|
||||||
|
vault_apply = ""
|
||||||
|
if had_vault or had_approle:
|
||||||
vault_apply = f"""
|
vault_apply = f"""
|
||||||
3. Apply Vault changes:
|
3. Apply Vault changes:
|
||||||
[white]cd terraform/vault && tofu apply[/white]
|
[white]cd terraform/vault && tofu apply[/white]
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
secrets_cleanup = ""
|
||||||
|
if host_secrets:
|
||||||
|
secrets_cleanup = f"""
|
||||||
|
5. Clean up secrets (manual):
|
||||||
|
Edit terraform/vault/secrets.tf to remove entries for {hostname}
|
||||||
|
Then delete from Vault:"""
|
||||||
|
for secret_path in host_secrets:
|
||||||
|
secrets_cleanup += f"\n [white]vault kv delete secret/{secret_path}[/white]"
|
||||||
|
secrets_cleanup += "\n"
|
||||||
|
|
||||||
next_steps = f"""[bold cyan]Next Steps:[/bold cyan]
|
next_steps = f"""[bold cyan]Next Steps:[/bold cyan]
|
||||||
|
|
||||||
1. Review changes:
|
1. Review changes:
|
||||||
@@ -367,9 +408,9 @@ def display_removal_next_steps(hostname: str, had_vault: bool) -> None:
|
|||||||
[white]cd terraform && tofu destroy -target='proxmox_vm_qemu.vm["{hostname}"]'[/white]
|
[white]cd terraform && tofu destroy -target='proxmox_vm_qemu.vm["{hostname}"]'[/white]
|
||||||
{vault_apply}
|
{vault_apply}
|
||||||
4. Commit changes:
|
4. Commit changes:
|
||||||
[white]git add -u hosts/{hostname} flake.nix terraform/vms.tf{vault_file}
|
[white]git add -u hosts/{hostname} flake.nix terraform/vms.tf{vault_files}
|
||||||
git commit -m "hosts: remove {hostname}"[/white]
|
git commit -m "hosts: remove {hostname}"[/white]
|
||||||
"""
|
{secrets_cleanup}"""
|
||||||
console.print(Panel(next_steps, border_style="cyan"))
|
console.print(Panel(next_steps, border_style="cyan"))
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -144,7 +144,7 @@ resource "vault_approle_auth_backend_role" "generated_hosts" {

   backend        = vault_auth_backend.approle.path
   role_name      = each.key
-  token_policies = ["host-\${each.key}"]
+  token_policies = ["host-\${each.key}", "homelab-deploy"]
   secret_id_ttl  = 0 # Never expire (wrapped tokens provide time limit)
   token_ttl      = 3600
   token_max_ttl  = 3600
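With the extra "homelab-deploy" policy attached to every generated AppRole, a host that logs in with its role_id/secret_id gets a token carrying both its per-host policy and the deploy policy. A hedged sketch using the hvac client (the credentials are placeholders; the Vault URL is the one used elsewhere in this repo):

# Hedged sketch: AppRole login via hvac, with placeholder credentials.
# The returned token should list both "host-<hostname>" and "homelab-deploy".
import hvac

client = hvac.Client(url="https://vault01.home.2rjus.net:8200")
resp = client.auth.approle.login(role_id="<role-id>", secret_id="<secret-id>")
print(resp["auth"]["token_policies"])  # e.g. ['default', 'homelab-deploy', 'host-testvm01']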
@@ -22,12 +22,12 @@ def remove_from_flake_nix(hostname: str, repo_root: Path) -> bool:
     content = flake_path.read_text()

     # Check if hostname exists
     hostname_pattern = rf"^ {re.escape(hostname)} = nixpkgs\.lib\.nixosSystem"
     if not re.search(hostname_pattern, content, re.MULTILINE):
         return False

     # Match the entire block from "hostname = " to "};"
     replace_pattern = rf"^ {re.escape(hostname)} = nixpkgs\.lib\.nixosSystem \{{.*?^ \}};\n"
     new_content, count = re.subn(replace_pattern, "", content, flags=re.MULTILINE | re.DOTALL)

     if count == 0:
@@ -101,7 +101,68 @@ def remove_from_vault_terraform(hostname: str, repo_root: Path) -> bool:
     return True


-def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, bool]:
+def remove_from_approle_tf(hostname: str, repo_root: Path) -> bool:
+    """
+    Remove host entry from terraform/vault/approle.tf locals.host_policies.
+
+    Args:
+        hostname: Hostname to remove
+        repo_root: Path to repository root
+
+    Returns:
+        True if found and removed, False if not found
+    """
+    approle_path = repo_root / "terraform" / "vault" / "approle.tf"
+
+    if not approle_path.exists():
+        return False
+
+    content = approle_path.read_text()
+
+    # Check if hostname exists in host_policies
+    hostname_pattern = rf'^\s+"{re.escape(hostname)}" = \{{'
+    if not re.search(hostname_pattern, content, re.MULTILINE):
+        return False
+
+    # Match the entire block from "hostname" = { to closing }
+    # The block contains paths = [ ... ] and possibly extra_policies = [...]
+    replace_pattern = rf'\n?\s+"{re.escape(hostname)}" = \{{[^}}]*\}}\n?'
+    new_content, count = re.subn(replace_pattern, "\n", content, flags=re.DOTALL)
+
+    if count == 0:
+        return False
+
+    approle_path.write_text(new_content)
+    return True
+
+
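The removal pattern above deletes a whole "hostname" = { ... } entry from locals.host_policies. The actual approle.tf is not shown in this diff, so the sketch below runs the same regex against an invented snippet to show what gets matched and what is left alone.

# Hedged sketch: exercising the remove_from_approle_tf regex against a made-up
# host_policies block (the real approle.tf layout may differ).
import re

content = '''locals {
  host_policies = {
    "testvm01" = {
      paths = ["hosts/testvm01/*"]
    }
    "ns1" = {
      paths = ["hosts/ns1/*"]
      extra_policies = ["dns"]
    }
  }
}
'''

hostname = "testvm01"
pattern = rf'\n?\s+"{re.escape(hostname)}" = \{{[^}}]*\}}\n?'
new_content, count = re.subn(pattern, "\n", content, flags=re.DOTALL)
print(count)        # 1
print(new_content)  # the "testvm01" block is gone, "ns1" is left intact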
+def find_host_secrets(hostname: str, repo_root: Path) -> list:
+    """
+    Find secrets in terraform/vault/secrets.tf that belong to a host.
+
+    Args:
+        hostname: Hostname to search for
+        repo_root: Path to repository root
+
+    Returns:
+        List of secret paths found (e.g., ["hosts/hostname/test-service"])
+    """
+    secrets_path = repo_root / "terraform" / "vault" / "secrets.tf"
+
+    if not secrets_path.exists():
+        return []
+
+    content = secrets_path.read_text()
+
+    # Find all secret paths matching hosts/{hostname}/
+    pattern = rf'"(hosts/{re.escape(hostname)}/[^"]+)"'
+    matches = re.findall(pattern, content)
+
+    # Return unique paths, preserving order
+    return list(dict.fromkeys(matches))
+
+
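find_host_secrets only scans for quoted paths of the form hosts/<hostname>/... and deduplicates while keeping order. A quick check against an invented secrets.tf fragment (the secret path reused here does appear elsewhere in this diff; the surrounding Terraform is made up):

# Hedged sketch: what find_host_secrets would return for a made-up secrets.tf.
import re

content = '''resource "vault_kv_secret_v2" "secrets" {
  for_each = toset([
    "hosts/vaulttest01/test-service",
    "hosts/vaulttest01/test-service",
    "hosts/ns1/ddns-key",
  ])
}
'''

hostname = "vaulttest01"
pattern = rf'"(hosts/{re.escape(hostname)}/[^"]+)"'
matches = re.findall(pattern, content)
# Duplicates collapse, other hosts' paths are ignored:
print(list(dict.fromkeys(matches)))  # ['hosts/vaulttest01/test-service']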
+def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, bool, bool]:
     """
     Check which entries exist for a hostname.

@@ -110,12 +171,12 @@ def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, boo
         repo_root: Path to repository root

     Returns:
-        Tuple of (flake_exists, terraform_vms_exists, vault_exists)
+        Tuple of (flake_exists, terraform_vms_exists, vault_generated_exists, approle_exists)
     """
     # Check flake.nix
     flake_path = repo_root / "flake.nix"
     flake_content = flake_path.read_text()
     flake_pattern = rf"^ {re.escape(hostname)} = nixpkgs\.lib\.nixosSystem"
     flake_exists = bool(re.search(flake_pattern, flake_content, re.MULTILINE))

     # Check terraform/vms.tf
@@ -131,7 +192,15 @@ def check_entries_exist(hostname: str, repo_root: Path) -> Tuple[bool, bool, boo
     vault_content = vault_tf_path.read_text()
     vault_exists = f'"{hostname}"' in vault_content

-    return (flake_exists, terraform_exists, vault_exists)
+    # Check terraform/vault/approle.tf
+    approle_path = repo_root / "terraform" / "vault" / "approle.tf"
+    approle_exists = False
+    if approle_path.exists():
+        approle_content = approle_path.read_text()
+        approle_pattern = rf'^\s+"{re.escape(hostname)}" = \{{'
+        approle_exists = bool(re.search(approle_pattern, approle_content, re.MULTILINE))
+
+    return (flake_exists, terraform_exists, vault_exists, approle_exists)


 def update_flake_nix(config: HostConfig, repo_root: Path, force: bool = False) -> None:
@@ -147,32 +216,25 @@ def update_flake_nix(config: HostConfig, repo_root: Path, force: bool = False) -
|
|||||||
content = flake_path.read_text()
|
content = flake_path.read_text()
|
||||||
|
|
||||||
# Create new entry
|
# Create new entry
|
||||||
new_entry = f""" {config.hostname} = nixpkgs.lib.nixosSystem {{
|
new_entry = f""" {config.hostname} = nixpkgs.lib.nixosSystem {{
|
||||||
inherit system;
|
inherit system;
|
||||||
specialArgs = {{
|
specialArgs = {{
|
||||||
inherit inputs self sops-nix;
|
inherit inputs self;
|
||||||
|
}};
|
||||||
|
modules = commonModules ++ [
|
||||||
|
./hosts/{config.hostname}
|
||||||
|
];
|
||||||
}};
|
}};
|
||||||
modules = [
|
|
||||||
(
|
|
||||||
{{ config, pkgs, ... }}:
|
|
||||||
{{
|
|
||||||
nixpkgs.overlays = commonOverlays;
|
|
||||||
}}
|
|
||||||
)
|
|
||||||
./hosts/{config.hostname}
|
|
||||||
sops-nix.nixosModules.sops
|
|
||||||
];
|
|
||||||
}};
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
# Check if hostname already exists
|
# Check if hostname already exists
|
||||||
hostname_pattern = rf"^ {re.escape(config.hostname)} = nixpkgs\.lib\.nixosSystem"
|
hostname_pattern = rf"^ {re.escape(config.hostname)} = nixpkgs\.lib\.nixosSystem"
|
||||||
existing_match = re.search(hostname_pattern, content, re.MULTILINE)
|
existing_match = re.search(hostname_pattern, content, re.MULTILINE)
|
||||||
|
|
||||||
if existing_match and force:
|
if existing_match and force:
|
||||||
# Replace existing entry
|
# Replace existing entry
|
||||||
# Match the entire block from "hostname = " to "};"
|
# Match the entire block from "hostname = " to "};"
|
||||||
replace_pattern = rf"^ {re.escape(config.hostname)} = nixpkgs\.lib\.nixosSystem \{{.*?^ \}};\n"
|
replace_pattern = rf"^ {re.escape(config.hostname)} = nixpkgs\.lib\.nixosSystem \{{.*?^ \}};\n"
|
||||||
new_content, count = re.subn(replace_pattern, new_entry, content, flags=re.MULTILINE | re.DOTALL)
|
new_content, count = re.subn(replace_pattern, new_entry, content, flags=re.MULTILINE | re.DOTALL)
|
||||||
|
|
||||||
if count == 0:
|
if count == 0:
|
||||||
|
|||||||
@@ -18,6 +18,12 @@
|
|||||||
tier = "test"; # Start in test tier, move to prod after validation
|
tier = "test"; # Start in test tier, move to prod after validation
|
||||||
};
|
};
|
||||||
|
|
||||||
|
# Enable Vault integration
|
||||||
|
vault.enable = true;
|
||||||
|
|
||||||
|
# Enable remote deployment via NATS
|
||||||
|
homelab.deploy.enable = true;
|
||||||
|
|
||||||
nixpkgs.config.allowUnfree = true;
|
nixpkgs.config.allowUnfree = true;
|
||||||
boot.loader.grub.enable = true;
|
boot.loader.grub.enable = true;
|
||||||
boot.loader.grub.device = "/dev/vda";
|
boot.loader.grub.device = "/dev/vda";
|
||||||
|
|||||||
@@ -140,20 +140,22 @@ def validate_ip_unique(ip: Optional[str], repo_root: Path) -> None:
     ip_part = ip.split("/")[0]

     # Check all hosts/*/configuration.nix files
+    # Search for IP with CIDR notation to match static IP assignments
+    # (e.g., "10.69.13.5/24") but not DNS resolver entries (e.g., "10.69.13.5")
     hosts_dir = repo_root / "hosts"
     if hosts_dir.exists():
         for config_file in hosts_dir.glob("*/configuration.nix"):
             content = config_file.read_text()
-            if ip_part in content:
+            if ip in content:
                 raise ValueError(
                     f"IP address {ip_part} already in use in {config_file}"
                 )

-    # Check terraform/vms.tf
+    # Check terraform/vms.tf - search for full IP with CIDR
    terraform_file = repo_root / "terraform" / "vms.tf"
    if terraform_file.exists():
        content = terraform_file.read_text()
-        if ip_part in content:
+        if ip in content:
            raise ValueError(
                f"IP address {ip_part} already in use in {terraform_file}"
            )
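The uniqueness check now searches for the full CIDR string rather than the bare address, so a configuration that merely lists "10.69.13.5" as a nameserver no longer collides with a proposed static assignment, while a real "10.69.13.5/24" assignment is still caught. A small illustration of the difference:

# Hedged sketch: why matching the full CIDR avoids false positives on resolver entries.
ip = "10.69.13.5/24"           # proposed static assignment for a host
ip_part = ip.split("/")[0]     # "10.69.13.5"

other_host = 'networking.nameservers = [ "10.69.13.5" "10.69.13.6" ];'   # resolver only
owner_host = 'address = [ "10.69.13.5/24" ];'                            # real assignment

print(ip_part in other_host)  # True  - old check: false positive on a resolver entry
print(ip in other_host)       # False - new check ignores resolver entries
print(ip in owner_host)       # True  - genuine conflicts are still detected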
@@ -1,24 +0,0 @@
|
|||||||
{
|
|
||||||
"data": "ENC[AES256_GCM,data:TgGIuklFPUSCBosD86NFnkAtRvYijQNQP4vvTkKu3dRAOjdDa2li5djZDUS4NEEPEihpOcMXqHBb+ABk3LmoU5nLmsKCeylUp7+DhcGi9f3xw2h1zbHV37mt40OVLTF3cYufRdydIkCGQA3td3q1ue/wCna2ewe73xwGg5j6ZVJCZAtW4VCNZM+rcG+YxPUC0gmBH59+O0VSrZrkvSnifbr+K0dGwg4i17KwAukI4Ac7YMkQoeuAPXq38+ZftlRx4tq9xBUko6wpPY9zOaFzeagWYMF0n1UYqDt+/3XZI/mukPhJc9tzbWneqgkQBOx3OiDwrNglCHvEpnb+bZePIRLOnNHd1ShETgBqhsHGp9OAwwbAt4tO+HFpCQtVz7s2LWQFLbWiN0SCGzYUkFGCgoXae5H58lxFav8=,iv:UzaWlJ+M+VQx3CcPSGbFZh5/rGbKpS2Rq2XVZAIDFiQ=,tag:F3waoAMuEKTvN2xANReSww==,type:str]",
|
|
||||||
"sops": {
|
|
||||||
"kms": null,
|
|
||||||
"gcp_kms": null,
|
|
||||||
"azure_kv": null,
|
|
||||||
"hc_vault": null,
|
|
||||||
"age": [
|
|
||||||
{
|
|
||||||
"recipient": "age1lznyk4ee7e7x8n92cq2n87kz9920473ks5u9jlhd3dczfzq4wamqept56u",
|
|
||||||
"enc": "-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBpRGZSVHRSMGlyazAwQU5j\nd1o1L0Y1ckhQMkh4MVZiRmZlR2ozcmdsUW1vCk4xZ1ZibDBrUWZhYmxVVjBUczRn\nYlJtUWF3Y1lHWG56NkhmK2JOUHVGajQKLS0tIDN2S2doQURpTis2U3lWV0NxdWEz\ncjNZaEl1dEQwOXhsNE9xbHhYUzNTV3cKVmVIe05JwgXKSku7AJmrujYXrbBSbpBJ\nnqCuDIhok1w/fiff+XXn8udbgPVq5bC2SOhHbtVxImgBCFzrj5hQ0A==\n-----END AGE ENCRYPTED FILE-----\n"
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"recipient": "age1288993th0ge00reg4zqueyvmkrsvk829cs068eekjqfdprsrkeqql7mljk",
|
|
||||||
"enc": "-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSA4V3NaUEdvMmJvakQ0L1F0\nUnkvQ2F5dEVlZ2pMdlBZcjJac0tERnF5ZWljCmFrdU1NZ29jMkJ1a1ZLdURmVWI0\ncm1vNytFVzZjbVY2aVd2N3laMWNRNFEKLS0tIGgzOTFZY0lxc0JyVmd5cFBlNkRr\nVDBWc0t4c3pVV3RhSTB1UUVpNHd6NUkKNn6Sxb5oxP7iWqTF1+X9nOiYum3U+Rzk\nkryxVnf9EvQIVIFKDaTb+yAEO8otjqj+C4mHA9fannnNEJduOiPWOg==\n-----END AGE ENCRYPTED FILE-----\n"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"lastmodified": "2024-11-30T13:18:08Z",
|
|
||||||
"mac": "ENC[AES256_GCM,data:9R9RJzPMr9Bv8aeCDxhExTfbr+R2hjap6FGSk5QxBdbNpOcNS78ica0CLEmkAYVAfjmx/X2jC5ZnsAueSPUK7nAgNX2gJXbUTpY0F+oKt35GJziLrFLl3u/ahpF9lQ50EL9OqqgS+igDqtodJhKme5DXH5/GXQHhz++O3VZkR78=,iv:XgN3PiowiEosi2DmrjP82HhJMvnwaV530tsBE8GQfjs=,tag:U243BrtH7H/DU9LcjN/MMg==,type:str]",
|
|
||||||
"pgp": null,
|
|
||||||
"unencrypted_suffix": "_unencrypted",
|
|
||||||
"version": "3.9.1"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
@@ -1,24 +0,0 @@
|
|||||||
{
|
|
||||||
"data": "ENC[AES256_GCM,data:5AePh5uXcUseYBGWvlztgmg8mGBGy3ngKRa6+QxOaT0/fzSB1pKkaMtZJo76tV9wwjdL6/b6VVUI7GIaCBD5kgdZuA8RdBTXguHyjjdxAlI9xcrQaWWdATd8JJt+eQp/m2Y+0dioyXKaDV2ukI3GtHYjp/ixMoHHWEocnEEb40wG6c3CZcvsLWJvKTkFc2OvcjcU2RTfuNlYtEETidiD9iC/dtCakNQHmLP1UFYgcn0ebXBKmlqD6+x2o7BVT1SLwVCyGNvH3eKA2AWvddZChnhaNCUIXcRwBFCgS8lPs4iXhAhly+nwuj7ssFpuu3sjm5pq196tRS8WQl2iNUEJ2tzoOpceg1kZZ7KHX3wCbdBlCRqhy9Q4JMvWPDssO+zz2aU21+BDEySDTCnTYX9Hu2/iFvZejt++mKY=,iv:u/Ukye0BAj2ka++AA72W8WfXJAZZ/YJ3RC/aydxdoUc=,tag:ihTP5bCCigWEPcLFaYOhMA==,type:str]",
|
|
||||||
"sops": {
|
|
||||||
"kms": null,
|
|
||||||
"gcp_kms": null,
|
|
||||||
"azure_kv": null,
|
|
||||||
"hc_vault": null,
|
|
||||||
"age": [
|
|
||||||
{
|
|
||||||
"recipient": "age1lznyk4ee7e7x8n92cq2n87kz9920473ks5u9jlhd3dczfzq4wamqept56u",
|
|
||||||
"enc": "-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSB0VElDNHArZXlXa2JRQjd0\nQmVIbGpPWk43NDdiTkFtcEd1bDhRdXJWOUY0CndITHdKTFNJQXFOVFdyUGNtQ09k\nN2hnQmFYR0ZORWtxcUN0ZFhsM0U3N2cKLS0tIFh1TTBpMjFIZ2NYM1QxeDRjYlJx\nYkdrUDZmMUpGbjk3REJCVVRpeFk5Z28KJcia0Bk+3ZoifZnRLwqAko526ODPnkSS\nzymtOj/QYTA0++NP3B1aScIyhWITMEZX1iSoWDmgHj8ZQoNMdkM7AQ==\n-----END AGE ENCRYPTED FILE-----\n"
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"recipient": "age1288993th0ge00reg4zqueyvmkrsvk829cs068eekjqfdprsrkeqql7mljk",
|
|
||||||
"enc": "-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBZNlNHRWNEcUZGNXNBMDFR\nTzE5RnNMQUMvU1k2OS9XMlpvUktMRzQ5RmxvCnlCS3lzRVpGUHJLRGZ6SWZ2ZktR\na3l0TVN2NUlRVEQwRHByYkNEMDQyWUkKLS0tIEh3RjBWT3c5K2RWeDRjWFpsU1lP\ncStqY2xta3RSNkR6Vkt5YXhYUTZmbDgKvVKmZc8S/RwurJGsGiJ5LhM4waLO9B9k\n2cawxHmcYM3KfXDFwp9UZWhIwF7SRkG56ZE4OjGI3sOL+74ixnePxA==\n-----END AGE ENCRYPTED FILE-----\n"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"lastmodified": "2024-11-30T13:18:16Z",
|
|
||||||
"mac": "ENC[AES256_GCM,data:JwjbQ129cYCBNA5Fb8lN9rW7/y4wuVOqLeajIMcYyCzlBcjzCZAV1DKN5n75xMamb/hb1AUkmtp/K82PKM0Vg5X4/lpWTUZXZOzn/TrwHx+yqlJjL9mUdGuHnSY5DwME38Dde3UxdtUa0CVgQOxvMIycW27w8+8NNfO2zxGxkzc=,iv:ZMZASOsqXZOb0NkBqG3GGaqqKgQdjZLiku2yU5QonB8=,tag:/lb/HMxsYOV5XX/5kWnFHA==,type:str]",
|
|
||||||
"pgp": null,
|
|
||||||
"unencrypted_suffix": "_unencrypted",
|
|
||||||
"version": "3.9.1"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
@@ -1,24 +0,0 @@
|
|||||||
{
|
|
||||||
"data": "ENC[AES256_GCM,data:vqQ3HwSmuDlI4UwraLWvwkBSj9zTFeNEWI1xzhVrO/gpx8+WBZOt2F0J7/LSTGAWsWW/9Gov+XXXAOtfnKfjYVzizyT/jE8EQwMuItWiFEVA6hohgwtsk7YKJjXdJIxmiv+WKs73gWb0uFVGh1ArMzsVkGPj1W1AKMFAneDPgsfSCy9aVOMuF8zQwypFC8eaxqOQhLpiN2ncRm8e7khwGurSgYfHDgFghaDr8torgUrZTOPNFk+LEdxB3WcC17+4a8ZyuBapmYdRTrP73czTAuxOF8lMwddJhO99SF7nWuOYVF1FOKLGtK04oKci5/xRIzvWo3I0pGajkxtuF5CyWbd1KblcPfBALIU/J5hU/puGJ7M2sE/qsg/4kaTFxnhq32rPZj291jFb4evDdOhVodfC1axOQUbzAC0=,iv:yOeQ384ikqgDqfthl7GIVSIMNA/n0BYTSIqFN3T9MAY=,tag:Y6nhOCrkWx7MnVpEeKN0Jg==,type:str]",
|
|
||||||
"sops": {
|
|
||||||
"kms": null,
|
|
||||||
"gcp_kms": null,
|
|
||||||
"azure_kv": null,
|
|
||||||
"hc_vault": null,
|
|
||||||
"age": [
|
|
||||||
{
|
|
||||||
"recipient": "age1lznyk4ee7e7x8n92cq2n87kz9920473ks5u9jlhd3dczfzq4wamqept56u",
|
|
||||||
"enc": "-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBFTjRMWlNtYVQ2WnJEaGFN\nVFU2TXRTK2FHREpqREhOWHBKemxNc2U4WW44CnV4OWlBdXlFUWhJYi9jTTRuUWJV\nOWFPV2I4UytDRFo3blN3bUtFQ1NGU0kKLS0tIGp2VHlDc1JMMUdDUjlNNDFwUUxj\nVnhHbCtrNVNpZXo0K2dDVU5YTVJJUEkKk9mVTbzQVGZo3RKDLPDwtENknh+in1Q5\njf4DA1cGDDNzcEIWOOYyS+1mzT9WY8gU0hWqihX/bAx7CVsNUallZw==\n-----END AGE ENCRYPTED FILE-----\n"
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"recipient": "age1288993th0ge00reg4zqueyvmkrsvk829cs068eekjqfdprsrkeqql7mljk",
|
|
||||||
"enc": "-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBrVFNwUGpkOUhkUXFWWERq\nMVdueC9VSE9KbGZkenBVK3NRMjRNVXVmcVRRCjNLa0QzbWVCQks3ZmV3eFVjcEp0\nRmxDSlZIZU1IbEdnbE83WlkxV3VZV1EKLS0tICtsRXArajQ4Um9mNEV5OWZBdS85\nVGFSU2wwODZ3Zm44M3pWcTdDV1dxejQKM2BK5Axb1cF344ea89gkzCLzEX6j4amK\nzxf+boBK7JUX7F6QaPB0sRU8J4Cei9mALz96C8xNHjX00KcD3O2QOA==\n-----END AGE ENCRYPTED FILE-----\n"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"lastmodified": "2024-11-30T13:18:20Z",
|
|
||||||
"mac": "ENC[AES256_GCM,data:AllgcWxHnr3igPi/JbfJCbEa6hKtmILnAjiaMojRZNO4p6zYSoF0s8lo9XX05/vIrFUo+YaCtsuacv+kfz9f6vQafPn7Vulbh6PeH1VlAmzyVfJOTmHP3YX8ic3uM56A4+III1jOERCFOIcc/CKsnRLFhLCRQRMgtgT0hTl5aPw=,iv:60dOYhoUTu1HIHzY36eJeRZ66/v6JmRRpIW99W2D+CI=,tag:F7nLSFm933K5M+JE4IvNYw==,type:str]",
|
|
||||||
"pgp": null,
|
|
||||||
"unencrypted_suffix": "_unencrypted",
|
|
||||||
"version": "3.9.1"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
@@ -1,24 +0,0 @@
|
|||||||
{
|
|
||||||
"data": "ENC[AES256_GCM,data:YRdPrTLQH0xdWiIzOyjfEGpvfmuj6me6GzZZcauh9bUUywyA1ranDnWqbJYgawQQxIXsq9dhXD0uco+7mmXq2598kF1NI9jh6uLf3k0H494zZOalRBv/k8u9oJDLIiVAkg9eNNLbGX0PMZr/Yue/qdkuXx2Hg9E7bQJwpU/NXF+jKKs+3NmKT5NBlegwAzUs530D4DUoaq5AhvVvdC6a1UcE+KJzQ8pRiz1GjFIxAB7qX+GVwa3yNdLgo2tlAbOzjGtaDfJnhZIHSNEq+4TEhjlF9lCmFCGFDUVupvMOWs0kBywJEzIrDmxmvGHlPj3FfyytPb7qhlsOXDDDS67IoiwluKOnw+sALAG0Iv9LMrDZ3z8MXeEGvRWu0VDMuGXN905/9kGx/A40mPjcfnZvI+qSRIKjER5R8aU=,iv:qiP2Ml59AnK24MBbs7N/HqJIylf+fXGqJAo2N8iFNB0=,tag:0Dj5fVs6OB07kvV4qzuvfw==,type:str]",
|
|
||||||
"sops": {
|
|
||||||
"kms": null,
|
|
||||||
"gcp_kms": null,
|
|
||||||
"azure_kv": null,
|
|
||||||
"hc_vault": null,
|
|
||||||
"age": [
|
|
||||||
{
|
|
||||||
"recipient": "age1lznyk4ee7e7x8n92cq2n87kz9920473ks5u9jlhd3dczfzq4wamqept56u",
|
|
||||||
"enc": "-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBBUFlvNmRNYUlJSHZYUkpJ\nMEloQXFSdENIWGJVVDNIOVY5MS9SYWRoL0FrCnRJc05wZUZBSDRvMHNUUEhNRXQ4\nTWhYOUp6YUNGZFNWUFRrSmlJM1c4aWcKLS0tIFc1b3NlSEo2eFJhdDgwejRqcHlT\nZE5wN01uaE04cTlIbVJMVWQvQ1pXajgKQ1n6UmP7LEBsnIBXVc0BceOqvwCqQzBP\ncI8C5Io4ILgMjY4dr6sd0SeJG6mfDdiMA+k7c6jqoyZCW/Pkd3LANQ==\n-----END AGE ENCRYPTED FILE-----\n"
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"recipient": "age1288993th0ge00reg4zqueyvmkrsvk829cs068eekjqfdprsrkeqql7mljk",
|
|
||||||
"enc": "-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBtM2lyeXVzdE9nL1k5L3dC\nTkl2MjhMb1FKMFdCeXFPSmNST0pvOTRUaEVvCmdwMnhjSFFHVFhidmIySS9jMEJu\nNTJpRjdFOWpZZ3ZuZFJwZUUrRFU5NnMKLS0tIDJ1UjdVQkpMNm5Pd01JRnZNOEtr\nb1lpMlBkVHpiT2lYdWtZaUQrRW1HUDgKq/JVMf5gdu6lNEmqY6zU2SymbT+jklem\nnUQ9yieJGF+PanutNW6BCJH8jb/fH+Y6AeJ9S+kKCB4Yi75i4d+oHg==\n-----END AGE ENCRYPTED FILE-----\n"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"lastmodified": "2024-11-30T13:18:24Z",
|
|
||||||
"mac": "ENC[AES256_GCM,data:6FJTKEdIpCm+Dz7Ua8dZOMZQFaGU0oU/HRP6ly5mWbXCv81LRbZXRBd+5RDY3z9g9nb0PXZrOMNps63F6SKxK52VfzLIOap3UGeMNQn5P4/yyFj7JQHQ5Gjcf2l2z2VZ7NhUdNoSCV/6lwjValbKtids48Q5c3sFX997ZiqIUnY=,iv:nUeyJd/v8d9v7QsLLckziD9K5qjOZKK4vOQJw/ymi18=,tag:6n5EE3oklWdVcedvB2J/zA==,type:str]",
|
|
||||||
"pgp": null,
|
|
||||||
"unencrypted_suffix": "_unencrypted",
|
|
||||||
"version": "3.9.1"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
@@ -1,30 +0,0 @@
|
|||||||
ca_root_pw: ENC[AES256_GCM,data:jS5BHS9i/pOykus5aGsW+w==,iv:aQIU7uXnNKaeNXv1UjRpBoSYcRpHo8RjnvCaIw4yCqc=,tag:lkjGm5/Ve93nizqGDQ0ByA==,type:str]
|
|
||||||
sops:
|
|
||||||
kms: []
|
|
||||||
gcp_kms: []
|
|
||||||
azure_kv: []
|
|
||||||
hc_vault: []
|
|
||||||
age:
|
|
||||||
- recipient: age1lznyk4ee7e7x8n92cq2n87kz9920473ks5u9jlhd3dczfzq4wamqept56u
|
|
||||||
enc: |
|
|
||||||
-----BEGIN AGE ENCRYPTED FILE-----
|
|
||||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSA5anlORWxJalhRWkJPeGIy
|
|
||||||
OStyVG8vMFRTTEZOWHR3Q3N1UWJQbFlxV3pBCmVKQVM1SlJ2L0JOb3U3cTh3YkZ4
|
|
||||||
WHAxSUpTT1dyRHJHYVd1Qkh1ZWxwYW8KLS0tIEhXeklsSmlGaFlaaWF5L0Nodk5a
|
|
||||||
clZ4M3hFSlFqaEZ0UWREdHpTQ29GVUEKAxj5P05Ilpwis2oKFe54mJX+1LfTwfUv
|
|
||||||
2XRFOrEQbFNcK5WFu46p1mc/AAjKTeHWuvb2Yq43CO+sh1+kqKz0XA==
|
|
||||||
-----END AGE ENCRYPTED FILE-----
|
|
||||||
- recipient: age1288993th0ge00reg4zqueyvmkrsvk829cs068eekjqfdprsrkeqql7mljk
|
|
||||||
enc: |
|
|
||||||
-----BEGIN AGE ENCRYPTED FILE-----
|
|
||||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBaS0dqQ1p4MEE2d2JaeFRx
|
|
||||||
UnB4ejhrS3hLekpqeWJhcEJGdnpzMTZDelVRCmFjVGswd3VtRUloWG1WbWY5N0s3
|
|
||||||
cG9aV2hGU3lFZkkvcUJNWE1rWUIwMmMKLS0tIG1KdlhoQzREWDhPbXVSZVBUQkdE
|
|
||||||
N1hmcEwxWXBIWkQ3a3BrdGhvUFoxbzgKX6hLoz7o/Du6ymrYwmGDkXp2XT+0+7QE
|
|
||||||
YhD5qQzGLVQSh3XM/wWExj2Ue5/gw/NqNziHezOh2r9gQljbHjG2/g==
|
|
||||||
-----END AGE ENCRYPTED FILE-----
|
|
||||||
lastmodified: "2024-10-21T09:12:26Z"
|
|
||||||
mac: ENC[AES256_GCM,data:hfPRIXt/kZJa6lsj7rz+5xGlrWhR/LX895S2d8auP/4t3V//80YE/ofIsHeAY9M7eSFsW9ce2Vp0C/WiCQefVWNaNN7nVAwskCfQ6vTWzs23oYz4NYIeCtZggBG3uGgJxb7ZnAFUJWmLwCxkKTQyoVVnn8i/rUDIBrkilbeLWNI=,iv:lm1HVbWtAifHjqKP0D3sxRadsE9+82ugbA2x54yRBTo=,tag:averxmPLa131lJtFrNxcEA==,type:str]
|
|
||||||
pgp: []
|
|
||||||
unencrypted_suffix: _unencrypted
|
|
||||||
version: 3.9.1
|
|
||||||
@@ -1,25 +0,0 @@
|
|||||||
wg_private_key: ENC[AES256_GCM,data:DlC9txcLkTnb7FoEd249oJV/Ehcp50P8uulbE4rY/xU16fkTlnKvPmYZ7u8=,iv:IsiTzdrh+BNSVgx1mfjpMGNV2J0c88q6AoP0kHX2aGY=,tag:OqFsOIyE71SBD1mcNS/PeQ==,type:str]
|
|
||||||
sops:
|
|
||||||
age:
|
|
||||||
- recipient: age1lznyk4ee7e7x8n92cq2n87kz9920473ks5u9jlhd3dczfzq4wamqept56u
|
|
||||||
enc: |
|
|
||||||
-----BEGIN AGE ENCRYPTED FILE-----
|
|
||||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSAzdm9HTTN1amwxQ2Z6MUQv
|
|
||||||
dGJ0cEgyaHNOZWtWSWlXNXc5bGhUdSsvVlVzCkJkc3ZQdzlBNDNxb3Avdi96bXFt
|
|
||||||
TExZY29nUDI3RE5vanh6TVBRME1Fa1UKLS0tIG8vSHdCYzkvWmJpd0hNbnRtUmtk
|
|
||||||
aVcwaFJJclZ3YUlUTTNwR2VESmVyZWMKHvKUJBDuNCqacEcRlapetCXHKRb0Js09
|
|
||||||
sqxLfEDwiN2LQQjYHZOmnMfCOt/b2rwXVKEHdTcIsXbdIdKOJwuAIQ==
|
|
||||||
-----END AGE ENCRYPTED FILE-----
|
|
||||||
- recipient: age1gq8434ku0xekqmvnseeunv83e779cg03c06gwrusnymdsr3rpufqx6vr3m
|
|
||||||
enc: |
|
|
||||||
-----BEGIN AGE ENCRYPTED FILE-----
|
|
||||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBEeU01UTc2V1UyZXRadE5I
|
|
||||||
VE1aakVZUEZUNnJxbzJ1K3J1R3ZQdFdMbUhBCjZBMDM3ZkYvQWlyNHBtaDZRWkd4
|
|
||||||
VzY0L3l4N2RNZjJRTDJWZTZyZVhHbW8KLS0tIGVNZ0N0emVmaVRCV09jNmVKRlla
|
|
||||||
cWVSNkJqWHh5c21KcWFac2FlZTVaMTAK1UvfPgZAZYtwiONKIAo5HlaDpN+UT/S/
|
|
||||||
JfPUfjxgRQid8P20Eh/jUepxrDY8iXRZdsUMON+OoQ8mpwoAh5eN1A==
|
|
||||||
-----END AGE ENCRYPTED FILE-----
|
|
||||||
lastmodified: "2025-05-15T18:56:55Z"
|
|
||||||
mac: ENC[AES256_GCM,data:J2kHY7pXBJZ0UuNCZOhkU11M8rDqCYNzY71NyuDRmzzRCC9ZiNIbavyQAWj2Dpk1pjGsYjXsVoZvP7ti1wTFqahpaR/YWI5gmphrzAe32b9qFVEWTC3YTnmItnY0YxQZYehYghspBjnJtfUK0BvZxSb17egpoFnvHmAq+u5dyxg=,iv:/aLg02RLuJZ1bRzZfOD74pJuE7gppCBztQvUEt557mU=,tag:toxHHBuv3WRblyc9Sth6Iw==,type:str]
|
|
||||||
unencrypted_suffix: _unencrypted
|
|
||||||
version: 3.10.2
|
|
||||||
@@ -1,33 +0,0 @@
|
|||||||
default:
|
|
||||||
user: ENC[AES256_GCM,data:4Zzjm6/e8GCKSPNivnY=,iv:Y3gR+JSH/GLYvkVu3CN4T/chM5mjGjwVPI0iMB4p1t4=,tag:auyG8iWsd/YGjDnnTC21Ew==,type:str]
|
|
||||||
password: ENC[AES256_GCM,data:9cyM9U8VnzXBBA==,iv:YMHNNUoQ9Az5+81Df07tjC+LaEWPHV6frUjd4PZrQOs=,tag:3hKR+BhLJODJp19nn4ppkA==,type:str]
|
|
||||||
verify_ssl: ENC[AES256_GCM,data:Cu5Ucf0=,iv:QFfdV7gDBQ+L2kSZZqlVqCrn9CRg5RNG5DNTFWtVf5Y=,tag:u24ZbpWA65wj3WOwqU1v+g==,type:bool]
|
|
||||||
sops:
|
|
||||||
kms: []
|
|
||||||
gcp_kms: []
|
|
||||||
azure_kv: []
|
|
||||||
hc_vault: []
|
|
||||||
age:
|
|
||||||
- recipient: age1lznyk4ee7e7x8n92cq2n87kz9920473ks5u9jlhd3dczfzq4wamqept56u
|
|
||||||
enc: |
|
|
||||||
-----BEGIN AGE ENCRYPTED FILE-----
|
|
||||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBuUXdMMG5YaHRJbThQZW9u
|
|
||||||
RHVBbXFiSHNiUWdLTDdPajIyQjN3OGR0dGpzCm9ZVkdNWjhBakU3dVdhRU9kbU81
|
|
||||||
aDlCNzJBQ1hvQ3FnTUk2N2RWQkZpUUEKLS0tIEZacTNqa3FWc2p1NXVtRWhwVExj
|
|
||||||
cUJtYXNjb2Z4QkF4MjlidEZxSUFNa3MKAGHGksPc9oJheSlUQ3ARK5MuR5NFbPmD
|
|
||||||
kmSDSgRmzbarxT8eJnK8/K4ii3hX5E9vGOohUkyc03w4ENsh/dw43g==
|
|
||||||
-----END AGE ENCRYPTED FILE-----
|
|
||||||
- recipient: age1vpns76ykll8jgdlu3h05cur4ew2t3k7u03kxdg8y6ypfhsfhq9fqyurjey
|
|
||||||
enc: |
|
|
||||||
-----BEGIN AGE ENCRYPTED FILE-----
|
|
||||||
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBOVGhvdGE5Mzl0ckhBM21D
|
|
||||||
RXJwb09OS25PMGViblViM21wTVZiZWhtWmhFCnAzL1NqeUVyOGZFVDFvdXFPbklQ
|
|
||||||
ZkJPWDVIdUdCdjZGUjcrcmtvak5CWG8KLS0tIDhLUHJNN2VqNy9CdVh0K0N0b0k1
|
|
||||||
RUE4U0E0aGxiRkF0NWdwSEIrQTU4MjgKeOU6bIWO6ke9YcG+1E3brnC21sSQxZ9b
|
|
||||||
SiG2QEnFnTeJ5P50XQoYHqUY3B0qx7nDLvyzatYEi6sDkfLXhmHGbw==
|
|
||||||
-----END AGE ENCRYPTED FILE-----
|
|
||||||
lastmodified: "2024-12-03T16:25:12Z"
|
|
||||||
mac: ENC[AES256_GCM,data:gemq8YpMZQC+gY7lmMM3tfZh9XxL40qdGlLiB2CD4SIG49w0V6E/vY7xygt0WW0zHbhMI9yUIqlRc/PaXn+QfyxJEr3IjaT05rrWUqQAeRP9Zss74Y3NtQehh8fM8SgeyU4j2CQ9f9B/lW9IgdOW/TNgQZVXGg1vXZPEzl7AZ4A=,iv:LG5ojv3hAqk+EvFa/xEn43MBqL457uKFDE3dG5lSgZo=,tag:AxzcUzmdhO411Sw7Vg1itA==,type:str]
|
|
||||||
pgp: []
|
|
||||||
unencrypted_suffix: _unencrypted
|
|
||||||
version: 3.9.1
|
|
||||||
@@ -1,19 +0,0 @@
{
    "data": "ENC[AES256_GCM,data:P84qHFU+xQjwQGK8I1gIdcBsHrskuUg0M1nGMMaA+hFjAdFYUhdhmAN/+y0CO28=,iv:zJtk01zNMTBDQdVtZBTM34CHRaNYDkabolxh7PWGKUI=,tag:8AS80AbZJbh9B3Av3zuI1w==,type:str]",
    "sops": {
        "age": [
            {
                "recipient": "age1lznyk4ee7e7x8n92cq2n87kz9920473ks5u9jlhd3dczfzq4wamqept56u",
                "enc": "-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBkRFB6QTIyWWdwVkV4ZXNB\nWkdSdEhMc0s4cnByWVZXTGhnSWZ0MTdEUWhJCnFlOFQ5TU1hcE91azVyZXVXRCtu\nZjIxalRLYlEreGZ6ZDNoeXNPaFN4b28KLS0tIHY5WVFXN1k4NFVmUjh6VURkcEpv\ncklGcWVhdTdBRnlOdm1qM2h5SS9UUkEKq2RyxSVymDqcsZ+yiNRujDCwk1WOWYRW\nDa4TRKg3FCe7TcCEPkIaev1aBqjLg9J9c/70SYpUm6Zgeps7v5yl3A==\n-----END AGE ENCRYPTED FILE-----\n"
            },
            {
                "recipient": "age1w029fksjv0edrff9p7s03tgk3axecdkppqymfpwfn2nu2gsqqefqc37sxq",
                "enc": "-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSArTGVuckp2NlhMZXRNMVhO\naUV3K0h3cmZ5ZGx4Q3dJWHNqZXFJeE1kM0dFCmF4TUFUMm9mTHJlYzlYWVhNa1RH\nR29VNDIrL1IvYUpQYm5SZEYzbWhhbkkKLS0tIEJsK1dwZVdaaHpWQkpOOS90dkhx\nbGhvRXhqdFdqQmhZZmhCdmw4NUtSVG8K3z2do+/cIjAqg6EMJnubOWid1sMeTxvo\nrq6eGJ7YzdgZr2JBVtJdDRtk/KeHXu9In4efbBXwLAPIfn1pU0gm1w==\n-----END AGE ENCRYPTED FILE-----\n"
            }
        ],
        "lastmodified": "2025-08-21T19:08:48Z",
        "mac": "ENC[AES256_GCM,data:5CkO09NIqttb4UZPB9iGym8avhTsMeUkTFTKZJlNGjgB1qWyGQNeKCa50A1+SbBCCWE5EwxoynB1so7bi8vnq7k8CPUHbiWG8rLOJSYHQcZ9Tu7ZGtpeWPcCw1zPWJ/PTBsFVeaT5/ufdx/6ut+sTtRoKHOZZtO9oStHmu/Rlfg=,iv:z9iJJlbvhgxJaART5QoCrqvrqlgoVlGj8jlndCALmKU=,tag:ldjmND4NVVQrHUldLrB4Jg==,type:str]",
        "unencrypted_suffix": "_unencrypted",
        "version": "3.10.2"
    }
}
@@ -1,19 +0,0 @@
{
    "data": "ENC[AES256_GCM,data:MQkR6FQGHK2AuhOmy2was49RY2XlLO5NwaXnUFzFo5Ata/2ufVoAj4Jvotw/dSrKL7f62A6s+2BPAyWrvACJ+pwYFlfyj3T9bNwhxwZPkEmiHEubJjWSiD6jkSW0gOxbY8ib6g/GbyF8I1cPeYr/hJD5qQ==,iv:eBL2Y3MOt9gYTETUZqsHo1D5hPOHxb4JR6Z/DFlzzqI=,tag:Qqbt39xZvQz/QhsggsArsw==,type:str]",
    "sops": {
        "age": [
            {
                "recipient": "age1lznyk4ee7e7x8n92cq2n87kz9920473ks5u9jlhd3dczfzq4wamqept56u",
                "enc": "-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSAwZzFXaEsyUkZGNFV0bVlW\nRkpPRHpUK2VwUHpOQXZCUUpoVzFGa3hycnhvCndTN0toVFdoU2E5N3V3UFhTTjU0\nNDByWTkrV0o3T295dE0zS08rVGpyQjAKLS0tIC96M0VEcWpjRk5DMjJnMFB4ZHI3\nM2Jod2x4ZzMyZm1pbDhZNTFuWGNRUlEKHs5jBSfjml09JOeKiT9vFR0Fykg6OxKG\njhFU/J2+fWB22G7dBc4PI60SNqhxIheUbGTdcz4Yp4BPL6vW3eArIw==\n-----END AGE ENCRYPTED FILE-----\n"
            },
            {
                "recipient": "age1w029fksjv0edrff9p7s03tgk3axecdkppqymfpwfn2nu2gsqqefqc37sxq",
                "enc": "-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBJT3lxamcrQUpFdjZteFlF\nYUQ3aGdadGpuNXd2Z3RtZ3dQU0cvMlFUMUNRClBDR3U0OXZJU0NDamVMSlR5NitN\nYlhvNVlvUE0wRjErYzkwVHFOdGVCVjgKLS0tIEttR1BLTGpDYTRSQ0lUZmVEcnNi\nWkNaMEViUHVBcExVOEpjNE5CZHpjVkEKuX/Rf8kaB3apr1UhAnq3swS6fXiVmwm8\n7Key+SUAPNstbWbz0u6B9m1ev5QcXB2lx2/+Cm7cjW+6VE2gLHjTsQ==\n-----END AGE ENCRYPTED FILE-----\n"
            }
        ],
        "lastmodified": "2025-01-24T12:19:16Z",
        "mac": "ENC[AES256_GCM,data:X8X91LVP1MMJ8ZYeSNPRO6XHN+NuswLZcHpAkbvoY+E9aTteO8UqS+fsStbNDlpF5jz/mhdMsKElnU8Z/CIWImwolI4GGE6blKy6gyqRkn4VeZotUoXcJadYV/5COud3XP2uSTb694JyQEZnBXFNeYeiHpN0y38zLxoX8kXHFbc=,iv:fFCRfv+Y1Nt2zgJNKsxElrYcuKkATJ3A/jvheUY2IK4=,tag:hYojbMGUAQvx7I4qkO7o9w==,type:str]",
        "unencrypted_suffix": "_unencrypted",
        "version": "3.9.3"
    }
}
@@ -1,109 +0,0 @@
root_password_hash: ENC[AES256_GCM,data:wk/xEuf+qU3ezmondq9y3OIotXPI/L+TOErTjgJz58wEvQkApYkjc3bHaUTzOrmWjQBgDUENObzPmvQ8WKawUSJRVlpfOEr5TQ==,iv:I8Z3xJz3qoXBD7igx087A1fMwf8d29hQ4JEI3imRXdY=,tag:M80osQeWGG9AAA8BrMfhHA==,type:str]
ns_xfer_key: ENC[AES256_GCM,data:VFpK7GChgFeUgQm31tTvVC888bN0yt6BAnHQa6KUTg4iZGP1WL5Bx6Zp8dY=,iv:9RF1eEc7JBxBebDOKfcDjGS2U7XsHkOW/l52yIP+1LA=,tag:L6DR2QlHOfo02kzfWWCrvg==,type:str]
backup_helper_secret: ENC[AES256_GCM,data:EvXEJnDilbfALQ==,iv:Q3dkZ8Ee3qbcjcoi5GxfbaVB4uRIvkIB6ioKVV/dL2Y=,tag:T/UgZvQgYGa740Wh7D0b7Q==,type:str]
nats_nkey: ENC[AES256_GCM,data:N2CVXjdwiE7eSPUtXe+NeKSTzA9eFwK2igxaCdYsXd4Ps0/DjYb/ggnQziQzSy8viESZYjXhJ2VtNw==,iv:Xhcf5wPB01Wu0A+oMw0wzTEHATp+uN+wsaYshxIzy1w=,tag:IauTIOHqfiM75Ufml/JXbg==,type:str]
sops:
    age:
        - recipient: age1lznyk4ee7e7x8n92cq2n87kz9920473ks5u9jlhd3dczfzq4wamqept56u
          enc: |
            -----BEGIN AGE ENCRYPTED FILE-----
            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBuWXhzQWFmeCt1R05jREcz
            Ui9HZFN5dkxHNVE0RVJGZUJUa3hKK2sxdkhBCktYcGpLeGZIQzZIV3ZZWGs3YzF1
            T09sUEhPWkRkOWZFWkltQXBlM1lQV1UKLS0tIERRSlRUYW5QeW9TVjJFSmorOWNI
            ZytmaEhzMjVhRXI1S0hielF0NlBrMmcK4I1PtSf7tSvSIJxWBjTnfBCO8GEFHbuZ
            BkZskr5fRnWUIs72ZOGoTAVSO5ZNiBglOZ8YChl4Vz1U7bvdOCt0bw==
            -----END AGE ENCRYPTED FILE-----
        - recipient: age1hz2lz4k050ru3shrk5j3zk3f8azxmrp54pktw5a7nzjml4saudesx6jsl0
          enc: |
            -----BEGIN AGE ENCRYPTED FILE-----
            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBQcXM0RHlGcmZrYW4yNGZs
            S1ZqQzVaYmQ4MGhGaTFMUVIwOTk5K0tZZjB3ClN0QkhVeHRrNXZHdmZWMzFBRnJ6
            WTFtaWZyRmx2TitkOXkrVkFiYVd3RncKLS0tIExpeGUvY1VpODNDL2NCaUhtZkp0
            cGNVZTI3UGxlNWdFWVZMd3FlS3pDR3cKBulaMeonV++pArXOg3ilgKnW/51IyT6Z
            vH9HOJUix+ryEwDIcjv4aWx9pYDHthPFZUDC25kLYG91WrJFQOo2oA==
            -----END AGE ENCRYPTED FILE-----
        - recipient: age1w2q4gm2lrcgdzscq8du3ssyvk6qtzm4fcszc92z9ftclq23yyydqdga5um
          enc: |
            -----BEGIN AGE ENCRYPTED FILE-----
            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBabTdsZWxZQjV2TGx2YjNM
            ZTgzWktqTjY0S0M3bFpNZXlDRDk5TSt3V2k0CjdWWTN0TlRlK1RpUm9xYW03MFFG
            aWN4a3o4VUVnYzBDd2FrelUraWtrMTAKLS0tIE1vTGpKYkhzcWErWDRreml2QmE2
            ZkNIWERKb1drdVR6MTBSTnVmdm51VEkKVNDYdyBSrUT7dUn6a4eF7ELQ2B2Pk6V9
            Z5fbT75ibuyX1JO315/gl2P/FhxmlRW1K6e+04gQe2R/t/3H11Q7YQ==
            -----END AGE ENCRYPTED FILE-----
        - recipient: age1d2w5zece9647qwyq4vas9qyqegg96xwmg6c86440a6eg4uj6dd2qrq0w3l
          enc: |
            -----BEGIN AGE ENCRYPTED FILE-----
            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBVSFhDOFRVbnZWbVlQaG5G
            U0NWekU0NzI1SlpRN0NVS1hPN210MXY3Z244CmtFemR5OUpzdlBzMHBUV3g0SFFo
            eUtqNThXZDJ2b01yVVVuOFdwQVo2Qm8KLS0tIHpXRWd3OEpPRkpaVDNDTEJLMWEv
            ZlZtaFpBdzF0YXFmdjNkNUR3YkxBZU0KAub+HF/OBZQR9bx/SVadZcL6Ms+NQ7yq
            21HCcDTWyWHbN4ymUrIYXci1A/0tTOrQL9Mkvaz7IJh4VdHLPZrwwA==
            -----END AGE ENCRYPTED FILE-----
        - recipient: age1gq8434ku0xekqmvnseeunv83e779cg03c06gwrusnymdsr3rpufqx6vr3m
          enc: |
            -----BEGIN AGE ENCRYPTED FILE-----
            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBBWkhBL1NTdjFDeEhQcEgv
            Z3c3Z213L2ZhWGo0Qm5Zd1A1RTBDY3plUkh3CkNWV2ZtNWkrUjB0eWFzUlVtbHlk
            WTdTQjN4eDIzY0c0dyt6ajVXZ0krd1UKLS0tIHB4aEJqTTRMenV3UkFkTGEySjQ2
            YVM1a3ZPdUU4T244UU0rc3hVQ3NYczQK10wug4kTjsvv/iOPWi5WrVZMOYUq4/Mf
            oXS4sikXeUsqH1T2LUBjVnUieSneQVn7puYZlN+cpDQ0XdK/RZ+91A==
            -----END AGE ENCRYPTED FILE-----
        - recipient: age1288993th0ge00reg4zqueyvmkrsvk829cs068eekjqfdprsrkeqql7mljk
          enc: |
            -----BEGIN AGE ENCRYPTED FILE-----
            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBYcEtHbjNWRkdodUxYdHRn
            MDBMU08zWDlKa0Z4cHJvc28rZk5pUjhnMjE0CmdzRmVGWDlYQ052Wm1zWnlYSFV6
            dURQK3JSbThxQlg3M2ZaL1hGRzVuL0UKLS0tIEI3UGZvbEpvRS9aR2J2Tnc1YmxZ
            aUY5Q2MrdHNQWDJNaGt5MWx6MVRrRVEKRPxyAekGHFMKs0Z6spVDayBA4EtPk18e
            jiFc97BGVtC5IoSu4icq3ZpKOdxymnkqKEt0YP/p/JTC+8MKvTJFQw==
            -----END AGE ENCRYPTED FILE-----
        - recipient: age1vpns76ykll8jgdlu3h05cur4ew2t3k7u03kxdg8y6ypfhsfhq9fqyurjey
          enc: |
            -----BEGIN AGE ENCRYPTED FILE-----
            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBQL3ZMUkI1dUV1T2tTSHhn
            SjhyQ3dKTytoaDBNcit1VHpwVGUzWVNpdjBnCklYZWtBYzBpcGxZSDBvM2tIZm9H
            bTFjb1ZCaDkrOU1JODVBVTBTbmxFbmcKLS0tIGtGcS9kejZPZlhHRXI5QnI5Wm9Q
            VjMxTDdWZEltWThKVDl0S24yWHJxZHcKgzH79zT2I7ZgyTbbbvIhLN/rEcfiomJH
            oSZDFvPiXlhPgy8bRyyq3l47CVpWbUI2Y7DFXRuODpLUirt3K3TmCA==
            -----END AGE ENCRYPTED FILE-----
        - recipient: age1hchvlf3apn8g8jq2743pw53sd6v6ay6xu6lqk0qufrjeccan9vzsc7hdfq
          enc: |
            -----BEGIN AGE ENCRYPTED FILE-----
            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBPcm9zUm1XUkpLWm1Jb3Uw
            RncveGozOW5SRThEM1Y4SFF5RDdxUEhZTUE4CjVESHE5R3JZK0krOXZDL0RHR0oy
            Z3JKaEpydjRjeFFHck1ic2JTRU5yZTQKLS0tIGY2ck56eG95YnpDYlNqUDh5RVp1
            U3dRYkNleUtsQU1LMWpDbitJbnRIem8K+27HRtZihG8+k7ZC33XVfuXDFjC1e8lA
            kffmxp9kOEShZF3IKmAjVHFBiPXRyGk3fGPyQLmSMK2UOOfCy/a/qA==
            -----END AGE ENCRYPTED FILE-----
        - recipient: age1w029fksjv0edrff9p7s03tgk3axecdkppqymfpwfn2nu2gsqqefqc37sxq
          enc: |
            -----BEGIN AGE ENCRYPTED FILE-----
            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBTZHlldDdSOEhjTklCSXQr
            U2pXajFwZnNqQzZOTzY5b3lkMzlyREhXRWo4CmxId2F6NkNqeHNCSWNrcUJIY0Nw
            cGF6NXJaQnovK1FYSXQ2TkJSTFloTUEKLS0tIHRhWk5aZ0lDVkZaZEJobm9FTDNw
            a29sZE1GL2ZQSk0vUEc1ZGhkUlpNRkEK9tfe7cNOznSKgxshd5Z6TQiNKp+XW6XH
            VvPgMqMitgiDYnUPj10bYo3kqhd0xZH2IhLXMnZnqqQ0I23zfPiNaw==
            -----END AGE ENCRYPTED FILE-----
        - recipient: age1ha34qeksr4jeaecevqvv2afqem67eja2mvawlmrqsudch0e7fe7qtpsekv
          enc: |
            -----BEGIN AGE ENCRYPTED FILE-----
            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSB5bk9NVjJNWmMxUGd3cXRx
            amZ5SWJ3dHpHcnM4UHJxdmh6NnhFVmJQdldzCm95dHN3R21qSkE4Vm9VTnVPREp3
            dUQyS1B4MWhhdmd3dk5LQ0htZEtpTWMKLS0tIGFaa3MxVExFYk1MY2loOFBvWm1o
            L0NoRStkeW9VZVdpWlhteC8yTnRmMUkKMYjUdE1rGgVR29FnhJ5OEVjTB1Rh5Mtu
            M/DvlhW3a7tZU8nDF3IgG2GE5xOXZMDO9QWGdB8zO2RJZAr3Q+YIlA==
            -----END AGE ENCRYPTED FILE-----
        - recipient: age1cxt8kwqzx35yuldazcc49q88qvgy9ajkz30xu0h37uw3ts97jagqgmn2ga
          enc: |
            -----BEGIN AGE ENCRYPTED FILE-----
            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBBU0xYMnhqOE0wdXdleStF
            THcrY2NBQzNoRHdYTXY3ZmM5YXRZZkQ4aUZnCm9ad0IxSWxYT1JBd2RseUdVT1pi
            UXBuNzFxVlN0OWNTQU5BV2NiVEV0RUUKLS0tIGJHY0dzSDczUzcrV0RpTjE0czEy
            cWZMNUNlTzBRcEV5MjlRV1BsWGhoaUUKGhYaH8I0oPCfrbs7HbQKVOF/99rg3HXv
            RRTXUI71/ejKIuxehOvifClQc3nUW73bWkASFQ0guUvO4R+c0xOgUg==
            -----END AGE ENCRYPTED FILE-----
    lastmodified: "2025-02-11T21:18:22Z"
    mac: ENC[AES256_GCM,data:5//boMp1awc/2XAkSASSCuobpkxa0E6IKf3GR8xHpMoCD30FJsCwV7PgX3fR8OuLEhOJ7UguqMNQdNqG37RMacreuDmI1J8oCFKp+3M2j4kCbXaEo8bw7WAtyjUez+SAXKzZWYmBibH0KOy6jdt+v0fdgy5hMBT4IFDofYRsyD0=,iv:6pD+SLwncpmal/FR4U8It2njvaQfUzzpALBCxa0NyME=,tag:4QN8ZFjdqck5ZgulF+FtbA==,type:str]
    unencrypted_suffix: _unencrypted
    version: 3.9.4
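Aside: every ENC[AES256_GCM,...] value above stays encrypted in git; a host only recovers plaintext if its age key is among the recipients. A minimal sketch of how one of these keys could be consumed with sops-nix, assuming a hypothetical sopsFile path and service user (only the secret name nats_nkey comes from the file above):

{ ... }:
{
  # Hypothetical consumer module: sops-nix decrypts the key at activation
  # time and places the plaintext under /run/secrets/nats_nkey.
  sops.secrets."nats_nkey" = {
    sopsFile = ../../secrets/ns1/secrets.yaml; # assumed path to the file above
    owner = "nats";                            # assumed service user
  };
}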
@@ -1,57 +0,0 @@
{ pkgs, config, ... }:
{
  vault.secrets.actions-token = {
    secretPath = "hosts/nix-cache01/actions-token";
    extractKey = "token";
    outputDir = "/run/secrets/actions-token-1";
    services = [ "gitea-runner-actions1" ];
  };

  virtualisation.podman = {
    enable = true;
    dockerCompat = true;
  };

  services.gitea-actions-runner.instances = {
    actions1 = {
      enable = true;
      tokenFile = "/run/secrets/actions-token-1";
      name = "actions1.home.2rjus.net";
      settings = {
        log = {
          level = "debug";
        };

        runner = {
          file = ".runner";
          capacity = 4;
          timeout = "2h";
          shutdown_timeout = "10m";
          insecure = false;
          fetch_timeout = "10s";
          fetch_interval = "30s";
        };

        cache = {
          enabled = true;
          dir = "/var/cache/gitea-actions1";
        };

        container = {
          privileged = false;
        };
      };
      labels =
        builtins.map (n: "${n}:docker://gitea/runner-images:${n}") [
          "ubuntu-latest"
          "ubuntu-latest-slim"
          "ubuntu-latest-full"
        ]
        ++ [
          "homelab"
        ];

      url = "https://git.t-juice.club";
    };
  };
}
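For reference, the labels expression in the file above maps each tag to a docker:// runner image of the same name and appends a bare homelab label. A minimal sketch of what it evaluates to, shown for a single tag (illustrative, not part of the original file):

# Evaluated in isolation, e.g. in `nix repl`:
builtins.map (n: "${n}:docker://gitea/runner-images:${n}") [ "ubuntu-latest" ] ++ [ "homelab" ]
# => [ "ubuntu-latest:docker://gitea/runner-images:ubuntu-latest" "homelab" ]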
@@ -1,169 +0,0 @@
{ pkgs, unstable, ... }:
{
  homelab.monitoring.scrapeTargets = [{
    job_name = "step-ca";
    port = 9000;
  }];
  sops.secrets."ca_root_pw" = {
    sopsFile = ../../secrets/ca/secrets.yaml;
    owner = "step-ca";
    path = "/var/lib/step-ca/secrets/ca_root_pw";
  };
  sops.secrets."intermediate_ca_key" = {
    sopsFile = ../../secrets/ca/keys/intermediate_ca_key;
    format = "binary";
    owner = "step-ca";
    path = "/var/lib/step-ca/secrets/intermediate_ca_key";
  };
  sops.secrets."root_ca_key" = {
    sopsFile = ../../secrets/ca/keys/root_ca_key;
    format = "binary";
    owner = "step-ca";
    path = "/var/lib/step-ca/secrets/root_ca_key";
  };
  sops.secrets."ssh_host_ca_key" = {
    sopsFile = ../../secrets/ca/keys/ssh_host_ca_key;
    format = "binary";
    owner = "step-ca";
    path = "/var/lib/step-ca/secrets/ssh_host_ca_key";
  };
  sops.secrets."ssh_user_ca_key" = {
    sopsFile = ../../secrets/ca/keys/ssh_user_ca_key;
    format = "binary";
    owner = "step-ca";
    path = "/var/lib/step-ca/secrets/ssh_user_ca_key";
  };

  services.step-ca = {
    enable = true;
    package = pkgs.step-ca;
    intermediatePasswordFile = "/var/lib/step-ca/secrets/ca_root_pw";
    address = "0.0.0.0";
    port = 443;
    settings = {
      metricsAddress = ":9000";
      authority = {
        provisioners = [
          {
            claims = {
              enableSSHCA = true;
              maxTLSCertDuration = "3600h";
              defaultTLSCertDuration = "48h";
            };
            encryptedKey = "eyJhbGciOiJQQkVTMi1IUzI1NitBMTI4S1ciLCJjdHkiOiJqd2sranNvbiIsImVuYyI6IkEyNTZHQ00iLCJwMmMiOjYwMDAwMCwicDJzIjoiY1lWOFJPb3lteXFLMWpzcS1WM1ZXQSJ9.WS8tPK-Q4gtnSsw7MhpTzYT_oi-SQx-CsRLh7KwdZnpACtd4YbcOYg.zeyDkmKRx8BIp-eB.OQ8c-KDW07gqJFtEMqHacRBkttrbJRRz0sYR47vQWDCoWhodaXsxM_Bj2pGvUrR26ij1t7irDeypnJoh6WXvUg3n_JaIUL4HgTwKSBrXZKTscXmY7YVmRMionhAb6oS9Jgus9K4QcFDHacC9_WgtGI7dnu3m0G7c-9Ur9dcDfROfyrnAByJp1rSZMzvriQr4t9bNYjDa8E8yu9zq6aAQqF0Xg_AxwiqYqesT-sdcfrxKS61appApRgPlAhW-uuzyY0wlWtsiyLaGlWM7WMfKdHsq-VqcVrI7Gi2i77vi7OqPEberqSt8D04tIri9S_sArKqWEDnBJsL07CC41IY.CqtYfbSa_wlmIsKgNj5u7g";
            key = {
              alg = "ES256";
              crv = "P-256";
              kid = "CIjtIe7FNhsNQe1qKGD9Rpj-lrf2ExyTYCXAOd3YDjE";
              kty = "EC";
              use = "sig";
              x = "XRMX-BeobZ-R5-xb-E9YlaRjJUfd7JQxpscaF1NMgFo";
              y = "bF9xLp5-jywRD-MugMaOGbpbniPituWSLMlXRJnUUl0";
            };
            name = "ca@home.2rjus.net";
            type = "JWK";
          }
          {
            name = "acme";
            type = "ACME";
            claims = {
              maxTLSCertDuration = "3600h";
              defaultTLSCertDuration = "1800h";
            };
          }
          {
            claims = {
              enableSSHCA = true;
            };
            name = "sshpop";
            type = "SSHPOP";
          }
        ];
      };
      crt = "/var/lib/step-ca/certs/intermediate_ca.crt";
      db = {
        badgerFileLoadingMode = "";
        dataSource = "/var/lib/step-ca/db";
        type = "badgerv2";
      };
      dnsNames = [
        "ca.home.2rjus.net"
        "10.69.13.12"
      ];
      federatedRoots = null;
      insecureAddress = "";
      key = "/var/lib/step-ca/secrets/intermediate_ca_key";
      logger = {
        format = "text";
      };
      root = "/var/lib/step-ca/certs/root_ca.crt";
      ssh = {
        hostKey = "/var/lib/step-ca/secrets/ssh_host_ca_key";
        userKey = "/var/lib/step-ca/secrets/ssh_user_ca_key";
      };
      templates = {
        ssh = {
          host = [
            {
              comment = "#";
              name = "sshd_config.tpl";
              path = "/etc/ssh/sshd_config";
              requires = [
                "Certificate"
                "Key"
              ];
              template = ./templates/ssh/sshd_config.tpl;
              type = "snippet";
            }
            {
              comment = "#";
              name = "ca.tpl";
              path = "/etc/ssh/ca.pub";
              template = ./templates/ssh/ca.tpl;
              type = "snippet";
            }
          ];
          user = [
            {
              comment = "#";
              name = "config.tpl";
              path = "~/.ssh/config";
              template = ./templates/ssh/config.tpl;
              type = "snippet";
            }
            {
              comment = "#";
              name = "step_includes.tpl";
              path = "\${STEPPATH}/ssh/includes";
              template = ./templates/ssh/step_includes.tpl;
              type = "prepend-line";
            }
            {
              comment = "#";
              name = "step_config.tpl";
              path = "ssh/config";
              template = ./templates/ssh/step_config.tpl;
              type = "file";
            }
            {
              comment = "#";
              name = "known_hosts.tpl";
              path = "ssh/known_hosts";
              template = ./templates/ssh/known_hosts.tpl;
              type = "file";
            }
          ];
        };
      };
      tls = {
        cipherSuites = [
          "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256"
          "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256"
        ];
        maxVersion = 1.3;
        minVersion = 1.2;
        renegotiation = false;
      };
    };
  };
}
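Since the file above defines an ACME provisioner named acme on ca.home.2rjus.net, internal hosts could point the stock NixOS ACME machinery at it. A minimal sketch, assuming step-ca's conventional directory URL layout (https://<ca>/acme/<provisioner>/directory) and a hypothetical contact address:

{ ... }:
{
  security.acme = {
    acceptTerms = true;
    defaults = {
      email = "admin@home.2rjus.net"; # assumed contact address
      # Provisioner "acme" from the step-ca config above.
      server = "https://ca.home.2rjus.net/acme/acme/directory";
    };
  };
}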
Binary file not shown.
@@ -1,14 +0,0 @@
Host *
{{- if or .User.GOOS "none" | eq "windows" }}
{{- if .User.StepBasePath }}
    Include "{{ .User.StepBasePath | replace "\\" "/" | trimPrefix "C:" }}/ssh/includes"
{{- else }}
    Include "{{ .User.StepPath | replace "\\" "/" | trimPrefix "C:" }}/ssh/includes"
{{- end }}
{{- else }}
{{- if .User.StepBasePath }}
    Include "{{.User.StepBasePath}}/ssh/includes"
{{- else }}
    Include "{{.User.StepPath}}/ssh/includes"
{{- end }}
{{- end }}
Some files were not shown because too many files have changed in this diff.