Compare commits
11 Commits
2f195d26d3...homelab-de
| SHA1 |
|---|
| 26ca6817f0 |
| b03a9b3b64 |
| f805b9f629 |
| f3adf7e77f |
| f6eca9decc |
| 6e93b8eae3 |
| c214f8543c |
| 7933127d77 |
| 13c3897e86 |
| 0643f23281 |
| ad8570f8db |
11
.mcp.json
@@ -22,6 +22,17 @@
        "ALERTMANAGER_URL": "https://alertmanager.home.2rjus.net",
        "LOKI_URL": "http://monitoring01.home.2rjus.net:3100"
      }
    },
    "homelab-deploy": {
      "command": "nix",
      "args": [
        "run",
        "git+https://git.t-juice.club/torjus/homelab-deploy",
        "--",
        "mcp",
        "--nats-url", "nats://nats1.home.2rjus.net:4222",
        "--nkey-file", "/home/torjus/.config/homelab-deploy/test-deployer.nkey"
      ]
    }
  }
}
45
CLAUDE.md
@@ -194,6 +194,51 @@ node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
node_filesystem_avail_bytes{mountpoint="/"}
```

### Deploying to Test Hosts

The **homelab-deploy** MCP server enables remote deployments to test-tier hosts via NATS messaging.

**Available Tools:**

- `deploy` - Deploy NixOS configuration to test-tier hosts
- `list_hosts` - List available deployment targets

**Deploy Parameters:**

- `hostname` - Target a specific host (e.g., `vaulttest01`)
- `role` - Deploy to all hosts with a specific role (e.g., `vault`)
- `all` - Deploy to all test-tier hosts
- `action` - nixos-rebuild action: `switch` (default), `boot`, `test`, `dry-activate`
- `branch` - Git branch or commit to deploy (default: `master`)

**Examples:**

```
# List available hosts
list_hosts()

# Deploy to a specific host
deploy(hostname="vaulttest01", action="switch")

# Dry-run deployment
deploy(hostname="vaulttest01", action="dry-activate")

# Deploy to all hosts with a role
deploy(role="vault", action="switch")
```

**Note:** Only test-tier hosts with `homelab.deploy.enable = true` and the listener service running will respond to deployments.

**Verifying Deployments:**

After deploying, use the `nixos_flake_info` metric from nixos-exporter to verify the host is running the expected revision:

```promql
nixos_flake_info{instance=~"vaulttest01.*"}
```

The `current_rev` label contains the git commit hash of the deployed flake configuration.

## Architecture

### Directory Structure
122
docs/plans/long-term-metrics-storage.md
Normal file
@@ -0,0 +1,122 @@
# Long-Term Metrics Storage Options

## Problem Statement

Current Prometheus configuration retains metrics for 30 days (`retentionTime = "30d"`). Extending retention further raises disk usage concerns on the homelab hypervisor with limited local storage.

Prometheus does not support downsampling - it stores all data at full resolution until the retention period expires, then deletes it entirely.

## Current Configuration

Location: `services/monitoring/prometheus.nix`

- **Retention**: 30 days
- **Scrape interval**: 15s
- **Features**: Alertmanager, Pushgateway, auto-generated scrape configs from flake hosts
- **Storage**: Local disk on monitoring01
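
For orientation, the retention and scrape interval listed above would typically map onto NixOS options roughly as follows. This is a sketch only: the actual contents of `services/monitoring/prometheus.nix` are not shown in this plan, and setting the interval through `globalConfig.scrape_interval` is an assumption.

```nix
{
  services.prometheus = {
    enable = true;
    # Matches the 30-day retention described above.
    retentionTime = "30d";
    # Assumption: the 15s scrape interval is configured globally.
    globalConfig.scrape_interval = "15s";
  };
}
```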

## Options Evaluated

### Option 1: VictoriaMetrics

VictoriaMetrics is a Prometheus-compatible TSDB with significantly better compression (5-10x smaller storage footprint).

**NixOS Options Available:**
- `services.victoriametrics.enable`
- `services.victoriametrics.prometheusConfig` - accepts Prometheus scrape config format
- `services.victoriametrics.retentionPeriod` - e.g., "180d" for roughly 6 months (VictoriaMetrics rejects an `m` suffix because it is ambiguous with minutes; a bare number is read as months)
- `services.vmagent` - dedicated scraping agent
- `services.vmalert` - alerting rules evaluation

**Pros:**
- Simple migration - single service replacement
- Same PromQL query language - Grafana dashboards work unchanged
- Same scrape config format - existing auto-generated configs work as-is
- 5-10x better compression means 30 days of Prometheus data could become 180+ days
- Lightweight, single binary

**Cons:**
- No automatic downsampling (relies on compression alone)
- Alerting requires switching to vmalert instead of Prometheus alertmanager integration
- Would need to migrate existing data or start fresh

**Migration Steps:**
1. Replace `services.prometheus` with `services.victoriametrics`
2. Move scrape configs to `prometheusConfig`
3. Set up `services.vmalert` for alerting rules
4. Update Grafana datasource to VictoriaMetrics port (8428)
5. Keep Alertmanager for notification routing
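
Steps 1 and 2 above could look roughly like the following. This is a minimal sketch, not the planned configuration: the job name and the scrape target (including port 9100) are hypothetical placeholders, and the retention is written in days as one way to express about six months.

```nix
{
  services.victoriametrics = {
    enable = true;
    # Roughly six months, written in days.
    retentionPeriod = "180d";
    # Same structure as a Prometheus scrape configuration.
    prometheusConfig = {
      scrape_configs = [
        {
          job_name = "node"; # hypothetical job name
          static_configs = [
            # Hypothetical target; the real list is auto-generated from flake hosts.
            { targets = [ "monitoring01.home.2rjus.net:9100" ]; }
          ];
        }
      ];
    };
  };
}
```

The vmalert setup from step 3 is left out here because its rule and notifier settings depend on the existing alert rules, which this plan does not show.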

### Option 2: Thanos

Thanos extends Prometheus with long-term storage and automatic downsampling by uploading data to object storage.

**NixOS Options Available:**
- `services.thanos.sidecar` - uploads Prometheus blocks to object storage
- `services.thanos.compact` - compacts and downsamples data
- `services.thanos.query` - unified query gateway
- `services.thanos.query-frontend` - query caching and parallelization
- `services.thanos.downsample` - dedicated downsampling service

**Downsampling Behavior:**
- Raw resolution kept for configurable period (default: indefinite)
- 5-minute resolution created after 40 hours
- 1-hour resolution created after 10 days

**Retention Configuration (in compactor):**
```nix
services.thanos.compact = {
  retention.resolution-raw = "30d";   # Keep raw for 30 days
  retention.resolution-5m = "180d";   # Keep 5m samples for 6 months
  retention.resolution-1h = "2y";     # Keep 1h samples for 2 years
};
```

**Pros:**
- True downsampling - older data uses progressively less storage
- Keep metrics for years with minimal storage impact
- Prometheus continues running unchanged
- Existing Alertmanager integration preserved

**Cons:**
- Requires object storage (MinIO, S3, or local filesystem)
- Multiple services to manage (sidecar, compactor, query)
- More complex architecture
- Additional infrastructure (MinIO) may be needed

**Required Components:**
1. Thanos Sidecar (runs alongside Prometheus)
2. Object storage (MinIO or local filesystem)
3. Thanos Compactor (handles downsampling)
4. Thanos Query (provides unified query endpoint)

**Migration Steps:**
1. Deploy object storage (MinIO or configure filesystem backend)
2. Add Thanos sidecar pointing to Prometheus data directory
3. Add Thanos compactor with retention policies
4. Add Thanos query gateway
5. Update Grafana datasource to Thanos Query port (10902)
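
Steps 1 to 3 above might be sketched as follows, using the filesystem backend instead of MinIO. Treat this as an illustration under assumptions: the bucket directory is a hypothetical path, file permissions between the sidecar and compactor services are not handled, and the query gateway from step 4 is omitted.

```nix
{
  services.thanos =
    let
      # Assumption: a local directory stands in for a real object store.
      bucket = {
        type = "FILESYSTEM";
        config.directory = "/var/lib/thanos-objstore"; # hypothetical path
      };
    in
    {
      # Uploads completed Prometheus blocks to the bucket.
      sidecar = {
        enable = true;
        objstore.config = bucket;
      };
      # Compacts and downsamples blocks, then applies per-resolution retention.
      compact = {
        enable = true;
        objstore.config = bucket;
        retention.resolution-raw = "30d";
        retention.resolution-5m = "180d";
        retention.resolution-1h = "2y";
      };
    };
}
```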

## Comparison

| Aspect | VictoriaMetrics | Thanos |
|--------|-----------------|--------|
| Complexity | Low (1 service) | Higher (3-4 services) |
| Downsampling | No | Yes (automatic) |
| Storage savings | 5-10x compression | Compression + downsampling |
| Object storage required | No | Yes |
| Migration effort | Minimal | Moderate |
| Grafana changes | Change port only | Change port only |
| Alerting changes | Need vmalert | Keep existing |

## Recommendation

**Start with VictoriaMetrics** for simplicity. The compression alone may provide 6+ months of retention in the same disk space currently used for 30 days.

If multi-year retention with true downsampling becomes necessary, Thanos can be evaluated later. However, it requires deploying object storage infrastructure (MinIO) which adds operational complexity.

## References

- VictoriaMetrics docs: https://docs.victoriametrics.com/
- Thanos docs: https://thanos.io/tip/thanos/getting-started.md/
- NixOS options searched from nixpkgs revision e576e3c9 (NixOS 25.11)
22
flake.lock
generated
@@ -21,6 +21,27 @@
        "url": "https://git.t-juice.club/torjus/alerttonotify"
      }
    },
    "homelab-deploy": {
      "inputs": {
        "nixpkgs": [
          "nixpkgs-unstable"
        ]
      },
      "locked": {
        "lastModified": 1770447502,
        "narHash": "sha256-xH1PNyE3ydj4udhe1IpK8VQxBPZETGLuORZdSWYRmSU=",
        "ref": "master",
        "rev": "79db119d1ca6630023947ef0a65896cc3307c2ff",
        "revCount": 22,
        "type": "git",
        "url": "https://git.t-juice.club/torjus/homelab-deploy"
      },
      "original": {
        "ref": "master",
        "type": "git",
        "url": "https://git.t-juice.club/torjus/homelab-deploy"
      }
    },
    "labmon": {
      "inputs": {
        "nixpkgs": [
@@ -97,6 +118,7 @@
    "root": {
      "inputs": {
        "alerttonotify": "alerttonotify",
        "homelab-deploy": "homelab-deploy",
        "labmon": "labmon",
        "nixos-exporter": "nixos-exporter",
        "nixpkgs": "nixpkgs",
15
flake.nix
@@ -21,6 +21,10 @@
      url = "git+https://git.t-juice.club/torjus/nixos-exporter";
      inputs.nixpkgs.follows = "nixpkgs-unstable";
    };
    homelab-deploy = {
      url = "git+https://git.t-juice.club/torjus/homelab-deploy?ref=master";
      inputs.nixpkgs.follows = "nixpkgs-unstable";
    };
  };

  outputs =
@@ -32,6 +36,7 @@
      alerttonotify,
      labmon,
      nixos-exporter,
      homelab-deploy,
      ...
    }@inputs:
    let
@@ -58,6 +63,7 @@
        )
        sops-nix.nixosModules.sops
        nixos-exporter.nixosModules.default
        homelab-deploy.nixosModules.default
        ./modules/homelab
      ];
      allSystems = [
@@ -219,11 +225,12 @@
        { pkgs }:
        {
          default = pkgs.mkShell {
            packages = with pkgs; [
              ansible
              opentofu
              openbao
            packages = [
              pkgs.ansible
              pkgs.opentofu
              pkgs.openbao
              (pkgs.callPackage ./scripts/create-host { })
              homelab-deploy.packages.${pkgs.system}.default
            ];
          };
        }
@@ -57,6 +57,7 @@

  # Vault secrets management
  vault.enable = true;
  homelab.deploy.enable = true;
  vault.secrets.backup-helper = {
    secretPath = "shared/backup/password";
    extractKey = "password";
@@ -61,6 +61,7 @@
    "flakes"
  ];
  vault.enable = true;
  homelab.deploy.enable = true;

  nix.settings.tarball-ttl = 0;
  environment.systemPackages = with pkgs; [
@@ -58,6 +58,7 @@

  # Vault secrets management
  vault.enable = true;
  homelab.deploy.enable = true;
  vault.secrets.backup-helper = {
    secretPath = "shared/backup/password";
    extractKey = "password";
@@ -55,6 +55,7 @@
    "flakes"
  ];
  vault.enable = true;
  homelab.deploy.enable = true;

  nix.settings.tarball-ttl = 0;
  environment.systemPackages = with pkgs; [
@@ -48,6 +48,7 @@
    "flakes"
  ];
  vault.enable = true;
  homelab.deploy.enable = true;

  homelab.host = {
    role = "dns";
@@ -48,6 +48,7 @@
    "flakes"
  ];
  vault.enable = true;
  homelab.deploy.enable = true;

  homelab.host = {
    role = "dns";
@@ -81,6 +81,7 @@ in
    vim
    wget
    git
    htop # test deploy verification
  ];

  # Open ports in the firewall.
@@ -92,6 +93,7 @@ in
  # Testing config
  # Enable Vault secrets management
  vault.enable = true;
  homelab.deploy.enable = true;

  # Define a test secret
  vault.secrets.test-service = {
@@ -1,6 +1,7 @@
{ ... }:
{
  imports = [
    ./deploy.nix
    ./dns.nix
    ./host.nix
    ./monitoring.nix
16
modules/homelab/deploy.nix
Normal file
@@ -0,0 +1,16 @@
{ config, lib, ... }:

{
  options.homelab.deploy = {
    enable = lib.mkEnableOption "homelab-deploy listener for NATS-based deployments";
  };

  config = {
    assertions = [
      {
        assertion = config.homelab.deploy.enable -> config.vault.enable;
        message = "homelab.deploy.enable requires vault.enable to be true (needed for NKey secret)";
      }
    ];
  };
}
@@ -1,16 +1,18 @@
{ ... }:
{
  homelab.monitoring.scrapeTargets = [{
    job_name = "nats";
    port = 7777;
  }];
  homelab.monitoring.scrapeTargets = [
    {
      job_name = "nats";
      port = 7777;
    }
  ];

  services.prometheus.exporters.nats = {
    enable = true;
    url = "http://localhost:8222";
    extraFlags = [
      "-varz" # General server info
      "-connz" # Connection info
      "-varz" # General server info
      "-connz" # Connection info
      "-jsz=all" # JetStream info
    ];
  };
@@ -38,6 +40,48 @@
            }
          ];
        };

        DEPLOY = {
          users = [
            # Shared listener (all hosts use this)
            {
              nkey = "UCCZJSUGLCSLBBKHBPL4QA66TUMQUGIXGLIFTWDEH43MGWM3LDD232X4";
              permissions = {
                subscribe = [
                  "deploy.test.>"
                  "deploy.prod.>"
                  "deploy.discover"
                ];
                publish = [
                  "deploy.responses.>"
                  "deploy.discover"
                ];
              };
            }
            # Test deployer (MCP without admin)
            {
              nkey = "UBR66CX2ZNY5XNVQF5VBG4WFAF54LSGUYCUNNCEYRILDQ4NXDAD2THZU";
              permissions = {
                publish = [
                  "deploy.test.>"
                  "deploy.discover"
                ];
                subscribe = [
                  "deploy.responses.>"
                  "deploy.discover"
                ];
              };
            }
            # Admin deployer (full access)
            {
              nkey = "UD2BFB7DLM67P5UUVCKBUJMCHADIZLGGVUNSRLZE2ZC66FW2XT44P73Y";
              permissions = {
                publish = [ "deploy.>" ];
                subscribe = [ "deploy.>" ];
              };
            }
          ];
        };
      };
      system_account = "ADMIN";
      jetstream = {
@@ -3,6 +3,7 @@
  imports = [
    ./acme.nix
    ./autoupgrade.nix
    ./homelab-deploy.nix
    ./monitoring
    ./motd.nix
    ./packages.nix
37
system/homelab-deploy.nix
Normal file
@@ -0,0 +1,37 @@
{ config, lib, ... }:

let
  hostCfg = config.homelab.host;
in
{
  config = lib.mkIf config.homelab.deploy.enable {
    # Fetch listener NKey from Vault
    vault.secrets.homelab-deploy-nkey = {
      secretPath = "shared/homelab-deploy/listener-nkey";
      extractKey = "nkey";
    };

    # Enable homelab-deploy listener
    services.homelab-deploy.listener = {
      enable = true;
      tier = hostCfg.tier;
      role = hostCfg.role;
      natsUrl = "nats://nats1.home.2rjus.net:4222";
      nkeyFile = "/run/secrets/homelab-deploy-nkey";
      flakeUrl = "git+https://git.t-juice.club/torjus/nixos-servers.git";
      metrics.enable = true;
    };

    # Expose metrics for Prometheus scraping
    homelab.monitoring.scrapeTargets = [{
      job_name = "homelab-deploy";
      port = 9972;
    }];

    # Ensure listener starts after vault secret is available
    systemd.services.homelab-deploy-listener = {
      after = [ "vault-secret-homelab-deploy-nkey.service" ];
      requires = [ "vault-secret-homelab-deploy-nkey.service" ];
    };
  };
}
@@ -4,6 +4,17 @@ resource "vault_auth_backend" "approle" {
  path = "approle"
}

# Shared policy for homelab-deploy (all hosts need this for NATS-based deployments)
resource "vault_policy" "homelab_deploy" {
  name = "homelab-deploy"

  policy = <<EOT
path "secret/data/shared/homelab-deploy/*" {
  capabilities = ["read", "list"]
}
EOT
}

# Define host access policies
locals {
  host_policies = {
@@ -89,6 +100,12 @@ locals {
        "secret/data/hosts/nix-cache01/*",
      ]
    }

    "vaulttest01" = {
      paths = [
        "secret/data/hosts/vaulttest01/*",
      ]
    }
  }
}

@@ -114,7 +131,7 @@ resource "vault_approle_auth_backend_role" "hosts" {
  backend = vault_auth_backend.approle.path
  role_name = each.key
  token_policies = concat(
    ["${each.key}-policy"],
    ["${each.key}-policy", "homelab-deploy"],
    lookup(each.value, "extra_policies", [])
  )

@@ -92,6 +92,22 @@ locals {
      auto_generate = false
      data = { token = var.actions_token_1 }
    }

    # Homelab-deploy NKeys
    "shared/homelab-deploy/listener-nkey" = {
      auto_generate = false
      data = { nkey = var.homelab_deploy_listener_nkey }
    }

    "shared/homelab-deploy/test-deployer-nkey" = {
      auto_generate = false
      data = { nkey = var.homelab_deploy_test_deployer_nkey }
    }

    "shared/homelab-deploy/admin-deployer-nkey" = {
      auto_generate = false
      data = { nkey = var.homelab_deploy_admin_deployer_nkey }
    }
  }
}

@@ -52,3 +52,24 @@ variable "actions_token_1" {
  sensitive = true
}

variable "homelab_deploy_listener_nkey" {
  description = "NKey seed for homelab-deploy listeners"
  type = string
  default = "PLACEHOLDER"
  sensitive = true
}

variable "homelab_deploy_test_deployer_nkey" {
  description = "NKey seed for test-tier deployer"
  type = string
  default = "PLACEHOLDER"
  sensitive = true
}

variable "homelab_deploy_admin_deployer_nkey" {
  description = "NKey seed for admin deployer"
  type = string
  default = "PLACEHOLDER"
  sensitive = true
}
