ansible: restructure with dynamic inventory from flake

- Move playbooks/ to ansible/playbooks/
- Add dynamic inventory script that extracts hosts from flake
  - Groups by tier (tier_test, tier_prod) and role (role_dns, etc.)
  - Reads homelab.host.* options for metadata
- Add static inventory for non-flake hosts (Proxmox)
- Add ansible.cfg with inventory path and SSH optimizations
- Add group_vars/all.yml for common variables
- Add restart-service.yml playbook for restarting systemd services
- Update provision-approle.yml with single-host safeguard
- Add ANSIBLE_CONFIG to devshell for automatic inventory discovery
- Add ansible = "false" label to template2 to exclude from inventory
- Update CLAUDE.md to reference ansible/README.md for details

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 21:41:29 +01:00
parent 7ff3d2a09b
commit 6e08ba9720
13 changed files with 403 additions and 44 deletions

ansible/README.md Normal file

@@ -0,0 +1,116 @@
# Ansible Configuration
This directory contains Ansible configuration for fleet management tasks.
## Structure
```
ansible/
├── ansible.cfg              # Ansible configuration
├── inventory/
│   ├── dynamic_flake.py     # Dynamic inventory from NixOS flake
│   ├── static.yml           # Non-flake hosts (Proxmox, etc.)
│   └── group_vars/
│       └── all.yml          # Common variables
└── playbooks/
    ├── build-and-deploy-template.yml
    ├── provision-approle.yml
    ├── restart-service.yml
    └── run-upgrade.yml
```
## Usage
The devshell automatically configures `ANSIBLE_CONFIG`, so commands work without extra flags:
```bash
# List inventory groups
nix develop -c ansible-inventory --graph
# List hosts in a specific group
nix develop -c ansible-inventory --list | jq '.role_dns'
# Run a playbook
nix develop -c ansible-playbook ansible/playbooks/run-upgrade.yml -l tier_test
```
## Inventory
The inventory combines dynamic and static sources automatically.
### Dynamic Inventory (from flake)
The `dynamic_flake.py` script extracts hosts from the NixOS flake using `homelab.host.*` options:
**Groups generated:**
- `flake_hosts` - All NixOS hosts from the flake
- `tier_test`, `tier_prod` - By `homelab.host.tier`
- `role_dns`, `role_vault`, `role_monitoring`, etc. - By `homelab.host.role`
**Host variables set:**
- `tier` - Deployment tier (test/prod)
- `role` - Host role
- `short_hostname` - Hostname without domain
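For illustration, the grouping and exclusion logic boils down to something like the following standalone sketch (the host metadata here is made up; the real script obtains it via `nix eval`):

```python
import json

# Hypothetical metadata, shaped like the flake evaluation output
# ("ns1" is an invented hostname; "template2" is the excluded template host)
hosts = {
    "ns1": {"hostname": "ns1", "domain": "home.2rjus.net", "tier": "prod",
            "role": "dns", "labels": {}, "dns_enabled": True},
    "template2": {"hostname": "template2", "domain": "home.2rjus.net", "tier": "test",
                  "role": "", "labels": {"ansible": "false"}, "dns_enabled": True},
}

inventory = {"_meta": {"hostvars": {}}, "flake_hosts": {"hosts": []}}
for info in hosts.values():
    # Same exclusion rules as dynamic_flake.py
    if not info["dns_enabled"] or info["labels"].get("ansible") == "false":
        continue
    fqdn = f"{info['hostname']}.{info['domain']}"
    inventory["flake_hosts"]["hosts"].append(fqdn)
    inventory["_meta"]["hostvars"][fqdn] = {
        "tier": info["tier"], "role": info["role"],
        "short_hostname": info["hostname"],
    }
    inventory.setdefault(f"tier_{info['tier']}", {"hosts": []})["hosts"].append(fqdn)
    if info["role"]:
        inventory.setdefault(f"role_{info['role']}", {"hosts": []})["hosts"].append(fqdn)

print(json.dumps(inventory, indent=2))
```

With this input, `ns1` lands in `flake_hosts`, `tier_prod`, and `role_dns`, while `template2` is skipped entirely because of its `ansible = "false"` label.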
### Static Inventory
Non-flake hosts are defined in `inventory/static.yml`:
- `proxmox` - Proxmox hypervisors
## Playbooks
| Playbook | Description | Example |
|----------|-------------|---------|
| `run-upgrade.yml` | Trigger nixos-upgrade on hosts | `-l tier_prod` |
| `restart-service.yml` | Restart a systemd service | `-l role_dns -e service=unbound` |
| `provision-approle.yml` | Deploy Vault credentials (single host only) | `-l testvm01` |
| `build-and-deploy-template.yml` | Build and deploy Proxmox template | (no limit needed) |
### Examples
```bash
# Restart unbound on all DNS servers
nix develop -c ansible-playbook ansible/playbooks/restart-service.yml \
-l role_dns -e service=unbound
# Trigger upgrade on all test hosts
nix develop -c ansible-playbook ansible/playbooks/run-upgrade.yml -l tier_test
# Provision Vault credentials for a specific host
nix develop -c ansible-playbook ansible/playbooks/provision-approle.yml -l testvm01
# Build and deploy Proxmox template
nix develop -c ansible-playbook ansible/playbooks/build-and-deploy-template.yml
```
## Excluding Flake Hosts
To exclude a flake host from the dynamic inventory, add the `ansible = "false"` label in the host's configuration:
```nix
homelab.host.labels.ansible = "false";
```
Hosts with `homelab.dns.enable = false` are also excluded automatically.
## Adding Non-Flake Hosts
Edit `inventory/static.yml` to add hosts not managed by the NixOS flake:
```yaml
all:
  children:
    my_group:
      hosts:
        host1.example.com:
          ansible_user: admin
```
## Common Variables
Variables in `inventory/group_vars/all.yml` apply to all hosts:
- `ansible_user` - Default SSH user (root)
- `domain` - Domain name (home.2rjus.net)
- `vault_addr` - Vault server URL

ansible/ansible.cfg Normal file

@@ -0,0 +1,16 @@
[defaults]
inventory = inventory/
remote_user = root
host_key_checking = False
# Reduce SSH connection overhead
forks = 10
pipelining = True
# Output formatting
stdout_callback = yaml
callbacks_enabled = profile_tasks

[ssh_connection]
# Reuse SSH connections
ssh_args = -o ControlMaster=auto -o ControlPersist=60s

ansible/inventory/dynamic_flake.py

@@ -0,0 +1,158 @@
#!/usr/bin/env python3
"""
Dynamic Ansible inventory script that extracts host information from the NixOS flake.
Generates groups:
- flake_hosts: All hosts defined in the flake
- tier_test, tier_prod: Hosts by deployment tier
- role_<name>: Hosts by role (dns, vault, monitoring, etc.)
Usage:
./dynamic_flake.py --list # Return full inventory
./dynamic_flake.py --host X # Return host vars (not used, but required by Ansible)
"""
import json
import subprocess
import sys
from pathlib import Path
def get_flake_dir() -> Path:
    """Find the flake root directory."""
    script_dir = Path(__file__).resolve().parent
    # ansible/inventory/dynamic_flake.py -> repo root
    return script_dir.parent.parent


def evaluate_flake() -> dict:
    """Evaluate the flake and extract host metadata."""
    flake_dir = get_flake_dir()
    # Nix expression to extract relevant config from each host
    nix_expr = """
    configs: builtins.mapAttrs (name: cfg: {
      hostname = cfg.config.networking.hostName;
      domain = cfg.config.networking.domain or "home.2rjus.net";
      tier = cfg.config.homelab.host.tier;
      role = cfg.config.homelab.host.role;
      labels = cfg.config.homelab.host.labels;
      dns_enabled = cfg.config.homelab.dns.enable;
    }) configs
    """
    try:
        result = subprocess.run(
            [
                "nix",
                "eval",
                "--json",
                f"{flake_dir}#nixosConfigurations",
                "--apply",
                nix_expr,
            ],
            capture_output=True,
            text=True,
            check=True,
            cwd=flake_dir,
        )
        return json.loads(result.stdout)
    except subprocess.CalledProcessError as e:
        print(f"Error evaluating flake: {e.stderr}", file=sys.stderr)
        sys.exit(1)
    except json.JSONDecodeError as e:
        print(f"Error parsing nix output: {e}", file=sys.stderr)
        sys.exit(1)


def sanitize_group_name(name: str) -> str:
    """Sanitize a string for use as an Ansible group name.

    Ansible group names should contain only alphanumeric characters and underscores.
    """
    return name.replace("-", "_")


def build_inventory(hosts_data: dict) -> dict:
    """Build Ansible inventory structure from host data."""
    inventory = {
        "_meta": {"hostvars": {}},
        "flake_hosts": {"hosts": []},
    }
    # Track groups we need to create
    tier_groups: dict[str, list[str]] = {}
    role_groups: dict[str, list[str]] = {}

    for _config_name, host_info in hosts_data.items():
        hostname = host_info["hostname"]
        domain = host_info["domain"]
        tier = host_info["tier"]
        role = host_info["role"]
        labels = host_info["labels"]
        dns_enabled = host_info["dns_enabled"]

        # Skip hosts that have DNS disabled (like templates)
        if not dns_enabled:
            continue
        # Skip hosts with ansible = "false" label
        if labels.get("ansible") == "false":
            continue

        fqdn = f"{hostname}.{domain}"
        # Add to flake_hosts group
        inventory["flake_hosts"]["hosts"].append(fqdn)
        # Add host variables
        inventory["_meta"]["hostvars"][fqdn] = {
            "tier": tier,
            "role": role,
            "short_hostname": hostname,
        }

        # Group by tier
        tier_group = f"tier_{sanitize_group_name(tier)}"
        if tier_group not in tier_groups:
            tier_groups[tier_group] = []
        tier_groups[tier_group].append(fqdn)

        # Group by role (if set)
        if role:
            role_group = f"role_{sanitize_group_name(role)}"
            if role_group not in role_groups:
                role_groups[role_group] = []
            role_groups[role_group].append(fqdn)

    # Add tier groups to inventory
    for group_name, hosts in tier_groups.items():
        inventory[group_name] = {"hosts": hosts}
    # Add role groups to inventory
    for group_name, hosts in role_groups.items():
        inventory[group_name] = {"hosts": hosts}

    return inventory


def main():
    if len(sys.argv) < 2:
        print("Usage: dynamic_flake.py --list | --host <hostname>", file=sys.stderr)
        sys.exit(1)

    if sys.argv[1] == "--list":
        hosts_data = evaluate_flake()
        inventory = build_inventory(hosts_data)
        print(json.dumps(inventory, indent=2))
    elif sys.argv[1] == "--host":
        # Ansible calls this to get vars for a specific host
        # We provide all vars in _meta.hostvars, so just return empty
        print(json.dumps({}))
    else:
        print(f"Unknown option: {sys.argv[1]}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()

ansible/inventory/group_vars/all.yml

@@ -0,0 +1,5 @@
# Common variables for all hosts
ansible_user: root
domain: home.2rjus.net
vault_addr: https://vault01.home.2rjus.net:8200

ansible/inventory/static.yml

@@ -0,0 +1,10 @@
# Static inventory for non-flake hosts
#
# Hosts defined here are merged with the dynamic flake inventory.
# Use this for infrastructure that isn't managed by NixOS.
all:
  children:
    proxmox:
      hosts:
        pve1.home.2rjus.net:

ansible/playbooks/build-and-deploy-template.yml

@@ -0,0 +1,146 @@
---
- name: Build and deploy NixOS Proxmox template
  hosts: localhost
  gather_facts: false
  vars:
    template_name: "template2"
    nixos_config: "template2"
    proxmox_node: "pve1.home.2rjus.net"  # Change to your Proxmox node name
    proxmox_host: "pve1.home.2rjus.net"  # Change to your Proxmox host
    template_vmid: 9000  # Template VM ID
    storage: "local-zfs"
  tasks:
    - name: Build NixOS image
      ansible.builtin.command:
        cmd: "nixos-rebuild build-image --image-variant proxmox --flake .#template2"
        chdir: "{{ playbook_dir }}/../.."
      register: build_result
      changed_when: true

    - name: Find built image file
      ansible.builtin.find:
        paths: "{{ playbook_dir }}/../../result"
        patterns: "*.vma.zst"
        recurse: true
      register: image_files

    - name: Fail if no image found
      ansible.builtin.fail:
        msg: "No VMA image found in build output"
      when: image_files.matched == 0

    - name: Set image path
      ansible.builtin.set_fact:
        image_path: "{{ image_files.files[0].path }}"

    - name: Extract image filename
      ansible.builtin.set_fact:
        image_filename: "{{ image_path | basename }}"

    - name: Display image info
      ansible.builtin.debug:
        msg: "Built image: {{ image_path }} ({{ image_filename }})"

- name: Deploy template to Proxmox
  hosts: proxmox
  gather_facts: false
  vars:
    template_name: "template2"
    template_vmid: 9000
    storage: "local-zfs"
  tasks:
    - name: Get image path and filename from localhost
      ansible.builtin.set_fact:
        image_path: "{{ hostvars['localhost']['image_path'] }}"
        image_filename: "{{ hostvars['localhost']['image_filename'] }}"

    - name: Set destination path
      ansible.builtin.set_fact:
        image_dest: "/var/lib/vz/dump/{{ image_filename }}"

    - name: Copy image to Proxmox
      ansible.builtin.copy:
        src: "{{ image_path }}"
        dest: "{{ image_dest }}"
        mode: '0644'

    - name: Check if template VM already exists
      ansible.builtin.command:
        cmd: "qm status {{ template_vmid }}"
      register: vm_status
      failed_when: false
      changed_when: false

    - name: Destroy existing template VM if it exists
      ansible.builtin.command:
        cmd: "qm destroy {{ template_vmid }} --purge"
      when: vm_status.rc == 0
      changed_when: true

    - name: Import image
      ansible.builtin.command:
        # Restore onto the configured storage so the success message is accurate
        cmd: "qmrestore {{ image_dest }} {{ template_vmid }} --storage {{ storage }}"
      changed_when: true

    - name: Convert VM to template
      ansible.builtin.command:
        cmd: "qm template {{ template_vmid }}"
      changed_when: true

    - name: Clean up uploaded image
      ansible.builtin.file:
        path: "{{ image_dest }}"
        state: absent

    - name: Display success message
      ansible.builtin.debug:
        msg: "Template VM {{ template_vmid }} created successfully on {{ storage }}"

- name: Update Terraform template name
  hosts: localhost
  gather_facts: false
  vars:
    terraform_dir: "{{ playbook_dir }}/../../terraform"
  tasks:
    - name: Get image filename from earlier play
      ansible.builtin.set_fact:
        image_filename: "{{ hostvars['localhost']['image_filename'] }}"

    - name: Extract template name from image filename
      ansible.builtin.set_fact:
        new_template_name: "{{ image_filename | regex_replace('\\.vma\\.zst$', '') | regex_replace('^vzdump-qemu-', '') }}"

    - name: Read current Terraform variables file
      ansible.builtin.slurp:
        src: "{{ terraform_dir }}/variables.tf"
      register: variables_tf_content

    - name: Extract current template name from variables.tf
      ansible.builtin.set_fact:
        current_template_name: "{{ (variables_tf_content.content | b64decode) | regex_search('variable \"default_template_name\"[^}]+default\\s*=\\s*\"([^\"]+)\"', '\\1') | first }}"

    - name: Check if template name has changed
      ansible.builtin.set_fact:
        template_name_changed: "{{ current_template_name != new_template_name }}"

    - name: Display template name status
      ansible.builtin.debug:
        msg: "Template name: {{ current_template_name }} -> {{ new_template_name }} ({{ 'changed' if template_name_changed else 'unchanged' }})"

    - name: Update default_template_name in variables.tf
      ansible.builtin.replace:
        path: "{{ terraform_dir }}/variables.tf"
        regexp: '(variable "default_template_name"[^}]+default\s*=\s*)"[^"]+"'
        replace: '\1"{{ new_template_name }}"'
      when: template_name_changed

    - name: Display update result
      ansible.builtin.debug:
        msg: "Updated terraform/variables.tf with new template name: {{ new_template_name }}"
      when: template_name_changed
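The template-name derivation in the last play strips the `vzdump-qemu-` prefix and the `.vma.zst` suffix from the dump filename; the equivalent transformation in plain Python, with an illustrative filename (not a real artifact name), looks like this:

```python
import re

filename = "vzdump-qemu-template2-2026_02_09.vma.zst"  # illustrative example

# Mirror the two regex_replace filters from the playbook
name = re.sub(r"\.vma\.zst$", "", filename)   # drop the archive suffix
name = re.sub(r"^vzdump-qemu-", "", name)     # drop the vzdump prefix
print(name)  # template2-2026_02_09
```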

ansible/playbooks/provision-approle.yml

@@ -0,0 +1,98 @@
---
# Provision OpenBao AppRole credentials to a host
#
# Usage: ansible-playbook ansible/playbooks/provision-approle.yml -l <hostname>
# Requires: BAO_ADDR and BAO_TOKEN environment variables set
#
# IMPORTANT: This playbook must target exactly one host to prevent
# accidentally regenerating credentials for multiple hosts.

- name: Validate single host target
  hosts: all
  gather_facts: false
  tasks:
    - name: Fail if targeting multiple hosts
      ansible.builtin.fail:
        msg: |
          This playbook must target exactly one host.
          Use: ansible-playbook provision-approle.yml -l <hostname>
          Targeting multiple hosts would regenerate credentials for all of them,
          potentially breaking existing services.
      when: ansible_play_hosts | length != 1
      run_once: true

- name: Fetch AppRole credentials from OpenBao
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    target_host: "{{ groups['all'] | first }}"
    target_hostname: "{{ hostvars[target_host]['short_hostname'] | default(target_host.split('.')[0]) }}"
  tasks:
    - name: Display target host
      ansible.builtin.debug:
        msg: "Provisioning AppRole credentials for: {{ target_hostname }}"

    - name: Get role-id for host
      ansible.builtin.command:
        cmd: "bao read -field=role_id auth/approle/role/{{ target_hostname }}/role-id"
      environment:
        BAO_ADDR: "{{ vault_addr }}"
        BAO_SKIP_VERIFY: "1"
      register: role_id_result
      changed_when: false

    - name: Generate secret-id for host
      ansible.builtin.command:
        cmd: "bao write -field=secret_id -f auth/approle/role/{{ target_hostname }}/secret-id"
      environment:
        BAO_ADDR: "{{ vault_addr }}"
        BAO_SKIP_VERIFY: "1"
      register: secret_id_result
      changed_when: true

    - name: Store credentials for next play
      ansible.builtin.set_fact:
        vault_role_id: "{{ role_id_result.stdout }}"
        vault_secret_id: "{{ secret_id_result.stdout }}"

- name: Deploy AppRole credentials to host
  hosts: all
  gather_facts: false
  vars:
    vault_role_id: "{{ hostvars['localhost']['vault_role_id'] }}"
    vault_secret_id: "{{ hostvars['localhost']['vault_secret_id'] }}"
  tasks:
    - name: Create AppRole directory
      ansible.builtin.file:
        path: /var/lib/vault/approle
        state: directory
        mode: "0700"
        owner: root
        group: root

    - name: Write role-id
      ansible.builtin.copy:
        content: "{{ vault_role_id }}"
        dest: /var/lib/vault/approle/role-id
        mode: "0600"
        owner: root
        group: root

    - name: Write secret-id
      ansible.builtin.copy:
        content: "{{ vault_secret_id }}"
        dest: /var/lib/vault/approle/secret-id
        mode: "0600"
        owner: root
        group: root

    - name: Display success
      ansible.builtin.debug:
        msg: "AppRole credentials provisioned to {{ inventory_hostname }}"

ansible/playbooks/restart-service.yml

@@ -0,0 +1,40 @@
---
# Restart a systemd service on target hosts
#
# Usage examples:
#   # Restart unbound on all DNS servers
#   ansible-playbook restart-service.yml -l role_dns -e service=unbound
#
#   # Restart nginx on a specific host
#   ansible-playbook restart-service.yml -l http-proxy.home.2rjus.net -e service=nginx
#
#   # Restart promtail on all prod hosts
#   ansible-playbook restart-service.yml -l tier_prod -e service=promtail

- name: Restart systemd service
  hosts: all
  gather_facts: false
  tasks:
    - name: Validate service name provided
      ansible.builtin.fail:
        msg: |
          The 'service' variable is required.
          Usage: ansible-playbook restart-service.yml -l <target> -e service=<name>
          Examples:
            -e service=nginx
            -e service=unbound
            -e service=promtail
      when: service is not defined
      run_once: true

    - name: Restart {{ service }}
      ansible.builtin.systemd:
        name: "{{ service }}"
        state: restarted
      register: restart_result

    - name: Display result
      ansible.builtin.debug:
        msg: "Service {{ service }} restarted on {{ inventory_hostname }}"

ansible/playbooks/run-upgrade.yml

@@ -0,0 +1,9 @@
---
- name: Trigger nixos-upgrade job on all hosts
  hosts: all
  remote_user: root
  tasks:
    - name: Start nixos-upgrade.service
      ansible.builtin.systemd_service:
        name: nixos-upgrade.service
        state: started