# TODO: Automated Host Deployment Pipeline

## Vision

Automate the entire process of creating, configuring, and deploying new NixOS hosts on Proxmox from a single command or script.

**Desired workflow:**
```bash
./scripts/create-host.sh --hostname myhost --ip 10.69.13.50
# Script creates config, deploys VM, bootstraps NixOS, and you're ready to go
```

**Current manual workflow (from CLAUDE.md):**
1. Create `/hosts/<hostname>/` directory structure
2. Add host to `flake.nix`
3. Add DNS entries
4. Clone template VM manually
5. Run `prepare-host.sh` on new VM
6. Add generated age key to `.sops.yaml`
7. Configure networking
8. Commit and push
9. Run `nixos-rebuild boot --flake URL#<hostname>` on host

## The Plan

### Phase 1: Parameterized OpenTofu Deployments ✅ COMPLETED

**Status:** Fully implemented and tested

**Implementation:**
- Locals-based structure using `for_each` pattern for multiple VM deployments
- All VM parameters configurable with smart defaults (CPU, memory, disk, IP, storage, etc.)
- Automatic DHCP vs static IP detection based on `ip` field presence
- Dynamic outputs showing deployed VM IPs and specifications
- Successfully tested deploying multiple VMs simultaneously

**Tasks:**
- [x] Create module/template structure in terraform for repeatable VM deployments
- [x] Parameterize VM configuration (hostname, CPU, memory, disk, IP)
- [x] Support both DHCP and static IP configuration via cloud-init
- [x] Test deploying multiple VMs from same template

**Deliverable:** ✅ Can deploy multiple VMs with custom parameters via OpenTofu in a single `tofu apply`

**Files:**
- `terraform/vms.tf` - VM definitions using locals map
- `terraform/outputs.tf` - Dynamic outputs for all VMs
- `terraform/variables.tf` - Configurable defaults
- `terraform/README.md` - Complete documentation

---

### Phase 2: Host Configuration Generator ✅ COMPLETED

**Status:** ✅ Fully implemented and tested
**Completed:** 2025-02-01
**Enhanced:** 2025-02-01 (added --force flag)

**Goal:** Automate creation of host configuration files

**Implementation:**
- Python CLI tool packaged as Nix derivation
- Available as `create-host` command in devShell
- Rich terminal UI with configuration previews
- Comprehensive validation (hostname format/uniqueness, IP subnet/uniqueness)
- Jinja2 templates for NixOS configurations
- Automatic updates to flake.nix and terraform/vms.tf
- `--force` flag for regenerating existing configurations (useful for testing)

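The validation rules listed above (hostname format and uniqueness, IP subnet and uniqueness) could look roughly like the sketch below. It is an illustration only, not the tool's actual code: the function name `validate_host`, the RFC 1123-style hostname pattern, and the `10.69.13.0/24` subnet (inferred from the example addresses in this document) are all assumptions.

```python
import ipaddress
import re

# Assumption: RFC 1123-style labels (lowercase letters, digits, inner hyphens).
HOSTNAME_RE = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")
# Assumption: subnet inferred from the example addresses in this document.
SUBNET = ipaddress.ip_network("10.69.13.0/24")


def validate_host(hostname, ip, existing_hosts, existing_ips):
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    if not HOSTNAME_RE.match(hostname):
        errors.append(f"invalid hostname format: {hostname!r}")
    if hostname in existing_hosts:
        errors.append(f"hostname already exists: {hostname}")
    if ip is not None:  # ip is None in DHCP mode
        try:
            iface = ipaddress.ip_interface(ip)
        except ValueError:
            errors.append(f"invalid IP address: {ip!r}")
            return errors
        if iface.ip not in SUBNET:
            errors.append(f"{iface.ip} is outside {SUBNET}")
        if str(iface.ip) in existing_ips:
            errors.append(f"IP already in use: {iface.ip}")
    return errors
```

Returning a list of errors rather than raising on the first one lets a CLI report every problem in a single run. A `--force` mode would presumably skip the two uniqueness checks while keeping the format checks.
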
**Tasks:**
- [x] Create Python CLI with typer framework
  - [x] Takes parameters: hostname, IP, CPU cores, memory, disk size
  - [x] Generates `/hosts/<hostname>/` directory structure
  - [x] Creates `configuration.nix` with proper hostname and networking
  - [x] Generates `default.nix` with standard imports
  - [x] References shared `hardware-configuration.nix` from template
- [x] Add host entry to `flake.nix` programmatically
  - [x] Text-based manipulation (regex insertion)
  - [x] Inserts new nixosConfiguration entry
  - [x] Maintains proper formatting
- [x] Generate corresponding OpenTofu configuration
  - [x] Adds VM definition to `terraform/vms.tf`
  - [x] Uses parameters from CLI input
  - [x] Supports both static IP and DHCP modes
- [x] Package as Nix derivation with templates
- [x] Add to flake packages and devShell
- [x] Implement dry-run mode
- [x] Write comprehensive README

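The text-based `flake.nix` manipulation mentioned in the tasks above can be sketched as follows. This is a hypothetical illustration, not the code in `scripts/create-host/`: it assumes the flake contains a line reading `nixosConfigurations = {` and that a host entry is a one-line `nixpkgs.lib.nixosSystem` call, which may differ from the real layout.

```python
import re

# Assumption: the flake has a line "nixosConfigurations = {" on its own.
ANCHOR_RE = re.compile(
    r"^([ \t]*)nixosConfigurations[ \t]*=[ \t]*\{[ \t]*$", re.MULTILINE
)


def insert_host(flake_text, hostname):
    """Insert a host entry right after the nixosConfigurations opening line."""
    # Skip the insertion when an entry for this host already exists.
    if re.search(rf"^\s*{re.escape(hostname)}\s*=", flake_text, re.MULTILINE):
        return flake_text
    match = ANCHOR_RE.search(flake_text)
    if match is None:
        raise ValueError("nixosConfigurations block not found")
    indent = match.group(1) + "  "  # one level deeper than the anchor line
    # Assumption: illustrative entry shape; the real template may differ.
    entry = (
        f"\n{indent}{hostname} = nixpkgs.lib.nixosSystem "
        f"{{ modules = [ ./hosts/{hostname} ]; }};"
    )
    return flake_text[: match.end()] + entry + flake_text[match.end():]
```

Reusing the anchor line's indentation keeps the inserted entry aligned with its neighbours, and the duplicate check makes repeated runs idempotent.
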
**Usage:**
```bash
# In nix develop shell
# --ip is optional (omit for DHCP); --cpu defaults to 2, --memory to 2048,
# and --disk to 20G; --dry-run is an optional preview mode.
create-host \
  --hostname test01 \
  --ip 10.69.13.50/24 \
  --cpu 4 \
  --memory 4096 \
  --disk 50G \
  --dry-run
```

**Files:**
- `scripts/create-host/` - Complete Python package with Nix derivation
- `scripts/create-host/README.md` - Full documentation and examples

**Deliverable:** ✅ Tool generates all config files for a new host, validated with Nix and Terraform

---

### Phase 3: Bootstrap Mechanism ✅ COMPLETED

**Status:** ✅ Fully implemented and tested
**Completed:** 2025-02-01
**Enhanced:** 2025-02-01 (added branch support for testing)

**Goal:** Get freshly deployed VM to apply its specific host configuration

**Implementation:** Systemd oneshot service that runs on first boot after cloud-init

**Approach taken:** Systemd service (variant of Option A)
- Systemd service `nixos-bootstrap.service` runs on first boot
- Depends on `cloud-config.service` to ensure hostname is set
- Reads hostname from `hostnamectl` (set by cloud-init via Terraform)
- Supports custom git branch via `NIXOS_FLAKE_BRANCH` environment variable
- Runs `nixos-rebuild boot --flake git+https://git.t-juice.club/torjus/nixos-servers.git?ref=$BRANCH#${hostname}`
- Reboots into new configuration on success
- Fails gracefully without reboot on errors (network issues, missing config)
- Service removes itself after a successful bootstrap (it is not part of the new host configuration)

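The way the bootstrap service assembles its flake reference can be illustrated with a short sketch. The real service is a systemd unit and most likely a shell script; the Python below only mirrors the logic described above, and the function name `flake_ref` is an assumption. The fallback to `master` follows the branch-support notes in Phase 7.

```python
import os

# Repository URL taken from the bootstrap description above.
REPO = "git+https://git.t-juice.club/torjus/nixos-servers.git"


def flake_ref(hostname, env=None):
    """Build the flake reference passed to `nixos-rebuild boot --flake`."""
    env = os.environ if env is None else env
    # Fall back to master when NIXOS_FLAKE_BRANCH is unset or empty.
    branch = env.get("NIXOS_FLAKE_BRANCH") or "master"
    return f"{REPO}?ref={branch}#{hostname}"
```

Passing the environment in as a parameter keeps the function testable without touching the process environment.
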
**Tasks:**
- [x] Create bootstrap service module in template2
  - [x] systemd oneshot service with proper dependencies
  - [x] Reads hostname from hostnamectl (cloud-init sets it)
  - [x] Checks network connectivity via HTTPS (curl)
  - [x] Runs nixos-rebuild boot with flake URL
  - [x] Reboots on success, fails gracefully on error
- [x] Configure cloud-init datasource
  - [x] Use ConfigDrive datasource (Proxmox provider)
  - [x] Add cloud-init disk to Terraform VMs (disks.ide.ide2.cloudinit)
  - [x] Hostname passed via cloud-init user-data from Terraform
- [x] Test bootstrap service execution on fresh VM
- [x] Handle failure cases (flake doesn't exist, network issues)
  - [x] Clear error messages in journald
  - [x] No reboot on failure
  - [x] System remains accessible for debugging

**Files:**
- `hosts/template2/bootstrap.nix` - Bootstrap service definition
- `hosts/template2/configuration.nix` - Cloud-init ConfigDrive datasource
- `terraform/vms.tf` - Cloud-init disk configuration

**Deliverable:** ✅ VMs automatically bootstrap and reboot into host-specific configuration on first boot

---

### Phase 4: Secrets Management with HashiCorp Vault

**Challenge:** The current sops-nix approach has a chicken-and-egg problem with age keys: a host cannot decrypt its secrets until its public key, which only exists after first boot, has been committed to `.sops.yaml`

**Current workflow:**
1. VM boots, generates age key at `/var/lib/sops-nix/key.txt`
2. User runs `prepare-host.sh` which prints public key
3. User manually adds public key to `.sops.yaml`
4. User commits, pushes
5. VM can now decrypt secrets

**Selected approach:** Migrate to HashiCorp Vault for centralized secrets management

**Benefits:**
- Industry-standard secrets management (Vault experience transferable to work)
- Eliminates manual age key distribution step
- Secrets-as-code via OpenTofu (infrastructure-as-code aligned)
- Centralized PKI management (replaces step-ca, consolidates TLS + SSH CA)
- Automatic secret rotation capabilities
- Audit logging for all secret access
- AppRole authentication enables automated bootstrap

**Architecture:**
```
vault.home.2rjus.net
  ├─ KV Secrets Engine (replaces sops-nix)
  ├─ PKI Engine (replaces step-ca for TLS)
  ├─ SSH CA Engine (replaces step-ca SSH CA)
  └─ AppRole Auth (per-host authentication)
         ↓
  New hosts authenticate on first boot
  Fetch secrets via Vault API
  No manual key distribution needed
```

---

#### Phase 4a: Vault Server Setup

**Goal:** Deploy and configure Vault server with auto-unseal

**Tasks:**
- [ ] Create `hosts/vault01/` configuration
  - [ ] Basic NixOS configuration (hostname, networking, etc.)
  - [ ] Vault service configuration
  - [ ] Firewall rules (8200 for API, 8201 for cluster)
  - [ ] Add to flake.nix and terraform
- [ ] Implement auto-unseal mechanism
  - [ ] **Preferred:** TPM-based auto-unseal if hardware supports it
    - [ ] Use tpm2-tools to seal/unseal Vault keys
    - [ ] Systemd service to unseal on boot
  - [ ] **Fallback:** Shamir secret sharing with systemd automation
    - [ ] Generate 3 keys, threshold 2
    - [ ] Store 2 keys on disk (encrypted), keep 1 offline
    - [ ] Systemd service auto-unseals using 2 keys
- [ ] Initial Vault setup
  - [ ] Initialize Vault
  - [ ] Configure storage backend (integrated raft or file)
  - [ ] Set up root token management
  - [ ] Enable audit logging
- [ ] Deploy to infrastructure
  - [ ] Add DNS entry for vault.home.2rjus.net
  - [ ] Deploy VM via terraform
  - [ ] Bootstrap and verify Vault is running

**Deliverable:** Running Vault server that auto-unseals on boot

---

#### Phase 4b: Vault-as-Code with OpenTofu

**Goal:** Manage all Vault configuration (secrets structure, policies, roles) as code

**Tasks:**
- [ ] Set up Vault Terraform provider
  - [ ] Create `terraform/vault/` directory
  - [ ] Configure Vault provider (address, auth)
  - [ ] Store Vault token securely (terraform.tfvars, gitignored)
- [ ] Enable and configure secrets engines
  - [ ] Enable KV v2 secrets engine at `secret/`
  - [ ] Define secret path structure (per-service, per-host)
  - [ ] Example: `secret/monitoring/grafana`, `secret/postgres/ha1`
- [ ] Define policies as code
  - [ ] Create policies for different service tiers
  - [ ] Principle of least privilege (hosts only read their secrets)
  - [ ] Example: monitoring-policy allows read on `secret/monitoring/*`
- [ ] Set up AppRole authentication
  - [ ] Enable AppRole auth backend
  - [ ] Create role per host type (monitoring, dns, database, etc.)
  - [ ] Bind policies to roles
  - [ ] Configure TTL and token policies
- [ ] Migrate existing secrets from sops-nix
  - [ ] Create migration script/playbook
  - [ ] Decrypt sops secrets and load into Vault KV
  - [ ] Verify all secrets migrated successfully
  - [ ] Keep sops as backup during transition
- [ ] Implement secrets-as-code patterns
  - [ ] Secret values in gitignored terraform.tfvars
  - [ ] Or use random_password for auto-generated secrets
  - [ ] Secret structure/paths in version-controlled .tf files

**Example OpenTofu:**
```hcl
resource "vault_kv_secret_v2" "monitoring_grafana" {
  mount = "secret"
  name  = "monitoring/grafana"
  data_json = jsonencode({
    admin_password = var.grafana_admin_password
    smtp_password  = var.smtp_password
  })
}

resource "vault_policy" "monitoring" {
  name   = "monitoring-policy"
  policy = <<EOT
path "secret/data/monitoring/*" {
  capabilities = ["read"]
}
EOT
}

resource "vault_approle_auth_backend_role" "monitoring01" {
  backend        = "approle"
  role_name      = "monitoring01"
  token_policies = ["monitoring-policy"]
}
```

**Deliverable:** All secrets and policies managed as OpenTofu code in `terraform/vault/`

---

#### Phase 4c: PKI Migration (Replace step-ca)

**Goal:** Consolidate PKI infrastructure into Vault

**Tasks:**
- [ ] Set up Vault PKI engines
  - [ ] Create root CA in Vault (`pki/` mount, 10 year TTL)
  - [ ] Create intermediate CA (`pki_int/` mount, 5 year TTL)
  - [ ] Sign intermediate with root CA
  - [ ] Configure CRL and OCSP
- [ ] Enable ACME support
  - [ ] Enable ACME on intermediate CA (Vault 1.14+)
  - [ ] Create PKI role for homelab domain
  - [ ] Set certificate TTLs and allowed domains
- [ ] Configure SSH CA in Vault
  - [ ] Enable SSH secrets engine (`ssh/` mount)
  - [ ] Generate SSH signing keys
  - [ ] Create roles for host and user certificates
  - [ ] Configure TTLs and allowed principals
- [ ] Migrate hosts from step-ca to Vault
  - [ ] Update system/acme.nix to use Vault ACME endpoint
  - [ ] Change server to `https://vault.home.2rjus.net:8200/v1/pki_int/acme/directory`
  - [ ] Test certificate issuance on one host
  - [ ] Roll out to all hosts via auto-upgrade
- [ ] Migrate SSH CA trust
  - [ ] Distribute Vault SSH CA public key to all hosts
  - [ ] Update sshd_config to trust Vault CA
  - [ ] Test SSH certificate authentication
- [ ] Decommission step-ca
  - [ ] Verify all services migrated
  - [ ] Stop step-ca service on ca host
  - [ ] Archive step-ca configuration for backup

**Deliverable:** All TLS and SSH certificates issued by Vault, step-ca retired

---

#### Phase 4d: Bootstrap Integration

**Goal:** New hosts automatically authenticate to Vault on first boot, no manual steps

**Tasks:**
- [ ] Update create-host tool
  - [ ] Generate AppRole role_id + secret_id for new host
  - [ ] Or create wrapped token for one-time bootstrap
  - [ ] Add host-specific policy to Vault (via terraform)
  - [ ] Store bootstrap credentials for cloud-init injection
- [ ] Update template2 for Vault authentication
  - [ ] Create Vault authentication module
  - [ ] Reads bootstrap credentials from cloud-init
  - [ ] Authenticates to Vault, retrieves permanent AppRole credentials
  - [ ] Stores role_id + secret_id locally for services to use
- [ ] Create NixOS Vault secrets module
  - [ ] Replacement for sops.secrets
  - [ ] Fetches secrets from Vault at nixos-rebuild/activation time
  - [ ] Or runtime secret fetching for services
  - [ ] Handle Vault token renewal
- [ ] Update bootstrap service
  - [ ] After authenticating to Vault, fetch any bootstrap secrets
  - [ ] Run nixos-rebuild with host configuration
  - [ ] Services automatically fetch their secrets from Vault
- [ ] Update terraform cloud-init
  - [ ] Inject Vault address and bootstrap credentials
  - [ ] Pass via cloud-init user-data or write_files
  - [ ] Credentials scoped to single use or short TTL
- [ ] Test complete flow
  - [ ] Run create-host to generate new host config
  - [ ] Deploy with terraform
  - [ ] Verify host bootstraps and authenticates to Vault
  - [ ] Verify services can fetch secrets
  - [ ] Confirm no manual steps required

**Bootstrap flow:**
```
1. terraform apply (deploys VM with cloud-init)
2. Cloud-init sets hostname + Vault bootstrap credentials
3. nixos-bootstrap.service runs:
   - Authenticates to Vault with bootstrap credentials
   - Retrieves permanent AppRole credentials
   - Stores locally for service use
   - Runs nixos-rebuild
4. Host services fetch secrets from Vault as needed
5. Done - no manual intervention
```

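The authentication step in the flow above boils down to one call to Vault's AppRole login endpoint (`POST /v1/auth/approle/login`), which returns a client token under `auth.client_token`. The sketch below shows that exchange using only the Python standard library. The helper names are hypothetical, and how the `role_id` and `secret_id` reach the host (cloud-init, wrapped token) is still an open design choice in this plan.

```python
import json
import urllib.request

# Address used elsewhere in this plan (Vault API on port 8200).
VAULT_ADDR = "https://vault.home.2rjus.net:8200"


def approle_login_request(role_id, secret_id, addr=VAULT_ADDR):
    """Build the HTTP request for Vault's AppRole login endpoint."""
    body = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode()
    return urllib.request.Request(
        f"{addr}/v1/auth/approle/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def client_token(login_response):
    """Extract the client token from a decoded login response."""
    return login_response["auth"]["client_token"]


def login(role_id, secret_id, opener=urllib.request.urlopen):
    """Authenticate and return a Vault token (performs a network call)."""
    with opener(approle_login_request(role_id, secret_id)) as resp:
        return client_token(json.load(resp))
```

The returned token would then be sent in the `X-Vault-Token` header on later requests, for example when reading a KV v2 secret from `secret/data/<path>`.
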
**Deliverable:** Fully automated secrets access from first boot, zero manual steps

---

### Phase 5: DNS Automation

**Goal:** Automatically generate DNS entries from host configurations

**Approach:** Leverage Nix to generate zone file entries from flake host configurations

Since most hosts use static IPs defined in their NixOS configurations, we can extract this information and automatically generate A records. This keeps DNS in sync with the actual host configs.

**Tasks:**
- [ ] Add optional CNAME field to host configurations
  - [ ] Add `networking.cnames = [ "alias1" "alias2" ]` or similar option
  - [ ] Document in host configuration template
- [ ] Create Nix function to extract DNS records from all hosts
  - [ ] Parse each host's `networking.hostName` and IP configuration
  - [ ] Collect any defined CNAMEs
  - [ ] Generate zone file fragment with A and CNAME records
- [ ] Integrate auto-generated records into zone files
  - [ ] Keep manual entries separate (for non-flake hosts/services)
  - [ ] Include generated fragment in main zone file
  - [ ] Add comments showing which records are auto-generated
- [ ] Update zone file serial number automatically
- [ ] Test zone file validity after generation
- [ ] Either:
  - [ ] Automatically trigger DNS server reload (Ansible)
  - [ ] Or document manual step: merge to master, run upgrade on ns1/ns2

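The plan calls for a Nix function here; the Python below is not that function, only a sketch of the output it would need to produce: a marked zone file fragment with A and CNAME records, plus a serial bump. The input shape (`{name: {"ip": ..., "cnames": [...]}}`), the function names, and the `YYYYMMDDNN` serial convention are assumptions.

```python
from datetime import date


def zone_fragment(hosts):
    """Render A and CNAME records for hosts given as {name: {"ip", "cnames"}}."""
    lines = ["; BEGIN auto-generated records (do not edit by hand)"]
    for name in sorted(hosts):
        host = hosts[name]
        lines.append(f"{name}\tIN\tA\t{host['ip']}")
        # CNAME targets are relative names, resolved against the zone origin.
        for alias in host.get("cnames", []):
            lines.append(f"{alias}\tIN\tCNAME\t{name}")
    lines.append("; END auto-generated records")
    return "\n".join(lines) + "\n"


def next_serial(current, today=None):
    """Return the next YYYYMMDDNN serial, always greater than `current`."""
    today = today or date.today()
    base = int(today.strftime("%Y%m%d")) * 100
    return max(base, current + 1)
```

Sorting host names keeps the fragment stable between runs, so the zone file only changes in version control when a record actually changes.
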
**Deliverable:** DNS A records and CNAMEs automatically generated from host configs

---

### Phase 6: Integration Script

**Goal:** Single command to create and deploy a new host

**Tasks:**
- [ ] Create `scripts/create-host.sh` master script that orchestrates:
  1. Prompts for: hostname, IP (or DHCP), CPU, memory, disk
  2. Validates inputs (IP not in use, hostname unique, etc.)
  3. Calls host config generator (Phase 2)
  4. Generates OpenTofu config (Phase 2)
  5. Handles secrets (Phase 4)
  6. Updates DNS (Phase 5)
  7. Commits all changes to git
  8. Runs `tofu apply` to deploy VM
  9. Waits for bootstrap to complete (Phase 3)
  10. Prints success message with IP and SSH command
- [ ] Add `--dry-run` flag to preview changes
- [ ] Add `--interactive` mode vs `--batch` mode
- [ ] Error handling and rollback on failures

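One way to approach the "error handling and rollback" task is to pair each orchestration step with an optional undo action and unwind completed steps in reverse order when a later step fails. The sketch below illustrates that pattern only. The planned script is a shell script, and the `run_pipeline` name and step tuple shape are hypothetical.

```python
def run_pipeline(steps):
    """Run (name, action, undo) steps in order; undo completed steps on failure.

    Returns the names of the steps that ran successfully.
    """
    done = []
    try:
        for name, action, undo in steps:
            action()
            done.append((name, undo))
    except Exception:
        # Roll back in reverse order, skipping steps without an undo action.
        for _name, undo in reversed(done):
            if undo is not None:
                undo()
        raise
    return [name for name, _undo in done]
```

For example, the undo for "generate config" could delete the generated host directory, while "commit changes" might have no undo at all and leave the commit for the user to review.
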
**Deliverable:** `./scripts/create-host.sh --hostname myhost --ip 10.69.13.50` creates a fully working host

---

### Phase 7: Testing & Documentation

**Status:** 🚧 In Progress (testing improvements completed)

**Testing Improvements Implemented (2025-02-01):**

The pipeline now supports efficient testing without polluting master branch:

**1. --force Flag for create-host**
- Re-run `create-host` to regenerate existing configurations
- Updates existing entries in flake.nix and terraform/vms.tf (no duplicates)
- Skip uniqueness validation checks
- Useful for iterating on configuration templates during testing

**2. Branch Support for Bootstrap**
- Bootstrap service reads `NIXOS_FLAKE_BRANCH` environment variable
- Defaults to `master` if not set
- Allows testing pipeline changes on feature branches
- Cloud-init passes branch via `/etc/environment`

**3. Cloud-init Disk for Branch Configuration**
- Terraform generates custom cloud-init snippets for test VMs
- Set `flake_branch` field in VM definition to use non-master branch
- Production VMs omit this field and use master (default)
- Files automatically uploaded to Proxmox via SSH

**Testing Workflow:**

```bash
# 1. Create test branch
git checkout -b test-pipeline

# 2. Generate or update host config
create-host --hostname testvm01 --ip 10.69.13.100/24

# 3. Edit terraform/vms.tf to add test VM with branch
# vms = {
#   "testvm01" = {
#     ip = "10.69.13.100/24"
#     flake_branch = "test-pipeline"  # Bootstrap from this branch
#   }
# }

# 4. Commit and push test branch
git add -A && git commit -m "test: add testvm01"
git push origin test-pipeline

# 5. Deploy VM
cd terraform && tofu apply

# 6. Watch bootstrap (VM fetches from test-pipeline branch)
ssh root@10.69.13.100
journalctl -fu nixos-bootstrap.service

# 7. Iterate: modify templates and regenerate with --force
cd .. && create-host --hostname testvm01 --ip 10.69.13.100/24 --force
git commit -am "test: update config" && git push

# Redeploy to test fresh bootstrap
# (single quotes keep the shell from interpreting the brackets and quotes)
cd terraform
tofu destroy -target='proxmox_vm_qemu.vm["testvm01"]' && tofu apply

# 8. Clean up when done: squash commits, merge to master, remove test VM
```

**Files:**
- `scripts/create-host/create_host.py` - Added --force parameter
- `scripts/create-host/manipulators.py` - Update vs insert logic
- `hosts/template2/bootstrap.nix` - Branch support via environment variable
- `terraform/vms.tf` - flake_branch field support
- `terraform/cloud-init.tf` - Custom cloud-init disk generation
- `terraform/variables.tf` - proxmox_host variable for SSH uploads

**Remaining Tasks:**
- [ ] Test full pipeline end-to-end on feature branch
- [ ] Update CLAUDE.md with testing workflow
- [ ] Add troubleshooting section
- [ ] Create examples for common scenarios (DHCP host, static IP host, etc.)

---

## Open Questions

1. **Bootstrap method:** Cloud-init runcmd vs Terraform provisioner vs Ansible?
2. **Secrets handling:** Pre-generate keys vs post-deployment injection?
3. **DNS automation:** Auto-commit or manual merge?
4. **Git workflow:** Auto-push changes or leave for user review?
5. **Template selection:** Single template2 or multiple templates for different host types?
6. **Networking:** Always DHCP initially, or support static IP from start?
7. **Error recovery:** What happens if bootstrap fails? Manual intervention or retry?

## Implementation Order

Recommended sequence:
1. Phase 1: Parameterize OpenTofu (foundation for testing)
2. Phase 3: Bootstrap mechanism (core automation)
3. Phase 2: Config generator (automate the boilerplate)
4. Phase 4: Secrets (solves biggest chicken-and-egg)
5. Phase 5: DNS (nice-to-have automation)
6. Phase 6: Integration script (ties it all together)
7. Phase 7: Testing & docs

## Success Criteria

When complete, creating a new host should:
- Take < 5 minutes of human time
- Require minimal user input (hostname, IP, basic specs)
- Result in a fully configured, secret-enabled, DNS-registered host
- Be reproducible and documented
- Handle common errors gracefully

---

## Notes

- Keep incremental commits at each phase
- Test each phase independently before moving to next
- Maintain backward compatibility with manual workflow
- Document any manual steps that can't be automated