improve-bootstrap-visibility #29

Merged
torjus merged 11 commits from improve-bootstrap-visibility into master 2026-02-07 15:00:09 +00:00

Summary

This branch improves visibility into the VM bootstrap process, making it possible to monitor bootstrap status without console access.

Bootstrap Logging to Loki

  • Added log_to_loki function that sends structured log entries directly to Loki via curl during bootstrap
  • Logs include labels: job="bootstrap", host, branch, stage
  • Bootstrap stages tracked: starting, network_ok, vault_ok/vault_skip/vault_warn, building, success, failed
  • Fails silently if Loki is unreachable (won't break bootstrap)
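
The bullets above could look roughly like the following. This is a minimal sketch, not the function from the branch: `LOKI_URL`, `HOST_LABEL`, `BRANCH`, and the `build_loki_payload` helper are names assumed for illustration. It posts one log line to Loki's HTTP push endpoint and swallows any curl failure.

```shell
#!/usr/bin/env bash
# Sketch only. LOKI_URL, HOST_LABEL, BRANCH and build_loki_payload are
# illustrative assumptions, not names taken from the actual bootstrap script.
LOKI_URL="${LOKI_URL:-http://loki.example.internal:3100}"  # hypothetical endpoint
HOST_LABEL="${HOST_LABEL:-$(uname -n)}"
BRANCH="${BRANCH:-master}"

# Build the JSON body for Loki's push API: one stream, one log line.
# The message is not JSON-escaped here, so it must not contain quotes
# or backslashes.
build_loki_payload() {
  local stage="$1" message="$2" ts="$3"
  printf '{"streams":[{"stream":{"job":"bootstrap","host":"%s","branch":"%s","stage":"%s"},"values":[["%s","%s"]]}]}' \
    "$HOST_LABEL" "$BRANCH" "$stage" "$ts" "$message"
}

# Send one entry. Loki expects the timestamp in nanoseconds since the epoch.
log_to_loki() {
  local stage="$1" message="$2" payload
  payload="$(build_loki_payload "$stage" "$message" "$(date +%s)000000000")"
  # Short timeouts plus "|| true": an unreachable Loki must never fail bootstrap.
  curl -s --connect-timeout 2 --max-time 5 \
    -H "Content-Type: application/json" \
    -X POST --data "$payload" \
    "${LOKI_URL}/loki/api/v1/push" >/dev/null 2>&1 || true
}
```

A call such as `log_to_loki network_ok "Network is up"` at each stage would then produce entries that the label queries below can match.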

TTY Visibility

  • Bootstrap output now displays on tty1 via journal+console
  • Getty greeting updated to indicate bootstrap image status
  • Added ncurses so the clear command is available during bootstrap

Infrastructure Updates

  • Ansible playbook now updates the template name in terraform/variables.tf after deploying to Proxmox, and only when the name has changed
  • Template updated to nixos-25.11.20260203.e576e3c
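
The "update only if changed" behaviour can be illustrated outside Ansible. The sketch below is a shell stand-in for the playbook step, not the playbook itself; the `update_template_name` helper and the assumption that the template name appears as a quoted `nixos-...` string in the variables file are both hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the quoted nixos-* template name in a
# Terraform variables file, but only when it differs from the new name.
update_template_name() {
  local file="$1" new_name="$2" tmp
  # Already up to date: leave the file untouched.
  if grep -qF "\"${new_name}\"" "$file"; then
    echo "unchanged"
    return 0
  fi
  tmp="$(mktemp)"
  # Replace the first quoted nixos-* string on each line with the new name.
  sed -E "s/\"nixos-[^\"]*\"/\"${new_name}\"/" "$file" > "$tmp" && mv "$tmp" "$file"
  echo "updated"
}
```

Running it twice with the same name reports `updated` and then `unchanged`, which keeps repeated deploys from producing spurious diffs.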

CLAUDE.md Improvements

  • Document bootstrap log queries in Loki
  • Prefer nix develop -c <command> for devshell tools
  • Use tofu -chdir= instead of cd for OpenTofu commands

Query Examples

{job="bootstrap"}                      # All bootstrap logs
{job="bootstrap", host="testvm01"}     # Specific host
{job="bootstrap", stage="failed"}      # All failures
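
These selectors can also be run from a shell against Loki's HTTP query API. A minimal sketch, assuming a hypothetical `LOKI_URL` and two illustrative helper names that are not part of this branch:

```shell
#!/usr/bin/env bash
# Sketch only. LOKI_URL and both function names are assumptions.
LOKI_URL="${LOKI_URL:-http://loki.example.internal:3100}"  # hypothetical endpoint

# Build a LogQL stream selector for bootstrap logs.
# Both arguments are optional; pass "" to skip a label.
bootstrap_selector() {
  local host="$1" stage="$2"
  local sel='job="bootstrap"'
  if [ -n "$host" ]; then sel="${sel}, host=\"${host}\""; fi
  if [ -n "$stage" ]; then sel="${sel}, stage=\"${stage}\""; fi
  printf '{%s}\n' "$sel"
}

# Fetch matching entries as JSON from the query_range endpoint.
query_bootstrap_logs() {
  curl -sG "${LOKI_URL}/loki/api/v1/query_range" \
    --data-urlencode "query=$(bootstrap_selector "$1" "$2")" \
    --data-urlencode "limit=100"
}
```

For example, `query_bootstrap_logs testvm01 failed` would ask for failed bootstrap entries from testvm01.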
torjus added 11 commits 2026-02-07 15:00:03 +00:00
- Display bootstrap banner and live progress on tty1 instead of login prompt
- Add custom getty greeting on other ttys indicating this is a bootstrap image
- Disable getty on tty1 during bootstrap so output is visible

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a third play to build-and-deploy-template.yml that updates
terraform/variables.tf with the new template name after deploying
to Proxmox. Only updates if the template name has changed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
terraform: update template to nixos-25.11.20260203.e576e3c
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
0cf72ec191
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
template2: add ncurses for clear command in bootstrap
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
78e8d7a600
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
terraform: add flake_branch and token for testvm01
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
4ca3c8890f
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
template2: revert to journal+console output for bootstrap
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
11261c4636
TTY output was causing nixos-rebuild to fail. Keep the custom
greeting line to indicate bootstrap image, but use journal+console
for reliable logging.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds log_to_loki function that pushes structured log entries to Loki
at key bootstrap stages (starting, network_ok, vault_*, building,
success, failed). Enables querying bootstrap state via LogQL without
console access.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CLAUDE.md: prefer nix develop -c for devshell commands
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
a90d9c33d5
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CLAUDE.md: use tofu -chdir instead of cd
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
f19ba2f4b6
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CLAUDE.md: document bootstrap logs in Loki
Some checks failed
Run nix flake check / flake-check (push) Failing after 1s
Run nix flake check / flake-check (pull_request) Failing after 4s
eea000b337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
torjus merged commit 13d6d0ea3a into master 2026-02-07 15:00:09 +00:00
torjus deleted branch improve-bootstrap-visibility 2026-02-07 15:00:10 +00:00
Reference: torjus/nixos-servers#29