# Bare Metal Forgejo Actions Runner on nix-cache02

## Goal

Add a second Forgejo Actions runner instance on nix-cache02 that executes jobs directly on the host (bare metal). This allows CI builds to populate the nix binary cache automatically, reducing reliance on manually triggered builds before deployments.

## Motivation

Currently the workflow for updating a flake input (e.g. nixos-exporter) is:

1. Update flake lock
2. Push to master
3. Manually trigger a build on nix-cache02 (or wait for the scheduled builder)
4. Deploy to hosts

With a bare metal runner, repos like nixos-exporter can have CI workflows that run `nix build`, and those derivations automatically end up in the cache (served by harmonia). By the time hosts auto-upgrade, everything is already cached.

## Design

### Two Runner Instances

- **actions1** (existing): Container-based, available to all Forgejo repos. Unchanged.
- **actions2** (new): Host-based, restricted to trusted repos only via Forgejo runner scoping.

### Trusted Repos

Repos that should be allowed to use the bare metal runner:

- `torjus/nixos-servers`
- `torjus/nixos-exporter`
- `torjus/nixos` (gunter/magicman configs)
- Other repos with nix builds that benefit from cache population (add as needed)

Restriction is configured in the Forgejo web UI when registering the runner: scope it to specific repos or the org.

### Label Configuration

The new instance would use a host label:

```nix
labels = [ "native:host" ];
```

Workflow files in trusted repos would target this with `runs-on: native`.

### Host Packages

The runner needs nix and basic tools available:

```nix
hostPackages = with pkgs; [
  bash
  coreutils
  curl
  gawk
  gitMinimal
  gnused
  nodejs
  wget
  nix
];
```

## Security Analysis

### What the runner CAN access

- **Nix store**: Can read and write derivations. This is the whole point; harmonia serves the store to all hosts.
- **Network**: Full network access during job execution.
- **World-readable files**: Standard for any process on the system.

### What the runner CANNOT access

- **Cache signing key**: `/run/secrets/cache-secret` is mode `0400` root-owned. Harmonia signs derivations on serve, not on store write.
- **Vault AppRole credentials**: `/var/lib/vault/approle/` is root-owned.
- **Other vault secrets**: All in `/run/secrets/` with restrictive permissions.

### Mitigations

- **Trusted repos only**: Forgejo runner scoping restricts which repos can submit jobs. Only repos we control should have access.
- **DynamicUser**: The runner uses systemd DynamicUser, so no persistent user account. Each invocation gets an ephemeral UID.
- **Separate instance**: Container-based jobs (untrusted repos) remain on actions1 and never get host access.

### Accepted Risks

- A compromised trusted repo could inject bad derivations into the nix store/cache. This is an accepted risk since those repos already have deploy access to production hosts.
- Jobs can consume host resources (CPU, memory, disk). The `runner.capacity` setting limits concurrent jobs.

## Implementation

### 1. NixOS Configuration

**File:** `hosts/nix-cache02/actions-runner.nix`

Add a second instance alongside the existing overrides:

```nix
{ pkgs, ... }:
{
  # ... existing actions1 overrides ...

  services.gitea-actions-runner.instances.actions2 = {
    enable = true;
    name = "nix-cache02-native";
    url = "https://code.t-juice.club";
    tokenFile = "/run/secrets/forgejo-runner-token-native";
    labels = [ "native:host" ];
    hostPackages = with pkgs; [
      bash
      coreutils
      curl
      gawk
      gitMinimal
      gnused
      nodejs
      wget
      nix
    ];
    settings = {
      runner.capacity = 4;
      cache = {
        enabled = true;
        dir = "/var/lib/gitea-runner/actions2/cache";
      };
    };
  };
}
```

### 2. Vault Secret

The native runner needs its own registration token (separate from actions1):

- Add `hosts/nix-cache02/forgejo-runner-token-native` to `terraform/vault/secrets.tf`
- Add `forgejo_runner_token_native` variable to `terraform/vault/variables.tf`
- Add vault secret config in `actions-runner.nix` pointing to the new path

### 3. Forgejo Setup

1. Generate a new runner token in Forgejo, scoped to trusted repos only
2. Store in Vault: `bao kv put secret/hosts/nix-cache02/forgejo-runner-token-native token=`
3. Set the tfvar and run `tofu apply` in `terraform/vault/`

### 4. Example Workflow

In a trusted repo (e.g. nixos-exporter):

```yaml
name: Build
on: [push]
jobs:
  build:
    runs-on: native
    steps:
      - uses: actions/checkout@v4
      - run: nix build
```

## Open Questions

- Should `hostPackages` include additional tools (e.g. `cachix`, `nix-prefetch-*`)?
- Should we set resource limits on the runner (systemd MemoryMax, CPUQuota)?
- Do we want a separate capacity for the native runner vs container runner, or is 4 fine for both?
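
If we decide to set resource limits, a minimal sketch could look like the following. The unit name `gitea-runner-actions2` is an assumption based on how the nixpkgs module appears to name its per-instance services, and should be confirmed with `systemctl list-units 'gitea-runner-*'` on the host. The `16G` and `400%` values are placeholders, not measured requirements.

```nix
{ ... }:
{
  # Assumed unit name: "gitea-runner-<instance>". Verify on the host before use.
  systemd.services."gitea-runner-actions2".serviceConfig = {
    # Placeholder values; tune to the actual hardware on nix-cache02.
    MemoryMax = "16G";  # hard memory ceiling for the runner and its job steps
    CPUQuota = "400%";  # roughly four cores' worth of CPU time
  };
}
```

One caveat with this approach: on a multi-user Nix install, `nix build` hands the actual build off to `nix-daemon`, which runs in its own unit. Limits on the runner service would therefore constrain the job steps themselves but probably not the builds they trigger, so any build-side limits would need to be set on the daemon or through Nix settings such as `max-jobs` and `cores`.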