# Native Nix Forgejo Runner on nix-cache02

## Goal

Add a second Forgejo Actions runner instance on nix-cache02 that executes jobs directly on the host (no containers). This allows CI builds to populate the nix binary cache automatically, reducing reliance on manually triggered builds before deployments.

## Motivation

- **Nix store caching**: The container-based `nix` label runs in ephemeral Podman containers, losing all nix store paths between jobs. Native execution uses the host's persistent store, so builds reuse cached paths automatically.
- **Binary cache integration**: nix-cache02 *is* the binary cache server (Harmonia). Paths built by CI are immediately available to all hosts.
- **Faster deploy cycle**: Currently, updating a flake input (e.g. nixos-exporter) requires pushing to master, then waiting for the scheduled builder or manually triggering a build. With a native runner, repos can have CI workflows that run `nix build`, so those derivations are in the cache by the time hosts auto-upgrade.
- **NixOS config builds**: Enables future workflows that build `nixosConfigurations.*` from this repo, populating the cache as a side effect of CI.

## Design

### Two Runner Instances

- **actions1** (existing) — Container-based, global runner available to all Forgejo repos. Unchanged.
- **actions-native** (new) — Host-based, registered as a user-level runner under the `torjus` Forgejo account, so only repos owned by that user can target it.

### Trusted Repos

Repos that should be allowed to use the native runner:

- `torjus/nixos-servers`
- `torjus/nixos-exporter`
- `torjus/nixos` (gunter/magicman configs)
- Other repos with nix builds that benefit from cache population (add as needed)

The restriction is configured in the Forgejo web UI when registering the runner — scope it to the user or to specific repos.

### Label Configuration

```nix
labels = [ "native-nix:host" ];
```

Workflow files in trusted repos target this with `runs-on: native-nix`.
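For illustration, a minimal job declaration targeting that label might look like this (workflow and job names are placeholders; the full test workflow appears under Implementation below):

```yaml
# Hypothetical fragment of a workflow in a trusted repo:
# the runs-on value must match the label configured above.
jobs:
  build:
    runs-on: native-nix
```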
### Host Packages

The runner needs nix and basic tools available on the host:

```nix
hostPackages = with pkgs; [
  bash
  coreutils
  curl
  gawk
  git
  gnused
  nodejs
  wget
  nix
];
```

## Security Analysis

### What the runner CAN access

- **Nix store** — Can read and write derivations. This is the whole point; Harmonia serves the store to all hosts.
- **Network** — Full network access during job execution.
- **World-readable files** — Standard for any process on the system.

### What the runner CANNOT access

- **Cache signing key** — `/run/secrets/cache-secret` is mode `0400`, root-owned. Harmonia signs derivations on serve, not on store write.
- **Vault AppRole credentials** — `/var/lib/vault/approle/` is root-owned.
- **Other vault secrets** — All in `/run/secrets/` with restrictive permissions.

### Mitigations

- **User-level runner** — Registered to the `torjus` user on Forgejo (not global), so only repos owned by that user can submit jobs.
- **DynamicUser** — The runner uses systemd DynamicUser, so there is no persistent user account; each invocation gets an ephemeral UID.
- **Nix sandbox** — Nix builds already run sandboxed by default. Non-nix `run:` steps execute as the runner's system user but have no special privileges.
- **Separate instance** — Container-based jobs (untrusted repos) remain on actions1 and never get host access.

### Accepted Risks

- A compromised trusted repo could inject bad derivations into the nix store/cache. This is an accepted risk, since those repos already have deploy access to production hosts.
- Jobs can consume host resources (CPU, memory, disk). The `runner.capacity` setting limits concurrent jobs.

## Implementation

### 1. Register runner on Forgejo and store token in Vault

- In the Forgejo web UI, go to user settings > Actions > Runners and create a new runner registration token.
- Store the token in Vault via Terraform.
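The token has to reach Terraform without being committed. One hedged way, assuming `terraform.tfvars` is git-ignored and using the variable name from the Terraform snippets in this plan, is to append it locally:

```shell
# Sketch: keep the real token out of version control by writing it to
# terraform.tfvars (assumed git-ignored; variable name from variables.tf).
# "REPLACE_WITH_REAL_TOKEN" is a placeholder, not a real token.
mkdir -p terraform/vault
cat >> terraform/vault/terraform.tfvars <<'EOF'
forgejo_native_runner_token = "REPLACE_WITH_REAL_TOKEN"
EOF
grep forgejo_native_runner_token terraform/vault/terraform.tfvars
```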
**terraform/vault/variables.tf** — add variable:

```hcl
variable "forgejo_native_runner_token" {
  description = "Forgejo Actions runner token for native nix runner on nix-cache02"
  type        = string
  default     = "PLACEHOLDER"
  sensitive   = true
}
```

**terraform/vault/secrets.tf** — add secret:

```hcl
"hosts/nix-cache02/forgejo-native-runner-token" = {
  auto_generate = false
  data = {
    token = var.forgejo_native_runner_token
  }
}
```

### 2. Add NixOS configuration for native runner instance

Note: nix-cache02 already has an AppRole with access to `secret/data/hosts/nix-cache02/*` (defined in `terraform/vault/hosts-generated.tf`), so no AppRole changes are needed.

**File:** `hosts/nix-cache02/actions-runner.nix`

Add the vault secret and runner instance alongside the existing overrides:

```nix
# Fetch native runner token from Vault
vault.secrets.forgejo-native-runner-token = {
  secretPath = "hosts/nix-cache02/forgejo-native-runner-token";
  extractKey = "token";
  mode = "0444";
  services = [ "gitea-runner-actions-native" ];
};

# Native nix runner instance
services.gitea-actions-runner.instances.actions-native = {
  enable = true;
  name = "${config.networking.hostName}-native";
  url = "https://code.t-juice.club";
  tokenFile = "/run/secrets/forgejo-native-runner-token";
  labels = [ "native-nix:host" ];
  hostPackages = with pkgs; [
    bash
    coreutils
    curl
    gawk
    git
    gnused
    nodejs
    wget
    nix
  ];
  settings = {
    runner.capacity = 4;
    cache = {
      enabled = true;
      dir = "/var/lib/gitea-runner/actions-native/cache";
    };
  };
};
```

### 3. Build and deploy

1. Create a feature branch
2. Add the Terraform changes (variable + secret)
3. Set the actual token value in `terraform.tfvars`
4. Run `tofu apply` in `terraform/vault/`
5. Build the NixOS configuration: `nix build .#nixosConfigurations.nix-cache02.config.system.build.toplevel`
6. Deploy to nix-cache02
7. Verify the native runner appears as online in the Forgejo UI

### 4. Test with a workflow

In a trusted repo (e.g.
nixos-exporter):

```yaml
name: Build
on: [push]

jobs:
  build:
    runs-on: native-nix
    steps:
      - uses: actions/checkout@v4
      - run: nix build
```

## Future Work

- **NixOS config CI**: Workflow that builds all `nixosConfigurations` on push to master, populating the binary cache.
- **Nix store GC policy**: CI builds will accumulate store paths. Since this host is the binary cache, GC needs to be conservative — only delete paths not referenced by current system configurations. Defer to a follow-up.
- **Resource limits**: Consider systemd MemoryMax/CPUQuota on the native runner if resource contention becomes an issue.
- **Additional host packages**: Evaluate whether tools like `cachix` or `nix-prefetch-*` should be added.

## Open Questions

- Should `hostPackages` include additional tools beyond the basics listed above?
- Do we want a separate capacity for the native runner vs container runner, or is 4 fine for both?
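If the resource-limit idea under Future Work is pursued, a minimal sketch of per-unit systemd limits might look like this. The unit name `gitea-runner-actions-native` and the specific values are assumptions, not verified against the deployed system:

```nix
# Sketch (unverified unit name): cap the native runner's resource usage.
# Tune the values to what the host can spare alongside Harmonia.
systemd.services.gitea-runner-actions-native.serviceConfig = {
  MemoryMax = "8G";    # hard memory cap for the runner and its job steps
  CPUQuota = "400%";   # at most four CPUs' worth of time
};
```

Note that `nix build` work delegated to the nix-daemon would likely not be covered by limits on the runner's unit, so this mainly constrains non-nix `run:` steps.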