Bare Metal Forgejo Actions Runner on nix-cache02
Goal
Add a second Forgejo Actions runner instance on nix-cache02 that executes jobs directly on the host (bare metal). This allows CI builds to populate the nix binary cache automatically, reducing reliance on manually triggered builds before deployments.
Motivation
Currently the workflow for updating a flake input (e.g. nixos-exporter) is:
- Update flake lock
- Push to master
- Manually trigger a build on nix-cache02 (or wait for the scheduled builder)
- Deploy to hosts
With a bare metal runner, repos like nixos-exporter can have CI workflows that run nix build, and those derivations automatically end up in the cache (served by harmonia). By the time hosts auto-upgrade, everything is already cached.
Design
Two Runner Instances
- actions1 (existing) — Container-based, available to all Forgejo repos. Unchanged.
- actions2 (new) — Host-based, restricted to trusted repos only via Forgejo runner scoping.
Trusted Repos
Repos that should be allowed to use the bare metal runner:
- torjus/nixos-servers
- torjus/nixos-exporter
- torjus/nixos (gunter/magicman configs)
- Other repos with nix builds that benefit from cache population (add as needed)
Restriction is configured in the Forgejo web UI when registering the runner — scope it to specific repos or the org.
Label Configuration
The new instance would use a host label:
labels = [ "native:host" ];
Workflow files in trusted repos would target this with runs-on: native.
Host Packages
The runner needs nix and basic tools available:
hostPackages = with pkgs; [
  bash
  coreutils
  curl
  gawk
  gitMinimal
  gnused
  nodejs
  wget
  nix
];
Security Analysis
What the runner CAN access
- Nix store — Can read and write derivations. This is the whole point; harmonia serves the store to all hosts.
- Network — Full network access during job execution.
- World-readable files — Standard for any process on the system.
What the runner CANNOT access
- Cache signing key — /run/secrets/cache-secret is mode 0400, root-owned. Harmonia signs derivations on serve, not on store write.
- Vault AppRole credentials — /var/lib/vault/approle/ is root-owned.
- Other vault secrets — All in /run/secrets/ with restrictive permissions.
Mitigations
- Trusted repos only — Forgejo runner scoping restricts which repos can submit jobs. Only repos we control should have access.
- DynamicUser — The runner uses systemd DynamicUser, so there is no persistent user account. The service is allocated an ephemeral UID each time it starts.
- Separate instance — Container-based jobs (untrusted repos) remain on actions1 and never get host access.
Accepted Risks
- A compromised trusted repo could inject bad derivations into the nix store/cache. This is an accepted risk since those repos already have deploy access to production hosts.
- Jobs can consume host resources (CPU, memory, disk). The runner.capacity setting limits concurrent jobs.
Implementation
1. NixOS Configuration
File: hosts/nix-cache02/actions-runner.nix
Add a second instance alongside the existing overrides:
{ pkgs, ... }:
{
  # ... existing actions1 overrides ...

  services.gitea-actions-runner.instances.actions2 = {
    enable = true;
    name = "nix-cache02-native";
    url = "https://code.t-juice.club";
    tokenFile = "/run/secrets/forgejo-runner-token-native";
    labels = [ "native:host" ];
    hostPackages = with pkgs; [
      bash coreutils curl gawk gitMinimal gnused nodejs wget nix
    ];
    settings = {
      runner.capacity = 4;
      cache = {
        enabled = true;
        dir = "/var/lib/gitea-runner/actions2/cache";
      };
    };
  };
}
2. Vault Secret
The native runner needs its own registration token (separate from actions1):
- Add hosts/nix-cache02/forgejo-runner-token-native to terraform/vault/secrets.tf
- Add forgejo_runner_token_native variable to terraform/vault/variables.tf
- Add vault secret config in actions-runner.nix pointing to the new path
3. Forgejo Setup
- Generate a new runner token in Forgejo, scoped to trusted repos only
- Store in Vault: bao kv put secret/hosts/nix-cache02/forgejo-runner-token-native token=<token>
- Set the tfvar and run tofu apply in terraform/vault/
4. Example Workflow
In a trusted repo (e.g. nixos-exporter):
name: Build
on: [push]
jobs:
  build:
    runs-on: native
    steps:
      - uses: actions/checkout@v4
      - run: nix build
Open Questions
- Should hostPackages include additional tools (e.g. cachix, nix-prefetch-*)?
- Should we set resource limits on the runner (systemd MemoryMax, CPUQuota)?
- Do we want a separate capacity for the native runner vs container runner, or is 4 fine for both?
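To make the resource-limit question concrete, here is a minimal sketch of how systemd limits might be added in hosts/nix-cache02/actions-runner.nix. It assumes the nixpkgs module names the unit gitea-runner-<instance> (so gitea-runner-actions2 here), which should be checked with systemctl before relying on it. The values are placeholders, not tuned for nix-cache02.

```nix
{ ... }:
{
  # Assumption: the actions2 instance runs as gitea-runner-actions2.service.
  systemd.services.gitea-runner-actions2.serviceConfig = {
    # Placeholder limits; pick values that fit the host.
    MemoryMax = "16G";
    CPUQuota = "400%";
  };
}
```

One caveat: because the runner is unprivileged, nix build most likely goes through nix-daemon. The builds would then run in the daemon's cgroup and would not be covered by these limits. Capping the builds themselves would probably mean limiting nix-daemon.service or using Nix settings such as max-jobs and cores.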