Bare Metal Forgejo Actions Runner on nix-cache02

Goal

Add a second Forgejo Actions runner instance on nix-cache02 that executes jobs directly on the host (bare metal). This allows CI builds to populate the nix binary cache automatically, reducing reliance on manually triggered builds before deployments.

Motivation

Currently the workflow for updating a flake input (e.g. nixos-exporter) is:

  1. Update flake lock
  2. Push to master
  3. Manually trigger a build on nix-cache02 (or wait for the scheduled builder)
  4. Deploy to hosts

With a bare metal runner, repos like nixos-exporter can have CI workflows that run nix build. The resulting store paths land in the same local store that harmonia serves, so they end up in the cache automatically. By the time hosts auto-upgrade, everything is already cached.

Design

Two Runner Instances

  • actions1 (existing) — Container-based, available to all Forgejo repos. Unchanged.
  • actions2 (new) — Host-based, restricted to trusted repos only via Forgejo runner scoping.

Trusted Repos

Repos that should be allowed to use the bare metal runner:

  • torjus/nixos-servers
  • torjus/nixos-exporter
  • torjus/nixos (gunter/magicman configs)
  • Other repos with nix builds that benefit from cache population (add as needed)

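Restriction is set by where the runner's registration token is created in the Forgejo web UI. A repository-level token limits the runner to that one repo. A user- or organization-level token makes it available to every repo under that owner.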

Label Configuration

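The new instance would use a host label:

```nix
labels = [ "native:host" ];
```

The part before the colon (native) is the label name that workflows match. host is the execution scheme: job steps run directly on nix-cache02 instead of in a container. Workflow files in trusted repos would target this with runs-on: native.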

Host Packages

The runner needs nix and basic tools available. nodejs is included because JavaScript-based actions such as actions/checkout need it:

```nix
hostPackages = with pkgs; [
  bash
  coreutils
  curl
  gawk
  gitMinimal
  gnused
  nodejs
  wget
  nix
];
```

Security Analysis

What the runner CAN access

  • Nix store — Can build derivations and add store paths through the nix daemon. This is the whole point; harmonia serves the store to all hosts.
  • Network — Full network access during job execution.
  • World-readable files — Standard for any process on the system.
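The store access above is only safe if the runner is an ordinary, untrusted Nix client. A hypothetical guard for that is an assertion that the runner's user never lands in nix.settings.trusted-users; this is not part of the current config. An untrusted client can still build derivations through the daemon and add content-addressed paths. It cannot import input-addressed paths that lack a signature from a key in trusted-public-keys. That means a job cannot plant a forged output under another derivation's store path for harmonia to sign and serve. The user name gitea-runner is an assumption based on the NixOS module's service definition; check the actual unit before relying on it.

```nix
{ config, lib, ... }:
let
  trusted = config.nix.settings.trusted-users;
in
{
  # Hypothetical guard, not part of the current config.
  # Assumes the runner unit runs as User = "gitea-runner" (with DynamicUser).
  assertions = [
    {
      # "*" would make every local user, including the runner, trusted.
      assertion = !(lib.elem "gitea-runner" trusted || lib.elem "*" trusted);
      message = "gitea-runner must not be a Nix trusted user on nix-cache02";
    }
  ];
}
```

With the assertion in place, a later change that adds the runner (or "*") to trusted-users fails at evaluation time instead of silently widening what CI jobs can put in the cache.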

What the runner CANNOT access

  • Cache signing key: /run/secrets/cache-secret is mode 0400 and root-owned. Harmonia signs store paths when serving them, not when they are written to the store, so the runner never needs the key.
  • Vault AppRole credentials: /var/lib/vault/approle/ is root-owned.
  • Other vault secrets — All in /run/secrets/ with restrictive permissions.

Mitigations

  • Trusted repos only — Forgejo runner scoping restricts which repos can submit jobs. Only repos we control should have access.
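  • DynamicUser — The runner uses systemd DynamicUser, so there is no persistent user account. A UID is allocated when the service starts and released when it stops. All jobs on this runner share that UID, so they are not isolated from each other.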
  • Separate instance — Container-based jobs (untrusted repos) remain on actions1 and never get host access.

Accepted Risks

  • A compromised trusted repo could inject bad derivations into the nix store/cache. This is an accepted risk since those repos already have deploy access to production hosts.
  • Jobs can consume host resources (CPU, memory, disk). The runner.capacity setting limits how many jobs run concurrently, but not how much each job consumes.

Implementation

1. NixOS Configuration

File: hosts/nix-cache02/actions-runner.nix

Add a second instance alongside the existing overrides:

```nix
{ pkgs, ... }:
{
  # ... existing actions1 overrides ...

  services.gitea-actions-runner.instances.actions2 = {
    enable = true;
    name = "nix-cache02-native";
    url = "https://code.t-juice.club";
    tokenFile = "/run/secrets/forgejo-runner-token-native";
    labels = [ "native:host" ];
    hostPackages = with pkgs; [
      bash coreutils curl gawk gitMinimal gnused nodejs wget nix
    ];
    settings = {
      runner.capacity = 4;
      cache = {
        enabled = true;
        dir = "/var/lib/gitea-runner/actions2/cache";
      };
    };
  };
}
```

2. Vault Secret

The native runner needs its own registration token (separate from actions1):

  • Add hosts/nix-cache02/forgejo-runner-token-native to terraform/vault/secrets.tf
  • Add forgejo_runner_token_native variable to terraform/vault/variables.tf
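  • Add vault secret config in actions-runner.nix pointing to the new path. The module reads tokenFile as an environment file, so the rendered file must contain TOKEN=<token>.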

3. Forgejo Setup

  1. Generate a new runner token in Forgejo, scoped to trusted repos only
  2. Store in Vault: bao kv put secret/hosts/nix-cache02/forgejo-runner-token-native token=<token>
  3. Set the tfvar and run tofu apply in terraform/vault/

4. Example Workflow

In a trusted repo (e.g. nixos-exporter):

```yaml
name: Build
on: [push]
jobs:
  build:
    runs-on: native
    steps:
      - uses: actions/checkout@v4
      - run: nix build
```
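With no argument, nix build only builds the flake's packages.<system>.default. In nixos-servers, the goal is to have the host closures cached before auto-upgrade, and the default package alone would not do that. One option is a hypothetical aggregate package that links every nixosConfigurations toplevel. The sketch below assumes all hosts are x86_64-linux and that the flake takes nixpkgs as an input. The attribute name all-hosts is made up for illustration.

```nix
{
  # ... existing inputs (assumed to include nixpkgs) ...

  outputs = { self, nixpkgs, ... }:
    let
      pkgs = nixpkgs.legacyPackages.x86_64-linux;
    in
    {
      # ... existing nixosConfigurations and other outputs ...

      # Hypothetical: one derivation whose closure contains every host's
      # system toplevel, so a single build populates the cache for all hosts.
      packages.x86_64-linux.all-hosts = pkgs.linkFarm "all-hosts" (
        nixpkgs.lib.mapAttrsToList (name: host: {
          inherit name;
          path = host.config.system.build.toplevel;
        }) self.nixosConfigurations
      );
    };
}
```

The workflow's run step would then be nix build .#all-hosts. If the flake already defines packages.x86_64-linux, the new attribute may need to be merged into that existing set.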

Open Questions

  • Should hostPackages include additional tools (e.g. cachix, nix-prefetch-*)?
  • Should we set resource limits on the runner (systemd MemoryMax, CPUQuota)?
  • Do we want a separate capacity for the native runner vs container runner, or is 4 fine for both?
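If we decide to add resource limits, this sketch shows roughly what they could look like. It assumes the NixOS module names the unit gitea-runner-actions2 after the instance attribute, and all values are placeholders. One caveat shapes the design. nix build evaluates in the job's own process but hands the actual builds to nix-daemon. So systemd limits on the runner unit only cap checkout, shell steps and evaluation. Build load has to be bounded through nix.settings, which is host-wide and also affects the scheduled builder. If max-jobs or cores are already set elsewhere, adjust them there instead.

```nix
{ ... }:
{
  # Placeholder values; size them to nix-cache02's hardware.
  # Applies to the runner and the processes its job steps spawn
  # (checkout, shell steps, Nix evaluation), not to nix-daemon builds.
  systemd.services.gitea-runner-actions2.serviceConfig = {
    MemoryMax = "8G";
    CPUQuota = "400%"; # at most four CPUs' worth of time
  };

  # Builds run under nix-daemon, so they are bounded here instead.
  # These settings are host-wide, not specific to CI jobs.
  nix.settings = {
    max-jobs = 4; # concurrent derivation builds
    cores = 4; # cores each build may use (NIX_BUILD_CORES)
  };
}
```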