nixos-servers/docs/plans/completed/native-forgejo-runner.md
2026-03-12 23:25:01 +01:00


Native Nix Forgejo Runner on nix-cache02

Goal

Add a second Forgejo Actions runner instance on nix-cache02 that executes jobs directly on the host (no containers). This allows CI builds to populate the nix binary cache automatically, reducing reliance on manually triggered builds before deployments.

Motivation

  • Nix store caching: The container-based nix label runs in ephemeral Podman containers, losing all nix store paths between jobs. Native execution uses the host's persistent store, so builds reuse cached paths automatically.
  • Binary cache integration: nix-cache02 is the binary cache server (Harmonia). Paths built by CI are immediately available to all hosts.
  • Faster deploy cycle: Currently updating a flake input (e.g. nixos-exporter) requires pushing to master, then waiting for the scheduled builder or manually triggering a build. With a native runner, repos can have CI workflows that run nix build, and those derivations are in the cache by the time hosts auto-upgrade.
  • NixOS config builds: Enables future workflows that build nixosConfigurations.* from this repo, populating the cache as a side effect of CI.

Design

Two Runner Instances

  • actions1 (existing) — Container-based, global runner available to all Forgejo repos. Unchanged.
  • actions-native (new) — Host-based, registered as a user-level runner under the torjus Forgejo account, so only repos owned by that user can target it.

Trusted Repos

Repos that should be allowed to use the native runner:

  • torjus/nixos-servers
  • torjus/nixos-exporter
  • torjus/nixos (gunter/magicman configs)
  • Other repos with nix builds that benefit from cache population (add as needed)

Restriction is configured in the Forgejo web UI when registering the runner — scope it to the user or specific repos.

Label Configuration

labels = [ "native-nix:host" ];

Workflow files in trusted repos target this with runs-on: native-nix.

Host Packages

The runner needs nix and basic tools available on the host (nodejs is required by JavaScript-based actions such as actions/checkout):

hostPackages = with pkgs; [
  bash
  coreutils
  curl
  gawk
  git
  gnused
  nodejs
  wget
  nix
];

Security Analysis

What the runner CAN access

  • Nix store — Can read and write derivations. This is the whole point; Harmonia serves the store to all hosts.
  • Network — Full network access during job execution.
  • World-readable files — Standard for any process on the system.

What the runner CANNOT access

  • Cache signing key — /run/secrets/cache-secret is mode 0400, root-owned. Harmonia signs derivations on serve, not on store write.
  • Vault AppRole credentials — /var/lib/vault/approle/ is root-owned.
  • Other vault secrets — All in /run/secrets/ with restrictive permissions.
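
A quick spot-check on nix-cache02 can confirm these restrictions (commands assume the paths above; run as root, since the files are unreadable otherwise):

```shell
stat -c '%a %U %n' /run/secrets/cache-secret   # expect 400 root
stat -c '%a %U %n' /var/lib/vault/approle      # expect a root-owned directory
ls -l /run/secrets/                            # all entries should be restrictive
```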

Mitigations

  • User-level runner — Registered to the torjus user on Forgejo (not global), so only repos owned by that user can submit jobs.
  • DynamicUser — The runner uses systemd DynamicUser, so no persistent user account. Each invocation gets an ephemeral UID.
  • Nix sandbox — Nix builds already run sandboxed by default. Steps that are not nix builds (plain run: steps) execute as the runner's system user but have no special privileges.
  • Separate instance — Container-based jobs (untrusted repos) remain on actions1 and never get host access.

Accepted Risks

  • A compromised trusted repo could inject bad derivations into the nix store/cache. This is an accepted risk since those repos already have deploy access to production hosts.
  • Jobs can consume host resources (CPU, memory, disk). The runner.capacity setting limits concurrent jobs.

Implementation

1. Register runner on Forgejo and store token in Vault

  • In Forgejo web UI: go to user settings > Actions > Runners, create a new runner registration token.
  • Store the token in Vault via Terraform.

terraform/vault/variables.tf — add variable:

variable "forgejo_native_runner_token" {
  description = "Forgejo Actions runner token for native nix runner on nix-cache02"
  type        = string
  default     = "PLACEHOLDER"
  sensitive   = true
}

terraform/vault/secrets.tf — add secret:

"hosts/nix-cache02/forgejo-native-runner-token" = {
  auto_generate = false
  data          = { token = var.forgejo_native_runner_token }
}
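
Step 3 below sets the real token value in terraform.tfvars; a minimal sketch of that entry (the value shown is a placeholder, not a real token):

```hcl
# terraform/vault/terraform.tfvars (kept out of version control):
forgejo_native_runner_token = "REGISTRATION-TOKEN-FROM-FORGEJO-UI"
```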

2. Add NixOS configuration for native runner instance

Note: nix-cache02 already has an AppRole with access to secret/data/hosts/nix-cache02/* (defined in terraform/vault/hosts-generated.tf), so no approle changes are needed.

File: hosts/nix-cache02/actions-runner.nix

Add vault secret and runner instance alongside the existing overrides:

# Fetch native runner token from Vault
vault.secrets.forgejo-native-runner-token = {
  secretPath = "hosts/nix-cache02/forgejo-native-runner-token";
  extractKey = "token";
  mode = "0444";
  services = [ "gitea-runner-actions-native" ];
};

# Native nix runner instance
services.gitea-actions-runner.instances.actions-native = {
  enable = true;
  name = "${config.networking.hostName}-native";
  url = "https://code.t-juice.club";
  tokenFile = "/run/secrets/forgejo-native-runner-token";
  labels = [ "native-nix:host" ];
  hostPackages = with pkgs; [
    bash coreutils curl gawk git gnused nodejs wget nix
  ];
  settings = {
    runner.capacity = 4;
    cache = {
      enabled = true;
      dir = "/var/lib/gitea-runner/actions-native/cache";
    };
  };
};

3. Build and deploy

  1. Create feature branch
  2. Apply Terraform changes (variables + secrets; no approle changes needed, per the note above)
  3. Set the actual token value in terraform.tfvars
  4. Run tofu apply in terraform/vault/
  5. Build the NixOS configuration: nix build .#nixosConfigurations.nix-cache02.config.system.build.toplevel
  6. Deploy to nix-cache02
  7. Verify the native runner appears as online in Forgejo UI
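
On the host, the runner's systemd unit (named gitea-runner-actions-native, matching the services entry in the vault secret above) can be checked directly:

```shell
systemctl status gitea-runner-actions-native.service
journalctl -u gitea-runner-actions-native.service -n 50
```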

4. Test with a workflow

In a trusted repo (e.g. nixos-exporter):

name: Build
on: [push]
jobs:
  build:
    runs-on: native-nix
    steps:
      - uses: actions/checkout@v4
      - run: nix build

Future Work

  • NixOS config CI: Workflow that builds all nixosConfigurations on push to master, populating the binary cache.
  • Nix store GC policy: CI builds will accumulate store paths. Since this host is the binary cache, GC needs to be conservative — only delete paths not referenced by current system configurations. Defer to a follow-up.
  • Resource limits: Consider systemd MemoryMax/CPUQuota on the native runner if resource contention becomes an issue.
  • Additional host packages: Evaluate whether tools like cachix or nix-prefetch-* should be added.
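
For the resource-limits item, a minimal sketch of what a follow-up could look like (unit name matches the services entry in the vault secret above; the limit values are illustrative, not tuned):

```nix
# Illustrative resource limits for the native runner unit; values are guesses.
systemd.services.gitea-runner-actions-native.serviceConfig = {
  MemoryMax = "8G";   # hard cap; jobs exceeding it are OOM-killed
  CPUQuota = "400%";  # at most 4 cores' worth of CPU time
};
```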

Open Questions

  • Should hostPackages include additional tools beyond the basics listed above?
  • Do we want a separate capacity for the native runner vs container runner, or is 4 fine for both?