Enabling Plan Runner Groups

Alpha

Plan Runner Groups are an alpha feature that places requirements on cluster nodes for running OCI-based actions. Many default Kubernetes clusters work without additional changes, but some require enabling user namespaces on the relevant nodes before OCI-based actions in Plans can run. If you'd like more information, please reach out on our community Slack or at info@signadot.com.

Overview

Plan Runner Groups (PRGs) are turned on per cluster, from the Signadot dashboard. There's nothing else to configure: the runner ships with sensible defaults, and Signadot adapts the underlying deployment to your environment as needed.

Enable from the dashboard

Open the Managed Runners page in your Signadot dashboard and find your cluster in the table. The Plan Runners column tells you where things stand:

  • N/N runner ready means the Plan Runner Group is up and serving plans on that cluster.
  • Unsupported operator means the Signadot Operator on that cluster is older than the minimum version that supports Plan Runner Groups. Upgrade the operator and the status will clear on its own.

[Image: Managed Runners page showing per-cluster Plan Runners status]

Playground Clusters

Plan Runner Groups aren't available on Playground Clusters. You'll need a Kubernetes cluster with the Signadot Operator installed.

The Plan Runner Actions panel on the same page lets you choose which actions are available to plans running on your clusters.

If the operator is current but the cluster still won't go ready, it's almost always a node-level prerequisite. See Node requirement below.

Node requirement

Most current Kubernetes node images satisfy the PRG node prerequisite out of the box. A few require a one-line node-level adjustment (a sysctl, AMI userdata, or an LSM policy update); those environments are listed under Setting up support for OCI based actions.

The requirement itself is that worker nodes permit unprivileged user-namespace creation by non-root, no-capability processes. Each plan step runs inside a rootless sandbox on the runner pod, and creating that sandbox requires an unprivileged user namespace. If the prerequisite is not met, OCI-based plan steps fail at start with a clear "operation not permitted" error rather than degrading silently.
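
In concrete terms, the gate is whether an unprivileged process can call unshare(CLONE_NEWUSER). If you have a shell on a node (or on a host with the same kernel and LSM policy), a quick sketch of the check, assuming util-linux's unshare is available:

# As a regular, non-root user:
unshare -U echo OK

This prints OK when the prerequisite is met, and "unshare: unshare failed: Operation not permitted" when a kernel- or LSM-level gate denies it. The probe pod at the end of this page performs the same check from inside a pod.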

Setting up support for OCI based actions

The simplest path is to enable the PRG and run a plan with an OCI-based action. If the node pool meets the prerequisite, the step runs as expected; if not, the step fails at start with an "operation not permitted" error. The sub-sections below cover the environments where this is known to occur and the matching one-line fix. To verify a node without running a real plan (for example, when the cluster admin wants to confirm a fix landed), see How to verify a node with a probe pod at the end.

EKS with Bottlerocket

Bottlerocket nodes default user.max_user_namespaces to 0, which prevents unprivileged user-namespace creation. Raise the limit by adding the following to the launch template userdata for the node group running the PRG:

[settings.kernel.sysctl]
'user.max_user_namespaces' = '15000'

The quotes around the dotted key are required. Without them, Bottlerocket's settings API treats user.max_user_namespaces as a nested table and rejects the configuration with "invalid type: map, expected a string".
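
If the node group is managed with eksctl, the same setting can be expressed in the ClusterConfig rather than raw userdata. A minimal sketch, assuming a managed node group with amiFamily: Bottlerocket (the cluster, region, and node group names are illustrative):

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster          # illustrative
  region: us-east-1         # illustrative
managedNodeGroups:
  - name: prg-nodes         # illustrative
    amiFamily: Bottlerocket
    bottlerocket:
      settings:
        kernel:
          sysctl:
            "user.max_user_namespaces": "15000"

eksctl merges the bottlerocket.settings block into the node's TOML userdata, so this is equivalent to the launch-template snippet above.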

OpenShift

OpenShift clusters typically need two adjustments. Both are familiar OpenShift node-customization steps and can be rolled out via the standard MachineConfig workflow (sketches of both follow the list):

  • Raise user.max_user_namespaces on worker nodes (some RHCOS releases ship with it set to 0).
  • Ensure the SELinux policy for container_t includes the userns_create permission. The cleanest path is to update container-selinux to ≥ 2.232.0; alternatively, ship a local .te policy module that grants userns_create to container_t.
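
As an illustration of the first adjustment, a MachineConfig that drops a sysctl.d file onto worker nodes could look like the following (the object name and file path are illustrative; match the ignition version to what your cluster supports):

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-userns-sysctl          # illustrative name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/sysctl.d/99-userns.conf
          mode: 0644
          overwrite: true
          contents:
            # URL-encoded form of "user.max_user_namespaces=15000\n"
            source: data:,user.max_user_namespaces%3D15000%0A

For the second adjustment, if updating container-selinux is not practical, a local policy module along these lines grants the one missing permission (compile and install with the usual checkmodule / semodule_package / semodule flow):

module prg_userns 1.0;

require {
    type container_t;
    class user_namespace create;
}

# Allow container_t processes to create user namespaces
allow container_t container_t:user_namespace create;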

Local development (Minikube, kind, Docker Desktop)

Most local Kubernetes setups satisfy the prerequisite without intervention:

  • Minikube --driver=kvm2 or --driver=virtualbox: works out of the box.
  • Minikube --driver=docker on macOS, or Docker Desktop's built-in Kubernetes on macOS: works out of the box in our testing.
  • kind: inherits the host kernel; behaves like Minikube --driver=docker on the same host.

The one common exception is using the Docker driver (Minikube or kind) on a recent Ubuntu host (23.10+), where AppArmor restricts unprivileged user-namespace creation. If you hit the "operation not permitted" error in that case, run on the host:

sudo sysctl kernel.apparmor_restrict_unprivileged_userns=0
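
That setting does not survive a reboot. To make it persistent, one option is a sysctl.d drop-in (the file name is illustrative):

echo 'kernel.apparmor_restrict_unprivileged_userns=0' | sudo tee /etc/sysctl.d/99-userns.conf
sudo sysctl --system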

(Local clusters all inherit the kernel of the host they run on, so the host's user-namespace policy is what determines pass or fail.)

Considerations

User namespace support is a standard primitive on modern Linux kernels and a foundation of rootless container security: it lets containers run as unprivileged processes without relying on extra capabilities. Some hardened node images disable it by default under a "disabled-if-not-used" posture; enabling it for the PRG node pool is a security-positive change, not an expansion of attack surface. It is also where Kubernetes itself is headed: pod-level user namespaces (hostUsers: false) graduated to GA in Kubernetes 1.33¹ and rely on the same node-level prerequisite.
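
For context, the pod-level feature is opted into per pod. A minimal sketch (the pod name is illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: userns-demo        # illustrative
spec:
  hostUsers: false         # run the pod in its own user namespace
  containers:
  - name: demo
    image: alpine:3.19
    command: ["sleep", "3600"]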

How to verify a node with a probe pod

This section is for cluster admins who want to confirm the node-level prerequisite directly, without enabling a PRG or running a plan. The probe below mirrors the current PRG pod template (a non-root container with seccompProfile: Unconfined) and attempts the same unshare(CLONE_NEWUSER) the plan runner does. On a node that meets the prerequisite it prints OK; if any kernel- or LSM-level gate denies the call, it prints FAILED.

Save as prg-probe.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: prg-probe
spec:
  restartPolicy: Never
  securityContext:
    runAsNonRoot: true
    runAsUser: 3377
    runAsGroup: 3377
  containers:
    - name: probe
      image: alpine:3.19
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        seccompProfile:
          type: Unconfined
      command: ["sh", "-c"]
      args: ["unshare -U echo OK || echo FAILED"]

Run it and read the output:

kubectl apply -f prg-probe.yaml
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded \
  pod/prg-probe --timeout=60s
kubectl logs prg-probe
kubectl delete pod prg-probe

OK means the node pool supports OCI-based plan steps under the current PRG deployment. FAILED means one of the kernel- or LSM-level gates is denying the call; see Setting up support for OCI based actions for the matching one-line fix. If your environment isn't listed there or the fix doesn't apply, send us the probe output and we will pinpoint which gate is involved.

Footnotes

  1. Pod-level user namespaces (hostUsers: false) graduated to GA in Kubernetes 1.33. Clusters whose nodes do not permit unprivileged user-namespace creation cannot make use of this feature.