
Managed: Kubernetes backend



Deploy the oz-agent-worker daemon into a Kubernetes cluster using the included Helm chart. Each agent task runs as a Kubernetes Job in your cluster. Oz orchestrates runs end to end (Slack, Linear, schedules, API, oz agent run-cloud); your cluster provides the compute, scheduling, and policy enforcement.

Use this backend when:

  • You already operate a Kubernetes cluster and want agents to run there.
  • You need Kubernetes-native scheduling, resource management, or policy enforcement.
  • You want to use Kubernetes Secrets, ServiceAccounts, and admission policies to control task behavior.

How it works:

  1. The worker connects to the Kubernetes API server (using in-cluster auth by default, or an explicit kubeconfig).
  2. On startup, the worker runs a short-lived preflight Job to verify that cluster permissions, admission policies, and Pod Security Standards are compatible. If the preflight fails, the worker exits with a diagnostic error before accepting any tasks.
  3. For each assigned task, the worker creates a Kubernetes Job in the configured namespace.
  4. The worker monitors the Job and Pod status via Kubernetes Watch (with a 30-second safety-net poll for watch disconnects).
  5. After the task completes, the Job is cleaned up (unless --no-cleanup is set).

Prerequisites:

  • Enterprise plan with self-hosting enabled. Contact sales if self-hosting is not yet enabled for your team.
  • A Kubernetes cluster with the worker process able to reach the API server. The cluster must:
    • Allow the worker’s namespace to create Jobs with a root init container (sidecar materialization depends on this pattern).
    • Grant the worker these namespace-scoped permissions: create, get, list, watch, delete on jobs; get, list, watch on pods; get on pods/log; list on events.
  • Helm installed locally, plus kubectl authenticated against the target cluster.
  • A team API key — In the Warp app, go to Settings > Cloud platform > Oz Cloud API Keys to create a team-scoped API key. See API Keys for details.
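
The namespace-scoped permissions listed above can be expressed as a standard Kubernetes Role. A sketch (names illustrative; the Helm chart creates an equivalent Role for you, so this is only needed if you manage RBAC yourself):

```yaml
# Illustrative Role granting the worker's required namespace-scoped permissions.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: oz-agent-worker
  namespace: warp-oz
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "get", "list", "watch", "delete"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["list"]
```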

The oz-agent-worker repository includes a namespace-scoped Helm chart at charts/oz-agent-worker. This is the recommended way to deploy the worker into a cluster.

  • A long-lived Deployment running oz-agent-worker with the Kubernetes backend.
  • A namespaced ServiceAccount for the worker.
  • A namespaced Role / RoleBinding with the minimum permissions needed to manage task Jobs and Pods.
  • A ConfigMap containing the worker config YAML.
  • An optional Secret for WARP_API_KEY (or a reference to an existing Secret).

The chart does not create CRDs or cluster-scoped RBAC resources.
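
Putting these pieces together, a minimal values file might look like the following sketch (values are illustrative; only worker.workerId and image.tag are required):

```yaml
# Example values.yaml for the oz-agent-worker chart (illustrative values).
worker:
  workerId: oz-k8s-worker   # required
  logLevel: info
image:
  tag: "<version>"          # required: the worker image tag to deploy
warp:
  apiKeySecret:
    name: oz-agent-worker   # pre-existing Secret containing WARP_API_KEY
    key: WARP_API_KEY
```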

Set your team API key as an environment variable:

```sh
export WARP_API_KEY="your_team_api_key"
```

Create the namespace if it doesn’t exist:

```sh
kubectl create namespace warp-oz
```

If you’re not using an existing Secret, create one with the API key:

```sh
kubectl create secret generic oz-agent-worker \
  --from-literal=WARP_API_KEY="$WARP_API_KEY" \
  --namespace warp-oz
```

Expected outcome: kubectl get secret -n warp-oz oz-agent-worker shows the Secret.

Clone the worker repo and install the chart:

```sh
git clone https://github.com/warpdotdev/oz-agent-worker.git
helm install oz-agent-worker ./oz-agent-worker/charts/oz-agent-worker \
  --namespace warp-oz \
  --set worker.workerId=oz-k8s-worker \
  --set image.tag=<version>
```

Expected outcome: kubectl get pods -n warp-oz shows the worker Deployment pod as Running, and the worker logs show Connected to Oz / Listening for tasks.

To scale horizontally, deploy multiple Helm releases with distinct worker IDs rather than increasing replicas on a single release.


Required:

  • worker.workerId — The worker ID (same as --worker-id).
  • image.tag — The worker image tag to deploy.

Worker configuration:

  • worker.logLevel — Log verbosity (debug, info, warn, error). Defaults to info.
  • worker.cleanup — Whether to clean up task Jobs after execution. Defaults to true.
  • worker.maxConcurrentTasks — Maximum concurrent tasks. Defaults to 0 (unlimited).
  • worker.idleOnComplete — Duration to keep the oz process alive after task completion.
  • worker.resources — Resource requests/limits for the worker Deployment. Defaults to 100m CPU and 128Mi memory.
  • worker.livenessProbe — Liveness probe for the worker Deployment. Defaults to an exec probe (kill -0 1). Override with a custom probe or set to null to disable.
  • worker.nodeSelector, worker.tolerations, worker.affinity — Scheduling constraints for the worker Deployment pod.

Kubernetes backend:

  • kubernetesBackend.namespace — Namespace for task Jobs. Defaults to the release namespace.
  • kubernetesBackend.defaultImage — Default Docker image for task pods when the run has no Warp environment image. Leave empty (default) to fall back to ubuntu:22.04.
  • kubernetesBackend.imagePullPolicy — Image pull policy for task pods. Defaults to IfNotPresent.
  • kubernetesBackend.preflightImage — Image for the startup preflight Job. Set this if your cluster restricts allowed registries.
  • kubernetesBackend.unschedulableTimeout — How long a pod may remain unschedulable before failing. Defaults to 30s.
  • kubernetesBackend.setupCommand — Shell command to run before each task.
  • kubernetesBackend.teardownCommand — Shell command to run after each task.
  • kubernetesBackend.extraLabels — Additional labels for task Jobs and Pods.
  • kubernetesBackend.extraAnnotations — Additional annotations for task Jobs and Pods.
  • kubernetesBackend.activeDeadlineSeconds — Maximum task Job lifetime.
  • kubernetesBackend.workspaceSizeLimit — Size limit for workspace emptyDir volume.
  • kubernetesBackend.podTemplate — Raw PodSpec YAML for task Jobs (same as backend.kubernetes.pod_template in the config file).
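
As an example, several of the backend values above might be combined in a values file like this (all values illustrative):

```yaml
kubernetesBackend:
  namespace: warp-oz
  imagePullPolicy: IfNotPresent
  unschedulableTimeout: 60s
  activeDeadlineSeconds: 3600
  extraLabels:
    team: platform
```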

API key Secret:

  • warp.apiKeySecret.create — Set to true to have the chart create a Secret from warp.apiKeySecret.value. Defaults to false (expects a pre-existing Secret).
  • warp.apiKeySecret.value — The API key value to store in the chart-managed Secret. Only used when warp.apiKeySecret.create is true.
  • warp.apiKeySecret.name — Name of the Secret containing WARP_API_KEY. Defaults to oz-agent-worker.
  • warp.apiKeySecret.key — Key within the Secret. Defaults to WARP_API_KEY.
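
Both API key flows can be sketched as values fragments (names illustrative):

```yaml
# Option 1 (default): reference a pre-existing Secret.
warp:
  apiKeySecret:
    create: false
    name: my-existing-secret
    key: WARP_API_KEY

# Option 2: let the chart create the Secret from a value in your Helm config.
# warp:
#   apiKeySecret:
#     create: true
#     value: "your_team_api_key"
```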

See the self-hosted worker reference for the full config file schema.


Cluster selection follows Kubernetes client config conventions:

  • Set backend.kubernetes.kubeconfig to use an explicit kubeconfig file.
  • If kubeconfig is omitted and the worker runs inside a Kubernetes pod, the worker uses in-cluster config automatically.
  • Otherwise, the worker falls back to the default kubeconfig loading rules and uses the current context.

namespace selects the namespace inside the chosen cluster. It defaults to default when omitted.
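
In the worker config file, cluster and namespace selection might look like this sketch (path illustrative):

```yaml
backend:
  kubernetes:
    # Omit kubeconfig to use in-cluster config (when running inside a pod)
    # or the default kubeconfig loading rules (when running outside a cluster).
    kubeconfig: /path/to/kubeconfig
    namespace: warp-oz
```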


The pod_template field accepts standard Kubernetes PodSpec YAML and is the declarative way to configure task pod scheduling, service accounts, image pull secrets, resources, and environment variables.

When using pod_template, define a container named task to customize the main task container directly. Otherwise, the worker appends its own task container to the PodSpec.

Use valueFrom.secretKeyRef to inject Kubernetes Secret values into task container environment variables:

```yaml
pod_template:
  serviceAccountName: agent-task-sa
  imagePullSecrets:
    - name: my-registry-creds
  containers:
    - name: task
      resources:
        requests:
          cpu: "2"
          memory: 4Gi
        limits:
          memory: 8Gi
      env:
        - name: GITHUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: my-k8s-secret
              key: github-token
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "agents"
      effect: "NoSchedule"
```

On startup, the worker creates a short-lived preflight Job to verify that:

  • The worker has sufficient RBAC permissions in the target namespace.
  • Cluster admission policies (Pod Security Standards, OPA Gatekeeper, Kyverno, etc.) allow the worker’s task pod shape.
  • The preflight image can be pulled.

If the preflight fails, the worker logs a diagnostic error and exits before accepting any tasks. This surfaces policy and configuration issues at deploy time rather than at task execution time.

The preflight image defaults to busybox:1.36. If your cluster restricts allowed registries or images, set preflight_image to an allowlisted image. When imagePullSecrets is configured in pod_template, those secrets apply to the preflight Job as well, so you can point preflight_image at an image in your private registry.
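
For example, to point the preflight at an allowlisted image in a private registry (registry and secret names illustrative):

```yaml
backend:
  kubernetes:
    preflight_image: registry.example.com/mirror/busybox:1.36
    pod_template:
      imagePullSecrets:
        - name: my-registry-creds   # also applied to the preflight Job
```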


Environment variables for Kubernetes tasks


There are two ways to pass environment variables to Kubernetes task containers:

  1. pod_template (recommended for Kubernetes-native config) — Use standard Kubernetes env syntax in the task container, including valueFrom.secretKeyRef for Kubernetes Secrets.
  2. -e / --env flags — Backend-agnostic runtime overrides that work across all managed backends.

When configuring the Kubernetes backend via YAML or Helm, declarative task-container env belongs in pod_template rather than a separate top-level list.


Use kubernetesBackend.setupCommand (Helm value) or backend.kubernetes.setup_command (config file) to run a shell command before each task. Use teardownCommand / teardown_command for cleanup after the task finishes. These run inside the task Pod and are useful for workspace bootstrapping or post-run reporting.
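
A sketch of setup and teardown configuration in the config file (the commands themselves are illustrative):

```yaml
backend:
  kubernetes:
    # Runs inside the task Pod before each task starts.
    setup_command: "git config --global user.email agent@example.com"
    # Runs inside the task Pod after each task finishes.
    teardown_command: "echo task-complete >> /workspace/run.log"
```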


The Helm chart includes built-in support for exporting OpenTelemetry metrics from the worker. Enable metrics by setting metrics.enabled=true:

```sh
helm install oz-agent-worker ./charts/oz-agent-worker \
  --namespace warp-oz \
  --set worker.workerId=oz-k8s-worker \
  --set image.tag=VERSION \
  --set metrics.enabled=true
```

With the default metrics.exporter=prometheus, the chart creates a Service with Prometheus scrape annotations and exposes port 9464. For clusters using the Prometheus Operator, set metrics.podMonitor.create=true to create a PodMonitor.

To push metrics to an OTLP collector instead, set metrics.exporter=otlp and configure the endpoint via metrics.extraEnv.
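
The OTLP path might be configured like this sketch (the endpoint is illustrative, and the extraEnv variable name follows standard OpenTelemetry SDK conventions; check the Monitoring docs for the exact values supported):

```yaml
metrics:
  enabled: true
  exporter: otlp
  extraEnv:
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: "http://otel-collector.observability:4317"
```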

See Monitoring for the full list of Helm values, the metric catalog, and sample PromQL queries.


  • Scaling — The chart always deploys a single replica for a given worker.workerId. To run multiple workers, deploy multiple Helm releases with distinct worker IDs rather than scaling a single release horizontally.
  • Security context — The Deployment defaults to a non-root security context (runAsUser: 10001) with allowPrivilegeEscalation: false and all capabilities dropped.
  • Liveness probe — The Deployment includes a default exec liveness probe (kill -0 1). Override worker.livenessProbe for a custom probe, or set it to null to disable.
  • In-cluster auth — The chart assumes the worker runs inside the target cluster and uses in-cluster Kubernetes auth by default.
  • Root init containers — The worker Deployment itself is non-root, but task Jobs require a root init container for sidecar materialization. Ensure the task namespace’s Pod Security Standards allow this.