Containerized Execution
This guide covers running Agor’s executor component in containerized environments like Kubernetes, enabling scalable and isolated execution of AI coding agents.
Overview
Agor’s architecture separates the daemon (orchestration layer) from the executor (isolated execution environment). This separation enables flexible deployment models:
- Local mode: Daemon spawns executors as subprocesses (default)
- Containerized mode: Daemon spawns executors in containers/pods via configurable templates
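Which mode the daemon uses is controlled by a single config key, shown in full later in this guide:

```yaml
# ~/.agor/config.yaml
execution:
  # null (default): spawn agor-executor as a local subprocess
  # set to a command template: spawn executors in containers/pods
  executor_command_template: null
```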
The Executor Model
┌─────────────────────────────────────────────────────────────┐
│ Daemon (Orchestration Layer) │
│ - REST/WebSocket API │
│ - Database (sessions, tasks, worktrees) │
│ - User authentication │
│ - Never touches git data directly │
│ │
│ Spawns executor via: │
│ - Local: spawn('agor-executor', ['--stdin']) │
│ - Remote: template → kubectl run ... | agor-executor │
└─────────────────────────────────────────────────────────────┘
│
│ JSON payload via stdin
▼
┌─────────────────────────────────────────────────────────────┐
│ Executor (Isolation Boundary) │
│ - Receives typed JSON payload │
│ - Connects back to daemon via WebSocket │
│ - Runs agent SDKs (Claude, Gemini, Codex) │
│ - Manages git operations (clone, worktree) │
│ - Handles terminal sessions (Zellij) │
│ - Short-lived: exits when command completes │
└─────────────────────────────────────────────────────────────┘

What the Executor Does
| Command | Purpose |
|---|---|
| prompt | Execute agent SDK (Claude Code, Gemini, Codex) |
| git.clone | Clone repository with Unix group setup |
| git.worktree.add | Create git worktree with permissions |
| git.worktree.remove | Remove worktree and cleanup |
| zellij.attach | Attach to terminal session |
| unix.sync-* | Synchronize Unix users/groups |
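Each command arrives as a single JSON document on stdin. A rough sketch of a git.clone payload, rendered as YAML so the assumptions can be annotated (the real payload is JSON, and the params keys here are illustrative rather than Agor's actual schema):

```yaml
command: git.clone
daemonUrl: http://agor-daemon.agor.svc.cluster.local:3030  # provided by the daemon
sessionToken: eyJhbGc...                                    # short-lived auth token
params:
  repoUrl: https://github.com/org/repo.git                  # hypothetical field name
  unixGroup: repo-myproject                                  # hypothetical field name
```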
Architecture Considerations
This is NOT Static Deployment
Running Agor in Kubernetes is fundamentally different from deploying static web applications:
| Static Deploys | Agor Dev Environments |
|---|---|
| Immutable containers | Interactive development |
| No shared state | Shared git worktrees |
| Scale out replicas | User-specific sessions |
| Ephemeral storage | Persistent filesystem |
Key insight: Agor executors need access to a shared filesystem where git worktrees live. This is similar to how traditional development servers work, not like microservice deployments.
Infrastructure Prerequisites
Containerized Agor requires infrastructure that most organizations already have for multi-server environments. Agor assumes this infrastructure exists - it does not provide it.
Two Requirements
| Requirement | What It Provides | Common Solutions |
|---|---|---|
| Shared Filesystem | Same files accessible from all nodes | EFS, NFS, GlusterFS, Ceph |
| Shared Identity | UID/GID → username/groupname resolution | LDAP, Active Directory, sssd, synced files |
These are standard requirements for any multi-server development environment. If your organization runs shared /home directories across servers, you likely already have both.
Why Shared Identity Matters
A shared filesystem only stores numeric UIDs and GIDs. Without shared identity:
# What you see without identity resolution
$ ls -la /data/agor/worktrees/myproject
drwxrwxr-x 1001 2001 4096 Dec 18 10:00 .
$ whoami
1001
# What you see WITH identity resolution
$ ls -la /data/agor/worktrees/myproject
drwxrwxr-x alice developers 4096 Dec 18 10:00 .
$ whoami
alice

Agor’s Unix isolation features (per-user UIDs, per-worktree GIDs) require that containers can resolve these IDs to names.
How Identity Resolution Works
Unix systems use NSS (Name Service Switch) to resolve identities. The /etc/nsswitch.conf file controls where lookups go:
# Simple setup (local files only)
passwd: files
group: files
# Enterprise setup (LDAP via sssd)
passwd: files sss
group: files sss

When you run ls -la or whoami, the system queries NSS, which queries the configured backend (files, LDAP, etc.).
Container Identity Options
For executor containers to resolve UIDs/GIDs, choose one:
Option 1: Mount host’s identity files (simplest)
If your host already has identity configured, containers can use it:
volumeMounts:
- name: passwd
mountPath: /etc/passwd
readOnly: true
- name: group
mountPath: /etc/group
readOnly: true
volumes:
- name: passwd
hostPath:
path: /etc/passwd
- name: group
hostPath:
path: /etc/group

Option 2: Mount sssd socket (if host uses LDAP/AD)
If your host runs sssd for LDAP/AD integration, containers can share it:
volumeMounts:
- name: sssd-pipes
mountPath: /var/lib/sss/pipes
- name: nsswitch
mountPath: /etc/nsswitch.conf
subPath: nsswitch.conf
volumes:
- name: sssd-pipes
hostPath:
path: /var/lib/sss/pipes
- name: nsswitch
configMap:
name: nsswitch-sss

Option 3: Run sssd in container (most complex)
Install and configure sssd in executor containers to query LDAP directly.
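A minimal sketch of this approach, assuming an LDAP server at ldap.example.com and a base DN of dc=example,dc=com (both placeholders; production setups also need TLS and bind credentials). The files are shipped via a ConfigMap, and the executor image must start sssd before agor-executor, which is where most of the complexity lives:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: executor-sssd
data:
  sssd.conf: |
    [sssd]
    services = nss
    domains = corp

    [domain/corp]
    id_provider = ldap
    ldap_uri = ldap://ldap.example.com       # placeholder
    ldap_search_base = dc=example,dc=com     # placeholder
  nsswitch.conf: |
    passwd: files sss
    group:  files sss
```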
Recommended: Dedicated Cluster
For production Agor deployments with containerized execution, we recommend:
- Dedicated Kubernetes cluster or namespace with specialized configuration
- Shared filesystem (EFS, NFS, GlusterFS) for worktree storage
- Shared identity already configured at the node level
- Pod security policies tuned for development workloads
- Longer timeouts than typical production workloads
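A minimal namespace-plus-quota sketch for fencing off these workloads (names and limits are placeholders to tune for your cluster):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: agor
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: agor-executors
  namespace: agor
spec:
  hard:
    pods: "50"            # cap on concurrent executor pods
    requests.cpu: "50"
    requests.memory: 200Gi
    limits.cpu: "100"
    limits.memory: 400Gi
```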
Shared Filesystem Setup
Why Shared Storage?
Agor’s development model requires:
- Watch mode: File changes detected in real-time
- Agent access: AI agents read/write to worktrees
- Terminal access: Users interact with files via Zellij
- Environment execution: Docker Compose, npm, etc. run in worktrees
All of these need access to the same filesystem.
Directory Separation
Agor supports separating daemon config from git data:
# ~/.agor/config.yaml
paths:
# Daemon operating files (config, database, logs)
# Default: ~/.agor/
# Storage: local SSD (fast, daemon-local)
# Git data (repos, worktrees)
# Default: same as agor_home
# Storage: shared filesystem (EFS, NFS)
data_home: /data/agor

This enables:
Local SSD Shared Storage (EFS)
┌──────────────────┐ ┌──────────────────────────┐
│ ~/.agor/ │ │ /data/agor/ │
│ ├── config.yaml │ │ ├── repos/ │
│ ├── agor.db │ │ │ └── github.com/ │
│ └── logs/ │ │ │ └── org/repo.git │
└──────────────────┘ │ ├── worktrees/ │
│ │ └── org/repo/ │
│ │ ├── main/ │
│ │ └── feature/ │
│ └── zellij/ │
│ └── sessions/ │
└──────────────────────────┘

AWS EFS Configuration
For Amazon EKS deployments:
# StorageClass for EFS
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: agor-efs
provisioner: efs.csi.aws.com
parameters:
provisioningMode: efs-ap
fileSystemId: fs-xxxxxxxxx
directoryPerms: "755"
basePath: "/agor"
reclaimPolicy: Retain
volumeBindingMode: Immediate
---
# PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: agor-data
spec:
accessModes:
- ReadWriteMany # Critical: multiple pods need access
storageClassName: agor-efs
resources:
requests:
storage: 100Gi

NFS Alternative
For on-premises or non-AWS deployments:
apiVersion: v1
kind: PersistentVolume
metadata:
name: agor-nfs
spec:
capacity:
storage: 100Gi
accessModes:
- ReadWriteMany
nfs:
server: nfs-server.internal
path: /exports/agor
mountOptions:
- nfsvers=4.1
- rsize=1048576
- wsize=1048576
- hard
- timeo=600
- retrans=2

Executor Command Template
Configuration
The daemon spawns executors using a configurable template:
# ~/.agor/config.yaml
execution:
# Local execution (default)
# executor_command_template: null
# Kubernetes execution
executor_command_template: |
kubectl run executor-{task_id} \
--image=ghcr.io/preset-io/agor-executor:latest \
--rm -i --restart=Never \
--overrides='{
"spec": {
"securityContext": {
"runAsUser": {unix_user_uid},
"runAsGroup": {unix_user_gid},
"fsGroup": {unix_user_gid}
},
"containers": [{
"name": "executor",
"stdin": true,
"stdinOnce": true,
"volumeMounts": [{
"name": "data",
"mountPath": "/data/agor"
}]
}],
"volumes": [{
"name": "data",
"persistentVolumeClaim": {
"claimName": "agor-data"
}
}]
}
}' \
-- agor-executor --stdin

Template Variables
| Variable | Description | Example |
|---|---|---|
| {task_id} | Unique task identifier (auto-generated) | a1b2c3d4 |
| {command} | Executor command | prompt, git.clone |
| {unix_user} | Target Unix username | agor_alice |
| {unix_user_uid} | Target Unix UID | 1001 |
| {unix_user_gid} | Target Unix GID | 1001 |
| {session_id} | Agor session ID (if available) | 01abc123... |
| {worktree_id} | Worktree ID (if available) | 01xyz789... |
How It Works
- Daemon receives request (e.g., start agent session)
- Daemon constructs JSON payload with all parameters
- Daemon substitutes template variables
- Daemon executes template command via sh -c
- Payload is piped to executor via stdin
- Daemon returns immediately (fire-and-forget)
- Executor connects back to daemon via WebSocket
- Executor performs work, updates database via Feathers API
- Pod terminates when executor exits
Fire-and-Forget Design
The daemon never waits for executor completion. This is critical for:
- Responsiveness: UI gets immediate feedback
- Scalability: Daemon doesn’t hold connections open
- Resilience: Executor failures don’t block daemon
The executor is responsible for:
- Status updates: Updating task/session status via Feathers API
- Error reporting: Logging and broadcasting errors via WebSocket
- Database operations: All mutations happen in the executor
- User notifications: Emitting events the UI can display as toasts
Security Context
Pod Security Best Practices
spec:
securityContext:
# Run as non-root user (mapped from Agor user)
runAsNonRoot: true
runAsUser: 1001
runAsGroup: 1001
# Match filesystem group for shared storage
fsGroup: 1001
# Prevent privilege escalation
allowPrivilegeEscalation: false
containers:
- name: executor
securityContext:
# Read-only root filesystem where possible
readOnlyRootFilesystem: false # Agents need /tmp
# Drop all capabilities except what's needed
capabilities:
drop:
- ALL

User Mapping Considerations
In containerized environments, Unix users map differently:
| Local Execution | Container Execution |
|---|---|
| sudo su - agor_alice | runAsUser: 1001 |
| Fresh group memberships | fsGroup: 1001 |
| Full Unix user exists | UID mapping only |
For full Unix isolation in containers, consider:
- User namespace mapping in the container runtime
- LDAP/SSSD for consistent user/group resolution
- Init container to set up user environment
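As one example of the init-container approach, the sketch below pre-creates the target user's home directory on the shared volume before the executor starts. The image, path, and UID/GID are illustrative, and the data volume is assumed to be defined elsewhere in the pod spec:

```yaml
spec:
  initContainers:
  - name: prepare-user-home
    image: busybox:1.36               # illustrative minimal image
    securityContext:
      runAsUser: 0                    # this setup step needs root to chown
    command: ["sh", "-c"]
    args:
      - |
        # hypothetical layout: per-user home on the shared filesystem
        mkdir -p /data/agor/home/agor_alice
        chown 1001:1001 /data/agor/home/agor_alice
    volumeMounts:
    - name: data
      mountPath: /data/agor
  containers:
  - name: executor
    image: ghcr.io/preset-io/agor-executor:latest
    volumeMounts:
    - name: data
      mountPath: /data/agor
```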
Timeout and Resource Management
Long-Running Sessions
AI agent sessions can run 20-60 minutes. Configure appropriately:
# Pod spec
spec:
# Allow 2 hours for agent sessions
activeDeadlineSeconds: 7200
containers:
- name: executor
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "4"
memory: "8Gi" # Agents can be memory-intensiveCommand-Specific Timeouts
Different executor commands have different timeout needs:
| Command | Typical Duration | Recommended Timeout |
|---|---|---|
| prompt | 5-60 minutes | 2 hours |
| git.clone | 1-10 minutes | 30 minutes |
| git.worktree.add | 1-30 seconds | 5 minutes |
| zellij.attach | Session duration | 8 hours |
Timeout Configuration
Since the daemon uses fire-and-forget spawning, timeouts are managed at the Kubernetes level:
spec:
# Pod-level deadline (failsafe)
activeDeadlineSeconds: 7200
containers:
- name: executor
# Liveness probe - detect stuck processes
livenessProbe:
exec:
command:
- cat
- /tmp/executor-alive
initialDelaySeconds: 60
periodSeconds: 30
failureThreshold: 3

Network Configuration
Executor-to-Daemon Communication
Executors connect back to the daemon via WebSocket:
# Daemon Service (internal)
apiVersion: v1
kind: Service
metadata:
name: agor-daemon
spec:
selector:
app: agor-daemon
ports:
- port: 3030
targetPort: 3030
type: ClusterIP # Internal only for executors
---
# Ingress for external UI access
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: agor-ui
spec:
rules:
- host: agor.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: agor-ui
port:
number: 80

Environment Variables
Executor receives daemon URL in payload:
{
"command": "prompt",
"daemonUrl": "http://agor-daemon.agor.svc.cluster.local:3030",
"sessionToken": "eyJhbGc...",
"params": { ... }
}

Egress Requirements
Executors need outbound access to:
| Service | Purpose |
|---|---|
| api.anthropic.com | Claude API |
| api.openai.com | OpenAI/Codex API |
| generativelanguage.googleapis.com | Gemini API |
| github.com, gitlab.com | Git operations |
Configure NetworkPolicy or egress controls accordingly.
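As a sketch, a NetworkPolicy for executor pods could allow DNS, the daemon's WebSocket port, and general HTTPS egress; restricting egress to specific hostnames requires a CNI with FQDN-aware policies (for example Cilium). The app: agor-executor label is an assumption (kubectl run labels pods run=<name>), so match the selector to however your template labels executors:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agor-executor-egress
spec:
  podSelector:
    matchLabels:
      app: agor-executor        # assumed label; adjust to your executor pods
  policyTypes:
  - Egress
  egress:
  # DNS lookups
  - ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
  # WebSocket back to the daemon
  - to:
    - podSelector:
        matchLabels:
          app: agor-daemon
    ports:
    - port: 3030
  # HTTPS to agent APIs and git hosts
  - ports:
    - port: 443
```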
Daemon URL Resolution
The daemon (not the executor) is responsible for knowing its own URL. It passes this URL to executors in the JSON payload:
{
"command": "prompt",
"daemonUrl": "http://agor-daemon.agor.svc.cluster.local:3030",
"sessionToken": "eyJhbGc...",
"params": { ... }
}

The executor simply uses payload.daemonUrl to connect back - it never reads config.yaml.
Local mode: The daemon defaults to http://localhost:{PORT}.
Containerized mode: Configure daemon.public_url so the daemon knows its k8s service URL:
# ~/.agor/config.yaml (read by daemon only)
daemon:
port: 3030
# URL that executors use to reach the daemon (k8s internal service DNS)
public_url: http://agor-daemon.agor.svc.cluster.local:3030

At startup, the daemon calls configureDaemonUrl() which sets this URL globally. All subsequent executor payloads automatically include the correct daemonUrl.
High Availability
Current Status
Agor’s daemon currently runs as a single instance. For high availability with multiple daemon replicas, additional configuration is required.
Shared Filesystem Simplifies HA
With shared storage (EFS/NFS) for both AGOR_DATA_HOME and /home, the daemon becomes largely stateless:
- Executors: Make fresh connections to daemon - no sticky sessions needed
- Database: SQLite on shared storage, or use Turso/LibSQL for distributed access
- Filesystem state: Consistent across all replicas via shared mount
What Still Needs Redis
FeathersJS uses Socket.io for real-time UI updates. With multiple daemon replicas:
- UI WebSocket connections: Long-lived, need event broadcasting across replicas
- Redis adapter: Required so events from one replica reach all connected clients
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Browser │ │ Browser │ │ Browser │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Daemon-1 │◄───►│ Redis │◄───►│ Daemon-2 │
└─────────────┘ └─────────────┘ └─────────────┘
│ │
└──────────────┬────────────────────────┘
▼
┌─────────────┐
│ EFS / NFS │ (shared storage)
└─────────────┘

Configuration (Future)
HA support is not built into Agor configuration today. To implement HA:
- Mount /home and AGOR_DATA_HOME on shared storage (EFS/NFS)
- Configure the Socket.io Redis adapter for event broadcasting
- Use a load balancer (sticky sessions optional with Redis)
For most deployments, a single daemon replica is sufficient. Scale horizontally by running multiple executor pods instead.
Unix Groups and Filesystem Permissions
How Agor Uses Unix Permissions
Agor’s isolation model assigns:
- One UID per user - your Unix identity
- One GID per repository - shared access for all repo collaborators
- One GID per worktree - scoped access for worktree owners
A user working on multiple worktrees will be a member of multiple groups.
Running as Users (Not Just UIDs)
In local mode, executors run via sudo su - {username}, which:
- Switches to the user’s UID
- Loads ALL their group memberships from /etc/group
- Sets up their home directory and shell environment
In containerized mode, we want the same behavior. The key insight: run as the user, and their groups come with them.
Container Security Context
Kubernetes provides supplementalGroups for this:
spec:
securityContext:
runAsUser: 1001 # User's UID
runAsGroup: 1001 # User's primary GID
supplementalGroups: # All groups user belongs to
- 2001 # repo-myproject GID
- 3001 # worktree-feature-x GID
- 3002 # worktree-bugfix-y GID

The daemon knows the user’s group memberships (from worktree ownership records) and can pass them to the executor template.
With Shared Identity (Recommended)
If your infrastructure has shared identity (LDAP/sssd), containers can look up group memberships dynamically - just like local mode. The executor runs as the user, and NSS resolves their groups.
This is the cleanest approach: containers behave like any other server in your environment.
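A quick way to confirm resolution inside a running executor pod (pod, user, and group names are examples):

```bash
# All of these should resolve via NSS/sssd instead of failing
kubectl exec -it executor-abc123 -- getent passwd alice
kubectl exec -it executor-abc123 -- getent group developers
kubectl exec -it executor-abc123 -- id alice
```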
Without Shared Identity (Explicit GIDs)
If you can’t set up shared identity, the daemon must explicitly pass all required GIDs via supplementalGroups. This works but:
- Requires daemon to enumerate all user’s group memberships
- Can result in long lists of GIDs for active users
- Less flexible than dynamic resolution
Template Variables for Security Context
The executor command template supports these variables:
| Variable | Description |
|---|---|
| {unix_user} | Username |
| {unix_user_uid} | User’s UID |
| {unix_user_primary_gid} | User’s primary GID |
| {supplemental_gids} | JSON array of all GIDs user needs |
| {repo_gid} | Current repo’s GID (if scoped) |
| {worktree_gid} | Current worktree’s GID (if scoped) |
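A sketch of how these variables might plug into the kubectl-based template shown earlier, for the explicit-GID case; since {supplemental_gids} is documented as a JSON array, it can be dropped straight into the overrides:

```yaml
# ~/.agor/config.yaml
execution:
  executor_command_template: |
    kubectl run executor-{task_id} \
      --image=ghcr.io/preset-io/agor-executor:latest \
      --rm -i --restart=Never \
      --overrides='{
        "spec": {
          "securityContext": {
            "runAsUser": {unix_user_uid},
            "runAsGroup": {unix_user_primary_gid},
            "fsGroup": {unix_user_primary_gid},
            "supplementalGroups": {supplemental_gids}
          }
        }
      }' \
      -- agor-executor --stdin
```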
Dev Environments in Containerized Mode
The Three-Tier Architecture
In a fully containerized Agor deployment, there are three distinct tiers:
┌─────────────────────────────────────────────────────────────┐
│ Tier 1: Agor Control Plane │
│ - Daemon pod (orchestration, API, database) │
│ - UI pod (web interface) │
│ - Always running │
└─────────────────────────────────────────────────────────────┘
│
│ spawns
▼
┌─────────────────────────────────────────────────────────────┐
│ Tier 2: Executor Pods │
│ - Short-lived pods for agent sessions │
│ - Run Claude/Gemini/Codex SDKs │
│ - Mount shared storage (EFS) │
│ - Exit when task completes │
└─────────────────────────────────────────────────────────────┘
│
│ may spawn
▼
┌─────────────────────────────────────────────────────────────┐
│ Tier 3: Dev Environment Containers │
│ - Docker Compose, npm, pytest, etc. │
│ - Defined in worktree's environment config │
│ - May run as sidecar or separate pods │
└─────────────────────────────────────────────────────────────┘

Dev Environment Strategies
In containerized mode, worktree dev environments should also be containerized:
Option A: Kubernetes-Native Commands
Replace Docker Compose with kubectl commands in environment config:
# worktree environment config
environment:
start_command: |
kubectl apply -f k8s/dev-environment.yaml
stop_command: |
kubectl delete -f k8s/dev-environment.yaml
health_command: |
kubectl get pods -l app=myapp-dev -o jsonpath='{.items[0].status.phase}'

Option B: Docker-in-Docker (DinD)
Run Docker daemon inside executor pods:
spec:
containers:
- name: executor
image: ghcr.io/preset-io/agor-executor:latest
- name: dind
image: docker:dind
securityContext:
privileged: true

Option C: Podman (Rootless)
Use Podman for rootless container execution within pods.
Recommendation
For production deployments, Option A (Kubernetes-native) is recommended:
- No privileged containers required
- Better resource isolation
- Consistent with cluster security policies
Deployment Patterns
Pattern 1: Daemon + On-Demand Executor Pods
Best for: Variable workloads, cost optimization
┌─────────────────┐
│ Daemon Pod │ (always running)
│ + UI Pod │
└────────┬────────┘
│ spawns on demand
▼
┌─────────────────┐
│ Executor Pod │ (ephemeral, exits when done)
│ Session ABC │
└─────────────────┘

# Daemon Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: agor-daemon
spec:
replicas: 1 # Single daemon
template:
spec:
serviceAccountName: agor-daemon
containers:
- name: daemon
image: ghcr.io/preset-io/agor:latest
command: ["agor", "daemon", "start"]
volumeMounts:
- name: config
mountPath: /home/agor/.agor
- name: data
mountPath: /data/agor

Pattern 2: Executor DaemonSet (Pre-warmed)
Best for: Low latency requirements
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: agor-executor-pool
spec:
selector:
matchLabels:
app: agor-executor
template:
spec:
containers:
- name: executor
image: ghcr.io/preset-io/agor-executor:latest
command: ["sleep", "infinity"] # Warm pool
volumeMounts:
- name: data
mountPath: /data/agor

Note: Pre-warmed executors require custom orchestration to reuse pods.
Monitoring and Observability
Metrics to Track
| Metric | Description |
|---|---|
| agor_executor_spawn_total | Total executor spawns |
| agor_executor_duration_seconds | Execution duration histogram |
| agor_executor_failures_total | Failed executions |
| agor_prompt_tokens_total | Token usage by session |
Pod Events
Monitor executor pod lifecycle:
kubectl get events --field-selector involvedObject.name=executor-abc123

Logging
Executor logs appear in:
- Pod stdout/stderr (captured by Kubernetes)
- Daemon logs (spawn events, timeouts)
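To pull logs for one executor pod directly (pod names follow the executor-{task_id} pattern from the template; a1b2c3d4 is the example task id used earlier):

```bash
kubectl logs executor-a1b2c3d4
# or stream while the task is still running
kubectl logs -f executor-a1b2c3d4
```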
# Daemon config
logging:
level: info
format: json

Troubleshooting
Pod Stuck in Pending
kubectl describe pod executor-abc123

Common causes:
- Insufficient resources (request more CPU/memory)
- PVC not bound (check EFS provisioning)
- Node selector mismatch
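Quick checks for the resource and PVC causes (the PVC name matches the claim defined earlier in this guide):

```bash
# Is the shared-storage claim bound?
kubectl get pvc agor-data

# Do the nodes have headroom for the executor's requests?
kubectl describe nodes | grep -A 8 "Allocated resources"
```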
Filesystem Permission Denied
kubectl exec -it executor-abc123 -- ls -la /data/agor/worktrees

Verify:
- fsGroup matches worktree group
- EFS access point configured correctly
- PVC mounted with correct permissions
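To compare the pod's effective identity against the worktree's ownership (pod name is an example):

```bash
# Groups the executor process actually holds (fsGroup should appear here)
kubectl exec -it executor-abc123 -- id

# Numeric owner/group and mode on the worktree root
kubectl exec -it executor-abc123 -- stat -c '%u:%g %a %n' /data/agor/worktrees
```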
Executor Timeout
Check daemon logs:
kubectl logs -f deployment/agor-daemon | grep "EXECUTOR_TIMEOUT"

Solutions:
- Increase activeDeadlineSeconds
- Increase daemon spawn timeout
- Check network connectivity to APIs
WebSocket Connection Failed
Verify daemon is accessible from executor pods:
kubectl exec -it executor-abc123 -- \
curl -s http://agor-daemon:3030/health

Check:
- Service exists and has endpoints
- NetworkPolicy allows traffic
- Daemon pod is healthy
Migration from Local to Containerized
Step 1: Enable Directory Separation
# ~/.agor/config.yaml
paths:
data_home: /data/agor

Step 2: Migrate Existing Data
# Move repos and worktrees to shared storage
mv ~/.agor/repos /data/agor/repos
mv ~/.agor/worktrees /data/agor/worktrees
# Create symlinks for backward compatibility
ln -s /data/agor/repos ~/.agor/repos
ln -s /data/agor/worktrees ~/.agor/worktrees

Step 3: Test Local with New Paths
Verify everything works before adding containerization:
agor repo list
agor worktree list

Step 4: Configure Daemon URL and Executor Template
Add the containerized execution configuration:
# ~/.agor/config.yaml
daemon:
port: 3030
# URL that executors use to reach the daemon (k8s service DNS)
public_url: http://agor-daemon.agor.svc.cluster.local:3030
execution:
executor_command_template: |
kubectl run executor-{task_id} \
--image=ghcr.io/preset-io/agor-executor:latest \
--rm -i --restart=Never \
--overrides='{"spec":{"securityContext":{"runAsUser":{unix_user_uid},"fsGroup":{unix_user_gid}}}}' \
-- agor-executor --stdin

Step 5: Test Single Executor
Spawn one executor pod manually:
kubectl run test-executor \
--image=ghcr.io/preset-io/agor-executor:latest \
--rm -it -- agor-executor --version

Step 6: Full Integration Test
Create a worktree and run an agent session through the UI.
Related Documentation
- Full Multiplayer Mode - Unix isolation and RBAC
- Environments - Worktree environment templates
- Architecture - System design overview