SDK Feature Comparison
Agor integrates with multiple AI coding agents. Each SDK has different capabilities based on its native features and API design.
Feature Matrix
| Feature | Claude Code | Codex | Gemini | Notes |
|---|---|---|---|---|
| Streaming responses | ✅ Yes | ✅ Yes | ✅ Yes | All support token-level streaming for typewriter effect |
| Stop mid-execution | ✅ Yes | ⚠️ Limited | ⚠️ Limited | Claude has interrupt(), others may lose partial work |
| Session import/export | ✅ Yes | ❌ No | ❌ No | Only Claude Code stores sessions as JSONL transcripts |
| Session forkable | ✅ Yes (via replay) | ✅ Yes (via replay) | ✅ Yes (via replay) | Agor emulates forking by replaying messages to new session |
| MCP integration | ✅ Native | ⚠️ STDIO only (not wired) | ✅ Native | Claude Code has first-class MCP server support |
| Permission requests | ✅ Rich | ⚠️ Basic | ⚠️ Basic | Claude has granular tool permissions, others use simpler modes |
| Project instructions | ✅ CLAUDE.md | ⚠️ Manual | ⚠️ Manual | Claude auto-loads CLAUDE.md, others need manual injection |
| Tool execution | ✅ Rich widgets | ⚠️ Basic | ⚠️ Basic | Claude exposes detailed tool metadata for rich visualization |
| Session continuity | ✅ SDK-managed | ✅ History-based | ✅ History-based | Claude uses session_id, others replay message history |
| Token usage tracking | ✅ Full | ✅ Full | ❌ No | Claude & Codex now surface SDK usage metadata (input/output/cache) for pills + cost |
| Context window tracking | 🟡 Estimated | 🟡 Estimated | ❌ No | Agor derives cumulative conversation usage; Codex uses the same formula via SDK metrics |
Legend:
- ✅ Full support - Feature works natively via SDK
- ⚠️ Limited - Partial support or workarounds needed
- 🟡 Emulated / Estimated - Agor implements or approximates the feature itself
- ❌ Not supported - Not available
Detailed Feature Breakdown
1. Streaming Responses
All SDKs support streaming, but with different patterns:
Claude Code:
- Token-level streaming via async generators
- True real-time with minimal latency
- Supports stopping mid-stream
Codex:
- Streaming via OpenAI’s SSE (Server-Sent Events)
- Chunks may be larger than tokens
- Good real-time performance
Gemini:
- Streaming via `sendMessageStream()`
- Token-level chunks
- Reliable streaming performance
Agor Implementation: StreamingCallbacks interface provides unified streaming for all agents.
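As a rough illustration of how one callback interface can hide per-SDK chunking differences, here is a minimal sketch. The method names (`onStreamStart`, `onStreamChunk`, `onStreamEnd`) and the `forwardChunks` helper are assumptions made for this example; they are not taken from Agor's actual `StreamingCallbacks` definition.

```typescript
// Hypothetical shape of a unified streaming interface; names are illustrative.
interface StreamingCallbacks {
  onStreamStart(messageId: string): void;
  onStreamChunk(messageId: string, chunk: string): void;
  onStreamEnd(messageId: string, fullText: string): void;
}

// Forwards chunks from any agent (token-sized or larger) through the same callbacks
// and returns the concatenated text.
function forwardChunks(
  messageId: string,
  chunks: readonly string[],
  callbacks: StreamingCallbacks
): string {
  callbacks.onStreamStart(messageId);
  let fullText = "";
  for (const chunk of chunks) {
    fullText += chunk;
    callbacks.onStreamChunk(messageId, chunk);
  }
  callbacks.onStreamEnd(messageId, fullText);
  return fullText;
}

// Example consumer: collects chunks the way a typewriter-style UI might.
const receivedChunks: string[] = [];
const collector: StreamingCallbacks = {
  onStreamStart: () => {},
  onStreamChunk: (_id, chunk) => {
    receivedChunks.push(chunk);
  },
  onStreamEnd: () => {},
};
const streamedText = forwardChunks("msg-1", ["Hel", "lo", " world"], collector);
```

The consumer never needs to know whether a chunk was a single token or a larger SSE payload, which is the point of a shared interface.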
2. Stop Mid-Execution
Claude Code: ✅ Full support
- Native `interrupt()` method in Agent SDK
- Gracefully stops execution
- Returns partial results
- Safe cleanup of resources
Codex: ⚠️ Limited
- Can abort HTTP request
- May lose partial work
- Less graceful than Claude
Gemini: ⚠️ Limited
- Can abort stream
- May lose partial tool execution results
- Similar to Codex limitations
3. Session Import/Export
Claude Code: ✅ Full support
- Sessions stored as JSONL transcript files in `~/.claude/projects/`
- Rich metadata (git state, tool uses, timestamps)
- Agor can parse and replay transcripts
- Enables session portability
Codex: ❌ Not supported
- No native session persistence format
- Agor stores sessions in own database
- Cannot import external Codex sessions
Gemini: ❌ Not supported
- Has checkpoint system but format undocumented
- Agor stores sessions in own database
- Cannot import external Gemini sessions
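Because Claude Code transcripts are JSONL (one JSON object per line), the parsing step of an import can be sketched in a few lines. This is only a sketch: the record schema is not described here, so records are left untyped, and the field names in the sample input are invented for the example.

```typescript
// One transcript record; fields are untyped because the schema is not specified here.
type TranscriptRecord = Record<string, unknown>;

// Parses JSONL text: one JSON object per non-empty line.
function parseJsonlTranscript(contents: string): TranscriptRecord[] {
  const records: TranscriptRecord[] = [];
  contents.split(/\r?\n/).forEach((rawLine, index) => {
    const line = rawLine.trim();
    if (line === "") return; // tolerate blank lines and a trailing newline
    const parsed: unknown = JSON.parse(line); // throws on a malformed line
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      throw new Error(`Line ${index + 1} is not a JSON object`);
    }
    records.push(parsed as TranscriptRecord);
  });
  return records;
}

// Illustrative input; these field names are made up for the example.
const sampleTranscript =
  '{"type":"user","text":"hi"}\n\n{"type":"assistant","text":"hello"}\n';
const parsedRecords = parseJsonlTranscript(sampleTranscript);
```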
4. Session Forking
All agents support forking via Agor’s replay mechanism:
- User requests fork at specific message
- Agor creates new session
- Replays messages up to fork point
- New prompts diverge from original
- Full genealogy tracking in database
Note: This is an Agor feature, not SDK-native. Works identically across all agents.
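The replay steps above can be sketched as a pure function. The types and field names (`ForkableSession`, `parentSessionId`, `forkedAtIndex`) are hypothetical and chosen for illustration; Agor's real schema may differ.

```typescript
interface ForkMessage {
  role: "user" | "assistant";
  content: string;
}

interface ForkableSession {
  id: string;
  parentSessionId: string | null; // genealogy link (hypothetical field name)
  forkedAtIndex: number | null; // message index the fork was taken at
  messages: ForkMessage[];
}

// Creates a new session holding copies of the messages up to and including the fork point.
function forkSession(
  source: ForkableSession,
  forkAtIndex: number,
  newId: string
): ForkableSession {
  if (forkAtIndex < 0 || forkAtIndex >= source.messages.length) {
    throw new RangeError(`Fork index ${forkAtIndex} is outside the session`);
  }
  return {
    id: newId,
    parentSessionId: source.id,
    forkedAtIndex: forkAtIndex,
    // Copy each message so new prompts in the fork never mutate the original session.
    messages: source.messages.slice(0, forkAtIndex + 1).map((m) => ({ ...m })),
  };
}

const originalSession: ForkableSession = {
  id: "session-a",
  parentSessionId: null,
  forkedAtIndex: null,
  messages: [
    { role: "user", content: "Add a settings page" },
    { role: "assistant", content: "Created Settings.tsx" },
    { role: "user", content: "Use tabs" },
    { role: "assistant", content: "Switched to tabs" },
  ],
};

// Fork after the first assistant reply, then diverge with a new prompt.
const forkedSession = forkSession(originalSession, 1, "session-b");
forkedSession.messages.push({ role: "user", content: "Use an accordion instead" });
```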
5. MCP Integration
Claude Code: ✅ Native support
- MCP servers configured in Agent SDK options
- Tools automatically exposed to Claude
- Supports MCP prompts, resources, and tools
- Agor UI for managing MCP server configs per session
Codex: ⚠️ STDIO only (not yet integrated)
- SDK supports MCP via STDIO transport
- Agor hasn’t wired Codex MCP configuration yet
- HTTP/SSE transports still unsupported by the SDK
Gemini: ✅ Native support
- MCP servers configured directly through the SDK config
- Supports advanced options (tool filtering, transport selection) shared with Claude
- Agor provides the same session-level MCP selection UI for Gemini sessions
See Architecture Guide: Agor as an MCP Server for details on using MCP with Claude Code and Gemini.
6. Permission Requests
Claude Code: ✅ Rich permission system
- Granular tool-level permissions (`Read`, `Write`, `Edit`, `Bash`, etc.)
- Permission modes: `'ask'`, `'auto'`, `'allow-all'`
- Tool-specific context (file paths, command previews)
- Diff previews for file edits
- Agor shows rich permission modals with previews
Codex: ⚠️ Basic permission modes
- Simpler permission system
- Modes: `'ask'`, `'auto'`, `'allow-all'`
- Less granular than Claude
Gemini: ⚠️ Basic permission modes
- Similar to Codex
- Function calling requires explicit approval
- Less detailed context than Claude
Agor Implementation: Permission modals adapt based on available SDK metadata. Claude shows richest UX.
7. Project Instructions
Claude Code: ✅ CLAUDE.md auto-loading
- Agent SDK automatically loads `CLAUDE.md` from working directory
- Supports both project-level (`.claude/`) and personal (`~/.claude/`) instructions
- No manual injection needed
- Agor sessions inherit project instructions automatically
Codex: ⚠️ Manual injection
- No standard instruction file format
- Agor can inject instructions via system prompt
- Requires explicit configuration
Gemini: ⚠️ Manual injection
- No standard instruction file format
- Agor can inject instructions via system prompt
- Requires explicit configuration
Recommendation: Use Claude Code for projects with rich context requirements. For Codex/Gemini, manage instructions in Agor session config.
8. Tool Execution & Visualization
Claude Code: ✅ Rich tool metadata
- Detailed tool use objects with `id`, `name`, `input`
- Tool result blocks with rich content types
- Supports rendering custom tool widgets (Todo lists, diffs, etc.)
- Agor can build sophisticated tool visualizations
Codex: ⚠️ Basic function calling
- OpenAI function calling format
- Less structured than Claude’s tools
- Agor normalizes to common format
Gemini: ⚠️ Basic function calling
- Google function calling format
- Similar to Codex in structure
- Agor normalizes to common format
Agor’s Tool Widgets:
- Todo lists - Rendered from Claude’s TodoWrite tool
- File diffs - Syntax-highlighted edits (Claude Write/Edit tools)
- Bash output - Terminal-style output rendering
- Permission requests - Interactive approval modals
Works best with Claude Code due to richer tool metadata.
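To make "normalizes to common format" concrete, here is a hedged sketch of what such a normalization layer might look like. The three input shapes are simplified assumptions: a Claude-style block with `id`, `name`, `input` (as listed above), an OpenAI-style call whose arguments arrive as a JSON string, and a Gemini-style call with structured args and no call ID. None of these types are Agor's real ones.

```typescript
// Common shape used for rendering (hypothetical).
interface NormalizedToolCall {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

interface ClaudeStyleToolUse {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

// Simplified assumption: arguments are a JSON-encoded string.
interface OpenAiStyleCall {
  id: string;
  name: string;
  arguments: string;
}

// Simplified assumption: structured args, no call ID supplied.
interface GeminiStyleCall {
  name: string;
  args: Record<string, unknown>;
}

function fromClaudeStyle(block: ClaudeStyleToolUse): NormalizedToolCall {
  return { id: block.id, name: block.name, input: block.input };
}

function fromOpenAiStyle(call: OpenAiStyleCall): NormalizedToolCall {
  const parsed: unknown = JSON.parse(call.arguments); // throws on invalid JSON
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error(`Arguments for ${call.name} are not a JSON object`);
  }
  return { id: call.id, name: call.name, input: parsed as Record<string, unknown> };
}

// The caller supplies an ID because this input shape carries none.
function fromGeminiStyle(call: GeminiStyleCall, fallbackId: string): NormalizedToolCall {
  return { id: fallbackId, name: call.name, input: call.args };
}

const normalizedCall = fromOpenAiStyle({
  id: "call_1",
  name: "read_file",
  arguments: '{"path":"src/index.ts"}',
});
```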
9. Session Continuity
Claude Code: ✅ SDK-managed conversation
- Agent SDK assigns `session_id` for multi-turn conversations
- Agor captures and stores `sdk_session_id`
- Seamless continuity across multiple prompts
- No message replay needed
Codex: 🟡 History-based continuity
- No native session concept
- Agor passes message history array to each API call
- Seamless from user perspective, more work behind the scenes
Gemini: 🟡 History-based continuity
- Uses `setHistory()` to restore conversation state
- Similar to Codex pattern
- Works reliably but requires explicit history management
Performance Consideration: Claude’s session_id approach scales better for very long conversations (no need to resend full history). Codex/Gemini history-based approach works fine for typical sessions.
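The difference between the two continuity strategies shows up in what has to be sent on each turn. The sketch below uses hypothetical types (`StoredSession`, `TurnRequest`) and is not how any of the SDKs are actually invoked; it only contrasts the payload shapes.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

type TurnRequest =
  | { kind: "resume"; sdkSessionId: string; prompt: string }
  | { kind: "history"; history: ChatMessage[]; prompt: string };

interface StoredSession {
  sdkSessionId?: string; // present only for SDK-managed sessions
  messages: ChatMessage[]; // the stored conversation history
}

function buildTurnRequest(session: StoredSession, prompt: string): TurnRequest {
  if (session.sdkSessionId !== undefined) {
    // SDK-managed: send only the new prompt plus the session reference.
    return { kind: "resume", sdkSessionId: session.sdkSessionId, prompt };
  }
  // History-based: resend prior messages, so the payload grows with the conversation.
  return { kind: "history", history: session.messages.slice(), prompt };
}

const resumeRequest = buildTurnRequest({ sdkSessionId: "abc123", messages: [] }, "Continue");
const historyRequest = buildTurnRequest(
  { messages: [{ role: "user", content: "Start" }, { role: "assistant", content: "Done" }] },
  "Continue"
);
```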
10. Token Usage Tracking
Claude Code: ✅ Comprehensive token tracking
- Full extraction from SDK `token_usage` events
- Tracks: `input_tokens`, `output_tokens`, `total_tokens`
- Prompt caching metrics: `cache_read_tokens`, `cache_creation_tokens`
- Context window tracking via `model_usage` metadata
- Real-time usage display in Agor UI
- Historical usage analytics per session/task
Codex: ✅ Full tracking (input/output/cache)
- `turn.completed` events now emit usage payloads that Agor maps to `TokenUsage`
- Tracks: `input_tokens`, `output_tokens`, derived `total_tokens`
- Maps `cached_input_tokens` → `cache_read_tokens` for cost + context math
- Populates TaskHeader + Session footer pills with counts + $ estimate
- Feeds daemon cost calculator + context utilities automatically
Gemini: ❌ Not implemented
- Gemini API provides usage data but not yet extracted
- Future: Will track input/output token counts
Why it matters:
- Cost tracking - Monitor API spending per session/project
- Performance optimization - Identify prompt caching opportunities (Claude)
- Context management - Track how close you are to context limits
- Analytics - Compare efficiency across different tasks
Agor Implementation: Token data stored in messages table, aggregated for session-level analytics. Claude and Codex sessions display token/context pills today; Gemini support is next.
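The Codex mapping described above (`cached_input_tokens` → `cache_read_tokens`, plus a derived `total_tokens`) might look roughly like this. The interfaces are simplified stand-ins rather than Agor's actual `TokenUsage` type, and deriving the total as input plus output is an assumption.

```typescript
// Usage fields named in the text for Codex; optional here to be defensive.
interface CodexUsagePayload {
  input_tokens?: number;
  output_tokens?: number;
  cached_input_tokens?: number;
}

// Simplified stand-in for the normalized usage record.
interface NormalizedTokenUsage {
  input_tokens: number;
  output_tokens: number;
  total_tokens: number;
  cache_read_tokens: number;
}

function mapCodexUsage(usage: CodexUsagePayload): NormalizedTokenUsage {
  const input = usage.input_tokens ?? 0;
  const output = usage.output_tokens ?? 0;
  return {
    input_tokens: input,
    output_tokens: output,
    total_tokens: input + output, // assumption: total is derived as input + output
    cache_read_tokens: usage.cached_input_tokens ?? 0,
  };
}

// Illustrative numbers only.
const mappedUsage = mapCodexUsage({ input_tokens: 1200, output_tokens: 300, cached_input_tokens: 800 });
```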
11. Context Window Tracking
Claude Code: 🟡 Estimated (conversation size only)
- Tracks cumulative conversation tokens (input + output)
- Resets counter on compaction events
- Does NOT include system overhead (~46K tokens for CLAUDE.md + tools + MCP)
- Labeled as “Estimated” in UI to reflect limitations
- Shows 0-200K range accurately for conversation size
Codex: 🟡 Estimated (conversation + cache reads)
- Uses SDK `input_tokens + cache_read_tokens` per task
- Reuses Claude context-window utility for TaskHeader + footer pills
- Provides percentage vs. model-specific limit (gpt-5-codex, gpt-4o, etc.)
- Same limitations as Claude (system overhead not included)
Gemini: ❌ Not implemented
- No context tracking available yet
The Challenge:
Accurately tracking context window usage is surprisingly difficult because Anthropic doesn’t document which tokens count toward the 200K limit. Here’s what Agor does:
What we know:
- `input_tokens` - Fresh user prompt for this turn (definitely counts)
- `output_tokens` - Claude’s response (definitely counts)
- `cache_read_tokens` - Reading cached content (counts toward context but inflated by tool executions)
- `cache_creation_tokens` - Creating new cache entries (~46K baseline for CLAUDE.md + tools + MCP)
The problem with cache tokens:
- `cache_read_tokens` can be inflated by roughly 10x due to tool calls
  - Example: 38 tool calls × 40K cached system ≈ 1.5M tokens reported (about 8x the 200K limit!)
  - This is a billing metric, not an accurate context measure
- `cache_creation_tokens` represents system overhead but grows unpredictably
  - First task: ~46K (CLAUDE.md + system prompt + tools)
  - Later tasks: Can grow to 100K+ as more content gets cached
Agor’s approach:
// Sum conversation tokens across all tasks
context_window = Σ(task.input_tokens + task.output_tokens);
// Reset on compaction events
if (compaction_detected) {
context_window = 0; // Start fresh
}
What this gives you:
- ✅ Accurate tracking of conversation size (0-200K range)
- ✅ Proper accumulation across turns
- ✅ Automatic reset on compaction
- ❌ Excludes ~46K system overhead (CLAUDE.md, tools, MCP)
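For readers who want something runnable, the summation-and-reset rule can be sketched as below. The `compaction` flag and the choice to reset before counting the flagged task are assumptions made for the example, not details confirmed by the text; the token counts passed to `estimateContextWindow` are illustrative.

```typescript
interface TaskUsage {
  input_tokens: number;
  output_tokens: number;
  compaction?: boolean; // hypothetical flag: a compaction event preceded this task
}

const CONTEXT_LIMIT = 200000;

// Sums conversation tokens across tasks, starting over whenever compaction is flagged.
function estimateContextWindow(tasks: readonly TaskUsage[]): number {
  let contextWindow = 0;
  for (const task of tasks) {
    if (task.compaction) contextWindow = 0; // start fresh after compaction
    contextWindow += task.input_tokens + task.output_tokens;
  }
  return contextWindow;
}

// Formats a token count as a percentage of the context limit, to one decimal place.
function toPercent(tokens: number, limit: number = CONTEXT_LIMIT): string {
  return `${((tokens / limit) * 100).toFixed(1)}%`;
}

// Illustrative per-task numbers.
const withoutCompaction = estimateContextWindow([
  { input_tokens: 100, output_tokens: 50 },
  { input_tokens: 200, output_tokens: 25 },
]);
const withCompaction = estimateContextWindow([
  { input_tokens: 100, output_tokens: 50 },
  { input_tokens: 10, output_tokens: 5, compaction: true },
]);
```

With a 200K limit, `toPercent(8956)` yields `"4.5%"` and `toPercent(23546)` yields `"11.8%"`.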
Example session:
Task 1: 8,956 tokens → 4.5% (conversation only)
Task 2: 10,593 tokens → 5.3% (growing)
Task 3: 13,540 tokens → 6.8%
Task 7: 23,546 tokens → 11.8%
Actual context: ~70K tokens (24K conversation + 46K system overhead)
Why “Estimated”:
Without official Anthropic documentation on the exact formula, we can’t be 100% certain our calculation matches their internal tracking. We’re confident in the conversation size tracking, but system overhead remains an educated guess.
Future: If Anthropic publishes official context window calculation docs, we’ll update our implementation immediately.
Advanced Claude Code SDK Features
Claude Code’s Agent SDK exposes several advanced features that Agor fully supports. These are Claude-exclusive and not available in Codex or Gemini integrations.
Task Tool & Nested Operations
What it is: The Task tool spawns subsessions (e.g., Explore, Plan agents) to handle complex multi-step operations autonomously.
Agor Implementation:
- Nested operation tracking - Captures `parent_tool_use_id` from SDK to link child operations (Read, Grep, etc.) to parent Task invocation
- Chronological grouping - Task nested blocks appear exactly where the parent Task was called, not at the bottom
- Collapsed by default - Shows summary (tool count, success/error stats) when collapsed
- Expandable inspection - Click to see full tool chain executed inside the subsession
- Purple visual treatment - Distinct styling to differentiate from regular tool chains
UI Display:
┌─────────────────────────────────────────────┐
│ ⚡ Explore · Find session settings modal │
│ 3 tools · Read, Grep · ✓ 3 │
│ Found SessionSettingsModal in src/... │ [Collapsed]
└─────────────────────────────────────────────┘
When expanded:
├─ Read: src/components/SessionSettingsModal.tsx
├─ Grep: "Custom Context" in **/*.tsx
└─ Read: src/pages/Settings.tsx
Database Storage: `messages.parent_tool_use_id` links all nested operations to their spawning Task tool use ID.
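Linking child operations to their parent Task via `parent_tool_use_id` amounts to a group-by. The sketch below uses a simplified, hypothetical message shape and also builds the collapsed summary line (tool count plus distinct tool names).

```typescript
interface ToolMessage {
  tool_use_id: string;
  tool_name: string;
  parent_tool_use_id: string | null; // null for top-level tool uses
}

// Groups child operations under the Task tool use that spawned them, preserving order.
function groupByParent(messages: readonly ToolMessage[]): Record<string, ToolMessage[]> {
  const children: Record<string, ToolMessage[]> = {};
  for (const message of messages) {
    const parentId = message.parent_tool_use_id;
    if (parentId === null) continue;
    const siblings = children[parentId];
    if (siblings === undefined) {
      children[parentId] = [message];
    } else {
      siblings.push(message);
    }
  }
  return children;
}

// Builds a collapsed summary such as "3 tools · Read, Grep".
function summarize(nested: readonly ToolMessage[]): string {
  const names = nested.map((m) => m.tool_name);
  const distinct = names.filter((name, i) => names.indexOf(name) === i);
  return `${nested.length} tools · ${distinct.join(", ")}`;
}

const grouped = groupByParent([
  { tool_use_id: "task-1", tool_name: "Task", parent_tool_use_id: null },
  { tool_use_id: "t-1", tool_name: "Read", parent_tool_use_id: "task-1" },
  { tool_use_id: "t-2", tool_name: "Grep", parent_tool_use_id: "task-1" },
  { tool_use_id: "t-3", tool_name: "Read", parent_tool_use_id: "task-1" },
]);
const taskSummary = summarize(grouped["task-1"] || []);
```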
Extended Thinking Mode
What it is: Allocates additional tokens for Claude to use an internal “scratchpad” for reasoning through complex problems before generating responses.
Agor Implementation:
- Auto Mode (Default) - Detects keywords in prompts:
  - `think` → 4,000 tokens
  - `think hard`, `think deeply` → 10,000 tokens
  - `think harder`, `ultrathink` → 31,999 tokens
- Manual Mode - Set explicit budget (0-32k) in session settings
- Off Mode - Disable thinking to save costs
- Real-time streaming - Thinking blocks stream token-by-token via WebSocket (`thinking:chunk` events)
Example Prompts:
"think about the best architecture for this feature"
"think hard about potential edge cases before implementing"
"ultrathink this critical database migration strategy"
UI Display: Thinking blocks appear as separate content blocks in messages, visually distinct from response text. Rendered with gray background and ”🧠 Thinking…” header.
Cost Consideration: Extended thinking consumes additional tokens, and thinking tokens are billed as output tokens. Use judiciously for complex tasks where reasoning quality matters.
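The Auto Mode keyword tiers lend themselves to a small lookup. The following is a sketch rather than Agor's detector: the word-boundary matching and the fallback of 0 tokens when no keyword is present are assumptions.

```typescript
// Keyword tiers from the Auto Mode list, checked from largest budget to smallest
// so that "think harder" is not mistaken for "think hard" or plain "think".
const THINKING_TIERS: ReadonlyArray<{ pattern: RegExp; budget: number }> = [
  { pattern: /\b(think harder|ultrathink)\b/i, budget: 31999 },
  { pattern: /\bthink (hard|deeply)\b/i, budget: 10000 },
  { pattern: /\bthink\b/i, budget: 4000 },
];

function detectThinkingBudget(prompt: string): number {
  for (const tier of THINKING_TIERS) {
    if (tier.pattern.test(prompt)) return tier.budget;
  }
  return 0; // assumption: no keyword means no thinking budget
}

const budgets = [
  detectThinkingBudget("think about the best architecture for this feature"),
  detectThinkingBudget("think hard about potential edge cases before implementing"),
  detectThinkingBudget("ultrathink this critical database migration strategy"),
];
```

Checking the highest tier first matters because every higher-tier phrase also contains the word "think".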
TodoWrite Tool & Sticky Display
What it is: Claude can create structured task lists to track progress during complex implementations.
Agor Implementation:
- Real-time updates - Todo status changes stream live via WebSocket
- Sticky display - Latest todo always visible at bottom of running task, above typing indicator
- Visual states:
- ⏳ Pending - Not started yet
- ⚙️ In Progress - Currently working (spinning icon)
- ✅ Completed - Done
- Automatic cleanup - Completed todos collapse after a few seconds
- Historical view - All todos preserved in message history
UI Display:
Current Task:
┌─────────────────────────────────────────┐
│ ⚙️ Implementing database migration │ ← Always visible
└─────────────────────────────────────────┘
[Typing indicator appears below]
Best Practice: TodoWrite tool activates automatically for multi-step tasks. Users can see exactly what Claude is working on in real-time.
Permission Scopes
What it is: Granular control over tool approvals with three scope levels.
Agor Implementation:
- Once - Approve this single tool execution
- Session - Approve all future uses of this tool in the current session
- Repo/Project - Approve all future uses of this tool in the current repository
UI Display:
┌─────────────────────────────────────────────────┐
│ Claude wants to run: npm install │
│ │
│ Scope: │
│ ○ Once │
│ ● Session ← Selected │
│ ○ Repo │
│ │
│ [Approve] [Deny] │
└─────────────────────────────────────────────────┘
Database Storage: `permission_requests` table tracks scope decisions. Repo-level permissions stored in `repo_settings`.
Permission Modes:
- `'ask'` - Prompt for every tool use (default)
- `'auto'` - Auto-approve with scope rules applied
- `'allow-all'` - Never prompt (dangerous, use with caution)
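One way to read how modes and scopes interact is sketched below. This is an interpretation, not Agor's actual logic: it assumes `'auto'` skips the prompt only when an earlier session- or repo-scoped approval covers the tool, and the `ApprovalStore` shape is hypothetical.

```typescript
type PermissionMode = "ask" | "auto" | "allow-all";
type PermissionScope = "once" | "session" | "repo";

// Hypothetical in-memory record of earlier approvals, keyed by tool name.
interface ApprovalStore {
  sessionApproved: string[];
  repoApproved: string[];
}

// Returns true when the user must be prompted before the tool runs.
function needsPrompt(mode: PermissionMode, tool: string, store: ApprovalStore): boolean {
  if (mode === "allow-all") return false; // never prompt
  if (mode === "ask") return true; // prompt for every tool use
  // 'auto': skip the prompt only if a session- or repo-scoped approval covers this tool.
  return store.sessionApproved.indexOf(tool) === -1 && store.repoApproved.indexOf(tool) === -1;
}

// Records the scope chosen in the approval modal. "once" leaves nothing behind.
function recordApproval(scope: PermissionScope, tool: string, store: ApprovalStore): void {
  if (scope === "session" && store.sessionApproved.indexOf(tool) === -1) {
    store.sessionApproved.push(tool);
  } else if (scope === "repo" && store.repoApproved.indexOf(tool) === -1) {
    store.repoApproved.push(tool);
  }
}

const approvals: ApprovalStore = { sessionApproved: [], repoApproved: [] };
const promptBeforeApproval = needsPrompt("auto", "Bash", approvals);
recordApproval("session", "Bash", approvals);
const promptAfterApproval = needsPrompt("auto", "Bash", approvals);
```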
Real-Time Tool Execution Indicators
What it is: Live feedback when tools are executing, before results are available.
Agor Implementation:
- WebSocket events - `tool:executing` events broadcast when tools start
- Visual indicators - Spinning icons next to tool names
- Multiple tools - Shows all concurrently executing tools
- Dismisses automatically - When tool results arrive
UI Display:
⚙️ Running tools:
• Read src/components/TaskBlock.tsx
• Grep "parent_tool_use_id" in **/*.ts
Why it matters: For long-running tools (Bash commands, large file reads), users see progress immediately instead of staring at typing indicator.
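On the client side, showing all concurrently executing tools and dismissing each one when its result arrives can be modeled with a small tracker. The payload shape and class name are hypothetical; only the `tool:executing` event name comes from the text above.

```typescript
// Hypothetical payload for a `tool:executing` event; field names are illustrative.
interface ExecutingTool {
  toolUseId: string;
  label: string; // e.g. "Read src/components/TaskBlock.tsx"
}

class ExecutingToolTracker {
  private active: ExecutingTool[] = [];

  // Call when a `tool:executing` event arrives.
  start(tool: ExecutingTool): void {
    const alreadyTracked = this.active.some((t) => t.toolUseId === tool.toolUseId);
    if (!alreadyTracked) this.active.push(tool);
  }

  // Call when the tool's result arrives, which dismisses its indicator.
  finish(toolUseId: string): void {
    this.active = this.active.filter((t) => t.toolUseId !== toolUseId);
  }

  // Labels of all tools that are still running, in start order.
  labels(): string[] {
    return this.active.map((t) => t.label);
  }
}

const tracker = new ExecutingToolTracker();
tracker.start({ toolUseId: "tool-1", label: "Read src/components/TaskBlock.tsx" });
tracker.start({ toolUseId: "tool-2", label: 'Grep "parent_tool_use_id" in **/*.ts' });
tracker.finish("tool-1"); // the Read result arrived first
```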
AgentChain Visualization
What it is: Groups consecutive tool-only messages into collapsed chains, hiding implementation details unless user wants to inspect.
Agor Implementation:
- Auto-grouping - 3+ consecutive messages with only tools/thinking → collapsed chain
- Summary display - Shows tool count and types when collapsed
- Expandable - Click to see full tool execution sequence
- Mixed messages - If assistant adds text response, chain breaks and text displays normally
UI Display:
┌─────────────────────────────────────────┐
│ 🔗 Agent Operations (5 tools) │ [Collapsed]
│ Read, Edit, Bash, Grep, Write │
└─────────────────────────────────────────┘
When expanded:
├─ Read: src/types/Message.ts
├─ Edit: Added parent_tool_use_id field
├─ Grep: "parent_tool_use_id" in **/*.ts
├─ Write: TaskNestedBlock.tsx (NEW)
└─ Bash: pnpm build
Progressive disclosure: Users see high-level progress by default, can drill into details when debugging or learning.
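The auto-grouping rule (3+ consecutive messages with only tools/thinking collapse into a chain, and a text response breaks it) can be sketched as a single pass over the message list. The `ChatItem` and `RenderBlock` types are hypothetical simplifications.

```typescript
interface ChatItem {
  id: string;
  hasText: boolean; // true if the assistant message includes a text response
  tools: string[];
}

type RenderBlock =
  | { kind: "message"; item: ChatItem }
  | { kind: "chain"; items: ChatItem[] };

const MIN_CHAIN_LENGTH = 3;

function groupIntoChains(items: readonly ChatItem[]): RenderBlock[] {
  const blocks: RenderBlock[] = [];
  let run: ChatItem[] = [];

  // Emits the pending run as one chain if long enough, otherwise as individual messages.
  const flush = (): void => {
    if (run.length >= MIN_CHAIN_LENGTH) {
      blocks.push({ kind: "chain", items: run });
    } else {
      for (const item of run) blocks.push({ kind: "message", item });
    }
    run = [];
  };

  for (const item of items) {
    if (item.hasText) {
      flush(); // a text response breaks the chain
      blocks.push({ kind: "message", item });
    } else {
      run.push(item);
    }
  }
  flush();
  return blocks;
}

const toolOnly = (id: string, tool: string): ChatItem => ({ id, hasText: false, tools: [tool] });
const renderBlocks = groupIntoChains([
  toolOnly("m1", "Read"),
  toolOnly("m2", "Edit"),
  toolOnly("m3", "Grep"),
  toolOnly("m4", "Bash"),
  { id: "m5", hasText: true, tools: [] },
  toolOnly("m6", "Read"),
  toolOnly("m7", "Write"),
]);
```

In this example the first four tool-only messages collapse into one chain, while the last two stay as ordinary messages because the run is shorter than three.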
Other Claude-Exclusive Features in Agor
Rich Tool Metadata:
- Detailed `tool_use` objects with ID, name, input parameters
- Tool result blocks with structured content types
- Enables sophisticated visualizations (file diffs, syntax highlighting)
Git State Tracking:
- SDK automatically captures branch, commit SHA, dirty state
- Displayed in task metadata pills
- No manual git querying needed
Session Import/Export:
- Parse Claude Code JSONL transcripts from `~/.claude/projects/`
- Replay sessions in Agor
- Full fidelity preservation of tool uses, timestamps, git state
MCP Integration (also available with Gemini):
- First-class MCP server support
- Session-level MCP server selection
- Agor UI for managing MCP configs
Choosing an Agent
Use Claude Code when:
- You need rich project instructions (CLAUDE.md)
- MCP server integration is required
- You want detailed tool execution visualization
- Session import/export is important
- You need the best permission request UX
- Token usage tracking and cost monitoring are important
Use Codex when:
- You prefer OpenAI’s models
- You don’t need MCP integration (STDIO support exists in SDK but isn’t wired up in Agor yet)
- Basic permissions are sufficient
- You want token/cost/context pills powered by OpenAI usage data
Use Gemini when:
- You prefer Google’s models
- Cost is a primary concern (Gemini pricing)
- Streaming is important (Gemini has good streaming)
- You need MCP integration with Google’s SDK (full support via Agor UI)
Future Improvements
As agent SDKs evolve, Agor will gain new capabilities.
Agent integration priorities:
- Token tracking for Gemini - Extract usage data from Google APIs (Codex shipped Jan 2025)
- Codex MCP support - Wire the SDK’s existing STDIO MCP transport into Agor, and add HTTP/SSE if the SDK gains those transports
- Gemini project files - If Google adds instruction file support
- Better stop controls - More graceful cancellation for all agents
- Richer tool metadata - If Codex/Gemini expose more tool details