Agent Teams Report
A survey of how individuals and teams are running multi-agent coding setups (Feb 2026).
1. Boris Cherny -- Creator of Claude Code, Head of Claude Code @ Anthropic
Scale: 10-15 concurrent sessions, 20-27 PRs/day, 100% AI-written code
Business: Employee at Anthropic. Claude Code ~$1B annualized revenue in 6 months.
Boris (human)
|
|──> 5 terminal tabs (iTerm, OS notifications)
|──> 5-10 browser sessions (claude.ai/code)
|──> mobile sessions (fire-and-forget)
|
v
Each session = independent Claude Code instance
|
|── Model: Opus 4.5 + extended thinking (always)
|── CLAUDE.md: shared knowledge base (updated weekly)
|── Plan Mode first, then auto-accept
|
v
┌───────────────┐
│  PostToolUse  │ <-- formatting hooks fix style drift
│     hooks     │
└───────┬───────┘
|
v
┌───────────────┐
│ Verification │ <-- Chrome extension, agent self-tests
│ loops │
└───────┬───────┘
|
v
┌───────────────┐
│ PR │ <-- /commit-push-pr slash command
└───────────────┘
"Teleport" hands sessions between terminal ↔ browser ↔ mobile
Key practices:
- CLAUDE.md (not AGENTS.md) as living knowledge base -- errors get documented so they never repeat
- /permissions pre-allows safe bash commands
- Subagents: code-simplifier, verify-app
- 259 PRs in 30 days. 90% of Claude Code's own codebase written by Claude Code.
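The PostToolUse hook in the pipeline above is the piece that fixes style drift. As a toy sketch of the idea (the hook runner and formatter here are stand-ins, not Claude Code's actual hook configuration), a post-edit hook simply reruns a formatter over every file the tool touched:

```python
def post_tool_use(edited_files, formatter):
    """Toy PostToolUse hook: reformat every file an edit tool touched."""
    return {path: formatter(path) for path in edited_files}

# Stand-in formatter: report what a real one (black, prettier, gofmt) would do
results = post_tool_use(
    ["src/app.py", "src/util.py"],
    lambda path: f"formatted {path}",
)
```

Because the hook runs after every edit, style never drifts far enough for a human to notice in review.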
2. Claude Code Native Multi-Agent -- Four Layers
Status: Subagents + SDK stable, Agent Teams experimental Business: Part of Claude Code ($200/mo Max plan, or API usage)
Layer 1: SUBAGENTS (in-session)
────────────────────────────────
Parent agent
|
|── Task("Explore", "find all API routes") <-- Haiku, read-only
|── Task("code-reviewer", "review changes") <-- custom .claude/agents/*.md
|── Task("general", "refactor auth") <-- background, full tools
|
v
Results summarized back to parent
Subagents CANNOT talk to each other
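The fan-out above can be sketched with plain asyncio: each subagent runs in isolation and only a summary flows back to the parent. The helper names are illustrative, not the real Task API:

```python
import asyncio

async def subagent(kind: str, prompt: str) -> str:
    """Stand-in for Task(kind, prompt): runs in isolation, returns a summary."""
    await asyncio.sleep(0)  # placeholder for the actual model call
    return f"[{kind}] summary of: {prompt}"

async def parent() -> list[str]:
    # Subagents run concurrently but cannot message each other;
    # the parent only ever sees their summarized results.
    return await asyncio.gather(
        subagent("Explore", "find all API routes"),
        subagent("code-reviewer", "review changes"),
        subagent("general", "refactor auth"),
    )

summaries = asyncio.run(parent())
```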
Layer 2: AGENT TEAMS (cross-session, experimental)
───────────────────────────────────────────────────
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
tmux -CC
|
v
┌──────────────┐
│ Team Lead │ <-- Opus, plans work, assigns tasks
└──────┬───────┘
|
┌────┼────┐
v v v
[T1] [T2] [T3] <-- Teammates (Sonnet/Haiku), each in tmux pane
| | |
v v v
Shared task list + mailbox system
Direct inter-agent messaging
Dependencies: task A blocks task B
Display: in-process OR tmux split panes
Quality gates: TeammateIdle, TaskCompleted hooks
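The coordination primitives listed above -- shared task list, blocking dependencies, and a mailbox -- can be modeled in a few lines. This is a toy scheduler, not the actual Agent Teams implementation:

```python
from collections import defaultdict, deque

class TaskBoard:
    """Toy model of a shared task list with dependencies and a mailbox."""
    def __init__(self):
        self.deps = defaultdict(set)       # task -> unfinished prerequisites
        self.done = set()
        self.mailbox = defaultdict(deque)  # teammate -> pending messages

    def add(self, task, blocked_by=()):
        self.deps[task] = set(blocked_by)

    def ready(self):
        # Tasks whose prerequisites have all completed
        return [t for t, d in self.deps.items()
                if t not in self.done and d <= self.done]

    def complete(self, task):
        self.done.add(task)

    def send(self, teammate, msg):
        self.mailbox[teammate].append(msg)

board = TaskBoard()
board.add("design schema")
board.add("write migration", blocked_by=["design schema"])
board.send("T2", "schema draft ready for review")
```

Initially `board.ready()` returns only the schema task; once it is marked complete, the migration task unblocks.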
Layer 3: AGENT SDK (programmatic)
─────────────────────────────────
import asyncio

from claude_code import Agent

agents = [
    Agent("planner", model="opus", tools=[...]),
    Agent("coder", model="sonnet", tools=[...]),
    Agent("tester", model="haiku", tools=[...]),
]
results = await asyncio.gather(*[a.run(task) for a in agents])
Full control: hooks as callbacks, MCP, permissions, session resume
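A minimal sketch of the hooks-as-callbacks idea; the function and hook names here are invented rather than the SDK's real signatures:

```python
import asyncio

async def run_agent(name, task, hooks=None):
    """Hypothetical wrapper: fire user callbacks around an agent run."""
    hooks = hooks or {}
    if "pre_task" in hooks:
        hooks["pre_task"](name, task)
    await asyncio.sleep(0)              # placeholder for the real model call
    result = f"{name} completed: {task}"
    if "post_task" in hooks:
        hooks["post_task"](name, result)
    return result

log = []
hooks = {"pre_task":  lambda n, t: log.append(("start", n)),
         "post_task": lambda n, r: log.append(("done", n))}
result = asyncio.run(run_agent("tester", "run unit tests", hooks))
```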
Layer 4: GIT WORKTREES (manual)
───────────────────────────────
claude -w feature-1 & claude -w feature-2 & claude -w feature-3
| | |
v v v
.worktrees/feature-1 .worktrees/feature-2 .worktrees/feature-3
(independent branch) (independent branch) (independent branch)
Human merges when done. No coordination.
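This layer is plain git underneath. A small helper can build the equivalent `git worktree add` commands; the `.worktrees/` layout mirrors the diagram, and actually running them requires an existing repo:

```python
def worktree_cmds(features):
    """Build `git worktree add` commands, one isolated branch per feature."""
    return [
        ["git", "worktree", "add", "-b", f, f".worktrees/{f}"]
        for f in features
    ]

cmds = worktree_cmds(["feature-1", "feature-2", "feature-3"])
# Each command creates an independent working copy on its own branch,
# e.g. executed via subprocess.run(cmd, check=True) from the repo root.
```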
3. Simon Willison -- Parallel Agents, Different Models
Scale: 2-3 research projects/day across multiple agents
Business: Independent developer, creator of Datasette. No product to sell -- writes about what works.
Simon (human)
|
├──> Claude Code (Sonnet 4.5) <-- primary terminal agent
├──> Codex CLI (GPT-5-Codex) <-- second terminal agent
├──> Claude Code for Web <-- async, fire-and-forget
├──> Codex Cloud <-- async
└──> Jules <-- async
|
v
Each in separate terminal / browser tab
Isolation: fresh /tmp checkouts per task
No coordination framework -- human is the router
── tools ──────────────────────────
llm CLI <-- logs everything to SQLite, analyzed via Datasette
files-to-prompt <-- convert repo files to LLM context
shot-scraper <-- automated screenshots for visual testing
Key concepts:
- "Agents = models using tools in a loop" -- his canonical definition, distilled from the 211 competing definitions he collected
- Vibe Engineering (not vibe coding): 12 practices including automated tests, git discipline, code review
- Bottleneck is human review, not agent speed
- Skills > MCP for simplicity and low token overhead
- "Lethal trifecta" security model: private data + untrusted content + external communication = danger
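The lethal trifecta rule is mechanical enough to encode directly: an agent setup is dangerous only when all three risk factors are present at once.

```python
def lethal_trifecta(private_data: bool, untrusted_content: bool,
                    external_comm: bool) -> bool:
    """True when an agent setup combines all three risk factors."""
    return private_data and untrusted_content and external_comm

# A coding agent reading private repos plus untrusted web pages, with
# network access, trips the check; remove any one leg and it passes.
danger = lethal_trifecta(True, True, True)
safe = lethal_trifecta(True, True, False)
```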
4. dmux -- Parallel Agents via tmux + Worktrees
Scale: N concurrent agents, git worktree isolation per pane
Business: MIT, fully free. Creator (Justin Schroeder) monetizes FormKit Pro ($149-$1,250).
Open source: github.com/standardagents/dmux
dmux TUI
|
|──> press 'n'
|
v
┌──────────────────┐
│ AI-generate slug │ <-- OpenRouter (gpt-4o-mini)
└────────┬─────────┘
|
v
┌─────────────────┐
│ Create git │ <-- .dmux/worktrees/<slug>/
│ worktree │ full independent working copy
└────────┬────────┘
|
v
┌─────────────────┐
│ Split tmux pane │
│ Launch agent │ <-- claude/codex/opencode (--acceptEdits)
└────────┬────────┘
|
v
┌─────────────────┐
│ Agent works │ <-- status via LLM analysis of terminal (1s poll)
│ autonomously │
└────────┬────────┘
|
v
press 'm' to merge
|
v
┌─────────────────┐
│ AI commit msg │ <-- conventional commits via OpenRouter
│ Merge to main │
│ Remove worktree │
└─────────────────┘
Hooks: worktree_created, pre_merge, post_merge
A/B mode: two agents, same prompt, side-by-side
Web dashboard + REST API for programmatic control
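The 1-second status poll reduces to snapshotting pane text and classifying it. Here dmux's LLM-based classifier is stubbed with a keyword heuristic for illustration:

```python
def classify_status(terminal_text: str) -> str:
    """Stub for dmux's LLM status analysis (keyword heuristic here)."""
    lowered = terminal_text.lower()
    if "error" in lowered or "traceback" in lowered:
        return "needs attention"
    if "waiting for input" in lowered or lowered.rstrip().endswith("?"):
        return "waiting"
    return "working"

# In dmux this runs on a ~1s loop over each pane's captured output
status = classify_status("Running tests... 42 passed\nWaiting for input")
```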
5. OpenClaw -- Open-Source AI Agent Framework
Scale: 213K+ GitHub stars, 50+ integrations
Business: MIT license, free to self-host. OpenClaw Cloud planned at $39/mo. Real cost: $5-30/mo in LLM API fees.
Creator: Peter Steinberger (ex-PSPDFKit, acqui-hired by OpenAI Feb 2026)
User prompt
|
v
┌──────────────────┐
│ OpenClaw gateway │ <-- local-first, 50+ integrations
│ (agent router)   │ messaging, coding, browser, etc.
└────────┬─────────┘
|
┌─────┼─────┐
v v v
[Sub-1][Sub-2][Sub-3] <-- sub-agent collaboration
| | | 40% accuracy boost vs monolithic prompting
└─────┼─────┘
|
v
┌─────────────────┐
│ Output │ <-- declarative agent config in YAML
└─────────────────┘
Not primarily a coding tool -- general-purpose AI assistant
Can run with local models (Ollama + Llama 3.3) for $0/mo
Will remain open source under OpenAI stewardship
6. Superconductor -- Parallel Cloud Agents with Live Previews
Scale: N agents per ticket, cloud sandboxes, live browser previews
Business: Closed-source SaaS by Volition (Gradescope founders). BYOK model. Pricing undisclosed, early access.
Create ticket (informal)
|
v
┌─────────────────┐
│ Launch N agents │ <-- each in isolated cloud container
│ on same ticket │ (Modal / Morph Cloud)
└────────┬────────┘
|
┌─────┼─────┐
v v v
[Agent1][Agent2][Agent3] <-- Claude/Codex/Amp/Gemini
| | |
v v v
[Live] [Live] [Live] <-- browser previews ~30s
[prev] [prev] [prev]
| | |
└─────┼─────┘
|
v
┌──────────────────┐
│ Compare previews │ <-- visual diff, interact with each
│ Select best      │
│ One-click PR     │
└──────────────────┘
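The compare-and-select step is a best-of-N pattern. A sketch with a pluggable scoring function; the candidate data and scoring criteria here are invented:

```python
def select_best(candidates, score):
    """Run the same ticket through N agents, keep the highest-scoring result."""
    return max(candidates, key=score)

# Hypothetical results from three agents working the same ticket
results = [
    {"agent": "Agent1", "tests_passed": 10, "diff_lines": 120},
    {"agent": "Agent2", "tests_passed": 12, "diff_lines": 300},
    {"agent": "Agent3", "tests_passed": 12, "diff_lines": 90},
]

# Prefer more passing tests, then smaller diffs as the tiebreaker
best = select_best(results, lambda r: (r["tests_passed"], -r["diff_lines"]))
```

In Superconductor the "score" is a human interacting with live previews; the structure is the same fan-out-then-pick.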
7. 8090 Software Factory -- Enterprise Agent Platform
Scale: Multi-repo code modernization
Business: Proprietary. $200/seat/mo (Team), custom Enterprise, managed delivery from $1M/yr. Funded by Chamath Palihapitiya personally.
┌─────────────────┐
│ Refinery │ <-- reverse-engineer codebase into knowledge graph
└────────┬────────┘
|
v
┌─────────────────┐
│ Planner │ <-- AI generates migration/transformation plans
└────────┬────────┘
|
v
┌─────────────────┐
│ Foundry │ <-- specialized agents execute plan
│ (agent workers) │ across multiple repos
└────────┬────────┘
|
v
┌─────────────────┐
│ Validator │ <-- quality gate, CI, tests
└────────┬────────┘
|
v
┌─────────────────┐
│ Factory Line │ <-- full pipeline for enterprise
│ output: PRs │ code modernization at scale
└─────────────────┘
8. Terragon -- Background Fire-and-Forget (SHUT DOWN)
Scale: ~30 concurrent tasks/day, auto-PR creation
Business: SaaS subscription. Shut down Feb 9, 2026. Code released Apache-2.0.
Why: Native background agents from Claude Code and Codex commoditized the orchestration layer.
Create task (web / CLI / GitHub / Slack / mobile)
|
v
┌─────────────────┐
│ Cloud sandbox │ <-- isolated container, clone repo, create branch
└────────┬────────┘
|
v
┌─────────────────┐
│ Agent works │ <-- background, checkpoints pushed to GitHub
│ autonomously │ AI-generated commits
└────────┬────────┘
|
v
┌─────────────────┐
│ PR created │ <-- automatic when done
└────────┬────────┘
|
v
Human reviews and merges
DEAD: Codex reached 28% agent usage on Terragon within 1 month
Native background agents made the wrapper unnecessary
9. Vadim Strizheus -- "AI Employees" for VugolaAI
Scale: Claims 14 AI employees, 95% automated
Business: VugolaAI (video clipping/scheduling SaaS). Free tier. Solana token (VGLA).
Long-form video input
|
v
┌─────────────────┐
│ AI Moment │ <-- "AI employee" 1: detect viral-worthy segments
│ Detection │
└────────┬────────┘
|
v
┌─────────────────┐
│ Auto-Clipping │ <-- "AI employee" 2-N: extract, reframe, caption
│ + Captioning │
└────────┬────────┘
|
v
┌─────────────────┐
│ Branding + │ <-- template application
│ Formatting │
└────────┬────────┘
|
v
┌─────────────────┐
│ Multi-Platform │ <-- TikTok, YouTube, Instagram, X, LinkedIn
│ Scheduling │
└─────────────────┘
Note: Specific agent breakdown from video tweet, not independently verified.
The product itself IS the AI automation -- "employees" = AI pipeline stages.
10. Notable Voices
Francois Chollet (@fchollet)
"Sufficiently advanced agentic coding is essentially machine learning"
Does NOT run a multi-agent setup. Warns about maintaining "sprawling mess of AI-generated legacy code." Useful contrarian check.
Andrej Karpathy
Coined "vibe coding" (Feb 2025), then abandoned it for "agentic engineering" (Feb 2026). Evolution: accept all AI output → require specs, review, test suites.
Addy Osmani
Defined Conductor (sequential) vs Orchestrator (parallel) agent frameworks. Identified the "80% problem" -- last 20% takes as long as first 80%.
Comparison Matrix
| System | Type | Open Source | Pricing | Agents | Key Feature |
|---|---|---|---|---|---|
| Boris Cherny | Individual workflow | N/A (uses Claude Code) | $200/mo Max | 10-15 parallel CC | Teleport between devices |
| Claude Code Teams | Built-in | N/A (product feature) | $200/mo Max or API | N (tmux panes) | Shared task list + mailbox |
| Claude Agent SDK | Library | MIT | API usage | Programmatic | Full orchestration control |
| Simon Willison | Individual workflow | N/A | Multi-subscription | CC + Codex + async | Human as router |
| dmux | OSS tool | MIT | Free | N (tmux + worktrees) | A/B agent comparison |
| OpenClaw | OSS framework | MIT | Free / $39 Cloud | Sub-agents | 213K stars, joined OpenAI |
| Superconductor | SaaS | No | Undisclosed (BYOK) | N per ticket | Live browser previews |
| 8090 | Enterprise | No | $200/seat/mo+ | Factory Line | Knowledge graph + modernization |
| Terragon | SaaS (dead) | Apache-2.0 (post-shutdown) | Was subscription | Background agents | Shut down Feb 2026 |
| VugolaAI | Product | No | Free tier | 14 "AI employees" | Video pipeline automation |
Common Patterns
What works across all setups:
1. ISOLATION -- worktrees, containers, or separate sessions
   agents must not conflict with each other
2. PLAN FIRST -- Opus/expensive model plans, cheaper model executes
   Boris: Plan Mode → auto-accept
   Agent Teams: team lead plans, teammates execute
3. MEMORY -- CLAUDE.md / AGENTS.md / progress.txt
   errors documented so they never repeat
   updated by the agent, not the human
4. VERIFICATION -- automated tests, browser screenshots, self-review
   humans review throughput, not individual lines
5. MODEL TIERING -- Opus for planning ($$$), Sonnet for coding ($$), Haiku for tests ($)
   "correct answer costs less total iteration time than fast wrong ones"
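Model tiering reduces to a routing table keyed by task type. A minimal sketch -- the tier names match the pattern above, but the mapping itself is illustrative:

```python
MODEL_TIERS = {
    "plan": "opus",    # expensive, highest-quality reasoning
    "code": "sonnet",  # mid-tier workhorse
    "test": "haiku",   # cheap, high-volume checks
}

def route(task_type: str) -> str:
    """Pick the cheapest model tier that fits the task."""
    return MODEL_TIERS.get(task_type, "sonnet")  # default to the mid tier

model = route("plan")
```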
What doesn't work:
1. NO TESTS -- agents spiral without verification signals
2. NO MEMORY -- same mistakes repeat across sessions
3. SHARED STATE -- agents editing same files = merge hell
4. NO REVIEW -- "vibe coding" produces unmaintainable code (Chollet, Karpathy)
Business Model Summary
FREE / OSS:
  dmux (MIT) -- monetizes separately via FormKit Pro
  OpenClaw (MIT) -- Cloud tier planned $39/mo, creator joined OpenAI
  claude-flow (MIT) -- reputation/consulting play
  Ralph/Compound (MIT) -- promotes Amp (Sourcegraph)
  Terragon (Apache-2.0) -- released on shutdown
SAAS / COMMERCIAL:
  Superconductor -- BYOK, undisclosed platform fee, early access
  8090 -- $200/seat/mo, $1M/yr managed delivery
  VugolaAI -- free tier + crypto token (VGLA)
PLATFORM:
  Claude Code -- $200/mo Max plan or API usage (~$1B ARR)
  OpenAI Codex -- subscription + API
  GitHub Agent HQ -- Copilot subscription (multi-vendor agents)
The trend: orchestration tools struggle to monetize when platforms
add native multi-agent features (see: Terragon shutdown).
Survivors either go enterprise (8090) or stay free and build community (dmux, OpenClaw).