
Agent Teams Report

10 min read

A survey of how individuals and teams are running multi-agent coding setups (Feb 2026).


1. Boris Cherny -- Creator of Claude Code, Head of Claude Code @ Anthropic

Scale: 10-15 concurrent sessions, 20-27 PRs/day, 100% AI-written code
Business: Employee at Anthropic. Claude Code ~$1B annualized revenue in 6 months.

Boris (human)
  |
  |──> 5 terminal tabs (iTerm, OS notifications)
  |──> 5-10 browser sessions (claude.ai/code)
  |──> mobile sessions (fire-and-forget)
  |
  v
Each session = independent Claude Code instance
  |
  |── Model: Opus 4.5 + extended thinking (always)
  |── CLAUDE.md: shared knowledge base (updated weekly)
  |── Plan Mode first, then auto-accept
  |
  v
┌───────────────┐
│ PostToolUse   │ <-- formatting hooks fix style drift
│ hooks         │
└───────┬───────┘
        |
        v
┌───────────────┐
│ Verification  │ <-- Chrome extension, agent self-tests
│ loops         │
└───────┬───────┘
        |
        v
┌───────────────┐
│ PR            │ <-- /commit-push-pr slash command
└───────────────┘

"Teleport" hands sessions between terminal ↔ browser ↔ mobile

Key practices:

  • CLAUDE.md (not AGENTS.md) as living knowledge base -- errors get documented so they never repeat
  • /permissions pre-allows safe bash commands
  • Subagents: code-simplifier, verify-app
  • 259 PRs in 30 days. 90% of Claude Code's own codebase written by Claude Code.
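The PostToolUse hooks and /permissions pre-allowances above live in Claude Code's settings file. A minimal sketch of what that configuration might look like -- the formatter script and the allowed commands are placeholders, and the exact schema is in the hooks documentation:

```json
{
  "permissions": {
    "allow": ["Bash(git status:*)", "Bash(npm test:*)"]
  },
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "./scripts/format-changed.sh" }
        ]
      }
    ]
  }
}
```

The matcher fires the command after every Edit or Write tool call, which is how formatting drift gets fixed without burning model tokens on style.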

Full reference | Boris's Twitter thread


2. Claude Code Native Multi-Agent -- Four Layers

Status: Subagents + SDK stable; Agent Teams experimental
Business: Part of Claude Code ($200/mo Max plan, or API usage)

Layer 1: SUBAGENTS (in-session)
────────────────────────────────
Parent agent
  |
  |── Task("Explore", "find all API routes")   <-- Haiku, read-only
  |── Task("code-reviewer", "review changes")  <-- custom .claude/agents/*.md
  |── Task("general", "refactor auth")         <-- background, full tools
  |
  v
Results summarized back to parent
Subagents CANNOT talk to each other
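Custom subagents like code-reviewer above are defined as markdown files in .claude/agents/. A sketch of the shape such a file takes -- the frontmatter fields and prompt body here are illustrative; check the subagents docs for the supported keys:

```markdown
---
name: code-reviewer
description: Reviews changes for bugs, style drift, and missing tests
tools: Read, Grep, Glob
---
You are a code reviewer. Inspect the changes you are pointed at,
flag likely bugs, and suggest simplifications. Do not edit files.
```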


Layer 2: AGENT TEAMS (cross-session, experimental)
───────────────────────────────────────────────────
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
tmux -CC
  |
  v
┌──────────────┐
│ Team Lead    │ <-- Opus, plans work, assigns tasks
└──────┬───────┘
       |
  ┌────┼────┐
  v    v    v
[T1]  [T2]  [T3]  <-- Teammates (Sonnet/Haiku), each in tmux pane
  |    |    |
  v    v    v
Shared task list + mailbox system
Direct inter-agent messaging
Dependencies: task A blocks task B

Display: in-process OR tmux split panes
Quality gates: TeammateIdle, TaskCompleted hooks
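The shared task list and mailbox are easiest to see in miniature. A hypothetical Python sketch of the coordination pattern -- Team, Task, and the method names are illustrative, not the real Agent Teams API:

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

@dataclass
class Task:
    id: str
    description: str
    blocked_by: set = field(default_factory=set)
    done: bool = False

class Team:
    """Shared task list plus per-agent mailboxes (illustrative only)."""
    def __init__(self):
        self.tasks = {}
        self.mailboxes = defaultdict(deque)  # agent name -> pending messages

    def add_task(self, task_id, description, blocked_by=()):
        self.tasks[task_id] = Task(task_id, description, set(blocked_by))

    def ready(self):
        # A task is ready when it is not done and every blocker is complete
        return [t for t in self.tasks.values()
                if not t.done and all(self.tasks[b].done for b in t.blocked_by)]

    def complete(self, task_id):
        self.tasks[task_id].done = True

    def send(self, to_agent, message):
        # Direct inter-agent messaging: drop a note in the recipient's mailbox
        self.mailboxes[to_agent].append(message)

team = Team()
team.add_task("A", "design schema")
team.add_task("B", "write migration", blocked_by=["A"])
print([t.id for t in team.ready()])  # task B is blocked until A completes
team.complete("A")
print([t.id for t in team.ready()])  # now only B remains
team.send("T2", "schema is ready")
```

The dependency check in ready() is the "task A blocks task B" behavior; hooks like TeammateIdle would fire around these state transitions.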


Layer 3: AGENT SDK (programmatic)
─────────────────────────────────
# Illustrative sketch -- see the Agent SDK docs for exact imports and signatures
import asyncio
from claude_code import Agent

agents = [
    Agent("planner", model="opus", tools=[...]),
    Agent("coder", model="sonnet", tools=[...]),
    Agent("tester", model="haiku", tools=[...]),
]
results = await asyncio.gather(*[a.run(task) for a in agents])

Full control: hooks as callbacks, MCP, permissions, session resume


Layer 4: GIT WORKTREES (manual)
───────────────────────────────
claude -w feature-1 &  claude -w feature-2 &  claude -w feature-3
         |                      |                      |
         v                      v                      v
.worktrees/feature-1   .worktrees/feature-2   .worktrees/feature-3
(independent branch)   (independent branch)   (independent branch)

Human merges when done. No coordination.
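The same layout can be reproduced by hand with plain git. A sketch that builds a scratch repository and one worktree per agent -- paths and branch names are examples:

```python
import pathlib
import subprocess
import tempfile

def git(*args, cwd):
    """Run a git command in the given directory, raising on failure."""
    return subprocess.run(["git", *args], cwd=cwd, check=True,
                          capture_output=True, text=True)

# Scratch repository standing in for a real project
repo = pathlib.Path(tempfile.mkdtemp())
git("init", cwd=repo)
(repo / "README.md").write_text("demo\n")
git("add", ".", cwd=repo)
git("-c", "user.email=demo@example.com", "-c", "user.name=demo",
    "commit", "-m", "init", cwd=repo)

# One worktree (and branch) per parallel agent: isolated working copies,
# so concurrent edits never touch the same checkout
for feature in ["feature-1", "feature-2", "feature-3"]:
    git("worktree", "add", f".worktrees/{feature}", "-b", feature, cwd=repo)

out = git("worktree", "list", cwd=repo).stdout
print(out)  # main checkout plus three feature worktrees
```

Each worktree shares the object store but has its own branch and index, which is why no coordination layer is needed until merge time.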

Full reference | Agent Teams docs


3. Simon Willison -- Parallel Agents, Different Models

Scale: 2-3 research projects/day across multiple agents
Business: Independent developer, creator of Datasette. No product to sell -- writes about what works.

Simon (human)
  |
  ├──> Claude Code (Sonnet 4.5)  <-- primary terminal agent
  ├──> Codex CLI (GPT-5-Codex)   <-- second terminal agent
  ├──> Claude Code for Web       <-- async, fire-and-forget
  ├──> Codex Cloud               <-- async
  └──> Jules                     <-- async
  |
  v
Each in separate terminal / browser tab
Isolation: fresh /tmp checkouts per task
No coordination framework -- human is the router

── tools ──────────────────────────
llm CLI          <-- logs everything to SQLite, analyzed via Datasette
files-to-prompt  <-- convert repo files to LLM context
shot-scraper     <-- automated screenshots for visual testing
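The llm CLI's core trick -- logging every prompt/response pair to SQLite so it can be analyzed later with Datasette -- looks roughly like this. The schema below is a simplified stand-in, not llm's actual table layout:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE responses (
        model    TEXT,
        prompt   TEXT,
        response TEXT,
        ts       DEFAULT CURRENT_TIMESTAMP
    )
""")
log = [
    ("claude-sonnet-4.5", "find all API routes", "...routes listed..."),
    ("gpt-5-codex", "refactor auth", "...diff..."),
]
db.executemany(
    "INSERT INTO responses (model, prompt, response) VALUES (?, ?, ?)", log)

# The kind of question the log answers later: which model did what, how often?
rows = db.execute("""
    SELECT model, COUNT(*) AS n FROM responses
    GROUP BY model ORDER BY n DESC
""").fetchall()
print(rows)
```

Because everything lands in one database, "human as router" decisions (which agent gets which task) can be audited after the fact.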

Key concepts:

  • "Agents = models using tools in a loop" -- his canonical definition (he has collected 211 competing ones)
  • Vibe Engineering (not vibe coding): 12 practices including automated tests, git discipline, code review
  • Bottleneck is human review, not agent speed
  • Skills > MCP for simplicity and low token overhead
  • "Lethal trifecta" security model: private data + untrusted content + external communication = danger

Full reference | simonwillison.net


4. dmux -- Parallel Agents via tmux + Worktrees

Scale: N concurrent agents, git worktree isolation per pane
Business: MIT-licensed, fully free. Creator (Justin Schroeder) monetizes FormKit Pro ($149-$1,250). Open source: github.com/standardagents/dmux

dmux TUI
  |
  |──> press 'n'
  |
  v
┌──────────────────┐
│ AI-generate slug │ <-- OpenRouter (gpt-4o-mini)
└────────┬─────────┘
         |
         v
┌──────────────────┐
│ Create git       │ <-- .dmux/worktrees/<slug>/
│ worktree         │     full independent working copy
└────────┬─────────┘
         |
         v
┌──────────────────┐
│ Split tmux pane  │
│ Launch agent     │ <-- claude/codex/opencode (--acceptEdits)
└────────┬─────────┘
         |
         v
┌──────────────────┐
│ Agent works      │ <-- status via LLM analysis of terminal (1s poll)
│ autonomously     │
└────────┬─────────┘
         |
         v
press 'm' to merge
         |
         v
┌──────────────────┐
│ AI commit msg    │ <-- conventional commits via OpenRouter
│ Merge to main    │
│ Remove worktree  │
└──────────────────┘

Hooks: worktree_created, pre_merge, post_merge
A/B mode: two agents, same prompt, side-by-side
Web dashboard + REST API for programmatic control

Full reference | dmux.ai


5. OpenClaw -- Open-Source AI Agent Framework

Scale: 213K+ GitHub stars, 50+ integrations
Business: MIT license, free to self-host. OpenClaw Cloud planned at $39/mo. Real cost: $5-30/mo in LLM API fees. Creator: Peter Steinberger (ex-PSPDFKit, acqui-hired by OpenAI Feb 2026)

User prompt
  |
  v
┌──────────────────┐
│ OpenClaw gateway │ <-- local-first, 50+ integrations
│ (agent router)   │     messaging, coding, browser, etc.
└────────┬─────────┘
         |
   ┌─────┼─────┐
   v     v     v
[Sub-1][Sub-2][Sub-3] <-- sub-agent collaboration;
   |     |     |          40% accuracy boost vs monolithic prompting
   └─────┼─────┘
         |
         v
┌──────────────────┐
│ Output           │ <-- declarative agent config in YAML
└──────────────────┘

Not primarily a coding tool -- general-purpose AI assistant
Can run with local models (Ollama + Llama 3.3) for $0/mo
Will remain open source under OpenAI stewardship

Full reference | github.com/openclaw


6. Superconductor -- Parallel Cloud Agents with Live Previews

Scale: N agents per ticket, cloud sandboxes, live browser previews
Business: Closed-source SaaS by Volition (Gradescope founders). BYOK model. Pricing undisclosed, early access.

Create ticket (informal)
         |
         v
┌──────────────────┐
│ Launch N agents  │ <-- each in isolated cloud container
│ on same ticket   │     (Modal / Morph Cloud)
└────────┬─────────┘
         |
   ┌─────┼─────┐
   v     v     v
[Agent1][Agent2][Agent3] <-- Claude/Codex/Amp/Gemini
   |     |     |
   v     v     v
 [Live] [Live] [Live]    <-- browser previews ~30s
 [prev] [prev] [prev]
   |     |     |
   └─────┼─────┘
         |
         v
┌──────────────────┐
│ Compare previews │ <-- visual diff, interact with each
│ Select best      │
│ One-click PR     │
└──────────────────┘

Full reference | superconductor.com


7. 8090 Software Factory -- Enterprise Agent Platform

Scale: Multi-repo code modernization
Business: Proprietary. $200/seat/mo (Team), custom Enterprise, managed delivery from $1M/yr. Funded by Chamath Palihapitiya personally.

┌─────────────────┐
│ Refinery        │ <-- reverse-engineer codebase into knowledge graph
└────────┬────────┘
         |
         v
┌─────────────────┐
│ Planner         │ <-- AI generates migration/transformation plans
└────────┬────────┘
         |
         v
┌─────────────────┐
│ Foundry         │ <-- specialized agents execute plan
│ (agent workers) │     across multiple repos
└────────┬────────┘
         |
         v
┌─────────────────┐
│ Validator       │ <-- quality gate, CI, tests
└────────┬────────┘
         |
         v
┌─────────────────┐
│ Factory Line    │ <-- full pipeline for enterprise
│ output: PRs     │     code modernization at scale
└─────────────────┘

Full reference | 8090.ai


8. Terragon -- Background Fire-and-Forget (SHUT DOWN)

Scale: ~30 concurrent tasks/day, auto-PR creation
Business: SaaS subscription. Shut down Feb 9, 2026. Code released Apache-2.0.
Why: Native background agents from Claude Code and Codex commoditized the orchestration layer.

Create task (web / CLI / GitHub / Slack / mobile)
         |
         v
┌─────────────────┐
│ Cloud sandbox   │ <-- isolated container, clone repo, create branch
└────────┬────────┘
         |
         v
┌─────────────────┐
│ Agent works     │ <-- background, checkpoints pushed to GitHub
│ autonomously    │     AI-generated commits
└────────┬────────┘
         |
         v
┌─────────────────┐
│ PR created      │ <-- automatic when done
└────────┬────────┘
         |
         v
Human reviews and merges

DEAD: Codex reached 28% agent usage on Terragon within 1 month
Native background agents made the wrapper unnecessary

Full reference | terragon-labs/terragon-oss


9. Vadim Strizheus -- "AI Employees" for VugolaAI

Scale: Claims 14 AI employees, 95% automated
Business: VugolaAI (video clipping/scheduling SaaS). Free tier. Solana token (VGLA).

Long-form video input
         |
         v
┌─────────────────┐
│ AI Moment       │ <-- "AI employee" 1: detect viral-worthy segments
│ Detection       │
└────────┬────────┘
         |
         v
┌─────────────────┐
│ Auto-Clipping   │ <-- "AI employees" 2-N: extract, reframe, caption
│ + Captioning    │
└────────┬────────┘
         |
         v
┌─────────────────┐
│ Branding +      │ <-- template application
│ Formatting      │
└────────┬────────┘
         |
         v
┌─────────────────┐
│ Multi-Platform  │ <-- TikTok, YouTube, Instagram, X, LinkedIn
│ Scheduling      │
└─────────────────┘

Note: Specific agent breakdown from video tweet, not independently verified.
The product itself IS the AI automation -- "employees" = AI pipeline stages.

Full reference | @VadimStrizheus


10. Notable Voices

François Chollet (@fchollet)

"Sufficiently advanced agentic coding is essentially machine learning"

Does NOT run a multi-agent setup. Warns about maintaining "sprawling mess of AI-generated legacy code." Useful contrarian check.

Andrej Karpathy

Coined "vibe coding" (Feb 2025), then abandoned it for "agentic engineering" (Feb 2026). Evolution: accept all AI output → require specs, review, test suites.

Addy Osmani

Defined Conductor (sequential) vs Orchestrator (parallel) agent frameworks. Identified the "80% problem" -- last 20% takes as long as first 80%.


Comparison Matrix

System | Type | Open Source | Pricing | Agents | Key Feature
------ | ---- | ----------- | ------- | ------ | -----------
Boris Cherny | Individual workflow | N/A (uses Claude Code) | $200/mo Max | 10-15 parallel CC | Teleport between devices
Claude Code Teams | Built-in | N/A (product feature) | $200/mo Max or API | N (tmux panes) | Shared task list + mailbox
Claude Agent SDK | Library | MIT | API usage | Programmatic | Full orchestration control
Simon Willison | Individual workflow | N/A | Multi-subscription | CC + Codex + async | Human as router
dmux | OSS tool | MIT | Free | N (tmux + worktrees) | A/B agent comparison
OpenClaw | OSS framework | MIT | Free / $39 Cloud | Sub-agents | 213K stars, joined OpenAI
Superconductor | SaaS | No | Undisclosed (BYOK) | N per ticket | Live browser previews
8090 | Enterprise | No | $200/seat/mo+ | Factory Line | Knowledge graph + modernization
Terragon | SaaS (dead) | Apache-2.0 (post-shutdown) | Was subscription | Background agents | Shut down Feb 2026
VugolaAI | Product | No | Free tier | 14 "AI employees" | Video pipeline automation

Common Patterns

What works across all setups:

1. ISOLATION -- worktrees, containers, or separate sessions
   Agents must not conflict with each other.

2. PLAN FIRST -- Opus/expensive model plans, cheaper model executes
   Boris: Plan Mode → auto-accept
   Agent Teams: team lead plans, teammates execute

3. MEMORY -- CLAUDE.md / AGENTS.md / progress.txt
   Errors are documented so they never repeat.
   Updated by the agent, not the human.

4. VERIFICATION -- automated tests, browser screenshots, self-review
   Humans review throughput, not individual lines.

5. MODEL TIERING -- Opus for planning ($$$), Sonnet for coding ($$), Haiku for tests ($)
   "correct answer costs less total iteration time than fast wrong ones"
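Pattern 5 reduces to a few lines of dispatch: route each phase of work to the cheapest model that handles it. The tier names follow the examples above; the mapping itself is illustrative:

```python
# Route each task phase to a model tier: expensive planning, mid-tier coding,
# cheap high-volume testing. Unknown phases fall back to the mid tier.
TIERS = {
    "plan": "opus",    # $$$ -- used sparingly, up front
    "code": "sonnet",  # $$  -- the workhorse
    "test": "haiku",   # $   -- high-volume verification
}

def route(phase: str) -> str:
    return TIERS.get(phase, "sonnet")

print([route(p) for p in ["plan", "code", "test", "review"]])
```

The point is economic, not architectural: planning tokens are rare and high-leverage, test tokens are plentiful and cheap, so the tier table is where the cost control lives.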

What doesn't work:

1. NO TESTS -- agents spiral without verification signals
2. NO MEMORY -- same mistakes repeat across sessions
3. SHARED STATE -- agents editing same files = merge hell
4. NO REVIEW -- "vibe coding" produces unmaintainable code (Chollet, Karpathy)

Business Model Summary

FREE / OSS:
  dmux (MIT) -- monetizes separately via FormKit Pro
  OpenClaw (MIT) -- Cloud tier planned at $39/mo; creator joined OpenAI
  claude-flow (MIT) -- reputation/consulting play
  Ralph/Compound (MIT) -- promotes Amp (Sourcegraph)
  Terragon (Apache-2.0) -- released on shutdown

SAAS / COMMERCIAL:
  Superconductor -- BYOK, undisclosed platform fee, early access
  8090 -- $200/seat/mo, $1M/yr managed delivery
  VugolaAI -- free tier + crypto token (VGLA)

PLATFORM:
  Claude Code -- $200/mo Max plan or API usage (~$1B ARR)
  OpenAI Codex -- subscription + API
  GitHub Agent HQ -- Copilot subscription (multi-vendor agents)

The trend: orchestration tools struggle to monetize when platforms add native
multi-agent features (see: Terragon's shutdown). Survivors either go enterprise
(8090) or stay free and build community (dmux, OpenClaw).