Popular Multi-Agent Coding Team Setups

19 min read

A practical reference of real-world multi-agent coding setups used by prominent developers and projects. Last updated: 2026-02-20.


Table of Contents

  1. Boris Cherny (@bcherny) -- Creator of Claude Code
  2. Claude Code Agent Teams (Opus/Sonnet/Haiku Pipeline)
  3. GitHub Agent HQ / Mission Control
  4. OpenClaw
  5. Francois Chollet (@fchollet)
  6. Simon Willison -- Parallel Coding Agents
  7. Andrej Karpathy -- From Vibe Coding to Agentic Engineering
  8. Addy Osmani -- Conductor to Orchestrator Model
  9. OpenAI Codex Multi-Agent Workflows
  10. Community Tools: NTM, Claude Squad, Claude-Flow
  11. Devin (Cognition AI)

1. Boris Cherny (@bcherny) -- Creator of Claude Code

What: Created Claude Code as a side project in September 2024; now head of Claude Code at Anthropic. Runs 10-15 concurrent Claude Code sessions and ships all his code with zero manual editing.

Setup

  • 5 parallel terminal sessions: Numbered iTerm tabs (1-5), each running Claude Code with OS-level notifications enabled so he knows when a session needs input.
  • 5-10 browser sessions: Running on claude.ai/code across Chrome and iOS. He starts mobile sessions in the morning and checks on them later.
  • "Teleport" command: Hands off sessions between the web interface and his local machine, bridging browser and terminal workflows.
  • Total: ~10-15 concurrent Claude Code sessions alive at any time.

Model

  • Opus 4.5 with extended thinking for everything. His reasoning: a single correct answer costs less total iteration time than multiple fast but flawed outputs requiring human correction.

Coordination

  • CLAUDE.md (not AGENTS.md): A single shared file checked into the git repo root and updated multiple times a week. When Claude makes an error, the team documents it so it never repeats. Each team at Anthropic maintains its own version.
  • Code review as meta-process: During PR reviews, team members tag @claude to add learnings back into CLAUDE.md via the Claude Code GitHub Action, so review improves the system itself.
  • /permissions: Pre-allows common safe bash commands, shared across teams, so correct behavior is the default without skipping security.
  • Slash commands in .claude/commands/: Include inline bash for data precomputation. Example: /commit-push-pr. Subagents like code-simplifier and verify-app encapsulate reusable workflows.
  • Plan Mode first: Iterates on the plan (Shift+Tab twice) until both human and Claude agree on scope. Then switches to auto-accept for single-burst PR completion.
  • PostToolUse hooks: Automated formatting hooks fix the last 10% of code quality issues, preventing CI failures from style drift.
  • Verification loops: Claude verifies its own work by opening browsers via the Claude Chrome extension, testing UI flows, and iterating until functional.
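A sketch of what such a PostToolUse hook can look like in project-local settings. The matcher/command pair below is illustrative, not Anthropic's actual configuration — the Prettier invocation is an assumption; swap in your own formatter:

```shell
# Sketch: register a PostToolUse formatting hook in the project-local
# .claude/settings.json. This overwrites the file -- merge by hand if
# you already have settings there. The prettier command is an assumed
# placeholder for whatever formatter the repo uses.
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx prettier --write . >/dev/null 2>&1 || true" }
        ]
      }
    ]
  }
}
EOF
```

The hook fires after every file edit or write, so style drift is corrected before it ever reaches CI.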

Results

  • 259 PRs in 30 days: 497 commits, 40k lines added, 38k lines removed. Every single line written by Claude Code + Opus 4.5.
  • 22 PRs in one day, 27 the next: Each 100% written by Claude.
  • 100% AI-written code for 2+ months: "I don't even make small edits by hand."
  • Usage stats: ~325 million tokens across 1.6k sessions. Longest single session: nearly 2 days (using Stop hooks for long-running tasks).
  • ~90% of Claude Code's own codebase is written by Claude Code itself.

Sources


2. Claude Code Agent Teams (Opus/Sonnet/Haiku Pipeline)

What: Claude Code's built-in multi-agent feature (experimental as of Feb 2026) that spawns multiple Claude Code instances coordinated through tmux, with different models assigned to different roles.

Setup

Enable the feature:

export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1

Or add "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1" to ~/.claude/settings.json.
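The settings.json route can be sketched as follows — assuming the top-level "env" map is how Claude Code injects environment variables; written to a demo file here so nothing real gets clobbered (the actual path is ~/.claude/settings.json):

```shell
# Sketch: persist the experimental flag via the settings "env" map.
# Real path is ~/.claude/settings.json; a demo file is used here so an
# existing config is not overwritten. Merge rather than replace in practice.
cfg="./settings.demo.json"
cat > "$cfg" <<'EOF'
{
  "env": { "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1" }
}
EOF
```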

tmux requirement: Use tmux -CC (control mode) so iTerm2 maps tmux panes to native tabs/windows. Without -CC, agents won't spawn in separate panes.

Architecture

  • Team Lead: Your main Claude Code session. Creates and coordinates the team.
  • Teammates: Separate Claude Code instances, each with its own context window, running in individual tmux panes.
  • Shared Task List: Central work queue visible to all agents. Tasks have states: pending, in-progress, completed.
  • Mailbox System: Direct messaging between team members. Teammates can message each other directly, not just through the lead.

Each teammate loads the same project context (CLAUDE.md, MCP servers, skills), but does NOT inherit the lead's conversation history.

Model Routing

The pipeline pattern that has emerged in the community:

  Role                                 Model     Rationale
  Planning / architecture              Opus      Complex reasoning, system design
  Code generation / implementation     Sonnet    Fast, capable, cost-effective
  Testing / documentation / linting    Haiku     Lightweight, cheap, sufficient

You specify model routing in the prompt: "The team leader uses Opus, teammates use Sonnet" or "Use Haiku for documentation tasks."

The opusplan model alias automates this: uses Opus in plan mode, switches to Sonnet in execution mode.

Coordination in Practice

  • One session becomes the team lead; others become independent contributors.
  • Agents work in isolated contexts but communicate results across the team.
  • Results can consolidate into a single markdown file (e.g., holistic.md).
  • Agents do NOT modify source code by default in analysis tasks.
  • A 3-teammate team typically uses 3-4x the tokens of a sequential single-session run.

Real-World Example (Dariusz Parys)

Spawned 3 agents for codebase analysis:

  • Agent 1 (Opus): Prompt analysis and optimization
  • Agent 2 (Sonnet): Code review
  • Agent 3 (Haiku): Documentation consistency checking

Completed in ~13 minutes. All results written to a shared holistic.md.

Sources


3. GitHub Agent HQ / Mission Control

What: GitHub's platform for orchestrating and managing AI agents from multiple vendors (Anthropic, OpenAI, Google, Cognition, xAI) directly within GitHub. Announced at GitHub Universe on October 28, 2025.

Setup

  • Mission Control Dashboard: A unified command center that appears across GitHub.com, VS Code, mobile apps, and CLI.
  • Multi-vendor: Runs agents from Anthropic, OpenAI, Google, xAI, Cognition, and more in a single neutral control plane inside the repository.
  • Copilot subscription required: Available to GitHub Copilot subscribers.

Architecture

  • Task assignment: Assign work to multiple agents simultaneously from Mission Control.
  • Live agent threads: Dashboard shows diffs, logs, and rationales per agent.
  • Dependency graph: Visualizes relationships (e.g., "Tests waiting on build artifacts").
  • File-scope locking: When agents begin edits, the scheduler locks file scopes to prevent conflicts. Agents share an up-to-date semantic map.
  • Parallel execution: ~16.8% faster than sequential runs in tests, thanks to file-scope locking and the shared semantic map.

Coordination

For a given "mission," the platform can spin up:

  • A build agent
  • A refactor agent
  • A test agent

These coordinate to avoid touching the same files simultaneously. The system orchestrates planner, code, test, docs, and release agents with shared context, guardrails, and approvals -- posting diffs, logs, and artifacts on a single timeline.

Enterprise Governance

  • Branch-level access compartmentalization
  • Sandboxed GitHub Actions environments with firewall protections
  • Strict identity controls and audit logging
  • IT administrators set policies governing agent access to resources
  • Granular controls at the platform level (vs. standalone agents needing broad repo access)

Key Differentiator

GitHub's existing primitives stay intact -- developers still use Git, PRs, and issues. Agents operate within GitHub's existing security perimeter using the same branch permissions and audit logging.


4. OpenClaw

What: Open-source "operating system for AI agents" created by Peter Steinberger in November 2025. Originally named Clawdbot, renamed to Moltbot (Jan 27, 2026) after Anthropic trademark complaints, then OpenClaw (Jan 30, 2026). Grew to 180k+ GitHub stars in 8 weeks. Steinberger was acqui-hired by OpenAI in February 2026.

Setup

  • Gateway: A local-first control plane for sessions, channels, tools, and events. Conversation history, tool execution, session state, and orchestration logic stay on the user's infrastructure; model API calls go to the cloud.
  • Multi-model: Supports Claude, DeepSeek, OpenAI GPT models, and others.
  • 50+ integrations: Chat providers (WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Teams, Matrix), productivity tools, smart home devices, automation tools.

Architecture

  • Agent orchestration: Run multiple isolated AI agents in one OpenClaw gateway, each with its own workspace, auth profile, and channel binding.
  • Sub-agent collaboration: Specialized sub-agents (e.g., "researcher" and "writer") collaborate. Internal benchmarks show this modular approach boosts accuracy by 40% vs monolithic prompting.
  • Declarative configuration: Define agent roles, connect them to tools/APIs, manage lifecycle via config files.
  • Session isolation: Each agent gets its own session state, tool access, and channel binding.
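Purely as an illustration of the declarative idea — the field names below are invented for this sketch and do NOT match OpenClaw's real schema — a two-agent config might look like:

```shell
# HYPOTHETICAL sketch of a declarative agent config. Every key below is
# illustrative; consult OpenClaw's own docs for the actual schema.
cat > agents.example.json <<'EOF'
{
  "agents": [
    { "name": "researcher", "model": "claude-sonnet", "tools": ["web_search"], "channel": "telegram" },
    { "name": "writer",     "model": "gpt-5",         "tools": ["filesystem"], "channel": "slack" }
  ]
}
EOF
```

The point is the shape, not the keys: roles, models, tool grants, and channel bindings live in config, so the lifecycle of each agent is managed declaratively rather than by hand.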

Key Distinction

OpenClaw treats AI as an infrastructure problem -- sessions, memory, tool sandboxing, access control, and orchestration. It's not an IDE plugin; it's a control plane that works across messaging platforms.

Current Status

OpenClaw continues as an independent open-source foundation supported by OpenAI, maintaining multi-model compatibility. Steinberger joined OpenAI to lead autonomous AI agent development.


5. Francois Chollet (@fchollet)

What: Creator of Keras, former Google Senior Staff Engineer. His views on AI coding are more skeptical and nuanced than the hype. Left Google in Nov 2024 to co-found Ndea, an AGI startup focused on program synthesis.

His Position on AI-Assisted Coding

Chollet does not run a multi-agent coding setup. His contribution to this space is a contrarian analytical framework that is worth understanding:

  • "Software engineers shouldn't fear being replaced by AI. They should fear being asked to maintain the sprawling mess of AI-generated legacy code their employer's systems will soon run on. Because that one will actually happen." -- This is his core critique. The problem isn't code generation; it's the maintenance debt of AI-generated code.

  • "Code is largely worthless, more of a liability than an asset. Problem-solving is where the value is." -- He sees code as a byproduct, not the goal. This aligns with his view that AI coding tools solve the easy part (writing code) while missing the hard part (understanding what to build and why).

  • Program synthesis over brute-force scaling: His startup Ndea and the ARC Prize foundation focus on systems that can generalize from few examples rather than memorizing patterns. He sees current LLM-based coding as pattern matching, not genuine problem-solving.

Relevance to Agent Teams

Chollet's framework suggests a quality check for multi-agent setups:

  • Are your agents just generating more code faster, or are they solving problems?
  • Is the velocity gain real, or are you accumulating maintenance debt?
  • Does your setup include mechanisms to prevent the "sprawling mess" he warns about?


6. Simon Willison -- Parallel Coding Agents

What: Prolific open-source developer (creator of Datasette, co-creator of Django) who runs multiple coding agents in parallel across different tools and models, documenting everything publicly.

Setup

  • Claude Code on Sonnet 4.5
  • Codex CLI on GPT-5-Codex
  • Codex Cloud for asynchronous tasks (can launch from phone)
  • Also experiments with GitHub Copilot Coding Agent and Google Jules.

Coordination

  • Multiple terminal windows: Each running a different agent in a different directory.
  • YOLO mode (no approvals): Used for tasks where malicious instructions can't sneak into the context (mostly open-source work).
  • Repository isolation: Creates fresh checkouts into /tmp when running two agents against the same repo. Does not use git worktrees.
  • Asynchronous agents: Uses Codex Cloud with network access enabled for riskier tasks, since his work is mostly open source anyway.
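A sketch of that fresh-checkout pattern with plain git — here cloning from a throwaway local repo so the example is self-contained; in practice the source would be the real remote:

```shell
# Self-contained demo of the fresh-checkout isolation pattern: create a
# toy source repo, then give each agent its own disposable clone in /tmp
# so parallel agents can never trample each other's working tree.
set -e
src="$(mktemp -d /tmp/source-repo.XXXXXX)"
git -C "$src" init -q
git -C "$src" -c user.email=a@b -c user.name=demo commit -q --allow-empty -m init

for agent in agent-a agent-b; do
  work="$(mktemp -d "/tmp/${agent}.XXXXXX")"
  git clone -q "$src" "$work/repo"     # each agent edits only its own copy
  echo "$agent -> $work/repo"
done
```

Unlike worktrees, the clones share nothing, which is exactly the property Willison wants when two agents run against the same repo.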

Task Categories for Parallel Agents

  1. Research and POCs -- testing new libraries and approaches.
  2. System documentation -- understanding existing codebase sections.
  3. Low-stakes maintenance -- fixing deprecation warnings, resolving minor issues.
  4. Specified work -- carefully detailed assignments requiring less review overhead.

Key Insight

The bottleneck is human review capacity, not agent speed. "Code that started from your own specification is a lot less effort to review" -- detailed prompting reduces cognitive overhead during validation.

Sub-Agent Usage

Willison demonstrated that you can deliberately trigger Claude Code sub-agents just by telling it to do so. Example: "Use sub-agents to write markdown documentation for the context passed to each of the templates in this project."

Anthropic's own research showed a multi-agent system with Opus 4 as lead and Sonnet 4 subagents outperformed single-agent Opus 4 by 90.2% on internal research evals.


7. Andrej Karpathy -- From Vibe Coding to Agentic Engineering

What: Former OpenAI/Tesla AI lead. Coined "vibe coding" in February 2025. By February 2026 he had declared it passé and evolved his thinking toward "agentic engineering."

Vibe Coding (Feb 2025)

  • "Fully give in to the vibes, embrace exponentials, and forget that the code even exists."
  • Used Cursor Composer with SuperWhisper (voice transcription) to minimize keyboard use.
  • Accepted all AI-generated code changes without reviewing diffs.
  • Pasted error messages directly back to the AI for resolution.
  • Let the codebase grow organically.

Agentic Engineering (Feb 2026)

  • "Programming via LLM agents is increasingly becoming a default workflow for professionals, except with more oversight and scrutiny."
  • Key differences from vibe coding:
    1. Start with a plan: Write a design doc or spec before prompting anything. Break work into well-defined tasks.
    2. Review rigorously: Review AI-generated code with the same rigor as a human teammate's PR.
    3. Testing is the differentiator: A solid test suite lets AI agents iterate in a loop until tests pass, giving high confidence.
    4. Delegation and supervision: Define goals, constraints, quality criteria, and workflows. AI agents work toward high-level targets under human direction.

Key Distinction

Karpathy's evolution captures the broader industry shift: from treating AI coding as magic ("vibes") to treating it as a managed workforce (delegation + verification). The agents themselves haven't changed as much as the engineering discipline around them.


8. Addy Osmani -- Conductor to Orchestrator Model

What: Google Chrome engineering lead. Wrote the most cited framework for thinking about multi-agent coding workflows, distinguishing between "conductor" (sequential) and "orchestrator" (parallel) patterns.

The Framework

Conductor pattern (what most people do today):

  • Implement each step sequentially through a single agent.
  • You are the conductor, directing one agent at a time.

Orchestrator pattern (where things are heading):

  • For a feature touching frontend, backend, and tests:
    • Assign backend implementation to Agent 1
    • Assign frontend UI changes to Agent 2
    • Assign test creation to Agent 3
    • Step back; receive 3 PRs to review and integrate.

Practical Requirements

  • Isolated git worktrees: Each agent needs its own worktree to avoid conflicts. Tools like Conductor provide this automatically.
  • Dashboard visibility: See all agents and their work in one place.
  • Testing is the single biggest differentiator: With a strong test suite, AI agents iterate until tests pass.
  • Planning before execution: Write design docs / specs. Break work into clearly scoped tasks. Then assign to agents.
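The worktree isolation can be sketched with plain git and no extra tooling — paths and branch names below are illustrative:

```shell
# One branch + worktree per agent, all sharing a single object store.
set -e
rm -rf /tmp/wt-demo
mkdir -p /tmp/wt-demo/repo
cd /tmp/wt-demo/repo
git init -q
git -c user.email=a@b -c user.name=demo commit -q --allow-empty -m init

git worktree add ../agent-backend  -b agent/backend    # Agent 1's checkout
git worktree add ../agent-frontend -b agent/frontend   # Agent 2's checkout
git worktree add ../agent-tests    -b agent/tests      # Agent 3's checkout

git worktree list    # main checkout plus three agent worktrees
```

Each agent commits to its own branch in its own directory; integration then happens through normal PR review, which is the "receive 3 PRs" step of the orchestrator pattern.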

The 80% Problem

Osmani identified that AI agents can get you 80% of the way there, but the last 20% (edge cases, integration, polish) often takes as long as the first 80%. Multi-agent setups help because different agents can specialize in different parts of that last 20%.


9. OpenAI Codex Multi-Agent Workflows

What: OpenAI's cloud-based coding agent with built-in multi-agent support. Released as the Codex app on February 2, 2026. Over 1 million developers used it in the past month. Uses GPT-5.2-Codex model (released Dec 2025).

Setup

Enable multi-agent from the CLI with /experimental. Configure agent roles in ~/.codex/config.toml or project-specific .codex/config.toml.

Architecture

  • Specialized agents in parallel: Define agents with different model configurations and instructions depending on role.
  • MCP server: Exposes the CLI as a Model Context Protocol server, orchestrated with the OpenAI Agents SDK.
  • Role-based config: Each role provides guidance for when Codex should use that agent, and optionally loads a role-specific config file.

Coordination Example

With design specs available:

  1. Hand off in parallel to Frontend Developer agent and Backend Developer agent.
  2. Each operates in its own sandbox.
  3. Results collected and merged in one response.

Good for highly parallel tasks: codebase exploration, implementing multi-step feature plans, testing, codebase modernization.


10. Community Tools: NTM, Claude Squad, Claude-Flow

Named Tmux Manager (NTM)

Author: Jeff Emanuel (Dicklesworthstone on GitHub)

A tmux-based command center for running multiple AI coding agents (Claude, Codex, Gemini) across tiled panes.

Key features:

  • One session, many agents in named panes.
  • Broadcast prompts to specific agent types with one command.
  • Persistent sessions that survive SSH disconnections.
  • Quick project setup: ntm quick myproject --template=go then ntm spawn myproject --cc=3 --cod=2 --gmi=1 creates 6 AI agents in tiled panes.
  • Part of a broader 13-tool ecosystem for agent coordination, work tracking, session search, and safety.

Source: NTM on GitHub

Claude Squad

A terminal app for managing multiple Claude Code, Aider, Codex, OpenCode, and Amp instances in separate workspaces.

Key features:

  • Each task gets its own isolated git workspace (no conflicts).
  • Manage all instances and tasks in one terminal window.
  • Review changes before applying them.
  • Install: brew install smtg-ai/tap/claude-squad (runs as cs).

Source: Claude Squad on GitHub

Claude-Flow (by Reuven Cohen / ruvnet)

The leading agent orchestration platform for Claude Code. V3 released January 2026; ~100,000 monthly active users across 80+ countries.

Key features:

  • 60+ specialized agents ready-to-use (coding, review, testing, security, docs, DevOps).
  • Agents spawn sub-workers, communicate, share context, divide work using hierarchical or mesh patterns.
  • Shared memory, consensus, and continuous learning.
  • Native Claude Code support via MCP protocol.
  • Integrates with Claude Code's experimental Agent Teams feature.
  • 250,000+ lines of TypeScript and WASM.

Source: Claude-Flow on GitHub


11. Devin (Cognition AI)

What: Commercial AI software engineer with its own sandboxed cloud environment (shell, code editor, browser). Multi-agent dispatch capability added in later revisions.

Setup

  • Each Devin session runs in its own sandboxed cloud IDE.
  • You can spin up multiple parallel Devins, each with its own environment.
  • Automatically indexes repositories every couple of hours, creating detailed wikis with architecture diagrams (DeepWiki).

Multi-Agent Coordination

  • One AI agent can dispatch tasks to other AI agents.
  • Self-assessed confidence evaluation -- asks for clarification when not confident enough.
  • API available for integration with custom workflows and other agents.
  • Autonomous agents can trigger automatically in response to specific events.

Enterprise Usage

Goldman Sachs deployed Devin as "Employee #1" in their hybrid workforce, demonstrating enterprise-scale adoption of autonomous coding agents.


Common Patterns Across All Setups

What Works

  1. Planning before execution: Every successful setup emphasizes writing specs/plans before letting agents code. Boris Cherny iterates in Plan Mode. Karpathy says write a design doc first. Osmani says break into scoped tasks.

  2. Institutional memory via markdown files: CLAUDE.md, AGENTS.md, or equivalent files checked into git. Updated continuously. Mistakes become instructions. This is the single most impactful practice.

  3. Verification loops: Agents test their own output. The stronger the test suite, the more autonomous the agents can be. Testing is the "single biggest differentiator" (Karpathy, Osmani).

  4. Cost-tiered model routing: Opus/expensive models for planning and reasoning. Sonnet/mid-tier for implementation. Haiku/cheap for tests, docs, linting. Don't waste expensive tokens on cheap tasks.

  5. Isolation: Each agent gets its own git worktree, workspace, or tmux pane. File-scope locking when agents edit the same repo. Conflicts between agents are the number one failure mode.

  6. Human review remains the bottleneck: No matter how many agents you run, you still need to review the output. Detailed specs reduce review burden. The best setups optimize for reviewability, not just throughput.
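The cost-tiered routing in pattern 4 reduces to a small dispatch helper. The model aliases match the tiers named above; the final claude invocation is commented out because it assumes the CLI is installed locally:

```shell
# Map task type -> model tier so expensive tokens only go to hard tasks.
model_for() {
  case "$1" in
    plan|architecture)    echo opus   ;;   # complex reasoning, system design
    implement|refactor)   echo sonnet ;;   # day-to-day code generation
    test|docs|lint)       echo haiku  ;;   # cheap, high-volume chores
    *)                    echo sonnet ;;   # sensible middle-tier default
  esac
}

model_for plan         # -> opus
model_for implement    # -> sonnet
model_for docs         # -> haiku

# Assumed CLI usage, shown for shape only:
# claude --model "$(model_for plan)" -p "Draft a design doc for ..."
```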

What Doesn't Work

  1. One agent doing everything: Single-agent monolithic approaches consistently underperform specialized multi-agent setups (Anthropic's own research shows 90% improvement with multi-agent).

  2. No test suite: Without tests, agents can't verify their own work, and humans can't verify agent output at scale.

  3. No memory between sessions: Agents that forget everything between sessions repeat the same mistakes. CLAUDE.md / persistent context is essential.

  4. Uncontrolled parallel edits: Multiple agents editing the same files without coordination leads to conflicts and wasted work. File-scope locking or workspace isolation is required.

  5. Accepting all output without review: Karpathy's original "vibe coding" approach was explicitly abandoned in favor of rigorous review. The novelty wears off; the tech debt doesn't.


Note on @GrIlm14

The Twitter/X account @GrIlm14 could not be located directly in web search results. The Opus-plans/Sonnet-codes/Haiku-tests pipeline with bash+tmux+markdown coordination is a widely documented community pattern (see Section 2 above), with implementations by Dariusz Parys, the official Claude Code docs, and various community tools. The pattern may have originated with or been popularized by this account, but the specific attribution could not be verified from available sources.