
Agent Teams Report

· 10 min read

A survey of how individuals and teams are running multi-agent coding setups (Feb 2026).


1. Boris Cherny -- Creator of Claude Code, Head of Claude Code @ Anthropic

Scale: 10-15 concurrent sessions, 20-27 PRs/day, 100% AI-written code
Business: Employee at Anthropic. Claude Code ~$1B annualized revenue in 6 months.

  Boris (human)
|
|──> 5 terminal tabs (iTerm, OS notifications)
|──> 5-10 browser sessions (claude.ai/code)
|──> mobile sessions (fire-and-forget)
|
v
Each session = independent Claude Code instance
|
|── Model: Opus 4.5 + extended thinking (always)
|── CLAUDE.md: shared knowledge base (updated weekly)
|── Plan Mode first, then auto-accept
|
v
┌───────────────┐
│ PostToolUse │ <-- formatting hooks fix style drift
│ hooks │
└───────┬───────┘
|
v
┌───────────────┐
│ Verification │ <-- Chrome extension, agent self-tests
│ loops │
└───────┬───────┘
|
v
┌───────────────┐
│ PR │ <-- /commit-push-pr slash command
└───────────────┘

"Teleport" hands sessions between terminal ↔ browser ↔ mobile

Key practices:

  • CLAUDE.md (not AGENTS.md) as living knowledge base -- errors get documented so they never repeat
  • /permissions pre-allows safe bash commands
  • Subagents: code-simplifier, verify-app
  • 259 PRs in 30 days. 90% of Claude Code's own codebase written by Claude Code.
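The PostToolUse formatting hook from the diagram could look something like this -- a minimal sketch assuming the hook receives the tool-call payload as JSON on stdin, and that the formatter commands (black, prettier, gofmt) are whatever your repo already uses:

```python
import json
import subprocess
import sys

# Formatters keyed by file extension -- hypothetical choices for illustration.
FORMATTERS = {
    ".py": ["black", "--quiet"],
    ".ts": ["prettier", "--write"],
    ".go": ["gofmt", "-w"],
}

def formatter_for(path: str):
    """Return the formatter command for a file, or None if none applies."""
    for ext, cmd in FORMATTERS.items():
        if path.endswith(ext):
            return cmd + [path]
    return None

def main() -> None:
    # Entry point for the hook: read the tool-call JSON from stdin,
    # extract the edited file path, and run the matching formatter.
    payload = json.load(sys.stdin)
    path = payload.get("tool_input", {}).get("file_path", "")
    cmd = formatter_for(path)
    if cmd:
        subprocess.run(cmd, check=False)  # never fail the hook on formatter errors
```

Wired up as a PostToolUse hook, a script like this silently fixes style drift after every edit, which is exactly why it appears before the verification loop in the diagram.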

Full reference | Boris's Twitter thread


2. Claude Code Native Multi-Agent -- Four Layers

Status: Subagents + SDK stable, Agent Teams experimental
Business: Part of Claude Code ($200/mo Max plan, or API usage)

  Layer 1: SUBAGENTS (in-session)
────────────────────────────────
Parent agent
|
|── Task("Explore", "find all API routes") <-- Haiku, read-only
|── Task("code-reviewer", "review changes") <-- custom .claude/agents/*.md
|── Task("general", "refactor auth") <-- background, full tools
|
v
Results summarized back to parent
Subagents CANNOT talk to each other


Layer 2: AGENT TEAMS (cross-session, experimental)
───────────────────────────────────────────────────
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
tmux -CC
|
v
┌──────────────┐
│ Team Lead │ <-- Opus, plans work, assigns tasks
└──────┬───────┘
|
┌────┼────┐
v v v
[T1] [T2] [T3] <-- Teammates (Sonnet/Haiku), each in tmux pane
| | |
v v v
Shared task list + mailbox system
Direct inter-agent messaging
Dependencies: task A blocks task B

Display: in-process OR tmux split panes
Quality gates: TeammateIdle, TaskCompleted hooks


Layer 3: AGENT SDK (programmatic)
─────────────────────────────────
import asyncio
from claude_code import Agent

# Three role-specialized agents run concurrently (inside an async function)
agents = [
    Agent("planner", model="opus", tools=[...]),
    Agent("coder", model="sonnet", tools=[...]),
    Agent("tester", model="haiku", tools=[...]),
]
results = await asyncio.gather(*[a.run(task) for a in agents])

Full control: hooks as callbacks, MCP, permissions, session resume


Layer 4: GIT WORKTREES (manual)
───────────────────────────────
claude -w feature-1 & claude -w feature-2 & claude -w feature-3
| | |
v v v
.worktrees/feature-1 .worktrees/feature-2 .worktrees/feature-3
(independent branch) (independent branch) (independent branch)

Human merges when done. No coordination.
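The worktree layer is easy to reproduce by hand. A minimal Python sketch of the same idea -- the `.worktrees/` layout mirrors the diagram, while the `claude -p` invocation and helper names are assumptions, not prescribed by the doc:

```python
import subprocess
from pathlib import Path

def worktree_plan(repo: Path, features: list[str]) -> list[tuple[list[str], Path]]:
    """For each feature: the `git worktree add` command and the resulting path."""
    plan = []
    for name in features:
        wt = repo / ".worktrees" / name
        plan.append((["git", "-C", str(repo), "worktree", "add",
                      "-b", name, str(wt)], wt))
    return plan

def launch(repo: Path, features: list[str]) -> list[subprocess.Popen]:
    """Create one isolated worktree per feature and start an agent in each."""
    procs = []
    for cmd, wt in worktree_plan(repo, features):
        subprocess.run(cmd, check=True)  # independent working copy per agent
        procs.append(subprocess.Popen(["claude", "-p", "implement " + wt.name],
                                      cwd=wt))
    return procs
```

Each agent gets its own branch and working copy, so nothing conflicts until the human merges.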

Full reference | Agent Teams docs


3. Simon Willison -- Parallel Agents, Different Models

Scale: 2-3 research projects/day across multiple agents
Business: Independent developer, creator of Datasette. No product to sell -- writes about what works.

  Simon (human)
|
├──> Claude Code (Sonnet 4.5) <-- primary terminal agent
├──> Codex CLI (GPT-5-Codex) <-- second terminal agent
├──> Claude Code for Web <-- async, fire-and-forget
├──> Codex Cloud <-- async
└──> Jules <-- async
|
v
Each in separate terminal / browser tab
Isolation: fresh /tmp checkouts per task
No coordination framework -- human is the router

── tools ──────────────────────────
llm CLI <-- logs everything to SQLite, analyzed via Datasette
files-to-prompt <-- convert repo files to LLM context
shot-scraper <-- automated screenshots for visual testing

Key concepts:

  • "Agents = models using tools in a loop" (his canonical definition, 211 competing definitions collected)
  • Vibe Engineering (not vibe coding): 12 practices including automated tests, git discipline, code review
  • Bottleneck is human review, not agent speed
  • Skills > MCP for simplicity and low token overhead
  • "Lethal trifecta" security model: private data + untrusted content + external communication = danger

Full reference | simonwillison.net


4. dmux -- Parallel Agents via tmux + Worktrees

Scale: N concurrent agents, git worktree isolation per pane
Business: MIT, fully free. Creator (Justin Schroeder) monetizes FormKit Pro ($149-$1,250). Open source: github.com/standardagents/dmux

  dmux TUI
|
|──> press 'n'
|
v
┌─────────────────┐
│ AI-generate slug │ <-- OpenRouter (gpt-4o-mini)
└────────┬────────┘
|
v
┌─────────────────┐
│ Create git │ <-- .dmux/worktrees/<slug>/
│ worktree │ full independent working copy
└────────┬────────┘
|
v
┌─────────────────┐
│ Split tmux pane │
│ Launch agent │ <-- claude/codex/opencode (--acceptEdits)
└────────┬────────┘
|
v
┌─────────────────┐
│ Agent works │ <-- status via LLM analysis of terminal (1s poll)
│ autonomously │
└────────┬────────┘
|
v
press 'm' to merge
|
v
┌─────────────────┐
│ AI commit msg │ <-- conventional commits via OpenRouter
│ Merge to main │
│ Remove worktree │
└─────────────────┘

Hooks: worktree_created, pre_merge, post_merge
A/B mode: two agents, same prompt, side-by-side
Web dashboard + REST API for programmatic control

Full reference | dmux.ai


5. OpenClaw -- Open-Source AI Agent Framework

Scale: 213K+ GitHub stars, 50+ integrations
Business: MIT license, free to self-host. OpenClaw Cloud planned at $39/mo. Real cost: $5-30/mo in LLM API fees. Creator: Peter Steinberger (ex-PSPDFKit, acqui-hired by OpenAI Feb 2026)

  User prompt
|
v
┌─────────────────┐
│ OpenClaw gateway │ <-- local-first, 50+ integrations
│ (agent router) │ messaging, coding, browser, etc.
└────────┬────────┘
|
┌─────┼─────┐
v v v
[Sub-1][Sub-2][Sub-3] <-- sub-agent collaboration
| | | 40% accuracy boost vs monolithic prompting
└─────┼─────┘
|
v
┌─────────────────┐
│ Output │ <-- declarative agent config in YAML
└─────────────────┘

Not primarily a coding tool -- general-purpose AI assistant
Can run with local models (Ollama + Llama 3.3) for $0/mo
Will remain open source under OpenAI stewardship

Full reference | github.com/openclaw


6. Superconductor -- Parallel Cloud Agents with Live Previews

Scale: N agents per ticket, cloud sandboxes, live browser previews
Business: Closed-source SaaS by Volition (Gradescope founders). BYOK model. Pricing undisclosed, early access.

  Create ticket (informal)
|
v
┌─────────────────┐
│ Launch N agents │ <-- each in isolated cloud container
│ on same ticket │ (Modal / Morph Cloud)
└────────┬────────┘
|
┌─────┼─────┐
v v v
[Agent1][Agent2][Agent3] <-- Claude/Codex/Amp/Gemini
| | |
v v v
[Live] [Live] [Live] <-- browser previews ~30s
[prev] [prev] [prev]
| | |
└─────┼─────┘
|
v
┌─────────────────┐
│ Compare previews │ <-- visual diff, interact with each
│ Select best │
│ One-click PR │
└─────────────────┘

Full reference | superconductor.com


7. 8090 Software Factory -- Enterprise Agent Platform

Scale: Multi-repo code modernization
Business: Proprietary. $200/seat/mo (Team), custom Enterprise, managed delivery from $1M/yr. Funded by Chamath Palihapitiya personally.

  ┌─────────────────┐
│ Refinery │ <-- reverse-engineer codebase into knowledge graph
└────────┬────────┘
|
v
┌─────────────────┐
│ Planner │ <-- AI generates migration/transformation plans
└────────┬────────┘
|
v
┌─────────────────┐
│ Foundry │ <-- specialized agents execute plan
│ (agent workers) │ across multiple repos
└────────┬────────┘
|
v
┌─────────────────┐
│ Validator │ <-- quality gate, CI, tests
└────────┬────────┘
|
v
┌─────────────────┐
│ Factory Line │ <-- full pipeline for enterprise
│ output: PRs │ code modernization at scale
└─────────────────┘

Full reference | 8090.ai


8. Terragon -- Background Fire-and-Forget (SHUT DOWN)

Scale: ~30 concurrent tasks/day, auto-PR creation
Business: SaaS subscription. Shut down Feb 9, 2026. Code released Apache-2.0. Why: Native background agents from Claude Code and Codex commoditized the orchestration layer.

  Create task (web / CLI / GitHub / Slack / mobile)
|
v
┌─────────────────┐
│ Cloud sandbox │ <-- isolated container, clone repo, create branch
└────────┬────────┘
|
v
┌─────────────────┐
│ Agent works │ <-- background, checkpoints pushed to GitHub
│ autonomously │ AI-generated commits
└────────┬────────┘
|
v
┌─────────────────┐
│ PR created │ <-- automatic when done
└────────┬────────┘
|
v
Human reviews and merges

Why it died: Codex's native background agents reached 28% of agent usage on Terragon within one month --
once the platforms shipped background agents natively, the wrapper was unnecessary.

Full reference | terragon-labs/terragon-oss


9. Vadim Strizheus -- "AI Employees" for VugolaAI

Scale: Claims 14 AI employees, 95% automated
Business: VugolaAI (video clipping/scheduling SaaS). Free tier. Solana token (VGLA).

  Long-form video input
|
v
┌─────────────────┐
│ AI Moment │ <-- "AI employee" 1: detect viral-worthy segments
│ Detection │
└────────┬────────┘
|
v
┌─────────────────┐
│ Auto-Clipping │ <-- "AI employee" 2-N: extract, reframe, caption
│ + Captioning │
└────────┬────────┘
|
v
┌─────────────────┐
│ Branding + │ <-- template application
│ Formatting │
└────────┬────────┘
|
v
┌─────────────────┐
│ Multi-Platform │ <-- TikTok, YouTube, Instagram, X, LinkedIn
│ Scheduling │
└─────────────────┘

Note: Specific agent breakdown from video tweet, not independently verified.
The product itself IS the AI automation -- "employees" = AI pipeline stages.

Full reference | @VadimStrizheus


10. Notable Voices

Francois Chollet (@fchollet)

"Sufficiently advanced agentic coding is essentially machine learning"

Does NOT run a multi-agent setup. Warns about maintaining "sprawling mess of AI-generated legacy code." Useful contrarian check.

Andrej Karpathy

Coined "vibe coding" (Feb 2025), then abandoned it for "agentic engineering" (Feb 2026). Evolution: accept all AI output → require specs, review, test suites.

Addy Osmani

Defined Conductor (sequential) vs Orchestrator (parallel) agent frameworks. Identified the "80% problem" -- last 20% takes as long as first 80%.


Comparison Matrix

| System | Type | Open Source | Pricing | Agents | Key Feature |
|---|---|---|---|---|---|
| Boris Cherny | Individual workflow | N/A (uses Claude Code) | $200/mo Max | 10-15 parallel CC | Teleport between devices |
| Claude Code Teams | Built-in | N/A (product feature) | $200/mo Max or API | N (tmux panes) | Shared task list + mailbox |
| Claude Agent SDK | Library | MIT | API usage | Programmatic | Full orchestration control |
| Simon Willison | Individual workflow | N/A | Multi-subscription | CC + Codex + async | Human as router |
| dmux | OSS tool | MIT | Free | N (tmux + worktrees) | A/B agent comparison |
| OpenClaw | OSS framework | MIT | Free / $39 Cloud | Sub-agents | 213K stars, joined OpenAI |
| Superconductor | SaaS | No | Undisclosed (BYOK) | N per ticket | Live browser previews |
| 8090 | Enterprise | No | $200/seat/mo+ | Factory Line | Knowledge graph + modernization |
| Terragon | SaaS (dead) | Apache-2.0 (post-shutdown) | Was subscription | Background agents | Shut down Feb 2026 |
| VugolaAI | Product | No | Free tier | 14 "AI employees" | Video pipeline automation |

Common Patterns

What works across all setups:

1. ISOLATION -- worktrees, containers, or separate sessions
agents must not conflict with each other

2. PLAN FIRST -- Opus/expensive model plans, cheaper model executes
Boris: Plan Mode → auto-accept
Agent Teams: team lead plans, teammates execute

3. MEMORY -- CLAUDE.md / AGENTS.md / progress.txt
errors documented so they never repeat
updated by the agent, not the human

4. VERIFICATION -- automated tests, browser screenshots, self-review
humans review throughput, not individual lines

5. MODEL TIERING -- Opus for planning ($$$), Sonnet for coding ($$), Haiku for tests ($)
"correct answer costs less total iteration time than fast wrong ones"

What doesn't work:

1. NO TESTS -- agents spiral without verification signals
2. NO MEMORY -- same mistakes repeat across sessions
3. SHARED STATE -- agents editing same files = merge hell
4. NO REVIEW -- "vibe coding" produces unmaintainable code (Chollet, Karpathy)

Business Model Summary

  FREE / OSS:
dmux (MIT) -- monetizes separately via FormKit Pro
OpenClaw (MIT) -- Cloud tier planned $39/mo, creator joined OpenAI
claude-flow (MIT) -- reputation/consulting play
Ralph/Compound (MIT) -- promotes Amp (Sourcegraph)
Terragon (Apache-2.0) -- released on shutdown

SAAS / COMMERCIAL:
Superconductor -- BYOK, undisclosed platform fee, early access
8090 -- $200/seat/mo, $1M/yr managed delivery
VugolaAI -- free tier + crypto token (VGLA)

PLATFORM:
Claude Code -- $200/mo Max plan or API usage (~$1B ARR)
OpenAI Codex -- subscription + API
GitHub Agent HQ -- Copilot subscription (multi-vendor agents)

The trend: orchestration tools struggle to monetize when platforms
add native multi-agent features (see: Terragon shutdown).
Survivors either go enterprise (8090) or stay free and build community (dmux, OpenClaw).

Harness Engineering Report

· 8 min read

A survey of how teams are setting up automated coding agent pipelines (Feb 2026).


1. Stripe Minions -- Enterprise Internal Fleet

Scale: 1,300 PRs/week, 0 human-written code
Trigger: Slack message, CLI, web UI, or automated (flaky test detected)

  Slack msg / CLI / auto-trigger
|
v
┌─────────────────┐
│ Warm Devbox │ <-- EC2, pre-cloned repo, ~10s ready
│ (isolated) │ no internet, no prod access
└────────┬────────┘
|
v
┌─────────────────┐
│ Blueprint │ <-- state machine: deterministic + agentic nodes
│ Orchestration │
└────────┬────────┘
|
┌─────┴──────┐
| |
v v
[Agentic] [Deterministic]
"Implement" "Run linters"
"Fix CI" "Push changes"
| |
└─────┬───────┘
|
v
┌─────────────────┐
│ Local Lint │ <-- heuristic, <5s
│ (shift left) │
└────────┬────────┘
|
v
┌─────────────────┐
│ CI: selective │ <-- from 3M+ tests, only relevant
│ test run │
└────────┬────────┘
|
pass? ──no──> autofix? ──yes──> apply, retry once
| no──> hand to human
yes
|
v
┌─────────────────┐
│ PR created │ <-- follows Stripe PR template
│ (human review) │
└─────────────────┘

Context sources:

  • Rule files (Cursor format, directory-scoped)
  • MCP "Toolshed" (~500 internal tools, curated subset per agent)
  • Pre-hydrated links from conversation context

Key insight: "Often one, at most two CI runs." Forked Block's Goose as base agent.

Full reference | Source


2. OpenAI Harness Engineering -- Zero Human Code

Scale: ~1M LOC in 5 months, 3.5 PRs/engineer/day
Trigger: Human writes a prompt describing a task

  Engineer writes prompt
|
v
┌─────────────────┐
│ Codex agent │ <-- reads AGENTS.md (table of contents, ~100 lines)
│ (isolated │ walks dir tree root -> CWD
│ worktree) │ loads docs/ as needed (progressive disclosure)
└────────┬────────┘
|
v
┌─────────────────┐
│ Work depth- │ <-- break goal into building blocks
│ first │ design -> code -> test -> review
└────────┬────────┘
|
v
┌─────────────────┐
│ Custom linters │ <-- Codex-written, error msgs include remediation
│ (architectural │ enforce layer deps, naming, file size
│ constraints) │
└────────┬────────┘
|
v
┌─────────────────┐
│ Agent self- │ <-- review own changes
│ review │ request additional agent reviews
│ │ respond to feedback, iterate
└────────┬────────┘
|
v
┌─────────────────┐
│ Agent-to-agent │ <-- humans optional in review
│ review loop │ squash & merge when satisfied
└────────┬────────┘
|
v
┌─────────────────┐
│ PR merged │
└─────────────────┘

── background ──────────────────────
"Garbage collection" agents run periodically:
- scan for stale docs
- detect architectural violations
- open fix-up PRs
"Doc-gardening" agent:
- cross-link and validate knowledge base

Three pillars:

  1. Context engineering (knowledge base + observability + browser via Chrome DevTools)
  2. Architectural constraints (custom linters + structural testing)
  3. Garbage collection (periodic entropy-fighting agents)

Key insight: "When the agent struggles, treat it as a signal. Identify what's missing and feed it back into the repo -- by having the agent write the fix."

Full reference | Source


3. Code Factory / Ralph -- Solo Autonomous Loop

Scale: Ships features while you sleep, 1 agent in a bash loop
Trigger: ./scripts/compound/loop.sh N or ralph.sh

  prd.json (task inventory)
prompt.md (instructions)
AGENTS.md (conventions)
|
v
while stories remain:
|
v
┌───────────────┐
│ Agent picks │ <-- reads prd.json, selects next by priority
│ next story │
└───────┬───────┘
|
v
┌───────────────┐
│ Implement │ <-- single context window per story
└───────┬───────┘
|
v
┌───────────────┐
│ Typecheck + │ <-- must be fast, "broken code compounds"
│ Tests │
└───────┬───────┘
|
pass? ──no──> skip, log failure
|
yes
|
v
┌───────────────┐
│ Auto-commit │
│ Mark story │
│ done │
└───────┬───────┘
|
v
┌───────────────┐
│ Append to │ <-- pattern accumulation
│ progress.txt │ by iteration 10, agent understands patterns
└───────┬───────┘
|
└──> next iteration

── code review layer (Code Factory) ──
Risk tiers:
Low -> fully automated merge
Medium -> automated with CI gates
High -> require human confirmation

Review agent validates PR:
- review state must match current HEAD SHA
- evidence: tests + browser recording + review
- auto-resolve only bot-only stale threads

Key files: ralph.sh, prd.json, prompt.md, progress.txt, AGENTS.md

Key insight: Small stories, fast feedback, explicit criteria. "By iteration 10, the agent understands patterns from previous stories."
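The story-picking half of the loop can be sketched against an assumed prd.json schema (a "stories" list with "id", "priority", and "done" fields) -- the real file format may differ:

```python
import json
from pathlib import Path

def next_story(prd_path: Path):
    """Return the highest-priority unfinished story, or None when done."""
    stories = json.loads(prd_path.read_text())["stories"]
    todo = [s for s in stories if not s.get("done")]
    return min(todo, key=lambda s: s["priority"]) if todo else None

def mark_done(prd_path: Path, story_id) -> None:
    """Persist completion so the next iteration skips this story."""
    data = json.loads(prd_path.read_text())
    for s in data["stories"]:
        if s["id"] == story_id:
            s["done"] = True
    prd_path.write_text(json.dumps(data, indent=2))
```

The bash loop just calls the agent with `next_story`'s output, runs typecheck and tests, then `mark_done` and an append to progress.txt -- state lives in files, not in the agent's context.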

Full reference | Source


4. dmux -- Parallel Agents via tmux + Worktrees

Scale: N concurrent agents, each in isolated git worktree
Trigger: Press n in dmux TUI, type a prompt

  dmux TUI
|
|──> press 'n'
|
v
┌─────────────────┐
│ Generate slug │ <-- AI-generated branch name via OpenRouter
└────────┬────────┘
|
v
┌─────────────────┐
│ Create git │ <-- .dmux/worktrees/<slug>/
│ worktree │ full independent working copy
└────────┬────────┘
|
v
┌─────────────────┐
│ Split tmux pane │
│ Launch agent │ <-- claude/codex/opencode
│ (--acceptEdits) │
└────────┬────────┘
|
v
┌─────────────────┐
│ Agent works │ <-- status detected via LLM analysis of terminal
│ autonomously │ polls every 1s
└────────┬────────┘
|
v
press 'm' to merge
|
v
┌─────────────────┐
│ AI commit msg │ <-- conventional commits via OpenRouter
│ Merge to main │
│ Remove worktree │
└─────────────────┘

Hooks fire at each stage:
worktree_created -> e.g. pnpm install
pre_merge -> e.g. run tests
post_merge -> e.g. git push, close issue

A/B mode: Run two agents on same prompt side-by-side to compare outputs.

Key insight: Git worktrees give true isolation -- each agent has its own working copy, no conflicts. Hooks enable custom automation at every lifecycle point.
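The lifecycle hooks amount to an event dispatcher where a pre_merge hook can veto the merge. The hook names mirror the doc; the registration API below is invented for illustration:

```python
# Hypothetical dispatcher in the spirit of dmux's lifecycle hooks.
HOOKS = {"worktree_created": [], "pre_merge": [], "post_merge": []}

def on(event: str):
    """Decorator: register a callback for a lifecycle event."""
    def register(fn):
        HOOKS[event].append(fn)
        return fn
    return register

def fire(event: str, ctx: dict) -> bool:
    """Run all hooks for an event; any hook returning False vetoes the action."""
    return all(fn(ctx) is not False for fn in HOOKS[event])
```

A worktree_created hook would run `pnpm install`, a pre_merge hook would run tests and return False on failure, and a post_merge hook would push and close the issue.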

Full reference | Source


5. Superconductor -- Parallel Agents with Live Previews

Scale: N agents per ticket, cloud sandboxes, live browser previews
Trigger: Web dashboard, iOS app, Slack, or GitHub comment (@superconductor)

  Create ticket (informal description)
|
v
┌─────────────────┐
│ Launch N agents │ <-- each gets isolated container
│ on same ticket │ full repo, dev tools, test runners
└────────┬────────┘
|
┌─────┼─────┐
v v v
[Agent1][Agent2][Agent3] <-- Claude/Codex/Amp/Gemini
| | |
v v v
[Live] [Live] [Live] <-- browser previews appear ~30s
[prev] [prev] [prev]
| | |
└─────┼─────┘
|
v
┌─────────────────┐
│ Compare previews │ <-- interact with each, test functionality
│ Diff viewer │ audit code changes across agents
└────────┬────────┘
|
v
┌─────────────────┐
│ Select best │
│ One-click PR │
└─────────────────┘

Key insight: Fire many agents in parallel on the same task. Visual comparison of live previews is the quality gate, not just code review.

Full reference | Source


6. Terragon -- Background Fire-and-Forget Fleet

Scale: ~30 concurrent tasks/day, auto-PR creation
Trigger: Web dashboard, terry CLI, GitHub comment, mobile, Slack

  Create task (any interface)
|
v
┌─────────────────┐
│ Spawn cloud │ <-- fresh isolated container
│ sandbox │ clone repo, create branch
└────────┬────────┘
|
v
┌─────────────────┐
│ Agent executes │ <-- writes code, runs tests, iterates
│ autonomously │ checkpoints pushed to GitHub
│ (background) │ AI-generated commits
└────────┬────────┘
|
v
┌─────────────────┐
│ PR created │ <-- automatic when agent finishes
│ automatically │
└────────┬────────┘
|
v
┌─────────────────┐
│ Human reviews │ <-- dashboard, CLI, or GitHub
│ and merges │
└─────────────────┘

If agent struggles:
"Abandon and retry with different instructions"
(more effective than course-correcting)

Best for: exploration/prototyping, one-shot cleanup, boilerplate, context-intensive debugging.

Key insight: Async-first. Close your laptop, come back to finished PRs. Volume alone doesn't guarantee gains -- task selection matters.

Full reference | Source


7. Gas Town (Steve Yegge) -- K8s for Agents

Scale: 20-30 parallel Claude Code instances
Trigger: Task queue

  Task queue
|
v
┌─────────────────┐
│ Orchestrator │ <-- "K8s for agents"
│ (Gas Town) │
└────────┬────────┘
|
┌─────┼─────┼─────┐
v v v v
[Agent][Agent][Agent][Agent] ... x20-30
| | | |
v v v v
[Git-backed persistent state]
|
v
┌─────────────────┐
│ Merge queue │ <-- conflict resolution between agents
└────────┬────────┘
|
v
┌─────────────────┐
│ Patrol agents │ <-- quality control watchdogs
└────────┬────────┘
|
v
merged to main

K8s analogy: Pod=Agent, Health check="Is it done?", Service mesh=Merge queue, DaemonSet=Patrol agent.
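The merge-queue-as-service-mesh idea reduces to a FIFO that merges clean branches serially and bounces conflicting ones back to their agents. A toy sketch, not Gas Town's code:

```python
from collections import deque

class MergeQueue:
    """Serialize merges from many agents; conflicts bounce instead of blocking."""

    def __init__(self):
        self.pending = deque()
        self.merged = []

    def submit(self, branch: str) -> None:
        self.pending.append(branch)

    def drain(self, conflicts: set) -> list:
        """Merge branches in FIFO order; return the bounced (conflicting) ones."""
        bounced = []
        while self.pending:
            branch = self.pending.popleft()
            (bounced if branch in conflicts else self.merged).append(branch)
        return bounced
```

Serializing the merge step is what lets 20-30 agents write concurrently without trampling main -- the queue, not the agents, owns the integration order.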

Full reference


Comparison Matrix

| System | Trigger | Agents | Isolation | Quality Gate | Human Role |
|---|---|---|---|---|---|
| Stripe Minions | Slack/auto | 1 per task | Devbox (EC2) | Linters + selective CI + autofix | Review PR |
| OpenAI Harness | Prompt | 1 per task | Worktree | Custom linters + agent review | Prioritize, validate |
| Code Factory | Cron/manual | 1 (loop) | Branch | Typecheck + tests + browser recording | Review high-risk |
| dmux | TUI key | N (tmux) | Git worktree | Hooks (custom) | Merge decision |
| Superconductor | Ticket | N per ticket | Cloud container | Live preview comparison | Select best |
| Terragon | Any interface | N (cloud) | Container | CI + auto-PR | Review PR |
| Gas Town | Task queue | 20-30 | Git state | Patrol agents + merge queue | Supervise |

Common Patterns

All systems follow roughly the same skeleton:

trigger (human or automated)
|
v
isolate (worktree / container / devbox)
|
v
agent works (agentic + deterministic nodes)
|
v
fast feedback (lint / typecheck / tests -- shift left)
|
v
quality gate (CI / agent review / live preview / patrol)
|
v
output (PR / branch / merged code)
|
v
human decision point (review / select / merge / abandon)

Universal principles:

  1. Isolation first -- every agent gets its own sandbox
  2. Shift feedback left -- catch errors before CI, not after
  3. Context is scarce -- small focused instructions > one giant file
  4. Constraints enable speed -- linters and gates prevent drift
  5. Humans supervise loops, not sit inside them