
Agent Teams Report

· 10 min read

A survey of how individuals and teams are running multi-agent coding setups (Feb 2026).


1. Boris Cherny -- Creator of Claude Code, Head of Claude Code @ Anthropic

Scale: 10-15 concurrent sessions, 20-27 PRs/day, 100% AI-written code
Business: Employee at Anthropic. Claude Code ~$1B annualized revenue in 6 months.

  Boris (human)
|
|──> 5 terminal tabs (iTerm, OS notifications)
|──> 5-10 browser sessions (claude.ai/code)
|──> mobile sessions (fire-and-forget)
|
v
Each session = independent Claude Code instance
|
|── Model: Opus 4.5 + extended thinking (always)
|── CLAUDE.md: shared knowledge base (updated weekly)
|── Plan Mode first, then auto-accept
|
v
┌───────────────┐
│ PostToolUse │ <-- formatting hooks fix style drift
│ hooks │
└───────┬───────┘
|
v
┌───────────────┐
│ Verification │ <-- Chrome extension, agent self-tests
│ loops │
└───────┬───────┘
|
v
┌───────────────┐
│ PR │ <-- /commit-push-pr slash command
└───────────────┘

"Teleport" hands sessions between terminal ↔ browser ↔ mobile

Key practices:

  • CLAUDE.md (not AGENTS.md) as living knowledge base -- errors get documented so they never repeat
  • /permissions pre-allows safe bash commands
  • Subagents: code-simplifier, verify-app
  • 259 PRs in 30 days. 90% of Claude Code's own codebase written by Claude Code.
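The PostToolUse formatting hook from the diagram could look something like this -- a minimal sketch assuming the hook receives the tool-call payload as JSON on stdin, and that the formatter commands (black, prettier, gofmt) are whatever your repo already uses:

```python
import json
import subprocess
import sys

# Formatters keyed by file extension -- hypothetical choices for illustration.
FORMATTERS = {
    ".py": ["black", "--quiet"],
    ".ts": ["prettier", "--write"],
    ".go": ["gofmt", "-w"],
}

def formatter_for(path: str):
    """Return the formatter command for a file, or None if none applies."""
    for ext, cmd in FORMATTERS.items():
        if path.endswith(ext):
            return cmd + [path]
    return None

def main() -> None:
    # Entry point for the hook: read the tool-call JSON from stdin,
    # extract the edited file path, and run the matching formatter.
    payload = json.load(sys.stdin)
    path = payload.get("tool_input", {}).get("file_path", "")
    cmd = formatter_for(path)
    if cmd:
        subprocess.run(cmd, check=False)  # never fail the hook on formatter errors
```

Wired up as a PostToolUse hook, a script like this silently fixes style drift after every edit, which is exactly why it appears before the verification loop in the diagram.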

Full reference | Boris's Twitter thread


2. Claude Code Native Multi-Agent -- Four Layers

Status: Subagents + SDK stable, Agent Teams experimental
Business: Part of Claude Code ($200/mo Max plan, or API usage)

  Layer 1: SUBAGENTS (in-session)
────────────────────────────────
Parent agent
|
|── Task("Explore", "find all API routes") <-- Haiku, read-only
|── Task("code-reviewer", "review changes") <-- custom .claude/agents/*.md
|── Task("general", "refactor auth") <-- background, full tools
|
v
Results summarized back to parent
Subagents CANNOT talk to each other


Layer 2: AGENT TEAMS (cross-session, experimental)
───────────────────────────────────────────────────
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
tmux -CC
|
v
┌──────────────┐
│ Team Lead │ <-- Opus, plans work, assigns tasks
└──────┬───────┘
|
┌────┼────┐
v v v
[T1] [T2] [T3] <-- Teammates (Sonnet/Haiku), each in tmux pane
| | |
v v v
Shared task list + mailbox system
Direct inter-agent messaging
Dependencies: task A blocks task B

Display: in-process OR tmux split panes
Quality gates: TeammateIdle, TaskCompleted hooks


Layer 3: AGENT SDK (programmatic)
─────────────────────────────────
import asyncio
from claude_code import Agent

# Three role-specialized agents run concurrently (inside an async function)
agents = [
    Agent("planner", model="opus", tools=[...]),
    Agent("coder", model="sonnet", tools=[...]),
    Agent("tester", model="haiku", tools=[...]),
]
results = await asyncio.gather(*[a.run(task) for a in agents])

Full control: hooks as callbacks, MCP, permissions, session resume


Layer 4: GIT WORKTREES (manual)
───────────────────────────────
claude -w feature-1 & claude -w feature-2 & claude -w feature-3
| | |
v v v
.worktrees/feature-1 .worktrees/feature-2 .worktrees/feature-3
(independent branch) (independent branch) (independent branch)

Human merges when done. No coordination.
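The worktree layer is easy to reproduce by hand. A minimal Python sketch of the same idea -- the `.worktrees/` layout mirrors the diagram, while the `claude -p` invocation and helper names are assumptions, not prescribed by the doc:

```python
import subprocess
from pathlib import Path

def worktree_plan(repo: Path, features: list[str]) -> list[tuple[list[str], Path]]:
    """For each feature: the `git worktree add` command and the resulting path."""
    plan = []
    for name in features:
        wt = repo / ".worktrees" / name
        plan.append((["git", "-C", str(repo), "worktree", "add",
                      "-b", name, str(wt)], wt))
    return plan

def launch(repo: Path, features: list[str]) -> list[subprocess.Popen]:
    """Create one isolated worktree per feature and start an agent in each."""
    procs = []
    for cmd, wt in worktree_plan(repo, features):
        subprocess.run(cmd, check=True)  # independent working copy per agent
        procs.append(subprocess.Popen(["claude", "-p", "implement " + wt.name],
                                      cwd=wt))
    return procs
```

Each agent gets its own branch and working copy, so nothing conflicts until the human merges.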

Full reference | Agent Teams docs


3. Simon Willison -- Parallel Agents, Different Models

Scale: 2-3 research projects/day across multiple agents
Business: Independent developer, creator of Datasette. No product to sell -- writes about what works.

  Simon (human)
|
├──> Claude Code (Sonnet 4.5) <-- primary terminal agent
├──> Codex CLI (GPT-5-Codex) <-- second terminal agent
├──> Claude Code for Web <-- async, fire-and-forget
├──> Codex Cloud <-- async
└──> Jules <-- async
|
v
Each in separate terminal / browser tab
Isolation: fresh /tmp checkouts per task
No coordination framework -- human is the router

── tools ──────────────────────────
llm CLI <-- logs everything to SQLite, analyzed via Datasette
files-to-prompt <-- convert repo files to LLM context
shot-scraper <-- automated screenshots for visual testing

Key concepts:

  • "Agents = models using tools in a loop" (his canonical definition, 211 competing definitions collected)
  • Vibe Engineering (not vibe coding): 12 practices including automated tests, git discipline, code review
  • Bottleneck is human review, not agent speed
  • Skills > MCP for simplicity and low token overhead
  • "Lethal trifecta" security model: private data + untrusted content + external communication = danger

Full reference | simonwillison.net


4. dmux -- Parallel Agents via tmux + Worktrees

Scale: N concurrent agents, git worktree isolation per pane
Business: MIT, fully free. Creator (Justin Schroeder) monetizes FormKit Pro ($149-$1,250). Open source: github.com/standardagents/dmux

  dmux TUI
|
|──> press 'n'
|
v
┌─────────────────┐
│ AI-generate slug │ <-- OpenRouter (gpt-4o-mini)
└────────┬────────┘
|
v
┌─────────────────┐
│ Create git │ <-- .dmux/worktrees/<slug>/
│ worktree │ full independent working copy
└────────┬────────┘
|
v
┌─────────────────┐
│ Split tmux pane │
│ Launch agent │ <-- claude/codex/opencode (--acceptEdits)
└────────┬────────┘
|
v
┌─────────────────┐
│ Agent works │ <-- status via LLM analysis of terminal (1s poll)
│ autonomously │
└────────┬────────┘
|
v
press 'm' to merge
|
v
┌─────────────────┐
│ AI commit msg │ <-- conventional commits via OpenRouter
│ Merge to main │
│ Remove worktree │
└─────────────────┘

Hooks: worktree_created, pre_merge, post_merge
A/B mode: two agents, same prompt, side-by-side
Web dashboard + REST API for programmatic control

Full reference | dmux.ai


5. OpenClaw -- Open-Source AI Agent Framework

Scale: 213K+ GitHub stars, 50+ integrations
Business: MIT license, free to self-host. OpenClaw Cloud planned at $39/mo. Real cost: $5-30/mo in LLM API fees. Creator: Peter Steinberger (ex-PSPDFKit, acqui-hired by OpenAI Feb 2026)

  User prompt
|
v
┌─────────────────┐
│ OpenClaw gateway │ <-- local-first, 50+ integrations
│ (agent router) │ messaging, coding, browser, etc.
└────────┬────────┘
|
┌─────┼─────┐
v v v
[Sub-1][Sub-2][Sub-3] <-- sub-agent collaboration
| | | 40% accuracy boost vs monolithic prompting
└─────┼─────┘
|
v
┌─────────────────┐
│ Output │ <-- declarative agent config in YAML
└─────────────────┘

Not primarily a coding tool -- general-purpose AI assistant
Can run with local models (Ollama + Llama 3.3) for $0/mo
Will remain open source under OpenAI stewardship

Full reference | github.com/openclaw


6. Superconductor -- Parallel Cloud Agents with Live Previews

Scale: N agents per ticket, cloud sandboxes, live browser previews
Business: Closed-source SaaS by Volition (Gradescope founders). BYOK model. Pricing undisclosed, early access.

  Create ticket (informal)
|
v
┌─────────────────┐
│ Launch N agents │ <-- each in isolated cloud container
│ on same ticket │ (Modal / Morph Cloud)
└────────┬────────┘
|
┌─────┼─────┐
v v v
[Agent1][Agent2][Agent3] <-- Claude/Codex/Amp/Gemini
| | |
v v v
[Live] [Live] [Live] <-- browser previews ~30s
[prev] [prev] [prev]
| | |
└─────┼─────┘
|
v
┌─────────────────┐
│ Compare previews │ <-- visual diff, interact with each
│ Select best │
│ One-click PR │
└─────────────────┘

Full reference | superconductor.com


7. 8090 Software Factory -- Enterprise Agent Platform

Scale: Multi-repo code modernization
Business: Proprietary. $200/seat/mo (Team), custom Enterprise, managed delivery from $1M/yr. Funded by Chamath Palihapitiya personally.

  ┌─────────────────┐
│ Refinery │ <-- reverse-engineer codebase into knowledge graph
└────────┬────────┘
|
v
┌─────────────────┐
│ Planner │ <-- AI generates migration/transformation plans
└────────┬────────┘
|
v
┌─────────────────┐
│ Foundry │ <-- specialized agents execute plan
│ (agent workers) │ across multiple repos
└────────┬────────┘
|
v
┌─────────────────┐
│ Validator │ <-- quality gate, CI, tests
└────────┬────────┘
|
v
┌─────────────────┐
│ Factory Line │ <-- full pipeline for enterprise
│ output: PRs │ code modernization at scale
└─────────────────┘

Full reference | 8090.ai


8. Terragon -- Background Fire-and-Forget (SHUT DOWN)

Scale: ~30 concurrent tasks/day, auto-PR creation
Business: SaaS subscription. Shut down Feb 9, 2026. Code released Apache-2.0. Why: Native background agents from Claude Code and Codex commoditized the orchestration layer.

  Create task (web / CLI / GitHub / Slack / mobile)
|
v
┌─────────────────┐
│ Cloud sandbox │ <-- isolated container, clone repo, create branch
└────────┬────────┘
|
v
┌─────────────────┐
│ Agent works │ <-- background, checkpoints pushed to GitHub
│ autonomously │ AI-generated commits
└────────┬────────┘
|
v
┌─────────────────┐
│ PR created │ <-- automatic when done
└────────┬────────┘
|
v
Human reviews and merges

Why it died: Codex's native background agents reached 28% of agent usage on Terragon within one month --
once the platforms shipped background agents natively, the wrapper was unnecessary.

Full reference | terragon-labs/terragon-oss


9. Vadim Strizheus -- "AI Employees" for VugolaAI

Scale: Claims 14 AI employees, 95% automated
Business: VugolaAI (video clipping/scheduling SaaS). Free tier. Solana token (VGLA).

  Long-form video input
|
v
┌─────────────────┐
│ AI Moment │ <-- "AI employee" 1: detect viral-worthy segments
│ Detection │
└────────┬────────┘
|
v
┌─────────────────┐
│ Auto-Clipping │ <-- "AI employee" 2-N: extract, reframe, caption
│ + Captioning │
└────────┬────────┘
|
v
┌─────────────────┐
│ Branding + │ <-- template application
│ Formatting │
└────────┬────────┘
|
v
┌─────────────────┐
│ Multi-Platform │ <-- TikTok, YouTube, Instagram, X, LinkedIn
│ Scheduling │
└─────────────────┘

Note: Specific agent breakdown from video tweet, not independently verified.
The product itself IS the AI automation -- "employees" = AI pipeline stages.

Full reference | @VadimStrizheus


10. Notable Voices

Francois Chollet (@fchollet)

"Sufficiently advanced agentic coding is essentially machine learning"

Does NOT run a multi-agent setup. Warns about maintaining "sprawling mess of AI-generated legacy code." Useful contrarian check.

Andrej Karpathy

Coined "vibe coding" (Feb 2025), then abandoned it for "agentic engineering" (Feb 2026). Evolution: accept all AI output → require specs, review, test suites.

Addy Osmani

Defined Conductor (sequential) vs Orchestrator (parallel) agent frameworks. Identified the "80% problem" -- last 20% takes as long as first 80%.


Comparison Matrix

| System | Type | Open Source | Pricing | Agents | Key Feature |
|---|---|---|---|---|---|
| Boris Cherny | Individual workflow | N/A (uses Claude Code) | $200/mo Max | 10-15 parallel CC | Teleport between devices |
| Claude Code Teams | Built-in | N/A (product feature) | $200/mo Max or API | N (tmux panes) | Shared task list + mailbox |
| Claude Agent SDK | Library | MIT | API usage | Programmatic | Full orchestration control |
| Simon Willison | Individual workflow | N/A | Multi-subscription | CC + Codex + async | Human as router |
| dmux | OSS tool | MIT | Free | N (tmux + worktrees) | A/B agent comparison |
| OpenClaw | OSS framework | MIT | Free / $39 Cloud | Sub-agents | 213K stars, joined OpenAI |
| Superconductor | SaaS | No | Undisclosed (BYOK) | N per ticket | Live browser previews |
| 8090 | Enterprise | No | $200/seat/mo+ | Factory Line | Knowledge graph + modernization |
| Terragon | SaaS (dead) | Apache-2.0 (post-shutdown) | Was subscription | Background agents | Shut down Feb 2026 |
| VugolaAI | Product | No | Free tier | 14 "AI employees" | Video pipeline automation |

Common Patterns

What works across all setups:

1. ISOLATION -- worktrees, containers, or separate sessions
agents must not conflict with each other

2. PLAN FIRST -- Opus/expensive model plans, cheaper model executes
Boris: Plan Mode → auto-accept
Agent Teams: team lead plans, teammates execute

3. MEMORY -- CLAUDE.md / AGENTS.md / progress.txt
errors documented so they never repeat
updated by the agent, not the human

4. VERIFICATION -- automated tests, browser screenshots, self-review
humans review throughput, not individual lines

5. MODEL TIERING -- Opus for planning ($$$), Sonnet for coding ($$), Haiku for tests ($)
"correct answer costs less total iteration time than fast wrong ones"

What doesn't work:

1. NO TESTS -- agents spiral without verification signals
2. NO MEMORY -- same mistakes repeat across sessions
3. SHARED STATE -- agents editing same files = merge hell
4. NO REVIEW -- "vibe coding" produces unmaintainable code (Chollet, Karpathy)

Business Model Summary

  FREE / OSS:
dmux (MIT) -- monetizes separately via FormKit Pro
OpenClaw (MIT) -- Cloud tier planned $39/mo, creator joined OpenAI
claude-flow (MIT) -- reputation/consulting play
Ralph/Compound (MIT) -- promotes Amp (Sourcegraph)
Terragon (Apache-2.0) -- released on shutdown

SAAS / COMMERCIAL:
Superconductor -- BYOK, undisclosed platform fee, early access
8090 -- $200/seat/mo, $1M/yr managed delivery
VugolaAI -- free tier + crypto token (VGLA)

PLATFORM:
Claude Code -- $200/mo Max plan or API usage (~$1B ARR)
OpenAI Codex -- subscription + API
GitHub Agent HQ -- Copilot subscription (multi-vendor agents)

The trend: orchestration tools struggle to monetize when platforms
add native multi-agent features (see: Terragon shutdown).
Survivors either go enterprise (8090) or stay free and build community (dmux, OpenClaw).

Harness Engineering Report

· 8 min read

A survey of how teams are setting up automated coding agent pipelines (Feb 2026).


1. Stripe Minions -- Enterprise Internal Fleet

Scale: 1,300 PRs/week, 0 human-written code
Trigger: Slack message, CLI, web UI, or automated (flaky test detected)

  Slack msg / CLI / auto-trigger
|
v
┌─────────────────┐
│ Warm Devbox │ <-- EC2, pre-cloned repo, ~10s ready
│ (isolated) │ no internet, no prod access
└────────┬────────┘
|
v
┌─────────────────┐
│ Blueprint │ <-- state machine: deterministic + agentic nodes
│ Orchestration │
└────────┬────────┘
|
┌─────┴──────┐
| |
v v
[Agentic] [Deterministic]
"Implement" "Run linters"
"Fix CI" "Push changes"
| |
└─────┬───────┘
|
v
┌─────────────────┐
│ Local Lint │ <-- heuristic, <5s
│ (shift left) │
└────────┬────────┘
|
v
┌─────────────────┐
│ CI: selective │ <-- from 3M+ tests, only relevant
│ test run │
└────────┬────────┘
|
pass? ──no──> autofix? ──yes──> apply, retry once
| no──> hand to human
yes
|
v
┌─────────────────┐
│ PR created │ <-- follows Stripe PR template
│ (human review) │
└─────────────────┘

Context sources:

  • Rule files (Cursor format, directory-scoped)
  • MCP "Toolshed" (~500 internal tools, curated subset per agent)
  • Pre-hydrated links from conversation context

Key insight: "Often one, at most two CI runs." Forked Block's Goose as base agent.

Full reference | Source


2. OpenAI Harness Engineering -- Zero Human Code

Scale: ~1M LOC in 5 months, 3.5 PRs/engineer/day
Trigger: Human writes a prompt describing a task

  Engineer writes prompt
|
v
┌─────────────────┐
│ Codex agent │ <-- reads AGENTS.md (table of contents, ~100 lines)
│ (isolated │ walks dir tree root -> CWD
│ worktree) │ loads docs/ as needed (progressive disclosure)
└────────┬────────┘
|
v
┌─────────────────┐
│ Work depth- │ <-- break goal into building blocks
│ first │ design -> code -> test -> review
└────────┬────────┘
|
v
┌─────────────────┐
│ Custom linters │ <-- Codex-written, error msgs include remediation
│ (architectural │ enforce layer deps, naming, file size
│ constraints) │
└────────┬────────┘
|
v
┌─────────────────┐
│ Agent self- │ <-- review own changes
│ review │ request additional agent reviews
│ │ respond to feedback, iterate
└────────┬────────┘
|
v
┌─────────────────┐
│ Agent-to-agent │ <-- humans optional in review
│ review loop │ squash & merge when satisfied
└────────┬────────┘
|
v
┌─────────────────┐
│ PR merged │
└─────────────────┘

── background ──────────────────────
"Garbage collection" agents run periodically:
- scan for stale docs
- detect architectural violations
- open fix-up PRs
"Doc-gardening" agent:
- cross-link and validate knowledge base

Three pillars:

  1. Context engineering (knowledge base + observability + browser via Chrome DevTools)
  2. Architectural constraints (custom linters + structural testing)
  3. Garbage collection (periodic entropy-fighting agents)

Key insight: "When the agent struggles, treat it as a signal. Identify what's missing and feed it back into the repo -- by having the agent write the fix."

Full reference | Source


3. Code Factory / Ralph -- Solo Autonomous Loop

Scale: Ships features while you sleep, 1 agent in a bash loop
Trigger: ./scripts/compound/loop.sh N or ralph.sh

  prd.json (task inventory)
prompt.md (instructions)
AGENTS.md (conventions)
|
v
while stories remain:
|
v
┌───────────────┐
│ Agent picks │ <-- reads prd.json, selects next by priority
│ next story │
└───────┬───────┘
|
v
┌───────────────┐
│ Implement │ <-- single context window per story
└───────┬───────┘
|
v
┌───────────────┐
│ Typecheck + │ <-- must be fast, "broken code compounds"
│ Tests │
└───────┬───────┘
|
pass? ──no──> skip, log failure
|
yes
|
v
┌───────────────┐
│ Auto-commit │
│ Mark story │
│ done │
└───────┬───────┘
|
v
┌───────────────┐
│ Append to │ <-- pattern accumulation
│ progress.txt │ by iteration 10, agent understands patterns
└───────┬───────┘
|
└──> next iteration

── code review layer (Code Factory) ──
Risk tiers:
Low -> fully automated merge
Medium -> automated with CI gates
High -> require human confirmation

Review agent validates PR:
- review state must match current HEAD SHA
- evidence: tests + browser recording + review
- auto-resolve only bot-only stale threads

Key files: ralph.sh, prd.json, prompt.md, progress.txt, AGENTS.md

Key insight: Small stories, fast feedback, explicit criteria. "By iteration 10, the agent understands patterns from previous stories."
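The story-picking half of the loop can be sketched against an assumed prd.json schema (a "stories" list with "id", "priority", and "done" fields) -- the real file format may differ:

```python
import json
from pathlib import Path

def next_story(prd_path: Path):
    """Return the highest-priority unfinished story, or None when done."""
    stories = json.loads(prd_path.read_text())["stories"]
    todo = [s for s in stories if not s.get("done")]
    return min(todo, key=lambda s: s["priority"]) if todo else None

def mark_done(prd_path: Path, story_id) -> None:
    """Persist completion so the next iteration skips this story."""
    data = json.loads(prd_path.read_text())
    for s in data["stories"]:
        if s["id"] == story_id:
            s["done"] = True
    prd_path.write_text(json.dumps(data, indent=2))
```

The bash loop just calls the agent with `next_story`'s output, runs typecheck and tests, then `mark_done` and an append to progress.txt -- state lives in files, not in the agent's context.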

Full reference | Source


4. dmux -- Parallel Agents via tmux + Worktrees

Scale: N concurrent agents, each in isolated git worktree
Trigger: Press n in dmux TUI, type a prompt

  dmux TUI
|
|──> press 'n'
|
v
┌─────────────────┐
│ Generate slug │ <-- AI-generated branch name via OpenRouter
└────────┬────────┘
|
v
┌─────────────────┐
│ Create git │ <-- .dmux/worktrees/<slug>/
│ worktree │ full independent working copy
└────────┬────────┘
|
v
┌─────────────────┐
│ Split tmux pane │
│ Launch agent │ <-- claude/codex/opencode
│ (--acceptEdits) │
└────────┬────────┘
|
v
┌─────────────────┐
│ Agent works │ <-- status detected via LLM analysis of terminal
│ autonomously │ polls every 1s
└────────┬────────┘
|
v
press 'm' to merge
|
v
┌─────────────────┐
│ AI commit msg │ <-- conventional commits via OpenRouter
│ Merge to main │
│ Remove worktree │
└─────────────────┘

Hooks fire at each stage:
worktree_created -> e.g. pnpm install
pre_merge -> e.g. run tests
post_merge -> e.g. git push, close issue

A/B mode: Run two agents on same prompt side-by-side to compare outputs.

Key insight: Git worktrees give true isolation -- each agent has its own working copy, no conflicts. Hooks enable custom automation at every lifecycle point.
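The lifecycle hooks amount to an event dispatcher where a pre_merge hook can veto the merge. The hook names mirror the doc; the registration API below is invented for illustration:

```python
# Hypothetical dispatcher in the spirit of dmux's lifecycle hooks.
HOOKS = {"worktree_created": [], "pre_merge": [], "post_merge": []}

def on(event: str):
    """Decorator: register a callback for a lifecycle event."""
    def register(fn):
        HOOKS[event].append(fn)
        return fn
    return register

def fire(event: str, ctx: dict) -> bool:
    """Run all hooks for an event; any hook returning False vetoes the action."""
    return all(fn(ctx) is not False for fn in HOOKS[event])
```

A worktree_created hook would run `pnpm install`, a pre_merge hook would run tests and return False on failure, and a post_merge hook would push and close the issue.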

Full reference | Source


5. Superconductor -- Parallel Agents with Live Previews

Scale: N agents per ticket, cloud sandboxes, live browser previews
Trigger: Web dashboard, iOS app, Slack, or GitHub comment (@superconductor)

  Create ticket (informal description)
|
v
┌─────────────────┐
│ Launch N agents │ <-- each gets isolated container
│ on same ticket │ full repo, dev tools, test runners
└────────┬────────┘
|
┌─────┼─────┐
v v v
[Agent1][Agent2][Agent3] <-- Claude/Codex/Amp/Gemini
| | |
v v v
[Live] [Live] [Live] <-- browser previews appear ~30s
[prev] [prev] [prev]
| | |
└─────┼─────┘
|
v
┌─────────────────┐
│ Compare previews │ <-- interact with each, test functionality
│ Diff viewer │ audit code changes across agents
└────────┬────────┘
|
v
┌─────────────────┐
│ Select best │
│ One-click PR │
└─────────────────┘

Key insight: Fire many agents in parallel on the same task. Visual comparison of live previews is the quality gate, not just code review.

Full reference | Source


6. Terragon -- Background Fire-and-Forget Fleet

Scale: ~30 concurrent tasks/day, auto-PR creation
Trigger: Web dashboard, terry CLI, GitHub comment, mobile, Slack

  Create task (any interface)
|
v
┌─────────────────┐
│ Spawn cloud │ <-- fresh isolated container
│ sandbox │ clone repo, create branch
└────────┬────────┘
|
v
┌─────────────────┐
│ Agent executes │ <-- writes code, runs tests, iterates
│ autonomously │ checkpoints pushed to GitHub
│ (background) │ AI-generated commits
└────────┬────────┘
|
v
┌─────────────────┐
│ PR created │ <-- automatic when agent finishes
│ automatically │
└────────┬────────┘
|
v
┌─────────────────┐
│ Human reviews │ <-- dashboard, CLI, or GitHub
│ and merges │
└─────────────────┘

If agent struggles:
"Abandon and retry with different instructions"
(more effective than course-correcting)

Best for: exploration/prototyping, one-shot cleanup, boilerplate, context-intensive debugging.

Key insight: Async-first. Close your laptop, come back to finished PRs. Volume alone doesn't guarantee gains -- task selection matters.

Full reference | Source


7. Gas Town (Steve Yegge) -- K8s for Agents

Scale: 20-30 parallel Claude Code instances
Trigger: Task queue

  Task queue
|
v
┌─────────────────┐
│ Orchestrator │ <-- "K8s for agents"
│ (Gas Town) │
└────────┬────────┘
|
┌─────┼─────┼─────┐
v v v v
[Agent][Agent][Agent][Agent] ... x20-30
| | | |
v v v v
[Git-backed persistent state]
|
v
┌─────────────────┐
│ Merge queue │ <-- conflict resolution between agents
└────────┬────────┘
|
v
┌─────────────────┐
│ Patrol agents │ <-- quality control watchdogs
└────────┬────────┘
|
v
merged to main

K8s analogy: Pod=Agent, Health check="Is it done?", Service mesh=Merge queue, DaemonSet=Patrol agent.
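The merge-queue-as-service-mesh idea reduces to a FIFO that merges clean branches serially and bounces conflicting ones back to their agents. A toy sketch, not Gas Town's code:

```python
from collections import deque

class MergeQueue:
    """Serialize merges from many agents; conflicts bounce instead of blocking."""

    def __init__(self):
        self.pending = deque()
        self.merged = []

    def submit(self, branch: str) -> None:
        self.pending.append(branch)

    def drain(self, conflicts: set) -> list:
        """Merge branches in FIFO order; return the bounced (conflicting) ones."""
        bounced = []
        while self.pending:
            branch = self.pending.popleft()
            (bounced if branch in conflicts else self.merged).append(branch)
        return bounced
```

Serializing the merge step is what lets 20-30 agents write concurrently without trampling main -- the queue, not the agents, owns the integration order.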

Full reference


Comparison Matrix

| System | Trigger | Agents | Isolation | Quality Gate | Human Role |
|---|---|---|---|---|---|
| Stripe Minions | Slack/auto | 1 per task | Devbox (EC2) | Linters + selective CI + autofix | Review PR |
| OpenAI Harness | Prompt | 1 per task | Worktree | Custom linters + agent review | Prioritize, validate |
| Code Factory | Cron/manual | 1 (loop) | Branch | Typecheck + tests + browser recording | Review high-risk |
| dmux | TUI key | N (tmux) | Git worktree | Hooks (custom) | Merge decision |
| Superconductor | Ticket | N per ticket | Cloud container | Live preview comparison | Select best |
| Terragon | Any interface | N (cloud) | Container | CI + auto-PR | Review PR |
| Gas Town | Task queue | 20-30 | Git state | Patrol agents + merge queue | Supervise |

Common Patterns

All systems follow roughly the same skeleton:

trigger (human or automated)
|
v
isolate (worktree / container / devbox)
|
v
agent works (agentic + deterministic nodes)
|
v
fast feedback (lint / typecheck / tests -- shift left)
|
v
quality gate (CI / agent review / live preview / patrol)
|
v
output (PR / branch / merged code)
|
v
human decision point (review / select / merge / abandon)

Universal principles:

  1. Isolation first -- every agent gets its own sandbox
  2. Shift feedback left -- catch errors before CI, not after
  3. Context is scarce -- small focused instructions > one giant file
  4. Constraints enable speed -- linters and gates prevent drift
  5. Humans supervise loops, not sit inside them