Stripe Minions: One-Shot, End-to-End Coding Agents

February 20, 2026 · 3 min read

Source: Part 1 | Part 2
Author: Alistair Gray (Stripe)
Date: Feb 9 & Feb 19, 2026

Key Metric

Over 1,300 PRs merged per week that are entirely minion-produced: humans review them, but they contain no human-written code.

Why Custom Agents

Codebase: hundreds of millions of LOC across several large repos
Primarily Ruby (non-Rails) with Sorbet typing -- rare combo for LLMs
Countless homegrown libraries specific to Stripe
The code handles >$1T/year of payment volume in production
Philosophy: "if it's good for humans, it's good for LLMs, too"

Entry Points

Slack (primary) -- tag the Slack app from any thread, full conversation context included
CLI and web interfaces
Internal tool integrations -- docs platform, feature flags, ticketing
Automated triggers -- CI detects flaky tests -> auto-ticket with "launch minion" button

Monitoring & Output

Web interface for real-time observation of agent decisions
On completion: the minion creates a branch, pushes it to CI, and prepares a PR following Stripe's template
If the code is good -> the engineer opens the PR for colleague review
If not -> the engineer provides additional instructions and the minion pushes updates
Partially correct output can still serve as a foundation for focused human work

Devboxes (Cloud Dev Infrastructure)

Property	Description
Parallelizability	Multiple agents on separate tasks simultaneously
Predictability	Standardized configs ensure consistent behavior
Isolation	Work confined to individual environments

"Cattle, not pets" -- standardized, easy to replace
A warm pool gets a devbox "hot and ready" in ~10 seconds
Pre-cloned repos, cached services
Isolated from production and internet
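
The warm-pool idea can be illustrated with a short Python sketch. This is not Stripe's implementation; the `WarmPool` class, the `provision` callback, and the `devbox-N` names are hypothetical and only show why a checkout can be fast when boxes are provisioned ahead of demand.

```python
from collections import deque
from itertools import count
from typing import Callable, Deque


class WarmPool:
    """Keeps `target_size` pre-provisioned devboxes ready (hypothetical sketch)."""

    def __init__(self, target_size: int, provision: Callable[[], str]) -> None:
        self._target = target_size
        self._provision = provision
        # Pay the slow provisioning cost up front, before anyone asks for a box.
        self._ready: Deque[str] = deque(provision() for _ in range(target_size))

    def ready_count(self) -> int:
        return len(self._ready)

    def checkout(self) -> str:
        # Fast path: hand out a box that is already warm.
        # Slow path: the pool is drained, so provision one on demand.
        box = self._ready.popleft() if self._ready else self._provision()
        # Refill to the target size. A real system would presumably do this
        # asynchronously; it is done inline here to keep the sketch short.
        while len(self._ready) < self._target:
            self._ready.append(self._provision())
        return box


# Usage: boxes are never returned to the pool ("cattle, not pets");
# a used box is simply discarded and a fresh one takes its place.
_ids = count(1)
pool = WarmPool(target_size=2, provision=lambda: f"devbox-{next(_ids)}")
first = pool.checkout()
second = pool.checkout()
```

Each task gets a fresh box, which is what makes parallel runs isolated from one another.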

Agent Harness: Forked Goose

Stripe internally forked Block's Goose (an open-source coding agent) in late 2024. Key distinction: minions operate without human supervision, with full permissions and no confirmation prompts, which is safe because they run inside isolated devboxes.

Blueprints: The Orchestration Framework

Blueprints are the central architectural innovation: a "state machine that intermixes deterministic code nodes and free-flowing agent nodes."

Example node types:

Agentic nodes: "Implement task," "Fix CI failures" (free-form LLM reasoning)
Deterministic nodes: "Run linters," "Push changes," git ops, testing (guaranteed execution)

This hybrid approach reduces token consumption and improves reliability.
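
A minimal Python sketch of such a state machine follows. It is an illustration under stated assumptions, not the actual blueprint framework: the `Node` type, the `run_blueprint` function, and the node functions are hypothetical, and the "agentic" nodes are plain stubs standing in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    name: str
    kind: str                          # "agentic" or "deterministic"
    run: Callable[[dict], str]         # returns the label of the edge to follow
    edges: Dict[str, Optional[str]] = field(default_factory=dict)


def run_blueprint(nodes: Dict[str, Node], start: str, state: dict,
                  max_steps: int = 20) -> List[str]:
    """Walk the graph from `start`, recording which nodes ran."""
    trace: List[str] = []
    current: Optional[str] = start
    for _ in range(max_steps):         # hard cap so a loop cannot run forever
        if current is None:
            break
        node = nodes[current]
        trace.append(f"{node.kind}:{node.name}")
        current = node.edges.get(node.run(state))
    return trace


# Stubs: in a real system the agentic nodes would call a model.
def implement_task(state: dict) -> str:
    state["lint_errors"] = 1           # pretend the first draft has a lint error
    return "done"

def run_linters(state: dict) -> str:
    return "fail" if state["lint_errors"] else "pass"

def fix_lint(state: dict) -> str:
    state["lint_errors"] = 0
    return "done"

def push_changes(state: dict) -> str:
    state["pushed"] = True
    return "done"


BLUEPRINT = {
    "implement": Node("implement", "agentic", implement_task, {"done": "lint"}),
    "lint": Node("lint", "deterministic", run_linters, {"pass": "push", "fail": "fix"}),
    "fix": Node("fix", "agentic", fix_lint, {"done": "lint"}),
    "push": Node("push", "deterministic", push_changes, {"done": None}),
}

state: dict = {}
trace = run_blueprint(BLUEPRINT, "implement", state)
```

In this sketch the linter and the push always run as ordinary code, so the model is never asked to remember to do them, and no tokens are spent on those steps.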

Context: Rule Files

Directory-specific and pattern-based rules (not global, since global rules would overwhelm the context window)
Standardized on Cursor's rule format
Synchronized across minions, Cursor, and Claude Code
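
Pattern-based scoping can be sketched as follows. The `Rule` type, the example rules, and the glob patterns are hypothetical, and the sketch does not reproduce Cursor's actual rule file format; it only shows the selection step, where a rule reaches the agent's context only if one of its patterns matches a file being touched.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List


@dataclass
class Rule:
    name: str
    globs: List[str]   # shell-style patterns; `*` also matches "/" in fnmatch
    body: str          # the guidance text that would be added to the context


RULES = [
    Rule("payments", ["payments/*"], "Example guidance for payments code."),
    Rule("ruby-sorbet", ["*.rb"], "Example guidance for Sorbet signatures."),
    Rule("frontend", ["web/*.tsx"], "Example guidance for frontend code."),
]


def rules_for(paths: List[str], rules: List[Rule]) -> List[Rule]:
    """Return only the rules whose patterns match at least one touched path."""
    return [
        rule for rule in rules
        if any(fnmatchcase(path, glob) for path in paths for glob in rule.globs)
    ]


selected = rules_for(["payments/api/charge.rb"], RULES)
```

Here a change to one Ruby file under `payments/` pulls in two rules and leaves the frontend rule out of the context entirely.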

Context: MCP (Model Context Protocol)

"Toolshed" -- centralized internal MCP server with ~500 tools for internal systems and SaaS.

Minions receive an intentionally small subset by default
Per-user customizable additional tool sets
MCP is the common language across all Stripe agents
Relevant MCP tools are run deterministically before the agent starts, to hydrate its context
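
The tool subsetting and context hydration described above might look roughly like this in Python. Everything named here is an assumption made for illustration: the catalog entries, the `tools_for` and `hydrate_context` functions, and the ticket-ID pattern are not taken from Toolshed, and no real MCP client is involved.

```python
import re
from typing import Callable, Dict, List

Tool = Callable[[str], str]

# Stand-in for a large central catalog (hypothetical tool names).
CATALOG: Dict[str, Tool] = {
    "code_search": lambda query: f"results for {query}",
    "ticket_lookup": lambda ticket_id: f"summary of {ticket_id}",
    "feature_flags": lambda name: f"state of {name}",
    "docs_search": lambda query: f"docs for {query}",
}

DEFAULT_TOOLS = ["code_search", "ticket_lookup"]      # intentionally small
USER_EXTRAS = {"alice": ["feature_flags"]}            # per-user additions


def tools_for(user: str) -> List[str]:
    """Default subset plus whatever the user opted into, limited to the catalog."""
    names = set(DEFAULT_TOOLS) | set(USER_EXTRAS.get(user, []))
    return sorted(name for name in names if name in CATALOG)


def hydrate_context(prompt: str) -> Dict[str, str]:
    """Run lookups up front, in plain code, before any agent loop starts."""
    context: Dict[str, str] = {}
    for ticket_id in re.findall(r"\b[A-Z]+-\d+\b", prompt):
        context[ticket_id] = CATALOG["ticket_lookup"](ticket_id)
    return context
```

Doing the lookup in code means the agent starts with the ticket summary already in context instead of spending a tool-call round trip to fetch it.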

Feedback Loop

Local linting -- heuristic-based, <5 seconds per push
CI selective testing -- from 3M+ tests, only relevant ones run
Autofixes -- many tests include autofixes, applied automatically
Single retry -- one resolution attempt for remaining failures

CI budget: "often one, at most two, CI runs" -- the goal is to shift feedback left, so problems are caught before CI rather than by it.
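
The CI part of this loop can be sketched in Python as below. It is a simplified illustration, not Stripe's code: the `feedback_loop` function and its callbacks are hypothetical, failures are modeled as dictionaries with an `autofix` flag, and the local lint step is left out for brevity.

```python
from typing import Callable, Dict, List

Failure = Dict[str, object]   # e.g. {"test": "test_a", "autofix": True}


def feedback_loop(run_ci: Callable[[], List[Failure]],
                  apply_autofix: Callable[[Failure], None],
                  agent_fix: Callable[[List[Failure]], None],
                  max_ci_runs: int = 2) -> str:
    """Run CI at most `max_ci_runs` times, fixing failures between runs."""
    for attempt in range(1, max_ci_runs + 1):
        failures = run_ci()
        if not failures:
            return f"green after {attempt} CI run(s)"
        if attempt == max_ci_runs:
            break                         # budget spent: do not retry again
        remaining: List[Failure] = []
        for failure in failures:
            if failure["autofix"]:
                apply_autofix(failure)    # mechanical fix, no model involved
            else:
                remaining.append(failure)
        if remaining:
            agent_fix(remaining)          # the single agent resolution attempt
    return "needs human"
```

With the default budget of two runs, the agent gets exactly one chance to resolve failures that have no autofix; anything still red after the second run is handed back to a person.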

Security

Devboxes run in QA environments
No access to production services or real user data
Internal control frameworks preventing destructive actions
