Stripe Minions: One-Shot, End-to-End Coding Agents
Source: Part 1 | Part 2 Author: Alistair Gray (Stripe) Date: Feb 9 & Feb 19, 2026
Key Metric
Over 1,300 PRs merged per week -- completely minion-produced, human-reviewed, no human-written code.
Why Custom Agents
- Codebase: hundreds of millions of LOC across several large repos
- Primarily Ruby (non-Rails) with Sorbet typing -- rare combo for LLMs
- Countless homegrown libraries specific to Stripe
- Processes >$1T/year in production
- Philosophy: "if it's good for humans, it's good for LLMs, too"
Entry Points
- Slack (primary) -- tag the Slack app from any thread, full conversation context included
- CLI and web interfaces
- Internal tool integrations -- docs platform, feature flags, ticketing
- Automated triggers -- CI detects flaky tests -> auto-ticket with "launch minion" button
Monitoring & Output
- Web interface for real-time observation of agent decisions
- On completion: branch, CI push, PR following Stripe's template
- If code is good -> open PR for colleague review
- If not -> provide additional instructions, minion pushes updates
- Partially correct output used as foundation for focused human work
Devboxes (Cloud Dev Infrastructure)
| Property | Description |
|---|---|
| Parallelizability | Multiple agents on separate tasks simultaneously |
| Predictability | Standardized configs ensure consistent behavior |
| Isolation | Work confined to individual environments |
- "Cattle, not pets" -- standardized, easy to replace
- Warm pool achieves "hot and ready" in ~10 seconds
- Pre-cloned repos, cached services
- Isolated from production and internet
Agent Harness: Forked Goose
Internally forked Block's Goose (open-source coding agent) in late 2024. Key distinction: minions operate without human supervision -- full permissions, no confirmation prompts, safe within isolated devboxes.
Blueprints: The Orchestration Framework
Central architectural innovation. A "state machine that intermixes deterministic code nodes and free-flowing agent nodes."
Example node types:
- Agentic nodes: "Implement task," "Fix CI failures" (free-form LLM reasoning)
- Deterministic nodes: "Run linters," "Push changes," git ops, testing (guaranteed execution)
This hybrid approach reduces token consumption and improves reliability.
Context: Rule Files
- Directory-specific and pattern-based rules (not global -- would overwhelm context)
- Standardized on Cursor's rule format
- Synchronized across minions, Cursor, and Claude Code
Context: MCP (Model Context Protocol)
"Toolshed" -- centralized internal MCP server with ~500 tools for internal systems and SaaS.
- Minions receive an intentionally small subset by default
- Per-user customizable additional tool sets
- MCP is the common language across all Stripe agents
- Deterministic pre-execution of relevant MCP tools for context hydration
Feedback Loop
- Local linting -- heuristic-based, <5 seconds per push
- CI selective testing -- from 3M+ tests, only relevant ones run
- Autofixes -- many tests include autofixes, applied automatically
- Single retry -- one resolution attempt for remaining failures
Hard rule: "often one, at most two, CI runs" -- shift feedback left.
Security
- Devboxes in QA environments
- No access to production services or real user data
- Internal control frameworks preventing destructive actions
