
Simon Willison on AI Coding Agents and Workflows


Research compiled February 2026. Simon Willison is the creator of Datasette, co-creator of Django, and one of the most prolific and practical writers on LLM-assisted development. Blog: simonwillison.net | Twitter/X: @simonw | Newsletter: simonw.substack.com


Table of Contents

  1. Core Philosophy
  2. Defining Agents: "Tools in a Loop"
  3. Vibe Coding vs. Vibe Engineering
  4. His Actual Daily Setup
  5. Parallel Coding Agents
  6. Async Coding Agents (Fire-and-Forget)
  7. Designing Agentic Loops
  8. Security: The Lethal Trifecta and Sandboxing
  9. Context Management
  10. Claude Skills vs. MCP
  11. Practical Techniques for LLM-Assisted Coding
  12. Tools He Built and Uses
  13. Key Quotes

Core Philosophy

Willison's overarching stance: LLMs amplify existing expertise. The more skills and experience you have as a software engineer, the faster and better the results you get from working with LLMs and coding agents.

The biggest advantage is not getting work done faster -- it is being able to ship projects that would not have been justified spending time on at all. LLMs accelerate learning, and letting developers execute ideas faster means they learn even more.

Key mental model: Think of LLMs as "an over-confident pair programming assistant who's lightning fast at looking things up, can churn out relevant examples at a moment's notice and can execute on tedious tasks without complaint." But they will absolutely make mistakes -- sometimes subtle, sometimes huge -- with errors that can be deeply inhuman, like hallucinating a non-existent library or method.

Critical warning: "If someone tells you that coding with LLMs is easy they are (probably unintentionally) misleading you." Using LLMs to write code is difficult and unintuitive, requiring significant effort to find the sharp and soft edges.


Defining Agents: "Tools in a Loop"

After collecting 211 different definitions of "agent" from Twitter and growing frustrated that Anthropic's own developer conference used the word dozens of times without defining it, Willison landed on a definition from Hannah Moran at Anthropic:

"Agents are models using tools in a loop."

This distinguishes agents by their iterative process -- not simply models or tools individually, but language models that repeatedly call external tools and use their outputs to inform subsequent decisions. Agents operate through cyclical reasoning rather than single-pass inference. The loop mechanism is central.
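The definition fits in a few lines of code. Here is a toy illustration of the loop, with a stubbed `call_model` standing in for a real LLM API and a hypothetical two-tool registry (everything here is illustrative, not any vendor's actual interface):

```python
# Minimal "tools in a loop" agent sketch. call_model is a stand-in for a
# real LLM API call; it returns either a tool-call request or a final answer.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def call_model(history):
    # Stub model: if the last message is a tool result, produce the answer;
    # otherwise request a tool call. A real agent sends `history` to an LLM.
    last = history[-1]
    if last["role"] == "tool":
        return {"type": "answer", "content": f"Result: {last['content']}"}
    return {"type": "tool_call", "name": "add", "args": (2, 3)}

def run_agent(prompt, max_turns=5):
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):              # the loop
        reply = call_model(history)         # the model
        if reply["type"] == "tool_call":    # the tools
            result = TOOLS[reply["name"]](*reply["args"])
            history.append({"role": "tool", "content": result})
        else:
            return reply["content"]
    return "Gave up after max_turns"
```

The three ingredients of the definition map directly onto the three comments: a model, tools, and the loop that feeds tool outputs back in.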

Willison notes: "2025 really has been the year of 'agents', no matter which of the many conflicting definitions you decide to use (I eventually settled on 'tools in a loop')."

Source: Agents are models using tools in a loop


Vibe Coding vs. Vibe Engineering

Willison draws a sharp distinction between two modes of AI-assisted development:

Vibe Coding (from Andrej Karpathy): "The fast, loose and irresponsible way of building software with AI -- entirely prompt-driven, and with no attention paid to how the code actually works." Useful for weekend projects, exploration, learning. Willison built 77+ HTML+JavaScript tools this way without reading implementation details.

Vibe Engineering (Willison's term): Responsible AI-assisted development where "seasoned professionals accelerate their work with LLMs while staying proudly and confidently accountable for the software they produce." The name is deliberately cheeky.

The 12 Practices of Vibe Engineering

  1. Automated Testing -- Agents perform best with comprehensive test suites; test-first development is particularly effective
  2. Advance Planning -- High-level planning before coding improves iteration
  3. Comprehensive Documentation -- Enables agents to use APIs without reading source code
  4. Strong Git Habits -- Version control becomes critical; agents excel at git bisect
  5. Effective Automation -- CI/CD, linting, preview deployments amplify agent productivity
  6. Culture of Code Review -- Reviewing agent output requires genuine expertise
  7. Management Skills -- "Getting good results out of a coding agent feels uncomfortably close to getting good results out of a human collaborator"
  8. Manual QA -- Beyond tests, rigorous edge-case testing remains essential
  9. Research Skills -- Determining optimal solutions before implementation
  10. Preview Environments -- Safe feature testing before production
  11. Outsourcing Intuition -- Knowing what AI handles well versus manual work
  12. Updated Estimation -- Accounting for AI's variable impact on timelines

Central insight: "One of the lesser spoken truths of working productively with LLMs as a software engineer on non-toy-projects is that it's difficult."

Source: Vibe engineering


His Actual Daily Setup

Primary Tools (as of late 2025 / early 2026)

| Tool | Use Case |
| --- | --- |
| Claude Code (Sonnet 4.5) | Primary local terminal agent |
| Codex CLI (GPT-5-Codex) | Primary local terminal agent, used alongside Claude Code |
| Claude Code for Web | Async fire-and-forget agent (sandboxed) |
| Codex Cloud | Async tasks, launched from phone |
| Google Jules | Free alternative async agent |
| GitHub Copilot Coding Agent | PR-based async agent |
| llm CLI tool (his own) | Quick prompts, logging to SQLite, RAG workflows |
| files-to-prompt (his own) | Pipe entire directories into LLM context |

How He Runs Multiple Agents

  • Multiple terminal windows with different agents in separate directories
  • Mixture of Claude Code and Codex CLI running simultaneously
  • For isolation: creates fresh checkouts in /tmp rather than using git worktrees
  • Runs in YOLO mode (--dangerously-skip-permissions) for tasks where malicious instructions cannot sneak into context
  • Recognizes he should "start habitually running my local agents in Docker containers to further limit the blast radius"
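The fresh-checkout-per-agent pattern can be sketched in a few lines. This is a hedged illustration, not Willison's actual tooling: it uses `shutil.copytree` to stay self-contained, where a real version would run `git clone`, and the task runner returns a string where a real one would launch `claude` or `codex` in the working directory:

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def fresh_checkout(repo_dir: Path) -> Path:
    """Copy the repo into a throwaway directory under the system temp dir,
    so each agent works in isolation (fresh checkouts rather than git
    worktrees). A real version might run `git clone` instead of copytree."""
    scratch = Path(tempfile.mkdtemp(prefix="agent-"))
    dest = scratch / repo_dir.name
    shutil.copytree(repo_dir, dest)
    return dest

def run_task(repo_dir: Path, task: str) -> str:
    workdir = fresh_checkout(repo_dir)
    # A real workflow would launch a terminal agent in `workdir` here.
    return f"{task} in {workdir}"

def run_parallel(repo_dir: Path, tasks):
    """Fire off several agents at once, each in its own checkout."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: run_task(repo_dir, t), tasks))
```

Because every task gets its own checkout, agents cannot trample each other's uncommitted changes.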

Claude Code as General Agent

Willison's key insight (January 2026): "Claude Code is, with hindsight, poorly named -- it's not purely a coding tool: it's a tool for general computer automation. Anything you can achieve by typing commands into a computer is something that can now be automated by Claude Code. It's best described as a general agent."

Claude Cowork (January 2026): Anthropic's "Claude Code for the rest of your work" -- same underlying engine with a less intimidating UI, automatic filesystem sandboxing via Apple's VZVirtualMachine, aimed at non-technical users.


Parallel Coding Agents

Despite initial skepticism, Willison found himself "quietly starting to embrace the parallel coding agent lifestyle, finding an increasing number of tasks that can be fired off in parallel without adding too much cognitive overhead."

Four Key Application Patterns

  1. Research and Proof of Concepts -- Testing whether new libraries work together. Libraries too new to be in training data do not matter; agents can check out repos and read the code to figure out usage.

  2. System Understanding -- Ask agents to "make notes on where your signed cookies are set and read, or how your application uses subprocesses and threads."

  3. Low-Stakes Maintenance -- Fixing deprecation warnings, resolving test suite issues without interrupting primary focus.

  4. Carefully Specified Work -- Code reviewed faster when starting from detailed specifications rather than open-ended requests.

The "Send Out a Scout" Pattern

From Josh Bleecher Snyder: "Hand the AI agent a task just to find out where the sticky bits are, so you don't have to make those mistakes." Use the agent as reconnaissance before committing to an approach.

Source: Embracing the parallel coding agent lifestyle


Async Coding Agents (Fire-and-Forget)

Willison practices code research -- answering software development questions by writing and executing code rather than relying on speculation. Async agents excel at this.

Workflow

  1. Create a dedicated GitHub repository (separate from production code)
  2. Enable full network access for agents in research repos
  3. Formulate clear research goals in 2-3 paragraphs
  4. Submit as async task (fire-and-forget)
  5. Agents file pull requests with results

Willison reports running 2-3 code research projects a day with minimal time investment. Can launch from phone.
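The workflow above can be collapsed into a small task-spec helper. Every field name here is illustrative -- no async-agent platform exposes exactly this API:

```python
def research_task(goal_paragraphs, repo="research-scratchpad"):
    """Package a code-research question as a fire-and-forget task spec.
    Field names are made up for illustration, not any platform's real API."""
    if not 2 <= len(goal_paragraphs) <= 3:
        raise ValueError("formulate the research goal in 2-3 paragraphs")
    return {
        "repo": repo,                   # dedicated repo, never production code
        "network_access": "full",       # research repos get full network access
        "prompt": "\n\n".join(goal_paragraphs),
        "deliverable": "pull request",  # agents file PRs with their results
    }
```

The point of the structure is the discipline it encodes: a quarantined repo, explicit network policy, a goal short enough to write from a phone, and results that arrive as reviewable pull requests.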

Why Async Agents Are Compelling

  • Great answer to security challenges (code runs on someone else's infra, not your laptop)
  • Parallelizable -- fire off multiple tasks at once
  • Hallucination mitigation: "The code itself doesn't lie: if they write code and execute it and it does the right things then they've demonstrated... that something really does work."

Honest Assessment

Willison acknowledges these outputs constitute "total slop" -- unreviewed AI-generated content. He quarantines research in dedicated repositories and requests that platforms add noindex support to prevent search engine indexing of AI-generated research.

Source: Code research projects with async coding agents


Designing Agentic Loops

Willison identifies "designing agentic loops" as a critical new skill for getting the most out of coding agents. The skill involves carefully selecting which tools and feedback mechanisms the agent uses.

When Agentic Loops Shine

Problems with clear success criteria requiring trial-and-error. The signal: "ugh, I'm going to have to try a lot of variations here."

Examples:

  • Debugging failing tests through iterative investigation
  • Performance optimization (SQL indexing, container sizing)
  • Dependency upgrades with automated test validation
  • Docker image optimization while maintaining test passage

Tool Selection Strategy

Rather than complex MCP setups, create an AGENTS.md file documenting available commands:

To take a screenshot, run:

    shot-scraper http://www.example.com/ -w 800 -o example.jpg

LLMs effectively leverage existing tools (Playwright, ffmpeg) they already understand, recovering from mistakes through iteration.

Critical Amplifier

Automated test suites dramatically multiply agent effectiveness. Agents need measurable success criteria to iterate toward solutions reliably. Fast feedback loops enable productive agentic workflows: fast compilation, fast tests, fast tool responses.

YOLO Mode Trade-offs

Three implementation options for unrestricted execution:

  1. Secure sandbox (Docker, Apple container tool) restricting file/secret/network access
  2. Ephemeral environments (GitHub Codespaces, ChatGPT Code Interpreter)
  3. Calculated risk with isolated, monitored environments

Credential Scoping Pattern

Provide credentials only to test/staging with tight constraints. Example: Willison created a dedicated Fly.io organization with $5 budget limit and scoped API key for isolated infrastructure experimentation.
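The scoping pattern can be sketched as an environment filter around the agent subprocess. The variable names (`STAGING_API_KEY` and so on) are invented for illustration; the idea is simply that the agent's process never sees production secrets:

```python
import os
import subprocess

# Only these variables are visible to the agent subprocess. Names are
# illustrative -- the point is scoped staging credentials, never prod ones.
AGENT_ALLOWLIST = {"PATH", "HOME", "STAGING_API_KEY"}

def agent_env(source_env=None):
    """Build a minimal environment for an agent, dropping everything
    that is not explicitly allowlisted."""
    env = dict(source_env if source_env is not None else os.environ)
    return {k: v for k, v in env.items() if k in AGENT_ALLOWLIST}

def run_agent_command(cmd, source_env=None):
    # e.g. run_agent_command(["claude", "--dangerously-skip-permissions"])
    return subprocess.run(cmd, env=agent_env(source_env), capture_output=True)
```

Combined with a budget-capped account, a leaked key then bounds the blast radius in both scope and cost.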

Source: Designing agentic loops


Security: The Lethal Trifecta and Sandboxing

The Lethal Trifecta

Three combined factors create critical vulnerability:

  1. Access to private data
  2. Exposure to untrusted content
  3. Ability to communicate externally

When all three are present, attackers can extract secrets. Example: a malicious HTML file tricks an agent into grepping environment variables (like GitHub tokens) and exfiltrating them to attacker-controlled servers.

The fundamental rule: "Anyone who can get their tokens into your context should be considered to have full control over what your agent does next."
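The trifecta lends itself to a mechanical pre-flight check before launching an agent. The capability names below are my labels for the three legs, not an established API:

```python
# The three legs of the lethal trifecta, as simple capability flags.
TRIFECTA = {"private_data", "untrusted_content", "external_comms"}

def lethal_trifecta(agent_caps: set) -> bool:
    """True if an agent configuration combines all three dangerous legs."""
    return TRIFECTA <= agent_caps

def check_config(agent_caps: set) -> str:
    """Refuse to launch unless at least one leg has been removed."""
    if lethal_trifecta(agent_caps):
        raise ValueError("lethal trifecta: remove at least one capability")
    return "ok"
```

Any two legs together are survivable; it is the combination of all three that lets an attacker both reach secrets and carry them out.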

Sandboxing Recommendations

Primary defense: Run coding agents in sandboxes, preferably "on someone else's computer."

Recommended platforms:

  • Claude Code for Web (sandboxed by default)
  • Codex Cloud
  • Gemini Jules
  • Docker containers locally

Two-layer control problem:

  1. Filesystem access (manageable) -- restrict file read/write permissions
  2. Network access (critical, and harder) -- blocking it prevents data exfiltration, the third leg of the lethal trifecta

Technical Implementation (macOS)

Apple's sandbox-exec command with policy documents controlling file visibility, network allowlists, and process execution. Anthropic's approach: HTTP proxy mediates agent network traffic with domain allowlists. They released an open-source sandbox-runtime library.
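A minimal sketch of the domain-allowlist decision at the heart of such a proxy (exact-match hostnames only; a production proxy would also handle subdomains, redirects, and non-HTTP traffic, and the domains listed are illustrative):

```python
from urllib.parse import urlparse

# Illustrative allowlist: the domains an agent legitimately needs.
ALLOWED_DOMAINS = {"api.anthropic.com", "pypi.org", "files.pythonhosted.org"}

def request_allowed(url: str, allowlist=ALLOWED_DOMAINS) -> bool:
    """Permit a request only if its hostname is exactly on the allowlist,
    closing off the exfiltration leg of the lethal trifecta."""
    host = urlparse(url).hostname or ""
    return host in allowlist
```

Note the exact match: substring or suffix checks would let `evil.pypi.org.attacker.example` slip through.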

MCP Context Pollution Solved (January 2026)

Willison on MCP Tool Search: "Context pollution is why I rarely used MCP, now that it's solved there's no reason not to hook up dozens or even hundreds of MCPs to Claude Code."

MCP Tool Search dynamically loads tool definitions into context only when needed, cutting overhead from ~77K tokens to ~8.7K for a setup with 50+ tools -- a reduction of roughly 89%.


Context Management

Willison's key insight: "Most of the craft of getting good results out of an LLM comes down to managing its context -- the text that is part of your current conversation."

Principles

  • Context is not free. Every token influences behavior, for better or worse.
  • Context includes entire conversation history, not just current prompt.
  • Starting fresh conversations "resets that context back to zero."
  • Pre-populate context using tools like Claude Projects' GitHub integration.
  • Explicitly understand what information enters the LLM to get better results.
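One way to make those principles concrete is a context-trimming helper that keeps the system prompt plus the newest turns within a token budget. This is a sketch of the general technique, not any particular tool's behavior, and the word-count tokenizer is a crude stand-in for a real one:

```python
def trim_context(messages, budget,
                 count_tokens=lambda m: len(m["content"].split())):
    """Keep system messages plus the most recent turns that fit the token
    budget; older turns are dropped first. count_tokens is a crude
    word-count stand-in for a real tokenizer."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept = []
    used = sum(count_tokens(m) for m in system)
    for msg in reversed(rest):            # walk newest-first
        cost = count_tokens(msg)
        if used + cost > budget:
            break                         # everything older is dropped
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))
```

Dropping the oldest turns first is the same intuition as "starting fresh conversations resets that context back to zero," applied gradually rather than all at once.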

From Prompt Engineering to Context Engineering

Willison evolved from "prompt engineering" to "context engineering" -- everything that surrounds the prompt: goals, constraints, examples, tools, memory, tests, and retrieved knowledge that steer an LLM to do the next correct thing.

He describes the shift: "Language models change you from a programmer who writes lines of code, to a programmer that manages the context the model has access to, prunes irrelevant things, adds useful material to context, and writes detailed specifications."

Context Problems to Avoid

  • Context Poisoning: hallucinations making it into the context
  • Context Distraction: long contexts causing over-focus on irrelevant parts
  • Context Confusion: superfluous information leading to low-quality responses
  • Context Clash: new information conflicting with existing prompt information

Source: Simon Willison on context-engineering


Claude Skills vs. MCP

What Skills Are

Skills are folders containing Markdown files with YAML metadata and optional executable scripts. The system scans available skills at session start, reading brief descriptions from frontmatter -- each skill consuming only dozens of tokens until fully loaded when needed.
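Under stated assumptions -- a `skills/<name>/SKILL.md` layout with simple `key: value` frontmatter, parsed with the stdlib rather than a real YAML library -- the startup scan might look like this:

```python
from pathlib import Path

def scan_skills(skills_dir):
    """Read only the frontmatter of each skill's SKILL.md, mirroring how
    just a short description per skill enters context until a skill is
    actually loaded. Parses simple `key: value` frontmatter lines; a real
    implementation would use a YAML parser."""
    catalog = {}
    for skill_md in Path(skills_dir).glob("*/SKILL.md"):
        text = skill_md.read_text()
        meta = {}
        if text.startswith("---"):
            # Frontmatter sits between the first pair of --- markers.
            for line in text.split("---")[1].strip().splitlines():
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
        catalog[meta.get("name", skill_md.parent.name)] = meta.get("description", "")
    return catalog
```

The full Markdown body (and any scripts alongside it) is only read later, when the model decides a skill is relevant -- that deferred loading is where the token savings come from.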

Why Willison Thinks They Are a Bigger Deal Than MCP

Advantages over MCP:

  • Extreme simplicity compared to MCP's protocol specification (hosts, clients, servers, resources, transports)
  • Easy to iterate and improve -- just Markdown files and scripts
  • Platform-agnostic -- work with Codex CLI, Gemini CLI despite no native integration
  • Low token overhead -- dozens of tokens per skill vs. tens of thousands for MCP

Willison predicts "a Cambrian explosion in Skills" exceeding the MCP adoption wave.

The General Agent Pattern via Skills

Example: a data journalism agent combining skills for census data access, SQLite/DuckDB operations, S3 publishing, story discovery methodology, and D3 visualization -- all implemented "with a folder full of Markdown files and maybe a couple of example Python scripts."

Practical Configuration: AGENTS.md and CLAUDE.md

  • CLAUDE.md: The "constitution" for Claude Code. Lives at project root or ~/.claude/CLAUDE.md for global defaults. Sets instructions for every session.
  • AGENTS.md: Documents available commands and tools for agents. Simpler than MCP -- just describe how to use existing CLI tools.

Source: Claude Skills are awesome, maybe a bigger deal than MCP


Practical Techniques for LLM-Assisted Coding

1. The Authoritarian Approach (Production Code)

Provide exact specifications with function signatures:

    async def download_db(url, max_size_bytes=5 * 1024 * 1024) -> pathlib.Path:

Then detail requirements in English. Treats LLMs "like a digital intern, hired to type code for me based on my detailed instructions." Saves 15+ minutes on functions you could write manually.
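As a sketch of what code satisfying that spec might look like -- this body is illustrative, not the code Willison's "digital intern" actually produced -- here is a stdlib-only version (it streams in chunks so oversized files are rejected early, and it works with file:// URLs as well as http(s)):

```python
import asyncio
import os
import pathlib
import tempfile
import urllib.request

async def download_db(url, max_size_bytes=5 * 1024 * 1024) -> pathlib.Path:
    """Download a database file to a temp path, refusing anything larger
    than max_size_bytes. Illustrative implementation of the spec above."""
    def fetch():
        fd, name = tempfile.mkstemp(suffix=".db")
        os.close(fd)
        dest = pathlib.Path(name)
        with urllib.request.urlopen(url) as resp, dest.open("wb") as out:
            copied = 0
            while chunk := resp.read(64 * 1024):   # stream in 64 KB chunks
                copied += len(chunk)
                if copied > max_size_bytes:
                    dest.unlink()                  # discard the partial file
                    raise ValueError("file exceeds max_size_bytes")
                out.write(chunk)
        return dest
    # Run the blocking download off the event loop.
    return await asyncio.to_thread(fetch)
```

The value of the authoritarian approach is visible even in the sketch: the signature pins down the interface, so the English requirements only need to cover behavior like the size limit and error handling.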

2. Iterative Refinement Over Perfect First Prompts

Bad initial results are not failures -- they are starting points. Follow-up prompts like "break that repetitive code into a function" often yield better results. The LLM "can re-type it dozens of times without ever getting frustrated."

3. Strategic Example Provision

Dump several complete working examples as context, then ask the LLM to build inspired by them. Willison used this for his JavaScript OCR application combining Tesseract.js and PDF.js.

4. Plan-Then-Execute for Larger Changes

For refactorings: tell the LLM to write a plan, iterate over it until reasonable, save it as a kind of meta program, then instruct it to implement step by step.
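A minimal sketch of that plan-as-meta-program idea, with `execute_step` standing in for prompting the agent on each step (the checklist format is my choice, not a prescribed one):

```python
from pathlib import Path

def save_plan(steps, path="plan.md"):
    """Persist the agreed plan as a Markdown checklist the model can
    work through -- the 'meta program'."""
    Path(path).write_text("\n".join(f"- [ ] {s}" for s in steps))
    return path

def execute_plan(path, execute_step):
    """Walk the saved plan step by step, then mark every item done.
    execute_step stands in for prompting the agent with one step."""
    steps = [line.removeprefix("- [ ] ")
             for line in Path(path).read_text().splitlines()]
    for step in steps:
        execute_step(step)  # in practice: one agent prompt per step
    Path(path).write_text("\n".join(f"- [x] {s}" for s in steps))
    return steps
```

Keeping the plan in a file means you can iterate on it with the model before any code is touched, and resume from it in a fresh conversation if the context fills up.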

5. Vibe-Coding for Exploration

"Fully give in to the vibes" for weekend projects and learning. Do not read implementation details. Deploy and test -- human takes over when needed.

6. Non-Negotiable Testing

"You absolutely cannot outsource to the machine testing that the code actually works." This is the one thing that must stay human.

7. Provide Documentation for Training Cutoff Gaps

Models trained on data from months ago will not know about breaking library changes. Workaround: provide recent examples, documentation snippets, or changelog entries in prompts.

Source: Here's how I use LLMs to help me write code


Tools He Built and Uses

llm -- CLI for Large Language Models

  • Command-line tool and Python library for interacting with OpenAI, Anthropic, Google, Meta, and local models
  • Plugin system for model providers (Claude, Gemini, Ollama, Mistral, etc.)
  • Logs all prompts and responses to SQLite -- explorable with Datasette
  • Version 0.26 added tool support (LLMs can call Python functions)
  • Supports RAG workflows as bash scripts against local SQLite databases
  • Install: pip install llm or brew install llm
  • GitHub: simonw/llm | Docs: llm.datasette.io
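Because the logs live in plain SQLite, they can be explored with the stdlib directly. This sketch assumes a `responses` table with a `model` column, which matches the llm log schema at the time of writing but could change between versions (the docs at llm.datasette.io are the authority):

```python
import sqlite3

def prompts_per_model(db_path):
    """Count logged prompts per model in llm's SQLite log database.
    Assumes a `responses` table with a `model` column."""
    with sqlite3.connect(db_path) as db:
        rows = db.execute(
            "SELECT model, COUNT(*) FROM responses "
            "GROUP BY model ORDER BY COUNT(*) DESC"
        ).fetchall()
    return dict(rows)
```

The same database opens directly in Datasette for browsing, which is the pairing described below.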

files-to-prompt -- Directory-to-Prompt Converter

  • Turns a whole directory of code into a single prompt ready to pipe into an LLM
  • -m/--markdown option for Markdown output with fenced code blocks
  • Supports reading file lists from stdin
  • GitHub: simonw/files-to-prompt
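A toy re-implementation shows the shape of the output; the real tool additionally respects .gitignore, skips binaries, and accepts file lists on stdin:

```python
from pathlib import Path

FENCE = "`" * 3  # backtick fence for the Markdown output mode

def files_to_prompt(directory, markdown=True):
    """Flatten a directory of source files into one prompt string: a toy
    version of what simonw/files-to-prompt does."""
    parts = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            body = path.read_text()
            if markdown:  # mirrors the -m/--markdown fenced-block output
                parts.append(f"{path}\n{FENCE}\n{body}\n{FENCE}")
            else:
                parts.append(f"{path}\n---\n{body}\n---")
    return "\n\n".join(parts)
```

Each file is prefixed with its path so the model can cite locations in its answer, which is most of what "pipe a directory into context" requires.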

shot-scraper -- Browser Automation for Agents

  • Takes screenshots and executes JavaScript against web pages via headless Chrome (Playwright)
  • Useful in AGENTS.md for giving agents visual feedback
  • gh: prefix loads scripts from GitHub
  • simonwillison.net/tags/shot-scraper

llm-prompts -- Reusable Prompt Collection

Datasette

  • His flagship project: a tool for exploring and publishing data in SQLite databases
  • Pairs with llm for logging/analyzing LLM usage patterns
  • datasette.io

Key Quotes

"Claude Code is, with hindsight, poorly named -- it's not purely a coding tool: it's a tool for general computer automation."

"Agents are models using tools in a loop."

"The biggest advantage is speed of development" -- enabling shipping of projects that would not have been justified building manually.

"One of the new skills required to get the most out of AI-assisted coding tools is designing agentic loops: carefully selecting tools to run in a loop to achieve a specified goal."

"Getting good results out of a coding agent feels uncomfortably close to getting good results out of a human collaborator."

"Anyone who can get their tokens into your context should be considered to have full control over what your agent does next."

"Context pollution is why I rarely used MCP, now that it's solved there's no reason not to hook up dozens or even hundreds of MCPs to Claude Code."

"I find myself instinctively thinking 'neat feature idea, not worth the time it will take to build and maintain it though' -- and then prompting Claude Code anyway, because my 25+ years of intuitions don't match reality any more."

"You absolutely cannot outsource to the machine testing that the code actually works."

"A friend called Claude Code catnip for programmers and it really feels like this."

