The Tool That Lets You Switch Models Without Losing Your Place


Metadata

Author: Source Code
Full Title: The Tool That Lets You Switch Models Without Losing Your Place
URL: https://every.to/source-code/the-tool-that-lets-you-switch-models-without-losing-your-place

Highlights

Kieran Klaassen is not an easy man to convert. As general manager of Every’s AI email assistant Cora, Kieran has become a steadfast Claude Code devotee—including building an entirely new engineering system around the tool. So no one at Every was surprised when he first tried Droid, the agentic coding product from Factory, and proclaimed himself “unimpressed.” (View Highlight)
But he kept trying because he discovered Droid had subagents—specialized AI workers configured for specific tasks. He could replicate the engineering process he designed in Claude Code in Droid without learning a different system. Once he realized Droid could do that—and that it would let him pick which model to use for which task—he used it to ship a feature in Every’s AI writing partner Spiral (an app he’d never worked on before) in less than two hours. (View Highlight)
That’s what Droid offers: an AI agent that lets you switch between GPT and Claude mid-task, picking the best model for each phase of work. Unlike Claude Code (Anthropic only) or Codex’s command line interface (OpenAI only), Droid works with multiple models from different providers. GPT-5 and Claude Sonnet 4.5 each have distinct strengths. Until now, in order to switch between those models, you had to switch tools. (View Highlight)
If there was one person who wasn’t surprised by Kieran’s conversion, it was Danny Aziz, general manager of Spiral and Droid’s biggest evangelist at Every. Danny canceled both his Claude and ChatGPT Max plans for Droid, became Factory’s top early-access user, and built nearly all of the newest version of Spiral using its multi-model workflows. (View Highlight)
If Danny and Kieran represent the developer side of Droid, Factory’s head of developer relations Ben Tossell represents proof that you don’t need to code to get value from AI agents like Droid. He can’t write a single line of code, but uses Droid as his default interface for everything from analyzing monthly financials to downloading YouTube transcripts. Where Danny and Kieran use Droid to build features faster, Ben uses it to automate tasks he’d otherwise do manually. (View Highlight)
Danny, Kieran, and Ben joined Every CEO Dan Shipper last week in Every’s Droid Camp to share how they use Droid in production, answer subscriber questions, and demonstrate workflows you can start using today. The event was for paid subscribers, but there was so much useful discussion and knowledge sharing that we’re posting the key takeaways for anyone who missed it. You’ll learn how to orchestrate multiple AI models in production—plus see real examples of developers and non-coders doing exactly that. (View Highlight)
Here’s what makes Droid worth your attention:
1. Switch models mid-task to match the work. Use GPT for long research and planning, then switch to Claude for implementation—all in the same terminal session.
2. Start with one model, scale up. Ben runs most tasks in a single conversation thread. Danny orchestrates multiple models across separate terminal panes.
3. It’s great at non-technical use cases. Droid handles data analysis, file management, and automation tasks just as well as code.
4. Context moves between models. When you switch from GPT to Claude, Droid compresses your conversation history and carries it forward so the new model understands what you’ve been working on. (View Highlight)
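As a rough illustration of the fourth point, here is a minimal sketch of carrying compressed context from one model to another. Everything in it is an assumption made for illustration: the `Message` type, the `compress_history` and `handoff` helpers, and the truncation-based digest are hypothetical stand-ins, since the session did not describe how Droid actually compresses history.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "user", "assistant", or "system"
    text: str


def compress_history(history, keep_recent=2, max_chars=60):
    """Hypothetical compression: shorten older turns, keep recent ones verbatim."""
    split = max(len(history) - keep_recent, 0)
    older, recent = history[:split], history[split:]
    if not older:
        return list(recent)
    # Assumption: a real tool would summarize with a model; here we just truncate.
    digest_lines = [f"{m.role}: {m.text[:max_chars]}" for m in older]
    digest = Message("system", "Summary of earlier work:\n" + "\n".join(digest_lines))
    return [digest] + recent


def handoff(history, new_model):
    """Build the payload a newly selected model would start from."""
    return {"model": new_model, "messages": compress_history(history)}


history = [
    Message("user", "Research how briefs are rendered."),
    Message("assistant", "Briefs render in a scroll container; notes saved."),
    Message("user", "Write a spec for scroll detection."),
    Message("assistant", "Spec written."),
]
payload = handoff(history, "claude")  # "claude" is a placeholder label
```

The design choice worth noting is that the most recent turns survive verbatim while older ones shrink, so the new model sees exactly where the work stopped.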
Droid is a command line AI agent—software that can read files, write code, run commands, and complete tasks on your behalf. You might also hear it referred to as a harness: the software layer that packages an AI model into a usable tool. The harness determines how the model interacts with your code, what tools it can access, and how it presents information back to you. A good harness can make the same model perform better by giving it better context and more effective tools to work with. (View Highlight)
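To make the harness idea concrete, the sketch below shows a minimal agent loop in which a model either asks for a tool or gives a final answer, and the harness runs the tool and feeds the result back. It is not Droid's implementation: `run_harness`, `scripted_model`, the `read_file` tool, and the in-memory `fake_files` are all hypothetical, and the "model" is a scripted stand-in rather than a real API call.

```python
def run_harness(model, tools, task, max_steps=5):
    """Minimal agent loop: the model picks a tool or answers; the harness executes."""
    transcript = [("task", task)]
    for _ in range(max_steps):
        action = model(transcript)
        if action["type"] == "final":
            return action["text"], transcript
        # The harness, not the model, decides which tools exist and runs them.
        result = tools[action["tool"]](action["input"])
        transcript.append((action["tool"], result))
    return "stopped: step limit reached", transcript


def scripted_model(transcript):
    """Stand-in for a real model: reads one file, then answers."""
    if len(transcript) == 1:
        return {"type": "tool", "tool": "read_file", "input": "notes.md"}
    return {"type": "final", "text": f"Read {len(transcript[-1][1])} characters."}


# Hypothetical in-memory file system so the example touches no real files.
fake_files = {"notes.md": "# Notes\n"}
tools = {"read_file": lambda path: fake_files.get(path, "")}

answer, transcript = run_harness(scripted_model, tools, "Summarize notes.md")
```

Swapping `scripted_model` for a different callable changes the model without changing the tools or the loop, which is the sense in which one harness can serve several models.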
Unlike most similar products, which are made by the companies that build the AI models (like Anthropic’s Claude Code or OpenAI’s Codex CLI), Droid works with multiple models from different providers and lets you switch between them with a single command. When he was building Spiral’s latest version, Danny got into the habit of running most of his workflow across multiple terminal panes that play to the models’ different strengths: GPT-5 Codex handles research and planning, Claude Haiku implements the bulk of the code, and Claude Sonnet refines the details. Danny never has to leave his terminal or copy files between tools; he simply picks the best model for each phase of work. (View Highlight)
Droid consistently ranks near the top of SWE-bench, a benchmark that measures how well AI agents solve software engineering tasks. According to Ben, the Factory engineering team attributes this to several design choices: reliable error handling that doesn’t fill your context window with repeated failures, built-in system reminders that keep models on track, the ability to pair larger models for planning with smaller ones for execution, and files where you can save notes that Droid will read to keep track of your preferences (a feature Claude Code also has). (View Highlight)
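One way to picture error handling that avoids filling the context window with repeated failures is to collapse runs of identical errors into a single counted entry. The sketch below is an assumption about the general technique, not Factory's code; the `collapse_repeated_errors` name and the `(kind, text)` event format are hypothetical.

```python
def collapse_repeated_errors(events):
    """Replace runs of identical consecutive error messages with one counted entry."""
    collapsed = []
    for kind, text in events:
        last = collapsed[-1] if collapsed else None
        if last and kind == "error" and last[0] == "error" and last[1] == text:
            last[2] += 1  # same failure again: bump the count instead of appending
        else:
            collapsed.append([kind, text, 1])
    return [
        (kind, text if count == 1 else f"{text} (repeated {count} times)")
        for kind, text, count in collapsed
    ]


events = [
    ("output", "ok"),
    ("error", "ModuleNotFoundError: x"),
    ("error", "ModuleNotFoundError: x"),
    ("error", "ModuleNotFoundError: x"),
    ("output", "done"),
]
compact = collapse_repeated_errors(events)
```

Three identical failures cost the model one line of context instead of three, while distinct errors and normal output pass through unchanged.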
Ben runs Droid the way most people use ChatGPT, but with access to his file system. His terminal typically has four tabs running simultaneously: one analyzing Factory’s monthly finances, another helping him write documentation, a third working through tutorial scripts, and a fourth teaching him technical concepts he doesn’t understand yet. (View Highlight)
When he notices himself doing something manually that could be automated, he asks Droid to handle it, then saves the commands it used as a reusable workflow. Take downloading YouTube videos, for example. In ChatGPT, extracting a transcript requires multiple copy-paste steps. In Droid, he typed one instruction: “download the My First Million episode about Grindr, extract the transcript, save it to a folder.” Droid found the right command line tools, ran them in sequence, and confirmed completion. (View Highlight)
Danny showed the most sophisticated setup of the session. He had multiple terminal panes open, each running a different model on the same codebase:
1. Left pane: GPT-5 Codex in spec mode (a phase where the AI documents how a feature should work), researching how Cora renders email briefs
2. Right pane: Claude Haiku implementing the feature
3. Middle pane: Claude Sonnet 4.5 refining the implementation (View Highlight)
Danny was building a feature that allowed Cora to detect a user’s scrolling behavior. When someone reaches the bottom of their morning Brief, Cora should automatically jump to their afternoon Brief from the same day. It’s a simple premise with a messy implementation: it required understanding how Cora’s existing scroll logic worked, implementing new detection code, and polishing the user experience. (View Highlight)
Danny walked through each step:
1. Research phase (GPT-5 Codex): “Look at the web app for how briefs are rendered.” GPT analyzed the codebase (all the code files that make up the application) and documented the existing behavior in a markdown file (a simple text file that developers use for documentation, with basic formatting like headers and bullet points).
2. Planning phase (GPT-5 Codex): “Based on the research, write a spec for scroll detection.” GPT wrote a detailed plan, which Danny saved as a file.
3. Implementation phase (Claude Haiku): In a new pane, Danny pointed Haiku to the spec file and told it to build. Haiku wrote the code.
4. Refinement phase (Claude Sonnet): When the first attempt didn’t work perfectly, Danny switched to Sonnet for debugging and polish. (View Highlight)
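The four phases above amount to a pipeline in which each model hands its work to the next through a file. The sketch below illustrates that handoff pattern under stated assumptions: the file names, the `PHASES` table, and the model labels (`gpt`, `haiku`, `sonnet`) are placeholders rather than real model identifiers, and `fake_call_model` stands in for a real model call. In the session Danny did this by hand across terminal panes, not with a script.

```python
from pathlib import Path
import tempfile

# Hypothetical phase table: (phase, model label, input file, output file).
PHASES = [
    ("research", "gpt", None, "research.md"),
    ("plan", "gpt", "research.md", "spec.md"),
    ("implement", "haiku", "spec.md", "implementation.md"),
    ("refine", "sonnet", "implementation.md", "refined.md"),
]


def run_pipeline(workdir, call_model):
    """Run each phase in order, handing off through files in workdir."""
    workdir = Path(workdir)
    log = []
    for phase, model, input_name, output_name in PHASES:
        # Each phase starts from the artifact the previous phase saved.
        context = (workdir / input_name).read_text() if input_name else ""
        result = call_model(model, phase, context)
        (workdir / output_name).write_text(result)
        log.append((phase, model, output_name))
    return log


def fake_call_model(model, phase, context):
    """Stand-in for a real model call; records which model handled which phase."""
    return f"{context}[{phase} by {model}]\n"


with tempfile.TemporaryDirectory() as tmp:
    log = run_pipeline(tmp, fake_call_model)
    final_text = (Path(tmp) / "refined.md").read_text()
```

Because the handoff happens through saved files rather than shared conversation state, any phase can be rerun with a different model without redoing the earlier ones.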
Kieran’s approach combined custom commands with strategic model selection:
1. Plan with GPT-5 Codex: He used his custom /plan command (ported from Claude Code) to kick off a 20-minute research session. GPT analyzed Spiral’s message architecture, proposed a solution, and documented what would need to work for the feature to be considered complete.
2. Execute with Claude Sonnet: He copied the plan to a new terminal session, switched to Claude, and ran his /work command. Claude implemented everything—back-end logic, front-end button, database changes—in one pass.
3. Test: Kieran booted up Spiral locally. The rewind button appeared and worked on first click. (View Highlight)
Ben’s automated financial reporting is a case in point for how tools like Droid can be useful even for non-coding tasks. He maintains a markdown file that defines his P&L categories (income, tax, ads, contractors, software) and a Python script that processes bank statements. Every month, he types one line: “Do October’s P&L.” Droid reads the markdown, executes the Python script, updates his master spreadsheet, and returns a summary, all from one simple instruction and without Ben writing any code. (View Highlight)
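For a sense of what such a setup might look like, here is a small sketch of a categories file plus a script that totals a bank statement by category. It is a guess at the general shape, not Ben's actual files: the bullet format, the keywords, the sample transactions, and the `parse_categories` and `summarize` helpers are all invented for illustration, and the spreadsheet update step is left out.

```python
import csv
import io
from collections import defaultdict

# Hypothetical categories file: "- category: keyword, keyword" bullets.
CATEGORIES_MD = """\
# P&L categories
- income: stripe
- ads: google ads
- contractors: upwork
- software: github, notion
"""

# Made-up sample statement; real data would come from a bank export.
STATEMENT_CSV = """\
date,description,amount
2025-10-02,STRIPE PAYOUT,1200.00
2025-10-05,GITHUB INC,-21.00
2025-10-09,GOOGLE ADS 1234,-150.00
2025-10-11,COFFEE SHOP,-4.50
"""


def parse_categories(markdown):
    """Read '- category: keyword, keyword' bullets into a dict."""
    categories = {}
    for line in markdown.splitlines():
        if line.startswith("- ") and ":" in line:
            name, keywords = line[2:].split(":", 1)
            categories[name.strip()] = [k.strip().lower() for k in keywords.split(",")]
    return categories


def summarize(statement_csv, categories):
    """Total each category; unmatched rows land in 'uncategorized'."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(statement_csv)):
        description = row["description"].lower()
        match = next(
            (name for name, kws in categories.items() if any(k in description for k in kws)),
            "uncategorized",
        )
        totals[match] += float(row["amount"])
    return dict(totals)


totals = summarize(STATEMENT_CSV, parse_categories(CATEGORIES_MD))
```

Keeping the categories in a plain text file means a non-coder can edit them directly, while the script stays untouched from month to month.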
