Frontier models usually come with tradeoffs. You get more depth, but less speed. More agency, but less control. Better code, but worse prose. The surprising thing about GPT-5.5, the new OpenAI model out today, is how few of those tradeoffs it asks you to make.
It’s much faster than Opus 4.7, easier to collaborate with, better at writing than any OpenAI model we’ve used since GPT-4.5 and GPT-4o, and the strongest model we’ve tested on our new Senior Engineer Benchmark, which measures how well models can rewrite a slop-coded codebase the way a senior engineer would.
On that benchmark, GPT-5.5 with extra high reasoning reached 62.5 on its best run, while Opus 4.7 at a similar reasoning level landed in the low 30s. For reference, human senior engineers score in the high 80s and low 90s. GPT-5.5 performed best, however, when it executed a plan written by Opus 4.7—curious.
For a long time, OpenAI looked like it was trying to be everywhere at once: Sora for video, Atlas for browsing, consumer ChatGPT features, creative media tools, and whatever else might turn AI into the next mass-market platform. Meanwhile, Anthropic doubled down on work, and Claude became the default for coding agents, long-running engineering tasks, and professional workflows.
GPT-5.5 gives OpenAI something it badly needed: a fast, capable workhorse model for the professional tasks where most AI use happens.
GPT-5.5 is OpenAI’s clearest bid to reclaim the code-and-work narrative. It does not win everything. Opus 4.7 seems to write better plans and have a superior eye for design and product details. But GPT-5.5 is faster, steadier, and easier to trust for everyday professional work.
OpenAI is pitching GPT-5.5 as a higher-capability model for complex work, especially tasks where stronger reasoning, higher reliability, and fewer retries yield a finished result faster and cheaper.
API pricing is set at $5 per 1 million input tokens and $30 per 1 million output tokens for GPT-5.5, with GPT-5.5 Pro at $30 and $180. OpenAI’s argument is that for harder tasks, better reasoning and fewer retries can lower the cost per completed task even when the per-token price is higher.
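The per-task argument is just arithmetic: what matters is price per attempt times the number of attempts a task takes. A minimal sketch makes the point; all token counts, retry rates, and the cheaper model’s prices here are hypothetical illustrations, not OpenAI’s figures.

```python
# Cost-per-completed-task arithmetic. Prices are per 1 million tokens;
# token counts are per attempt; attempts_per_success is the average
# number of tries before a usable result.

def cost_per_completed_task(input_price, output_price, input_tokens,
                            output_tokens, attempts_per_success):
    per_attempt = (input_tokens * input_price
                   + output_tokens * output_price) / 1_000_000
    return per_attempt * attempts_per_success

# A pricier model ($5/$30 per 1M tokens) that finishes in one attempt...
strong = cost_per_completed_task(5.0, 30.0, 50_000, 20_000,
                                 attempts_per_success=1.0)
# ...can beat a hypothetical cheaper model ($1/$8) that averages
# five attempts on a hard task.
cheap = cost_per_completed_task(1.0, 8.0, 50_000, 20_000,
                                attempts_per_success=5.0)

print(f"strong model: ${strong:.2f} per completed task")  # $0.85
print(f"cheap model:  ${cheap:.2f} per completed task")   # $1.05
```

Under these made-up numbers, the model with a 5x higher input price is still cheaper per finished task, which is the shape of the claim OpenAI is making.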
GPT-5.5 is built on a new pre-train—the broad, expensive training run that teaches the base model its underlying patterns before instruction tuning, tool use, and reasoning scaffolds are added in post-training. Post-training can make a model more obedient, safer, or more agentic. A new pre-train can change the model’s center of gravity.
OpenAI had already made a strong case that it was competitive again with GPT-5.4, which used the same pre-train as earlier GPT-5.x models. Releasing a new pre-train now suggests it wants to keep pressure on Anthropic—betting that the next answer to Claude starts with a different base model underneath, not just better scaffolding around the same one.
The most obvious change is speed. GPT-5.5 is much faster than Opus 4.7 in head-to-head tests, and conveys a low-friction competence. It is easier to iterate with, keep in the loop, and trust with everyday professional work. It also spends more time on planning and reviewing, asks more questions, and checks its work before moving on, especially at extra high reasoning.
GPT-5.5 is good at turning messy inputs into orderly, usable outputs: dashboards, curricula, run-of-show documents, consulting prose, and transcript-grounded writing. But the new pre-train does not solve everything. It can still be bland, struggle with Ruby, and trail Opus 4.7 on PowerPoint presentations, spatial composition, and ambitious prototypes.
“GPT-5.5 is my new daily driver. It’s what I reach for first on every coding task from vibe coding to serious engineering. And it’s my main model for most other agentic knowledge-work tasks from spreadsheets to research. It’s also the model I use by default in my OpenClaw setup.”
“GPT-5.5 feels very capable, and you can see it thinking harder. The planning and review cycles are longer, and on the best tasks it feels similar to Opus 4.7, which I had called the best model so far. But I’m mixed on it for product work. It can build deep functionality, but the design doesn’t always come together. The details are often good; the whole can feel random. It’s strong in a way I respect, but not yet in a way that consistently inspires me. To be a daily driver, I need a model that’s very good in all things, not just one or a few. It needs to be better at starting from scratch and filling in the blanks while still following instructions closely.”