Generating code is easier than ever now, but more code does not mean better products. As code generation costs tend to zero, judgment and taste become more critical than ever.
Paul Graham recently posted: “In the AI age, taste will become even more important. When anyone can make anything, the big differentiator is what you choose to make.” Soon after, Greg Brockman, co-founder and president of OpenAI, followed suit: “taste is a new core skill.”
For years, the hype cycle evolved alongside data scientists. Prompt engineers were the future. Then AI engineers. Then vibe coders. And most recently, agentic engineers.
Each wave left data scientists in an awkward “post-hype” dormancy. They spent it sharpening experimental design, statistical reasoning, and evaluation methodology, patiently learning to translate business needs into measurable outcomes.
That patience is about to pay dividends.
Sitting between the business and the technical, figuring out what to measure and why, is exactly what production AI demands most today.
The upside for a data scientist with product sense is unprecedented.
Think of frontier models as collective intelligence. Everyone has access to them, including your competitors, who are plugging into the same models you are.
But the frontier doesn’t have access to what makes your product yours. The only way to differentiate is through your business context and taste, and those are bespoke by definition. You need your own ways to experiment on that taste and to ensure your AI meets your own quality standards.
The danger is picking a generic metric somebody else defined and using that to improve your product. Accuracy is meaningless if you are measuring the wrong thing. Data scientists learned to avoid this mistake a decade ago. The instinct to distrust vanity metrics and define your own quality standards is more important than ever.
Bias for shipping is a great thing (as long as you are clear on what shippable means). Some people think experimenting is costly and unnecessary when you can just ship. If you are putting a finger in the wind and vibe-checking the quality of your product, you are still experimenting. You’re just running a very poor experiment.
When Opus 5 drops and you replace your model, you vibe-check before deploying. Another poor experiment.
Poor experiments compound into poor products. When problems inevitably surface, they become nearly impossible to diagnose and fix.
Vibe-checking works well enough for now, but the future won’t be so forgiving. As agents multiply and feed into one another, the surface area grows faster than any human can manually patrol. The people who try will burn out chasing problems they never saw coming.
As long as there is clarity on what “good” looks like for the business, product, and users, a good data scientist can turn these vibe checks into principled, automated experiments.
The first time, this takes a bit longer than a thorough vibe check. Every subsequent time, you run it in minutes and get a number you can trust. You have something to hill-climb on.
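To make this concrete, here is a minimal sketch of what turning a vibe check into a repeatable experiment can look like. Everything in it is illustrative: `EvalCase`, `pass_rate`, and the stand-in `current_model` are hypothetical names, the "must contain" check is a deliberately simple stand-in for your own definition of quality, and the real version would call your actual model API.

```python
# A minimal sketch of turning a vibe check into a repeatable experiment.
# All names (EvalCase, pass_rate, current_model) are illustrative, not
# from any particular library; the grading rule stands in for whatever
# "good" means for your product.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str        # the input you would otherwise eyeball by hand
    must_contain: str  # your own quality bar, not a generic metric

def pass_rate(cases: list[EvalCase], model: Callable[[str], str]) -> float:
    """Run every case through the model and score against your standard."""
    passed = sum(case.must_contain.lower() in model(case.prompt).lower()
                 for case in cases)
    return passed / len(cases)

# Stand-in "model": swap in the real API call when the next model drops,
# rerun, and compare the numbers instead of re-vibe-checking.
def current_model(prompt: str) -> str:
    return "Our refund policy allows returns within 30 days."

cases = [
    EvalCase("What is the refund window?", must_contain="30 days"),
    EvalCase("Can I return an item?", must_contain="30 days"),
]

print(f"pass rate: {pass_rate(cases, current_model):.0%}")
```

The point is not the string match, which you would replace with whatever grading logic reflects your standards. The point is that the check is written down once and runs in minutes forever after, giving you a number to hill-climb on.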
The teams who succeed in the age of AI count experiments. They spend the majority of development time on evaluation and error analysis. They build feedback loops that compound. This makes it effortless to ride the frontier, swap in a new model, and get a clear understanding of how it impacts your business.
The person who can run a rapid experiment, measure the result in context, and make a sound decision based on the data is more valuable than ever.