Wrapped-up Readings 2026-04-25

Frontier capability is accelerating while getting cheaper and more open: DeepSeek’s MIT-licensed 1M-token MoE models slash per-token FLOPs and KV cache while approaching the frontier, a 27B dense model (Qwen3.6-27B) beats its much larger predecessor flagship, and GPT-5.5’s early leap fits the AI Index’s picture of tighter, data-driven races. The action is shifting from engines to cars (agents, connectors, tool turns, real-time APIs, native autonomy), but the road and the driver still fail (fragile products, overreliance), so I’m bullish on capability and cautious on deployment.

AI

‘DeepSeek-V4: A Million-Token Context That Agents Can Actually Use’: DeepSeek-V4 launches two MoE models (Pro and Flash) with a 1M-token context built for agents. V4 cuts per-token compute and KV cache: relative to V3.2, Pro needs 27% of the FLOPs and 10% of the KV cache, and Flash needs 10% and 7%. The gains come from interleaving Compressed Sparse Attention with Heavily Compressed Attention. The models preserve reasoning across tool turns, add a DSML token and XML-based tool calls, and are RL-trained on DeepSeek Elastic Compute for scalable, long-running tool use.
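The reported ratios translate directly into serving headroom. A minimal sketch, assuming the percentages apply at the same context length and treating per-sequence KV cache as the only memory limit (weights, activations, and fragmentation are ignored): with a fixed KV budget, the number of concurrent long-context sessions scales with the inverse of the KV ratio.

```python
# Per-token FLOPs and KV cache relative to DeepSeek-V3.2, as reported above.
RATIOS = {
    "V4-Pro": {"flops": 0.27, "kv": 0.10},
    "V4-Flash": {"flops": 0.10, "kv": 0.07},
}


def sessions_multiplier(kv_ratio: float) -> float:
    """How many times more sequences fit in the same KV-cache budget than on V3.2.

    Simplification: per-sequence KV cache is the only memory limit.
    """
    return 1.0 / kv_ratio


def compute_savings(flops_ratio: float) -> float:
    """How many times less compute each token needs than on V3.2."""
    return 1.0 / flops_ratio


for name, r in RATIOS.items():
    print(f"{name}: ~{sessions_multiplier(r['kv']):.1f}x sessions per KV budget, "
          f"~{compute_savings(r['flops']):.1f}x less compute per token")
```

Under these assumptions, memory that held one long-context V3.2 session could in principle hold about ten on V4-Pro and about fourteen on V4-Flash.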
‘DeepSeek V4 - Almost on the Frontier, a Fraction of the Price’: DeepSeek released V4 previews, V4-Pro and V4-Flash, both 1M-token MoE models under the MIT license. Pro has 1.6T parameters (49B active), likely the largest open-weights model; Flash has 284B (13B active). The weights are 865GB and 160GB respectively, so quantized local runs look feasible. Pricing is sharp: Flash costs $0.14/1M input and $0.28/1M output tokens, Pro $1.74/1M and $3.48/1M, the cheapest in their tiers. Efficiency vs V3.2: Pro needs 27% of the FLOPs and 10% of the KV cache; Flash 10% and 7%. Unsloth quantizations are expected soon.
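To compare the two tiers on a concrete workload, here is a minimal cost sketch, assuming the prices are USD per million tokens and ignoring any caching discounts; the workload size is hypothetical.

```python
# USD per 1M tokens as (input, output), from the pricing quoted above.
PRICES = {
    "V4-Flash": (0.14, 0.28),
    "V4-Pro": (1.74, 3.48),
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a job with the given token counts (no caching assumed)."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical agent run: 20M tokens read, 2M tokens written.
for model in PRICES:
    print(f"{model}: ${job_cost(model, 20_000_000, 2_000_000):.2f}")
```

Both Pro prices are about 12.4x the Flash prices, so that ratio holds for any mix of input and output tokens.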
‘A Pelican for GPT-5.5 via the Semi-Official Codex Backdoor API’: Simon Willison reports that GPT-5.5 is out, fast, and highly capable, building exactly what he asks for. For his pelican benchmark he prefers API access, which avoids the hidden system prompts of ChatGPT and agent harnesses, so he used a semi-official Codex backdoor. OpenAI says API deployments require extra safeguards and that GPT-5.5 and GPT-5.5 Pro will come to the API soon.
‘Sign of the Future: GPT-5.5’: Ethan Mollick’s early access to GPT-5.5 shows a major leap, though the AI frontier remains jagged. OpenAI improved its models, apps (the Codex desktop app), and harnesses together, including an image model that renders readable text and aces the Otter Test. GPT-5.5 Pro excelled on a complex 3D coding task and is faster. With a few prompts it produced a near PhD-quality paper and a full tabletop RPG, but long-form fiction still lags. Overall, capability gains are accelerating.
‘Qwen3.6-27b: Flagship-Level Coding in a 27B Dense Model’: The Qwen Team open-sources Qwen3.6-27B, a 27B dense multimodal model delivering flagship-level agentic coding. It beats the previous flagship, Qwen3.5-397B-A17B, across benchmarks (e.g., SWE-bench Verified 77.2, SWE-bench Pro 53.5, Terminal-Bench 59.3, SkillsBench 48.2) and scores 87.8 on GPQA Diamond. Because it is dense rather than MoE, it is easy to deploy. It supports vision-language input and a non-thinking mode, integrates with OpenClaw, Claude Code, and Qwen Code, and is available in Qwen Studio, via the API, and as open weights.
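A rough way to see why a 27B dense model is easier to host is that weight memory is about parameter count times bytes per parameter. This is a minimal sketch, assuming weights dominate and ignoring KV cache, activations, and quantization overhead such as scales. The bit-widths are illustrative, not official release formats. An MoE like Qwen3.5-397B-A17B activates only 17B parameters per token, but it typically still needs all 397B weights resident.

```python
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes): params x bits / 8."""
    return params_billion * bits_per_param / 8


MODELS = [("Qwen3.6-27B (dense)", 27), ("Qwen3.5-397B-A17B (MoE, total)", 397)]

for label, params in MODELS:
    for bits in (16, 8, 4):
        print(f"{label} @ {bits}-bit: ~{weight_gb(params, bits):.1f} GB")
```

At 4 bits the 27B model needs roughly 13.5 GB for weights, within reach of a single consumer GPU, while the 397B MoE still needs around 200 GB.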
‘Vibe Check: GPT-5.5 Has It All’: GPT-5.5 minimizes tradeoffs: it is much faster than Opus 4.7, writes well, and tops a Senior Engineer Benchmark (62.5 vs. scores in the low 30s), though it works best when executing plans written by Opus. A new pre-train makes it a fast, reliable workhorse for professional tasks and for turning messy inputs into usable documents. Weaknesses: a bland tone, Ruby, and product/design polish. OpenAI aims to win back coding and knowledge work from Anthropic. Pricing: $5/1M input and $30/1M output tokens; Pro $30/$180. Of the reviewers, one made it a daily driver; another was mixed.
‘An Update on Recent Claude Code Quality Reports’: Anthropic investigated drops in Claude Code quality and found three product-layer causes (the API was unaffected): a March 4 switch of the default effort from high to medium (reverted April 7); a March 26 bug that repeatedly cleared reasoning after idle sessions, causing forgetfulness and higher usage (fixed April 10); and an April 16 anti-verbosity prompt change that hurt coding (reverted April 20). The issues affected Sonnet 4.6, Opus 4.6, and Opus 4.7; all are fixed in v2.1.116.
‘Artificial Intelligence Index Report’: Stanford’s AI Index 2026 finds rapid growth in capability and adoption and tighter races, both among top models and between the U.S. and China, while responsible AI lags and incidents rise. Data quality beats scale; synthetic data helps mainly in post-training or narrow tasks. Compute and data centers are booming, with heavy reliance on TSMC. Benchmark gaps are narrowing, though the benchmarks themselves have flaws; agents and video models advance while robots lag. Investment and consumer value grow, and early labor shifts and environmental costs are emerging.
‘Anthropic and Amazon expand collaboration for up to 5 gigawatts of new compute’: Anthropic deepened its Amazon partnership, securing up to 5 GW of AWS compute for Claude. Trainium2 capacity ramps in the first half of the year and Trainium3 scales later this year, approaching 1 GW by the end of 2026, with options on future silicon. Anthropic will spend over $100 billion on AWS over 10 years, and Claude will be available directly within AWS. Amazon is investing $5 billion now, with up to $20 billion more, on top of the $8 billion it invested earlier. Global inference capacity will expand in Asia and Europe.
‘Why AI Doesn’t Make You Better’: AI amplifies you: it boosts your strengths and broadcasts your weaknesses. It reads and writes fluently but hallucinates, mimics your style, and is trained to sound agreeable. On open-ended work this breeds comprehension debt as judgment gets outsourced and you execute faster than you think. Output rises without understanding, causing rework and burnout. Real speed comes from accumulated context. Use AI deliberately and keep the thinking you need to own: learn the chords first.
‘Model Tree for Moonshotai/Kimi-K2.6’: Kimi K2.6 is an open-source, native multimodal agentic model for practical autonomy. It excels at long-horizon coding across languages and domains, converts prompts and visuals into production-ready UIs and lightweight full-stack workflows, scales to 300 sub-agents over 4,000 steps for parallel task orchestration, and powers persistent 24/7 agents that manage schedules, execute code, and coordinate cross-platform operations without human oversight.
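At its core, orchestrating hundreds of sub-agents is a bounded fan-out. This is a minimal sketch of that general pattern, not Kimi’s implementation. `run_subagent` is a hypothetical stand-in for a model call plus tool use. An `asyncio.Semaphore` caps how many sub-agents run at once, so a 300-way fan-out does not fire every request simultaneously.

```python
import asyncio


async def run_subagent(task_id: int) -> str:
    """Hypothetical sub-agent: stands in for a model call plus tool use."""
    await asyncio.sleep(0.01)  # simulate I/O-bound work
    return f"result-{task_id}"


async def orchestrate(num_tasks: int, max_concurrent: int) -> list[str]:
    """Run num_tasks sub-agents with at most max_concurrent in flight."""
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(task_id: int) -> str:
        async with sem:  # wait for a free slot before starting
            return await run_subagent(task_id)

    # gather preserves input order, so results line up with task ids.
    return await asyncio.gather(*(guarded(i) for i in range(num_tasks)))


results = asyncio.run(orchestrate(num_tasks=300, max_concurrent=32))
print(len(results), results[0], results[-1])
```

A real orchestrator would also need retries, timeouts, and a step budget across the whole run. These are left out to keep the fan-out visible.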
‘Set Organization Preferences’: Admins on Team and Enterprise plans can set org-wide instructions that Claude follows in every chat, covering tone, formatting, compliance, and domain context. Org preferences override user preferences when they conflict; user settings fill the gaps. Keep guidance concise, specific, and consistent, and avoid contradictions or attempts to override safety behavior. Test changes in new chats and review them regularly. Examples: team identity, formal tone, formatting limits, domain terms, referral rules, and data-handling reminders.
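The precedence rule works as a layered merge: the org wins on conflicts and user settings fill the gaps. This is a minimal sketch, assuming preferences can be modeled as key-value pairs. Claude’s org preferences are actually free-text instructions, so the keys and values below are hypothetical.

```python
def effective_preferences(user: dict[str, str], org: dict[str, str]) -> dict[str, str]:
    """Merge preferences: later mappings win, so org overrides user on conflicts."""
    return {**user, **org}


# Hypothetical settings for illustration.
user = {"tone": "casual", "format": "bullets", "language": "en-GB"}
org = {"tone": "formal", "data_handling": "no customer PII in prompts"}

merged = effective_preferences(user, org)
print(merged)
```

`tone` comes out as `"formal"` because the org value wins. `format` and `language` survive from the user layer because the org never set them.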

Philosophy

‘Quoting Maggie Appleton’: Simon Willison quotes Maggie Appleton: Learning in public—via digital gardening, podcasting, or streaming—makes others see you as more competent than you are, which can lead to invites to exclusive events with high-achieving, interesting people. A standout side benefit of sharing work openly.
‘How to Do Great Work’: Paul Graham’s guide to great work: pick work where aptitude, deep interest, and big scope meet. Let intense curiosity drive you to the frontier, spot gaps, and chase outlier, unfashionable ideas. Work hard and consistently; start small, iterate, finish; take smart risks. Be earnest, optimistic, and intellectually honest; cut fluff. Find great collaborators and a small real audience. Use youth’s time and age’s knowledge. Aim for the best and let compounding help.
‘Buddhist Economics: How to Start Prioritizing People Over Products and Creativity Over Consumption’: Maria Popova presents E.F. Schumacher’s Buddhist Economics: a people-first alternative to materialist growth. It unites work and leisure, prizes creativity over consumption, and favors tools that enhance, not enslave. The goal is liberation and character—maximizing well-being with minimal consumption, local self-sufficiency, and Right Livelihood: a Middle Way between heedless growth and stagnant tradition.

Technology

‘New Connectors in Claude for Everyday Life’: Claude is expanding connectors beyond work tools to everyday apps like AllTrails, Instacart, Audible, TripAdvisor, Spotify, Uber, and TurboTax. Its directory of 200+ connectors lets you combine apps in one chat. Claude suggests the right connector and lets you refine results or choose when several fit. Connectors are available on all plans (in beta on mobile), are ad-free, protect your data, and require your confirmation before any booking or purchase.
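Requiring confirmation before a purchase is a human-in-the-loop gate on side-effecting tool calls. This is a minimal sketch of that pattern, not Anthropic’s implementation. The action names, the `ToolCall` type, and the `confirm` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical actions that spend money or create commitments.
REQUIRES_CONFIRMATION = {"book_ride", "place_order", "purchase"}


@dataclass
class ToolCall:
    connector: str  # e.g. "uber", "instacart"
    action: str
    args: dict


def execute(call: ToolCall, confirm: Callable[[ToolCall], bool]) -> str:
    """Run a tool call, asking for explicit approval first if the action is gated."""
    if call.action in REQUIRES_CONFIRMATION and not confirm(call):
        return f"cancelled: {call.connector}.{call.action}"
    return f"executed: {call.connector}.{call.action}"


# A read-only lookup runs directly; an order is blocked when the user declines.
print(execute(ToolCall("alltrails", "search_trails", {"near": "Denver"}), lambda c: False))
print(execute(ToolCall("instacart", "place_order", {"items": ["milk"]}), lambda c: False))
```

Keeping the gate outside the model, in the code that dispatches tool calls, means a confirmation cannot be skipped by a persuasive prompt.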
‘OpenAI DevDay: Let’s Build Developer Tools, Not Digital God’: Simon Willison says this DevDay was truly for developers. He highlights automatic prompt caching (50% off cached input tokens), where rivals offer deeper discounts that take more work to use. He also covers the new Realtime API, which streams text and audio over WebSockets with tool use and barge-in, and the rebranding of fine-tuning as distillation, supported by stored completions, evals, and image fine-tuning. He laments that Whisper Turbo went unmentioned and that WebSockets are awkward to proxy. He cares little for AGI talk and notes that OpenAI’s huge funding and valuation contrast with the tools-first message.
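What a 50% cached-input discount saves depends on how much of each prompt is a reused prefix. This is a minimal sketch, assuming cached input tokens are billed at half the normal input rate and output pricing is unaffected. The prices and token counts are hypothetical placeholders, not OpenAI’s rates.

```python
def request_cost(prompt_tokens: int, cached_tokens: int, output_tokens: int,
                 input_price: float, output_price: float,
                 cache_discount: float = 0.5) -> float:
    """USD cost of one request; prices are per 1M tokens.

    cached_tokens is the part of the prompt served from cache (a reused prefix).
    """
    uncached = prompt_tokens - cached_tokens
    billable_input = uncached + cached_tokens * (1 - cache_discount)
    return (billable_input * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical: a 10k-token prompt whose first 8k tokens are a shared system prefix.
no_cache = request_cost(10_000, 0, 500, input_price=2.0, output_price=8.0)
with_cache = request_cost(10_000, 8_000, 500, input_price=2.0, output_price=8.0)
print(f"{no_cache:.4f} vs {with_cache:.4f}")
```

In this example, caching cuts the request cost by a third. The saving grows as the shared prefix takes up more of the prompt and as output becomes a smaller share of the bill.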

Data Science

‘Your AI Budget Is Being Wasted Right Now’: Companies are increasing AI spending while still running pre-AI data pipelines. This wastes up to 30% of IT budgets and leads to stalled pilots and bad decisions. The fix is GenAI-augmented data engineering: automated code generation, SQL translation across clouds, and real-time quality monitoring to keep data reliable and fresh. This frees engineers for architecture work. It needs executive sponsorship, and it shifts teams from reactive maintenance to proactive, scalable pipelines that make AI pay off.
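Real-time quality monitoring comes down to checks like freshness and completeness that run on every load. This is a minimal sketch of such a check in plain Python, not any vendor’s product. The `check_batch` helper, column names, and thresholds are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def check_batch(rows: list[dict], ts_field: str, required: list[str],
                max_age: timedelta, max_null_rate: float,
                now: datetime) -> list[str]:
    """Return the data-quality violations for one batch (empty list = healthy)."""
    if not rows:
        return ["empty batch"]
    issues = []
    # Freshness: the newest record must be recent enough.
    newest = max(r[ts_field] for r in rows)
    if now - newest > max_age:
        issues.append(f"stale: newest record is {now - newest} old")
    # Completeness: required columns must not be mostly null.
    for col in required:
        rate = sum(1 for r in rows if r.get(col) is None) / len(rows)
        if rate > max_null_rate:
            issues.append(f"{col}: null rate {rate:.0%} exceeds {max_null_rate:.0%}")
    return issues


now = datetime(2026, 4, 25, 12, 0, tzinfo=timezone.utc)
rows = [
    {"updated_at": now - timedelta(hours=3), "customer_id": 1, "amount": 10.0},
    {"updated_at": now - timedelta(hours=2), "customer_id": None, "amount": 5.0},
]
report = check_batch(rows, "updated_at", ["customer_id", "amount"],
                     max_age=timedelta(hours=1), max_null_rate=0.1, now=now)
print(report)
```

A non-empty report would typically block the downstream load or raise an alert, rather than letting stale or incomplete data reach a model.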

Economics

‘Jevons paradox’: Simon Willison considers how LLMs that generate working code might affect engineering jobs. If coding becomes cheaper, Jevons paradox may apply: efficiency lowers unit cost, which can boost demand for custom software so much that the overall need for professional software engineers rises rather than falls.
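The argument can be made precise with a textbook constant-elasticity demand curve. This is a minimal sketch under stated assumptions. Demand for software $Q$ responds to its unit price $P$ with elasticity $\varepsilon$. Engineer-hours per unit of software scale with price, $h = cP$, where $c$ is a hypothetical constant. Total engineering labor $L$ then rises as $P$ falls exactly when demand is elastic ($\varepsilon > 1$).

```latex
\[
Q = A\,P^{-\varepsilon}, \qquad
h = c\,P, \qquad
L = Q\,h = A\,c\,P^{1-\varepsilon}, \qquad
\frac{dL}{dP} = (1-\varepsilon)\,A\,c\,P^{-\varepsilon}
\]
```

Since $A$, $c$, and $P^{-\varepsilon}$ are positive, $dL/dP < 0$ whenever $\varepsilon > 1$: cheaper code means more total engineering work. If $\varepsilon < 1$, the same efficiency gain shrinks the profession instead.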

Pelayo Arbués
