This week I’m watching the agentic turn go from lab to production: AI code factories, Claude’s PR reviews, and Groq-speed QA, with Amazon’s outages and our own “brain fry” as cautionary tales. For the fundamentals, I leaned on Double ML’s intuition, finance-grade data contracts, and crowdsourced context for coding agents, with Idealista’s ChatGPT app as a glimpse of everyday utility.

Real Estate

  • ‘Idealista Launches Its App in ChatGPT’: Idealista launched an app inside ChatGPT, becoming the first Spanish real estate platform there. Users can state needs in natural language and ChatGPT will search Idealista’s listings in Spain, Portugal and Italy, showing photos, prices, descriptions and map views for homes, rooms, garages, retail spaces and land. Spokesperson Francisco Iñareta says this advances the company’s AI-driven strategy to simplify finding a home.

Data Science

  • ‘The Intuition Behind Double ML’: Aaron Pickering explains Double ML intuitively: when confounders (e.g., season) drive both marketing and sales, naive regressions can flip the sign. Use DAGs to see the bias, then partial out confounders by predicting marketing and sales from them, subtracting predictions (residuals), and regressing residuals on residuals to get an unbiased effect. This mirrors Frisch–Waugh–Lovell; in simple cases controls suffice, but DML scales to complex, nonlinear settings.
  • ‘5 Ready-to-Use Finance Templates’: In finance, machine-readable data contracts define structure, validation, and governance between data producers and consumers, making schema, freshness, and quality rules testable in pipelines. Enforced during ingestion, transforms, or CI, they add checks, block bad data and updates, trigger alerts, and flag invalid datasets. This shifts teams to predictable ops amid strict regulation, complex systems, high stakes, and rising ML reliance.
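The residual-on-residual step behind Double ML is easy to see on simulated data. A minimal numpy sketch, with the seasonal setup and coefficients invented for illustration (they are not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Confounder: a seasonal demand index that drives both marketing and sales.
season = rng.normal(size=n)

# Marketing spend rises with season; sales depend on season AND marketing.
marketing = 2.0 * season + rng.normal(size=n)
true_effect = 0.5
sales = true_effect * marketing + 3.0 * season + rng.normal(size=n)

# Naive regression of sales on marketing is biased upward by the confounder.
naive = np.polyfit(marketing, sales, 1)[0]

# FWL / Double ML idea: predict both variables from the confounder,
# keep the residuals, then regress residual on residual.
X = np.column_stack([np.ones(n), season])
m_res = marketing - X @ np.linalg.lstsq(X, marketing, rcond=None)[0]
s_res = sales - X @ np.linalg.lstsq(X, sales, rcond=None)[0]
dml = np.polyfit(m_res, s_res, 1)[0]

print(f"naive: {naive:.2f}, residual-on-residual: {dml:.2f}")  # dml ≈ 0.5
```

Here the confounder is linear, so plain linear partialling-out suffices; the article's point is that the same residual-on-residual recipe still works when the two prediction steps are replaced by flexible ML models.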
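As a sketch of what a machine-readable, enforceable data contract can mean in practice, here is a minimal check in plain Python. The column names, freshness window, and `validate` helper are hypothetical, not taken from the templates:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract for a trades feed: schema types, required columns,
# and a freshness rule a pipeline could enforce at ingestion or in CI.
CONTRACT = {
    "columns": {"trade_id": str, "notional": float, "booked_at": datetime},
    "required": {"trade_id", "notional", "booked_at"},
    "freshness": timedelta(hours=1),  # newest record must be under 1h old
}

def validate(records, contract, now=None):
    """Return a list of violations; an empty list means the batch passes."""
    now = now or datetime.now(timezone.utc)
    errors = []
    for i, rec in enumerate(records):
        missing = contract["required"] - rec.keys()
        if missing:
            errors.append(f"row {i}: missing {sorted(missing)}")
        for col, typ in contract["columns"].items():
            if col in rec and not isinstance(rec[col], typ):
                errors.append(f"row {i}: {col} is not {typ.__name__}")
    newest = max((r["booked_at"] for r in records
                  if isinstance(r.get("booked_at"), datetime)), default=None)
    if newest is None or now - newest > contract["freshness"]:
        errors.append("freshness violated: newest record too old")
    return errors
```

A CI step or ingestion job would call `validate` on each batch and block the load (or trigger an alert) whenever the returned list is non-empty, which is the "testable, enforced" behavior the article describes.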

Software Engineering

  • ‘Software Factories and the Agentic Moment’: StrongDM built a Software Factory for non-interactive development: agents, driven by specs and scenarios, generate and validate code with no human writing or reviewing it. Conventional tests proved rigid and hackable, so they use LLM-judged scenarios and a probabilistic satisfaction metric instead. A Digital Twin Universe of Okta, Jira, Slack, and Google Docs/Drive/Sheets enables safe, high-volume validation, reshaping software economics and urging deliberate naiveté.
  • ‘Code Review for Claude Code’: Anthropic unveils Code Review for Claude Code: a multi-agent, depth-first PR reviewer that finds, verifies, and ranks bugs, posting a concise overview plus inline comments. It raised the share of substantive feedback from 16% to 54%; reviews take ~20 min, scale with PR size, and humans still approve. Internal runs show <1% false positives and catches of critical bugs. Pricing is token-based (typically 15–25), with org caps, repo-level controls, and analytics.
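A probabilistic satisfaction metric like the one the Software Factory piece describes can be sketched as repeated scenario runs scored by a judge. In this sketch the agent and judge are trivial stubs standing in for real agent runs and LLM judge calls; every name here is hypothetical:

```python
import random

def satisfaction_rate(scenario, run_agent, judge, n_runs=20):
    """Fraction of runs a judge marks as satisfying the scenario's spec.

    In the article's setup the judge would be an LLM grading each run
    transcript against the scenario; here it is any callable -> bool.
    """
    verdicts = [judge(scenario, run_agent(scenario)) for _ in range(n_runs)]
    return sum(verdicts) / n_runs

# Stub agent that succeeds on ~80% of runs, and a stub transcript judge.
random.seed(0)
scenario = {"spec": "employee is offboarded from Okta"}
run_agent = lambda s: "user offboarded" if random.random() < 0.8 else "error: timeout"
judge = lambda s, transcript: "offboarded" in transcript

print(satisfaction_rate(scenario, run_agent, judge, n_runs=200))
```

The design point is that a satisfaction *rate* over many runs tolerates the nondeterminism of agents, where a single brittle pass/fail test would flake or get gamed.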

AI

  • ‘The Shape of the Thing’: Mollick argues AI has moved from prompt-based co-intelligence to managed agents that autonomously do complex work, with rapid gains on key benchmarks. Experiments like StrongDM’s AI-only software factory signal new operating models as thresholds trigger volatile shifts in markets, jobs, and policy. With recursive self-improvement on lab roadmaps, acceleration is likely; the window to shape AI’s use is open now.
  • ‘Crowdsourced Context for Coding Agents’: deeplearning.ai introduces Context Hub (chub), an open tool that gives coding agents up-to-date API docs so they avoid outdated calls and hallucinated parameters. Even top models may call old OpenAI endpoints or miss newer tools. Agents can use chub via a CLI or a skill, save notes and workarounds to improve over time, and will soon share findings across agents. The repo includes docs for major LLM, database, payments, identity, and messaging providers, and invites community contributions.

Technology

  • ‘Ship Fast, Break Nothing: Autonoma and Groq Are Rewriting the Rules of Software Quality’: Autonoma, founded by ex-Google engineers, uses AI agents to simulate real users and validate mobile/web apps, slashing regression testing from days to minutes and catching bugs before release. Migrating to GroqCloud cut latency to milliseconds, enabling real-time test creation and massive, spiky workloads. Stack: Llama 4 Maverick 17B-128E and Qwen QwQ 32B. Impact: hundreds of thousands of tests weekly, 20+ enterprise clients, 5000+ Vercel installs, about 10x the nearest rival.

Others

  • ‘Amazon Admits Extensive AI Use Is Wreaking Havoc on Its Core Business’: Amazon acknowledged that AI-assisted coding has caused major outages in its retail site and AWS, including a six-hour crash and an incident where its in-house tool rebuilt an entire environment. Rather than retreat, it plans tighter guardrails and oversight. Even as it fires thousands and pushes widespread AI use, it aims for 80% of developers to use AI weekly—more AI, more oversight, fewer humans.
  • ‘When Using AI Leads to “Brain Fry”’: Firms pushing multi-agent AI are triggering “AI brain fry”: acute mental fatigue from heavy agent oversight. A survey of 1,488 workers found that oversight duties and AI-inflated workloads increase cognitive load, information overload, decision fatigue, errors, and intent to quit, and that productivity drops once workers juggle more than three tools. Using AI to offload routine tasks, by contrast, lowers burnout. Mitigations: clear strategy, training, integrated workflows, capped oversight duties, and upskilling.
  • ‘La Bonilista — The New 10x and the Blows to Come 🥊’: AI is reshaping work across functions: the new 10x engineers are the curious who go all-in, widening the gap with cautious adopters. Seniority earned in the old craft is obsolete; execution is being automated and engineers will orchestrate agents. Fast validation replaces handoffs; PRDs become specs for AI. Hybrid, product-minded builders win. The edge shifts from technique to context, taste, intuition, communication, and grit. The blows, and the opportunities, will land on people more than on companies.
  • ‘AI Was Supposed to Free My Time. It Consumed It.’: Katie Parrott describes how an AI assistant swallowed a day and why: AI feels rewarding and urgent, intensifying work through task expansion, blurred boundaries, and multitasking. Variable, slot‑machine-like rewards and FOBO (“AI genie,” epistemic rabbit holes) keep us prompting. She urges an AI practice—pauses, batching, protected focus—and analog rituals and candid talk to reclaim limits and remember: you’re doing enough.