Wrapped-up Readings 2025-10-31

AI

‘CometJacking: How One Click Can Turn Perplexity’s Comet AI Browser Against You’: LayerX exposed “CometJacking,” a flaw in Perplexity’s Comet agentic browser where a single crafted URL can inject hidden commands. By abusing query parameters (e.g., forcing memory/connector access) and trivial encoding like base64, attackers hijack the AI, exfiltrating emails, calendar data, and more to remote servers. The threat shifts browser security from phishing to agent hijack and command execution.
‘Con Inteligencia Artificial Habrá Menos Asimetría De La Información Y También Más Ayudas Sociales’: AI is cutting information asymmetry: people already use ChatGPT to parse bank or insurance contracts, draft claims, and check product requirements to control spending. As customers compare, negotiate, and spot abusive clauses, margins of info‑dependent businesses erode. Moving from search to proactive push, next‑gen AI could learn habits and alert users to savings and advantages, improving personal and financial decisions.
‘Making Knowledge Machine-Readable’: LightOn introduces LightOnOCR-1B, a 1B vision-language, end-to-end OCR that converts documents into machine-readable, semantically rich text at high speed. It matches or beats larger models on Olmo-Bench, runs faster than dots.ocr, PaddleOCR-VL, and DeepSeekOCR, and handles complex layouts. Trained via distillation from Qwen2-VL-72B on a 17.6M-page corpus to be open-sourced, it is easy to fine-tune, offers pruned vocab variants, and powers enterprise search at scale.
‘Advancing Claude for Financial Services’: Anthropic is advancing Claude for Financial Services with a beta Claude for Excel sidebar that can read, analyze, edit, and create workbooks with transparent change tracking; new real-time connectors (Aiera/Third Bridge, Chronograph, Egnyte, LSEG, Moody’s, MT Newswires); and six Agent Skills (comps, DCFs, diligence packs, teasers/profiles, earnings analyses, initiating coverage). Built on Sonnet 4.5, which leads Vals AI’s Finance Agent benchmark at 55.3% accuracy.
‘DeepSeek Cuts Inference Costs, OpenAI Tightens Ties With AMD, Thinking Machines Simplifies Fine-Tuning, Robots Improve Spatial Awareness’: The Batch highlights: disciplined evals and error analysis, not trendy hacks, best accelerate agentic AI; generative models’ richer outputs require prototyping and iteratively tuned metrics, sometimes with LLM-as-judge. DeepSeek’s V3.2-Exp uses dynamic sparse attention to speed long-context inference and cut costs 6-7x, with mixed benchmark trade-offs and support for Chinese chips. Thinking Machines’ Tinker API streamlines multi-GPU fine-tuning via LoRA.
‘Ling-1t Leads Non-Reasoning Performance, MCP Poses Security Risks, California Regulates AI, Auto-Tune for Agentic Prompts’: The Batch covers: practical error analysis for agentic AI—inspect traces, benchmark to human level, and iteratively refine steps or simplify pipelines as models improve. Ant Group’s open 1T-parameter Ling-1T, trained with chain-of-thought signals, leads non-reasoning benchmarks; Ring-1T ranks highly among reasoning models. MCP tool chaining raises compositional security risk. California enacts disclosure, labeling, and liability AI laws. GEPA auto-tunes prompts, beating RL with fewer runs.
‘EA’s Attempt to Use AI for Game Development Backfiring Horribly’: Video game makers are rushing into generative AI, but at EA the rollout is backfiring: tools produce buggy, hallucinated code that creates extra work amid layoffs and crunch. Staff fear training systems that could replace them and mock leadership’s AI zeal, underscoring a big exec-worker adoption gap. EA hails AI as core yet warns of legal and reputational risks. Players and many devs also push back, citing creepy demos and loss of human touch.
‘Streaming Datasets: 100x More Efficient’: Hugging Face revamped dataset streaming to prevent request storms and speed large-scale training. Backward-compatible (streaming=True), it adds a shared file-list cache and optimized resolution, cutting startup requests up to 100x and file resolution 10x. Parquet prefetching and tunable buffering raise throughput up to 2x. Xet dedupe and Parquet CDC speed uploads; pyspark_huggingface and an improved HfFileSystem power custom pipelines.
‘Parlant vs LangGraph’: Parlant contrasts with LangGraph: graphs and router supervisors excel at explicit workflows but falter in free-form chat as isolated agents miss intertwined, recurring topics, causing gaps or hallucinations. Parlant dynamically assembles only relevant guidelines per turn to ground multi-topic replies while limiting context. Use LangGraph for scripted, staged flows; Parlant for natural, compliance‑critical dialogue—or combine them (Parlant for conversation, LangGraph for backend orchestration).
‘Reconozcamos Que No Sabemos Cómo La Inteligencia Artificial Está Afectando Al Empleo’: AI’s job impact is uncertain: surveys show no broad unemployment, though digital freelancers and US junior devs (~20% down since 2022) are hit. Effects look like task shifts and productivity gains amid other forces. The web is turning into TV—video-heavy, concentrated, attention-short—yet bottom-up creators and AI video expand access despite deepfake risks. China rises as no.2; Alibaba’s Qwen leads open models that firms like Airbnb favor for control.

Real estate

‘CoStar Group CEO Andy Florance on Zillow, M&A, and Domain”s Key Lime Pie Problem’: CoStar CEO Andy Florance says challengers can dethrone leaders when incumbents mistreat customers, criticizing Zillow’s sloppy operations and hinting at cynical content-scraping incentives. On buying Domain, he aims to refocus on a superior core portal in Australia versus REA, leveraging CoStar’s product team. He blames Domain’s drift on low-yield distractions and obsolete rules, urging removal of decade-old directives to streamline.

Economics

‘Is This the New ‘Scariest Chart in the World’?’: Derek Thompson critiques a viral “scariest chart” showing stocks soaring while job openings fall. The split is real, but he argues hiring slowed mainly due to Fed rate hikes, tariffs, and immigration limits—not AI. Information has the smallest drop in postings; manufacturing, construction, and energy the largest. Meanwhile, AI-driven spending and profits fuel most market gains, creating a two-speed economy: booming AI, sluggish rest.

Software Engineering

‘Understanding Spec-Driven-Development: Kiro, Spec-Kit, and Tessl’: Birgitta Böckeler examines Spec-Driven Development: write specs before AI code, with three levels—spec-first, spec-anchored, spec-as-source. Specs are structured natural language, distinct from a memory bank. Kiro is lightweight and spec-first; GitHub’s spec-kit relies on a constitution but feels spec-first; Tessl pursues spec-anchored/as-source. She finds SDD verbose, hard to size, and agents unreliable, yet sees spec-first as useful and in demand.
‘Flink is 95% problem’: Flink is a 95% problem: most real-time needs don’t justify its complexity. ~65% fit HTTP+Postgres, ~25% suit OLAP (e.g., ClickHouse), ~5% go custom; only ~5% need Flink-level guarantees. Ultra-low latency and exactly-once are rare/expensive; SQL handles windows/CEP well enough. Flink adds heavy ops (Kafka, k8s/ZooKeeper), config sprawl, testing pains, Java lock-in, and weak traction. Prefer Kafka libs + ClickHouse; reserve Flink for edge cases.

Management

‘The Pendulum Has Swung: Remote Isn’t Sacred Anymore’: Remote work isn’t sacred—it’s a tool. Some teams thrive remote-first, others in-office. Factorial now mandates five days on-site (one flex), echoing leaders who prize in-person energy, though it can hurt work-life balance. Remote can hinder juniors, spontaneity, and culture. Hybrid works only with clear, consistent rituals and fixed days. Still, remote is an edge for lean teams if built async-first. Know your tradeoffs and commit.

Pelayo Arbués

Explorer

Recent Notes

Self-proclaimed experts

My failure resume

Tres Millones de viviendas

Wrapped-up Readings 2025-10-31

AI

Real estate

Economics

Software Engineering

Management

Graph View

Table of Contents

Now Reading

My Week in New York: The Future of Code Arrived Early