This week I chased where AI gets real: agents that act (and misfire if you dangerously skip permissions), banks and portals going multi-agent, kernels and hardwired chips outrunning compilers, and practical local setups. The thread through it all: ditch AGI fantasies and token-miser guilt; progress now is judgment, verification, and guardrails.

Others

  • ‘Claude Code Dangerously-Skip-Permissions: Why It’s Tempting, Why It’s Dangerous’: Thomas Wiegold warns that Claude Code’s --dangerously-skip-permissions flag is tempting but perilous: it auto-approves shell, file, network, and subagent actions, enabling disasters like an rm -rf nuking home or root paths. Run it only in containers or VMs with network limits, never on your host. Add git checkpoints, tight scoping, budget caps, disallow rm, and request changelogs. Prefer acceptEdits, allowedTools, plan mode, and hooks; watch for a bug where the bypass flag overrides plan mode.
  • ‘How Property Portals Are Rewriting Themselves for the AI Era’: Property portals are rewriting for the AI era: embedding in ChatGPT to capture intent yet route users back, and overhauling buyer and seller flows. Buyer side adds natural language and voice search, conversational help, personalization, AVMs, neighborhood intelligence, and rich virtual tours. Seller side adds automated listing and valuation, lead scoring, fraud detection, and agent dashboards. Aim: better service, new acquisition channels, and meeting higher user expectations.
  • ‘Taalas HC1 hardwired Llama-3.1 8B AI accelerator delivers up to 17,000 tokens/s’: Taalas HC1 is a hardwired Llama-3.1 8B accelerator reaching up to 17k tokens/s: about 10x faster, 20x cheaper to build, and 10x lower power than Cerebras. It unifies compute and storage on-chip but is limited to the embedded model (with adjustable context and LoRA fine-tuning). Built on TSMC 6nm (815 mm², 53B transistors), targeting 2.5 kW servers. The online demo shows ~15–20k tokens/s. A new mid-sized LLM and a faster HC2 are planned this year.
  • ‘Why I am not AGI-pilled (and you probably shouldn’t be either)’: The author is pro AI but rejects AGI hype: human intelligence isn’t general, so chasing a monolithic “God model” is a fallacy and bad engineering. Real capability comes from compound AI—specialized agents orchestrated by LLMs with tools (RAG, code, calculators)—which is safer, more scalable, and enables continuous learning. Hard takeoff fantasies drive labs, but generality is a system property, not a single model.
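The "disallow rm" and hooks advice from the Claude Code piece can be made concrete with a command filter. A minimal sketch, assuming the PreToolUse hook shape (the proposed tool call arrives as JSON on stdin and the script signals a block via its exit code); the pattern list is illustrative, not a complete policy:

```python
import re

# Illustrative block-list; a real deployment needs a broader policy.
DANGEROUS = [
    r"\brm\s+-[a-zA-Z]*[rf][a-zA-Z]*\b",   # rm with recursive/force flags
    r"\bgit\s+push\s+--force\b",
    r"\bcurl\b.*\|\s*(ba)?sh",             # piping downloads into a shell
]

def is_dangerous(command: str) -> bool:
    """Return True if a proposed shell command matches a blocked pattern."""
    return any(re.search(p, command) for p in DANGEROUS)

# In a real PreToolUse hook, the JSON payload from stdin would be parsed
# and the command extracted before this check; here we just demo the filter.
for cmd in ("rm -rf ~/", "git status"):
    print(cmd, "->", "BLOCK" if is_dangerous(cmd) else "allow")
```

Pair this with acceptEdits and allowedTools so the hook is a backstop, not the only line of defense.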

Philosophy

  • ‘Romper Tablillas’: The essay argues that fear of “wasting tokens” with LLMs is misplaced: like scribes’ broken tablets or Athenian ostraka, apparent waste is training. Judgment comes from exploratory repetition; counting tokens is like counting broken tablets—technically right, strategically blind. LLMs are a medium that shapes us; inhabit them and practice without fixating on outputs, or risk exclusion from what matters.

Data Science

  • ‘Sexo, Mentiras Y Estadísticas Del INE: El Misterioso Caso De Las Niñas Que No Nacían’: Spain’s 1975–2000 spike in boys at birth (~109 per 100 girls) wasn’t real. Bagues shows INE birth statistics had systematic processing and coding errors, causing implausible monthly swings and a bias that misclassified girls as boys. Census cohorts show normal ratios (~105–106). Birth-order and birth-weight data were also flawed, and many studies relied on these figures. Lesson: cross-check administrative sources; if fixing is impossible, agencies should add clear warnings.
  • ‘The Return of the Data Scientists’: With code generation cheap and frontier models shared, advantage shifts to judgment, product taste, and context. Data scientists who bridge business and tech—via experimental design, metrics, and evaluation—become decisive. Avoid vanity metrics; define bespoke quality standards. Replace vibe checks with principled, automated experiments and feedback loops. Teams that count experiments and iterate quickly will win.
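The implausibility in the INE piece is easy to quantify. A back-of-envelope sketch, assuming roughly 400,000 births per year in Spain (an illustrative figure, not from the article): under a true ratio of ~105 boys per 100 girls, a reported 109 sits many standard errors from chance.

```python
import math

n = 400_000                 # assumed annual births (illustrative)
p_normal = 105 / 205        # P(boy) under a normal ~105:100 ratio
p_observed = 109 / 209      # P(boy) implied by the reported ~109:100 ratio

# Standard error of the yearly boy-share under the normal ratio.
se = math.sqrt(p_normal * (1 - p_normal) / n)
z = (p_observed - p_normal) / se
print(f"z-score: {z:.1f}")  # double digits: far outside chance variation
```

A deviation of that size, year after year, points at the data pipeline rather than biology, which is exactly Bagues's conclusion.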
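The "principled, automated experiments" that the data-science piece calls for can start as small as a two-proportion z-test. A minimal sketch; the conversion numbers are made up for illustration:

```python
import math

def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """z-statistic for H0: both variants convert at the same rate."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical experiment: control vs. a new ranking model.
z = two_proportion_z(success_a=480, n_a=10_000, success_b=560, n_b=10_000)
print(f"z = {z:.2f}")  # |z| > 1.96 would be significant at the 5% level
```

Wiring a check like this into the deployment loop is what replaces vibe checks with feedback loops.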

AI

  • ‘LLMs Can Now Write GPU Kernels That Beat torch.compile’: LLMs now generate CUDA/Triton kernels that beat torch.compile on real models, delivering 2–14x gains and bypassing compiler limits (graph breaks, no new algorithms, hardware tricks). KernelBench validates progress; research favors multi‑agent coder‑judge loops and heavy test‑time search. Meta’s KernelEvolve/KernelLLM run this at scale. Forge scales to 32 agents with MAP‑Elites and RAG. Strong verification is crucial (Sakana lesson). Core rules: iterate, use multi‑agents, scale compute.
  • ‘Del Mensaje a La Acción: Así Operan Los Agentes Especializados en Un Asistente De IA Multi-Agente’: BBVA’s Blue uses a multi-agent setup: a triage agent routes intents to specialized agents for 150+ banking tasks. LLMs collect info and handle clarifications or lists; a separate system presents and executes actions, with explicit confirmation for money moves. Some tasks are end to end in chat, like Bizum with contact filtering; others provide app links. Aim: expand to fully conversational coverage.
  • ‘How to Run Local LLMs With Claude Code’: A step-by-step guide to running local LLMs with Claude Code using llama.cpp and llama-server, exposing an OpenAI-compatible endpoint on port 8001. It demos GLM-4.7-Flash 30B MoE (fits in 24GB) with Unsloth, and you can swap in DeepSeek, Qwen, or Gemma. Covers building llama.cpp with or without CUDA, and launching with temp 1.0, top_p 0.95, --jinja, and a q8_0 KV cache to fit a 24GB GPU (RTX 4090). Claude Code is Anthropic’s terminal agent for coding and Git.
  • ‘💜Qwen3.5 - How to Run Locally Guide’: Guide to running Qwen3.5 locally: models 27B, 35B-A3B, 122B-A10B, 397B-A17B. Multimodal hybrid reasoning, 256K context, 201 languages, thinking/non-thinking, excels at coding, vision, chat, long context. 27B/35B run on ~21GB RAM Macs. Ensure VRAM+RAM exceeds quantized model size; else llama.cpp offloads to SSD and slows. Choose 27B for slightly better accuracy if 35B won’t fit; 35B-A3B for faster inference. presence_penalty cuts repetition but may reduce quality.
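The verification lesson from the GPU-kernel piece generalizes: never benchmark a candidate before it matches a trusted reference. A toy sketch of the keep-only-correct filter, with plain Python functions standing in for LLM-generated kernels and the eager baseline:

```python
import time

def reference(xs):
    """Trusted baseline (stands in for an eager PyTorch op)."""
    return [x * x for x in xs]

# Stand-ins for LLM-generated kernel candidates: one wrong, one correct.
def candidate_wrong(xs):
    return [x * 2 for x in xs]

def candidate_ok(xs):
    return [x ** 2 for x in xs]

def verify_and_time(candidate, inputs, tol=1e-6):
    """Reject candidates that disagree with the reference; time the rest."""
    expected, got = reference(inputs), candidate(inputs)
    if len(got) != len(expected) or any(
        abs(a - b) > tol for a, b in zip(expected, got)
    ):
        return None  # incorrect kernel: discard, never benchmark
    start = time.perf_counter()
    candidate(inputs)
    return time.perf_counter() - start

inputs = list(range(1024))
results = {f.__name__: verify_and_time(f, inputs)
           for f in (candidate_wrong, candidate_ok)}
best = min((name for name, t in results.items() if t is not None),
           key=results.get)
print(best)  # -> candidate_ok
```

The coder-judge loops in the article are this filter plus an LLM proposing new candidates on every failure, which is why weak verification (the Sakana lesson) poisons the whole search.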
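The triage-then-confirm pattern from the BBVA piece can be sketched in a few lines. Keyword routing stands in for the LLM intent classifier, and the agents and intents here are hypothetical, not BBVA's actual taxonomy:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    description: str
    moves_money: bool  # money moves require explicit user confirmation

def transfer_agent(text: str) -> Action:
    return Action(f"Prepare transfer from: {text!r}", moves_money=True)

def balance_agent(text: str) -> Action:
    return Action("Show account balance", moves_money=False)

# Toy triage table: keywords stand in for the LLM intent classifier.
ROUTES: dict[str, Callable[[str], Action]] = {
    "bizum": transfer_agent,
    "transfer": transfer_agent,
    "balance": balance_agent,
}

def triage(text: str) -> Action:
    for keyword, agent in ROUTES.items():
        if keyword in text.lower():
            return agent(text)
    return Action("Hand off to general assistant", moves_money=False)

action = triage("Send a Bizum to Ana")
print(action.description, "| needs confirmation:", action.moves_money)
```

The key design choice mirrors the article: the agent only prepares the action, and a separate layer presents it and collects explicit confirmation before anything that moves money executes.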
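A sketch of assembling the llama-server launch from the Claude Code guide's settings; the model path is a placeholder, and the flag names follow current llama.cpp conventions (verify against your build's --help):

```python
# Build the llama-server command line with the guide's sampling settings.
model_path = "models/model.gguf"  # placeholder for your GGUF file
cmd = [
    "llama-server",
    "--model", model_path,
    "--port", "8001",          # OpenAI-compatible endpoint for Claude Code
    "--temp", "1.0",
    "--top-p", "0.95",
    "--jinja",                 # use the model's built-in chat template
    "--cache-type-k", "q8_0",  # quantized KV cache to fit a 24GB GPU
    "--cache-type-v", "q8_0",
]
print(" ".join(cmd))
```

From here, subprocess.Popen(cmd) (or pasting the printed line into a shell) starts the server that Claude Code is then pointed at.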
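The fit rule in the Qwen3.5 guide (VRAM+RAM must exceed the quantized model size) is easy to estimate. A rough sizing sketch; the bits-per-weight figure and the 10% overhead factor are assumptions, not from the guide:

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float,
                      overhead: float = 1.1) -> float:
    """Rough in-memory size of a quantized model; the ~10% overhead
    factor (embeddings, metadata, buffers) is a guess."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

# E.g. a 35B model at ~4 bits/weight versus a 21GB RAM budget.
size = quantized_size_gb(35, 4.0)
fits = size <= 21
print(f"{size:.1f} GB, fits in 21 GB: {fits}")
```

When the estimate exceeds the budget, llama.cpp falls back to SSD offload and throughput collapses, which is the guide's reason for checking before downloading.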