Google’s I/O 2025 event showcased advancements in AI, including the Gemini 2.5 Pro model, which enhances reasoning and multimodality. The event highlighted the integration of AI into various Google products, such as Search and Meet, and introduced new features like Project Astra for improved human-agent interaction. The focus was on personalizing AI experiences and enhancing productivity through advanced tools and models.
AI
- ‘Google I/O 2025: From Research to Reality’: At Google I/O 2025, Sundar Pichai introduced advancements in Google’s AI technology, highlighting the personalization of AI through “personal context” used in apps like Gmail for tailored responses. The new Gemini AI models enhance Google Search with intelligent, personalized features and enable complex queries in AI Mode. Gemini 2.5 models, including the speedy 2.5 Flash, improve reasoning and multimodality. Google’s state-of-the-art video model Veo 3 and image generation tool Imagen 4 foster creativity, while the Google Beam platform offers immersive 3D video experiences. AI integration in Google Meet and innovations like Project Astra enhance communication and interaction. Pichai also discussed Project Mariner’s advancements in agent ecosystems, emphasizing compatibility with the Gemini API and SDK.
- ‘Our Vision for Building a Universal AI Assistant’: Demis Hassabis outlines a vision for a universal AI assistant through projects like Mariner and Astra, which enhance multitasking and human-agent interaction. Project Mariner can perform multiple simultaneous tasks, such as research and bookings, while being integrated into the Gemini API to increase its capabilities. The upgraded Gemini 2.5 Pro aims to simulate world understanding akin to the human brain, empowering AI in areas like robotics. The ultimate goal is to enrich lives by transforming the Gemini app into an everyday AI assistant.
- ‘Gemma 3n model overview’: The Gemma 3n model from Google AI utilizes an innovative Matryoshka Transformer architecture, enabling smaller, nested sub-models within a larger model to perform inferences, thus reducing compute costs, response time, and energy usage. It supports audio, visual, and text inputs, with techniques like Per-Layer Embedding (PLE) caching and conditional parameter loading that minimize memory load by selectively activating parameters. Optimized for everyday devices, Gemma 3n offers open weights for commercial use, supports over 140 languages, and features a 32K token context. The model’s parameter-efficient design allows it to operate effectively on lower-resource devices by dynamically adjusting loaded parameters based on task requirements.
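The nested-sub-model and conditional-loading ideas above can be sketched in a few lines. This is an illustrative toy, not Gemma 3n’s actual configuration: the sub-model names, sizes, and parameter groups are hypothetical.

```python
# Hypothetical sketch of Matryoshka-style sub-model selection and
# conditional parameter loading: only the parameter groups a task
# needs are activated, shrinking memory use on constrained devices.

# Nested sub-models ordered small-to-large (illustrative sizes).
SUB_MODELS = [("e2b", 2_000_000_000), ("e4b", 4_000_000_000)]

# Per-modality parameter groups that can be loaded on demand.
MODALITY_PARAMS = {"text": ["embeddings", "decoder"],
                   "vision": ["vision_encoder"],
                   "audio": ["audio_encoder"]}

def select_submodel(memory_budget_params):
    """Pick the largest nested sub-model that fits the budget."""
    chosen = SUB_MODELS[0][0]
    for name, size in SUB_MODELS:
        if size <= memory_budget_params:
            chosen = name
    return chosen

def params_to_load(modalities):
    """Conditionally load only the groups the request needs."""
    groups = []
    for m in modalities:
        groups.extend(MODALITY_PARAMS[m])
    return groups

# A text-only request on a small device loads no vision/audio weights.
print(select_submodel(3_000_000_000))   # -> "e2b"
print(params_to_load(["text"]))         # -> ["embeddings", "decoder"]
```

The point of the sketch is the shape of the mechanism: the runtime never pays for parameters the current task does not touch.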
- ‘AI in Search: Going Beyond Information to Intelligence’: The article by Elizabeth Reid details enhancements in Google’s Search, focusing on AI Overviews and the newly introduced AI Mode. AI Overviews has successfully increased user satisfaction and search frequency by offering concise, helpful responses and relevant web links. In major markets like the U.S. and India, it has driven a significant rise in search usage. AI Mode, currently being rolled out, offers advanced features like deep reasoning, multimodality, and the ability to handle complex queries through techniques like query fan-out. It introduces Deep Search for extensive research capabilities and leverages Gemini’s capabilities for powerful performance. Further innovations include Project Astra’s visual search integration, offering real-time interactive capabilities with Google Lens, and personalized shopping experiences with virtual try-ons. AI Mode can also customize suggestions based on user history and integrate with Google apps for personalized recommendations. Additionally, it will assist with complex data analysis and visualization, enhancing productivity in areas like sports and finance.
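The query fan-out technique mentioned above can be illustrated with a minimal sketch: a complex query is decomposed into subqueries, each is searched concurrently, and the results are merged. `search_backend` and the naive decomposer are stand-ins, not Google’s actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def search_backend(subquery):
    """Stand-in for a real search call; returns dummy results."""
    return [f"result for {subquery!r}"]

def fan_out(query, decompose):
    """Split a complex query into subtopics, search them in
    parallel, and merge the per-subquery results."""
    subqueries = decompose(query)
    with ThreadPoolExecutor() as pool:
        result_lists = pool.map(search_backend, subqueries)
    merged = []
    for results in result_lists:
        merged.extend(results)
    return merged

# Illustrative decomposer that just splits on " and ":
decompose = lambda q: q.split(" and ")
print(fan_out("ultralight tents and trail running shoes", decompose))
```

In a real system the decomposition would itself be model-driven, but the fan-out/merge skeleton is the same.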
- ‘Jules’: Simon Willison discusses the surge of AI coding assistants, notably mentioning Google’s Jules, now in beta preview after its December announcement. This follows Microsoft’s recent GitHub Copilot coding agent release and OpenAI’s Codex. These AI tools integrate with GitHub to submit pull requests, reflecting a trend where major tech companies are quickly deploying advanced coding assistants in the developer ecosystem.
- ‘nanoVLM: The Simplest Repository to Train Your VLM in Pure PyTorch’: nanoVLM is a simple, lightweight toolkit designed to enable the training of Vision Language Models (VLMs) using pure PyTorch, ideal for beginners or those interested in VLMs. It supports launching on free colab notebooks and focuses on the Visual Question Answering task. The toolkit uses the vision transformer based on Google’s SigLIP and a language model following the Llama 3 architecture. nanoVLM includes features like Mixed Precision, pixel shuffle for computational efficiency, and a structured training process. It aims to provide a foundational understanding of VLM components without competing with state-of-the-art models.
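The pixel shuffle nanoVLM mentions trades spatial resolution for channel depth, cutting the number of visual tokens the language model must attend to. A stdlib-only sketch on a toy grid (the grid shapes and factor are illustrative, not nanoVLM’s actual dimensions):

```python
def pixel_shuffle(grid, r):
    """Downsample an H x W grid of feature vectors by factor r:
    each r x r block of vectors is concatenated into one vector,
    so token count drops by r*r while channels grow by r*r."""
    h, w = len(grid), len(grid[0])
    assert h % r == 0 and w % r == 0
    out = []
    for i in range(0, h, r):
        row = []
        for j in range(0, w, r):
            merged = []
            for di in range(r):
                for dj in range(r):
                    merged.extend(grid[i + di][j + dj])
            row.append(merged)
        out.append(row)
    return out

# 4x4 grid of 1-dim features -> 2x2 grid of 4-dim features:
grid = [[[i * 4 + j] for j in range(4)] for i in range(4)]
shuffled = pixel_shuffle(grid, 2)
print(len(shuffled) * len(shuffled[0]))  # 4 tokens instead of 16
```

Fewer, wider tokens means quadratically less attention cost over the visual sequence, which is why this is a computational-efficiency trick.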
- ‘Marigold Computer Vision’: The article “Marigold Computer Vision,” authored by Hugging Face, highlights the development of Marigold, an affordable adaptation of diffusion-based image generators for advanced image analysis. This approach leverages large pretrained latent denoising diffusion image generators for various downstream tasks, such as monocular depth estimation, to produce 3D models from images. The CVPR 2024 paper outlines a foundational method for repurposing these technologies to enhance image processing capabilities.
- ‘MonoQwen2-VL-v0.1’: MonoQwen2-VL-v0.1 is a multimodal reranker finetuned with LoRA from Qwen2-VL-2B, designed to optimize image-query relevance using the MonoT5 objective. By inputting an image and query, the model outputs “True” if the image is relevant and “False” otherwise. This is achieved by calculating a relevancy score from the logits of these tokens. The score can be used to rerank candidates generated by a first-stage retriever or to filter them using a threshold, demonstrating its efficacy in multimodal relevance assessments.
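The scoring step can be sketched as follows: take the model’s logits for the “True” and “False” tokens, softmax over just those two, and use the “True” probability as the relevancy score. The logit values below are made up for illustration; a real pipeline would obtain them from the model’s output distribution.

```python
import math

def relevancy_score(true_logit, false_logit):
    """MonoT5-style score: softmax over the two candidate tokens,
    returning P("True"), i.e. the probability the image is relevant."""
    e_true = math.exp(true_logit)
    e_false = math.exp(false_logit)
    return e_true / (e_true + e_false)

def rerank(candidates, threshold=None):
    """candidates: (doc_id, true_logit, false_logit) triples from a
    first-stage retriever. Sort by score; optionally filter by it."""
    scored = [(doc, relevancy_score(t, f)) for doc, t, f in candidates]
    scored.sort(key=lambda x: x[1], reverse=True)
    if threshold is not None:
        scored = [(d, s) for d, s in scored if s >= threshold]
    return scored

# Illustrative logits only: only img_a clears the 0.5 threshold.
print(rerank([("img_a", 2.0, -1.0), ("img_b", -0.5, 1.5)], threshold=0.5))
```

The same score supports both uses the summary names: full reranking (sort) and filtering (threshold).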
- ‘🚨 Google Adds Advanced Reasoning to Its Flagship Model, Launches a Coding Agent at I/O 2025’: At the I/O 2025 event, Google announced significant advancements to its flagship model by incorporating advanced reasoning capabilities and introduced a new coding agent. This development aims to enhance the model’s efficiency and capability in complex problem-solving tasks, potentially revolutionizing fields requiring advanced computational intelligence. The new coding agent is designed to streamline and automate programming tasks, suggesting Google’s continued focus on integrating AI technologies into real-world applications.
- ‘I Really Don’t Like ChatGPT’s New Memory Feature’: Simon Willison critiques ChatGPT’s new memory feature, emphasizing his frustration with its lack of user control. Previously, users could manage notes taken by the AI, but the current memory system automatically injects summaries of previous chats into new ones, affecting outputs in ways users cannot easily manage. Willison, a power user, asserts the importance of context control and suggests turning off the feature or archiving certain chats to mitigate these issues. He calls for more granular control and integration with ChatGPT’s “projects” feature, allowing tailored memory settings for specific projects, which would enhance usability without compromising precision and user autonomy.
- ‘Qwen2.5vl in Ollama’: Ollama has released packaged support for Qwen 2.5 VL, the first new model since the overhaul of its multimodal engine. The Qwen 2.5 VL model, originally released on January 26, 2025, has a strong reputation for OCR capabilities. The update was highlighted on Simon Willison’s Weblog, which also links back to his notes on the initial release.
- ‘ColPali: Efficient Document Retrieval With Vision Language Models 👀’: ColPali is an advanced document retrieval methodology leveraging Vision Language Models to enhance the efficiency of retrieving visually rich document information. It simplifies the complex indexing process by embedding entire page images, rather than relying on extensive text extraction and transformation. Utilizing the PaliGemma model and multi-vector retrieval techniques, ColPali constructs and stores detailed multi-vector representations from page patch embeddings. During querying, the approach allows comprehensive interaction between query terms and document patches, substantially improving retrieval speeds and accuracy, especially on visually intricate tasks. Trained using a large dataset from diverse sources, ColPali excels in the ViDoRe benchmark, outshining traditional retrieval systems, and offers innovative features like visual query patch matching for deeper document comprehension.
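The query-time interaction ColPali uses is a late-interaction (MaxSim) sum: each query-token embedding is matched against its best-scoring page-patch embedding, and the per-token maxima are summed into the page score. A stdlib-only sketch with toy 2-d embeddings (real ColPali vectors are much higher-dimensional):

```python
def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_embs, patch_embs):
    """Late-interaction score: for every query-token embedding,
    take its maximum similarity over all page-patch embeddings,
    then sum those maxima across the query tokens."""
    return sum(max(dot(q, p) for p in patch_embs) for q in query_embs)

# Toy 2-d embeddings (illustrative only):
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8]]   # matches both query tokens well
page_b = [[0.5, 0.5], [0.4, 0.4]]   # weaker matches
scores = {name: maxsim_score(query, page)
          for name, page in [("page_a", page_a), ("page_b", page_b)]}
print(max(scores, key=scores.get))  # page_a ranks first
```

Because pages are pre-embedded into multi-vector representations, only this cheap max-and-sum runs at query time, which is where the retrieval-speed gains come from.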
- ‘I’d Rather Read the Prompt’: In his article “I’d Rather Read the Prompt,” Clayton Ramsey critiques the reliance on large language models, such as ChatGPT, for writing and academic work. He argues that these models produce verbose, uninspired content lacking originality or meaningful human insight. Ramsey emphasizes the importance of communicating original thoughts, suggesting that using AI-generated text is worse than plagiarism, as it conveys no authentic human perspective. He warns that this reliance, particularly in educational and professional settings, undermines genuine learning and understanding. Ramsey argues that both creative writing and programming should involve personal insight and comprehension rather than being left to automated processes that result in hollow, error-prone outputs. Ultimately, he asserts that creative work should convey genuine personal experiences, and if it’s not worth the time to write authentically, it’s not worth reading.
Data Science
- ‘Spatial Machine Learning With Caret’: The text discusses integrating the caret machine learning workflow with spatial data packages such as blockCV and CAST, emphasizing how to handle spatial autocorrelation and extrapolation. It covers spatial cross-validation with `blockCV::cv_spatial()` and `CAST::knndm()`, which prevent overly optimistic error estimates by controlling how training and testing data are distributed. Feature selection and hyperparameter tuning enhance model generalizability, while `CAST::aoa()` is used to identify areas where model predictions are unreliable.
- ‘Measuring What Matters: Objective Metrics for Image Generation Assessment’: Evaluating image quality is complex, requiring clear, objective metrics to assess the creativity, realism, and style of AI-generated images. Human feedback is often biased and inconsistent, so metrics like Fréchet Inception Distance (FID) and CLIP Maximum Mean Discrepancy (CMMD) are used to measure how close generated images are to real ones. Pruna’s open-source evaluation framework enables both single and pairwise assessments using various metrics. Users can customize these metrics for tailored evaluations, essential as AI images grow in use.
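FID compares Gaussian fits to the feature distributions of real and generated images. The full formula needs a matrix square root of the covariance product; the sketch below assumes diagonal covariances to stay stdlib-only, which is a simplification for illustration, not Pruna’s implementation.

```python
import math

def fid_diagonal(mu_real, var_real, mu_gen, var_gen):
    """Frechet Inception Distance under a diagonal-covariance
    simplification: ||mu_r - mu_g||^2 plus, per dimension,
    v_r + v_g - 2*sqrt(v_r * v_g). Identical distributions -> 0."""
    mean_term = sum((mr - mg) ** 2 for mr, mg in zip(mu_real, mu_gen))
    cov_term = sum(vr + vg - 2 * math.sqrt(vr * vg)
                   for vr, vg in zip(var_real, var_gen))
    return mean_term + cov_term

# Identical feature statistics give distance 0; a shifted mean does not.
print(fid_diagonal([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))  # 0.0
print(fid_diagonal([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]))  # 1.0
```

Lower is better: the score grows as the generated images’ feature statistics drift from the real ones, which is exactly what a human-free realism metric needs.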
Economics
- ‘Goodhart’s Law Isn’t as Useful as You Might Think’: Cedric Chin argues that Goodhart’s Law, which warns against metrics manipulation, is less practical than Donald Wheeler’s approach from “Understanding Variation.” Wheeler’s perspective identifies three responses to target pressure: improving the system, distorting the system, or distorting the data. The solution involves making distortion difficult while enabling genuine system improvements, which requires focusing on process inputs rather than fixating on outcomes. Chin highlights Amazon’s Weekly Business Review as an example of effective input metric management employing Statistical Process Control principles, enabling leadership to build a causal business model.
- ‘Pánico Moral Con La Inteligencia Artificial Y Que La Mayoría De Los Empleos Sean Cosas Del Pasado’: Antonio Ortiz discusses the moral panic surrounding AI and its impact on employment. The article highlights concerns about rising unemployment among recent US graduates, as AI increasingly replaces junior roles traditionally held by new graduates. Leading tech companies are reducing hiring or even laying off workers, such as Microsoft, which cut 6,000 employees. While some argue that AI-induced job losses are transient, others worry about rapid, irreversible changes. The EU’s regulatory response aims to balance competitiveness with precaution, yet critics say it’s driven by fear rather than current realities. The race for AI dominance is primarily between China and the US, pushing Europe to reconsider its regulatory stance.
- ‘The Fugu Guide to Jobs in a World of AI’: In “The Fugu Guide to Jobs in a World of AI,” Sangeet Paul Choudary discusses the shifting dynamics of labor as AI transforms scarcity in knowledge work. With AI handling routine tasks, the new constraints lie in areas such as trust, context, and judgment, where human skills remain crucial. Roles must adapt by focusing on interpretation and decision-making. The most valuable positions hold high contextual and economic value, created by understanding and resolving challenges AI cannot address. This strategic thinking rather than mere reskilling is key to thriving alongside AI.
Philosophy
- ‘Extending Minds With Generative AI’: Andy Clark’s article discusses the implications of generative AI and technological advancements on human cognition. He highlights historical fears of technology diminishing cognitive abilities, akin to past concerns about reading and writing. Today, worries persist that technology might replace rather than extend our minds, leading to reduced reliance on biological memory and creativity. Clark suggests that humans have always evolved as “extended minds,” synergistically incorporating non-biological resources. While new tools like AI pose challenges, they also offer opportunities to augment human intelligence. The key is developing skills to discern when and how to rely on these tools, ensuring they complement rather than overshadow innate cognitive capabilities. Education must adapt to teach these “metacognitive” skills, fostering informed interactions with AI to ensure technology serves human needs.
- ‘Quoting Neal Stephenson’: Simon Willison’s Weblog discusses the impact of technology on society, quoting Marshall McLuhan’s idea that every technological augmentation is also an amputation. The piece highlights the significant influence of AI, suggesting it brings about changes far beyond McLuhan’s predictions. Concerns are raised about AI systems, such as ChatGPT, as educators observe that students increasingly rely on them, potentially hindering learning. This phenomenon is highlighted as a major current worry regarding AI’s role in education.
Software Engineering
- ‘8 Powerful Steps for an AI-Assisted Development Workflow’: The article by Tsvetan Tsvetanov outlines an efficient AI-assisted development workflow in 8 steps, emphasizing the importance of human leadership and vertical slicing. The workflow involves breaking down requirements into small, deployable pieces known as vertical slices, focusing on design, and providing exhaustive examples for the AI to generate solutions. Human oversight in validating tests and implementation ensures code reliability. When challenges arise, humans adjust the process or refine steps, maintaining control over the AI’s contribution to development.
- ‘After Months of Coding With LLMs, I’m Going Back to Using My Brain’: After months of coding with large language models (LLMs), experienced developer Alberto Fortin decided to return to a more traditional approach, involving his own critical thinking. While using AI tools like Claude and Cursor to rewrite a system in Go and ClickHouse, Fortin noticed inconsistencies and redundancies in the generated code. This led him to reevaluate his methods, ultimately opting for a more hands-on approach—using pen and paper and coding initial drafts himself, treating AI as an assistant rather than the main creator.
- ‘Squelching Bad Vibe Coding’: Jeff Langr’s article discusses the challenges and strategies of AI-assisted development with an emphasis on improving code quality and test verification. Initially attempting test-driven development (TDD) with a large language model (LLM), Langr found it inadequate and shifted towards a method he terms AADV (AI-Assisted Development with Verification). This approach involves the CAX (Create, Assess, Execute) cycle, where both code and tests are generated, assessed, and executed until passing. The process emphasizes design guidelines to enhance code quality and compliance. Langr warns of the proliferation of low-quality AI-generated code but argues that combining design principles and testing can improve outcomes.
Management
- ‘The Role of the Chief AI Officer (CAIO): A Guide to Leading the Transformation Toward Artificial Intelligence’: The role of a Chief AI Officer (CAIO) has become critical as AI transforms business operations. The CAIO not only leads AI adoption but also drives the cultural shift necessary for integration. Their responsibilities include defining AI initiatives, driving cultural change and skill-building, and bridging the gap between technology and business needs. They must also address risks related to fairness, accountability, transparency, and ethics, ensuring AI aligns with company values. Despite advances in AI, many firms are early in their adoption. Therefore, prioritizing strategic AI adoption and fostering a culture open to change is essential. The future of the CAIO role is integral in shaping agile and responsible organizations.
Technology
- ‘VS Code: Open Source AI Editor’: Microsoft’s vision for Visual Studio Code (VS Code) is an open-source future enhanced by AI. Recognizing VS Code’s success as an open-source project, Microsoft plans to integrate AI while maintaining its foundational principles of openness and collaboration. They will open source the GitHub Copilot Chat extension under the MIT license, incorporating AI components into VS Code’s core. This move aims to foster a vibrant ecosystem of extensions, enhance transparency, improve security, and ease AI feature contributions.