Highlights

  • That article was one of the first things I posted on this blog. It proposed that the growing ecosystem of data startups—then called the modern data stack; now called the modern data stack (derogatory)—needed one more elemental piece: A metrics layer. At that time, there were four generally accepted layers:
    1. An integration or extraction layer that collected data from various sources.
    2. A database that stored and processed what had been collected.
    3. A transformation layer that defined how to turn messy raw data into clean and tidy tables.
    4. An application layer—BI tools, visualization products, notebooks, SQL clients, lol, no, it was just BI tools, it was always just BI tools—that let people do stuff with all their data, which mostly meant making charts and dashboards of metrics. (View Highlight)
  • Though this worked well enough, there was a problem. People wanted all of the charts in the fourth layer to be consistent with one another, but there was “no central repository for defining a metric.” Even if the third layer included some precomputed revenue tables—revenue by quarter; revenue by product line; revenue adjusted to be pleasing to the CEO—people couldn’t calculate new segments without rewriting the formula for “revenue” from scratch. (View Highlight)
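
To make that concrete, here is a minimal Python sketch of the “central repository” idea; every table, column, and function name is invented for illustration, and real metrics layers (dbt’s, Cube’s) use their own definition languages. The nuance of “revenue” lives in one place, and new segments are derived from it rather than rewritten:

```python
# Hypothetical example: one shared definition of revenue, parameterized by segment.
REVENUE_SQL = """
SELECT {dimension}, SUM(amount) AS revenue
FROM payments
WHERE status = 'completed'   -- the complicated, nuanced part lives here, once
  AND refunded_at IS NULL
GROUP BY {dimension}
"""

def revenue_by(dimension: str) -> str:
    """Build a revenue query for any segment without restating the formula."""
    return REVENUE_SQL.format(dimension=dimension)

# Revenue by quarter and by product line reuse the same definition,
# so they cannot silently drift apart.
print(revenue_by("quarter"))
print(revenue_by("product_line"))
```
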
  • metric definitions were often “scattered across tools, buried in hidden dashboards, and recreated, rewritten, and reused with no oversight or guidance.” And because the formulas for computing business metrics are often complicated and nuanced, sometimes people would mess them up. Dashboards wouldn’t match, or a customer would get the wrong marketing email, or the CEO would tell regulators that they had $78 billion that did not exist. (View Highlight)
  • Of course, none of this was a new idea; semantic layers have encoded metric definitions like these in BI tools for decades. But this added a new twist: Historically, semantic layers couldn’t be shared across different tools. The hypothesis behind the metrics layer was that a universal logical layer would be better than a bunch of fragmented ones—which was itself simply an extension of one of the foundational ambitions behind the modern data stack. (View Highlight)
  • That is: The whole stack was turning horizontal: “We no longer need to buy a bunch of vertical-specific products to do analytics on specific things; we push data into a warehouse and can then analyze it all together in a common set of tools.” The metrics layer was a proposal for another shared layer. (View Highlight)
  • I mean, I don’t know that, not for sure, not yet. But the trajectory isn’t great. By the end of 2021, at least six companies were building a product that could reasonably be called a metrics layer. Two pivoted, one got acquired, and one stalled. Two are still growing—Cube raised some money in June [of 2024], and dbt still sells their semantic layer—but neither has become anything close to a market standard. Google has gone silent about their spinoff. In 2022, the industry was chasing the idea; now, after some false starts and disappointing v1s, it’s slowly backing away from it. (View Highlight)
  • However, to the extent that the idea flopped, the issue probably wasn’t technical or experiential; it was economic. As Fivetran CEO George Fraser predicted, a standalone metrics layer was too hard to sell without a BI tool attached:

    They [Looker] weren’t able to sell their metric store without a built in viz/dashboard/users/permissions layer, and that’s not going to change. (View Highlight)

  • Though centralized horizontal layers sound nice, you have to have something to sell. You can draw a diagram of a great ecosystem of interconnected products, but those products are made by independent companies. And what’s best for the customer—a tight architecture of mutually exclusive and collectively exhaustive parts—may not be what’s best for the businesses making the stuff. (View Highlight)
  • If you squint at semantic layers, they are ways to translate questions into numbers. Companies have a bunch of tables of data over there, and a bunch of people with business questions over here, and semantic layers intermediate between the two. The people creating them first figure out the sorts of questions people might ask, and then they create a catalog of metrics and filters that map to those questions. If they’re able to create a complete-enough library and describe everything with reasonable-enough names, the theory goes, people could find what they need. (View Highlight)
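
A toy sketch of that intermediation, assuming a hypothetical catalog structure: the semantic layer is essentially a lookup from business vocabulary (metrics, and the segments each one supports) to queries. Nothing below comes from any real product.

```python
from dataclasses import dataclass

@dataclass
class Metric:
    name: str
    sql: str                # how to compute it
    dimensions: list[str]   # segments it may be sliced by

# A made-up catalog mapping the questions people ask to governed definitions.
CATALOG = {
    "revenue": Metric(
        "revenue",
        "SUM(amount) FILTER (WHERE status = 'completed')",
        ["quarter", "product_line", "region"],
    ),
    "new_accounts": Metric(
        "new_accounts",
        "COUNT(DISTINCT account_id)",
        ["quarter", "channel"],
    ),
}

def answer(metric_name: str, dimension: str) -> str:
    """Translate a (metric, segment) question into a query via the catalog."""
    metric = CATALOG[metric_name]
    if dimension not in metric.dimensions:
        raise ValueError(f"{metric_name!r} is not defined by {dimension!r}")
    return (
        f"SELECT {dimension}, {metric.sql} AS {metric.name} "
        f"FROM facts GROUP BY {dimension}"
    )

print(answer("revenue", "region"))
```

If the library is complete enough and the names are reasonable enough, the lookup succeeds; if not, the question falls through, which is the gap the agents below try to fill.
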
  • If you squint even more, this is similar to the problem every data company is trying to solve right now. Except the intermediation happens with AI, through agents that try to translate questions into numbers. Most of these bots try to understand people’s questions by using whatever information the product running the bot has—Tableau uses its data catalog to understand questions; ThoughtSpot uses its internal semantic model and user feedback; Julius remembers previous conversations to guide future ones. (View Highlight)
  • But now, products are beginning to reach out to other services for more “context”:

    But data teams need a way to curate trusted context; relying on LLMs alone comes with too many gotchas! Last week, we launched semantic authoring, and next we’ll be integrating agentic capabilities with semantic models, so anyone in your organization can ask questions in Hex using governed context straight from the data team. (View Highlight)

  • You can imagine where this might go. To get better at answering questions, analytical bots begin by sourcing information from semantic layers; then, from the MCP servers of other data tools; and eventually, from Slack messages, and Google docs, and emails, and the transcribed recordings of Zoom calls. In other words, they will probably do what an analyst does. They will get told a bunch of facts directly, like how to define certain metrics and which ones are most important, and will be given instructions on how to figure out the facts they don’t know—check these docs; read the history of this Slack channel; look at the old versions of some canonical deck and make sure it matches that. And then they have some learned set of skills that help them make sense of all of it. (View Highlight)
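
A hedged sketch of that progression, with every source, function, and returned string invented: the bot collects facts the way an analyst would, consulting governed definitions first and messier sources after, before attempting an answer.

```python
# Hypothetical context sources, ordered from most to least structured.
def from_semantic_layer(question: str) -> list[str]:
    # In practice: governed metric definitions from a semantic layer.
    return ["metric: revenue = SUM(amount) WHERE status = 'completed'"]

def from_slack(question: str) -> list[str]:
    # In practice: search channel history for how people actually talk.
    return ["#data: 'fiscal quarter starts in February, not January'"]

def from_docs(question: str) -> list[str]:
    # In practice: canonical decks, Google docs, meeting transcripts.
    return ["board deck: new accounts are counted at contract signature"]

def gather_context(question: str) -> list[str]:
    """Do what an analyst does: collect the facts before answering."""
    context: list[str] = []
    for source in (from_semantic_layer, from_slack, from_docs):
        context.extend(source(question))
    return context

print(gather_context("how many new accounts were created this fiscal quarter?"))
```
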
  • Building integrations into different sources requires work; explaining how to use each tool requires work; defining the organizational particulars of each tool requires work; writing all the various prompts that start with “you are an expert analyst” requires work. You can’t onboard an analyst by giving them logins to a bunch of tools; you can’t make a good bot by granting it access to the same sources. You have to instruct and train both. You have to teach them how the tools work, and how the business that’s using them works. (View Highlight)
  • In theory, it’d make sense for this to exist in one centralized place, rather than every BI tool doing it. Put this contextual logic—how to access relevant information, how to use it, and how to think analytically about it—in a single repository of integrations and prompts; let other tools use it as a source when they need to understand what someone means when they ask how many new accounts were created this fiscal quarter. It’s Fivetran, for context. (View Highlight)
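
One way to picture “Fivetran, for context”: a single shared repository of definitions and prompts that any tool can query, so that every tool means the same thing by “fiscal quarter.” The interface below is invented for illustration, not a real product.

```python
class ContextRepository:
    """A hypothetical central store of definitions and prompts shared by many tools."""

    def __init__(self) -> None:
        self.definitions = {
            "fiscal_quarter": "starts February 1",
            "new_account": "first completed contract, not first signup",
        }
        self.prompts = {
            "analyst": "You are an expert analyst. Prefer governed "
                       "definitions over guesses, and cite your sources.",
        }

    def resolve(self, term: str) -> str:
        return self.definitions.get(term, f"no shared definition for {term!r}")

# Every tool asks the same repository, so their answers start from the same facts.
repo = ContextRepository()
for tool in ("a Tableau-like BI tool", "a Hex-like notebook", "a standalone bot"):
    print(tool, "->", repo.resolve("fiscal_quarter"))
```
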
  • Or it’s the metrics layer, for fuzzy analytical concepts. And it has the same problem that that idea had: It’d be hard to sell on its own. Every BI tool wants to be the best place to ask questions; they want to make their “AI analyst” the best one; they want to differentiate themselves by having a proprietary agentic loop that can answer questions that nobody else can. And BI tools are unlikely to outsource that, just as they’ve been reluctant to outsource semantic layers to a third party that they can’t control, and that everyone else can use. (View Highlight)
  • As software fitfully becomes more “agentic,” you could imagine two architectures emerging:
    1. One that combines applications and agents. Every word processing tool has a chatbot that’s told how to be a good writer; every project management app is full of prompts that tell it how to write a good status update; every vibe-coding product is both a hosting platform and virtual engineer; every video conferencing system has a note-taking service; every BI tool sources its own context for its own agentic loops. The AI is in the software.
    2. One that separates applications and agents. SaaS products build programmatic interfaces that allow bots to manipulate them, but they don’t build the bots themselves. Instead, a new class of software emerges that is just the agent—it’s a sales operations manager that knows all of Salesforce’s peculiarities; it’s a product manager that knows how to consolidate Slack messages and Linear updates into a roadmap deck; it’s an analyst that knows how to translate ambiguous questions into mixed-and-matched MCP calls and a bunch of analytical reasoning. It’s not a SaaS app with a specialized bot inside; it’s a specialized bot that can use SaaS apps. (View Highlight)
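
A rough sketch of that second architecture, under the assumption that each app exposes a programmatic interface (MCP or something like it): the agent is its own program, and the SaaS apps are just tools it calls. All tool names and payloads are made up.

```python
def call_tool(tool: str, action: str, **kwargs) -> dict:
    """Stand-in for an MCP-style call to an external app's interface."""
    return {"tool": tool, "action": action, "args": kwargs}

def analyst_agent(question: str) -> list[dict]:
    """A specialized bot that uses SaaS apps, rather than living inside one."""
    return [
        call_tool("semantic_layer", "resolve_metric", name="new_accounts"),
        call_tool("warehouse", "run_query",
                  sql="SELECT count(*) FROM accounts WHERE created_at >= :q_start"),
        call_tool("slack", "post_message", text=f"Answer to: {question}"),
    ]

for step in analyst_agent("how many new accounts this fiscal quarter?"):
    print(step["tool"], "->", step["action"])
```
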
  • But that also has the same economic and logistical problems as the metrics layer. You can’t sell a meeting notetaker without a tool for hosting meetings; you can’t sell an analyst without some charts. And software is, first and foremost, a thing to be sold. The road to hell is paved with practicalities. (View Highlight)