Highlights

  • As AI has invaded every area of the technical world, so has data. Almost every agent, feature, product (I’m avoiding the semantics here) is data-intensive and likely to become more so soon. That produces a new paradigm: agents interacting with data. Why is this paradigm important? (View Highlight)
  • Humans are remarkably good at putting up with a mess (while also complaining about it). We find workarounds, deal with glued-together foundations, and often settle for good enough because we add context where needed. Agents are not (at least yet). Their interaction patterns with data look quite different from those of an analyst or data scientist querying data, or even from those of a product hitting an online feature store. Agents require formalized context that humans often do not. Because of their interaction patterns and need for context, agents are placing greater strain on data foundations than we’ve ever seen before. (View Highlight)
  • As a consequence, data infra is a white-hot space, though without the public fanfare of AI roles. My view is that the success of many AI roles is dependent on getting the data infrastructure right. If you take a look at any AI lab right now, they have data infra roles open alongside research, precisely because of this. But the question remains: what is the data infra strategy in an agent- and AI-world? That’s where the Data Infra PM comes in. (View Highlight)
  • Misaligned data foundations (relative to agents). Bad foundations are guaranteed to prevent rapid AI adoption. The tooling doesn’t matter if your foundations aren’t set up to enable speed and agents. Most non-AI native companies (and even some of those) didn’t design with AI in mind (nor should they have). Practically, this means every company is in a tough spot. The data infra PM will have to grapple with how to get a company from its current to its future state, and quickly, or data foundations will bottleneck product progress. (View Highlight)
  • “Democratizing” data access must be done carefully. I know (and have heard elsewhere) that the goal is to enable everyone to become an analyst through AI tools and data warehouses. Not only that, agents are running on top of those same warehouses. This change places a heavy load on the data infrastructure, which previously only had to serve experts. The data infra PM will need to sort out what matters most to a business when everything feels like a P0. The infrastructure cannot handle everything and will need to avoid full democratization. That won’t be popular. (View Highlight)
  • We don’t yet know enough about agent interaction patterns. Agents have different interaction patterns from people. They stress-test data foundations in a way people just do not, in volume and frequency as well as in complexity. They touch tables that don’t make sense, and they combine data in ways that people wouldn’t imagine doing. On the flip side, they make some things easier than people ever imagined. The data infra PM will need to be an expert on data-intensive agents and likely challenge design patterns that warehouses and engineers have used for a long time. This is going to be really hard because there won’t be time or patience, particularly when data becomes the bottleneck. (View Highlight)
  • As AI use picks up, cost becomes a real problem. Hitting small tables isn’t a big deal. Hitting tables with billions of rows many times is an issue. Agents do this frequently. As humans offload more analysis and script generation, they won’t pay as close attention to this either. That means token consumption and warehouse costs will jump (likely a lot). While token consumption is popular right now (and CFOs actually want more of it, not less), cost can and will become prohibitive. The data infra PM (and their engineering counterparts) will be on the hook for this cost, whether they admit it now or not. Understanding how to control cost growth and link it to business value will be critical. (View Highlight)
  • Dealing with the context layer is hard. Data-intensive applications (particularly those that are agent-based) are context-dependent. Yet most data foundations lack sufficient context to operate well. There are solutions (e.g., leveraging Unity Catalog in Databricks, as it is designed to be used), but many of them require someone to have a plan and execute it. Engineers are unlikely to self-organize around solving this problem, nor do they have the full business context to do so. The data infra PM will need to think carefully about how to create the right context layer in and around the data foundations. (View Highlight)
  • None of what I wrote above means that the person needs to be a PM in the classical sense. This data infra PM role requires deep business savvy, strong data fundamentals, an appreciation for the complexity of infrastructure, and the ability to navigate significant ambiguity. Every company will need it, and there will only be a few great ones. (View Highlight)