Highlights

  • Anyone who’s spent time prompting language models (LMs) has probably learned this the hard way — a small perturbation to the prompt string you write can have unforeseen consequences on the downstream task. This is especially true in compound AI pipelines, where you combine several prompts to solve a larger, more complex task. When you take a step back and think about where you actually are in prompt space, you come to a humbling realization: there’s an infinite number of alternative prompts out there that could potentially get you a better result. All you really did, after painstakingly tweaking words and word order in your prompt, was hit upon an arbitrary point in this space, one that is likely far from optimal. (View Highlight)
  • Of course, the real prompt space is likely highly discontinuous and lives in a much higher-dimensional space — the image above is for illustrative purposes only. The larger point still stands: as a developer, it’s not clear how changes to the prompt affect the final outcome. Put simply, natural language prompts aren’t that good an interface for developers who want to build reliable AI software. Conventional LM or agent frameworks make the developer begin with the question:

    “What prompt can I write to achieve the given outcome?” This might be a reasonable approach to rapidly test and iterate on your ideas right away. However, when building complex software with LMs, it’s important to recognize that the prompt engineering process never ends — the moment a newer, better model comes along (one always does), you need to start all over, because prompts are brittle: they interact with different LMs in different ways, and they can break your system for subtle reasons that are incredibly hard to debug. (View Highlight)

  • As a developer learning a new AI framework, it always helps to understand the following: a) the abstractions you’re given to work with, and b) the level of abstraction you’re working at. As you read and digest the documentation, it becomes apparent which primitives you can use to compose the workflow you have in your head or on the whiteboard. Too high-level an abstraction, and you end up with rigid building blocks that you must wrestle with to align to your task. Too low-level, and you’re left writing tons of boilerplate and repetitive code in each and every one of your projects. (View Highlight)
  • As you use a framework across a wide variety of problems, you begin to see very clearly what it lacks and what parts you need to customize for your use cases. Because the whole industry is so fast-moving and evolving on a daily basis, it’s also worth thinking about the potential need for future-proofing your code. With all this in mind, let’s list the qualities of what makes for good abstractions in any framework:
    1. Simplicity: Provides an easy entry point for developers to begin building and testing out their ideas.
    2. Clarity: Offers clear semantics, making it obvious how to use them and what to expect.
    3. Flexibility: Enables rapid changes to workflow logic during early experimentation while making it easy for developers to customize for their specific needs.
    4. Longevity: Withstands the test of time, minimizing the need for drastic code changes as the larger AI ecosystem evolves. (View Highlight)
  • DSPy ⤴, or Declarative Self-improving Python, is a framework for programming — rather than prompting — language models. Instead of tuning prompts through trial and error (which is a manual, tedious process), you write modular Python code using composable building blocks, and teach your LM to produce higher quality outputs via an optimization process. Under the hood, the LM still sees a prompt, but one that’s automatically generated by DSPy based on the code you write. To begin using DSPy, all you really need is to wrap your head around these three core abstractions: (View Highlight)
  • Let’s unpack them to understand their purpose (with examples in code, in the next section).
    1. Signatures: Signatures specify the input and output types to the LM and what the expected behaviour is. They let you tell the LM what it needs to do, rather than specify how it should do it.
    2. Modules: Modules are building blocks for programs that interact with LMs. They are generalized to take in any signature while abstracting away the prompting technique for interacting with the LM (e.g., chain of thought). DSPy modules are highly composable, meaning that you can combine multiple modules together to create more complex modules for any task. Importantly, modules serve as learning targets for the optimizer. As LMs and prompting strategies evolve, so too does the module.
    3. Optimizers: Optimizers improve the performance of a DSPy module using annotated examples of input-output pairs. An optimizer can automatically generate and refine prompts, few-shot examples, or even the language model’s weights, producing a new, improved module that performs better on the task. (View Highlight)
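
To see how these three pieces fit together, here’s a minimal, hypothetical sketch (the model name, signature, and field names are illustrative assumptions, not from the post):

```python
import dspy

# Configure the LM (model name is an assumption for illustration)
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# 1. Signature: declare intent plus input/output types
class Summarize(dspy.Signature):
    """Summarize the document in one sentence."""
    document: str = dspy.InputField()
    summary: str = dspy.OutputField()

# 2. Module: wraps the signature in an interaction paradigm
summarize = dspy.Predict(Summarize)
pred = summarize(document="DSPy is a framework for programming LMs ...")
print(pred.summary)

# 3. Optimizer (optional): improves the module from labeled examples,
#    e.g. dspy.BootstrapFewShot(metric=...).compile(summarize, trainset=...)
```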
  • The table below summarizes the tasks you may be used to in conventional LM frameworks, and what DSPy replaces them with:

    | DSPy abstraction | Conventional frameworks |
    | --- | --- |
    | ✅ Signatures | Hand-written prompts and few-shot examples |
    | ✅ Modules | Hand-crafted prompting techniques & predefined prompt chains |
    | ✅ Optimizers | Manual prompt engineering |

    (View Highlight)
  • Once you learn DSPy, you don’t need to rely on predefined prompt chains or manually tune prompts by hand. You get composable, flexible building blocks that can be easily adapted to your specific needs. DSPy inverts the traditional thought process when working with LMs: rather than first thinking about ways to phrase the prompt, you’re focused on the following very important questions:

    What is my intent, and what does success look like for my specific task? Any DSPy workflow begins with you thinking about the desired outcomes and how you can objectively measure them. Once you define metrics that capture the essence of success for your domain, you proceed to write your signature that declares your intent, and specify the expected input/output types. (View Highlight)

  • The module handles the signature, formulates the prompt, and invokes the LM calls. Initially, you don’t need to (and shouldn’t) care about optimization. Simply focus on writing the signatures and modules to express your desired logic, and you’ll be surprised to find how effective your base DSPy pipeline is, no matter what task you throw at it. (View Highlight)
  • Define success metrics: Let’s begin with the DSPy mindset of defining what “success” means for this task. For each news article, we want to a) classify it as belonging to the type “merger”, “acquisition” or “other”, and b) extract the relevant fields of interest that are exact matches with what’s in the source data. Below are some of the fields that are captured via exact matches:
    • Company names
    • Currency symbols: USD, CAD or AUD
    • Company stock tickers
    • Financial amounts (e.g., 5 billion): In this case, the multipliers (“mn” or “bn”) should be extracted as a full word (million or billion) for the purposes of standardization in the downstream analysis.
    The success metric for this example is thus defined as follows:

    Our result object’s values should be an exact match with gold (annotated) data that has the same structure and field names. (View Highlight)
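
To make the target concrete, a gold record for an acquisition article might look like the following (all field names and values here are hypothetical, purely for illustration):

```python
# Hypothetical gold (annotated) record; a prediction must match these
# values exactly, field for field, to count toward the score.
gold = {
    "article_type": "acquisition",
    "parent_company": "Acme Corp",
    "child_company": "Widget Inc",
    "currency": "USD",
    "amount": "5 billion",  # multipliers spelled out ("bn" -> "billion")
    "parent_ticker": "ACME",
    "child_ticker": "WDGT",
}
```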

  • Define complex types: Because the task requires complex objects as output, we can use Pydantic models to easily declare complex types to our signature in the next step. Three Pydantic models are created: Merger, Acquisition, and Other, one for each output type. (View Highlight)
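
A sketch of what those models could look like (the exact field names in the post’s code may differ):

```python
from pydantic import BaseModel

class Merger(BaseModel):
    company_1: str
    company_2: str
    currency: str
    merger_amount: str

class Acquisition(BaseModel):
    parent_company: str
    child_company: str
    currency: str
    acquisition_amount: str

class Other(BaseModel):
    article_type: str
```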
  • As can be seen, we’re not limiting ourselves in DSPy to using flat data models and simple types. With the power of Pydantic, it’s possible to deal with richer data representations when working with LMs. Signatures: The next step is to create signatures, akin to how you’d write prompts in other frameworks, though DSPy’s approach is more declarative. Let’s start with the first signature, which performs classification of article types. We create a signature by subclassing dspy.Signature and defining a docstring in Python. The purpose of the docstring is to declare our intent to the LM, i.e., to state in clear, crisp language what we want to achieve. (View Highlight)
  • We’re simply asking the LM to classify the article as either a merger or an acquisition, without providing the specifics of what those terms mean. Also, we clarify under what conditions the “other” category is assigned. Anything that’s domain-specific and can serve as important context for the LM goes into the docstring. The input field is the text of the article, and the output field is a Python literal representing the article type. (View Highlight)
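
A sketch of such a classification signature (the docstring wording here is paraphrased; only the structure matters):

```python
from typing import Literal

import dspy

class ClassifyArticle(dspy.Signature):
    """Classify the news article as a merger or an acquisition.
    If it describes neither, classify it as other."""

    article_text: str = dspy.InputField()
    article_type: Literal["merger", "acquisition", "other"] = dspy.OutputField()
```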
  • Once again, note how concise the docstrings are. We’re not fixating on every word that goes into the prompt — there’s plenty of room for optimization later! Signatures are an abstraction that replaces the normally verbose prompts describing the specifics of what we want. We simply declare our intent via declarative Python code, and allow DSPy to formulate the prompt for us. When defining signatures, avoid prematurely optimizing for specific phrasing or prompting styles in your docstring. Only say what needs to be said for your particular domain, assuming you know what you want and can articulate it clearly. It’s the job of the optimizer (downstream) to discover how to appropriately phrase the prompt and provide few-shot examples that guide the LM to its destination. (View Highlight)
  • Modules are the meat of a DSPy pipeline, and they define the interaction paradigm with the LM. DSPy offers several useful built-in modules ⤴. The simplest one is a Predict module that uses a basic predictor (given a prompt, predict an output). For this classification and information extraction workflow, this is all we need. (View Highlight)
  • Let’s briefly discuss how the input text to the predictor is preprocessed. Rather than sending the entire news article (which can be long and verbose), a useful heuristic, in this case, is to pass in only the top-N sentences from the article. M&A news articles typically describe key events right at the top, with a descriptive title and a few sentences that capture the larger meaning. Using only the title and the top-N sentences helps create a more focused input for the classification predictor, while also saving tokens. The code for this is excluded from this post for brevity, but you can check the extract_first_n_sentences function in the code here ⤴. (View Highlight)
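
The linked extract_first_n_sentences is the authoritative version; a naive stand-in, just to show the idea, could be:

```python
def extract_first_n_sentences(text: str, n: int = 5) -> str:
    """Keep only the first n sentences of the article text.
    A crude period-based splitter; the post's actual implementation
    (linked above) may handle abbreviations and edge cases better."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:n]) + "."
```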
  • Using a module always involves the following two steps: a) Initialize the module by passing in its signature, and b) Call the module by passing in the input fields. The output of the module is a Prediction object that conforms to the output fields defined in the signature. (View Highlight)
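
In code, those two steps look roughly like this (reusing the hypothetical ClassifyArticle signature and the sentence helper sketched above):

```python
# article: raw text of a news article (assumed loaded elsewhere)

# a) Initialize the module by passing in its signature
classify = dspy.Predict(ClassifyArticle)

# b) Call the module by passing in the signature's input fields
pred = classify(article_text=extract_first_n_sentences(article, n=5))

# The result is a Prediction whose attributes mirror the output fields
print(pred.article_type)  # e.g. "acquisition"
```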
  • The best part about modules is how composable they are. In this case, we have a compound workflow, where an LM’s outputs are passed to another LM downstream. Rather than using a single linear flow where we string together module calls in Python, we can write a custom module that composes together these calls into a single class. Here’s the custom module that accomplishes our entire task for information extraction: (View Highlight)
  • Calling a custom module is just as simple as calling a built-in module. Simply initialize the module without any arguments (the __init__ block has all the submodules with their signatures). (View Highlight)
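
A sketch of such a custom module, composing the classification predictor with downstream extraction predictors (the extraction signature names here are hypothetical; the post’s actual class is in the linked repo):

```python
class ExtractEntities(dspy.Module):
    """Classify an article, then route it to the matching extraction predictor."""

    def __init__(self):
        super().__init__()
        self.classify = dspy.Predict(ClassifyArticle)
        # Hypothetical extraction signatures, one per article type
        self.extract_merger = dspy.Predict(ExtractMergerFields)
        self.extract_acquisition = dspy.Predict(ExtractAcquisitionFields)

    def forward(self, article_text: str) -> dspy.Prediction:
        article_type = self.classify(article_text=article_text).article_type
        if article_type == "merger":
            return self.extract_merger(article_text=article_text)
        if article_type == "acquisition":
            return self.extract_acquisition(article_text=article_text)
        return dspy.Prediction(article_type="other")

# Usage: initialize without arguments, then call with the input fields
extractor = ExtractEntities()
result = extractor(article_text=extract_first_n_sentences(article, n=5))
```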
  • What’s the true power of a module? Modules are a powerful construct in DSPy, because they simultaneously serve several functions:
    1. They allow for easy composition of complex workflows. By encapsulating all the logic for a specific task within a module, you can express a combination of deterministic and LM-based logic in a concise and readable manner.
    2. They serve as learning targets for the optimizer. Within a single compound module, you can make several submodules (that could each be built-in DSPy modules, like Predict or ChainOfThought, or other custom modules). The final output of the module (i.e., whatever is returned by the forward method), is all the optimizer needs to improve the module via examples of input/output pairs.
    3. They handle any kind of signature and abstract away the hard-to-understand parts (e.g., how the phrasing of the prompts affects the outcome), allowing the developer to focus on the more important task of defining examples for evaluation and optimization downstream. (View Highlight)
  • As mentioned, DSPy makes you think upfront about metrics and evaluation. The moment the custom module is written, you can run the pipeline end-to-end on a subset of the data and immediately evaluate it using an evaluation script, as shown here ⤴. The key part to note is our metric, which assigns an integer value of 1 or 0 for whether or not there was an exact match between DSPy’s prediction and the ground truth. The final score is the sum of the exact matches obtained across all expected fields. (View Highlight)
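
A sketch of such an exact-match metric, wired into DSPy’s evaluator (the field list and devset are hypothetical stand-ins for the post’s actual evaluation script, which is linked above):

```python
def exact_match_metric(example, pred, trace=None) -> int:
    """Sum of per-field exact matches (1 or 0 each) against the gold data."""
    fields = ["article_type", "parent_company", "currency", "amount"]  # hypothetical
    return sum(
        int(getattr(pred, f, None) == getattr(example, f, None)) for f in fields
    )

# devset: a list of dspy.Example objects holding gold values for each field
evaluate = dspy.Evaluate(devset=devset, metric=exact_match_metric, num_threads=4)
evaluate(ExtractEntities())
```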
  • In this post, we covered the key abstractions in DSPy, and showed how simple it is to get started. As a developer, you begin by defining signatures, which are a programmatic way to declare your intent and specify the expected input/output types. You then define a custom module (or multiple built-in modules) that call their respective signatures. Under the hood, signatures and modules depend on adapters to formulate the prompt for the LM to accomplish its task. As a developer, you interface with LMs via a declarative, programmatic abstraction on top of natural language prompts, and it’s DSPy’s job to manage the nuances of how the prompt is phrased and formatted. All you need to focus on is defining the logic of your workflow and building it out end-to-end, to evaluate its baseline performance. Right upfront (even before you begin optimizing anything), the focus is on defining success metrics that help you measure the performance of your baseline implementation. (View Highlight)
  • DSPy is not a prompt optimization framework. It’s simply a declarative way to composably build AI systems. Using optimizers in DSPy is optional, and optimizers can operate on a much larger space ⤴ than just prompts (including weights, and in the future, signatures, adapters and modules themselves)! (View Highlight)
  • DSPy doesn’t “automate away prompts” — there’s still a prompt operating under the hood — it’s just that you, as a human, don’t need to fixate on the prompt itself. In DSPy, you think in terms of signatures, not prompts. While you may still specify a part of the prompt by hand (as we did in the signature’s docstring), the focus is mainly on what needs to be said (crisply and concisely), without worrying about how exactly the prompt is phrased or formatted. (View Highlight)
  • You don’t need a lot of examples to optimize your DSPy pipeline. As we’ll explore in the next post, all you need to begin optimizing your modules in DSPy is a handful of samples, starting with 10-20, which is sufficient for bootstrap few-shot optimization. And once you see the improvements, an additional 200 examples can go a long way with the more advanced optimizers. (View Highlight)
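
As a preview, bootstrapped few-shot optimization takes only a few lines (names reused from the sketches above; trainset would hold your 10-20 annotated dspy.Example items):

```python
def strict_match(example, pred, trace=None) -> bool:
    """For bootstrapping, require every field to match the gold data exactly."""
    fields = ["article_type", "parent_company", "currency", "amount"]  # hypothetical
    return all(getattr(pred, f, None) == getattr(example, f, None) for f in fields)

optimizer = dspy.BootstrapFewShot(metric=strict_match, max_bootstrapped_demos=4)
optimized = optimizer.compile(ExtractEntities(), trainset=trainset)
```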
  • You’re not limited to using only the built-in modules — in fact, it’s actively encouraged for you to write a custom module for each and every DSPy project. Everything is designed to be flexible and transparent, and the core abstractions in DSPy offer a lot of expressive power that enable arbitrarily complex workflows. (View Highlight)
  • It’s been said plenty of times that “English is the new programming language”. While this may be true to some extent (judging by the performance of recent LMs on tasks like coding, summarization and more), we also know that natural language isn’t the most reliable way to obtain specific behaviours from AI systems, especially when you have multiple prompts working together in a compound pipeline. (View Highlight)
  • The best prompts are not just written (by humans), they must be discovered (by algorithms). (View Highlight)
  • DSPy adds a programmatic layer of abstraction atop the natural language one that we’re familiar with, allowing the model to write the prompts, and the optimizer to discover better versions of them. Tying this back to the larger point about good abstractions, DSPy’s design allows you to translate your ideas into code, such that your overall intent is passed down to the LM’s weights (which actually perform the task at hand). As such, DSPy can be thought of as a “compiler” for LMs that translates high-level concepts into low-level instructions. (View Highlight)
  • In the end, DSPy innovates over traditional frameworks on two fronts: a) it adds a declarative programming abstraction on top of natural language prompts, and b) it plugs in optimizers that seamlessly fit into an existing module’s workflow. As models continue to evolve, the overall logic of your signatures, modules and optimizers can remain largely unchanged (only the adapters that formulate the prompts need to evolve accordingly). To begin appreciating the simplicity and power of DSPy’s abstractions, simply begin rewriting your existing prompts from other frameworks using signatures and modules in DSPy, and you’ll find that there’s an ocean of possibilities out there! (View Highlight)