The Problem With LangChain

rw-book-cover

Metadata

Author: minimaxir.com
Full Title: The Problem With LangChain
URL: https://minimaxir.com/2023/07/langchain-problem/

Highlights

If you’ve been following the explosion of AI hype in the past few months, you’ve probably heard of LangChain. LangChain, developed by Harrison Chase, is a Python and JavaScript library for interfacing with OpenAI’s GPT APIs (later expanding to more models) for AI text generation. More specifically, it’s an implementation of the paper ReAct: Synergizing Reasoning and Acting in Language Models published October 2022, colloquially known as the ReAct paper, which demonstrates a prompting technique to allow the model to “reason” (with a chain-of-thoughts) and “act” (by being able to use a tool from a predefined set of tools, such as being able to search the internet). This combination is shown to drastically improve output text quality and give large language models the ability to correctly solve problems. (View Highlight)
The ReAct workflow popularied by LangChain was particularly effective with InstructGPT/text-davinci-003, although costly and not easy to use for small projects. In March 2023, as ChatGPT API usage became massively popular due to its extremely cheap API as I accurately predicted, LangChain use also exploded, to the point that LangChain was able to raise a $10 mi ll i o n see d ro u n d] (h ttp s : // b l o g . l an g c hain . d e v / ann o u n c in g - o u r - 10 m - see d - ro u n d - l e d - b y - b e n c hma r k /) an d an o t h er [$ 20- $25 mi ll i o na t a$ 200 million valuation Series A despite not having any revenue nor any obvious plans how to generate revenue. (View Highlight)
hat’s where my personal experience with LangChain begins. For my work at BuzzFeed, I was tasked with creating a ChatGPT-based chat bot for the Tasty brand (later released as Botatouille in the Tasty iOS app) that could chat with the user and provide relevant recipes. The source recipes are converted to embeddings and saved in a vector store: for example, if a user asked for “healthy food”, the query is converted to an embedding, and an approximate nearest neighbor search is performed to find recipes similar to the embedded query and then fed to ChatGPT as added context that can then be displayed to the user. This approach is more commonly known as retrieval-augmented generation. (View Highlight)
LangChain was by-far the popular tool of choice for RAG, so I figured it was the perfect time to learn it. I spent some time reading LangChain’s rather comprehensive documentation to get a better understanding of how to best utilize it: after a week of research, I got nowhere. Running the LangChain demo examples did work, but any attempts at tweaking them to fit the recipe chatbot constraints broke them. After solving the bugs, the overall quality of the chat conversations was bad and uninteresting, and after intense debugging I found no solution. (View Highlight)
In all, I wasted a month learning and testing LangChain, with the big takeway that popular AI apps may not necessarily be worth the hype. My existential crisis was resolved after coming across a Hacker News thread about someone reimplementing LangChain in 100 lines of code, with most of the comments venting all their grievances with LangChain: (View Highlight)
The problem with LangChain is that it makes simple things relatively complex, and with that unnecessary complexity creates a tribalism which hurts the up-and-coming AI ecosystem as a whole. If you’re a newbie who wants to just learn how to interface with ChatGPT, definitely don’t start with LangChain. (View Highlight)
The documentation doesn’t make it clear, but within each Thought/Action/Observation uses its own API call to OpenAI, so the chain is slower than you might think. Also, why is each action a dict? The answer to that is later, and is very silly. (View Highlight)
That’s fewer lines of code and makes it very clear where and when the messages are being saved, no bespoke object classes needed. You can say that I’m nitpicking the tutorial examples, and I do agree that every open source library has something to nitpick (including my own!). But if there are more nitpicks than actual benefits from the library then it’s not worth using at all, since if the quickstart is this complicated, how painful will it be to use LangChain in practice? (View Highlight)
While I was working on the recipe-retrieving chatbot (which also must be a fun/witty chatbot), I needed to combine elements from both the third and fourth examples above: a chat bot that can run an Agent workflow, and also the ability to persist the entire conversation into memory. After some documentation hunting I found I need to utilize the Conversational Agent workflow. (View Highlight)
A quick sidenote on system prompt engineering: it is not a meme and is absolutely necessary to get the best results out of the ChatGPT API, particularly if you have constraints on content and/or voice. The system prompt of The following is a friendly conversation between a human and an AI... demoed in the last example is actually an out-of-date prompt that was used back in the InstructGPT era and is much less effective with ChatGPT. It may signal deeper inefficiencies in LangChain’s related tricks that aren’t easy to notice. (View Highlight)
You’ll notice the Recipe ID, which is relevant for my use case since it’s necessary to obtain recipe metadata (photo thumbnail, URL) for the end result shown to the enduser in the final app. Unfortunately there’s no easy way to guarantee the model outputs the Recipe ID in the final output, and no way to return the structured intermediate metadata in addition to the ChatGPT-generated output. Specifying get_similar_recipes as a Tool is straightforward, although you need to specify a name and description, which is actually a form of subtle prompt engineering as LangChain can fail to select a tool if either is poorly specified (View Highlight)
Wait a minute, it ignored my system prompt completely! Dammit. Checking the memory variable confirms it. Looking into the documentation for ConversationBufferMemory and even in the code itself there’s nothing about system prompts, even months after ChatGPT made them mainstream. The intended way to use system prompts in Agents is to add an agents_kwargs parameter to initialize_agent, which I only just found out in an unrelated documentation page published a month ago. (View Highlight)
Good news is that the system prompt definitely worked this time! Bad news is that it broke, but why? I didn’t do anything weird, for once. The root of the issue is to be how LangChain agents actually do Tool selection. Remember when I said that the Agent outputing a dict during the chain was peculiar? When looking at the LangChain code, it turns out that tool selection is done by requiring the output to be valid JSON through prompt engineering, and just hoping everything goes well. (View Highlight)
The consequence of this is that any significant changes in the structure of normal output, such as those caused by a custom system prompt, has a random chance of just breaking the Agent! These errors happen often enough that there’s a documentation page dedicated to handling Agent output parsing errors! Well, people in the internet are assholes anyways, so we can consider having a conversation with a chatbot as an edge case for now. What’s important is that the bot can return the recipes, because if it can’t even do that, there’s no point in using LangChain. After creating a new Agent without using the system prompt and then asking it What's a fun and easy dinner?: (View Highlight)
Atleast it worked: ChatGPT was able to extract out the recipes from the context and format them appropriately (even fixing typoes in the names!), and was able to decide when it was appropriate. The real issue here is that the voice of the output is criminally boring, as is a common trademark and criticism of base-ChatGPT. Even if I did have a fix for the missing ID issue through system prompt engineering, it wouldn’t be worth shipping anything sounding like this. If I did strike a balance between voice quality and output quality, the Agent count still fail randomly through no fault of my own. This Agent workflow is a very fragile house of cards that I in good conscience could not ship in a production application. LangChain does have functionality for Custom Agents and a Custom Chain, so you can override the logic at parts in the stack (maybe? the documentation there is sparse) that could address some of the issues I hit, but at that point you are overcomplicating LangChain even more and might as well create your own Python library instead which…hmmm, that’s not a bad idea! (View Highlight)
LangChain does also have many utility functions such as text splitters and integrated vector stores, both of which are integral to the “chat with a PDF/your code” demos (which in my opinion are just a gimmick). The real issue with all these integrations is that it creates an inherent lock-in to only use LangChain-based code, and if you look at the code for the integrations they are not very robust. LangChain is building a moat, which is good for LangChain’s investors trying to get a return on their $30 million, but very very bad for developers who use it. (View Highlight)
LangChain embodies the philosophy of “it’s complicated, so it must be better!” that plagues late-stage codebases, except that LangChain isn’t even a year old. The effort needed to hack LangChain to do what I want it to do would cause insane amounts of technical debt. And unlike AI startups nowadays, technical debt for my own projects with LangChain can’t be paid with venture capital. API wrappers should at minimum reduce code complexity and cognitive load when operating with complex ecosystems because it takes enough mental brainpower to work with AI itself. LangChain is one of the few pieces of software that increases overhead in most of its popular use cases. (View Highlight)
I came to the conclusion that it’s just easier to make my own Python package than it is to hack LangChain to fit my needs. (View Highlight)
herefore, I developed and open-sourced simpleaichat: a Python package for easily interfacing with chat apps, emphasizing minimal code complexity and decoupling advanced features like vector stores from the conversation logic to avoid LangChain’s lock-in, and many other features which would take its own blog post to elaborate upon. (View Highlight)
I’ve gotten many messages asking me “what should I learn to get started with the ChatGPT API” and I’m concerned that they’ll go to LangChain first because of the hype. If machine learning engineers who do have backgrounds in the technology stack have difficulty using LangChain due to its needless complexity, any beginner is going to drown. (View Highlight)
- Tags: favorite
No one wants to be that asshole who criticizes free and open source software operating in good faith like LangChain, but I’ll take the burden. To be clear, I have nothing against Harrison Chase or the other maintainers of LangChain (who encourage feedback!). However, LangChain’s popularity has warped the AI startup ecosystem around LangChain itself and the hope of OMG AGI I MADE SKYNET, which is why I am compelled to be honest with my misgivings about it. (View Highlight)

Pelayo Arbués

Explorer

Recent Notes

AI Learning Paths for Software Engineers Without Becoming a Data Scientist

Power and Prediction

Why Software Engineers Should Learn a Bit of Data Science

The Problem With LangChain

Metadata

Highlights

Graph View

Table of Contents

Now Reading

John Snow Probably Didn’t Use That Broad Street Map to Reach His Conclusions About Cholera