Otherwise, I have to agree. Langchain to a large extent seems to base its existence on a problem that barely exists. Outside of LLMs as services, the challenging part about LLMs is figuring out how to get one up and running locally. The hard part isn't writing an application that can work with one. Maintaining "memory" of conversations is relatively trivial, and though a framework might give me a lot of stuff for free, it doesn't seem worth giving up the precision of writing code to do things in a very specific way.
Perhaps Langchain is "good" for programming noobs who might benefit from living in just the Langchain universe. The documentation provides enough baby steps that someone who has maybe a few months of experience writing Python can whip something together. However, I'm really giving it the benefit of the doubt here. I really hope noobs aren't getting into programming because they want to build "the next ChatGPT", inherit a bunch of bad ideas about what programming is from Langchain, and then enter the workforce with said ideas.
I guess it's all about whether you believe the most recent LLMs to be good enough to do their own adequate decision making inside their own hallucinations, or if you need to enforce it externally. If the latter, you use LangChain or LlamaIndex. If the former, you rely on OpenAI functions/Claude 2 iterative prompting with minimal Python glue.
LangChain and LlamaIndex also have some nice functions like document imports, RAG, re-ranking but one can simply copy the corresponding code and use it standalone without the rest of the library.
In my experience, it is actually surprisingly hard. I guess it depends on just how "human" you want it to feel. I wrote about it here: [link redacted]
I was actually surprised that LangChain doesn’t do it this way. Just an example of how we shouldn’t assume the established implementations are the best ones and one should always be skeptical and take a fresh look. I posted about this a couple days ago—
https://www.linkedin.com/posts/pchalasani_rag-llm-langchain-...
[1] Langroid: https://github.com/langroid/langroid
If you look at the documentation (1), the API surface is relatively trivial and obvious to me.
Every interaction is a prompt template + an LLM + an output parser.
What’s so hard to understand about this?
Is writing an output parser that extends “BaseOutputParser” really that bad?
The parser and LLM are linked using:
“chatPrompt.pipe(model).pipe(parser);”
How… verbose. Complicated.
People who like to have a go at langchain seem to argue that this is “so trivial” you could just do it yourself… but also not flexible enough, so you should do it yourself.
Don’t get me wrong, I think they’ve done some weird shit (LCEL), but the streaming and batching isn’t that weird.
You see no benefit in using it?
Ok.
…but come on, it’s not that stupid; I would prefer it was broken into smaller discrete packages (large monolithic libraries like this often end up with lots of bloated half baked stuff in them), and I’d rather it focused on local models, not chatgpt…
…but come on. It’s not that bad.
No benefit?
You’ve implemented streaming and batched actions yourself have you?
The API is complicated.
The documentation kind of sucks.
…but the fundamentals are fine, imo.
It irritates me to see people shitting on this project when they haven’t tried it; I don’t even particularly like it… but if you haven’t actually used it, ffs, don’t be a dick about it.
If you have used it, maybe a more nuanced take than “it does not make any sense to me” is more helpful to people considering if they want to use it, or parts of it, or what the cost of implementing those parts themselves might be.
I personally think these templates (like https://github.com/langchain-ai/langchain/blob/master/templa...) don’t offer any meaningful value, lack documentation and context and fail to explain the concepts they’re using… but they at least demonstrate how to do various tasks.
It probably a valuable reference resource, but not a starting point for people.
At least for now and for the most popular usecases, this _is_ true. The framework seems as though it was written by people who had not actually done ML work prior to GPT4's announcement. Regardless if that's true or not; the whole point of a highly robust large language model is to be so robust that _every_ problem you have is easily defined as a formatted string.
The whole idea of deep learning is you don't need rules engines and coded abstractions, just English or whatever other modality people are comfortable communicating with. This is not necessarily true for all such cases at the moment. RAG needs to do a semantic search before formatting the string, for instance. But as we go forward and models get even more robust and advanced, the need for any abstraction other than plain language goes to zero.
They have some neat extras like sample selectors that can be useful — although even then, if you have so many examples you need a sample selector, finetuning gpt-3.5 is often better than using a sample selector with gpt-4 (and is considerably cheaper) in my experience.
Streaming and batching really aren't that onerous to build yourself. Especially if your design goal isn't to support every single LLM provider and related endpoints. And it's the kind of boilerplate that you build once and usually never touch again, so the front-loaded effort amortizes well over time.
With that said, I do think some of the langchain hate is definitely overstated. There's pieces of it that can be useful in isolation, like the document loaders, if you're trying to spin up a prototype quickly to test some ideas. But the pitch they make is that its the fastest/easiest way to build LLM-based application end to end. That pitch I find to be dubious, and thats being charitable.
That's damning by faint praise.
https://github.com/microsoft/semantic-kernel
They were at some level, trolling you. Either way intentionally or not, it says too much about them and not about the position itself.
While SolidJS is good, I really don’t think SolidStart is close to being useful or good. Not sure I really understand the value add on top of SolidJS quite like I understand NextJS on top of React. When I had to use SolidStart, there was a few times, I really just used SolidJS in place of some SolidStart built-ins because I couldn’t get it to do what I wanted and the docs were nearly non-existent. I even think I had to look at its src code to paint a complete picture from where the docs were at the time. In additional, I have no idea why people decide to use things that are so new for apps that are production used as much as they do. SolidStart really just made things more complicated for as simple of an app as I used it on. I couldn’t imagine using it for something that is non trivial at all.
That post also gets a very surprising amount of Google traffic.
I think in terms of tech debt, that's a big part of it. I don't think I've ever seen a python package that was supposed to be used as a library with that many dependencies (that you also can't just pick a reasonable subset from via extras).
I'd rather use a tiny core library with good interfaces + a big ecosystem around it than the kitchen sink approach that langchain takes.
Also, langchain is, at best, not that useful and silly.
Maybe there is a way to do this, but my toy fiddlings would encounter issues if I tried to change my prompt in total isolation from caring about the formatting of the output.
To give a concrete example, I've been using local CPU bound LLMs to slowly do basic feature extraction of a very long-running (1000+ chapters) niche fan fiction-esque story that I've been reading. Things like "what characters are mentioned in this chapter?", features which make it easier to go back and review what a character actually did if we haven't been following them for a while.
To get my data from my low-rent LLMs in a nice and semi-machine readable format, I've found it best to ask for the response to be formatted in a bulleted list. That way I can remove the annoying intro prefix/postfix bits that all LLMs seem to love adding ("sure, here's a list..." or "... hopefully that's what you're looking for").
I've found that innocent changes to the prompt, unrelated to my instructions to use a bulleted list, can sometimes cause the result formatting to become spotty, even though the features are being extracted better (e.g. it stops listing in bullets but it starts picking up on there being "unnamed character 1").
I've only been fiddling with things for about a week though, so maybe there's some fundamental knowledge I'm missing about the "LLM app pipeline architecture" which would make it clear how to solve this better; as it is now I'm basically just piping things in and out of llama.cpp
If folks have thoughts on addressing the promp-to-output-format coupling, I'd love to hear about it!
I've done some productive collaborating with someone who only works at the prompt level, but you can't really hand off in my experience. You can do some things with prompts, but pretty soon you are going to want to change the pipeline or rearrange how data is output. Sometimes you just won't get a good response that is formatted the way you want, and have to accept a different function output and write a bit of code to turn it into the representation you want.
Also the function definition (i.e., output schema) looks separate from the prompt, but you absolutely shouldn't treat it like that; every description in that schema matters, as do parameter names and order. You can't do prompt engineering without being able to change those things, but now you will find yourself mucking in the code again. (Though I make the descriptions overridable without changing code.)
Anyway, all that just to say that I agree that code and prompt can't be well separated, nor can pipeline and prompt.
We released this as a way to make it easier to get started with LLM applications. Specifically, we've heard that when people were using chains/agents they often wanted to see what exactly was going on inside, or change it in someway. This basically moves the logic for chains and agents into these templates (including prompts) - which are just python files you can run as part of your application - making it much easier to modify or change those in some ways.
Happy to answer any questions, and very open to feedback!
I certainly agree, but I'm having trouble seeing how templates help with this. The templates appear to be a consolidation of examples like those that were already emphasized in the current documentation. This is nice to have, but what does it do to elucidate the inner workings?
LangChain gets a lot of pushback in production scenarios, but think going through some of their tutorials is a very reasonable way to get learning more about how you could apply gen AI for various use-cases.
There it is! Why have one level of lock-in when you can have two?
I won’t defend a lot of the tech choices that have been made, but the “free” tooling (LangSmith), integrations, and modicum of cross-model compatibility are worth it to me.
I personally used it in an alpha-quality production app too and it was fine at first, but I found out after a month of hacking, it didn't work for the business needs for the reasons stated above.
Said free tooling just increases lock-in which is not ideal for any complex software project.
1. This domain is moving very fast
2. Creating a wrapper around a fast-moving domain is difficult, lots of thrash on design patterns
3. Langchain's out of the box examples have undoubtedly made it easier to grok LLM patterns
That being said -- this article is about building a production-ready LLM app, and currently I strictly treat Langchain as a learning aide.
Ultimately I'm glad Langchain exists, and I hope to see Harrison et. al. bring more improvements to the underlying abstractions. Possibly with a more functional inspiration?
Some valid criticisms: (1) the learning curve is steep (2) the APIs and docs are volatile (though to be expected).
This reminds me of the old Django vs Flask debate. Sure, Flask is easy to get started with, but over time you end up building an undocumented, untested Django.