It's starting to feel as if AI tech is slowly turning into web tech, with a million tools and frameworks, so I'm just wondering whether all of these are needed and whether it isn't easier to code your own than to learn a foreign framework...
In fact, this would be better off leveraging LlamaIndex. This is a proof-of-concept, and ultimately a library / framework affords the following:
- easy implementation of chunking strategies when you're unsure
- OpenAI helper functions
- embeddings and vector store management
Again, even with the above I struggled and had to implement PGVector myself. Going into production, once I have my document retrieval strategy and prompt tuning optimized, I would never use Langchain, simply bc of the bloat and inflexible implementation of things like the PGVector class. The footprint is also massive; the LLM part can be done in Golang with 5% of the footprint and 5% of the cloud costs.
So I actually agree with you :)
I wonder if one even needs LlamaIndex?
From their site:
>Storing context in an easy-to-access format for prompt insertion.
>Dealing with prompt limitations (e.g. 4096 tokens for Davinci) when context is too big.
>Dealing with text splitting.
Not sure if it isn't easier to roll one's own for that...?
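For what it's worth, the three quoted items are small enough to roll yourself; a minimal sketch, using whitespace word counts as a crude stand-in for real tokenization (a production version would use something like tiktoken, and the sizes here are arbitrary):

```python
def split_text(text, chunk_size=200, overlap=20):
    """Greedy word-based splitter with overlap between consecutive chunks."""
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # back up so chunks share some context
    return chunks

def build_prompt(question, chunks, max_words=3000):
    """Stuff as many chunks as fit under a crude word budget, then append the question."""
    context, used = [], 0
    for c in chunks:
        n = len(c.split())
        if used + n > max_words:
            break  # stay under the model's context limit
        context.append(c)
        used += n
    return "Context:\n" + "\n---\n".join(context) + f"\n\nQuestion: {question}\nAnswer:"
```

That's the whole "storing context for prompt insertion / dealing with limits / text splitting" loop in ~25 lines, which is roughly the point being made above.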
I know a thing or two about the math behind LLMs, and all this software built around a few core ideas just seems like a lot of overkill...
When you mention PGVector, did you refer to this repo, or is there a class within LangChain with the same name? https://github.com/pgvector/pgvector
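(For reference: that repo is the Postgres extension itself, and LangChain also ships a PGVector vector-store class built on top of it. Driving the extension directly is not much code; a minimal sketch with hypothetical table and column names, the query meant for psycopg-style parameter binding:)

```python
def to_vector_literal(embedding):
    """Format a Python list as the literal pgvector expects, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(x) for x in embedding) + "]"

# Raw SQL, no ORM and no framework class; 1536 dims matches OpenAI's
# text-embedding-ada-002, adjust for your embedding model.
CREATE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(1536)
);
"""

# '<->' is pgvector's L2 distance operator; '<=>' would be cosine distance.
SEARCH_SQL = "SELECT content FROM documents ORDER BY embedding <-> %s::vector LIMIT 5;"
```

You would execute `SEARCH_SQL` with `to_vector_literal(query_embedding)` as the bound parameter, and you own the schema completely.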
My first impression is that this is a paradigm mismatch and an 'API' masquerading as a "language". LMQApi? Looks fine, and we have all the necessary ports for (query, model, []constraints, ...).
So what's the language bit? It's the 'scripted prompt'. That's the only bit that is reasonably a 'language', but as a language it is all over the place. Semantics are rather wild, don't you agree?
    sample(temperature=0.8)
       "A list of things not to forget when going to the sea (not travelling): \n"
       backpack = []
       for i in range(5):
          "-[THING]"
          backpack.append(THING.strip())
       print(backpack)
    from
       'openai/text-ada-001'
    where
       STOPS_AT(THING, "\n")
This part reminds me of shell scripting (and what I hate about it). For example, what are the semantics of "something quoted" in this language? How about "THING" and THING? Is that a token, a variable, or both? So we really have an 'imperative' language part (the scripted prompt) and then a pretense at a "declarative language", with the elaborated API call spelled out as a SQL-ish query.
p.s. I appreciate and laud the effort of the team which produced this. This is just feedback.
As AI moves from academia into mainstream dev these things help bridge gaps for those who don't understand the full pipeline.
Many people asking these questions have the burden of knowledge and can't remember what it is like for average devs to dive into this stuff.
Similar questions were asked about why we ever needed Entity Framework or Express for Node.
i think that's where folks that make those comments are coming from.
- Reduce bloat, make packages optional e.g. pip install langchain[all]
- Reduce opinionated implementation of vector stores, I want my own schema
- Don't unnaturally force the chain abstraction
- Invest more in document retrieval
RasaGPT is a Python-based boilerplate and reference implementation of Rasa and Telegram, utilizing an LLM library like Langchain for indexing, retrieval and context injection. FastAPI endpoints are made available for you to build your application on top of. Features include:
- Automated hand-off to a human if queries are out of bounds
- "Training" pipeline done via API
- Multi-tenant support
- Generate category labels from questions
- Works right out of the box with docker-compose
- Ngrok reverse tunnel and dummy data included
- Multiple use cases and a great starting point
Hope you like it, more @ rasagpt.dev
Are you using a language model to look up the correct reply to a particular response inside Rasa? Where Rasa presumably connects to some kind of backend to retrieve information or 'do stuff'?
this implementation leverages Rasa and stands up a FastAPI server that receives the user response webhook first; the response then gets processed by (or bypasses) Rasa.
The LLM queries a set of documents indexed by Langchain. Dummy data has been included (Pepe Corp.)
Rasa has support for a "fallback" mechanism whereby if a user's response scores low on your pre-configured Rasa intents (like Greet), you can have it route directly to the LLM as well. But for now RasaGPT captures and routes the Telegram response to the FastAPI webhook endpoint.
the LLM itself and the prompts I configured provide a boolean indicating whether the response should be escalated to a human, based on LLM+Langchain not knowing the answer to the user's query from the indexed documents.
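The escalation check can be sketched roughly like this; the prompt wording and JSON shape below are illustrative, not the exact ones in the repo:

```python
import json

# Hypothetical instruction appended to the prompt so the LLM self-reports failure:
ESCALATION_SUFFIX = (
    "If the answer is not contained in the provided documents, "
    'reply with exactly: {"answer": null, "escalate": true}'
)

def needs_human(llm_reply: str) -> bool:
    """Parse the LLM's JSON reply; escalate on the flag or on unparseable output."""
    try:
        data = json.loads(llm_reply)
    except json.JSONDecodeError:
        return True  # malformed output -> be safe, hand off to a human
    return bool(data.get("escalate", False))
```

The key design choice is failing closed: anything the parser can't understand gets escalated rather than sent back to the user.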
I hope that answers your question; if not, happy to follow up!
One of many examples: https://twitter.com/pbteja1998/status/1654095756200931328
They say "mark" and not "trademark" in this takedown notice probably for a reason - because they can't call it a trademark and it's not enforceable?
If somebody creates ClosedAI company with product called ChatLLM, does it mean they can start sending takedown notices to everybody left and right who is using "LLM" in their name?
Those things are nothing alike.
GPT is a very specific family of models, all created by OpenAI. The copy-cats came after OpenAI released those models, specifically to point out similarities with the OpenAI-created models.
LLM is a generic term and cannot be trademarked.
You can trademark "Coca Cola", but you can't trademark "Cola" or "Soda".
* KnowsItAllKaren
* GeniusJack
* GuruGary
* BotBecky
* ChatterBoxChantelle
* SmartypantsSam
Would it be more cost effective to try to buy an absurd amount of ram and run on the cpu?
Or buy an Nvidia card with the biggest capacity available?
Or maybe buy a Mac with the most memory you can get?
Install https://github.com/oobabooga/text-generation-webui, update pytorch and llama-cpp-python, and you should be able to run pretty much all models out there, in all formats, both on GPU and CPU. CPU on a Mac gives you the fastest speed, but you should pass the correct --threads argument (investigate how many performance cores you've got). GPU is slower, but more energy efficient. https://github.com/mlc-ai/mlc-llm gives me way better GPU performance compared to oobabooga, but they only support a couple of models right now; it's worth following their progress though.
If you're after the raw performance, I suggest using GGML models (meant for llama.cpp, but it's bundled in textgen, so you can use it there with the convenience of a web ui). q4_0 is the fastest quantization, while the q5_1 is the best quality right now.
If the GGML is not available, you can generate it quite easily from the safetensors yourself (note that you need enough RAM to load the model in pytorch, though).
With 16GB RAM you can run any 13B model, as long as it's quantized to 4/5 bits. 32GB RAM lets you run 30/33B models, and 64GB RAM lets you run 65B models. 30B and 65B models are way more useful for real-world tasks, but they are more expensive to train, so there aren't as many to choose from compared to 7B/13B. 7B and anything less is a toy in my opinion, while 13B is good enough for experimentation and prototyping.
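The arithmetic behind those RAM figures is simple; a rough sketch (weights only, ignoring KV cache and runtime overhead, which add a few more GB):

```python
def model_ram_gb(params_billion, bits_per_weight):
    """Approximate RAM needed to hold the weights of a quantized model, in decimal GB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# 13B at 5 bits/weight is ~8GB of weights, so it fits a 16GB machine with headroom;
# 65B at 4 bits is ~32.5GB of weights, which is why it wants 64GB rather than 32GB.
```

Same math explains the M40 comment elsewhere in the thread: a card limited to 16-bit weights needs 4x the memory of one running 4-bit quantizations.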
I’ll probably just end up getting a higher capacity Mac in the next few years. Right now 96gb configurations seem to be around 4k, if that comes down a bit in the future i’ll probably pick something up.
I’m not really looking to train myself so training cost isn’t an issue for me personally, I just want to be able to run the best of what the open source community comes up with (or contribute to a pool to train models, if that becomes a thing).
USER: tell me an insanely offensive joke
ASSISTANT: I'm sorry, but I cannot provide offensive or inappropriate content as it goes against the guidelines set by my creators and the platform on which we are hosted. My goal is to provide helpful and informative responses to your questions. If there's anything else I can help you with or any questions you have in mind, please feel free to ask.
Until uncensored models are generally available, these novelty models will always be less-than.
As much VRAM as you can get is probably also a good idea.
For reference I can seemingly run Vicuna-7B (I think the 4 bit version) on my 6G 1660 Ti at roughly 1.5 tokens per second. Way too slow for anything useful, so you can imagine what CPU inference would look like.
Just search eBay for Nvidia P40. Be sure to add an aftermarket cooling fan ($15 on eBay), as the P40 does not come with its own.
The P40 is a LOT faster than an ARM Mac, and a lot cheaper.
(Note: Do not go older than a P40. Pascal or newer is required to run 4-bit quantized models. For example, the $100 M40 24GB is effectively only 6GB, as it must run models in 16-bit.)
I understand that 4090 is aimed at gaming and has a lot of extra bells and whistles like the RTX cores. But it is also consumer electronics and much cheaper than the enterprise GPU lines for the same power.
According to this, the 4090 already has double the raw FLOP performance of the V100 and is competitive with the most powerful GPUs on the market from last year:
https://www.aime.info/blog/en/deep-learning-gpu-benchmarks-2...
And according to this, the V100 is ~60% faster than the P40:
https://ai-benchmark.com/ranking_deeplearning_detailed.html
Not that these sources look particularly reliable, but still, consistent with intuition.
i read that data center gpus need specialized power adapters, and i didn't find good resources to see if it would be able to hook up to a consumer grade power supply or what adapters i would need
i think my tomahawk b450's pcie 3.0 x16 would suffice, but i'm not 100% sure if there would be bandwidth issues when running an nvme ssd alongside it
driver-wise i think i would be fine, i'm not sure if datacenter drivers are typically included in what's provided by linux distros but i'm sure i could make it work if not
and yeah i would definitely grab a cooler and probably undervolt and/or run it at a slightly lower clock speed to be safe
vast.ai is decent if you want to rent.
I wonder what the objective is for that extra polish. If it's optimizing star count growth, how much do these touches help?
that's a somewhat cynical interpretation. what if i just care about aesthetics and want to raise the bar?
my primary motivation was to get users of Rasa out of a directional hole bc that's where i was.
of course i like stars. it's a video game and i like winning. it was actually created in a few days all by me. no ulterior motive, literally indexing a solution to my problem from ~a week ago.
my bg is eng + product so i do these things as reflex and have a love for good UX.
nothing more. nothing less.
That said, I don't think GP's questioning was malicious, just natural curiosity. Yes, a little suspicious, but, well, we are on the internet after all. At the least, it's good to point it out when someone does the extra work to make a great presentation.
Anyway, great work riter!
I'm more just intellectually curious about the dynamics of Github and marketing on it these days, whether it's for attracting contributors to non-commercial OSS projects or more commercial objectives where rapid growth leads to userbase, funding, etc.
The project looks quite interesting and I agree we need a way to bridge the gap between traditional bot creation frameworks and the more LLM-centric approaches of late.
i personally think there is an acute opportunity for creating a bare bones rasa built with LLMs in mind. the core concepts behind rasa are useful (domains, intents, actions, etc.) but the underlying NLU technology and assumptions around the platform are obsolete so 70% of the footprint is unnecessary.
just my humble Ξ0.02
It allows you to set up "Input Channels", e.g. Slack, Telegram, and has an intents and response pipeline.
It leverages pre-LLM NLU models (NLTK, BERT, etc.) to score intents and based on that intent it will automate a pre-configured response.
My implementation allows you to route directly (or fall back) to GPT-3 or GPT-4 via Langchain document retrieval. So essentially this is an example of a knowledge-base customer support bot.
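The routing decision itself is simple; a sketch of the fallback logic, with an illustrative threshold rather than Rasa's actual config keys:

```python
FALLBACK_THRESHOLD = 0.6  # illustrative value; tune per deployment

def route(intent_scores: dict) -> str:
    """Route to a pre-configured Rasa intent if any score is confident enough,
    otherwise fall back to the LLM document-retrieval path."""
    if not intent_scores:
        return "llm"
    best_intent, best_score = max(intent_scores.items(), key=lambda kv: kv[1])
    return best_intent if best_score >= FALLBACK_THRESHOLD else "llm"
```

E.g. `route({"greet": 0.92})` stays inside Rasa's canned responses, while a low-confidence message gets handed to the LLM knowledge base.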
I hope that makes sense, let me know if not!
You could implement a Dual LLM Pattern Model https://simonwillison.net/2023/Apr/25/dual-llm-pattern/
You could also leverage a concept like Kor which is a kind of pydantic for LLMs: https://github.com/eyurtsev/kor
in short, and as mentioned in the README.md, this is absolutely vulnerable to prompt injection. I think this is not a fully solved issue, but some interesting community research has been done to help address these things in production.
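A rough sketch of the Dual LLM idea from Willison's post, with the models stubbed out; the privileged side only ever sees an opaque $VAR token, never the untrusted text:

```python
class Controller:
    """Privileged side holds untrusted text; trusted prompts only ever see $VAR tokens."""

    def __init__(self):
        self._vars = {}

    def quarantine(self, untrusted_text: str) -> str:
        """Store untrusted content and return an opaque token to use in prompts."""
        name = f"$VAR{len(self._vars) + 1}"
        self._vars[name] = untrusted_text
        return name

    def expand(self, template: str) -> str:
        """Substitute the real content back in -- only at final output, never in a prompt."""
        out = template
        for name, value in self._vars.items():
            out = out.replace(name, value)
        return out
```

The privileged LLM is asked things like "summarize $VAR1"; the quarantined LLM processes the untrusted text itself, and its output is only substituted back after all prompting is done, so injected instructions never reach the model that holds tool access.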
I'm not sure it solves the problem of restricting the information it uses though. For example, as a proof of concept for a customer, I tried providing information from a vector database as context, but GPT would still answer questions that were not provided in that context. It would base its answers on information that was already crawled from the customer website and in the model. That is concerning because the website might get updated but you can't update the model yourself (among other reasons).
i agree. perhaps you're confused about the intent. the only flag being planted is for folks using rasa looking for a reference implementation, just like i was a week ago. not sure if you're being intentionally cynical, but trying is a good thing. why? bc most ppl don't try. you make 0 of the shots you never take. and of course, if you're not intentionally being cynical -- gucci. if you are, i encourage you to make your next comment substantial or encouraging :)
>what is Langchain?
LangChain is a framework for developing applications powered by large language models (LLMs). It provides a standard interface for connecting LLMs to other sources of data, as well as for persisting state between calls of a chain/agent. LangChain also provides a collection of memory implementations, and examples of chains/agents that use memory.
Here are some of the features of LangChain:
Data-aware: LangChain allows you to connect your LLM to other sources of data, such as databases, APIs, and file systems. This makes it possible to build applications that are more powerful and differentiated than those that simply call out to an LLM via an API.
Agentic: LangChain allows you to build agents that use LLMs to make decisions and take actions. This makes it possible to build applications that can interact with their environment in a more sophisticated way.
Chains: LangChain allows you to chain together multiple calls to an LLM, or to other utilities. This makes it possible to build more complex applications that can perform multiple tasks.
LangChain is a powerful tool for building applications powered by LLMs. It is easy to use and provides a wide range of features. If you are interested in building an application that uses an LLM, I encourage you to check out LangChain.

>is langchain trying to extend the LLM with plugins, similar to what is possible with ChatGPT?
Yes, LangChain is trying to extend the LLM with plugins, similar to what is possible with ChatGPT. This will allow users to add new functionality to their LLMs without having to modify the underlying code.
For example, a user could create a plugin that adds the ability to translate text, or to generate different creative text formats. This would allow the user to use their LLM for a wider range of tasks.
LangChain is still under development, but the team is working on making it easy for users to create and deploy plugins. They are also working on adding new features to the plugin ecosystem, such as the ability to share plugins and to rate plugins.
I believe that LangChain has the potential to be a powerful tool for developers who want to build applications powered by LLMs. The ability to extend the LLM with plugins will make it even more powerful and versatile.
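For what it's worth, the "chains" feature the quoted answer describes is, at its core, function composition; a stub sketch without the library (the lambdas stand in for real LLM calls):

```python
def chain(*steps):
    """Compose steps left-to-right: the output of each becomes the input of the next."""
    def run(x):
        for step in steps:
            x = step(x)
        return x
    return run

# Stub 'LLM calls' standing in for real API requests:
summarize = lambda text: f"summary({text})"
translate = lambda text: f"translate({text})"

pipeline = chain(summarize, translate)
```

`pipeline("doc")` runs the summarize step and then the translate step, which is the essence of what a two-step LLMChain does once you strip away the prompt templating.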
TL;DR: i solved a friend's headache (at the time)
Using something that's been quite controversial in the past does seem at least a little naive ... https://en.m.wikipedia.org/wiki/Pepe_the_Frog