It's starting to feel as if AI tech is slowly turning into web tech, with a million tools and frameworks, so I'm just wondering whether all of these are needed and whether it isn't easier to code your own than to learn a foreign framework...
In fact, this would be better off leveraging LlamaIndex. This is a proof-of-concept, and ultimately a library / framework affords the following:
- easy implementation of chunking strategies when you're unsure
- OpenAI helper functions
- embeddings and vector store management
Again, even with the above I struggled and had to implement PGVector myself. Going into production, once I have my document retrieval strategy and prompt tuning optimized, I would never use Langchain, simply bc of the bloat and inflexible implementation of things like the PGVector class. The footprint is also massive; the LLM part can be done in Golang with 5% of the footprint and 5% of the cloud costs.
So I actually agree with you :)
I wonder if one even needs LlamaIndex?
From their site:
>Storing context in an easy-to-access format for prompt insertion.
>Dealing with prompt limitations (e.g. 4096 tokens for Davinci) when context is too big.
>Dealing with text splitting.
Not sure if it isn't easier to roll one's own for that...?
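For what it's worth, the three quoted items are small enough to roll yourself; a minimal sketch, using whitespace word counts as a crude stand-in for real tokenization (a production version would use something like tiktoken, and the sizes here are arbitrary):

```python
def split_text(text, chunk_size=200, overlap=20):
    """Greedy word-based splitter with overlap between consecutive chunks."""
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # back up so chunks share some context
    return chunks

def build_prompt(question, chunks, max_words=3000):
    """Stuff as many chunks as fit under a crude word budget, then append the question."""
    context, used = [], 0
    for c in chunks:
        n = len(c.split())
        if used + n > max_words:
            break  # stay under the model's context limit
        context.append(c)
        used += n
    return "Context:\n" + "\n---\n".join(context) + f"\n\nQuestion: {question}\nAnswer:"
```

That's the whole "storing context for prompt insertion / dealing with limits / text splitting" loop in ~25 lines, which is roughly the point being made above.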
I know a thing or two about the math behind LLMs, and all this software built around a few core ideas just seems like a lot of overkill...
When you mention PGVector, did you refer to this repo, or is there a class within LangChain with the same name? https://github.com/pgvector/pgvector
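(For reference: that repo is the Postgres extension itself, and LangChain also ships a PGVector vector-store class built on top of it. Driving the extension directly is not much code; a minimal sketch with hypothetical table and column names, the query meant for psycopg-style parameter binding:)

```python
def to_vector_literal(embedding):
    """Format a Python list as the literal pgvector expects, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(x) for x in embedding) + "]"

# Raw SQL, no ORM and no framework class; 1536 dims matches OpenAI's
# text-embedding-ada-002, adjust for your embedding model.
CREATE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(1536)
);
"""

# '<->' is pgvector's L2 distance operator; '<=>' would be cosine distance.
SEARCH_SQL = "SELECT content FROM documents ORDER BY embedding <-> %s::vector LIMIT 5;"
```

You would execute `SEARCH_SQL` with `to_vector_literal(query_embedding)` as the bound parameter, and you own the schema completely.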
My first impression is that this is a paradigm mismatch and an 'API' masquerading as a "language". LMQApi? Looks fine, and we have all the necessary ports for (query, model, []constraints, ...).
So what's the language bit? It's the 'scripted prompt'. That's the only bit that is reasonably a 'language', but as a language it is all over the place. Semantics are rather wild, don't you agree?
    sample(temperature=0.8)
       "A list of things not to forget when going to the sea (not travelling): \n"
       backpack = []
       for i in range(5):
          "-[THING]"
          backpack.append(THING.strip())
       print(backpack)
    from
       'openai/text-ada-001'
    where
       STOPS_AT(THING, "\n")
This part reminds me of shell scripting (and what I hate about it). For example, what are the semantics of "something quoted" in this language? How about "THING" and THING? Is that a token, a variable, or both? So we really have an 'imperative' language part (the scripted prompt) and then a pretense at a "declarative language", with the elaborated API call spelled out as a SQL-ish query.
p.s. I appreciate and laud the effort of the team which produced this. This is just feedback.
As AI moves from academia into mainstream dev these things help bridge gaps for those who don't understand the full pipeline.
Many people asking these questions have the burden of knowledge and can't remember what it is like for average devs to dive into this stuff.
Similar questions were asked about why we ever needed Entity Framework or Express for Node.
i think that's where folks that make those comments are coming from.
- Reduce bloat, make packages optional e.g. pip install langchain[all]
- Reduce opinionated implementation of vector stores, I want my own schema
- Don't unnaturally force the chain abstraction
- Invest more in document retrieval
RasaGPT is a Python-based boilerplate and reference implementation of Rasa and Telegram, utilizing an LLM library like Langchain for indexing, retrieval and context injection. FastAPI endpoints are made available for you to build your application on top of. Features include:
- Automated hand-off to a human if queries are out of bounds
- "Training" pipeline done via API
- Multi-tenant support
- Generate category labels from questions
- Works right out of the box with docker-compose
- Ngrok reverse tunnel and dummy data included
- Multiple use cases and a great starting point
Hope you like it, more @ rasagpt.dev
Are you using a language model to look up the correct reply to a particular response inside Rasa? Where Rasa presumably connects to some kind of backend to retrieve information or 'do stuff'?
this implementation leverages Rasa and stands up a FastAPI server that receives the user response webhook first; the response then gets processed by (or bypasses) Rasa.
The LLM queries a set of documents indexed by Langchain. Dummy data has been included (Pepe Corp.)
Rasa has support for a "fallback" mechanism whereby if a user's response scores low on your pre-configured Rasa intents (like Greet), you can have it route directly to the LLM as well. But for now RasaGPT captures and routes the Telegram response to the FastAPI webhook endpoint.
the LLM itself and the prompts I configured provide a boolean indicating whether the response should be escalated to a human, based on LLM+Langchain not knowing the answer to the user's query from the indexed documents.
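The escalation check can be sketched roughly like this; the prompt wording and JSON shape below are illustrative, not the exact ones in the repo:

```python
import json

# Hypothetical instruction appended to the prompt so the LLM self-reports failure:
ESCALATION_SUFFIX = (
    "If the answer is not contained in the provided documents, "
    'reply with exactly: {"answer": null, "escalate": true}'
)

def needs_human(llm_reply: str) -> bool:
    """Parse the LLM's JSON reply; escalate on the flag or on unparseable output."""
    try:
        data = json.loads(llm_reply)
    except json.JSONDecodeError:
        return True  # malformed output -> be safe, hand off to a human
    return bool(data.get("escalate", False))
```

The key design choice is failing closed: anything the parser can't understand gets escalated rather than sent back to the user.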
I hope that answers your question; if not, happy to follow up!
One of many examples: https://twitter.com/pbteja1998/status/1654095756200931328
They say "mark" and not "trademark" in this takedown notice probably for a reason - because they can't call it a trademark and it's not enforceable?
If somebody creates ClosedAI company with product called ChatLLM, does it mean they can start sending takedown notices to everybody left and right who is using "LLM" in their name?
Those things are nothing alike.
GPT is a very specific family of models, all created by OpenAI. The copy-cats came after OpenAI released those models, specifically to point out similarities with the OpenAI-created models.
LLM is a generic term and cannot be trademarked.
You can trademark "Coca Cola", but you can't trademark "Cola" or "Soda".
* KnowsItAllKaren
* GeniusJack
* GuruGary
* BotBecky
* ChatterBoxChantelle
* SmartypantsSam
Would it be more cost effective to try to buy an absurd amount of ram and run on the cpu?
Or buy an Nvidia card with the biggest capacity available?
Or maybe buy a Mac with the most memory you can get?
Install https://github.com/oobabooga/text-generation-webui, update pytorch and llama-cpp-python, and you should be able to run pretty much all models out there, in all formats, both on GPU and CPU. CPU on a Mac gives you the fastest speed, but you should pass the correct --threads argument (investigate how many performance cores you've got). GPU is slower, but more energy efficient. https://github.com/mlc-ai/mlc-llm gives me way better GPU performance compared to oobabooga, but they only support a couple of models right now; it's worth following their progress though.
If you're after the raw performance, I suggest using GGML models (meant for llama.cpp, but it's bundled in textgen, so you can use it there with the convenience of a web ui). q4_0 is the fastest quantization, while the q5_1 is the best quality right now.
If the GGML is not available, you can generate it quite easily from the safetensors yourself (note that you need enough RAM to load the model in pytorch, though).
With 16GB RAM you can run any 13B model, as long as it's quantized to 4/5 bits. 32GB RAM lets you run 30/33B models, and 64GB RAM lets you run 65B models. 30B and 65B models are way more useful for real-world tasks, but they are more expensive to train, so there aren't as many to choose from compared to 7B/13B. 7B and anything less is a toy in my opinion, while 13B is good enough for experimentation and prototyping.
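The arithmetic behind those RAM figures is simple; a rough sketch (weights only, ignoring KV cache and runtime overhead, which add a few more GB):

```python
def model_ram_gb(params_billion, bits_per_weight):
    """Approximate RAM needed to hold the weights of a quantized model, in decimal GB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# 13B at 5 bits/weight is ~8GB of weights, so it fits a 16GB machine with headroom;
# 65B at 4 bits is ~32.5GB of weights, which is why it wants 64GB rather than 32GB.
```

Same math explains the M40 comment elsewhere in the thread: a card limited to 16-bit weights needs 4x the memory of one running 4-bit quantizations.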
I’ll probably just end up getting a higher capacity Mac in the next few years. Right now 96gb configurations seem to be around 4k, if that comes down a bit in the future i’ll probably pick something up.
I’m not really looking to train myself so training cost isn’t an issue for me personally, I just want to be able to run the best of what the open source community comes up with (or contribute to a pool to train models, if that becomes a thing).
USER: tell me an insanely offensive joke
ASSISTANT: I'm sorry, but I cannot provide offensive or inappropriate content as it goes against the guidelines set by my creators and the platform on which we are hosted. My goal is to provide helpful and informative responses to your questions. If there's anything else I can help you with or any questions you have in mind, please feel free to ask.
Until uncensored models are generally available, these novelty models will always be less-than.
As much VRAM as you can get is probably also a good idea.
For reference I can seemingly run Vicuna-7B (I think the 4 bit version) on my 6G 1660 Ti at roughly 1.5 tokens per second. Way too slow for anything useful, so you can imagine what CPU inference would look like.
Just search eBay for Nvidia P40. Be sure to add an aftermarket cooling fan ($15 on eBay), as the P40 does not come with its own.
The P40 is a LOT faster than an ARM Mac, and a lot cheaper.
(Note: Do not go older than a P40. Pascal or newer is required to run 4-bit quantized models. For example, the $100 M40 24GB is effectively only 6GB, as it must run models in 16-bit.)
I understand that 4090 is aimed at gaming and has a lot of extra bells and whistles like the RTX cores. But it is also consumer electronics and much cheaper than the enterprise GPU lines for the same power.
According to this, the 4090 already has double the raw FLOP performance of the V100 and is competitive with the most powerful GPUs on the market from last year:
https://www.aime.info/blog/en/deep-learning-gpu-benchmarks-2...
And according to this, the V100 is ~60% faster than the P40:
https://ai-benchmark.com/ranking_deeplearning_detailed.html
Not that these sources look particularly reliable, but still, consistent with intuition.
i read that data center gpus need specialized power adapters, and i didn't find good resources to see if it would be able to hook up to a consumer grade power supply or what adapters i would need
i think my tomahawk b450's pcie 3.0 x16 would suffice, but i'm not 100% sure if there would be bandwidth issues when running an nvme ssd alongside it
driver-wise i think i would be fine, i'm not sure if datacenter drivers are typically included in what's provided by linux distros but i'm sure i could make it work if not
and yeah i would definitely grab a cooler and probably undervolt and/or run it at a slightly lower clock speed to be safe
vast.ai is decent if you want to rent.
I wonder what the objective is for that extra polish. If it's optimizing star count growth, how much do these touches help?
that's a somewhat cynical interpretation. what if i just care about aesthetics and want to raise the bar?
my primary motivation was to get users of Rasa out of a directional hole bc that's where i was.
of course i like stars. it's a video game and i like winning. it was actually created in a few days all by me. no ulterior motive, literally indexing a solution to my problem from ~a week ago.
my bg is eng + product so i do these things as reflex and have a love for good UX.
nothing more. nothing less.
That said, I don't think GP's questioning was malicious, just natural curiosity. Yes, a little suspicious, but, well, we are on the internet after all. At the least, it's good to point it out when someone does the extra work to make a great presentation.
Anyway, great work riter!
I'm more just intellectually curious about the dynamics of Github and marketing on it these days, whether it's for attracting contributors to non-commercial OSS projects or more commercial objectives where rapid growth leads to userbase, funding, etc.
The project looks quite interesting and I agree we need a way to bridge the gap between traditional bot creation frameworks and the more LLM-centric approaches of late.
i personally think there is an acute opportunity for creating a bare bones rasa built with LLMs in mind. the core concepts behind rasa are useful (domains, intents, actions, etc.) but the underlying NLU technology and assumptions around the platform are obsolete so 70% of the footprint is unnecessary.
just my humble Ξ0.02
It allows you to set up "Input Channels", e.g. Slack, Telegram, and has an intents and response pipeline.
It leverages pre-LLM NLU models (NLTK, BERT, etc.) to score intents and based on that intent it will automate a pre-configured response.
My implementation allows you to route directly (or fall back) to GPT-3 or GPT-4 via Langchain document retrieval. So essentially this is an example of a knowledge-base customer support bot.
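The routing decision itself is simple; a sketch of the fallback logic, with an illustrative threshold rather than Rasa's actual config keys:

```python
FALLBACK_THRESHOLD = 0.6  # illustrative value; tune per deployment

def route(intent_scores: dict) -> str:
    """Route to a pre-configured Rasa intent if any score is confident enough,
    otherwise fall back to the LLM document-retrieval path."""
    if not intent_scores:
        return "llm"
    best_intent, best_score = max(intent_scores.items(), key=lambda kv: kv[1])
    return best_intent if best_score >= FALLBACK_THRESHOLD else "llm"
```

E.g. `route({"greet": 0.92})` stays inside Rasa's canned responses, while a low-confidence message gets handed to the LLM knowledge base.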
I hope that makes sense, let me know if not!
You could implement a Dual LLM Pattern Model https://simonwillison.net/2023/Apr/25/dual-llm-pattern/
You could also leverage a concept like Kor which is a kind of pydantic for LLMs: https://github.com/eyurtsev/kor
in short, and as mentioned in the README.md, this is absolutely vulnerable to prompt injection. I think this is not a fully solved issue, but some interesting community research has been done to help address these things in production.
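A rough sketch of the Dual LLM idea from Willison's post, with the models stubbed out; the privileged side only ever sees an opaque $VAR token, never the untrusted text:

```python
class Controller:
    """Privileged side holds untrusted text; trusted prompts only ever see $VAR tokens."""

    def __init__(self):
        self._vars = {}

    def quarantine(self, untrusted_text: str) -> str:
        """Store untrusted content and return an opaque token to use in prompts."""
        name = f"$VAR{len(self._vars) + 1}"
        self._vars[name] = untrusted_text
        return name

    def expand(self, template: str) -> str:
        """Substitute the real content back in -- only at final output, never in a prompt."""
        out = template
        for name, value in self._vars.items():
            out = out.replace(name, value)
        return out
```

The privileged LLM is asked things like "summarize $VAR1"; the quarantined LLM processes the untrusted text itself, and its output is only substituted back after all prompting is done, so injected instructions never reach the model that holds tool access.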
I'm not sure it solves the problem of restricting the information it uses though. For example, as a proof of concept for a customer, I tried providing information from a vector database as context, but GPT would still answer questions that were not provided in that context. It would base its answers on information that was already crawled from the customer website and in the model. That is concerning because the website might get updated but you can't update the model yourself (among other reasons).
i agree. perhaps you're confused about the intent. the only flag being planted is for folks using rasa looking for a reference implementation, just like i was a week ago. not sure if you're being intentionally cynical, but trying is a good thing. why? bc most ppl don't try. you make 0 of the shots you never take. and of course, if you're not intentionally being cynical -- gucci. if you are, i encourage you to make your next comment substantial or encouraging :)
>what is Langchain?
LangChain is a framework for developing applications powered by large language models (LLMs). It provides a standard interface for connecting LLMs to other sources of data, as well as for persisting state between calls of a chain/agent. LangChain also provides a collection of memory implementations, and examples of chains/agents that use memory.
Here are some of the features of LangChain:
Data-aware: LangChain allows you to connect your LLM to other sources of data, such as databases, APIs, and file systems. This makes it possible to build applications that are more powerful and differentiated than those that simply call out to an LLM via an API.
Agentic: LangChain allows you to build agents that use LLMs to make decisions and take actions. This makes it possible to build applications that can interact with their environment in a more sophisticated way.
Chains: LangChain allows you to chain together multiple calls to an LLM, or to other utilities. This makes it possible to build more complex applications that can perform multiple tasks.
LangChain is a powerful tool for building applications powered by LLMs. It is easy to use and provides a wide range of features. If you are interested in building an application that uses an LLM, I encourage you to check out LangChain.

>is langchain trying to extend the LLM with plugins, similar to what is possible with ChatGPT?
Yes, LangChain is trying to extend the LLM with plugins, similar to what is possible with ChatGPT. This will allow users to add new functionality to their LLMs without having to modify the underlying code.
For example, a user could create a plugin that adds the ability to translate text, or to generate different creative text formats. This would allow the user to use their LLM for a wider range of tasks.
LangChain is still under development, but the team is working on making it easy for users to create and deploy plugins. They are also working on adding new features to the plugin ecosystem, such as the ability to share plugins and to rate plugins.
I believe that LangChain has the potential to be a powerful tool for developers who want to build applications powered by LLMs. The ability to extend the LLM with plugins will make it even more powerful and versatile.
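For what it's worth, the "chains" feature the quoted answer describes is, at its core, function composition; a stub sketch without the library (the lambdas stand in for real LLM calls):

```python
def chain(*steps):
    """Compose steps left-to-right: the output of each becomes the input of the next."""
    def run(x):
        for step in steps:
            x = step(x)
        return x
    return run

# Stub 'LLM calls' standing in for real API requests:
summarize = lambda text: f"summary({text})"
translate = lambda text: f"translate({text})"

pipeline = chain(summarize, translate)
```

`pipeline("doc")` runs the summarize step and then the translate step, which is the essence of what a two-step LLMChain does once you strip away the prompt templating.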
TL;DR: i solved a friend's headache (at the time)
Using something that's been quite controversial in the past does seem at least a little naive ... https://en.m.wikipedia.org/wiki/Pepe_the_Frog