The one-l lama,
He's a priest.
The two-l llama,
He's a beast.
And I will bet
A silk pajama
There isn't any
Three-l lllama.

As a non-native English speaker (though also a parent of a toddler) I wasn't familiar with the book series.
There's really only one thing I care about: How does this compare to GPT-4?
I have no use for models that aren't at that level. Even though this almost definitely isn't at that level, it's hard to know how close or far it is from the data presented.
The big story here for me is that the difference in training set is what makes the difference in quality. There is no secret sauce; the open-source architectures do well, provided you give them a large and diverse enough training set. That would mean it is just a matter of pooling resources to train really capable open-source models. That makes what RedPajama is doing, compiling the best open dataset, very important for the future of high-quality open-source LLMs.
If you want to play around with this yourself you can install oobabooga and figure out what model fits your hardware from the LocalLLaMA subreddit wiki. The llama.cpp 7B and 13B models can be run on CPU if you have enough RAM. I’ve had lots of fun talking to 7B and 13B Alpaca and Vicuna models running locally.
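For the CPU route, running one of the quantized GGML models with llama.cpp looks roughly like this. The repo URL is real, but the model path, quantization suffix, and flag values below are illustrative; substitute whatever model file you actually downloaded:

```shell
# Build llama.cpp and run a quantized 7B model on CPU.
# Model filename and quantization variant (q4_0) are examples.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# -t: CPU threads to use, -n: number of tokens to generate
./main -m ./models/7B/ggml-model-q4_0.bin \
  -t 8 -n 256 \
  -p "Explain how a llama differs from an alpaca."
```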
It's really fun to enable both the whisper extension and the TTS extension and have two-way voice chats with your computer while being able to send it pictures as well. Truly mind bending.
Quantized 30B models run at acceptable speeds on decent hardware and are pretty capable. It's my understanding that the open source community is iterating extremely fast on small model sizes getting the most out of them by pushing the data quality higher and higher, and then they plan to scale up to at least 30B parameter models.
I really can't wait to see the results of that process. In the end you're going to have a 30B model that's totally uncensored and is a mix of Wizard + Vicuna. It's going to be a veryyyy capable model.
Bigger ones as well, you just have to wait longer. Nothing for real time usage, but if you can wait 10-20 minutes, you can use them on CPU.
For example, a therapist, a search bot for your diary, or a company intranet help bot. Anything where the prompt contains something you don’t want to send to a third party.
Thanks!
I'd assume a truly competitive model in the open-source world is still a ways off. These teams and their infrastructure are still in their early days while OpenAI is more at the fine-tuning and polishing stage. The fact that these open teams are able to have something in the same universe in terms of functionality this fast is pretty amazing... but it will take time before there's an artifact that will be a strong competitor.
I'll give you the answer for every open source model over the next 2 years: it's far worse.
I suspect Open Source LLMs will outpace the release version of GPT-4 before the end of this year.
It's less likely they will outpace whatever version of GPT-4 is shipped later this year, but still very much possible.
Open source models can already approximate GPT-3.5 for most tasks on common home hardware, right now.
On one hand, the resources required to run these models continues falling dramatically, thanks to the techniques discovered by researchers: GPTQ quantizing down to 4, 3, 2, even 1 bits! model pruning! hybrid vram offloading! better, more efficient architectures! 1-click finetuning on consumer hardware! Of course, the free lunches won't last forever, and this will level off, but it's still incredible.
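The core idea behind that quantization win can be sketched in a few lines. This is not GPTQ itself (GPTQ is cleverer: it picks codes to minimize layer output error), just naive round-to-nearest, but the storage arithmetic is the same: 4 bits per weight instead of 32 is an 8x smaller footprint for a modest reconstruction error. A toy sketch:

```python
import numpy as np

def quantize_4bit(w):
    """Naive symmetric round-to-nearest 4-bit quantization.
    Returns int codes in [-8, 7] plus one float scale per tensor."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map the 4-bit codes back to approximate float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)

q, scale = quantize_4bit(w)
w_hat = dequantize(q, scale)

# 4 bits per weight instead of 32, at the cost of some error.
rel_err = np.abs(w - w_hat).mean() / np.abs(w).mean()
print(f"mean relative error: {rel_err:.3f}")
```

Real schemes use one scale per small group of weights rather than per tensor, which shrinks the error considerably.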
And on the other side of the coin, the power of all computing devices continues its ever-upward exponential growth.
So you have a continuous lowering of requirements, combined with a continuous increase in available power... surely these two trends will collide, and I can only imagine what this stuff will be like at that intersection.
Furthermore, model size is still the most significant contributor to output quality. E.g. vanilla llama-30b at 4-bit has better perplexity than any llama-13b finetune at 8-bit. Thus, if 4-bit lets you fit a larger model into available (V)RAM, you're still better off.
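The back-of-the-envelope arithmetic behind that trade-off: weight memory is just parameter count times bits per weight (ignoring activations, KV cache, and runtime overhead), so dropping to 4-bit is what squeezes a 30B model into hobbyist (V)RAM. A rough sketch:

```python
def weight_gb(params_billion, bits):
    """Approximate weight memory in GB: parameters x bits / 8.
    Ignores activations, KV cache, and runtime overhead."""
    return params_billion * 1e9 * bits / 8 / 1e9

# 30B @ 4-bit (~15 GB) is in the same ballpark as 13B @ 8-bit
# (~13 GB), but the bigger model has the better perplexity.
for params, bits in [(13, 16), (13, 8), (30, 16), (30, 4)]:
    print(f"{params}B @ {bits}-bit: ~{weight_gb(params, bits):.0f} GB")
```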
This is also why analog computing is seriously considered as a hardware architecture for LLMs: if you don't actually need bit-perfect matmul for things to work well, it can be done much simpler as an analog circuit, and then you can cram a lot more of them on the same chip. Any resulting quality loss would presumably be minor, and in any case would be more than compensated by the much larger model sizes allowed by such architecture.
The weights scale the output values from the previous layer, and the weighted values are summed. So it seems to me, instead of having a high-precision weight scale a single output, if you cloned the node in the previous layer M times, you could still have sqrt(M) bits of precision with 1-bit weights (or M bits, my brain is in weekend mode).
Thus a larger network with lower-precision weights should have the ability to have approximately the same precision as a smaller network with high-precision weights.
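That intuition can be checked with a toy: approximate one real-valued weight by the mean of M one-bit (+/-1) weights. Chosen deterministically, M bits give M+1 representable levels, so the error is at most 1/M, roughly log2(M) bits of resolution; the sqrt(M) figure in the parent corresponds to choosing the signs stochastically. A minimal sketch:

```python
import numpy as np

def binary_approx(w, M):
    """Approximate a scalar weight w in [-1, 1] as the mean of M
    one-bit weights: k of them +1, the remaining M - k of them -1."""
    k = round((w + 1) * M / 2)        # number of +1 bits
    bits = np.array([1.0] * k + [-1.0] * (M - k))
    return bits.mean()

w = 0.437
for M in (8, 64, 512):
    approx = binary_approx(w, M)
    print(f"M={M:4d}: approx={approx:+.4f}, error={abs(w - approx):.4f}")
```

The error shrinks linearly in M, which is the sense in which width can substitute for weight precision.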
The larger network has more interconnects though, so seems like it could allow for more interesting space to explore during training, leading to better results.
Then again, I could be entirely wrong.
We’re finding out that many models are undertrained for their sizes, and a good option is to post-process them into smaller models by teaching a smaller model to mimic their output. Quantization effectively cuts down the model size as well; if it causes no loss in quality, the model has not been trained enough to take advantage of the depth of precision available.
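The mimicry described above is usually trained with a distillation loss: the KL divergence between the teacher's and student's temperature-softened output distributions. A framework-free sketch with toy logits (not a real training loop):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    the standard knowledge-distillation objective."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (p * (np.log(p) - np.log(q))).sum(axis=-1).mean()

teacher      = np.array([[4.0, 1.0, 0.5]])   # toy logits
good_student = np.array([[3.9, 1.1, 0.4]])   # nearly mimics the teacher
bad_student  = np.array([[0.5, 4.0, 1.0]])   # disagrees with the teacher

print(distill_loss(good_student, teacher))   # small
print(distill_loss(bad_student, teacher))    # much larger
```

Minimizing this loss over a large prompt set is what transfers the big model's behavior into the small one.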
We can use GPS to locate anything down to a sliding scale of decimal precision. There are only so many digits you need to locate a city or even a house.
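The analogy in numbers: one degree of latitude is roughly 111 km, so each extra decimal place narrows the location by a factor of ten, and past a few digits the added precision stops mattering, much like extra weight bits. A quick sketch:

```python
# One degree of latitude is roughly 111 km, so each extra decimal
# place in a coordinate narrows the location by a factor of 10.
KM_PER_DEGREE = 111  # approximate; varies slightly with latitude

for decimals in range(6):
    resolution_m = KM_PER_DEGREE * 1000 / 10 ** decimals
    print(f"{decimals} decimal places: ~{resolution_m:,.0f} m")
```

Three decimals already pins down a city block; four, a house.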
As the resources required to train and fine-tune these models become consumer-hardware friendly, I think we'll see a shift towards a bunch of smaller models. Open models like these also mean the results of security and capability research are publicly available. Models like this one and the Replit code model will become the new base all open-source models are built on. I am really looking forward to the GPT-J 4-bit, CUDA-optimized 7B models; the others I have tested run fast on a 2070 Max-Q and 16GB RAM, where I was getting ~7 tokens/second. LoRA can work directly with 4-bit quantized models. While GGML CPU models are very strong, I don't believe we'll move away from GPU-accelerated training and fine-tuning anytime soon.
LLaMA’s main issue is that its license prevents commercial use.
If you want to use a LLM inside of a product, you may need to internationalize it at some point, so multilingual support matters.
Let's wait for someone to port it to a cheaper and more powerful C-based engine like llama.cpp.
Build a model that can change the number of parameters in the vicinity of some meaning, effectively increasing the local resolution around that meaning.
So parameter space becomes linked-parameter space, between models.
Links could be pruned based on activation frequency.
Another way of seeing the concept is a tree of models/LLMs,
and one additional model/LLM whose only job is to manage the tree (i.e. build it as it goes, use it to infer, prune it, etc.).
Or is what I'm saying too dumb?
The 3B model, being super fast and accessible, is a game changer for a lot of us who may not have the latest hardware. I mean, running on an RTX 2070 that was released 5 years ago? That's pretty cool.
As for the 7B model, it's great to see that it's already outperforming the Pythia 7B. The bigger dataset definitely seems to be making a difference here. I'm eager to see how far this project goes, and what kind of improvements we can expect in the coming weeks with the new RedPajama dataset they're working on.
One thing I found interesting is the mention of differences between the LLaMA 7B and their replication. I'd love to learn more about those differences, as it could shed light on what's working well and what could be improved further.
I played with a pirated 7B model a while back. My computer runs a 1080 Ti - so it used to be good but now it's pretty old. The model ran with a reasonable number of tokens/sec, but the quality was just trash compared to what I'd grown used to with ChatGPT. It was a novelty I interacted with for just a single evening.
I truly don't understand the use case for a 3B model with our current technologies.
What are you going to use it for?
Also, ChatGPT just can't do a lot of things because of their "rules". I was doing question answering about products on Amazon with ChatGPT and it refused to answer any questions about underwear, certain books/videos, etc.
Would the way the M2 MacBooks share memory be an advantage, or would the lack of CUDA support be a killer? Can you do anything with 16GB, or do you need 128GB or something like that? How large are the datasets?
I've only used scikit-learn and pandas so far, I'm not very familiar with neural networks yet
Sure, you may have played with a 7B model in the past, but that doesn't mean there's no use case for a smaller model like the 3B. In fact, having a performant, smaller model is a game changer for a lot of applications that don't require the massive scale of the larger models. Plus, smaller models are generally faster and more accessible, which is always a plus.
I find it very uncanny to see comments like this that sound like ChatGPT but are surprisingly relevant to the discussion.