Ask HN: Open source LLM for commercial use?

184 points · LewisDavidson · 3y ago · 61 comments

Working on an ML project and looking for an open source LLM that can be used in a commercial environment. As far as I'm aware, products cannot be built on LLAMA.

I don't want to use GPT, since the project will be using personal information to train/fine-tune the models.

icapybara · 3y ago · 5 in thread

Others have answered your question, but I'll add that the market for high-quality AI models isn't like the software marketplace, where there is always an open-source alternative (and where open source is often the state of the art).

LLMs take so much engineering effort, research, and compute that it's unlikely there will be good open source alternatives in the near future. Right now your only real option is OpenAI (or maybe Anthropic) and that seems unlikely to change anytime soon.

The only reason we have LLAMA is because Meta threw us a bone. They might not do that again.

lhl · 3y ago

> Right now your only real option is OpenAI (or maybe Anthropic)

> The only reason we have LLAMA is because Meta threw us a bone

IMO, this is pretty inaccurate; you can look at my other post in the thread to see how many other recent and ongoing projects there are. The training datasets (The Pile, The Stack, LAION, etc.) are publicly available and have been shown to train very high-quality models (and some groups committed to open models, like Stability AI and Hugging Face, are fairly well capitalized).

Training and fine-tuning costs are both falling ridiculously fast (fine-tunes went from thousands of dollars, to hundreds, and now to about $10 in the span of weeks). New optimizations and techniques are published every day (almost all of them reproducible, most with code repos).

For new foundational models, Cerebras and others will now happily build them to order for a flat fee, but I suspect all kinds of well-funded EDUs, research labs, corporations, maybe even nation states will continue to train and release new cutting-edge models with permissive licenses.

rjzzleep · 3y ago

> LLMs take so much engineering effort, research, and compute that it's unlikely there will be good open source alternatives in the near future. Right now your only real option is OpenAI (or maybe Anthropic) and that seems unlikely to change anytime soon.

Does it, though? It looks more like it requires a lot of money for compute and a lot of money and data for parameter tuning, but the engineering effort seems so-so.

Except for the compute cost, this is a perfect application for a distributed open source labeling effort.

Just for my understanding, though: are the datasets full of copyrighted material?
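To make the "distributed open source labeling effort" idea a bit more concrete: a minimal sketch, assuming each volunteer submits one label per example and labels are merged by simple majority vote with a minimum-agreement threshold. The function name `aggregate_labels`, the data layout, and the 0.6 threshold are hypothetical choices for illustration, not any existing project's scheme.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.6):
    """Merge crowd-sourced labels by majority vote.

    votes: dict mapping example_id -> list of labels from different volunteers.
    Returns example_id -> winning label, skipping examples where the top
    label's share of the votes is below min_agreement (hypothetical threshold).
    """
    merged = {}
    for example_id, labels in votes.items():
        if not labels:
            continue  # nobody has labeled this example yet
        # Ties are broken by first occurrence (Counter.most_common ordering).
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            merged[example_id] = label
    return merged


if __name__ == "__main__":
    votes = {
        "ex1": ["helpful", "helpful", "unhelpful"],  # 2/3 agree: kept
        "ex2": ["helpful", "unhelpful"],             # 1/2 agree: dropped
        "ex3": [],                                   # unlabeled: skipped
    }
    print(aggregate_labels(votes))
```

Real labeling projects typically go further (weighting annotators by past agreement, routing disputed items to more reviewers), but majority vote is the usual baseline.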

muyuu · 3y ago

Some are, some aren't. See Koala, for instance. The problem with Koala is that it's fine-tuned on open-sourced data, but makes no claims about the data behind the base LLaMA models. https://bair.berkeley.edu/blog/2023/04/03/koala/

The irony is that OpenAI and Meta themselves might be on shaky ground for having trained models on other people's data, in many instances with dubious rights to do so, and then using those models to produce output commercially.

But this is a new frontier, and enforcement might be effectively impossible unless new legislation requires reproducibility and audits of the datasets or something like that.

But without that, how do you know exactly how they arrived at a given set of weights with Monte Carlo algorithms and arbitrary fine-tuning? You basically don't know what was in there, and you cannot prove they didn't achieve those results with perfectly clean data.

PS: https://medium.com/geekculture/list-of-open-sourced-fine-tun...
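One way to picture the kind of dataset audit muyuu mentions: publish a manifest of cryptographic hashes for every training file, so a third party can later check that a claimed dataset matches what was declared. This is a minimal sketch assuming the data lives as files under one directory; `build_manifest` and `verify_manifest` are hypothetical helpers. Note that a hash manifest only proves which files were declared, not that the weights were actually trained on them, which is exactly the gap muyuu points out.

```python
import hashlib
import json
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large shards don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Return {relative_path: sha256 hex digest} for every file under data_dir."""
    manifest = {}
    for root, _dirs, files in os.walk(data_dir):
        for name in sorted(files):
            full = os.path.join(root, name)
            manifest[os.path.relpath(full, data_dir)] = sha256_of_file(full)
    return manifest


def verify_manifest(data_dir, manifest):
    """True only if the directory holds exactly the declared files with matching hashes."""
    return build_manifest(data_dir) == manifest


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "shard0.txt"), "wb") as f:
            f.write(b"some training text\n")
        m = build_manifest(d)
        print(json.dumps(m, indent=2))
        print(verify_manifest(d, m))
```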

mejutoco · 3y ago

> LLMs take so much engineering effort, research, and compute that it's unlikely there will be good open source alternatives in the near future.

One could use ChatGPT/GPT-4 to create better training material for those models, even if that's not allowed. In that sense, there is an advantage to being second here.

kkielhofner · 3y ago

I try not to predict the future, but similar things were said about open source in the 90s. Then IBM threw its weight behind it (they were still pretty relevant), Red Hat was and is a success, etc. I remember when the scales completely tipped on the Linux kernel and the top X contributors were from Intel, etc., as opposed to individual hobbyist devs. Nvidia is an obvious one here: they already do a ton of large model/research work because good models sell a lot of hardware. I would not be surprised at all if they're already working on this internally (they're due for a new large model/arch release anyway).

I can see a not-too-distant future where initial "base" models (like LLaMA) are released by entities that do have the resources, because they're seen as foundational enablers of the ecosystem (roughly equivalent to the Linux kernel or possibly Torch/TensorFlow/Transformers), and where the "real" (differentiating) value from a commercial standpoint is something like 5-10 layers up the stack. The tremendous value afforded by something like a Linux distribution isn't in the kernel, some random library, nginx, Docker, etc. When you look from the hardware up, almost everything you see on HN is 90-99% the same code, frameworks, toolkits, etc.

Then a wide diaspora of commercial, academic, and other interests and collaborators scratch their own itches and push the needle forward. Some release to the public, some don't, but at a certain scale the combined effort easily exceeds the resources available to even a large, well-funded entity like OpenAI. I've talked about it before, but the last study I could find, from 2008, analyzed Fedora 9 and estimated it represented something like $10b in combined dev cost.

There are also such rapid advancements in fine-tuning models in limited-VRAM environments, quantization, applying them to specific use cases, tooling, etc. that the barrier to entry to iterate on, build on, and actually use something like LLaMA is no longer 100 A100s (or whatever) and a large dedicated team. If you run apt-get install $SOMETHINGBIG and it grabs dozens of dependencies you've never heard of, it starts to drive this point home.

I'm working on a project, to be announced/released soon, that in the end has something like >100 Python dependencies and other misc enabling packages, frameworks, tools, etc., so it ends up being a 12GB Docker image. Our "magic," meanwhile, is something like 1k LoC.

The biggest hole in this position is that releasing a model and weights could be viewed as the equivalent of releasing your application and data itself, but back to your original point, I don't see the entire world bifurcating into multi-billion-dollar startups and "everyone else".

Or maybe I'm just being optimistic :).

maxilevi · 3y ago · 4 in thread

You could use GPT-J (https://huggingface.co/EleutherAI/gpt-j-6b)

dceddia · 3y ago

I just ran across a mention of gpt4all in another thread, and it looks like the team is working on training up GPT-J as an open alternative to the Llama-based model: https://github.com/nomic-ai/gpt4all#short-term

danpalmer · 3y ago

Just don't let it convince you to "reduce your carbon footprint" like the last guy did.

tough · 3y ago

Wait, is this a reference to the Belgian case of someone offing themselves?

It was a bit weird: I think they mentioned ELIZA/GPT-J in it, but it didn't make much sense to me.

Did that happen, or was it just hallucinated?

bufo · 3y ago

No, there are waaaay better models available these days like Flan (UL2 and the older T5), or Cerebras-GPT.

Garcia98 · 3y ago · 3 in thread

I've seen this question asked repeatedly in many LLaMA threads. Currently the best truly open models are the ones Google released in the Flan family, which includes Flan-T5[0] and Flan-UL2[1]. According to its paper, Flan-UL2 performs slightly better than Flan-T5-XXL.

These models perform slightly better than GPT-3 on some tasks[2], but they're still far from the results of GPT-3.5 and GPT-4. This becomes evident when you try to use them in the real world; they're not "good enough" for general use cases, unlike the ChatGPT models. However, if you can restrict your use case to one particular domain, you can achieve pretty good results by further fine-tuning these models.

[0]: https://huggingface.co/google/flan-t5-xxl

[1]: https://huggingface.co/google/flan-ul2

[2]: https://paperswithcode.com/sota/multi-task-language-understa...

momofuku · 3y ago

For the life of me, I cannot understand why Google did not go ahead and commercialize a lot of this early research. They clearly had a HUGE lead in this space in terms of engineering/research talent, capital, and computing infrastructure. Boggling...

I'd love any alternative viewpoints on this.

sdrinf · 3y ago

Bringing disruption to your company's 90% revenue generator product without proving the alternative's financial model is not a career-enhancing move.

You can't prove the alternative's financial model without showing the thing to real users, and you can't know in advance whether the new financial model will be pennies on AdWords' dollars.

mattbrewsbytes · 3y ago

A company's core competency can act as blinders to the potential in other products. Not literal blinders for every person who works there; there are likely many people who see the potential, but communicating that properly up the leadership chain to get funding and people, during times of layoffs, would require an untold amount of momentum. My evidence: Xerox and PARC and all the tech that came from there: UI, Ethernet, concepts like tablet computers, etc. The only thing Xerox leadership commercialized was in its wheelhouse: laser printing. Look at all of the dead Google products; they have a hard time commercializing many of the things that come up through their organization.

Here's the ironic thing about Google and stuff like ChatGPT: asking ChatGPT something is like getting paragraphs of text similar to what you'd get if someone read and summarized the top X results from a search engine, without ads. If someone built a browser plugin to replace your URL bar/search engine with ChatGPT (and had it output links to references), how disruptive would that be to their cash cow?

lhl · 3y ago · 3 in thread

The ones I saw mentioned so far were Flan, Cerebras, GPT-J, and RWKV.

Not yet mentioned:

* Pythia https://github.com/EleutherAI/pythia

* GLM-130B https://github.com/THUDM/GLM-130B - see also ChatGLM-6B https://github.com/THUDM/ChatGLM-6B

* GPT-NeoX-20B https://huggingface.co/EleutherAI/gpt-neox-20b

* GeoV-9B https://github.com/geov-ai/geov

* BLOOM https://huggingface.co/bigscience/bloom and BLOOMZ https://huggingface.co/bigscience/bloomz

sturza · 3y ago

I tried GLM-130B on huggingface https://huggingface.co/spaces/THUDM/GLM-130B:

Q: hello, who are you? A: I was twitted. In fact, twitted, or twittered, is one of those tweets that one either never sees or sees right away. Twitter is a website that allows users to post short messages that can be read and retweeted by other users. These messages are called tweets. So, who are you? Oh, hello, who are you? I was twitted. In fact, twitted, or twittered, is one of those tweets that one either never sees or sees right away. Twitter is a website that allows users to post short messages that can be read and retweeted by other

rasengan · 3y ago

At first glance/test on their page, it can provide a significantly better completion if given a prompt consisting of a series of Qs and As, and with adjusted temperature, etc.
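rasengan's tip (prime the model with a series of Q&A pairs and adjust the sampling temperature) can be sketched without any particular model. Below, `build_few_shot_prompt` is a hypothetical helper that formats example pairs ahead of the real question, and `softmax_with_temperature` shows what the temperature knob does to a next-token distribution: lower values concentrate probability on the top token, higher values flatten it. The toy logits are made up, and the Q:/A: layout is an assumption, not GLM-130B's documented prompt format.

```python
import math


def build_few_shot_prompt(examples, question):
    """Format (question, answer) pairs as a Q/A transcript ending with an open 'A:'."""
    lines = []
    for q, a in examples:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities; temperature must be > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        [("What is the capital of France?", "Paris."),
         ("Who wrote Hamlet?", "William Shakespeare.")],
        "Hello, who are you?",
    )
    print(prompt)

    toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
    print(softmax_with_temperature(toy_logits, 1.0))
    print(softmax_with_temperature(toy_logits, 0.5))  # sharper: top token gains mass
```

The few-shot examples show the model the expected answer length and format, which is one plausible reason the Q&A-style prompt avoids the runaway completion quoted above.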

tslmy · 3y ago

Caveat: For ChatGLM-6B, you can't use the pre-trained model for commercial uses:

> You will not use, copy, modify, merge, publish, distribute, reproduce, or create derivative works of the Software, in whole or in part, for any commercial, military, or illegal purposes.
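Since the original question hinges on licensing, a simple screen over candidate models can help keep track. This is a minimal sketch: the license strings below are only what commenters in this thread reported (UL2 and Flan-T5 as Apache 2, Cerebras-GPT as Apache-2.0, ChatGLM-6B as non-commercial), not verified facts, and the set of licenses treated as commercially permissive is an assumption. Read each model card and license text yourself before shipping.

```python
# License strings as reported by commenters in this thread (unverified).
REPORTED_LICENSES = {
    "UL2": "apache-2.0",
    "Flan-T5-XXL": "apache-2.0",
    "Cerebras-GPT": "apache-2.0",
    "ChatGLM-6B": "custom-noncommercial",
}

# Assumption: licenses we treat as allowing commercial use.
COMMERCIAL_OK = {"apache-2.0", "mit"}


def commercially_usable(licenses, allowlist=COMMERCIAL_OK):
    """Return sorted model names whose reported license is on the allowlist."""
    return sorted(
        name for name, lic in licenses.items() if lic.lower() in allowlist
    )


if __name__ == "__main__":
    print(commercially_usable(REPORTED_LICENSES))
```

Note that weights, code, and fine-tuning data can carry different licenses (as K0IN points out for RWKV's Alpaca-tuned variants), so in practice each of those needs its own entry.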

dtagames · 3y ago · 3 in thread

I think you might be confusing the GPT software (a generative pre-trained transformer) with the finished product, an LLM (large language model).

A GPT has no training until you give it materials. I do believe Google released the code for theirs ages ago. Even without source, you can run a GPT against your own data locally, or on a cloud service set up for that purpose.

This is how Bloomberg, for example, created a financial LLM. They used a GPT to train on their own financial data.

moneywoes · 3y ago

Any examples of doing that process cost-effectively?

dtagames · 3y ago

For many projects, you'll need "natural language" training on regular text documents in order to be able to process even your prompts. So the most effective products will combine someone else's LLM (with their training data already in it) plus your custom training data. That way, you can interact with the LLM using normal English sentences but also get back information from your own dataset. Without this regular language training, your LLM wouldn't understand the questions you ask it.

So there are two cost factors... the cost of paying someone else to train and host the regular LLM part + yours, or the cost of setting up the (virtual) hardware and compute time to train and host those things on your own.

One "middle road" that might work for some applications is to use the OpenAI API (for example) to combine access to your own data in real time (via your private APIs) with the natural language understanding that's already present in the LLM. These are the plug-ins that are quickly taking over HN, many without any great utility on their own. But you can see that a pre-trained LLM plus private access to your own data might very well be worth paying for.
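The "middle road" described here (a pre-trained LLM plus live access to your own data) usually boils down to fetching relevant private records at request time and placing them in the prompt, rather than training on them. A minimal sketch, assuming a hypothetical in-memory `RECORDS` list standing in for your private APIs and a naive keyword-overlap ranking in place of real search; the resulting string would be sent to whatever hosted or local model you choose, and no model API is called here.

```python
import re

# Hypothetical stand-in for data behind your private APIs.
RECORDS = [
    {"id": "inv-001", "text": "Invoice 001 for ACME Corp is overdue by 30 days."},
    {"id": "inv-002", "text": "Invoice 002 for Globex was paid in full."},
    {"id": "memo-7", "text": "ACME Corp requested a payment plan in March."},
]


def tokenize(text):
    """Lowercase word set; crude, but enough to illustrate ranking."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, records, k=2):
    """Rank records by keyword overlap with the question; drop zero-overlap ones."""
    q = tokenize(question)
    scored = [(len(q & tokenize(r["text"])), r) for r in records]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [r for _score, r in scored[:k]]


def build_prompt(question, records):
    """Put the retrieved private context ahead of the user's question."""
    context = "\n".join(f"[{r['id']}] {r['text']}" for r in records)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    question = "What is the status of ACME Corp's invoice?"
    print(build_prompt(question, retrieve(question, RECORDS)))
```

Given the OP's concern about personal information, keep in mind that with a hosted model the retrieved records still leave your infrastructure at request time.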

tough · 3y ago

Not what you're asking, but Vicuna cost merely $300 to fine-tune on top of LLaMA: https://www.marktechpost.com/2023/04/02/meet-vicuna-an-open-...

AFAIK full model training would probably be a couple of orders of magnitude more expensive?
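To sanity-check the "couple of orders of magnitude" guess, here is a back-of-envelope estimate using the common rule of thumb that training compute is about 6 × parameters × tokens FLOPs. Every input is an assumption for illustration: a 7B-parameter model trained on 1T tokens (roughly LLaMA-7B's published scale), an A100 peak of 312 TFLOPS for dense BF16, 40% utilization, and $1.50 per GPU-hour. Real costs vary widely with hardware, efficiency, and pricing.

```python
def training_flops(params, tokens):
    """Rule of thumb: ~6 FLOPs per parameter per training token."""
    return 6 * params * tokens


def gpu_hours(flops, peak_flops_per_s, utilization):
    """GPU-hours needed at a given fraction of peak throughput."""
    return flops / (peak_flops_per_s * utilization) / 3600


def estimate_cost(params, tokens, peak=312e12, utilization=0.4, usd_per_gpu_hour=1.5):
    """Return (GPU-hours, USD) under the assumed hardware and pricing."""
    hours = gpu_hours(training_flops(params, tokens), peak, utilization)
    return hours, hours * usd_per_gpu_hour


if __name__ == "__main__":
    hours, usd = estimate_cost(params=7e9, tokens=1e12)
    print(f"~{hours:,.0f} GPU-hours, ~${usd:,.0f}")
    print(f"ratio to a $300 fine-tune: ~{usd / 300:,.0f}x")
```

Under these assumptions it comes out to roughly 93,000 GPU-hours and about $140,000, a few hundred times the $300 figure, i.e. between two and three orders of magnitude.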

mingyeow · 3y ago · 3 in thread

Noob question here: what are the best tutorials for getting started with mixing LLM models and building one on top of another, assuming a very good programming background but little AI background? I asked ChatGPT this question, and it was helpful but not comprehensive, and I figure the intelligent humans on this forum will give the best answers.

extasia · 3y ago

My answer would be quite specific to what exactly you're trying to achieve.

I'd be wary of just hacking away without understanding at least the fundamentals of ML + NLP, or you'll find yourself lost pretty quickly.

I'm a former SWE turned NLP researcher, so I was recently in your position :)

gmreads · 3y ago

Curious why and how you made the transition? With the rapid progress in this space, I don't think SWE is a viable career for the next 20 years.

RockyMcNuts · 3y ago

Not sure what 'mixing LLM models' entails, but these may be some good starting points:

- karpathy - https://www.youtube.com/watch?v=kCc8FmEb1nY

- https://towardsdatascience.com/beautifully-illustrated-nlp-m...

- https://dzone.com/articles/a-deep-dive-into-the-transformer-...

- https://peterbloem.nl/blog/transformers

- http://nlp.seas.harvard.edu/2018/04/03/attention.html

- https://lilianweng.github.io/posts/2023-01-27-the-transforme...

- https://blog.quickchat.ai/post/tokens-entropy-question/

- https://dugas.ch/artificial_curiosity/GPT_architecture.html

- https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...

- https://d4mucfpksywv.cloudfront.net/better-language-models/l...

- https://arxiv.org/pdf/2005.14165.pdf

- https://arxiv.org/pdf/2303.08774.pdf

- https://arxiv.org/pdf/2303.17564.pdf

dmurko · 3y ago · 2 in thread

Just in case you were not aware: "OpenAI does not use data submitted by customers via our API to train OpenAI models or improve OpenAI’s service offering." It does for ChatGPT though.

Source: https://help.openai.com/en/articles/5722486-how-your-data-is...

icapybara · 3y ago

For many companies this type of promise is not useful. It doesn’t matter that they say they won’t, they still can look if they want to. This is the primary concern when you’re dealing with trade secrets where the secrecy of the information is its only protection.

pantulis · 3y ago

If you use Azure OpenAI's services, I would guess you'd fall under contractual agreements with Microsoft, which should cover these concerns just as when you use MS SQL Server to store trade secrets or PII.

wejick · 3y ago · 2 in thread

I remember someone mentioning in another thread that, once distilled, LLaMA would have no license issues. Can someone explain why that is the case?

Perhaps someone could also point to where a software engineer can start to understand the concept?

MacsHeadroom · 3y ago

Because machine learning models likely have zero intellectual property rights protections. LLMs are the output of an "algorithmic process." Algorithmic outputs are explicitly exempt from copyright, unlike source code. (Note: compiled software is not an algorithmic output under the specific legal definition.)

Machine Learning models are made the same way machine learning output is generated.

In other words, the old model is training data to the new model. Just like the pirated torrent site dataset "Books3" Facebook used to train LLaMA is training data.

If Facebook can protect their model under copyright, then every publisher in existence could sue Facebook into the ground. They can't have it both ways.

nextaccountic · 3y ago

> In other words, the old model is training data to the new model. Just like the pirated torrent site dataset "Books3" Facebook used to train LLaMA is training data.

This is a logical conclusion. But whether it actually holds is for the courts to decide.
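For wejick's "where can a software engineer start" question: "distillation" usually means training a smaller student model to match a larger teacher model's output distribution rather than only hard labels. The core of the classic recipe (Hinton et al.'s knowledge distillation) is a KL-divergence loss between temperature-softened teacher and student probabilities, scaled by T². The sketch below computes that loss for a single token position in plain Python; the logits are made up, and it illustrates the technique only, settling nothing about the licensing argument.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax with max-subtraction for stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's softened prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.2]  # made-up logits over a 3-token vocabulary
    close_student = [3.8, 1.1, 0.3]
    far_student = [0.1, 3.0, 1.0]
    print(distillation_loss(teacher, close_student))
    print(distillation_loss(teacher, far_student))  # larger: worse match
```

In a real setup the teacher's outputs come from running the big model over a corpus, and the student is trained by gradient descent to minimize this loss (often mixed with an ordinary cross-entropy term).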

sinenomine · 3y ago · 1 in thread

If you want quality, use Google's Apache-licensed LLM https://huggingface.co/google/ul2

vinni2 · 3y ago

They also have Flan-T5, which is also Apache 2.

https://huggingface.co/google/flan-t5-xxl

cl42 · 3y ago

Dolly 2 was released today and is OK for commercial use: https://huggingface.co/databricks/dolly-v2-12b

I'm working on a package to help evaluate LLM results across different LLMs (e.g., GPT3.5 vs. GPT4 vs. Dolly 2 vs...); if you are looking to run experiments to compare results, I'd love to help you out. You can email me at w (at) phaseai (dot) com.
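Comparing results across LLMs, as suggested here, can start as a very small harness: send the same prompts to each model and score the answers. A minimal sketch, assuming each model is wrapped as a plain Python callable from prompt to text (the stub "models" below are hypothetical placeholders, not real API clients) and using exact-match accuracy as a deliberately crude metric.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]


def exact_match(prediction: str, reference: str) -> bool:
    """Case- and surrounding-whitespace-insensitive exact match."""
    return prediction.strip().lower() == reference.strip().lower()


def compare_models(models: Dict[str, Model], cases: List[Tuple[str, str]]) -> Dict[str, float]:
    """Return each model's exact-match accuracy over (prompt, reference) cases."""
    if not cases:
        raise ValueError("need at least one test case")
    scores = {}
    for name, model in models.items():
        correct = sum(exact_match(model(prompt), ref) for prompt, ref in cases)
        scores[name] = correct / len(cases)
    return scores


if __name__ == "__main__":
    cases = [("2 + 2 =", "4"), ("Capital of France?", "Paris")]
    # Hypothetical stand-ins; swap in real clients for GPT-3.5, Dolly 2, etc.
    models = {
        "stub-good": lambda p: {"2 + 2 =": "4", "Capital of France?": "paris"}[p],
        "stub-bad": lambda p: "I don't know",
    }
    print(compare_models(models, cases))  # {'stub-good': 1.0, 'stub-bad': 0.0}
```

Exact match is only reasonable for short factual answers; open-ended generation usually needs task-specific metrics or human review.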

titaniumtown · 3y ago

Cerebras-GPT is licensed under Apache-2.0 and permits commercial use

https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-...

gumby · 3y ago

> looking for an open source LLM that can be used in a commercial environment. As far as I'm aware, products cannot be built on LLAMA.

A commercial product sure can be built on top of LLAMA; it's GPL-3. Your models are your own; just patches, modifications, and code you link to LLAMA itself will be governed by the GPL as well.

This is almost certainly what you want, since this way you can use patches, fixes, and improvements others make to LLAMA. You won't have to do all that work yourself, or necessarily wait for Facebook.

erwincoumans · 3y ago

Truly Open AI: LAION calls for a supercomputer to develop open-source AI, by replicating large models like GPT-4 and exploring them together as a research community.

https://www.heise.de/news/Open-source-AI-LAION-proposes-to-o...

skdotdan · 3y ago

https://huggingface.co/google/flan-ul2

https://huggingface.co/docs/transformers/model_doc/gpt_neox

K0IN · 3y ago

I think https://github.com/BlinkDL/RWKV-LM could be used, but not all versions (namely, the instruction-fine-tuned models trained on Alpaca data).

dreaminvm · 3y ago

Here's a recent write-up on fine-tuning Flan-UL2 on instructions (Alpaca): https://medium.com/vmware-data-ml-blog/lora-finetunning-of-u...
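Judging by its URL, the linked post uses LoRA, one of the techniques behind the falling fine-tuning costs mentioned earlier in the thread. Instead of updating a full weight matrix W (d × k), LoRA freezes W and trains two small matrices B (d × r) and A (r × k), using W + (α/r)·BA. A minimal pure-Python sketch of that update and of the parameter savings; the 4096 × 4096 shape, rank 8, and the tiny 2 × 2 example are illustrative assumptions, not the settings from the linked Flan-UL2 run.

```python
def matmul(X, Y):
    """Plain-Python matrix product of row-major lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_merge(W, B, A, alpha, r):
    """Return W + (alpha / r) * B @ A; W stays frozen, only B and A are trained."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def trainable_params(d, k, r):
    """Full fine-tuning updates d*k weights; LoRA trains r*(d + k)."""
    return d * k, r * (d + k)


if __name__ == "__main__":
    full, lora = trainable_params(4096, 4096, 8)
    print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%

    # Tiny 2x2 example with rank 1.
    W = [[1.0, 0.0], [0.0, 1.0]]
    B = [[1.0], [2.0]]   # d x r
    A = [[0.5, 0.5]]     # r x k
    print(lora_merge(W, B, A, alpha=2, r=1))  # [[2.0, 1.0], [2.0, 3.0]]
```

In the LoRA paper, B is initialized to zeros, so training starts from the unmodified model, and the merged matrix can replace W at inference with no extra latency.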

brentis · 3y ago

My personal use case is that I'd like to query a bunch of our APIs and amalgamate the responses into something consumable by humans.

I think many of us have the same need and are waiting for OpenAI plug-in access.

Is this the question we're asking ourselves here, or are we talking about licensing?

rolisz · 3y ago

What exactly do you want to do? There are various alternatives; they are not as general as OpenAI's GPT, but they can be fine-tuned more cheaply to solve a specific task.

zweezzy · 3y ago

BERT: https://huggingface.co/bert-base-uncased

redskyluan · 3y ago

What about https://huggingface.co/facebook/opt-66b?

I thought the OPT series could be used in production.
