I recommend reading the linked paper in the repo, as it gives decent examples/instructions on how to use the model. Although the size and architecture are comparable to GPT-2, the emphasis on conditional generation differentiates it.
How can one use it for personal use? My understanding is that it won't fit into the single-GPU memory available to the average person. Does someone need to distill the model first?
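A rough back-of-envelope sketch (assuming the ~1.6 billion dense parameters reported for this model, and ignoring activations and optimizer state) suggests why a consumer GPU is tight for inference:

```python
# Back-of-envelope memory estimate for a ~1.6B-parameter model.
# Assumptions: dense parameters only, inference only (no gradients
# or optimizer state, which would multiply this several times over).
params = 1.6e9

bytes_fp32 = params * 4   # 32-bit floats
bytes_fp16 = params * 2   # 16-bit floats

print(f"fp32: {bytes_fp32 / 2**30:.1f} GiB")  # ~6.0 GiB just for weights
print(f"fp16: {bytes_fp16 / 2**30:.1f} GiB")  # ~3.0 GiB in half precision
```

So the weights alone roughly fill a typical 6-8 GB consumer card before you account for activations, which is presumably why people reach for distillation or CPU inference.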
> The code is released under the BSD-3 License (see LICENSE.txt for details), but we also ask that users respect the following:
> This software should not be used to promote or profit from:
> violence, hate, and division,
> environmental destruction,
> abuse of human rights, or
> the destruction of people's physical and mental health.
The license is in LICENSE.txt and that statement seems to unambiguously confirm that LICENSE.txt is the beginning and end of actual legal obligations.
It is not uncommon for FOSS projects to make non-binding requests about how users use their software. In this case it may simply be Salesforce trying to preemptively distance themselves from malicious actors, knowing that the license would be useless if it attempted to give these rules teeth.
There's an old saying (that may well be impolitic these days): "you can't be half pregnant". It seems that's what the maintainers are shooting for... I'd urge them to get off the fence one way or the other.
Similar in principle to AGPLv3 or even WTFPL.
I’d say most social media platforms check most or all of those boxes in some form, but I can also see them claiming not to know how their platforms are being used.
They are working on it because it improves all downstream NLP tasks. See: http://ruder.io/nlp-imagenet/. BERT, Elmo and XLNet all fall under this use case.
For example if you're trying to recognize speech or translate some text, it helps a lot if you can start off producing something that is statistically grammatical even if the content is nonsense.
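The "statistically grammatical" point can be illustrated with a toy sketch (this is not the actual model; just a tiny add-one-smoothed bigram language model on a made-up corpus, to show how even crude word-order statistics prefer grammatical sequences):

```python
from collections import Counter

# Toy bigram language model: count word pairs in a tiny corpus,
# then score sentences by the product of smoothed bigram probabilities.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def score(sentence):
    """Product of add-one-smoothed bigram probabilities."""
    words = sentence.split()
    vocab = len(unigrams)
    p = 1.0
    for a, b in zip(words, words[1:]):
        p *= (bigrams[(a, b)] + 1) / (unigrams[a] + vocab)
    return p

print(score("the cat sat on the mat"))   # grammatical word order
print(score("mat the on sat cat the"))   # same words, scrambled
```

The grammatical ordering scores orders of magnitude higher, which is the same prior (at vastly larger scale) that a pretrained language model hands to a speech recognizer or translator.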
From the preprint, their mitigation seems to be some review before release and a code of conduct in the GitHub repo.
Text generators need a world model and situational awareness, something like a map and a GPS signal. So we are probably two major breakthroughs away from a machine that actually understands something (or at least which seems to understand something, if you're philosophically opposed to the idea that a machine can understand something).
Salesforce created the program by first writing some relatively simple linear algebra, then fiddling with the constants until the output happened to look right. Their program contains 1.6 billion constants, which is more than any other program of its kind.
This program is also special because Salesforce has released it publicly; other organizations, like OpenAI, have previously claimed that text-generation software is too dangerous to release to the general public.
Except that it wouldn't work if it were purely linear.
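The point about linearity can be checked directly. This minimal numpy sketch (deliberately tiny, hand-picked matrices) shows that stacking linear layers collapses into a single matrix multiply, while inserting a nonlinearity does not:

```python
import numpy as np

# Two linear layers with no activation collapse into one linear map:
# W2 @ (W1 @ x) == (W2 @ W1) @ x, so stacking adds no expressive power.
W1 = np.array([[1., -1., 0.],
               [0.,  1., -1.],
               [1.,  0.,  1.],
               [-1., 1.,  1.]])
W2 = np.array([[1., 0., 1., 0.],
               [0., 1., 0., 1.]])
x = np.array([1., 2., 3.])

stacked = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(np.allclose(stacked, collapsed))   # True: identical results

# A nonlinearity (here ReLU) breaks that equivalence, which is what
# lets deep networks represent more than a single matrix multiply can.
relu = lambda v: np.maximum(v, 0)
nonlinear = W2 @ relu(W1 @ x)
print(np.allclose(nonlinear, collapsed))  # False: genuinely different map
```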
Yep, this one is indistinguishable from reality.
But you can try it on a CPU, of course (maybe with some modifications; see https://news.ycombinator.com/item?id=20977776). Also, if someone can get it working in Google Colab, you get a GPU-capable instance for free.
Like, sure, I can kind of see why you wouldn't want to make the Deepfakes program public; it currently takes a lot of time, effort, and expertise to swap faces realistically in a video, and maybe we don't want to give every average Joe the ability to do that.
But pretty much everyone in the world can already pretty trivially write text. (I'm doing it right now!) And the "typical" generation output from these programs usually isn't very good—OpenAI had to try like thirty times for each of the prompts in their PR materials—so it usually ends up being less work to just write the fake news yourself instead of using the software.
My personal conspiracy theory is that all this talk of "the model is too dangerous to release" really boils down to "if we let people test out the model, they'll find it doesn't work as well as our PR team wants them to think it does".
My guess is that they will perfect the transformer and its training process, curate the dataset and make this method really easy to use. Maybe it can do translation, math, even auto-complete code. That is only by iterating more on the current formulation of the Transformer.
But it is also possible that it will be surpassed by something even better. A new language model could replace the inductive bias specific to the Transformer - the ability to "attend" to any part of the input text - with something more efficient, because Transformers are quite hard and expensive to train right now. Maybe the Transformer's inductive bias is too general (like a fully connected network) and needs too much data; with a slightly different idea it could be made much more efficient and probably more convincing.
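The "attend to any part of the input" bias mentioned above can be sketched in a few lines of numpy. This is a generic scaled dot-product attention toy (not CTRL's actual implementation; shapes and values are made up for illustration):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: every position gets a weighted
    view of every other position - the Transformer's inductive bias."""
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))  # (seq, seq) attention matrix
    return weights @ V                       # each row mixes all positions

rng = np.random.default_rng(0)
seq_len, d = 5, 8
Q = rng.standard_normal((seq_len, d))
K = rng.standard_normal((seq_len, d))
V = rng.standard_normal((seq_len, d))

out = attention(Q, K, V)
print(out.shape)  # one d-dimensional output per input position
```

Note the (seq, seq) weight matrix: cost grows quadratically with sequence length, which is part of why Transformers are expensive to train and why a cheaper inductive bias might replace them.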
But I just don't buy that there's significant danger in the public having access to a generative language model, at their current level of quality.
It doesn't seem like this team was callous -- they seem to have honestly thought about potential problems before deciding to release it.
The OpenAI model produced sentences along the lines of "before the first human ever walked on earth, humans did such and such". Hiring workers in a developing country to write your propaganda is cheaper than training that model.