I recommend reading the linked paper in the repo, as it gives decent examples/instructions on how to use the model. Although the size and architecture are comparable to GPT-2, the emphasis on conditional generation differentiates it.
How can one use it for personal use? My understanding is that it won't fit into the single-GPU memory available to the average person. Does someone need to distill the model first?
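A rough back-of-envelope sketch (assuming the ~1.6 billion dense parameters reported for this model, and ignoring activations and optimizer state) suggests why a consumer GPU is tight for inference:

```python
# Back-of-envelope memory estimate for a ~1.6B-parameter model.
# Assumptions: dense parameters only, inference only (no gradients
# or optimizer state, which would multiply this several times over).
params = 1.6e9

bytes_fp32 = params * 4   # 32-bit floats
bytes_fp16 = params * 2   # 16-bit floats

print(f"fp32: {bytes_fp32 / 2**30:.1f} GiB")  # ~6.0 GiB just for weights
print(f"fp16: {bytes_fp16 / 2**30:.1f} GiB")  # ~3.0 GiB in half precision
```

So the weights alone roughly fill a typical 6-8 GB consumer card before you account for activations, which is presumably why people reach for distillation or CPU inference.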
> The code is released under the BSD-3 License (see LICENSE.txt for details), but we also ask that users respect the following:
> This software should not be used to promote or profit from:
> violence, hate, and division,
> environmental destruction,
> abuse of human rights, or
> the destruction of people's physical and mental health.
The license is in LICENSE.txt and that statement seems to unambiguously confirm that LICENSE.txt is the beginning and end of actual legal obligations.
It is not uncommon for FOSS projects to make non-binding requests about how users use their software. In this case it may simply be Salesforce trying to preemptively distance themselves from malicious actors, knowing that the license would be useless if it attempted to give these rules teeth.
There's an old saying (that may well be impolitic these days): "you can't be half pregnant". It seems that's what the maintainers are shooting for... I'd urge them to get off the fence one way or the other.
Similar in principle to AGPLv3 or even WTFPL.
I’d say most social media platforms check most or all of those boxes in some form, but I can also see them claiming not to know how their platforms are being used.
They are working on it because it improves all downstream NLP tasks. See: http://ruder.io/nlp-imagenet/. BERT, Elmo and XLNet all fall under this use case.
For example if you're trying to recognize speech or translate some text, it helps a lot if you can start off producing something that is statistically grammatical even if the content is nonsense.
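The "statistically grammatical" point can be illustrated with a toy sketch (this is not the actual model; just a tiny add-one-smoothed bigram language model on a made-up corpus, to show how even crude word-order statistics prefer grammatical sequences):

```python
from collections import Counter

# Toy bigram language model: count word pairs in a tiny corpus,
# then score sentences by the product of smoothed bigram probabilities.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def score(sentence):
    """Product of add-one-smoothed bigram probabilities."""
    words = sentence.split()
    vocab = len(unigrams)
    p = 1.0
    for a, b in zip(words, words[1:]):
        p *= (bigrams[(a, b)] + 1) / (unigrams[a] + vocab)
    return p

print(score("the cat sat on the mat"))   # grammatical word order
print(score("mat the on sat cat the"))   # same words, scrambled
```

The grammatical ordering scores orders of magnitude higher, which is the same prior (at vastly larger scale) that a pretrained language model hands to a speech recognizer or translator.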
From the preprint, their mitigation seems to be some review before release and a code of conduct in the GitHub repo.
Text generators need a world model and situational awareness, something like a map and a GPS signal. So we are probably two major breakthroughs away from a machine that actually understands something (or at least which seems to understand something, if you're philosophically opposed to the idea that a machine can understand something).
Salesforce created the program by first writing some relatively simple linear algebra, then fiddling with the constants until the output happened to look right. Their program contains 1.6 billion constants, which is more than any other program of its kind.
This program is also special because Salesforce has released it publicly; other organizations, like OpenAI, have previously claimed that text-generation software is too dangerous to release to the general public.
Except that it wouldn't work if it were purely linear.
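The point about linearity can be checked directly. This minimal numpy sketch (deliberately tiny, hand-picked matrices) shows that stacking linear layers collapses into a single matrix multiply, while inserting a nonlinearity does not:

```python
import numpy as np

# Two linear layers with no activation collapse into one linear map:
# W2 @ (W1 @ x) == (W2 @ W1) @ x, so stacking adds no expressive power.
W1 = np.array([[1., -1., 0.],
               [0.,  1., -1.],
               [1.,  0.,  1.],
               [-1., 1.,  1.]])
W2 = np.array([[1., 0., 1., 0.],
               [0., 1., 0., 1.]])
x = np.array([1., 2., 3.])

stacked = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(np.allclose(stacked, collapsed))   # True: identical results

# A nonlinearity (here ReLU) breaks that equivalence, which is what
# lets deep networks represent more than a single matrix multiply can.
relu = lambda v: np.maximum(v, 0)
nonlinear = W2 @ relu(W1 @ x)
print(np.allclose(nonlinear, collapsed))  # False: genuinely different map
```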
Yep, this one is indistinguishable from reality.
But you can try it on a CPU, of course (maybe with some modifications; see https://news.ycombinator.com/item?id=20977776). Also, if someone can get it working in Google Colab, you get a GPU-capable instance for free.
Like, sure, I can kind of see why you wouldn't want to make the Deepfakes program public; it currently takes a lot of time, effort, and expertise to swap faces realistically in a video, and maybe we don't want to give every average Joe the ability to do that.
But pretty much everyone in the world can already pretty trivially write text. (I'm doing it right now!) And the "typical" generation output from these programs usually isn't very good—OpenAI had to try like thirty times for each of the prompts in their PR materials—so it usually ends up being less work to just write the fake news yourself instead of using the software.
My personal conspiracy theory is that all this talk of "the model is too dangerous to release" really boils down to "if we let people test out the model, they'll find it doesn't work as well as our PR team wants them to think it does".
My guess is that they will perfect the transformer and its training process, curate the dataset and make this method really easy to use. Maybe it can do translation, math, even auto-complete code. That is only by iterating more on the current formulation of the Transformer.
But it is also possible that it will be surpassed by something even better. A new language model could replace the inductive bias specific to the Transformer - the ability to "attend" to any part of the input text - with something more efficient, because Transformers are quite hard and expensive to train right now. Maybe the Transformer's inductive bias is too general (like a fully connected network) and needs too much data; with a slightly different idea it could be made much more efficient and probably more convincing.
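The "attend to any part of the input" bias mentioned above can be sketched in a few lines of numpy. This is a generic scaled dot-product attention toy (not CTRL's actual implementation; shapes and values are made up for illustration):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: every position gets a weighted
    view of every other position - the Transformer's inductive bias."""
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))  # (seq, seq) attention matrix
    return weights @ V                       # each row mixes all positions

rng = np.random.default_rng(0)
seq_len, d = 5, 8
Q = rng.standard_normal((seq_len, d))
K = rng.standard_normal((seq_len, d))
V = rng.standard_normal((seq_len, d))

out = attention(Q, K, V)
print(out.shape)  # one d-dimensional output per input position
```

Note the (seq, seq) weight matrix: cost grows quadratically with sequence length, which is part of why Transformers are expensive to train and why a cheaper inductive bias might replace them.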
But I just don't buy that there's significant danger in the public having access to a generative language model, at their current level of quality.
It doesn't seem like this team was callous -- they seem to have honestly thought about potential problems before deciding to release it.
The OpenAI model produced sentences along the lines of "before the first human ever walked on earth, humans did such and such". Hiring workers in a developing country to write your propaganda is cheaper than training that model.