Edit: No mention of it being open source in the linked article. Maybe the title here is just wrong? @dang
This hasn’t been proven in court, but it seems the most likely outcome.
https://ai.meta.com/llama/license/ https://ai.meta.com/llama/use-policy/
There are other LLMs that don't have such restrictions and that publish their training data.
Also, "don't do naughty things": is there a chart for that? How is that defined? Is it part of the nonexistent license?
i.e., all other things being equal, is an 8k model better at math than a 32k model?
OP is about a 32k sugar-coated Llama 2, so I would expect it to be similar in performance to other Llama 2 derivatives.
Sorry for the random question. I've been curious about this for a while and haven't been able to find out, and you seem knowledgeable about these extended models.
32k context length sounds nice, of course, and it seems to be common to describe models that were merely fine-tuned to that length this way. I think it is more of a marketing thing, and we really should distinguish between the context length of the pre-trained model and that of the fine-tuned model, with the latter being the default meaning of context length.
It looks like the first OSS 13B Llama 2-based model with a 32k-token context[2], but the first OSS and commercially usable model with a 32k-token context was a 7B Llama 2-based model[3] from Together AI, who beat them by about a week[4].
[1]: https://twitter.com/bindureddy/status/1694126931174977906
[2]: https://huggingface.co/abacusai/Giraffe-v2-13b-32k
[3]: https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instr...
[4]: https://twitter.com/togethercompute/status/16925744231638470...