It looks like this is the first OSS 13B Llama 2-based 32k-token-context model[2], but the first OSS and commercially usable 32k-token-context model was a 7B Llama 2-based model[3] from Together AI, who beat them by about a week[4].
[1]: https://twitter.com/bindureddy/status/1694126931174977906
[2]: https://huggingface.co/abacusai/Giraffe-v2-13b-32k
[3]: https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instr...
[4]: https://twitter.com/togethercompute/status/16925744231638470...
A 32k context length sounds nice, of course, but it seems common to describe merely fine-tuned models that way. I think it is more of a marketing thing, and we really should distinguish between the context length of the pre-trained model and that of the fine-tuned model, with the latter being the default meaning of "context length".
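For concreteness, here's a minimal sketch of how to check what a checkpoint actually advertises, using huggingface_hub. The 13B repo id comes from link [2]; the 7B id is my guess at the full name behind the truncated link [3], and the field names assume a standard Llama 2 config.json. Llama 2 itself was pre-trained at 4k, so the 32k figure comes entirely from the fine-tune's RoPE interpolation, which is exactly the distinction above.

  import json
  from huggingface_hub import hf_hub_download

  # The 13B and 7B fine-tunes discussed above (the 7B repo id is assumed
  # from the truncated link [3]). max_position_embeddings is the window
  # the fine-tune was trained to; a rope_scaling entry, if present, hints
  # that positions were interpolated over Llama 2's 4k pre-training window.
  for repo in ("abacusai/Giraffe-v2-13b-32k",
               "togethercomputer/Llama-2-7B-32K-Instruct"):
      with open(hf_hub_download(repo, "config.json")) as f:
          cfg = json.load(f)
      print(repo, cfg.get("max_position_embeddings"), cfg.get("rope_scaling"))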
Edit: No mention of it being open source in the linked article. Maybe the title here is just wrong? @dang
This hasn’t been proven in court, but it seems the most likely outcome.
https://ai.meta.com/llama/license/
https://ai.meta.com/llama/use-policy/
There are other LLMs that don't have such restrictions, and publish their training data.
Also "don't do naughty things", is there a chart for that? How is that defined, is it part of the non-existing license?
I.e., all other things being equal, is an 8k model better at math than a 32k model?
OP is about a sugar-coated 32k Llama 2, so I would expect it to be similar in performance to other Llama 2 derivatives.
Sorry for the random question; I've just been curious about this for a while, haven't been able to find an answer, and you seem knowledgeable about these extended models.