It looks like this is the first OSS 13B Llama 2-based 32k-token-context model[2], but the first OSS and commercially usable 32k-token-context model was a 7B Llama 2-based model[3] from Together AI, who beat them by about a week[4].
[1]: https://twitter.com/bindureddy/status/1694126931174977906
[2]: https://huggingface.co/abacusai/Giraffe-v2-13b-32k
[3]: https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instr...
[4]: https://twitter.com/togethercompute/status/16925744231638470...
A 32k context length sounds nice, of course, but it seems common to describe merely fine-tuned models that way. I think it is more of a marketing thing, and we really should distinguish between the context length of the pre-trained model and that of the fine-tuned model, with the latter being the default meaning of "context length".
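For concreteness, here's a minimal sketch of how to check what a checkpoint actually advertises, using huggingface_hub. The 13B repo id comes from link [2]; the 7B id is my guess at the full name behind the truncated link [3], and the field names assume a standard Llama 2 config.json. Llama 2 itself was pre-trained at 4k, so the 32k figure comes entirely from the fine-tune's RoPE interpolation, which is exactly the distinction above.

  import json
  from huggingface_hub import hf_hub_download

  # The 13B and 7B fine-tunes discussed above (the 7B repo id is assumed
  # from the truncated link [3]). max_position_embeddings is the window
  # the fine-tune was trained to; a rope_scaling entry, if present, hints
  # that positions were interpolated over Llama 2's 4k pre-training window.
  for repo in ("abacusai/Giraffe-v2-13b-32k",
               "togethercomputer/Llama-2-7B-32K-Instruct"):
      with open(hf_hub_download(repo, "config.json")) as f:
          cfg = json.load(f)
      print(repo, cfg.get("max_position_embeddings"), cfg.get("rope_scaling"))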
Edit: No mention of it being open source in the linked article. Maybe the title here is just wrong? @dang
This hasn’t been proven in court, but it seems the most likely outcome.
https://ai.meta.com/llama/license/
https://ai.meta.com/llama/use-policy/
There are other LLMs that don't have such restrictions, and publish their training data.
Also "don't do naughty things", is there a chart for that? How is that defined, is it part of the non-existing license?
I.e., all other things being equal, is an 8k model better at math than a 32k model?
OP is about a sugar-coated 32k Llama 2, so I would expect it to be similar in performance to other Llama 2 derivatives.
Sorry for the random question; I've just been curious about this for a while, haven't been able to find an answer, and you seem knowledgeable about these extended models.