Oops, you're right. The code is Apache 2.0; the license for the model weights is separate.
Translation error? Output from GPT: "non-exclusive, global, non-transferable, non-sublicensable, revocable, royalty-free license."
Does anybody know if performance could be greatly increased if only a single language were supported?
I suspect there's high demand for models that are smaller and run faster if the tradeoff is supporting only English.
Is this available in Ollama?
But according to their own evaluation further down, gpt-4o-2024-05-13 outperforms GLM-4V-9B on every task except OCRBench.
Llama-3-8B was garbage for me, but damn, 70B is good enough.
So at minimum you'd be looking at dual 3090s with NVLink for about $4,000. Or, for the highest-performing non-quantized model, you'd be spending about $40,000 for two A100s.
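A rough sketch of why those two hardware tiers fall out of the math (my own back-of-the-envelope numbers for a 70B-parameter model, counting only weight memory and ignoring KV cache and activations):

```python
def weight_vram_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate VRAM needed just for model weights, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param

# Non-quantized (fp16/bf16): 2 bytes per parameter.
fp16 = weight_vram_gb(70, 2.0)   # 140 GB -> roughly two 80 GB A100s
# 4-bit quantized: ~0.5 bytes per parameter.
q4 = weight_vram_gb(70, 0.5)     # ~35 GB -> fits across two 24 GB 3090s

print(f"fp16: {fp16:.0f} GB, 4-bit: {q4:.0f} GB")
```

That's why the unquantized 70B model lands in A100 territory while a 4-bit quant squeaks onto a pair of 3090s, with some headroom left for context.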