Tencent Hunyuan-Large

(github.com)

148 points · helloericsf · 1y ago · 103 comments

45 comments · 9 top-level

mrob · 1y ago · 14 in thread

Not open source. Even if we accept model weights as source code, which is highly dubious, this clearly violates clauses 5 and 6 of the Open Source Definition. It discriminates between users (clause 5) by refusing to grant any rights to users in the European Union, and it discriminates between uses (clause 6) by requiring agreement to an Acceptable Use Policy.

EDIT: The HN title was changed, which previously made the claim. But as HN user swyx pointed out, Tencent is also claiming this is open source, e.g.: "The currently unveiled Hunyuan-Large (Hunyuan-MoE-A52B) model is the largest open-source Transformer-based MoE model in the industry".

ronsor · 1y ago

I will again ask the obligatory question: are model weights even copyrightable? And if not, does the "license" still matter?

parl_match · 1y ago

I doubt there will be a satisfactory answer for a long time.

warkdarrior · 1y ago

(IANAL)

Model weights could be treated the same way phone books, encyclopedias, and other collections of data are treated. The copyright is over the collection itself, even if the individual items are not copyrightable.

dplavery92 · 1y ago

The title of Tencent's paper [0] as well as their homepage for the model [1] each use the term "Open-Source" in the title, so I think they are making the claim.

[0] https://arxiv.org/pdf/2411.02265 [1] https://llm.hunyuan.tencent.com/

vanguardanon · 1y ago

What is the reason for restrictions in the EU? Is it due to some EU regulations?

ronsor · 1y ago

Most likely yes. I don't think companies can be blamed for not wanting to subject themselves to EU regulations or uncertainty.

Edit: Also, if you don't want to follow or deal with EU law, you don't do business in the EU. People here regularly say if you do business in a country, you have to follow its laws. The opposite also applies.

blueblimp · 1y ago

In Meta's case, the problem is that they had been given the go-ahead by the EU to train on certain data, and then after starting training, the EU changed its mind and told them to stop.

GaggiX · 1y ago

They probably trained on data protected by privacy laws, similar to Meta.

karaterobot · 1y ago

Hmm, in fairness I don't see where Tencent is claiming this is open source (at least in this repo; I haven't checked elsewhere). The title of the HN post does make the claim, and that may be controversial or simply incorrect.

swyx · 1y ago

readme: https://github.com/Tencent/Tencent-Hunyuan-Large

> "By open-sourcing the Hunyuan-Large model"

kaliqt · 1y ago

I agree; however, Meta is guilty of this crime as well.

DataDaemon · 1y ago

Who cares about EU? They are destroying themselves.

Mistletoe · 1y ago

Ironically their policies are why I want to move there with my American dollars. I want to live somewhere that cares about my rights, not the rights of corporations.

the5avage · 1y ago

Where would you go if you were to live there (as a programmer interested in AI)? Just asking for a friend.

Tepix · 1y ago · 7 in thread

I'm no expert on these MoE models with "a total of 389 billion parameters and 52 billion active parameters". Do hobbyists stand a chance of running this model (quantized) at home? For example on something like a PC with 128GB (or 512GB) RAM and one or two RTX 3090 24GB VRAM GPUs?

bick_nyers · 1y ago

You would need to fit all 389B parameters in VRAM to get a usable speed. Different experts are activated on a per-token basis, so if you offloaded parameters to system RAM or SSD you would have to load and unload a large chunk of the 52B active parameters on every token. PCIe 4.0 x16 moves roughly 32 GB/s in each direction (the 64 GB/s figure counts both directions), so with 4-bit to 8-bit weights you can load those active parameters about once per second at best, yielding an output speed of around 1 token per second, which most would consider "unusable".
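The back-of-the-envelope estimate above can be sketched as a small calculation. This is only a rough upper bound under stated assumptions: every token reloads the full set of active weights over the link, the link sustains about 32 GB/s in one direction (the usual PCIe 4.0 x16 figure), and compute time is ignored. The function name and the `reload_fraction` parameter are made up for illustration.

```python
def offload_tokens_per_second(active_params: float,
                              bits_per_param: float,
                              link_gb_per_s: float,
                              reload_fraction: float = 1.0) -> float:
    """Upper bound on decode speed when active weights are streamed per token.

    reload_fraction is a hypothetical knob: the share of the active weights
    that actually has to cross the link for each token (1.0 = all of them).
    """
    bytes_per_token = active_params * bits_per_param / 8 * reload_fraction
    return link_gb_per_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # 52B active parameters, as quoted in the thread; 32 GB/s is an assumption.
    for bits in (16, 8, 4):
        rate = offload_tokens_per_second(52e9, bits, link_gb_per_s=32.0)
        print(f"{bits}-bit weights: about {rate:.2f} tokens/s")
```

In practice some experts stay cached between tokens, so the real reload fraction would likely be below 1.0, but the order of magnitude is what matters here.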

o11c · 1y ago

Does that have to be same-node VRAM? Or can you fit 52B each on several nodes, and only copy the transient state around?

DrPhish · 1y ago

Yes, it can be done. I'm running a 24-channel DDR5 dual-EPYC rig and get good speed on large MoE models. I only use the GPU for context processing.

MoE models are actually a best case for CPU inference compared with dense models. I usually run DeepSeek 2.5 quantized to q8, but if this model works well I'll probably switch to it once support hits llama.cpp.

ConnectX-2 · 1y ago

>I only use the GPU for context processing.

If your GPU has enough VRAM to support it, you might benefit from https://github.com/kvcache-ai/ktransformers

Tepix · 1y ago

Interesting, what RAM do you use exactly? 24x 16GB DDR5-6000 DIMMs? It seems that those boards only support up to DDR5-4800: https://geizhals.de/?cat=mbsp3&xf=4921_2%7E493_24x+DDR5+DIMM...

Does the core count matter or can you get away with the smallest 2x EPYC 9015 configuration? What are "good speeds"?

lanceflt · 1y ago

RAM for 4-bit is 1GB per 2 billion parameters. So you will want 256GB RAM and at least one GPU. If you only have one server and one user, it's the full parameter count. (If you have multiple GPUs/servers and many users in parallel, you can shard and route it so you only need the active parameter count per GPU/server. So it's cheaper at scale.)
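A minimal sketch of that rule of thumb, assuming weights dominate memory use and ignoring the KV cache, activations, and quantization overhead (all of which add to the total). The function name is invented for illustration.

```python
def weight_memory_gb(total_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return total_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # 389B total parameters, as quoted in the thread.
    for bits in (16, 8, 4):
        print(f"{bits}-bit: about {weight_memory_gb(389e9, bits):.1f} GB")
```

At 4 bits this comes to roughly 194.5 GB for the weights, which is why 256GB is the next practical RAM size up.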

zamadatix · 1y ago

Do the inactive parameters need to be loaded into RAM to run an MoE model decently enough?

eptcyka · 1y ago · 5 in thread

Definitely not trained on Nvidia or AMD GPUs.

acchow · 1y ago

How do you know this?

Apparently 20% of Nvidia's quarterly revenue is booked in Singapore where shell companies divert product to China: https://news.ycombinator.com/item?id=42048065

smnrg · 1y ago

Sarcasm is a valid theory.

azinman2 · 1y ago

I assume it was missing /s

rb2k_ · 1y ago

The readme mentions H20 GPUs, Nvidia's "China-compatible" card (41% fewer cores and 28% lower performance than the top Hopper H100 config).

1R053 · 1y ago

you can get a long way on something with 41% less performance than your favorite supercar...

the_duke · 1y ago · 4 in thread

> “Territory” shall mean the worldwide territory, excluding the territory of the European Union.

Anyone have some background on this?

jmole · 1y ago

I believe the EU has (or is drafting) laws about LLMs of a certain size which this release would not comply with.

mattlutze · 1y ago

https://artificialintelligenceact.eu/high-level-summary/

There are many places where the model might be used which could count as high-risk scenarios and require lots of controls. Also, we have:

  GPAI models present systemic risks when the cumulative amount of compute used for its training is greater than 10^25 floating point operations (FLOPs). Providers must notify the Commission if their model meets this criterion within 2 weeks. The provider may present arguments that, despite meeting the criteria, their model does not present systemic risks. The Commission may decide on its own, or via a qualified alert from the scientific panel of independent experts, that a model has high impact capabilities, rendering it systemic.

  In addition to the four obligations above, providers of GPAI models with systemic risk must also:

  - Perform model evaluations, including conducting and documenting adversarial testing to identify and mitigate systemic risk.
  - Assess and mitigate possible systemic risks, including their sources.
  - Track, document and report serious incidents and possible corrective measures to the AI Office and relevant national competent authorities without undue delay.
  - Ensure an adequate level of cybersecurity protection.

They may not want to meet these requirements.
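To get a feel for the 10^25 FLOPs threshold quoted above, here is a rough sketch using the common "6 × parameters × tokens" approximation for training compute. Several things here are assumptions: using active rather than total parameters for an MoE model, the token count in the example (a placeholder, not a figure from this thread), and the idea that the Act's "cumulative compute" would be counted this way. It is an illustration, not a compliance check.

```python
SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25  # threshold quoted in the summary above


def estimate_training_flops(active_params: float, training_tokens: float) -> float:
    """Rule-of-thumb training compute: about 6 FLOPs per parameter per token."""
    return 6.0 * active_params * training_tokens


def exceeds_threshold(active_params: float, training_tokens: float) -> bool:
    """True if the rough estimate is above the quoted threshold."""
    return estimate_training_flops(active_params, training_tokens) > SYSTEMIC_RISK_THRESHOLD_FLOPS


def tokens_to_reach_threshold(active_params: float) -> float:
    """Training tokens at which the rough estimate hits the threshold."""
    return SYSTEMIC_RISK_THRESHOLD_FLOPS / (6.0 * active_params)


if __name__ == "__main__":
    active = 52e9              # active parameters, as quoted in the thread
    placeholder_tokens = 5e12  # hypothetical value, for illustration only
    flops = estimate_training_flops(active, placeholder_tokens)
    print(f"estimated compute: {flops:.2e} FLOPs")
    print(f"above threshold: {exceeds_threshold(active, placeholder_tokens)}")
    print(f"tokens needed to reach threshold: {tokens_to_reach_threshold(active):.2e}")
```

Under this approximation a model with 52B active parameters would need on the order of tens of trillions of training tokens to cross the line, so whether a given release is affected depends heavily on the actual training run.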

troupo · 1y ago

Also existing privacy laws (GDPR) and AI Act (foundational models have to disclose and document their training data)

GaggiX · 1y ago

I imagine they trained on data that is protected by privacy laws, similar to Meta.

helloericsf (OP) · 1y ago · 3 in thread

- 389 billion parameters and 52 billion activation parameters, capable of handling up to 256K tokens.
- outperforms LLama3.1-70B and exhibits comparable performance when compared to the significantly larger LLama3.1-405B model.

Etheryte · 1y ago

It's a bit funny to call the 405B reference "significantly larger" than their 389B, while highlighting the fact that their 389B outperforms the 70B.

rose_ann_ · 1y ago

An MoE model with 52 billion activated parameters means it's more comparable to a (dense) 70B model than to a dense 405B model.

klipt · 1y ago

It's a whole 4% smaller!

a_wild_dandan · 1y ago · 2 in thread

The model meets/beats Llama despite having an order-of-magnitude fewer active parameters (52B vs 405B). Absolutely bonkers. AI is moving so fast with these breakthroughs -- synthetic data, distillation, alt. architectures (e.g. MoE/SSM), LoRA, RAG, curriculum learning, etc.

We've come so astonishingly far in like two years. I have no idea what AI will do in another year, and it's thrilling.

csomar · 1y ago

It is insane because 52B can run on my current 3-year-old laptop. The 3B Llama 3.2 from Facebook can already autocomplete for me. I didn't try this model, but if the scores are to be believed, it can give useful and actionable insights into a project's source code. Probably not as good as Claude 3.5, but I can run it locally. This is a game changer.

z3ncyberpunk · 1y ago

Moving fast or just completely inefficient

iqandjoke · 1y ago · 1 in thread

How does it compare with Llama 3.2?

Tepix · 1y ago

Llama 3.2 has the same performance for text as Llama 3.1 and the largest model hasn't been released.

1R053 · 1y ago

the paper with details: https://arxiv.org/pdf/2411.02265

They use

- 16 experts, of which one is activated per token

- 1 shared expert that is always active

in summary that makes around 52B active parameters per token instead of the 405B of LLama3.1.
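A toy sketch of the routing described above: one of 16 routed experts is picked per token, and a shared expert always runs. The hidden size, the random weights, and the use of a single matrix per expert are simplifications invented for illustration; real experts are feed-forward blocks, and this is not Tencent's implementation.

```python
import math
import random

NUM_EXPERTS = 16  # routed experts, per the comment above
TOP_K = 1         # routed experts activated per token
DIM = 8           # toy hidden size (hypothetical)


def make_matrix(rng, rows, cols):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMoELayer:
    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.router = make_matrix(rng, NUM_EXPERTS, DIM)
        self.experts = [make_matrix(rng, DIM, DIM) for _ in range(NUM_EXPERTS)]
        self.shared = make_matrix(rng, DIM, DIM)

    def forward(self, token):
        """Return (output vector, indices of the routed experts that ran)."""
        gate = softmax(matvec(self.router, token))
        ranked = sorted(range(NUM_EXPERTS), key=lambda i: gate[i], reverse=True)
        chosen = ranked[:TOP_K]
        out = matvec(self.shared, token)  # the shared expert always runs
        for idx in chosen:
            routed = matvec(self.experts[idx], token)
            out = [o + gate[idx] * r for o, r in zip(out, routed)]
        return out, chosen


if __name__ == "__main__":
    layer = ToyMoELayer(seed=0)
    output, chosen = layer.forward([1.0] * DIM)
    print(f"routed expert(s): {chosen}; experts run: {len(chosen) + 1} of {NUM_EXPERTS + 1}")
```

Only 2 of the 17 expert weight sets are touched per token, which is the reason the active parameter count is so much smaller than the total.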

adt · 1y ago

https://lifearchitect.ai/models-table/
