I feel like everyone is missing this from the announcement: they are explicitly releasing this to help generate synthetic training data. Most big models and APIs have clauses that ban using their output to improve other models. Sure, maybe it can compete with other big commercial models at normal tasks, but this would be a huge opportunity for ML labs and startups to expand the training data of smaller models.
Nvidia must see a limit to the growth of new models (and new demand for training with their GPUs) based on the availability of training data, so they're seeking to provide a tool to bypass those restrictions.
All for the low price of 2x A100s...
I will never get over the gall of anything and everything being deemed fair game to use as training data for a model, except you're not allowed to use the output of a model to train your own model without permission, because model output has some kind of exclusive super-copyright apparently.
Well, it's not copyright that's being used to forbid this, it's terms of service, but yeah, it is quite hypocritical.
Synthetic training data is basically free money for NVidia; there's only a fixed amount of high-quality original data around, but there's a potential for essentially infinite synthetic data, and more data means more training hours means more GPU demand.
A 340B model should require around 700GB of VRAM or RAM to run inference. To train or fine-tune, you're looking at almost double that, which is probably why Nvidia recommends 2x A100 nodes with 1.28TB of VRAM.
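The arithmetic behind those numbers, as a rough sketch; the 2 bytes per parameter (FP16/BF16) and the ~2x fine-tuning multiplier are assumptions inferred from the comment, not official figures:

```python
# Back-of-envelope memory estimate for a 340B-parameter model.
# Assumes 2 bytes/param (FP16/BF16); real training footprints vary
# with the optimizer state and parallelism strategy.

PARAMS = 340e9

def inference_gb(params, bytes_per_param=2):
    """Weights only; KV cache and activations add more on top."""
    return params * bytes_per_param / 1e9

def training_gb(params, multiplier=2):
    """Very rough: Nvidia's 2x A100 node (1.28 TB) recommendation
    suggests roughly double the inference footprint for fine-tuning."""
    return inference_gb(params) * multiplier

print(f"inference: ~{inference_gb(PARAMS):.0f} GB")
print(f"fine-tune: ~{training_gb(PARAMS):.0f} GB")
```

That works out to ~680GB for inference and ~1360GB for fine-tuning, which lines up with the "around 700GB" and 1.28TB figures above.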
Jensen Huang is the king of AI summer.
Funnily enough, I don't think it's actually the most interesting model that Nvidia released this week. Nvidia also published this paper https://arxiv.org/abs/2406.07887 and released https://huggingface.co/nvidia/mamba2-hybrid-8b-3t-128k (Apache 2.0 licensed, to boot). It looks like it matches (and sometimes even edges out) Transformer performance, while having linear scaling for context length. Can't wait for a scaled up version of this.
Nvidia also released a top-notch Llama3 70B SteerLM reward model (although RLHFlow/ArmoRM-Llama3-8B-v0.1 might still be a better choice).
I thought you could only use the VRAM on the GPU, so for 700GB you would need 8-9 A100 GPUs, as 2 only gives you 160GB.
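The quick math on that, assuming the 80GB A100 variant (the 40GB variant would double the count):

```python
import math

# How many 80 GB A100s are needed to hold ~700 GB of weights in VRAM.
# Ignores KV cache / activation overhead, which pushes the count up.
def gpus_needed(model_gb, gpu_gb=80):
    return math.ceil(model_gb / gpu_gb)

print(gpus_needed(700))  # 9 GPUs, i.e. more than one 8-GPU node
```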
I've been trying to figure out how to build a local system to run inference on and train LLMs. I thought there was no way to add VRAM to a system other than adding more and more GPUs, or falling back to system RAM (DDR5), even though that would be considerably slower.
One example: HP DL580 Gen8. Use the 32GB PC3L-14900L LRDIMMs (HP PN 715275-001; 712384-001, 708643-B21) for a maximum of 3TB. You can get the LRDIMMs in the $32-$45 range on the second-hand market.
> AI Ethics. NVIDIA is committed to safety, trust and transparency in AI development. NVIDIA encourages You to (a) ensure that the product or service You develop, use, offer as a service or distributes meets the legal and ethical requirements of the relevant industry or use case, (b) take reasonable measures to address unintended bias and to mitigate harm to others, including underrepresented or vulnerable groups, and (c) inform users of the nature and limitations of the product or service. NVIDIA expressly prohibits the use of its products or services for any purpose in violation of applicable law or regulation, including but not limited to (a) illegal surveillance, (b) illegal collection or processing of biometric information without the consent of the subject where required under applicable law, or (c) illegal harassment, abuse, threatening or bullying of individuals or groups of individuals or intentionally misleading or deceiving others
https://developer.download.nvidia.com/licenses/nvidia-open-m...
Besides limiting the freedom of use (making it less "open" in my eyes), it's interesting that they tell you to meet "ethical requirements of the relevant industry or use case". Seems like that'd be super hard to pin down in a precise way.
> 2.1 ... If You institute ... litigation against any entity ... alleging that the Model or a Derivative Model constitutes direct or contributory copyright or patent infringement, then any licenses granted to You under this Agreement for that Model or Derivative Model will terminate...
If you sue or file a copyright claim alleging that the model violates copyright, you lose your license to use the model. That's a really weird restriction; I'm not sure what the point is.
Apache 2.0 has a similar restriction: “ If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.”
Which, in terms of a contract, means absolutely nothing at all.
* intended bias
* legal surveillance
* legal collection of biometrics without consent
* legal harassment

I.e., state-sanctioned killbots are just fine!

Or is this in one of the chat arenas or whatever? Very curious to see some numbers related to the performance.
But if it's at least somewhat better than the existing open-source models, then that's a big boost for open-source training and other use cases.
"...Nemotron-4-340B-Base was trained using 768 DGX H100 nodes"
That is 350 million dollars for you... Poor startups, better have a rich sponsor.
Isn't "training LLMs on LLM output" the very definition of "model collapse" or "model poisoning"?
OK, I see: the goal is to sell more H100s. They made it big enough that it won't fit on cheaper GPUs.
It should be the biggest open-weights model to date, I think (Grok-1 is 314B).
It's trained on 8 trillion tokens, and some benchmarks show it does better than or equal to GPT-4o!
They released 3 checkpoints: the base, the instruct, and the reward model.
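Those three checkpoints map naturally onto a best-of-n synthetic-data loop: the instruct model generates candidate responses and the reward model filters them. A minimal sketch of that idea, where `generate_response` and `score_response` are hypothetical stand-ins for real model calls, not NVIDIA APIs:

```python
# Best-of-n synthetic data generation: sample several candidates per
# prompt from the instruct model, keep the one the reward model likes,
# and discard prompts where even the best candidate scores poorly.

def generate_response(prompt, seed):
    # placeholder for a sampled call to the instruct model
    return f"response-{seed} to {prompt!r}"

def score_response(prompt, response):
    # placeholder for the reward model; dummy deterministic score here
    return len(response) % 7

def best_of_n(prompt, n=4, min_score=3):
    candidates = [generate_response(prompt, seed=i) for i in range(n)]
    scored = [(score_response(prompt, c), c) for c in candidates]
    score, best = max(scored)
    return best if score >= min_score else None

prompts = ["explain KV caches"]
dataset = [r for p in prompts if (r := best_of_n(p))]
```

The filtering step matters: keeping only high-reward samples is what keeps a synthetic corpus from amplifying the generator's weaknesses.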
See https://huggingface.co/collections/nvidia/nemotron-4-340b-66... for all the checkpoints
Are they commoditising their complements?
That's exactly what this would be.
> compete with its customers' businesses
I suspect most of their business comes from a few massive corporate spenders, not a "long tail" of smaller businesses, so it seems like a questionable goal to disrupt those customers without a clear path to new customers. Then again, few have the resources to run this model, so I guess this just ensures that their big customers are all working with some floor in model size? Probably won't impact anything realistically.
Nvidia has no intention of earning money on the models themselves; the point is to offer foundation models and extend their SW products, which require their HW platform.
Basically, just like CUDA costs you nothing, using Nvidia models costs you nothing. And once you're in the ecosystem, you might want Nvidia HW for better performance, and then you might want security and get interested in Nvidia's enterprise SW.