I feel like everyone is missing this from the announcement: they are explicitly releasing this to help generate synthetic training data. Most big models and APIs have clauses that ban using their output to improve other models. Sure, maybe it can compete with other big commercial models at normal tasks, but this would be a huge opportunity for ML labs and startups to expand the training data of smaller models.
Nvidia must see a limit to the growth of new models (and new demand for training with their GPUs) based on the availability of training data, so they're seeking to provide a tool to bypass those restrictions.
All for the low price of 2x A100s...
I will never get over the gall of anything and everything being deemed fair game to use as training data for a model, except you're not allowed to use the output of a model to train your own model without permission, because model output has some kind of exclusive super-copyright apparently.
Well, it's not copyright that's being used to forbid this, it's terms of service, but yeah, it is quite hypocritical.
Synthetic training data is basically free money for NVidia; there's only a fixed amount of high-quality original data around, but there's a potential for essentially infinite synthetic data, and more data means more training hours means more GPU demand.
A 340B model should require around 700GB of VRAM or RAM to run inference. To train or fine-tune, you're looking at almost double that, which is probably why Nvidia recommends 2x A100 nodes with 1.28TB of VRAM.
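The arithmetic behind those numbers, as a rough sketch; the 2 bytes per parameter (FP16/BF16) and the ~2x fine-tuning multiplier are assumptions inferred from the comment, not official figures:

```python
# Back-of-envelope memory estimate for a 340B-parameter model.
# Assumes 2 bytes/param (FP16/BF16); real training footprints vary
# with the optimizer state and parallelism strategy.

PARAMS = 340e9

def inference_gb(params, bytes_per_param=2):
    """Weights only; KV cache and activations add more on top."""
    return params * bytes_per_param / 1e9

def training_gb(params, multiplier=2):
    """Very rough: Nvidia's 2x A100 node (1.28 TB) recommendation
    suggests roughly double the inference footprint for fine-tuning."""
    return inference_gb(params) * multiplier

print(f"inference: ~{inference_gb(PARAMS):.0f} GB")
print(f"fine-tune: ~{training_gb(PARAMS):.0f} GB")
```

That works out to ~680GB for inference and ~1360GB for fine-tuning, which lines up with the "around 700GB" and 1.28TB figures above.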
Jensen Huang is the king of AI summer.
Funnily enough, I don't think it's actually the most interesting model that Nvidia released this week. Nvidia also published this paper https://arxiv.org/abs/2406.07887 and released https://huggingface.co/nvidia/mamba2-hybrid-8b-3t-128k (Apache 2.0 licensed, to boot). It looks like it matches (and sometimes even edges out) Transformer performance, while having linear scaling for context length. Can't wait for a scaled up version of this.
Nvidia also released a top-notch Llama3 70B SteerLM reward model (although RLHFlow/ArmoRM-Llama3-8B-v0.1 might still be a better choice).
I thought you could only use the VRAM on the GPU, so for 700GB you would need 8-9 A100 GPUs, as 2 only gives you 160GB.
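The quick math on that, assuming the 80GB A100 variant (the 40GB variant would double the count):

```python
import math

# How many 80 GB A100s are needed to hold ~700 GB of weights in VRAM.
# Ignores KV cache / activation overhead, which pushes the count up.
def gpus_needed(model_gb, gpu_gb=80):
    return math.ceil(model_gb / gpu_gb)

print(gpus_needed(700))  # 9 GPUs, i.e. more than one 8-GPU node
```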
I've been trying to figure out how to build a local system to run inference on and train LLMs. I thought there was no way to add VRAM to a system other than adding more and more GPUs, or falling back to system RAM (DDR5), even though that would be considerably slower.
One example: HP DL580 Gen8. Use the 32GB PC3L-14900L LRDIMMs (HP PN 715275-001; 712384-001, 708643-B21) for a maximum of 3TB. You can get the LRDIMMs in the $32-$45 range on the second-hand market.
> AI Ethics. NVIDIA is committed to safety, trust and transparency in AI development. NVIDIA encourages You to (a) ensure that the product or service You develop, use, offer as a service or distributes meets the legal and ethical requirements of the relevant industry or use case, (b) take reasonable measures to address unintended bias and to mitigate harm to others, including underrepresented or vulnerable groups, and (c) inform users of the nature and limitations of the product or service. NVIDIA expressly prohibits the use of its products or services for any purpose in violation of applicable law or regulation, including but not limited to (a) illegal surveillance, (b) illegal collection or processing of biometric information without the consent of the subject where required under applicable law, or (c) illegal harassment, abuse, threatening or bullying of individuals or groups of individuals or intentionally misleading or deceiving others
https://developer.download.nvidia.com/licenses/nvidia-open-m...
Besides limiting the freedom of use (making it less "open" in my eyes), it's interesting that they tell you to meet "ethical requirements of the relevant industry or use case". Seems like that'd be super hard to pin down in a precise way.
> 2.1 ... If You institute ... litigation against any entity ... alleging that the Model or a Derivative Model constitutes direct or contributory copyright or patent infringement, then any licenses granted to You under this Agreement for that Model or Derivative Model will terminate...
If you sue or file a copyright claim alleging that the model violates copyright, you lose your license to use the model. That's a really weird restriction; I'm not sure what the point is.
Apache 2.0 has a similar restriction: “ If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.”
Which, in terms of a contract, means absolutely nothing at all.
* intended bias
* legal surveillance
* legal collection of biometrics without consent
* legal harassment

I.e., state-sanctioned killbots are just fine!

Or is this in one of the chat arenas or whatever? Very curious to see some numbers related to the performance.
But if it's at least somewhat better than the existing open-source models, then that's a big boost for open-source training and other use cases.
"...Nemotron-4-340B-Base was trained using 768 DGX H100 nodes"
That is 350 million dollars for you... Poor startups, better have a rich sponsor.
Isn't "training LLMs on LLM output" the very definition of "model collapse" or "model poisoning"?
OK, I see: the goal is to sell more H100s. They made it big enough that it won't fit on cheaper GPUs.
It should be the biggest open-weights model to date, I think (Grok-1 is 314B).
It's trained on 8 trillion tokens, and some benchmarks show it does better than or equal to GPT-4o!
They released 3 checkpoints: the base, the instruct, and the reward model.
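Those three checkpoints map naturally onto a best-of-n synthetic-data loop: the instruct model generates candidate responses and the reward model filters them. A minimal sketch of that idea, where `generate_response` and `score_response` are hypothetical stand-ins for real model calls, not NVIDIA APIs:

```python
# Best-of-n synthetic data generation: sample several candidates per
# prompt from the instruct model, keep the one the reward model likes,
# and discard prompts where even the best candidate scores poorly.

def generate_response(prompt, seed):
    # placeholder for a sampled call to the instruct model
    return f"response-{seed} to {prompt!r}"

def score_response(prompt, response):
    # placeholder for the reward model; dummy deterministic score here
    return len(response) % 7

def best_of_n(prompt, n=4, min_score=3):
    candidates = [generate_response(prompt, seed=i) for i in range(n)]
    scored = [(score_response(prompt, c), c) for c in candidates]
    score, best = max(scored)
    return best if score >= min_score else None

prompts = ["explain KV caches"]
dataset = [r for p in prompts if (r := best_of_n(p))]
```

The filtering step matters: keeping only high-reward samples is what keeps a synthetic corpus from amplifying the generator's weaknesses.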
See https://huggingface.co/collections/nvidia/nemotron-4-340b-66... for all the checkpoints
Are they commoditising their complements?
That's exactly what this would be.
> compete with its customers' businesses
I suspect most of their business comes from a few massive corporate spenders, not a "long tail" of smaller businesses, so it seems like a questionable goal to disrupt those customers without a clear path to new customers. Then again, few have the resources to run this model, so I guess this just ensures that their big customers are all working with some floor in model size? Probably won't impact anything realistically.
Nvidia has no intention of earning money on the models themselves; the point is to offer foundation models and extend their SW products, which require their HW platform.
Basically, just like CUDA costs you nothing, using Nvidia models costs you nothing. And once you're in the ecosystem, you might want Nvidia HW for better performance, and then you might want security and get interested in Nvidia's enterprise SW.