Cerebras-GPT: Open Compute-Optimal Language Models Trained on Cerebras Cluster

(arxiv.org)

97 points · cs-fan-101 · 3y ago · 12 comments

11 comments · 5 top-level

cs-fan-101 (OP) · 3y ago · 4 in thread

Recently, we announced in this post (https://news.ycombinator.com/item?id=35343763#35345980) the release of Cerebras-GPT — a family of open-source GPT models trained on the Pile dataset using the Chinchilla formula. Today, we are excited to announce the availability of the Cerebras-GPT research paper on arXiv.
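For readers unfamiliar with the "Chinchilla formula": it is usually summarized as a rule of thumb of roughly 20 training tokens per model parameter, with training compute often approximated as 6 FLOPs per parameter per token. The sketch below only illustrates that arithmetic. Both constants and the function names are assumptions for illustration, not values or code taken from this thread, and the 13B figure is simply the largest model size mentioned in the comments.

```python
# Rule-of-thumb constants. Both are approximations assumed for this sketch.
TOKENS_PER_PARAM = 20        # Chinchilla-style heuristic: ~20 tokens per parameter
FLOPS_PER_PARAM_TOKEN = 6    # common estimate covering forward and backward passes


def compute_optimal_tokens(n_params):
    """Approximate compute-optimal number of training tokens for a model size."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params, n_tokens):
    """Approximate total training compute in FLOPs (C ~ 6 * N * D)."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


if __name__ == "__main__":
    n_params = 13_000_000_000  # the 13B size mentioned in the thread
    n_tokens = compute_optimal_tokens(n_params)
    print(f"params: {n_params:.3e}")
    print(f"tokens: {n_tokens:.3e}")
    print(f"FLOPs:  {training_flops(n_params, n_tokens):.3e}")
```

Under these assumptions a 13B-parameter model would call for about 260B training tokens.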

krasin · 3y ago

Thank you for open sourcing these models!

I noticed that the models are relatively small (13B max). Is that an inherent limitation, or is training a bigger model possible and it just has not been done in this exercise?

runnerup · 3y ago

Someone else can answer this better than I, so I'll probably end up deleting this in an hour or two. But I think the purpose of this research was not to create an excellent GPT model. I believe it was to explore the scaling effects on Cerebras hardware and determine a helpful framework for compute-optimal training regimes so that customers who might use Cerebras hardware can be confident that:

1) Standard AI/ML scaling assumptions still apply on this hardware.

2) They have a starting point for hyper-parameter estimation and can get better results sooner.

bee_rider · 3y ago

> Maximal Update Parameterization (μP)

The use of μ (mu) as a sort of… pun acronym thing is pretty clever, nice one.

groodt · 3y ago

Thanks for publishing this. I quickly skimmed the paper and saw the impressive linear scaling as you scaled to 16 nodes. How long did it take to train the various models in wall-clock time?

ramshanker · 3y ago · 1 in thread

If they can release open-source models that are competitive with ChatGPT, that will be their biggest proof-of-concept-backed marketing. After all, their business is selling hardware to a variety of businesses and institutions.

bordercases · 3y ago

This would be an interesting step in the industry as it would couple AI reg to hardware sales and black market law enforcement.

rvz · 3y ago · 1 in thread

You all do realize that the O̶p̶e̶n̶AI.com founders, Sam Altman, Greg Brockman, et al have hedged and invested in Cerebras?

I can only see Cerebras being an acquisition target, if they continue releasing their AI models out there. The value in Cerebras is their AI accelerator hardware and O̶p̶e̶n̶AI.com certainly needs that, since that is where the money is.

visarga · 3y ago

"Open"AI doesn't have time to muck around with hardware design, every month counts.

ersiees · 3y ago

Very interesting that someone finally tries out muP in the real world. Do I understand the usage correctly:

Is muP only used to avoid choosing a learning rate for each model size? If so, I wonder how it compares to standard heuristics like the one in the original scaling laws paper by OpenAI, and to tricks like rewinding a few steps after a loss explosion.

For some reason muP was not trusted for the largest training runs. Why is that?
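To make the learning-rate question concrete: the usual idea behind muP hyperparameter transfer is to tune the learning rate once on a small proxy model and then rescale it by width for larger models, instead of re-tuning at every size. The sketch below assumes the commonly cited Adam-style rule in which the hidden-layer learning rate shrinks in proportion to base_width / width. The function name and the example widths and learning rate are hypothetical, and a full muP setup also changes initialization and output scaling, which this sketch omits. It is not a description of what the paper's authors actually ran.

```python
def mup_hidden_lr(base_lr, base_width, width):
    """Rescale a hidden-layer learning rate tuned at base_width to a new width.

    Assumes the Adam-style muP rule lr = base_lr * base_width / width.
    Initialization and output-logit scaling, which muP also prescribes,
    are intentionally left out of this sketch.
    """
    if base_width <= 0 or width <= 0:
        raise ValueError("widths must be positive")
    return base_lr * base_width / width


if __name__ == "__main__":
    # Hypothetical values: a learning rate tuned on a width-256 proxy model.
    base_lr, base_width = 6e-3, 256
    for width in (256, 1024, 4096):
        print(width, mup_hidden_lr(base_lr, base_width, width))
```

The point of the rule is that only the proxy model needs a learning-rate sweep; wider models reuse the tuned value after rescaling.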

vorticalbox · 3y ago

You can download these models on huggingface[0]

[0] https://huggingface.co/cerebras