They don't usually think they're gonna slowly, over the next 5 years, tape out a chip and build a solar-powered datacenter with the company they raised $5M for... all part of the tiny corp master plan. I'll write it up properly as it comes together.
We have 417 preorders for tinyboxes btw. https://tinygrad.org
When he mentions wafer prices, do they include the bad yields or not?
If yield loss isn't factored into his 3450-wafer estimate, the real number could be double that.
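A quick back-of-envelope sketch of that point, assuming a hypothetical 50% yield (the yield figure is an illustration, not anything he has stated):

```python
# How the raw wafer count scales if the 3450 figure counts only good wafers.
# The 50% yield is an assumed value for illustration.
good_wafers_needed = 3450
yield_rate = 0.5  # hypothetical fraction of wafers that come out usable

raw_wafers = good_wafers_needed / yield_rate
print(raw_wafers)  # 6900.0 -- exactly double at 50% yield
```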
Mostly, compute has piggybacked off consumer-scale production (e.g., GPUs repurposed for crypto).
The suggestion is that an AI model can justify few-shot chip production.
His proposal is for development, i.e., to build the model, and depends mostly on such models being qualitatively better.
It seems more likely that chips would be built to offer model processing, instead of forcing users into a service (with its risk of confidentiality and IP leaks). To get GPT-100, you'd incorporate the chip into your device -- and then know for sure that nothing could leak. That eliminates the primary transaction cost for AI compute: the risk.
Which raises the question: does anyone know of research or companies working on such on-chip models?
As I keep mentioning on HN.
AI is real results plus hype, as opposed to crypto, which is only hype.
A computer of this size will be immensely useful for training vision models very quickly, as well as other kinds of models.
Why only AI? Any branch of science (all of them) that benefits from fast, parallel compute will benefit from this.
It's an extremely narrow view to see AI only as LLMs.
When I say "real use" I mean solving real-world problems that existed before AI. And I don't even count generative AI of any shape or form.
It doesn't have anything to do with crypto, science, or Real World Problems.
No joke. That’s his plan.
There are two cooling problems: pulling heat off the wafer (a water-cooled loop or immersion cooling; this part is easy) and dumping that heat somewhere. The second part is why the location being cool matters; I imagine a big radiator on the roof. You could also dump the heat into a river if it's chill.
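A rough sizing sketch for the first problem, the water loop carrying heat off the wafer. The 1 MW heat load and 10 K coolant temperature rise are assumed values for illustration, not specs from the thread:

```python
# Coolant flow rate needed to carry a given heat load: Q = m_dot * c_p * dT.
c_p = 4186.0           # specific heat of water, J/(kg*K)
power_w = 1_000_000    # 1 MW of waste heat (assumed)
delta_t = 10.0         # coolant temperature rise across the loop, K (assumed)

flow_kg_s = power_w / (c_p * delta_t)
print(round(flow_kg_s, 1))  # ~23.9 kg/s of water, roughly a garden-hose-scale pump problem
```

The same Q = m_dot * c_p * dT relation sizes the river option: the river just needs enough flow that its own temperature rise stays negligible.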
An ideal machine designed to train GPT4 in a day is likely very different from the ideal machine to train 50 GPT4s at once over a few weeks, which is very different from the ideal machine to train a model 100x bigger than GPT4 (perhaps the most interesting case).
Also really hoping he makes more progress on AMD ML.
NCAR has a supercomputing center there: https://en.m.wikipedia.org/wiki/NCAR-Wyoming_Supercomputing_...