It links to a Tom's Hardware article (https://www.tomshardware.com/news/teslas-dollar300-million-a...) from August 28 that says "Tesla is about to flip the switch on its new AI cluster, featuring 10,000 Nvidia H100 compute GPUs" and "Tesla is set to launch its highly-anticipated supercomputer on Monday..." (presumably the September 1 event).
So, like, does Tesla actually have 10k H100s? Or do they have an order for 10k H100s? Or an intention to buy 10k H100s?
Is the sole source for these articles this (https://twitter.com/SawyerMerritt/status/1696011140508045660) random Twitter post by some guy who runs an online clothing company?
I don't mean to snipe, but this article doesn't seem to rise to the extremely high editorial standards of such tech-press luminaries as "TechRadar" and "Hacker News".
If you had scrolled just a little bit on that Twitter post you linked, you would've seen these:
https://x.com/sawyermerritt/status/1696012091964915744
https://x.com/tim_zaman/status/1695488119729238147
Also, just FYI: Sawyer posts most of the Tesla and SpaceX breaking news on Twitter before major outlets even write their articles.
For example, here's one from just 12 minutes ago, as confirmed by Elon: https://x.com/sawyermerritt/status/1728092021628313777
A “random Twitter post by some guy who runs an online clothing company” is definitely a wrong assumption.
I don't see those when I scroll. I see
"Buckle up everyone, the acceleration of progress is about to get nutty!"
and this is the end of the post?
Maybe I'm misusing this thing?
> https://x.com/tim_zaman/status/1695488119729238147
So another guy who claims to be a Tesla employee says (again, strangely, in the future tense) that this is true? I mean, I am willing to believe--'cause he paid $20 for a blue check--that he probably is a Tesla employee.
But the use of future tense is a bit weird, right? And the lack of any followup?
> A “random Twitter post by some guy who runs an online clothing company” is definitely a wrong assumption.
I guess I'm old. Back in my day, "evidence" wasn't some random dude's online posts. But I know things have changed. ;)
==
More seriously:
https://www.hpcwire.com/2023/08/17/nvidia-h100-are-550000-gp... says Nvidia is producing 550k H100s in 2023. And there's obviously a significant lead-time requirement.
So, yes, I can sorta imagine Tesla pre-ordered about 2% of the global supply of H100s early in 2023 (back-of-envelope below) and was bragging about it at the end of August just 'cause.
But I can also imagine this is smoke and mirrors, and they have, like, a handful with the rest on backorder, and we haven't heard more about it 'cause Tesla doesn't have marketing people, it just has wahoos who post things on Twitter.
Either way, I guess?
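For what it's worth, here's that 2% figure spelled out, treating HPCwire's 550k as a rough production estimate rather than a confirmed total:

    # Rough inputs: 10k GPUs per the article, 550k per HPCwire's estimate.
    tesla_gpus = 10_000
    estimated_2023_h100s = 550_000
    print(f"{tesla_gpus / estimated_2023_h100s:.1%} of estimated 2023 supply")
    # -> 1.8%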
I've never worked inside one of the leading edge AI companies like OpenAI, Google, Microsoft or Meta.
Is this comparable to what they would work with?
My first guess is that it seems much smaller. And if you are running many parallel training jobs then you are getting about 1,000 chips at most to work with.
Or is this about what the leading competitors are working with?
Azure, for one, seems to have orders of magnitude more chips at their disposal.
That said, these aren't just GPUs. They're whole chassis: huge onboard storage arrays, terabytes of RAM, 800G networking (and the associated cables), racks, cooling, power distribution, backup power, etc...
None of it is easy.
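To put rough numbers on "not easy", here's a sketch assuming DGX-H100-style nodes (8 GPUs and ~10 kW per node, per Nvidia's published figures). Tesla's actual node count, fabric, and power budget aren't public, so this is illustrative only:

    # Assumed topology: DGX-H100-style nodes (8 GPUs, ~10.2 kW max each).
    # The real cluster's layout is not public.
    total_gpus = 10_000
    gpus_per_node = 8
    node_power_kw = 10.2

    nodes = total_gpus // gpus_per_node
    it_power_mw = nodes * node_power_kw / 1000
    print(f"~{nodes} nodes, ~{it_power_mw:.1f} MW of IT power before cooling")
    # -> ~1250 nodes and roughly 13 MW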
I don't know much about AV processing; it's highly customized to only a few customers, but I'd expect it to have very large computational requirements for video processing and reinforcement learning.
I can imagine they either underestimated the software effort needed to squeeze as much performance as possible out of those things, or they underestimated the pace at which Nvidia scales FLOPS/$, or both.
What would you say grants you the standing to opine here?
Previous article: https://www.tomshardware.com/news/teslas-dollar300-million-a...
This is second-hand blogspam.
I was curious why this statement led with fp64 flops (instead of fp32, perhaps), but I looked up the H100 specs, and NV's marketing page does the same thing. They're obviously talking about the H100 SXM here, whose peak theoretical fp64 tensor-core throughput matches its fp32 throughput. The cluster perf is estimated by multiplying the GPU perf by 10k.
Also, obviously, int8 tensor ops aren't 'FLOPS'. Nvidia quotes those as 'TOPS' (tera operations per second). There's a separate line for tensor-core floating point, e.g. the TF32 figures.
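To spell the multiplication out, using what I believe are the H100 SXM spec-sheet peaks (dense TFLOPS; the fp64 tensor-core figure is the one that matches fp32):

    # Per-GPU peaks as I read Nvidia's H100 SXM spec sheet (dense TFLOPS).
    h100_sxm_tflops = {"fp64": 34, "fp64_tensor": 67, "fp32": 67}

    for kind, tflops in h100_sxm_tflops.items():
        print(f"{kind}: {tflops * 10_000 / 1000:.0f} PFLOPS across 10k GPUs")
    # fp64: 340 PFLOPS is presumably where the headline number comes from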
Also, back in the day, integer ops were just called 'ops', grumble grumble. But yeah, FLOPS specifically refers to floating point. Leading with int8 throughput doesn't make much sense to me anyway, since tensor cores were meant for matrix-operation speedup, and those matrices are rarely integer.
> knowing you can drop to high precision when necessary without penalty is nice.
I guess I maybe don't know why you'd ever have 1:1 fp32 and fp64 perf. Isn't an fp64 multiplier (for example) basically 4x an fp32 multiplier? I'm under the possibly naive impression that if you have all the transistors for one fp64 core, you have all the transistors you need for two or four fp32 cores. Maybe that's not true today, but there does have to be at least 2x the transistors overall for 64-bit vs 32-bit, and lots of those should be shared or reusable, no? It doesn't seem quite right to frame naturally higher 32-bit op throughput as a "penalty" on 64-bit ops. You're asking the hardware to do more with 64 bits, and it makes complete sense that, given the exact same budget for bandwidth, energy, memory, compute, etc., 32-bit ops would go faster, no? If the op throughput of fp64 and fp32 is the same, doesn't that imply the fp32 ops are being wasted / penalized, just for the sake of having matching numbers?
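For what it's worth, that 4x rule of thumb does fall out of significand widths, under the (rough) assumption that multiplier-array area grows with the square of significand width:

    # fp32 has a 24-bit significand, fp64 a 53-bit one (implicit bit included).
    # Multiplier-array area ~ width^2 is an approximation that ignores
    # exponent logic, normalization, and wiring.
    fp32_sig, fp64_sig = 24, 53
    ratio = (fp64_sig / fp32_sig) ** 2
    print(f"fp64 multiplier ~ {ratio:.1f}x the area of an fp32 one")  # ~4.9x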
I see mention of using this supercomputer for training models. Is that the only purpose? What other types of things do orgs usually do with these supercomputers?
Are there any good boots-on-the-ground technical blogs that provide interesting detail on day-to-day experiences with these things?
In other words, they're used when you want to share some kind of state across all of the computers, without the potential overhead of communicating with some other system like a database.
Physics simulations and molecular modeling come to mind as common examples.
In the case of ML training, the shared state is the model parameters, plus the deltas calculated during training that get broadcast to every worker.
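If it helps, here's a toy single-process sketch of that pattern; real clusters do the averaging with NCCL/MPI over the network fabric, and the "gradient" here is just a stand-in:

    # Each "worker" computes a gradient on its own data shard, then an
    # all-reduce averages the gradients so every replica applies the
    # same parameter update. Toy stand-in for NCCL/MPI all-reduce.
    def local_gradient(params, shard):
        # placeholder for backprop on this worker's mini-batch
        return [p * 0.01 + x for p, x in zip(params, shard)]

    def all_reduce_mean(per_worker_grads):
        n = len(per_worker_grads)
        return [sum(g) / n for g in zip(*per_worker_grads)]

    params = [0.5, -0.2, 1.0]
    shards = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # one shard per worker

    grads = [local_gradient(params, s) for s in shards]
    avg = all_reduce_mean(grads)
    params = [p - 0.1 * g for p, g in zip(params, avg)]  # identical everywhere
    print(params)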
The cost of finding the next prime is likely into the millions now.
Weird to think that his next company's compute platform is this.
That seems like a bold claim. Google, Microsoft, and Meta make so much more money than Tesla that if making AI chips were that easy, they could clearly out-design and out-build Tesla without thinking too hard about it.
What makes you think Tesla, a company with far fewer AI workers, less AI knowledge, and far less money than the above companies, can out-design and out-build them?
Wow, that's some really early hardware access. /s