Does Tesla need tens of thousands of these things?
I think in 25% of cases it will not get them significantly more performance vs Nvidia.
There is a 50% chance that they can outperform off-the-shelf chips by enough to make it worth it. (This is pretty likely because dedicated hardware tends to outperform general-purpose hardware.)
However, there is maybe a 25% risk that buying Nvidia doesn't get them there soon enough.
So building their own chips de-risks the worst case, and it's probably not that much more expensive (at Tesla scale). So seems like a pretty good bet to me.
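A quick sketch of that bet in Python. It assumes the three outcomes are mutually exclusive, which the comment does not state explicitly, and the scenario labels are paraphrases of mine:

```python
# The three outcomes from the comment above, treated as mutually exclusive
# (an assumption; the comment does not say so explicitly).
scenarios = {
    "custom chip is no real improvement over Nvidia": 0.25,
    "custom chip clearly beats off-the-shelf parts": 0.50,
    "Nvidia alone would not get them there soon": 0.25,
}

total = sum(scenarios.values())
# Outcomes in which having built the chip helps: everything except the first.
chip_helps = total - scenarios["custom chip is no real improvement over Nvidia"]

print(f"total probability: {total:.2f}")
print(f"chance the custom chip was worth having: {chip_helps:.2f}")
```

On these guesses the chip pays off in three cases out of four, which is the sense in which it looks like a good bet.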
For one, Google TPU. Another: Cerebras wafer scale AI. AMD MI100. Etc etc.
Even if they screwed the pooch with Nvidia, there are plenty of competitors in this space.
Now Tesla has to build its own software stack for large scale distributed learning, which might be harder than the chip design.
Is Tesla really the kind of company that wants to carry the expensive millstone of training and inference software + hardware?
It's not like PyTorch is gonna run on this thing unless they create a fork. And a huge advantage of Nvidia is NVLink / NVSwitch: both hardware and software that efficiently distribute data at 600 GB/s across your GPU clusters.
We couldn’t put 50B transistors on a square inch in the 1960s, though. We can now. https://en.wikipedia.org/wiki/Transistor_count lists several larger designs.
So, the engineering is impressive, but not spectacular.
Also, this being a grid of interconnected CPUs means the design is simpler than a single design filling the entire die would be. It’s ‘just’ repeating the same design over and over (possibly with some small variations near the edge).
Of course, looking at it without knowledge of the state of the art, it is astounding that we can even think of constructing machines with 50 billion working parts.
"Artificial intelligence (AI) has seen a broad adoption over the past couple of years."
And continues: "At Tesla, who as many know is a company that works on electric and autonomous vehicles, AI has a massive value to every aspect of the company's work."
Who is writing like this? And why? What would Tom's Hardware lose if they left out this kind of cheap filler?
Should I also start writing like this?
Is this type of "reader hostile writing" a new thing or have newspapers always written like this?
These are not rhetorical questions. I am honestly confused.
That's also just wrong. During the recent "Tesla AI Day", when asked during the Q&A, Elon Musk specifically said that they intentionally use machine learning only in very few cases:
Q: "Is Tesla using machine learning within its manufacturing, design or any other engineering processes?"
Elon: "I discourage use of machine learning, because it's really difficult. Unless you have to use machine learning, don't do it. It's usually a red flag when somebody is saying 'We wanna use machine learning to solve this task'. I'm like: That sounds like bullshit. 99.9% of the time you don't need it."
https://www.youtube.com/watch?v=j0z4FweCy4M&t=9307s
Die size is 645 mm^2 on 7nm. This is important because we know the reticle limit, which is around 800 mm^2.
The Nvidia AI chip has 54 billion transistors with a die size of 826 mm^2 on 7nm.
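For a rough comparison, the figures quoted in this thread can be turned into transistor density. A minimal sketch, assuming the 50 billion transistor count mentioned upthread belongs to the 645 mm^2 Tesla die; the function name is just for illustration:

```python
def density_mtr_per_mm2(transistors: float, die_area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm^2."""
    return transistors / die_area_mm2 / 1e6

# Figures as quoted in this thread (not independently verified here).
tesla_density = density_mtr_per_mm2(50e9, 645)
nvidia_density = density_mtr_per_mm2(54e9, 826)

print(f"Tesla:  {tesla_density:.1f} MTr/mm^2")
print(f"Nvidia: {nvidia_density:.1f} MTr/mm^2")
print(f"Ratio:  {tesla_density / nvidia_density:.2f}x")
```

With these inputs the Tesla die comes out at roughly 78 MTr/mm^2 against roughly 65 for the Nvidia part.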
I recently saw a TED Talk, "If Content is King, then Context is God". I think it captures everything that is wrong in today's society.
One basic thing I didn't see in the body was power consumption though, anyone know more details on that?
It says TSMC 7nm - is that DUV or EUVL?
[1]: https://en.wikichip.org/wiki/7_nm_lithography_process#TSMC
Honest question: how much of a factor is chip design, separate from the fab process?
This chip is largely memory and multipliers, both of which are pretty dense.
Fab processes improve over time to have higher density and a lower defect rate (which allows bigger chips while still getting acceptable yield). So it's not surprising to see a chip on the same node, but shipping a year or two later than Ampere, having more transistors.
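To illustrate the defect rate point, here is a sketch using the simple Poisson yield model, Y = exp(-A * D0). The die areas are the two quoted in this thread; the defect densities are made-up illustrative values, not TSMC data, and real fabs use more elaborate models:

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson yield model."""
    area_cm2 = die_area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)

# Defect densities are hypothetical, chosen only to show the trend.
for d0 in (0.20, 0.10, 0.05):
    small = poisson_yield(645, d0)
    large = poisson_yield(826, d0)
    print(f"D0={d0:.2f}/cm^2: 645 mm^2 -> {small:.0%}, 826 mm^2 -> {large:.0%}")
```

The trend is the point here, not the exact percentages: as the defect density falls, a larger die reaches the yield a smaller one had earlier in the node's life.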
The "tiles of tiles" chip architecture seems like an Elon-obvious "let's just scale what we have" approach. Do their neural networks map well to that multiscale tiling?
The WSE2 is much larger obviously, but I would also think it can result in a large performance boost given everything is on a single chip.