Meta MTIA v2 – Meta Training and Inference Accelerator (opens in new tab)

(ai.meta.com)

189 points_yo2u1y ago60 comments

60 comments

I like the interactive 3D widget showing off the chip. Yep, that sure is a metal rectangle.

Really annoys me that the loading animation of these before-/after-images doesn't finish on firefox and that it won't let me drag the knob with the separator. ...no "Under the hood" for me.

a_wild_dandan1y ago

Dragging the top-left corner works, for some reason. Really bizarre UI issue.

TulliusCicero1y ago

Exactly what I was thinking. Like showing off a model of a blank DVD.

huevosabio1y ago

jajajaj I thought the same! I thought maybe someone with hardware experience can make a sense of this?

modeless1y ago

Intel Gaudi 3 has more interconnect bandwidth than this has memory bandwidth. By a lot. I guess they can't be fairly compared without knowing the TCO for each. I know in the past Google's TPU per-chip specs lagged Nvidia but the much lower TCO made them a slam dunk for Google's inference workloads. But this seems pretty far behind the state of the art. No FP8 either.

leetharris1y ago

They are different architectures optimized for different things.

From the Meta post: "This chip’s architecture is fundamentally focused on providing the right balance of compute, memory bandwidth, and memory capacity for serving ranking and recommendation models."

Optimizing for ranking/recommendation models is very different from general purpose training/inference.

janalsncm1y ago

Translation: you don’t need to serve 96 layer transformers for ranking and recommendation. You’re probably using a neural net with around 10-20 million parameters. But it needs to be fast and highly parallelizable, and perhaps perform well in lower precisions like f16. And it would be great to have a very large vector LUT on the same chip.

1 more reply

modeless1y ago

Yeah, it may fit their current workload perfectly, but it doesn't seem very future proof with the limited bandwidth. Given how fast ML is evolving these days I question if it makes sense to design and deploy a chip like this. I guess they do have a very large workload that will benefit immediately.

2 more replies

chabons1y ago

> Intel Gaudi 3 has more interconnect bandwidth than this has memory bandwidth.

LPDDR5 vs HBMe2. I'm guessing there's a 2-5x price difference between those, but even so it's an interesting choice, I don't know any other accelerators which spec DDR. But yeah, without exact TCO numbers it's hard to compare exactly.

namibj1y ago

Bandwidth is far more power hungry for DDR, but capacity is far cheaper.

If the bandwidth capability of DDR suffices, HBM isn't worth it.

At least with LPDDR's; GDDRs may well not be worth it under data center TCO considerations due to the high interface power usage. Feel free to correct me if I'm mistaken, the numbers in question aren't too easy to search for so I didn't confirm this (LPDDR vs. GDDR) part.

chessgecko1y ago

Also its at 90 watts vs 900 watts for gaudi 3, the flops/mem bw per watt is much more comparable.

modeless1y ago

With high end chips like that it's often possible to get dramatically better efficiency by running it at less than peak power consumption, like 90% performance at 50% power or something like that. It's hard to compare the numbers in a fair way.

moffkalast1y ago

It would be interesting if this could be made into a reasonably priced (lmao) card for home inference if they intend to mass produce it.

Can't imagine any other reason other than cost as to why they went with LPDDR5, LPDDR5X has more bandwidth and GDDR6 has even more.

1 more reply

cma1y ago

Only 48MB of SRAM on Gaudi 3 per die (96 MB across both) vs 256MB here maybe increases the memory bandwidth needs for Gaudi. Way different power consumption too.

mlsu1y ago

Certainly an interesting looking chip. It looks like it's for recommendation workloads. Are those workloads very specific, or is there a possibility to run more general inference (image, language, etc) on this accelerator?

And, they mention a compiler in PyTorch, is that open sourced? I really liked the Google Coral chips -- they are perfect little chips for running image recognition and bounding box tasks. But since the compiler is closed source it's impossible to extend them for anything else beyond what Google had in mind for them when they came out in 2018, and they are completely tied to Tensorflow, with a very risky software support story going forward (it's a google product after all).

Is it the same story for this chip?

chessgecko1y ago

I thought MTIA v2 would use the mx formats https://arxiv.org/pdf/2302.08007.pdf, guess they were too far along in the process to get it in this time.

Still this looks like it would make for an amazing prosumer home ai setup. Could probably fit 12 accelerators on a wall outlet with change for a cpu, would have enough memory to serve a 2T model at 4bit and reasonable dense performance for small training runs and image stuff. Potentially not costing too much to make either without having to pay for cowos or hbm.

I'd definitely buy one if they ever decided to sell it and could keep the price under like $800/accelerator.

buildbot1y ago

I suppose it might, there are not a lot of details (what kind of sparsity for example?) about what they mean in terms of INT8 support - it could be MXINT8, or something else.

Glad someone was thinking the same thing I was though!

chessgecko1y ago

its gotta be that 2/4 sparsity that everyone has, but I haven't seen used anywhere right? If they put it in though they must be using it, but I'm not sure for what. And without details I think its a good bet that int8 is the standard int8.

Wishful thinking maybe they'll announce selling it with the giant llama3 cause there's no good, cheap way to inference something like that at home at the moment and this could change that.

teaearlgraycold1y ago

Still seems pretty primitive. Very cool though.

I can only imagine the lack of fear Jensen experiences when reading this.

airstrike1y ago

It would be foolish to underestimate the long term capabilities of a sufficiently funded and driven competitor

moffkalast1y ago

adjusts black leather jacket "Look at what they need to mimic a fraction of our power."

prng20211y ago

3x performance but >3x TDP. Am I missing something or is that unimpressive?

jrgd1y ago

I find it weird that not everyone agree Meta and Facebook and social networks in general are doing some good the the society and our democracies; yet they manage to spend incredible amount of money/energy/time to develop solutions to problems we aren't exactly sure are worth solving…

pptr1y ago

What is worth solving in your opinion? Should they not make their service more efficient?

I assume this helps reduce their server and electricity costs. At a certain scale these things pay off.

ixaxaar1y ago

If all this turns out to be useless, burning their cash for nothing seems like a great way to accelerate tech while going down. I guess that would actually be a positive thing.

duchenne1y ago

Is it possible to buy it?

ein0p1y ago

Come on, Zuck, undermine Google Cloud and take NVIDIA down a few pegs by offering this for purchase in good quantities.

sroussey1y ago

Pretty large increase in performance over v1, particularly in sparse workloads.

Low power 25W

Could use higher bandwidth memory if their workloads were more than recommendation engines.

tasty_freeze1y ago

First gen was 25W. The new one is 90W.

sroussey1y ago

Ah, thanks for the correction.

Still relatively low compared to GPUs.

throwaway484761y ago

It's interesting that they are not separating training and inference.

noiseinvacuum1y ago

This is specifically designed for inference for recommendations models. It’s not for LLM training or inference.

xnx1y ago

My mind still boggles that a BBS+ads company would think it needs to design its own chips.

libria1y ago

Or that an online bookseller would try to rent out compute.

falcor841y ago

"Depending on how you want to think about it, it was funny or inevitable or symbolic that the robotic takeover did not start at MIT, NASA, Microsoft or Ford. It started at a Burger-G restaurant ..."

https://marshallbrain.com/manna1

searchableguy1y ago

https://www.ycombinator.com/companies/ofone/jobs/u2E2fCX-fou...

I saw this YC startup ad right after I finished reading this.

pksebben1y ago

dangit, I've got things I should be doing. Posting interesting stories during business hours continues grumbling incoherently

rsynnott1y ago

Well, the first commercial computer was created by a company whose primary business was running cafes... https://en.wikipedia.org/wiki/LEO_(computer)

hackerlight1y ago

You're thinking like a startup founder where you should only focus on innovating your main product. FB is a mature company where some vertical integration can make sense.

okdood641y ago

They literally print money; smart move for them to make this investment imo.

bevekspldnw1y ago

Pretty fascinating they mention applications for ad serving but not Metaverse.

I feel like Zuck figured out he’s just running an ads network, the world is a long way anway from some VR fever dream, and to focus on milking each DAU for as many clicks as possible.

ec1096851y ago

It’s not a gpu, and these chips aren’t able to generate images fast enough at inference time to be usable in VR context.

photonbeam1y ago

Hes always known what pays the bills

bevekspldnw1y ago

I dunno, he burned a lot of cash on metaverse and wasn’t focused on FB. All the top talent was moved over to Metaverse and FB was treated as career killer. My impression is ads work is once again a good career play. People chase promo.

j / k navigate · click thread line to collapse

60 comments

jsheard1y ago

I like the interactive 3D widget showing off the chip. Yep, that sure is a metal rectangle.

whilenot-dev1y ago

Really annoys me that the loading animation of these before-/after-images doesn't finish on firefox and that it won't let me drag the knob with the separator. ...no "Under the hood" for me.

a_wild_dandan1y ago

Dragging the top-left corner works, for some reason. Really bizarre UI issue.

TulliusCicero1y ago

Exactly what I was thinking. Like showing off a model of a blank DVD.

huevosabio1y ago

jajajaj I thought the same! I thought maybe someone with hardware experience can make a sense of this?

modeless1y ago

leetharris1y ago

They are different architectures optimized for different things.

Optimizing for ranking/recommendation models is very different from general purpose training/inference.

janalsncm1y ago

1 more reply

modeless1y ago

2 more replies

chabons1y ago

> Intel Gaudi 3 has more interconnect bandwidth than this has memory bandwidth.

namibj1y ago

Bandwidth is far more power hungry for DDR, but capacity is far cheaper.

If the bandwidth capability of DDR suffices, HBM isn't worth it.

chessgecko1y ago

Also its at 90 watts vs 900 watts for gaudi 3, the flops/mem bw per watt is much more comparable.

modeless1y ago

moffkalast1y ago

It would be interesting if this could be made into a reasonably priced (lmao) card for home inference if they intend to mass produce it.

Can't imagine any other reason other than cost as to why they went with LPDDR5, LPDDR5X has more bandwidth and GDDR6 has even more.

1 more reply

cma1y ago

Only 48MB of SRAM on Gaudi 3 per die (96 MB across both) vs 256MB here maybe increases the memory bandwidth needs for Gaudi. Way different power consumption too.

mlsu1y ago

Is it the same story for this chip?

chessgecko1y ago

I thought MTIA v2 would use the mx formats https://arxiv.org/pdf/2302.08007.pdf, guess they were too far along in the process to get it in this time.

I'd definitely buy one if they ever decided to sell it and could keep the price under like $800/accelerator.

buildbot1y ago

I suppose it might, there are not a lot of details (what kind of sparsity for example?) about what they mean in terms of INT8 support - it could be MXINT8, or something else.

Glad someone was thinking the same thing I was though!

chessgecko1y ago

Wishful thinking maybe they'll announce selling it with the giant llama3 cause there's no good, cheap way to inference something like that at home at the moment and this could change that.

teaearlgraycold1y ago

Still seems pretty primitive. Very cool though.

I can only imagine the lack of fear Jensen experiences when reading this.

airstrike1y ago

It would be foolish to underestimate the long term capabilities of a sufficiently funded and driven competitor

moffkalast1y ago

adjusts black leather jacket "Look at what they need to mimic a fraction of our power."

prng20211y ago

3x performance but >3x TDP. Am I missing something or is that unimpressive?

jrgd1y ago

pptr1y ago

What is worth solving in your opinion? Should they not make their service more efficient?

I assume this helps reduce their server and electricity costs. At a certain scale these things pay off.

ixaxaar1y ago

If all this turns out to be useless, burning their cash for nothing seems like a great way to accelerate tech while going down. I guess that would actually be a positive thing.

duchenne1y ago

Is it possible to buy it?

ein0p1y ago

Come on, Zuck, undermine Google Cloud and take NVIDIA down a few pegs by offering this for purchase in good quantities.

sroussey1y ago

Pretty large increase in performance over v1, particularly in sparse workloads.

Low power 25W

Could use higher bandwidth memory if their workloads were more than recommendation engines.

tasty_freeze1y ago

First gen was 25W. The new one is 90W.

sroussey1y ago

Ah, thanks for the correction.

Still relatively low compared to GPUs.

throwaway484761y ago

It's interesting that they are not separating training and inference.

noiseinvacuum1y ago

This is specifically designed for inference for recommendations models. It’s not for LLM training or inference.

xnx1y ago

My mind still boggles that a BBS+ads company would think it needs to design its own chips.

libria1y ago

Or that an online bookseller would try to rent out compute.

falcor841y ago

"Depending on how you want to think about it, it was funny or inevitable or symbolic that the robotic takeover did not start at MIT, NASA, Microsoft or Ford. It started at a Burger-G restaurant ..."

https://marshallbrain.com/manna1

searchableguy1y ago

https://www.ycombinator.com/companies/ofone/jobs/u2E2fCX-fou...

I saw this YC startup ad right after I finished reading this.

pksebben1y ago

dangit, I've got things I should be doing. Posting interesting stories during business hours continues grumbling incoherently

rsynnott1y ago

Well, the first commercial computer was created by a company whose primary business was running cafes... https://en.wikipedia.org/wiki/LEO_(computer)

hackerlight1y ago

You're thinking like a startup founder where you should only focus on innovating your main product. FB is a mature company where some vertical integration can make sense.

okdood641y ago

They literally print money; smart move for them to make this investment imo.

bevekspldnw1y ago

Pretty fascinating they mention applications for ad serving but not Metaverse.

I feel like Zuck figured out he’s just running an ads network, the world is a long way anway from some VR fever dream, and to focus on milking each DAU for as many clicks as possible.

ec1096851y ago

It’s not a gpu, and these chips aren’t able to generate images fast enough at inference time to be usable in VR context.

photonbeam1y ago

Hes always known what pays the bills

bevekspldnw1y ago

j / k navigate · click thread line to collapse