Better HN

0 points · smnscu · 1mo ago · 0 comments

Nice post! You piqued my curiosity, and after a bit of research it seems quite likely that, thanks to techniques like MTP (multi-token prediction), MLA (multi-head latent attention), and CSA, these models are much more efficient (and maybe bigger, though 400B sounds about right) than a simple RAM breakdown would suggest.

MTP - https://blog.google/innovation-and-ai/technology/developers-...

MLA - https://machinelearningmastery.com/a-gentle-introduction-to-...

CSA - https://deepseek.ai/blog/deepseek-v4-compressed-attention
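A rough sketch of why a simple RAM breakdown can mislead, using MLA as the example. Standard multi-head attention caches a full K and V vector per head, per layer, per token, while MLA caches one compressed latent per layer, per token (plus a small decoupled RoPE key) shared by all heads. Every config number below (layer count, head count, latent size, context length, 8-bit weights) is a hypothetical round figure chosen for illustration, not a published spec for any particular model:

```python
# Back-of-envelope memory estimate: weights vs. KV cache, standard MHA vs. MLA-style caching.
# All configuration values are HYPOTHETICAL round numbers, not any real model's published figures.

BYTES_BF16 = 2
GIB = 1024 ** 3


def weight_bytes(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights."""
    return n_params * bytes_per_param


def kv_cache_mha(n_layers: int, n_kv_heads: int, head_dim: int,
                 seq_len: int, bytes_per_elem: int = BYTES_BF16) -> int:
    """Standard attention: one K and one V vector per KV head, per layer, per token.
    (For plain MHA, n_kv_heads equals the number of query heads.)"""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def kv_cache_mla(n_layers: int, latent_dim: int, rope_dim: int,
                 seq_len: int, bytes_per_elem: int = BYTES_BF16) -> int:
    """MLA-style caching: one compressed latent plus a small decoupled RoPE key
    per layer, per token, shared across all heads."""
    return n_layers * (latent_dim + rope_dim) * seq_len * bytes_per_elem


if __name__ == "__main__":
    n_layers, n_heads, head_dim = 60, 128, 128  # hypothetical
    latent_dim, rope_dim = 512, 64              # hypothetical
    seq_len = 128_000                           # hypothetical context length

    weights = weight_bytes(400e9, 1)  # 400B params stored at 8 bits each
    mha = kv_cache_mha(n_layers, n_heads, head_dim, seq_len)
    mla = kv_cache_mla(n_layers, latent_dim, rope_dim, seq_len)

    print(f"weights:      {weights / GIB:,.1f} GiB")
    print(f"KV cache MHA: {mha / GIB:,.1f} GiB")
    print(f"KV cache MLA: {mla / GIB:,.1f} GiB")
    print(f"MHA/MLA cache ratio: {mha / mla:.1f}x")
```

Under these made-up numbers the per-sequence KV cache shrinks by roughly 57x (32,768 vs. 576 cached values per layer per token), which is the kind of saving a weights-only RAM estimate never shows.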



Doxon · 1mo ago

These techniques are used by DeepSeek and work well with the commodity (NVIDIA) GPUs it uses. Google designs its entire AI stack from the custom silicon up, so its optimization approaches differ. (Gemma does use MTP, though.)
