
0 points · slekker · 1y ago · 0 comments

What's MoE?


3 comments · 2 top-level

Havoc · 1y ago · 1 in thread

Mixture of experts, like the other guy said. Everything gets loaded into memory, but not every byte is needed to generate a token (unlike classic dense LLMs like Gemma, where every weight is used for every token).

So on devices that have lots of memory but weaker processing power, it can get you similar output quality but faster. That's why it tends to do better on CPU- and APU-style setups.
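To put rough numbers on that, here's a back-of-the-envelope sketch. The parameter counts are made up for illustration and don't describe any real model, and `moe_param_counts` is just a helper name I picked. The point is only that memory scales with the total parameter count while per-token compute scales with the active count.

```python
def moe_param_counts(n_experts, active_experts, expert_params, shared_params):
    """Return (total, active) parameter counts for a simplified MoE model.

    total  -> what has to sit in memory
    active -> what is actually used to produce one token
    """
    total = shared_params + n_experts * expert_params
    active = shared_params + active_experts * expert_params
    return total, active


# Hypothetical model: 8 experts of 5B params each, 2 used per token,
# plus 2B params (attention, embeddings, ...) shared by every token.
total, active = moe_param_counts(
    n_experts=8,
    active_experts=2,
    expert_params=5_000_000_000,
    shared_params=2_000_000_000,
)
print(f"in memory: {total / 1e9:.0f}B params, used per token: {active / 1e9:.0f}B params")
```

With those made-up numbers you hold 42B parameters in memory but only touch 12B per token, which is roughly the trade described above.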

trebligdivad · 1y ago

I'm not even sure they're loading everything into memory for MoE; maybe they can get away with only the relevant experts being paged in.
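A toy sketch of what "only the relevant experts being paged in" could look like, assuming the weights are memory-mapped from a file. I don't know whether any particular runtime actually does it this way. The file layout, `EXPERT_BYTES`, and the function names are all invented for the example.

```python
import mmap
import os
import tempfile

EXPERT_BYTES = 4096  # hypothetical size of one expert's weights
N_EXPERTS = 8        # hypothetical number of experts in the file


def write_fake_weights(path):
    # Fill each expert's blob with its own index byte so reads are easy to check.
    with open(path, "wb") as f:
        for i in range(N_EXPERTS):
            f.write(bytes([i]) * EXPERT_BYTES)


def load_selected(path, chosen):
    # Map the whole file, but only read the byte ranges of the chosen experts.
    # Which pages actually end up resident in RAM is decided by the OS.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return {
                i: mm[i * EXPERT_BYTES:(i + 1) * EXPERT_BYTES]
                for i in chosen
            }


with tempfile.TemporaryDirectory() as tmp:
    weights_path = os.path.join(tmp, "fake_weights.bin")
    write_fake_weights(weights_path)
    loaded = load_selected(weights_path, chosen=[1, 5])
    print(sorted(loaded))
```

The catch is that the set of chosen experts can change from token to token, so how much this saves in practice depends on how often the same experts get reused.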

zamalek · 1y ago

Mixture of Experts. Very broadly speaking, there are a bunch of mini networks (experts) which can be independently activated.
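For a concrete picture, here's a minimal sketch in plain Python, assuming a simple top-k router. This is one common way to pick experts, not necessarily what any given model does. Each "expert" is just a hypothetical single linear layer, and all names (`make_expert`, `moe_forward`, `router_weights`) are mine.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(rng, dim):
    # Hypothetical "mini network": a single dim x dim linear layer.
    w = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

    return expert


def moe_forward(x, router_weights, experts, top_k=2):
    # The router gives one score per expert for this input.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep only the top_k highest-scoring experts; the rest are never run.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Normalise the chosen scores into mixing weights.
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


rng = random.Random(0)
dim, n_experts = 4, 8
experts = [make_expert(rng, dim) for _ in range(n_experts)]
router_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_experts)]
token = [rng.uniform(-1, 1) for _ in range(dim)]

output, used = moe_forward(token, router_weights, experts, top_k=2)
print("experts used for this token:", used)
```

Only 2 of the 8 experts do any work for this token, and a different token can pick a different pair.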
