Mixture of experts, like the other guy said: all the weights get loaded into memory, but not every weight is needed to generate each token (unlike classic dense LLMs like Gemma, which use all of their weights for every token).
So on devices that have lots of memory but weaker processing power (and less memory bandwidth than a GPU), it can get you output quality comparable to a dense model, but faster. That's why MoE models tend to do better on CPU and APU setups.
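Here's a toy sketch of the idea, assuming a standard top-k router. The sizes and names (`D_MODEL`, `N_EXPERTS`, `TOP_K`, `moe_forward`) are made up for illustration and don't come from any real model. Every expert's weights sit in memory, but each token only runs through `TOP_K` of them, so far fewer weights get read per token than a dense layer of the same total size would need.

```python
import math
import random

# Hypothetical toy sizes, chosen only for illustration.
D_MODEL = 8      # hidden size
N_EXPERTS = 8    # experts kept in memory
TOP_K = 2        # experts actually evaluated per token

random.seed(0)

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# All expert weights are loaded into memory up front...
experts = [rand_matrix(D_MODEL, D_MODEL) for _ in range(N_EXPERTS)]
router = rand_matrix(N_EXPERTS, D_MODEL)

def moe_forward(x):
    """Return (output, number of weights read) for one token vector x."""
    scores = matvec(router, x)            # one routing score per expert
    weights_read = N_EXPERTS * D_MODEL    # the router itself is always read
    # ...but only the TOP_K highest-scoring experts are evaluated.
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # Softmax over the selected experts' scores gives the mixing weights.
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    total = sum(exps.values())
    out = [0.0] * D_MODEL
    for i in top:
        y = matvec(experts[i], x)
        weights_read += D_MODEL * D_MODEL
        gate = exps[i] / total
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, weights_read

def dense_weights_read():
    # A dense layer with the same total size touches every expert's weights.
    return N_EXPERTS * D_MODEL * D_MODEL

token = [random.gauss(0, 1) for _ in range(D_MODEL)]
out, read = moe_forward(token)
print(f"MoE read {read} weights, dense equivalent would read {dense_weights_read()}")
# prints: MoE read 192 weights, dense equivalent would read 512
```

Real models also have attention and other layers that every token goes through, so the real-world saving is smaller than this toy ratio, but the principle is the same.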