
0 points | avazhi | 3mo ago | 0 comments

Qwen's MoE models are god-awful when they're only running 2B parameters, or whatever they downscale to while active. It isn't a 400B model if there are a couple of orders of magnitude fewer parameters active when you're actually running inference...
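
To put a rough number on that gap, here is a quick sketch using the ballpark figures from the comment itself (400B total, 2B active). Both values are the comment's own approximations, not official specs for any particular Qwen release, and the helper names are made up for illustration.

```python
import math


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of a MoE model's parameters that are used per token."""
    return active_params / total_params


def orders_of_magnitude(total_params: float, active_params: float) -> float:
    """Base-10 log of the total-to-active parameter ratio."""
    return math.log10(total_params / active_params)


# Assumed ballpark figures taken from the comment, not from a model card.
total = 400e9   # "400B" total parameters
active = 2e9    # "2B" active parameters per token

print(f"active fraction: {active_fraction(total, active):.2%}")
print(f"gap: {orders_of_magnitude(total, active):.1f} orders of magnitude")
```

With those inputs the active share comes out to 0.50% of the total, a gap of about 2.3 orders of magnitude. Swap in the real total and active counts for whichever model you care about.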
