0 points
avazhi
3mo ago
0 comments
Qwen's MoE models are god awful when they're only running 2B active parameters, or whatever they scale down to at inference time. It isn't really a 400B model if roughly two orders of magnitude fewer parameters (about 200x fewer) are active when you're actually running inference...
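To put numbers on the total-vs-active gap, here's a rough sketch of how a simple top-k MoE is usually counted: every token goes through the shared weights (attention, embeddings, router) plus only the `top_k` experts the router picks. The helper names and the second config below are hypothetical and not taken from any real Qwen model; real models route per layer, so treat the "summed over all MoE layers" expert size as a simplifying assumption.

```python
import math


def moe_param_counts(shared: int, n_experts: int, expert_params: int, top_k: int) -> tuple[int, int]:
    """Total vs. per-token active parameters for a simplified top-k MoE.

    Hypothetical helper. Assumes:
    shared: params every token passes through (attention, embeddings, router)
    expert_params: params in one expert, summed over all MoE layers
    top_k: experts the router selects per token
    """
    total = shared + n_experts * expert_params
    active = shared + top_k * expert_params
    return total, active


def orders_of_magnitude(total: float, active: float) -> float:
    """log10 of the total/active ratio: 1.0 means 10x, 2.0 means 100x."""
    return math.log10(total / active)


# The comment's own figures: ~400B total, ~2B active -> a 200x gap.
print(f"{orders_of_magnitude(400e9, 2e9):.2f}")  # 2.30

# Hypothetical config (not any real model): 2B shared params,
# 128 experts of 3B each, 8 experts routed per token.
total, active = moe_param_counts(2_000_000_000, 128, 3_000_000_000, 8)
print(total, active)  # 386000000000 26000000000
```

So 400B total with 2B active works out to about 2.3 orders of magnitude, not "several"; a more typical top-8-of-128 split like the made-up one above is closer to a 15x gap.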