0 points
avazhi
1mo ago
0 comments
Qwen's MoE models are god-awful when they're only running 2B parameters, or whatever they downscale to, while active. It isn't a 400B model if roughly two orders of magnitude fewer parameters are active when you're actually running inference...
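For a rough sense of the gap, here is a quick back-of-the-envelope sketch. The 400B total and 2B active figures are just the ballpark numbers from this comment, not official specs for any particular Qwen release, and the function names are made up for illustration.

```python
import math

def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters that are used for a single token."""
    return active_params / total_params

def orders_of_magnitude(total_params: float, active_params: float) -> float:
    """Base-10 log of the total-to-active ratio."""
    return math.log10(total_params / active_params)

# Assumed ballpark figures from the comment, not published model specs.
total = 400e9
active = 2e9

print(f"active share: {active_fraction(total, active):.2%}")
print(f"gap: {orders_of_magnitude(total, active):.1f} orders of magnitude")
```

With these assumed numbers, about 0.5% of the parameters are active per token, which is a gap of about 2.3 orders of magnitude.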