0 points
zozbot234
19d ago
Qwen 27B's context (KV cache) maxes out at 16GB. A nice thing about DeepSeek V4, especially Flash, is that its KV cache stays tiny even at 1M tokens! That in turn opens up wide batching on common consumer platforms.
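To see why KV cache size, rather than weight size, limits batching, here is a minimal sketch of the standard KV cache arithmetic for a grouped-query-attention transformer. Every sequence in a batch needs its own cache, so free memory divided by per-sequence cache size bounds the batch. The layer count, KV head count, and head dimension are hypothetical values chosen to land on 16 GiB. They are not Qwen's published configuration, and whatever cache compression DeepSeek V4 uses is not modeled here.

```python
GIB = 1024**3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """KV cache size for ONE sequence in a standard GQA transformer.

    The factor 2 is one key and one value vector per KV head, per layer, per token.
    bytes_per_elem=2 assumes an fp16/bf16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def max_batch(mem_budget_bytes, weight_bytes, kv_bytes_per_seq):
    """Number of full-length sequences that fit once the weights are resident."""
    free = mem_budget_bytes - weight_bytes
    return max(0, free // kv_bytes_per_seq)


# Hypothetical dense config (NOT any real model's published numbers).
per_seq = kv_cache_bytes(n_layers=64, n_kv_heads=4, head_dim=128, seq_len=128 * 1024)
print(per_seq / GIB)  # 16.0
# Hypothetical 48 GiB budget with 16 GiB of resident weights.
print(max_batch(48 * GIB, 16 * GIB, per_seq))  # 2
```

If the weights are streamed rather than held in memory, `weight_bytes` shrinks and the batch is bounded almost entirely by `kv_bytes_per_seq`. A model whose per-sequence cache stays small at long context can then fit many more sequences in the same budget.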
3 comments · 1 top-level
lostmsu
19d ago
· 2 in thread
DeepSeek V4 Flash is 160GB while Qwen 27B is about 27GB. You can't even run DS Flash on consumer platforms, let alone batch it.
zozbot234
OP
19d ago
These are the sizes of the model weights, not the KV cache. For MoE models the weights are a sparse read workload, so they can be streamed from SSD.
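A rough sketch of what streaming from SSD implies for decode speed: if only a fraction of the weights is active per token and nothing is cached in RAM, throughput is bounded by SSD bandwidth divided by the active bytes per token. Only the 160GB total comes from this thread; the 10% active fraction and 7 GB/s SSD bandwidth are illustrative assumptions.

```python
def streamed_tokens_per_second(total_weight_bytes, active_fraction, ssd_bytes_per_s):
    """Pessimistic decode-speed bound when active weights are re-read from SSD per token.

    Assumes no RAM caching of hot experts or shared layers. Real systems usually
    cache those, so actual throughput should be higher than this bound.
    """
    bytes_per_token = total_weight_bytes * active_fraction
    return ssd_bytes_per_s / bytes_per_token


# 160 GB total (from the thread); 10% active and 7 GB/s are assumptions.
rate = streamed_tokens_per_second(160e9, 0.10, 7e9)
print(f"{rate:.2f} tokens/s")
```

Under these assumptions that comes to about 0.44 tokens/s for a single sequence, which is why the argument hinges on whether one streamed read can serve many sequences at once.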
lostmsu
18d ago
You can't batch MoE.
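One way to read this objection: in a dense model every sequence in a batch reuses the same weights, so bytes read per step stay flat as the batch grows. With MoE routing, each extra token tends to pull in experts the others didn't need. A minimal sketch, assuming each token independently picks `top_k` distinct experts uniformly at random. Real routers are not uniform, and the 256/8 layer shape is hypothetical.

```python
def expected_distinct_experts(n_experts, top_k, batch_size):
    """Expected number of distinct experts one MoE layer must load for a batch.

    Assumes uniform, independent routing: a given expert is skipped by one
    token with probability 1 - top_k / n_experts, and by every token in the
    batch with that probability raised to batch_size.
    """
    p_skipped_by_all = (1 - top_k / n_experts) ** batch_size
    return n_experts * (1 - p_skipped_by_all)


# Hypothetical layer: 256 routed experts, 8 active per token.
for batch in (1, 8, 32, 128):
    touched = expected_distinct_experts(256, 8, batch)
    print(f"batch={batch:4d}  experts touched ~ {touched:6.1f} of 256")
```

Under these assumptions a single token touches 8 experts, while a batch of 32 touches roughly 163. That is about 20x the weight reads for 32x the tokens, so batching still amortizes some reads, but far less than in a dense model, where reads per step stay constant.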