0 points
xienze
1mo ago
I don't want to be a jerk but 31t/s prefill is basically unusable in an agentic situation. A mere 10k in context and you're sitting there for 5+ minutes before the first token is generated.
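To sanity-check the arithmetic above, here is a minimal sketch that assumes prefill time is simply prompt length divided by prefill throughput, ignoring any fixed startup cost. The function name `prefill_seconds` is made up for illustration.

```python
def prefill_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Estimate time to first token, assuming prefill is the only cost."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prompt_tokens / tokens_per_second


# Figures from the comment: 10k tokens of context at 31 t/s prefill.
wait = prefill_seconds(10_000, 31)
print(f"{wait:.0f} s, about {wait / 60:.1f} min")
```

At 31 t/s that works out to a bit over 320 seconds, which matches the "5+ minutes" figure.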
5 comments · 3 top-level
fgfarben
1mo ago
That prefill number isn't right. M4 Max hits 200-300 t/s:
https://github.com/antirez/ds4/blob/main/speed-bench/m4_max_...
hadlock
1mo ago
M5 studio is gonna sell like hot cakes
aiscoming
1mo ago
if it's just the coding agent system prompt and tools, you can cache that
xienze
OP
1mo ago
Yeah the problem is that's just the start of the context. There's, you know, all the tool call results and file reads and stuff.
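A rough sketch of why caching only the prefix helps less than it sounds: assuming a cached prefix costs nothing to reuse, only the remaining tokens need prefill. The function name and the 3,000-token prefix size are hypothetical, chosen purely for illustration. The 10k context and 31 t/s come from the original comment.

```python
def uncached_prefill_seconds(
    context_tokens: int, cached_prefix_tokens: int, tokens_per_second: float
) -> float:
    """Prefill time when a leading chunk of the context is already cached."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 <= cached_prefix_tokens <= context_tokens:
        raise ValueError("cached prefix must fit inside the context")
    # Only tokens after the cached prefix have to be processed.
    return (context_tokens - cached_prefix_tokens) / tokens_per_second


# Hypothetical split: 3,000 cached tokens (system prompt + tools) out of 10,000.
wait = uncached_prefill_seconds(10_000, 3_000, 31)
print(f"{wait:.0f} s, about {wait / 60:.1f} min")
```

Under those assumed numbers the wait is still a bit under four minutes, since the tool call results and file reads make up the uncached remainder.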
1 more reply
throwdbaaway
1mo ago
Hah, that's because the prompt itself was only about 30 tokens. We need a much bigger prompt to properly test PP (prompt processing).