0 points · xienze · 1mo ago · 0 comments

I don't want to be a jerk, but 31 t/s prefill is basically unusable in an agentic situation. A mere 10k tokens of context and you're sitting there for 5+ minutes before the first token is generated.
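
To make the arithmetic behind that estimate explicit, here is a rough sketch. It assumes time to first token is simply context size divided by prefill rate, ignoring model load time and any caching; the function name is made up for illustration. The 200 and 300 t/s inputs are the M4 Max range quoted in a reply in this thread.

```python
def time_to_first_token_s(context_tokens: int, prefill_tps: float) -> float:
    """Rough wait before the first generated token, in seconds.

    Assumption: the whole context is prefilled at a constant rate and
    nothing is served from a cache.
    """
    if prefill_tps <= 0:
        raise ValueError("prefill_tps must be positive")
    return context_tokens / prefill_tps


# 10k tokens of context at the 31 t/s figure under discussion
slow_s = time_to_first_token_s(10_000, 31)

# Same context at the 200-300 t/s range quoted for the M4 Max
fast_low_s = time_to_first_token_s(10_000, 200)
fast_high_s = time_to_first_token_s(10_000, 300)

print(f"31 t/s:  {slow_s:.0f} s ({slow_s / 60:.1f} min)")
print(f"200 t/s: {fast_low_s:.0f} s")
print(f"300 t/s: {fast_high_s:.0f} s")
```

At 31 t/s this works out to a bit over five minutes, which matches the "5+ minutes" claim; at 200-300 t/s the same context takes well under a minute.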

5 comments · 3 top-level

fgfarben · 1mo ago · 1 in thread

That prefill number isn't right. M4 Max hits 200-300 t/s: https://github.com/antirez/ds4/blob/main/speed-bench/m4_max_...

hadlock · 1mo ago

M5 studio is gonna sell like hot cakes

aiscoming · 1mo ago · 1 in thread

if it's just the coding agent system prompt and tools, you can cache that

xienze (OP) · 1mo ago

Yeah the problem is that's just the start of the context. There's, you know, all the tool call results and file reads and stuff.
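
A back-of-the-envelope sketch of why caching the prefix only helps so much. It assumes cached tokens cost nothing to reuse and that only the uncached remainder is prefilled, which is a simplification. The 4k/6k split between system prompt plus tools and tool results is a hypothetical example, not a measurement, and the function name is made up for illustration.

```python
def prefill_wait_s(total_tokens: int, cached_prefix_tokens: int,
                   prefill_tps: float) -> float:
    """Seconds spent prefilling the part of the context that is not cached.

    Assumption: reusing the cached prefix is free, so only the remaining
    tokens are processed at prefill_tps.
    """
    if prefill_tps <= 0:
        raise ValueError("prefill_tps must be positive")
    if not 0 <= cached_prefix_tokens <= total_tokens:
        raise ValueError("cached_prefix_tokens must be between 0 and total_tokens")
    return (total_tokens - cached_prefix_tokens) / prefill_tps


# Hypothetical split: 4k tokens of system prompt + tools (cacheable),
# 6k tokens of tool call results and file reads (new on this turn).
no_cache_s = prefill_wait_s(10_000, 0, 31)
with_cache_s = prefill_wait_s(10_000, 4_000, 31)

print(f"no cache:      {no_cache_s:.0f} s")
print(f"prefix cached: {with_cache_s:.0f} s")
```

Under these assumptions the cached prefix saves about two minutes, but the uncached tool results still take over three minutes at 31 t/s.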

throwdbaaway · 1mo ago

Hah, that's because the prompt itself was only about 30 tokens. We need a much bigger prompt to properly test PP (prompt processing).
