0 points
lostmsu
1mo ago
Qwen recommends setting preserve_thinking: true for agentic/coding workloads.
rayboy1995
1mo ago
Thanks!! I had disabled that previously while debugging. I can confirm this is helping accuracy, from what I can tell so far. (And speed, since the cache is preserved more often!)
satvikpendem
1mo ago
Use the MTP models, which give 2x token generation speed, for example:
https://unsloth.ai/docs/models/qwen3.6#mtp-guide
rayboy1995
1mo ago
Very interesting, I'll have to check this out. Thank you. This is why I love HN.