undefined | Better HN

0 pointslostmsu1mo ago0 comments

Qwen recommends to preserve_thinking: true for agentic/coding workloads.

0 comments

3 comments · 1 top-level

rayboy19951mo ago· 2 in thread

Thanks!! I had disabled that previously while debugging, I can confirm this is helping accuracy from what I can tell so far. (And speed since the cache is preserved more often!)

satvikpendem1mo ago

Use the MTP models which 2x token generation speed, for example: https://unsloth.ai/docs/models/qwen3.6#mtp-guide

rayboy19951mo ago

Very interesting I'll have to check this out thank you. This is why I love HN.

j / k navigate · click thread line to collapse