To be fair: if you're happy with ds4 then IMO stick with it!
Step 3.7 is notably better than 3.5
1. Use the official StepFun GGUF, IQ4_XS - theirs is better tuned in my experience than the other quants
2. Temp 1.0 top_p 0.95 sampling parameters for reasoning/agentic coding
3. It's really quite important that you don't quantise the KV cache: it made a surprising amount of difference to the looping and over thinking I found, at least for the quantised version of the model. I'm using the full F16 for K, and Q8 for V
4. Note that it now supports `reasoning_effort: low|medium|high` in your chat_template_kwargs; this is super useful :)