undefined | Better HN

0 pointsttoinou11d ago0 comments

I tried Step 3.7 Flash on my mac 128GB and it seemed very dumb. antirez ds4 flash is much better !

0 comments

3 comments · 1 top-level

girvo11d ago· 2 in thread

It isn’t though, I’ve run both through a bunch of coding evals. You nearly certainly didn’t have the right sampling parameters or quantised the KV cache?

Ds4 is impressive for what it is, but it loops and over thinks even more, burning massive wall clock time to not even get great outcomes. It’s also limited to a slow speed on my Spark

ttoinouOP11d ago

I tried a bunch of stuff with step 3.5 and step 3.7 maybe not as much as you. Could you tell me what parameters and launched you’re using ? Antirez ds4 flash q2-q4 works almost out of the box for me

girvo11d ago

To be fair: if you're happy with ds4 then IMO stick with it!

Step 3.7 is notably better than 3.5

1. Use the official StepFun GGUF, IQ4_XS - theirs is better tuned in my experience than the other quants

2. Temp 1.0 top_p 0.95 sampling parameters for reasoning/agentic coding

3. It's really quite important that you don't quantise the KV cache: it made a surprising amount of difference to the looping and over thinking I found, at least for the quantised version of the model. I'm using the full F16 for K, and Q8 for V

4. Note that it now supports `reasoning_effort: low|medium|high` in your chat_template_kwargs; this is super useful :)

1 more reply

j / k navigate · click thread line to collapse