a_wild_dandan 2y ago

Do you know how Deepseek 33b compares to 6.7b? I'm trying 33b on my (96GB) MacBook just because I have plenty of spare (V)RAM. But I'll run the smaller model if the benefits are marginal in other people's experience.

5 comments · 2 top-level

wokwokwok 2y ago

The smaller model is great at trivial day-to-day tasks.

However, when you ask hard things, it struggles; you can ask the same question 10 times, and only get 1 answer that actually answers the question.

...but the larger model is a lot slower.

Generally, if you don't want to mess around swapping models, stick with the bigger one. It's better.

However, if you are using it heavily, you'll find the speed is a pain in the ass, and when you want a trivial hint like 'how do I do a map statement in Kotlin again?', you really don't need it.

What I have set up personally is a little thumbs-up / thumbs-down on the suggestions via a custom IntelliJ plugin; if I 'thumbs-down' a result, it generates a new solution for it.

If I 'thumbs-down' it twice, it swaps to the larger model to generate a solution for it.

This kind of 'use an ok model for most things and step up to the larger model when you start asking hard stuff' approach scales very nicely for my personal workflow... but I admit that setting it up was a pain, and I'm forever pissing around with the plugin code to fix tiny bugs, time I would prefer to spend doing actual work.
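The plugin itself isn't shared in this thread, so the following is only a minimal Python sketch of the escalation rule described above: regenerate on the first thumbs-down, switch to the larger model on the second. `EscalatingSuggester`, `Generate`, `escalate_after` and the reset-on-thumbs-up behaviour are hypothetical names and assumptions, not the actual plugin code; `small` and `large` stand in for whatever calls your two models.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function that takes a prompt and returns a completion.
Generate = Callable[[str], str]


@dataclass
class EscalatingSuggester:
    """Start on the small model; step up to the large one after repeated thumbs-down."""

    small: Generate
    large: Generate
    escalate_after: int = 2  # thumbs-downs before switching (taken from the comment above)
    thumbs_down: int = 0

    def suggest(self, prompt: str) -> str:
        # Use the large model only once enough suggestions have been rejected.
        use_large = self.thumbs_down >= self.escalate_after
        model = self.large if use_large else self.small
        return model(prompt)

    def reject(self, prompt: str) -> str:
        """Record a thumbs-down and generate a replacement suggestion."""
        self.thumbs_down += 1
        return self.suggest(prompt)

    def accept(self) -> None:
        """Thumbs-up: assumed to reset, so the next question starts on the small model."""
        self.thumbs_down = 0
```

With two stub models, the first `reject` regenerates with the small model and the second one switches to the large model. Keeping the counter per question rather than per session is a design choice you would probably want in a real plugin.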

So... there's not really much tooling out there at the moment to support it, but the best solution really is to use both.

If you don't want to and just want 'use the best model for everything', stick with the bigger one.

The larger model is more capable of turning 'here is a description of what I want' into 'here is code that does it that actually compiles'.

The smaller model is much better at 'I want a code fragment that does X' -> 'rephrased stack overflow answer'.

tarruda 2y ago

> but the larger model is a lot slower.

I found the performance to be very acceptable for 33b 4-bit on an M3 Max with 36GB RAM (much faster than reading speed).

wokwokwok 2y ago

I’m not sure what to say; responsive, fast output is ideal, and the larger model is distinctly slower for me, particularly for long completions (2k tokens) if you’re using a restricted grammar like JSON output.

I’m using an M2 not an M3 though; maybe it’s better for you.

I was under the impression quantised results were generally slower too, but I’ve never dug into it (or particularly noticed a difference between q4/q5/q6).

If you find it fast enough to use then go for it~

bravura 2y ago

Do you mind sharing your plugin as a gist?

How do you run both models in memory? Two separate processes?

jyap 2y ago

You would want to test it out manually day to day. That’s always the best. Some models can outscore others on benchmarks but not actually be “better” when you use them.

But there are also the benchmarks: https://github.com/deepseek-ai/deepseek-coder

33B Instruct doesn’t beat 6.7B Instruct by much but maybe those % improvements mean more for your usage.

I run 6.7B since I have 16GB RAM.

Quantization of the model also makes a difference.
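For the memory side of this, a rough back-of-the-envelope sketch in Python: the weights alone take about (parameter count × bits per weight ÷ 8) bytes. This is only an estimate under stated assumptions. It ignores the KV cache, context length and runtime overhead, and real quantization formats often use somewhat more than their nominal bits per weight. `weight_memory_gb` is a made-up helper name.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    # params_billions * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB
    return params_billions * bits_per_weight / 8


# Compare the two model sizes discussed here at a few nominal precisions.
for name, params in [("6.7B", 6.7), ("33B", 33.0)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

By this estimate 33B at 4-bit is about 16.5 GB of weights, which is consistent with it running on a 36GB machine but not comfortably on a 16GB one.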