undefined | Better HN

0 pointsantirez1y ago0 comments

The model performs very poorly in practice, while in the benchmark it is shown to be DeepSeek V3 level. It's not terrible but it's at another level compared to the models it is very close to (a bit better / a bit worse) in the benchmarks.

0 comments

anon3738391y ago

I’d recommend trying it on Google AI Studio (aistudio.google.com). I am getting exceptional results on a handful of novel problems that require deep domain knowledge and structured reasoning. I’m not able to replicate this performance with Ollama, so I suspect something is a bit off.

tarruda1y ago

Same experience here: On AI Studio, this is easily one of the strongest models I have used, including when compared to proprietary LLMs.

But ollama and openwebui performance is very bad, even when running the FP16 version. I also tried to mirror some of AI studio settings (temp 1 and top p 0.95) but couldn't get it to produce anything useful.

I suspect there's some bug in the ollama releases (possibly wrong conversation delimiters?). If this is fixed, I will definitely start using Gemma 3 27b as my main model.

anon3738391y ago

Update: Unsloth is recommending a temperature of 0.1, not 1.0, if using Ollama. I don’t know why Ollama would require a 10x lower value, but it definitely helped. I also read some speculation that there might be an issue with the tokenizer.

pzo1y ago

Maybe model is sensitive to quantization, by default ollama quantize it significantly.

tarruda1y ago

I tried ollama fp16 and it had the same issues.

alekandreev1y ago

Hey, Gemma engineer here. Can you please share reports on the type of prompts and the implementation you used?

antirezOP1y ago

Hello! I tried to show it Redis code yet not released (llama.cpp 4 bit quants and the official web interface) and V3 can reason about the design tradeoffs, but (very understandably) Gemma 3 can't. I also tried to make it write a simple tic tac toe Montecarlo program, and it didn't account for ties, while SOTA models consistently do.

tarruda1y ago

Can you share the all the recommended settings to run this LLM? It is clear that the performance is very good when running on AI studio. If possible, I'd like to use the all the same settings (temp, top-k, top-p, etc) on Ollama. AI studio only shows Temperature, top-p and output length.

sgt1011y ago

vibe testing, vibe model engineering...

kamranjon1y ago

I really respect the work that you've done, but I am always very surprised when people just speak anecdotally as though it is truth with regards to AI models these days. It's as if everyone believes they are an expert now, but have nothing of substance to provide but their gut feelings.

It's as if people don't realize that these models are used for many different purposes, and subjectively one person could think one model is amazing and another person think it's awful. I just would hope that we could at least back up statements like "The model performs very poorly in practice" with actual data or at least some explanation of how it performed poorly.

tarruda1y ago

In my experience, Gemma models were always bad at coding (but good at other tasks).

j / k navigate · click thread line to collapse

0 comments

anon3738391y ago

tarruda1y ago

Same experience here: On AI Studio, this is easily one of the strongest models I have used, including when compared to proprietary LLMs.

I suspect there's some bug in the ollama releases (possibly wrong conversation delimiters?). If this is fixed, I will definitely start using Gemma 3 27b as my main model.

anon3738391y ago

pzo1y ago

Maybe model is sensitive to quantization, by default ollama quantize it significantly.

tarruda1y ago

I tried ollama fp16 and it had the same issues.

alekandreev1y ago

Hey, Gemma engineer here. Can you please share reports on the type of prompts and the implementation you used?

antirezOP1y ago

tarruda1y ago

sgt1011y ago

vibe testing, vibe model engineering...

kamranjon1y ago

tarruda1y ago

In my experience, Gemma models were always bad at coding (but good at other tasks).

j / k navigate · click thread line to collapse