I had issues with Qwen thinking endlessly when I didn’t know I wasn’t using the temp/top_k/min_p/etc settings specified in the readme. I’ve never had an issue with Gemma 4 thinking endlessly but could possibly be the same.
I use Ollama and kinda just assumed that Ollama would have everything except for context length (which I've explicitly overwritten) setup properly for me.