I'm grateful for anyone's contributions to anything, but I kinda shake my head about ollama. The reason stuff like this happens is that they do the absolute minimum necessary to get the latest model running, not working.
I make a llama.cpp wrapper myself, and it's somewhat frustrating putting in the effort for everything from big, obvious UX things, like erroring when the context is too small for your input instead of just making you think the model is crap, to long-haul engineering commitments, like integrating new models with llama.cpp's new tool-calling infra and testing them to make sure they, well, actually work.
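The context-size check is simple in spirit. Here's a minimal sketch of the idea; `count_tokens`, `check_fits`, and `ContextTooSmallError` are hypothetical names, and the whitespace-split tokenizer is a placeholder for the model's real one:

```python
class ContextTooSmallError(ValueError):
    """Raised instead of silently truncating the prompt."""


def count_tokens(prompt: str) -> int:
    # Placeholder: whitespace split. A real wrapper would call the
    # model's own tokenizer (e.g. via llama.cpp) here instead.
    return len(prompt.split())


def check_fits(prompt: str, n_ctx: int, n_reserve: int = 256) -> int:
    """Fail loudly when the prompt leaves no room for generation.

    n_reserve is the headroom kept for the model's output tokens.
    """
    n_prompt = count_tokens(prompt)
    if n_prompt + n_reserve > n_ctx:
        raise ContextTooSmallError(
            f"prompt is {n_prompt} tokens but context is only {n_ctx}; "
            f"raise n_ctx or shorten the input"
        )
    return n_prompt
```

The point isn't the arithmetic, it's surfacing the failure to the user instead of letting a silently truncated prompt make the model look broken.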
I keep telling myself that this sort of effort pays off a year or two down the road, once all that day-to-day differentiation in effort adds up. I hope :/