https://github.com/ggerganov/llama.cpp/pull/11016#issuecomme...
It’s amazing to see others build on top of open-source projects. Forks like RamaLama are exactly what open source is all about. Developers with different design philosophies can still collaborate in the open for everyone’s benefit.
Some folks on the Ollama team have contributed directly to the OCI spec, so naturally we started with tools we know best. But we made a conscious decision to deviate because AI models are massive in size - on the order of gigabytes - and we needed performance optimizations that the existing approaches didn’t offer.
We have not forked llama.cpp. Ollama is a project written in Go, so naturally we built our own server-side serving in server.go. Now we are beginning to hit performance, reliability, and model-support problems. This is why we have begun the transition to Ollama's new engine, which will utilize multiple engine designs. Ollama is then naturally responsible for the portability between different engines.
I did see the complaint about Ollama not using Jinja templates. Ollama is written in Go. I’m listening but it seems to me that it makes perfect sense to support Go templates.
We are only a couple of people, building in the open. If this counts as vendor lock-in, I'm not sure what vendor lock-in is.
You can check the source code: https://github.com/ollama/ollama
Those rejected README changes only served to provide greater transparency to would-be users, and here we are a year and a half later with woefully inadequate movement on that front.
I am very glad folks are working on alternatives.
EDIT: I'm seeing a newly added comment in the Vulkan PR GitHub thread, at https://github.com/ollama/ollama/pull/5059#issuecomment-2628... . Quite overdue, but welcome nonetheless!
There isn't an about section, a tiny snippet in a FAQ somewhere, nothing.
I'm glad there is a more open source alternative to Ollama now.
While we're at it, is there already some kind of standardized local storage location/scheme for LLM models? If not, this project could potentially be a great place to set an example that others can follow, if they want. I've been playing with different runtimes (Ollama, vLLM) over the last few days, and I really would have appreciated better interoperability in terms of shared model storage, instead of everybody defaulting to downloading everything all over again.
If there was a contract about how models were laid out on disk, then downloading, managing and tracking model weights could be handled by a different tool or subsystem.
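For illustration, such a contract could be as small as "blobs live under `<store>/blobs/`, named `<algo>-<hex>`". A minimal sketch of what a shared tool could do with it; the layout, the `-` separator, and the function names are assumptions here, not an agreed standard:

```python
import hashlib
import pathlib


def blob_path(store: pathlib.Path, digest: str) -> pathlib.Path:
    # Hypothetical shared convention: algorithm and hex digest joined
    # with '-' so the file name is also valid on Windows.
    algo, _, hexpart = digest.partition(":")
    return store / "blobs" / f"{algo}-{hexpart}"


def verify(store: pathlib.Path, digest: str) -> bool:
    """Check a blob against its digest. Any tool following the same
    contract could verify integrity without knowing which runtime
    originally downloaded the weights."""
    algo, _, hexpart = digest.partition(":")
    h = hashlib.new(algo)
    with open(blob_path(store, digest), "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == hexpart
```

With something like this pinned down, "download" and "serve" become separable concerns: one tool fills the store, any other tool resolves and verifies from it.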
ollama stores things under ~/.ollama/models/blobs/ named sha256-whatevershaisit
ramalama stores things under ~/.local/share/ramalama/repos/ollama/blobs/ named sha256:whatevershaisit
Note the ":" in ramalama names instead of the "-" .. that may not fly under windows.
If one crosslinks ramalama blobs over to ollama with that slight rename, ollama will remove them, since they were not pulled via ollama itself and carry no metadata.
I guess vLLM and everybody else each have yet another schema and/or metadata format.
BTW, on Arch Linux there is currently llm-manager (pointing to https://github.com/xyproto/llm-manager ), but it is made dependent on some of the ollama packages and can't be installed on its own without forcing the package manager.
I wish your app could add some model as a dependency and, on install, download it only if that model is not already available locally. It could also check whether ollama is installed and only bootstrap it if it doesn't already exist on the drive. Maybe with a nice interface for the user to confirm the download, and nice onboarding.
Once you have private models and RAG, I believe you will want to run these models and data on edge devices and in Kubernetes clusters. Getting the AI models and data into OCI content would allow us to take advantage of content signing, trust, and mirroring, and make running AI in production easier.
It would also allow users to block access to outside "untrusted" AI models stored on the internet, so companies can restrict themselves to "trusted" AI only.
Since companies already have OCI registries, it makes sense to store your AI models and content in the same location.
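As a sketch of why OCI helps here: content addressing means a manifest pins the exact weights by digest, which is the foundation signing and mirroring build on. The media types below are illustrative placeholders, not a settled spec for model artifacts:

```python
import hashlib


def model_manifest(weights: bytes) -> dict:
    """Wrap model weights as a single layer in an OCI-style manifest.
    A registry, mirror, or signer can then verify the blob against
    the pinned digest without trusting where it was fetched from."""
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "layers": [{
            # Hypothetical media type for model weights.
            "mediaType": "application/vnd.example.model.weights",
            "digest": "sha256:" + hashlib.sha256(weights).hexdigest(),
            "size": len(weights),
        }],
    }


def verify_layer(weights: bytes, manifest: dict) -> bool:
    """Reject any blob whose digest or size differs from the manifest."""
    layer = manifest["layers"][0]
    return ("sha256:" + hashlib.sha256(weights).hexdigest() == layer["digest"]
            and len(weights) == layer["size"])
```

Because the manifest itself can be signed, an internal mirror only has to trust the signature, and every blob underneath verifies mechanically.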
Strange. At the same time I see numerous items that are on the front page posted 2 hours or older with fewer points.
I'm willing to take a reputation hit on this meta post. I wonder why this got demoted so quickly from front page despite people clearly voting on it. I wonder if it has anything to do with being backed by YC.
I sincerely hope it's just my misunderstanding of the HN algorithm, though.
When is that a problem?
Based on the linked issue in eigenvalue's comment[1], this seems like a very good thing. It sounds like ollama is up to no good and this is a good drop-in replacement. What is the deeper problem being solved here though, about configuring the host? I've not run into any such issue.
It's been a while since I used llama.cpp directly, and I don't know whether I'm correct about its current scope.
We write the higher-level abstractions in python3 (with no dependencies on Python libraries outside of the standard library) because the heavy lifting is done in C++. Python is also a nice, community-friendly language; many people know how to write it.
That being said, considering what you've done with Open Home and Home Assistant (which has run my home for years, thank you!), perhaps there is some hope of an open standard in the near future.
Hopefully both will be easy for users to play around with, but with RamaLama it should be easier to get your PR merged as a developer and to swap out different registries. Vendor lock-in is rarely a good thing in the world of open source.
The killer features of Ollama for me right now are the nice library of quantized models and the ability to automatically start and stop serving models in response to incoming requests and timeouts. The first seems to be solved by reusing the Ollama models, but from my cursory look I can't tell whether the second is possible.
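On that second feature, the mechanism itself is simple enough that it could sit in front of almost any runtime: start the model process on the first request, reset an idle timer on each subsequent one, stop on expiry. A minimal sketch; `IdleStopper` and the start/stop callbacks are hypothetical stand-ins, not code from either project:

```python
import threading


class IdleStopper:
    """Start a model server on demand and stop it after an idle
    timeout. `start` and `stop` are stand-ins for launching and
    terminating a real inference process."""

    def __init__(self, start, stop, idle_seconds: float = 300.0):
        self.start, self.stop = start, stop
        self.idle = idle_seconds
        self.timer = None
        self.running = False
        self.lock = threading.Lock()

    def on_request(self):
        with self.lock:
            if not self.running:
                self.start()          # first request: spin the model up
                self.running = True
            if self.timer:
                self.timer.cancel()   # any request resets the idle clock
            self.timer = threading.Timer(self.idle, self._expire)
            self.timer.daemon = True
            self.timer.start()

    def _expire(self):
        with self.lock:
            if self.running:
                self.stop()           # idle timeout reached: tear down
                self.running = False
```

A proxy calling `on_request()` before forwarding each API call would get the Ollama-style behavior with any backend.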
https://fosdem.org/2025/schedule/event/fosdem-2025-4486-rama...