https://github.com/ggerganov/llama.cpp/pull/11016#issuecomme...
It’s amazing to see others build on top of open-source projects. Forks like RamaLama are exactly what open source is all about. Developers with different design philosophies can still collaborate in the open for everyone’s benefit.
Some folks on the Ollama team have contributed directly to the OCI spec, so naturally we started with tools we know best. But we made a conscious decision to deviate because AI models are massive in size - on the order of gigabytes - and we needed performance optimizations that the existing approaches didn’t offer.
We have not forked llama.cpp. Ollama is a project written in Go, so naturally we built our own server-side serving in server.go. Now we are beginning to hit performance, reliability, and model-support problems. This is why we have begun the transition to Ollama's new engine, which will utilize multiple engine designs. Ollama is then naturally responsible for the portability between different engines.
I did see the complaint about Ollama not using Jinja templates. Ollama is written in Go. I’m listening but it seems to me that it makes perfect sense to support Go templates.
We are only a couple of people, building in the open. If this counts as vendor lock-in, I'm not sure what vendor lock-in is.
You can check the source code: https://github.com/ollama/ollama
Those rejected README changes only served to provide greater transparency to would-be users, and here we are a year and a half later with woefully inadequate movement on that front.
I am very glad folks are working on alternatives.
EDIT: I'm seeing a newly added comment in the Vulkan PR GitHub thread, at https://github.com/ollama/ollama/pull/5059#issuecomment-2628... . Quite overdue, but welcome nonetheless!
There isn't an about section, a tiny snippet in a FAQ somewhere, nothing.
I'm glad there is a more open source alternative to Ollama now.
While we're at it, is there already some kind of standardized local storage location/scheme for LLM models? If not, this project could potentially be a great place to set an example that others can follow, if they want. I've been playing with different runtimes (Ollama, vLLM) over the last few days, and I really would have appreciated better interoperability in terms of shared model storage, instead of everybody defaulting to downloading everything all over again.
If there was a contract about how models were laid out on disk, then downloading, managing and tracking model weights could be handled by a different tool or subsystem.
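For illustration, such a contract could be as small as "blobs live under `<store>/blobs/`, named `<algo>-<hex>`". A minimal sketch of what a shared tool could do with it; the layout, the `-` separator, and the function names are assumptions here, not an agreed standard:

```python
import hashlib
import pathlib


def blob_path(store: pathlib.Path, digest: str) -> pathlib.Path:
    # Hypothetical shared convention: algorithm and hex digest joined
    # with '-' so the file name is also valid on Windows.
    algo, _, hexpart = digest.partition(":")
    return store / "blobs" / f"{algo}-{hexpart}"


def verify(store: pathlib.Path, digest: str) -> bool:
    """Check a blob against its digest. Any tool following the same
    contract could verify integrity without knowing which runtime
    originally downloaded the weights."""
    algo, _, hexpart = digest.partition(":")
    h = hashlib.new(algo)
    with open(blob_path(store, digest), "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == hexpart
```

With something like this pinned down, "download" and "serve" become separable concerns: one tool fills the store, any other tool resolves and verifies from it.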
ollama stores things under ~/.ollama/models/blobs/ named sha256-whatevershaisit
ramalama stores things under ~/.local/share/ramalama/repos/ollama/blobs/ named sha256:whatevershaisit
Note the ":" in ramalama names instead of the "-" .. that may not fly under windows.
If one crosslinks ramalama blobs over to ollama with that slight rename, ollama will remove them, since they were not pulled via ollama itself and carry no metadata.
I guess vLLM and everybody else each have yet another schema and/or metadata format.
BTW, on Arch Linux there is currently llm-manager (pointing to https://github.com/xyproto/llm-manager ), but it is made dependent on some of the ollama packages and can't be installed on its own without forcing the package manager.
I wish your app could add some model as a dependency and, on install, download it only if that model is not already available locally. It could also check whether ollama is installed and only bootstrap it if it doesn't already exist on the drive. Maybe with a nice interface for the user to confirm the download, and nice onboarding.
Once you have private models and RAG, I believe you will want to run these models and data on edge devices and in Kubernetes clusters. Getting the AI models and data into OCI content would allow us to take advantage of content signing, trust, and mirroring, and make running AI in production easier.
It would also allow users to block access to outside "untrusted" AI models stored on the internet, so companies can restrict themselves to "trusted" AI only.
Since companies already have OCI registries, it makes sense to store your AI models and content in the same location.
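As a sketch of why OCI helps here: content addressing means a manifest pins the exact weights by digest, which is the foundation signing and mirroring build on. The media types below are illustrative placeholders, not a settled spec for model artifacts:

```python
import hashlib


def model_manifest(weights: bytes) -> dict:
    """Wrap model weights as a single layer in an OCI-style manifest.
    A registry, mirror, or signer can then verify the blob against
    the pinned digest without trusting where it was fetched from."""
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "layers": [{
            # Hypothetical media type for model weights.
            "mediaType": "application/vnd.example.model.weights",
            "digest": "sha256:" + hashlib.sha256(weights).hexdigest(),
            "size": len(weights),
        }],
    }


def verify_layer(weights: bytes, manifest: dict) -> bool:
    """Reject any blob whose digest or size differs from the manifest."""
    layer = manifest["layers"][0]
    return ("sha256:" + hashlib.sha256(weights).hexdigest() == layer["digest"]
            and len(weights) == layer["size"])
```

Because the manifest itself can be signed, an internal mirror only has to trust the signature, and every blob underneath verifies mechanically.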
Strange. At the same time I see numerous items that are on the front page posted 2 hours or older with fewer points.
I'm willing to take a reputation hit on this meta post. I wonder why this got demoted so quickly from front page despite people clearly voting on it. I wonder if it has anything to do with being backed by YC.
I sincerely hope it's just my misunderstanding of the HN algorithm, though.
When is that a problem?
Based on the linked issue in eigenvalue's comment[1], this seems like a very good thing. It sounds like ollama is up to no good and this is a good drop-in replacement. What is the deeper problem being solved here though, about configuring the host? I've not run into any such issue.
It's been a while since I used llama.cpp directly, and I don't know whether I'm correct about its current scope.
We write the higher-level abstractions in python3 (with no dependencies on Python libraries outside of the standard library) because the heavy lifting is done in C++. Python is also a nice, community-friendly language; many people know how to write it.
That being said, considering what you've done with Open Home and Home Assistant (which has run my home for years, thank you!), perhaps there is some hope of an open standard in the near future.
Hopefully both will be easy for users to play around with, but with RamaLama it should be easier to get your PR merged as a developer and to swap out different registries. Vendor lock-in is rarely a good thing in the world of open source.
The killer features of Ollama for me right now are the nice library of quantized models and the ability to automatically start and stop serving models in response to incoming requests and timeouts. The first seems to be solved by reusing the Ollama models, but from my cursory look I can't tell whether the second is possible.
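On that second feature, the mechanism itself is simple enough that it could sit in front of almost any runtime: start the model process on the first request, reset an idle timer on each subsequent one, stop on expiry. A minimal sketch; `IdleStopper` and the start/stop callbacks are hypothetical stand-ins, not code from either project:

```python
import threading


class IdleStopper:
    """Start a model server on demand and stop it after an idle
    timeout. `start` and `stop` are stand-ins for launching and
    terminating a real inference process."""

    def __init__(self, start, stop, idle_seconds: float = 300.0):
        self.start, self.stop = start, stop
        self.idle = idle_seconds
        self.timer = None
        self.running = False
        self.lock = threading.Lock()

    def on_request(self):
        with self.lock:
            if not self.running:
                self.start()          # first request: spin the model up
                self.running = True
            if self.timer:
                self.timer.cancel()   # any request resets the idle clock
            self.timer = threading.Timer(self.idle, self._expire)
            self.timer.daemon = True
            self.timer.start()

    def _expire(self):
        with self.lock:
            if self.running:
                self.stop()           # idle timeout reached: tear down
                self.running = False
```

A proxy calling `on_request()` before forwarding each API call would get the Ollama-style behavior with any backend.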
https://fosdem.org/2025/schedule/event/fosdem-2025-4486-rama...