It would be so much more convenient to collapse threads you're not interested in and read the rest of the discussion. Is there any reason HN doesn't support this?
Ollama is enjoying a LOT of hype, but I'm struggling to find a real world production use case for it. Where might I really want to use this? It's a wrapper around llama.cpp and makes it easier to download LLMs. Where might I want to download models in production like this? In production I would rather deploy thoroughly tested models. Plus, the model offloading capability on the fly is really not meant for production as it hinders performance. Thoughts?