Are you finding specific pros/cons for some of the ones that try to be a platform? As an example, we've found LangSmith's integration with LangChain super useful, even though LangChain itself has its pros and cons.
I'm taking a DIY approach to RAG/function calling for a work tool. We're looking for data sovereignty, so we're probably going to self-host. To that end, I'm using Ollama to serve some models. If you want to go the DIY route, I would highly recommend NexusRaven as your function-calling model.
No promises, but I'm hopeful we can open-source our work eventually.
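For anyone wanting to try the self-hosted setup described above, here is a minimal sketch of sending a function-calling prompt to a local Ollama server using only the Python standard library. Several things here are assumptions, not something from the posts above: the model tag `nexusraven`, the `Function: ... User Query: ...<human_end>` prompt layout (my recollection of the NexusRaven-V2 model card, so check it before relying on it), and the hypothetical `get_weather` tool. The `/api/generate` endpoint and its `model`, `prompt`, `stream`, and `options` fields are part of Ollama's REST API.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_NAME = "nexusraven"  # assumption: whatever tag you pulled with `ollama pull`

# Hypothetical tool definition, used only for illustration.
WEATHER_TOOL = '''def get_weather(city: str) -> dict:
    """
    Returns the current weather for a city.

    Args:
        city: Name of the city, e.g. "Berlin".
    """
'''


def build_prompt(tool_defs, query):
    """Build an assumed NexusRaven-style prompt: one 'Function:' block per tool, then the query."""
    blocks = "\n".join(f"Function:\n{d}" for d in tool_defs)
    return f"{blocks}\nUser Query: {query}<human_end>"


def build_payload(prompt):
    """Request body for a single, non-streaming completion."""
    return {
        "model": MODEL_NAME,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream of chunks
        "options": {"temperature": 0},  # keep the generated call as deterministic as possible
        # Ollama wraps the prompt in the model's own template unless "raw": True is set.
    }


def generate(prompt):
    """POST the prompt to the local Ollama server and return the generated text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate(build_prompt([WEATHER_TOOL], "What is the weather in Berlin?"))` should return the model's text, which you would then parse into an actual call. Nothing leaves the machine, which is the point of self-hosting here.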
+1. I posted this exact question in the NexusRaven Discord yesterday, asking for an example or quick start with Ollama. Before that, I tried to hack their NexusRaven pip client, which uses a TGI inference endpoint, and non-langchain.py from their evaluation repo, which uses a TGI pipeline. Both failed.
In my testing it seems good at function calling, including nested calls, even compared to GPT-4, since OpenAI's function definitions do not let you specify the return value's name and type. With Ollama it’s quantized and can run on a laptop GPU. There are others on Hugging Face, such as Functionary and the fireworks.ai function-calling model, but they are not quantized, so I could not test them.
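To make "nested" concrete: the model emits a Python-style call expression whose arguments can themselves be calls, and the caller has to resolve them inner-first. Below is a rough sketch of doing that safely with `ast` rather than `eval`. The `Call:` prefix is my assumption about the output format, and `get_weather` / `get_coordinates` are hypothetical tools. Requires Python 3.9+ for `str.removeprefix`.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Call:
    """A parsed function call; arguments may be literals or further Call objects."""
    name: str
    args: list = field(default_factory=list)
    kwargs: dict = field(default_factory=dict)


def _convert(node):
    """Convert an AST node into a Call (for calls) or a plain Python literal."""
    if isinstance(node, ast.Call):
        if not isinstance(node.func, ast.Name):
            raise ValueError("only plain function names are supported")
        return Call(
            name=node.func.id,
            args=[_convert(a) for a in node.args],
            kwargs={kw.arg: _convert(kw.value) for kw in node.keywords},
        )
    # literal_eval accepts AST nodes and rejects anything that is not a literal.
    return ast.literal_eval(node)


def parse_call(text):
    """Parse e.g. "Call: f(x=g(y=1))" into nested Call objects without executing anything."""
    expr = text.strip().removeprefix("Call:").strip()  # assumed output prefix
    return _convert(ast.parse(expr, mode="eval").body)


def evaluate(value, registry):
    """Run a parsed call, resolving nested calls first, using only allowlisted functions."""
    if isinstance(value, Call):
        fn = registry[value.name]  # KeyError for anything not in the allowlist
        args = [evaluate(a, registry) for a in value.args]
        kwargs = {k: evaluate(v, registry) for k, v in value.kwargs.items()}
        return fn(*args, **kwargs)
    return value
```

Given a `registry` dict mapping tool names to Python callables, `evaluate(parse_call("Call: get_weather(coordinates=get_coordinates(city='Berlin'))"), registry)` runs `get_coordinates` first and passes its result to `get_weather`. Any name missing from the registry raises instead of executing.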
I used LangChain and models hosted on Ollama for my latest project [1]. Since I have a GPU now and Ollama is available for Windows, I can build LLM-based applications quickly with local debugging.