
lhl · 2y ago

I put together a list of OpenAI API compatibility layers for local LLMs recently: https://llm-tracker.info/books/llms/page/openai-api-compatib...

Some, like c0sogi/llama-api, are pretty neat because they support concurrency and multiple backends (llama.cpp and Exllama, although this could be expanded).

While you might lose out on some low-level configurability, being able to easily swap between OpenAI and local models is a big win in my book.
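To illustrate the swap, here is a minimal Python sketch (standard library only) that builds an OpenAI-style chat completions request. Only the base URL, model name, and API key change between OpenAI and a local server. The local URL `http://localhost:8000/v1` and the model name `local-model` are hypothetical placeholders. The actual port and model name depend on which compatibility layer you run.

```python
import json
import os
import urllib.request

# Hypothetical placeholders: the local port and model name depend on the
# compatibility layer you run; adjust them to match your setup.
OPENAI_BASE_URL = "https://api.openai.com/v1"
LOCAL_BASE_URL = "http://localhost:8000/v1"


def build_chat_request(base_url, model, messages, api_key=None):
    """Build (but do not send) an OpenAI-style chat completions request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers=headers,
        method="POST",
    )


def make_request(use_local, messages):
    # The payload shape is identical; only the endpoint, model and key differ.
    if use_local:
        return build_chat_request(LOCAL_BASE_URL, "local-model", messages)
    return build_chat_request(
        OPENAI_BASE_URL,
        "gpt-3.5-turbo",
        messages,
        os.environ.get("OPENAI_API_KEY"),
    )
```

To send it, pass the request to `urllib.request.urlopen(req)` and parse the JSON response. The official `openai` Python client works the same way if you point its `base_url` at the local server.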


2 comments · 2 top-level

d4rkp4ttern · 2y ago

Ooba (oobabooga's text-generation-webui) is my favorite. It automatically converts chats into single prompts using the model-specific (and finicky) formatting specs (e.g., [INST] ... [/INST] for Llama 2), so you can submit chat dialogs directly to the endpoint. This is a subtle point that isn't obvious to many people, and I wrote about it here:

https://langroid.github.io/langroid/blog/2023/09/19/language...
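To make the chat-to-prompt conversion concrete, here is a minimal Python sketch that flattens OpenAI-style chat messages into a single Llama 2 chat prompt. It follows the [INST]/<<SYS>> layout used in Meta's Llama 2 reference code. It is not Ooba's actual implementation. It assumes turns strictly alternate user/assistant after an optional system message, and it writes the BOS/EOS markers as literal `<s>`/`</s>` strings for readability. Real servers usually have the tokenizer insert them as special tokens.

```python
def llama2_prompt(messages):
    """Flatten OpenAI-style chat messages into one Llama 2 chat prompt.

    Assumes an optional leading system message followed by strictly
    alternating user/assistant turns. "<s>" and "</s>" are written as plain
    text here; in practice the tokenizer adds them as special tokens.
    """
    system = None
    if messages and messages[0]["role"] == "system":
        system = messages[0]["content"]
        messages = messages[1:]

    prompt = ""
    for i in range(0, len(messages), 2):
        user = messages[i]
        if user["role"] != "user":
            raise ValueError("expected alternating user/assistant turns")
        content = user["content"]
        # Llama 2 has no separate system turn: it is folded into the first user turn.
        if i == 0 and system is not None:
            content = f"<<SYS>>\n{system}\n<</SYS>>\n\n{content}"
        prompt += f"<s>[INST] {content.strip()} [/INST]"
        if i + 1 < len(messages):
            reply = messages[i + 1]
            if reply["role"] != "assistant":
                raise ValueError("expected alternating user/assistant turns")
            # Each completed exchange is closed with an end-of-sequence marker.
            prompt += f" {reply['content'].strip()} </s>"
    return prompt
```

For example, a system message "Be brief." followed by the turns "Hi" / "Hello!" / "Bye" becomes `<s>[INST] <<SYS>>\nBe brief.\n<</SYS>>\n\nHi [/INST] Hello! </s><s>[INST] Bye [/INST]`. Getting any of these spaces or markers wrong is the kind of finickiness that makes server-side conversion convenient.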

mgreg · 2y ago

Very cool and thanks for sharing.

To me, a killer feature would be easily running different models simultaneously, such as one for embeddings and another for completion (e.g., chat). This can likely be done already by specifying the `model` parameter in Ollama (and others), but I haven't explored it much yet.
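A rough sketch of what per-request model selection could look like against an OpenAI-compatible server: each task maps to an endpoint path and a model name, and the `model` field in the JSON body picks the model. The model names `my-embedding-model` and `my-chat-model` are hypothetical. Whether a given server keeps two models loaded at once, or exposes `/embeddings` at all, is an assumption to check against its documentation.

```python
import json

# Hypothetical model names; replace them with models your server actually serves.
# Assumes an OpenAI-compatible server exposing both /embeddings and
# /chat/completions under the same base URL.
ROUTES = {
    "embed": ("/embeddings", "my-embedding-model"),
    "chat": ("/chat/completions", "my-chat-model"),
}


def build_payload(task, text):
    """Return (path, JSON body) for an OpenAI-style request for the given task."""
    path, model = ROUTES[task]
    if task == "embed":
        # Embeddings requests take the text to embed under "input".
        body = {"model": model, "input": text}
    else:
        # Chat requests take a list of role/content messages.
        body = {"model": model, "messages": [{"role": "user", "content": text}]}
    return path, json.dumps(body)
```

Keeping the routing in one table means swapping either model is a one-line change, and the same table could point the two tasks at different servers if one backend can't host both.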
