Some like c0sogi/llama-api are pretty neat because they support concurrency, and supports multiple backends (llama.cpp and Exllama, although it could be expanded).
While you might lose out on some low-level configurability, being able to easily swap between OpenAI and local models is a big win in my book.