Problem:
- We want to balance cost and accuracy across models like GPT-3.5 and GPT-4, and route each task to the best-suited model: Claude for safety and creative writing, fine-tuned models for domain-specific tasks, and so on.
Key Features:
- Maximize response quality while optimizing for costs and latency
- Concurrently generate and compare responses across different closed and open-source models
- Automatically sample and evaluate responses, improving routing performance over time
You can use it with the OpenAI SDK or with LangChain: just change the API base and API key to point to Neutrino, and set the model name to your router ID.
Would welcome any and all feedback!