Show HN: Route your prompts to the best LLM

(unify.ai)

298 points · danlenton · 2y ago · 126 comments

Hey HN, we've just finished building a dynamic router for LLMs, which takes each prompt and sends it to the most appropriate model and provider. We'd love to know what you think!

Here is a quick(ish) screen recording explaining how it works: https://youtu.be/ZpY6SIkBosE

The best results come from training a custom router on your own prompt data: https://youtu.be/9JYqNbIEac0

The router balances user preferences for quality, speed and cost. The end result is higher quality and faster LLM responses at lower cost.

The quality of each candidate LLM is predicted ahead of time using a neural scoring function, which is a BERT-like architecture conditioned on the prompt and a latent representation of the LLM being scored. The different LLMs are queried across the batch dimension, with the neural scoring architecture taking a single latent representation of the LLM as input per forward pass. This makes the scoring function modular, so it is easy to query for different LLM combinations. It is trained in a supervised manner on several open LLM datasets, using GPT-4 as a judge. The cost and speed data are taken from our live benchmarks, updated every few hours across all continents. The final "loss function" is a linear combination of quality, cost, inter-token latency and time-to-first-token, with the user effectively scaling the weighting factors of this linear combination.
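As a rough illustration of the final selection step described above, the sketch below scores each candidate with a linear combination of predicted quality, cost, inter-token latency and time-to-first-token, then picks the highest-scoring one. This is not the actual router: the model names, the numbers, the weight names and the min-max normalisation are all assumptions made for the example, and the quality values stand in for the output of the neural scoring function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str        # hypothetical model label
    quality: float   # predicted quality in [0, 1], higher is better
    cost: float      # USD per 1M tokens, lower is better
    itl_ms: float    # inter-token latency in ms, lower is better
    ttft_ms: float   # time-to-first-token in ms, lower is better


def normalise(values):
    """Min-max scale to [0, 1] so the weights are comparable (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def route(candidates, w_quality=1.0, w_cost=0.0, w_itl=0.0, w_ttft=0.0):
    """Return the candidate with the best weighted score."""
    cost = normalise([c.cost for c in candidates])
    itl = normalise([c.itl_ms for c in candidates])
    ttft = normalise([c.ttft_ms for c in candidates])
    scores = [
        w_quality * c.quality - w_cost * cost[i] - w_itl * itl[i] - w_ttft * ttft[i]
        for i, c in enumerate(candidates)
    ]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


# Made-up numbers purely for illustration.
CANDIDATES = [
    Candidate("large-model", quality=0.9, cost=30.0, itl_ms=40.0, ttft_ms=800.0),
    Candidate("medium-model", quality=0.8, cost=3.0, itl_ms=20.0, ttft_ms=400.0),
    Candidate("small-model", quality=0.7, cost=0.2, itl_ms=10.0, ttft_ms=200.0),
]

print(route(CANDIDATES).name)              # large-model (quality only)
print(route(CANDIDATES, w_cost=2.0).name)  # small-model (cost weighted heavily)
```

Raising a weight shifts the choice toward models that do well on that axis, which is the sense in which the user "scales the weighting factors".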

Smaller LLMs are often good enough for simple prompts, but knowing exactly how and when they might break is difficult. Simple perturbations of the phrasing can cause smaller LLMs to fail catastrophically, making them hard to rely on. For example, Gemma-7B converts numbers to strings and returns the "largest" string when asked for the "largest" number in a set, but works fine when asked for the "highest" or "maximum".
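The string-versus-number confusion in that example is easy to reproduce in ordinary code. The snippet below is only an analogy for the failure mode, not a description of what happens inside Gemma-7B, and the list of numbers is made up:

```python
numbers = [9, 10, 87, 100]

# Numeric comparison: the answer a user expects.
largest_number = max(numbers)

# Lexicographic comparison after converting to strings: "9" sorts after
# "87", "100" and "10" because comparison goes character by character.
largest_string = max(str(n) for n in numbers)

print(largest_number)  # 100
print(largest_string)  # 9
```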

The router is able to learn these quirky distributions, and ensure that the smaller, cheaper and faster LLMs are only used when there is high confidence that they will get the answer correct.
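One way to picture "only used when there is high confidence" is a simple threshold gate: pick the cheapest model whose predicted probability of a correct answer clears a threshold, and otherwise fall back to the strongest model. This is a hypothetical sketch, not the router's actual decision rule; the function name, the threshold value, the model names and the numbers are all assumptions.

```python
def pick_model(predicted_correct, cost, threshold=0.9, fallback="large-model"):
    """Return the cheapest model predicted to be correct with enough confidence.

    predicted_correct: model name -> predicted probability of a correct answer
    cost:              model name -> price (any consistent unit)
    """
    confident = [m for m, p in predicted_correct.items() if p >= threshold]
    if not confident:
        return fallback  # nobody clears the bar, so use the strongest model
    return min(confident, key=cost.__getitem__)


# Made-up prices for illustration.
COST = {"small-model": 0.2, "medium-model": 3.0, "large-model": 30.0}

easy_prompt = {"small-model": 0.95, "medium-model": 0.97, "large-model": 0.99}
tricky_prompt = {"small-model": 0.40, "medium-model": 0.92, "large-model": 0.99}

print(pick_model(easy_prompt, COST))    # small-model
print(pick_model(tricky_prompt, COST))  # medium-model
```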

Pricing-wise, we charge the same rates as the backend providers we route to, without taking any margins. We also give $50 in free credits to all new signups.

The router can be used off-the-shelf, or it can be trained directly on your own data for improved performance.

What do people think? Could this be useful?

Feedback of all kinds is welcome!

101 comments · 38 top-level

jefftk · 2y ago · 8 in thread

> Pricing-wise, we charge the same rates as the backend providers we route to, without taking any margins. We also give $50 in free credits to all new signups.

What's your plan for making money? Are you planning to eventually take a margin? Negotiate discounts with your backend providers? Mine the data flowing through your system?

cosmojg · 2y ago

The data flowing through LLM routers is a hot commodity right now. OpenRouter, for example, even provides a flat-rate 1% discount across the board if you agree to let them use your API calls for model training, and rumor has it that they're already profitable. To be fair, they do seem to be collaborating with model providers on some level, so they are likely getting discounted access on top of selling data.

ec109685 · 2y ago

It’s surprising how these app developers are okay with this much data being shown: https://openrouter.ai/models/mistralai/mixtral-8x7b-instruct...

kromem · 2y ago

Man, this space would get so much more interesting so quickly if base model providers had a revenue share system in place for routed requests...

runako · 2y ago

This would quickly erode confidence in the routers themselves...

namdnay · 2y ago

So they end up in the same situation as hotels or airlines, beholden to the middlemen? They’ll never allow that :)