We’ve been building Composo - a platform that helps teams achieve high performance, guarantee accuracy & minimise the cost of LLM applications.
Problem we’re solving:
LLM applications are non-deterministic, and judging whether a result is good or bad is highly subjective and often requires domain expertise. Iterating over thousands of combinations of prompts, models, temperatures, RAG settings (& many other elements) is therefore very manual & time-consuming.
How we are solving it:
Composo links directly to your application (in a way that is simple to set up, but highly powerful), which lets it function like a remote control for your application. Once set up, anyone on your team (including non-technical domain experts or PMs) can use Composo to test your application with different models, prompts, temperatures & RAG settings (or any other variable in your codebase that you decide to make available at initial setup). Crucially, this is simple enough to be used by anyone, but powerful enough for any application (e.g. real apps built in code, using agents etc).
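To make the "remote control" idea concrete, here is a minimal sketch of the general pattern: the app declares which variables are tunable, and each run applies a set of overrides from outside. This is not the Composo SDK; `AppConfig`, `with_overrides` and `run_app` are hypothetical names made up for illustration, and `run_app` is a stand-in for your real application.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class AppConfig:
    """Hypothetical set of variables an app chooses to expose for tuning."""
    model: str = "model-a"
    temperature: float = 0.7
    system_message: str = "You are a helpful assistant."
    rag_top_k: int = 4


def with_overrides(base: AppConfig, overrides: dict) -> AppConfig:
    """Return a copy of the config with externally supplied values applied.

    Only variables declared on AppConfig may be overridden; anything else
    is rejected so a typo cannot silently have no effect.
    """
    allowed = {f.name for f in fields(base)}
    unknown = set(overrides) - allowed
    if unknown:
        raise KeyError(f"not an exposed variable: {sorted(unknown)}")
    return replace(base, **overrides)


def run_app(user_input: str, config: AppConfig) -> str:
    """Placeholder for the real application (LLM calls, RAG, agents etc)."""
    return f"[{config.model} @ T={config.temperature}] {user_input}"


# One run with defaults, one with values chosen by a non-technical tester.
default_cfg = AppConfig()
tuned_cfg = with_overrides(default_cfg, {"temperature": 0.2, "rag_top_k": 8})
print(run_app("What is the dosage?", default_cfg))
print(run_app("What is the dosage?", tuned_cfg))
```

The point of the design is that the application code stays the same between runs; only the declared variables change.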
This testing can be done in both our playground & our evaluation suite:
1) Playground: Here you can ‘chat’ with your application in a UI similar to the OpenAI playground, but with inputs being run on your actual application rather than a simple LLM call & with the ability to change any variables you like directly within the Composo UI (e.g. system message, temperature, model, RAG settings).
2) Evaluation suite: Here you can conduct rigorous testing & evaluation on your application, either ad hoc while in development or repeated over time to check for performance regression. Our test suite contains automated evaluation tools including: comparison to ground truth answers (with exact match, vector similarity & LLM graded similarity), checks against specific criteria (e.g. code validity, JSON validity, specific keyword inclusion or exclusion) & AI grading (this uses the Composo AI critic, which leverages the latest research in LLM auto-evaluation under the hood).
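For readers unfamiliar with these kinds of checks, here is a rough sketch of what the simpler ones look like in plain Python. It is an illustration only, not how Composo implements them: the function names are made up, and the vector similarity check assumes you already have embeddings for both texts from whatever embedding model you use.

```python
import json
import math


def exact_match(output: str, expected: str) -> bool:
    """Ground truth check: identical after trimming surrounding whitespace."""
    return output.strip() == expected.strip()


def is_valid_json(output: str) -> bool:
    """Criteria check: does the output parse as JSON?"""
    try:
        json.loads(output)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


def keywords_ok(output: str, include=(), exclude=()) -> bool:
    """Criteria check: all required keywords present, no banned ones.

    Matching is case-insensitive and substring-based, which is crude.
    """
    text = output.lower()
    has_all = all(k.lower() in text for k in include)
    has_banned = any(k.lower() in text for k in exclude)
    return has_all and not has_banned


def cosine_similarity(a, b) -> float:
    """Vector similarity between two precomputed embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


print(exact_match(" Paris ", "Paris"))
print(is_valid_json('{"answer": "Paris"}'))
print(keywords_ok("Take ibuprofen with food", include=["ibuprofen"], exclude=["aspirin"]))
```

LLM graded similarity and AI grading are not shown, since they depend on a model call and a grading prompt.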
The easiest things to get started with, without having to link an application or even sign up, are:
1) Play with different models in our playground by chatting directly or using our demo apps (e.g. an AI doctor)
2) Automate your prompt writing & optimisation with our AI prompt writer
Thanks so much, and we’d be hugely grateful for any feedback!