I built this library because langchain was too bloated and I needed a simple abstraction to call multiple LLM APIs. litellm has two functions - completion(), embedding()
You can use tenacity for retries and wouldn't you want to cache the request / response around the endpoint instead of the gpt call -> that's what we ended up doing.
Streaming output and function-calling support is interesting
Could you elaborate on this — “wouldn’t want to cache around the endpoint instead of GPT call”. Just want to see if I’m missing an important consideration here
azure uses the openai python sdk, just remaps certain components. The models are also user-named. This makes it hard to detect if a model passed in is an azure model