Relying on hosted LLM inference in production, such as the OpenAI API, comes with some challenges. Applications that call these APIs have to be designed around unstable latency, rate limits, token counts, costs, etc. To make these observable, we've built tracing and monitoring specifically for AI apps. For example, the OpenAI Python library is monitored automatically; there's no need to do anything. We'll be adding support for more libraries. If you'd like to give it a try, see
https://github.com/graphsignal/graphsignal or the docs.