Fully agree - even as a founder of an ‘LLM observability company’. Observability does not need to be reinvented to get detailed traces/metrics/logs of the LLM part of an application.
LLM Observability usually means: prompts and completions, which model was used, errors and exceptions (rate limits, network errors), as well as metrics (latency, output speed, time to first token when streaming, USD/token and cost breakdowns). All of this is well suited to be captured in the existing observability stack. OpenLLMetry makes this really easy and interoperable - chapeau.
In my view, observability is not the core value that solutions like Baserun, Athina, LangSmith, Parea, Arize, Langfuse (my project) and many others solve for.
Developing a useful LLM application requires iterative workflows and tinkering. That's what these solutions help with and augment.
There are specific problems to building an LLM application such as managing/versioning of prompts, running evaluations, blending multiple different evaluation sources, collecting datasets to test/benchmark an application, helping with fine-tuning models on high-quality production completions, debugging root causes of quality/latency/cost issues, ...
Most solutions either replicate logs (LLM I/O) or traces at first, as they are a necessary starting point to then build solutions for the other workflow problems. As the observability piece gets more standardized over time, I can see how integrating with the standard makes a ton of sense. Always happy to chat about this.