A couple of weeks ago, I launched a fully automated podcast, AI Arxiv, that focuses on exactly this. I built an automated pipeline that does the research for the podcast: it sifts through sources such as engineering blogs, research papers, and YouTube videos, extracts high-signal content, categorizes it, and feeds it to Google's NotebookLM for audio generation before each episode is published to a public RSS feed, Spotify, and Apple Podcasts. I continue to be fascinated by how AI is being applied in novel ways beyond chatbots to solve unique use cases.
The podcast has already crossed 40 episodes, 50 subscribers, and 500 listens. Some of the more popular episodes include:
- How Meta uses LLMs for efficient incident response
- How Salesforce operationalizes models at scale
- How DoorDash built a high-quality RAG for Dasher support
- How Pinterest built text-to-SQL
I was glad to hear from some folks that this was helping them stay on top of novel GenAI use cases and stretch their thinking beyond chatbots. The nice thing is, these episodes are semi-automated, with me in the loop directing and approving workflows. As a result, it takes only a few minutes to generate and publish a new episode.
For the tech stack, I used CrewAI for content research, DSPy for episode metadata, show notes, and the sponsor message, and Langtrace for experimentation and testing.
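To make the shape of that pipeline concrete, here is a minimal, stdlib-only sketch of the research-to-approval flow. All class and function names are illustrative, not the actual implementation, and the categorizer is a toy keyword matcher standing in for the real classification step:

```python
from dataclasses import dataclass

# Illustrative sketch of the semi-automated pipeline: research -> extract ->
# categorize -> human approval -> publish. All names here are hypothetical.

@dataclass
class Source:
    title: str
    url: str
    kind: str  # "blog", "paper", "video"

@dataclass
class Episode:
    title: str
    category: str
    approved: bool = False

CATEGORIES = {"rag": "Retrieval", "incident": "Operations", "sql": "Text-to-SQL"}

def categorize(source: Source) -> str:
    """Toy keyword-based categorizer standing in for the real classifier."""
    text = source.title.lower()
    for keyword, category in CATEGORIES.items():
        if keyword in text:
            return category
    return "General"

def run_pipeline(sources, approve):
    """`approve` is the human-in-the-loop gate deciding what gets published."""
    episodes = []
    for s in sources:
        ep = Episode(title=s.title, category=categorize(s))
        ep.approved = approve(ep)  # human reviews before TTS and publishing
        if ep.approved:
            episodes.append(ep)
    return episodes

sources = [
    Source("How DoorDash built a high-quality RAG for Dasher support",
           "https://example.com/doordash-rag", "blog"),
    Source("Unrelated press release", "https://example.com/pr", "blog"),
]
published = run_pipeline(sources, approve=lambda ep: ep.category != "General")
```

The key design point is that the approval step is just a callable, so the same pipeline runs fully automated or with a person in the loop.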
If you are actively tinkering with LLMs, or just generally curious about the GenAI space, do check it out.
1. RSS Feed - https://anchor.fm/s/fb0d9ea8/podcast/rss
2. Apple - https://podcasts.apple.com/us/podcast/ai-arxiv/id1768464164
3. Spotify - https://open.spotify.com/show/0Toon5UiQc5P7DNDjsrr9K
I’ve recently launched a podcast that dives into how companies are leveraging large language models (LLMs) to solve real-world problems in unique and creative ways. Each episode explores case studies, technical blogs, and research papers to explain how LLMs are applied.
Some of the cool episodes we’ve released so far:
- Uber's Use of LLMs for Mobile Testing.
- DoorDash Enriching Product Information with LLMs.
Each episode dives deep into the why and how, breaking down technical implementations, business impact, and the broader implications of using LLMs in innovative ways.
If you’re working with AI or just curious about how it's being applied in ways you’ve never thought of, I’d love for you to check it out and share your thoughts!
Links to the podcast: Spotify - https://open.spotify.com/show/0Toon5UiQc5P7DNDjsrr9K?si=536d...
Apple Podcast - https://podcasts.apple.com/us/podcast/ai-arxiv/id1768464164
I have recorded 14 episodes so far, including some exciting ones like:
- How Uber engineering uses GenAI for mobile testing.
- How OpenAI's latest reasoning models work.
- How Box uses Amazon Q to power Box AI.
- How DoorDash uses LLMs to enrich its SKUs.
The episodes are semi-automated and produced using:
- Exa search and some Python code for researching content
- NotebookLM from Google for TTS
- Riverside.fm for editing
The content for these episodes is sourced from various engineering blogs, case studies, and arXiv papers. Check it out and let me know how you like it.
Spotify https://open.spotify.com/show/0Toon5UiQc5P7DNDjsrr9K?si=536d0ce471c44439
Apple https://podcasts.apple.com/us/podcast/ai-arxiv/id1768464164
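For what it's worth, the publishing side is the simplest part: the feed that the podcast apps consume is standard RSS 2.0. A rough sketch of building a feed item with the stdlib (the URLs and field values are placeholders, not the real pipeline code):

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch of the publishing step: appending a finished episode
# to a podcast RSS feed. The fields follow the standard RSS 2.0 <item>
# layout; the enclosure URL and sizes are placeholders.

def build_feed(title, episodes):
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "description").text = ep["description"]
        # The enclosure tag is what podcast apps use to find the audio file.
        ET.SubElement(item, "enclosure",
                      url=ep["audio_url"], type="audio/mpeg",
                      length=str(ep["bytes"]))
    return ET.tostring(rss, encoding="unicode")

feed = build_feed("AI Arxiv", [{
    "title": "How Uber engineering uses GenAI for mobile testing",
    "description": "Sourced from Uber's engineering blog.",
    "audio_url": "https://example.com/ep14.mp3",
    "bytes": 1234567,
}])
```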
Ola and Karthik here. We are working on Langtrace (https://github.com/Scale3-Labs/langtrace), an open-source, OpenTelemetry-based SDK and monitoring/evaluations client for LLM-based applications. The SDK generates OTEL-standard spans and traces for popular LLMs like OpenAI, Anthropic, and Cohere, popular frameworks like LangChain and LlamaIndex, and vector DBs like ChromaDB and Pinecone.
The LLM monitoring/evaluations space has seen a number of products of late, both open source and closed source. But a couple of things we have observed are: a lack of standard spans and traces, which creates vendor lock-in, and tools that are each optimized for a different pain point - evaluations, prompt management, datasets, etc.
We believe that adopting OpenTelemetry (OTEL) standard tracing not only lets teams use our SDK without switching their observability client, but also enables developers to build tooling for custom needs such as capturing datasets, prompts, and evaluations.
A quick note on what we have built so far:
[1] We have a Python SDK and a TypeScript SDK, and we have broken support for the LLM layer into three groups: LLMs, frameworks, and vector DBs. The SDKs are OpenTelemetry-compatible, can be installed and used independently, and accept custom exporters so you can send traces and spans to any observability tool of your choice.
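To illustrate the custom-exporter idea, here is a stdlib-only sketch of the pattern. In the real SDK the spans follow OpenTelemetry's data model and an exporter implements OTEL's SpanExporter interface; these simplified classes are illustrative only:

```python
from dataclasses import dataclass

# Stdlib-only sketch of the pluggable-exporter pattern. In a real OTEL
# setup, Span and the exporter interface come from the OpenTelemetry SDK;
# these simplified stand-ins just show the shape of the idea.

@dataclass
class Span:
    name: str
    attributes: dict

class InMemoryExporter:
    """Collects spans locally, e.g. for tests or offline analysis."""
    def __init__(self):
        self.spans = []

    def export(self, spans):
        self.spans.extend(spans)

class Tracer:
    """Emits one span per instrumented LLM call to every registered exporter."""
    def __init__(self, exporters):
        self.exporters = exporters

    def record(self, name, **attributes):
        span = Span(name=name, attributes=attributes)
        for exporter in self.exporters:
            exporter.export([span])
        return span

# Swapping the exporter is all it takes to send the same spans elsewhere.
exporter = InMemoryExporter()
tracer = Tracer([exporter])
tracer.record("openai.chat.completion",
              model="gpt-4", prompt_tokens=42, completion_tokens=128)
```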
[2] An observability client optimized for the unique pain points that come with LLM-based apps, such as evaluations, prompt iteration, and datasets. We are SOC 2 compliant, and the client can also be self-hosted if you have strict data privacy and protection requirements.
[3] Both the SDK and the client are fully open source. We are leaning on the community to try it out and provide feedback.
A note about OpenTelemetry semantic conventions for LLMs: we would like to converge on standard names for trace attributes that follow the OTEL rules, and we are looking for feedback from experts here - https://github.com/Scale3-Labs/langtrace/discussions/71
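For a sense of what "follow the OTEL rules" means syntactically: attribute names are lowercase, dot-namespaced keys like `http.status_code`. The candidate LLM attribute names below are illustrative only; settling on the actual names is exactly what the linked discussion is for.

```python
import re

# OTEL attribute names are lowercase, dot-namespaced keys such as
# "http.status_code". The LLM attribute names below are illustrative
# candidates, not an agreed convention.

OTEL_KEY = re.compile(r"^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$")

def valid_otel_key(key: str) -> bool:
    """Check a single attribute name against the naming convention."""
    return bool(OTEL_KEY.match(key))

candidates = {
    "llm.model": "gpt-4",
    "llm.usage.prompt_tokens": 42,
    "LLM-ResponseTime": 0.8,  # violates the convention
}
violations = [k for k in candidates if not valid_otel_key(k)]
```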
We recognize that this project is early and there is a lot of room for improvement. Would love to hear your thoughts and feedback. Thanks!
Links:
[2] https://github.com/Scale3-Labs/langtrace
[3] https://docs.langtrace.ai/introduction
[4] https://langtrace.ai/blog/why-you-need-opentelemetry-based-o...
https://www.youtube.com/watch?v=9SGAnDZqB8U
It's too early to give out the URL for people to try it out, as there are plenty of bugs. That's why I was reluctant to post this as a "Show". I would appreciate it if you left a comment with feedback after watching the video.
Is this something you would pay for?