TL;DR: There are dozens of audio transcription APIs, but none for video and visual transcription. So we built one.
If you want visual chaptering, summarization, OCR/text extraction, audio transcription, and sentiment analysis on your videos, there's really nothing out there that does it all. We tried stitching this together from several audio/video understanding APIs, but kept running into rate limits, hallucinations, high costs, and poor accuracy.
Analyzing Audio Podcasts: https://vlm-docs.nos.run/guides/guide-audio-podcasts
Understanding Video Podcasts: https://vlm-docs.nos.run/guides/guide-video-podcasts