faster-whisper claims a 4x speedup over base Whisper, and I've found WhisperX to be faster still (for longer audio, where it can do batched inference), at least on consumer GPUs.
So with AiOla saying "50% speedup", is that actually noteworthy?
50% on its own doesn’t make this the current best choice for production. But I imagine this could become the new base model that all of the inference optimizations are applied to.
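To put the numbers side by side, here's a rough back-of-the-envelope sketch. It assumes "50% speedup" means 1.5x throughput (not half the runtime), takes the 4x faster-whisper figure at face value, and treats the stacked case as purely hypothetical since nobody has measured it:

```python
def relative_runtime(speedup_factor: float) -> float:
    """Fraction of the baseline runtime that remains after a given speedup."""
    return 1.0 / speedup_factor


# Assumption: "50% speedup" is read as 1.5x throughput.
claims = {
    "faster-whisper (claimed 4x)": 4.0,
    "50% speedup (read as 1.5x)": 1.5,
}

for name, factor in claims.items():
    print(f"{name}: {relative_runtime(factor):.0%} of baseline runtime")

# Hypothetical: if the two gains were independent and simply multiplied.
stacked = 4.0 * 1.5
print(f"stacked (hypothetical): {relative_runtime(stacked):.0%} of baseline runtime")
```

On that reading, 1.5x alone leaves you well behind the existing optimized runtimes, and it only gets interesting if the gain carries over once those optimizations are applied on top.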
Wonder if it’s plug and play or if faster-whisper and others would need to reimplement from scratch?
If so, is the quality still acceptable?
If you're interested, you might as well check out Gladia: they at least have a pricing section and let you sign up and use it as a developer, instead of just asking you to "Request a Demo".
And while a sibling comment links to the GitHub repository, their entire website does not contain such a link.
---
Edit: My bad, for some reason I first checked the website instead of the blog post. Looks much more interesting now.
1. https://github.com/aiola-lab/whisper-medusa
Any suggestions would be very welcome!