Maybe the current progress will help, though. Models adapted to your own dictionary, or corrected by postprocessing fixes, would be amazing.
Here’s an iOS app to play with it: https://whispermemos.com
It even formats the recording into paragraphs by running the transcript through GPT.
> Built using transformers.js and the whisper-tiny.en model.
Laparoscopic - that worked fine ;)
Edit: if you have one available, mind sending over a sample deidentified note to my email in profile? I’m working on something.
No, there's no Apple hardware available in my scenario.
Also, in a medical context, if I can't tell where the data goes, the solution is not usable.
Normally I expect a lot of things that push the limit to not work on iOS, but this one did!
Is there a version of this we can run on ESP32 (Arduino) devices?
Not sure if it'll work on an Arduino, but maybe take a look at https://github.com/ggerganov/whisper.cpp -- it works on a Raspberry Pi at least, so resource requirements are fairly minimal
Connecting AudioNodes from AudioContexts with different sample-rate is currently not supported. index-0dae94e71b526640.js:1:2992
Uncaught (in promise) DOMException: AudioContext.createMediaStreamSource: Connecting AudioNodes from AudioContexts with different sample-rate is currently not supported. index-0dae94e71b526640.js:1
Media resource blob:https://www.ermine.ai/e762a6f1-f292-4b23-96e0-8059a7f9d635 could not be decoded. www.ermine.ai
Media resource blob:https://www.ermine.ai/e762a6f1-f292-4b23-96e0-8059a7f9d635 could not be decoded, error: Error Code: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006)
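For anyone hitting the same exception: it reads as if the page creates its AudioContext at a fixed sample rate while the microphone stream runs at the hardware rate. I haven't looked at the site's code, so that is a guess. One common workaround is to leave the AudioContext at its default rate and resample the captured samples to 16 kHz (the rate Whisper expects) in plain JS before handing them to the model. A minimal sketch, assuming mono Float32 samples; `resampleLinear` is a made-up helper name, and linear interpolation has no anti-aliasing filter, so treat it as a starting point only:

```typescript
// Hypothetical helper: resample mono PCM samples by linear interpolation.
// No low-pass filtering is applied, so heavy downsampling can alias.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number,
): Float32Array {
  if (fromRate <= 0 || toRate <= 0) {
    throw new RangeError("sample rates must be positive");
  }
  // Nothing to do: return a copy so callers can mutate the result safely.
  if (fromRate === toRate || input.length === 0) {
    return input.slice();
  }

  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample

  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    // Clamp both neighbours so the last output sample never reads past the end.
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

In the browser you would pass the context's own `sampleRate` as `fromRate` and 16000 as `toRate`.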
(Also, the weights JSON doesn't download at all in Firefox incognito.) It would be good if you could pop some kind of alert (literally alert() might do the trick) on an exception, just so people don't wait for a couple of minutes before realizing something's gone wrong :)
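Something along these lines would cover the "tell the user it broke" part. It is only a sketch: `installFailureAlerts` and `describeFailure` are hypothetical names, and the target is typed loosely so the same code accepts `window` in a browser or a stub elsewhere. The two event names, `unhandledrejection` and `error`, are the standard ones browsers fire for rejected promises and uncaught exceptions.

```typescript
type Notify = (message: string) => void;

// Minimal shape of anything we can attach listeners to (window fits this).
interface ListenerTarget {
  addEventListener(type: string, listener: (event: unknown) => void): void;
}

// Turn whatever was thrown or rejected into a short readable string.
function describeFailure(reason: unknown): string {
  if (reason === undefined || reason === null) return "unknown error";
  if (reason instanceof Error) return `${reason.name}: ${reason.message}`;
  return String(reason);
}

// Hypothetical helper: report uncaught errors and rejected promises to the user.
function installFailureAlerts(target: ListenerTarget, notify: Notify): void {
  target.addEventListener("unhandledrejection", (event) => {
    const reason = (event as { reason?: unknown }).reason;
    notify(`Transcription failed: ${describeFailure(reason)}`);
  });
  target.addEventListener("error", (event) => {
    const e = event as { error?: unknown; message?: string };
    notify(`Transcription failed: ${describeFailure(e.error ?? e.message)}`);
  });
}

// In a page this could be wired up as:
//   installFailureAlerts(window, (message) => alert(message));
```

Passing the notifier in as a parameter keeps the choice between alert(), a banner, or a toast up to the page.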
I think so, yes. The website uses whisper-tiny.en, so I suppose all they need to do is switch to whisper-tiny instead.
Are you aware that whisper.cpp has a WASM version as well? See https://github.com/ggerganov/whisper.cpp/tree/master/example... - demo at https://whisper.ggerganov.com/
I will have a look at the repository to find out, but maybe someone already looked into it.
SPEAKER A: blah blah
SPEAKER B: blah blah
So it can be used for transcribing phone calls?
Example:
20230406115923.mp3 ==> 20230406115923.txt
20230406083110.m4a ==> 20230406083110.txt
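The naming half of that is trivial; the transcription itself is the real work. As a rough sketch of the mapping in the example, with `transcriptNameFor` being a made-up name, this swaps the audio extension for `.txt` and leaves any directory part alone:

```typescript
// Hypothetical helper: derive the transcript file name from an audio file name.
function transcriptNameFor(audioPath: string): string {
  // Only treat a dot as an extension separator if it comes after the last
  // path separator and is not the first character of the file name.
  const slash = Math.max(audioPath.lastIndexOf("/"), audioPath.lastIndexOf("\\"));
  const dot = audioPath.lastIndexOf(".");
  const hasExtension = dot > slash + 1;
  const stem = hasExtension ? audioPath.slice(0, dot) : audioPath;
  return `${stem}.txt`;
}
```

A batch tool would loop over a folder, call whatever transcriber it uses on each file, and write the result to the name this returns.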
I wish someone would build one and sell it for $10 a copy.
The little "replay your audio capture" <audio> HTML element says "error" but the transcription actually worked.