Writeout.ai – Transcribe and translate any audio files

(writeout.ai)

172 points · mpociot · 3y ago · 92 comments

67 comments · 21 top-level

masukomi · 3y ago · 11 in thread

I don't understand why this doesn't actually do the transcription / translation locally. Sending the data to openAI for paid conversion makes no sense. Whisper can be legally run on your computer, for free.

Running it locally makes way more sense for an open source project, because why would you pay and be dependent upon a 3rd party if you don't have to be?

It also makes way more sense for a service because then _you_ don't have to give all your money to openAI and skim off of what's left.

This is just.... bewildering. I really wanted to use it, but I'm not going to pay openAI to transcribe podcasts for me when i can literally use the exact same language model and do it locally with free open source code.

I'm hoping someone will fork this and teach it to run whisper locally.

[edit: getting the exact right version of python and PyTorch and dependencies to make whisper run was a pain but now i've got it set up and it's a trivial command to transcribe every mp3 i feel like transcribing]

cloudking · 3y ago

Here is a version that runs locally with WASM https://whisper.ggerganov.com/

https://github.com/ggerganov/whisper.cpp/tree/master/example...

paxys · 3y ago

Because that would involve actual work. A box sitting somewhere that passes through API calls to OpenAI is trivial to set up.

Kevin2379 · 3y ago

To do Whisper transcription for free locally you can use AirCaption (www.aircaption.com). It's an electron desktop app running Whisper.cpp (https://github.com/ggerganov/whisper.cpp). Just released a few days ago.

Madane · 3y ago

Try Revoldiv.com, it uses Whisper. The transcription quality is near perfect and it's free.

MollyRealized · 3y ago

With your references to money and paying, where exactly does this indicate it's charging? As of 3/11 Sat 16:46 CST I see no reference to that.

corobo · 3y ago

How's it handling long files? Let's say worst case scenario, a 2 hour long podcast.

What ratio are you getting (podcast length to transcription time) and does it error out memory wise as others suggest?

masukomi · 3y ago

I dunno about openAI as a service, but on my M1 mac i think whisper took something on the order of 8x realtime to process with the "large" language model. That is to say... 8 minutes of processing for every 1 minute of audio. It was surprisingly not fast. I assume openAI's servers have more GPU at their disposal to make this go faster.
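
To put the ratios quoted in this thread in perspective, here is a rough sketch of the arithmetic. It assumes "8x realtime" means processing time divided by audio duration, as the comment above describes; the function name is made up for illustration.

```python
def processing_hours(audio_hours: float, realtime_factor: float) -> float:
    """Estimated wall-clock hours to transcribe a recording.

    realtime_factor is processing time divided by audio duration,
    so 8.0 means eight minutes of work per minute of audio.
    """
    return audio_hours * realtime_factor


# The 2 hour podcast asked about above, at roughly 8x real time
print(processing_hours(2, 8.0))  # 16.0
```

At a factor of 1.0 the same file would take about two hours, so the hardware and model size matter far more than the file length.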

2 more replies

joshspankit · 3y ago

Seems like a “local API” would help here: just something that duplicates the official API while running locally.

smooke · 3y ago

Good points.

EForEndeavour · 3y ago

I agree with you, but the reason is cost and convenience.

Whisper v2 costs $0.006 per minute of transcribed audio: https://openai.com/pricing

If you had meetings every working hour, you'd have up to ~160 hours of audio per month to transcribe. For most people, this is a gross overestimate.

Throwing this audio at OpenAI's API would cost $57.60 per month, and would also free you from having to set up and maintain local inference.
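
The arithmetic behind that figure can be checked with a few lines. This is only a sketch: the price constant is the $0.006 per minute quoted in the comment above and may not match current pricing, and the function names are illustrative.

```python
PRICE_PER_MINUTE_USD = 0.006  # figure quoted in the comment above; check current pricing


def monthly_cost(hours_per_month: float, price_per_minute: float = PRICE_PER_MINUTE_USD) -> float:
    """API cost in USD for a month of audio, billed per minute."""
    return round(hours_per_month * 60 * price_per_minute, 2)


def yearly_cost(hours_per_month: float, price_per_minute: float = PRICE_PER_MINUTE_USD) -> float:
    """Twelve months at the same usage."""
    return round(hours_per_month * 60 * price_per_minute * 12, 2)


print(monthly_cost(160))  # 57.6
print(yearly_cost(160))   # 691.2
```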

masukomi · 3y ago

"cost and convenience": cost: $57.60 vs 0. Why would you want to pay nearly $700 a year just to avoid running a program in the background on whatever computer you already have open?

convenience: yes, it's a nicer interface, but the current state of the "geeky" version is type command on command line, with path to file. The end. Unless you're really afraid of the command line it's not that much more convenient.

The text line being highlighted while you listen is nice but a) we wrote something that did it at the word level (as opposed to sentence..ish level) nearly 20 years ago, b) in this context it's not actually that useful. With video sure... you can click the text and go to the right place in the video. With spoken text (what this is best at) you click and go to the point...where they're saying what you just read. Unless you really want to hear what you just read, there's not a lot of added value.

Would it be good for podcasts to use an interface like this for playback? absolutely. It'd be a massive upgrade, but that's not what this is offering.

maybe someone will extract that code and let us combine the MP3 and timestamped text file in a web site (if that doesn't already exist). That'd be cool.

But, the cost you propose is way too much for most people, especially in countries that aren't rich. In many places $400 a month is a really good salary. So yeah, if you're rich $700 a year is not a big deal, but...
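
For the idea of pairing an MP3 with a timestamped text file, the core lookup is small. The sketch below assumes the segment start times (in seconds) have already been parsed from the transcript and are sorted; the function name and sample values are hypothetical, and this is not how Writeout itself does it.

```python
from bisect import bisect_right


def active_segment(start_times: list[float], position: float) -> int:
    """Index of the segment playing at `position` seconds.

    start_times must be sorted ascending. Returns -1 if playback
    has not reached the first segment yet.
    """
    return bisect_right(start_times, position) - 1


starts = [0.0, 4.2, 9.8, 15.1]
print(active_segment(starts, 10.0))  # 2
```

A player would call this on each time update and highlight the matching line; clicking a line does the reverse and seeks to `starts[i]`.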

1 more reply

rcme · 3y ago · 9 in thread

I love all of these random "companies" popping up that just make either one or a handful of API calls to OpenAI. Come on people, try harder!

nico · 3y ago

People are excited and doing things, that’s wonderful. What are you doing besides complaining?

Everything starts small.

Also, the most important thing about a service is attracting customers, not the tech stack under it. Facebook was made with PHP; Twitter famously failed constantly while struggling with user growth.

I’d much rather have tons of users with a tech stack that is a wrapper for a bunch of other stuff, than have super impressive in-house tech and no users.

rcme · 3y ago

> People are excited and doing things, that’s wonderful. What are you doing besides complaining?

I don't think people are excited about making API calls. They see a land grab and are clamoring for their piece. As for what I'm doing, I work on my own products that, I hope, push the envelope, at least slightly. And I have seen AI companies that are doing good work using OpenAI's tools, but this isn't one of them.

3 more replies

LunaSea · 3y ago

It's a good metaphor for a good part of the startup scene: good looking gift wrapping around tools built by smarter people.

masukomi · 3y ago

in their defense, there's a lot of value there. As i commented elsewhere here, I'm frustrated that this particular thing isn't running the transcription locally, but this is a _massive_ improvement on what the "tools built by smarter people" built.

Sometimes "good looking gift wrapping" is a huge value unto itself. Also, it isn't fair to good UX and UI developers to imply that that isn't also really hard work to get right. It's just different work using a different form of thinking. Not lesser in any way. And... without the people who could make the "good looking gift wrapping" most apps would suck a lot harder than they already do.

rvz · 3y ago

It is a complete grift in which everyone and their llamas are somehow immediately AI companies, when they are all hitting the OpenAI API. So when it goes down their entire business is down as well.

This hype is going to eventually subside with lots of losers and a tiny minority of winners when the price increases come in.

The only winner of this race to the bottom is Stability.ai, who are already open sourcing everything, while OpenAI cannot afford to open source their flagship AI product(s) for free.

kkielhofner · 3y ago

Agreed.

The current AI hype cycle has driven companies to slap AI somewhere in their offering so they can call themselves an AI company - even if it's an API key and an intern spending half a day with an API wrapper.

Gatekeeping is always risky but in my mind if you're not at least touching an ML framework you're not an "AI company" - which is already IMO a pretty low bar. That said it starts to get really hazy when you look at things like SageMaker and other offerings where you're doing abstracted model development or substantial amounts of fine-tuning/training on a custom dataset, etc.

bonestamp2 · 3y ago

The low hanging fruit always comes first. Sure, you and I could whip it together in an afternoon, but for non-tech people these simple tools are very handy since they put a UI on an API.

j45 · 3y ago

Does it have to satisfy your standard of being novel, or be super approachable and adoptable by customers?

mpociot (OP) · 3y ago

This project is meant especially for educational purposes, which is why it's entirely open source.

ptero · 3y ago · 8 in thread

I want something that I can self host. I am perfectly OK with a single language and a few mistakes here and there.

Does such a thing exist? I would gladly donate to a kickstarter project for this before trying to build one myself.

htrp · 3y ago

Just download whisper ....

If you own a gpu use this one https://github.com/openai/whisper

If you don't own a gpu use this one https://github.com/ggerganov/whisper.cpp (this one is very very slow)

inconceivable · 3y ago

whisperx also adds improved timestamping, closed captioning output, and beta diarization (speaker labeling) support. unfortunately it doesn't seem to support m4a out of the box but you can convert to mp3 (upgrade the sound lib dependency first) or wav with ffmpeg.

travisjungroth · 3y ago

whisper.cpp is not universally very very slow. With an M1 Macbook and the medium model it's faster than real time. Some accuracy may be lost because it uses a different search method, and more if you choose to run a smaller model.

mpociot (OP) · 3y ago

You mean without using the OpenAI API? This project is open source and on GitHub, so you can self-host this if you want!

kkielhofner · 3y ago

You (essentially) need GPU but here you go:

https://github.com/ahmetoner/whisper-asr-webservice

For your requirements the medium.en model (max) should be satisfactory.

adgjlsfhk1 · 3y ago

https://github.com/ggerganov/whisper.cpp makes it relatively feasible to run on CPU.

1 more reply

XCSme · 3y ago

> You (essentially) need GPU but here you go

Don't most devs already have a powerful GPU? Maybe I am biased from also being a gamer and having worked in game development, which requires a powerful GPU anyway.

stuxnet79 · 3y ago

whisper is extremely simple to use on the command line. Just install it with pip and you are off to the races.
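
As a rough illustration of the command-line route, the sketch below only builds the argument list; it does not run anything. It assumes the `whisper` command installed by `pip install openai-whisper` (which also needs ffmpeg on the PATH) and its `--model` and `--output_format` options. The helper name and the sample file name are made up.

```python
import shlex


def whisper_command(audio_path: str, model: str = "medium", output_format: str = "srt") -> list[str]:
    """Build an argument list for the `whisper` CLI from the openai-whisper package."""
    return ["whisper", audio_path, "--model", model, "--output_format", output_format]


cmd = whisper_command("episode 12.mp3")
# Show the shell-quoted form you would type in a terminal
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would start the transcription; smaller models such as `tiny` or `base` trade accuracy for speed.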

CSSer · 3y ago · 6 in thread

Is there any chance you could expose a pathway to use a local instance of Whisper? I ask primarily because OpenAI completely open-sourced Whisper in September 2022[0]. It seems odd to me to default to or encourage the usage of a paid service for something that appears to be available for free under MIT license including models[1].

My understanding is that the only reason OpenAI even setup the paid API is because it "can also be hard to run [sic]". Personally, I'm skeptical. I'm not knocking them for it but I could see how this is just brand capitalization.

[0]: https://openai.com/blog/introducing-chatgpt-and-whisper-apis...

[1]: https://github.com/openai/whisper

nonoesp · 3y ago

If you run the large-v2 model they expose via the API (the more accurate one) on your local machine, you'll see that even though it works great, it's slow and won't work for long audio files because of memory limitations.

It's fairly easy and quick to run Whisper for free either locally in an Anaconda environment with Python or the command-line interface or, even better, in a Google Colab notebook.

Here's a sample notebook that builds on a notebook by Pete Warden.

https://colab.research.google.com/drive/1sxsey3n0jd09MjUd9Ky...

rolisz · 3y ago

On a 1080Ti (so a 6 year old GPU), the large model runs in 1x time (so transcribing 10 minutes takes 10 minutes) and I've successfully transcribed even 1h+ files.

1 more reply

paxys · 3y ago

> My understanding is that the only reason OpenAI even setup the paid API is because it "can also be hard to run [sic]". Personally, I'm skeptical. I'm not knocking them for it but I could see how this is just brand capitalization.

Why is it hard to see that not every organization has the capability to set up their own translation cluster, provision GPUs, frontends, scaling, on-call rotations, regularly update models..? It's not just "brand capitalization". An API that you can call to transcribe/translate a recording with zero extra work is absolutely essential to have for most.

cnbeining · 3y ago

I have a pipeline setup in https://github.com/cnbeining/Whisper_Notebook/blob/master/Wh... .

- Run Voice Activity Detection for better timestamp output
- Transcribe with Whisper
- Run Forced Alignment to get per word timestamp
- Create better segmented SRT
- Translate (with multiple APIs - implemented DeepL, Google Translate, Baidu and a couple more)

Tenoke · 3y ago

The API is useful because not everyone has quick 10+gb vram gpus lying around.

CSSer · 3y ago

You know, this is true. I was a bit too dismissive about it because I haven't done a lot of deploying models myself. I was making the assumption that it was similar to many other services, but even looking at pricing for managed GPUs on most instances shows me that's clearly not the case.

vhanda · 3y ago · 6 in thread

The title seems quite disingenuous.

A better description would be "A PHP based web app which calls OpenAI's Whisper API to transcribe speech"

CSSer · 3y ago

I agree. Kudos to the author for sharing a working example of using OpenAI's PHP Whisper client though. Digging a bit deeper into the organization that released this seems to provide more context: https://beyondco.de/. It appears this is Laravel oriented.

cinntaile · 3y ago

The main reason people add the tech stack is for marketing reasons.

The title describes what it does, I think you're making a mountain out of an anthill.

singularity2001 · 3y ago

why php though, couldn't the whole thing run completely in the browser?

MrOwnPut · 3y ago

Many people on HN infamously called Dropbox just an rsync script, right?

It's usually all in the details and delivery (and ya'know we're lazy and lack time to setup stuff locally)

Though I wouldn't really knock anything free and open source either way.

blululu · 3y ago

The objection here is more structural than technical. The famous dropbox objection is 'anyone could do this' - even though they might not have the wherewithal to do so. The objection here is that the open source project is relying on a closed source paid service to do all the heavy lifting. Someone is going to need to foot the bill, which means this project will eventually have to answer some tough questions about funding, and what the project actually delivers.

1 more reply

phkahler · 3y ago

This is not open source. The wrapper may be, but it's using a non open source cloud service.

1 more reply

badloginagain · 3y ago · 2 in thread

Just a note for anyone basing their business on the .ai TLD.

It's technically the domain for Anguilla, a literal British colony in the Caribbean.

It appears to be managed by some random guy- check out the .ai registration FAQ: http://whois.ai/faq.html

If you are going to use .ai, just be aware the top level of the domain appears to be managed by some dude with a gmail account. It's not necessarily bad, but something to consider if you're planning to host your billion dollar AI startup on it.

Belphemur · 3y ago

I wonder how a simple individual can acquire control of a ccTLD.

I always thought they would need to be vetted by the government of the territory that the ccTLD represents.

badloginagain · 3y ago

I think they found the one weird software dev in Anguilla willing to do it. The "Offshore Information Services" link on the .ai Wikipedia page "registry" link redirects to the dude's Wikipedia page.

https://en.wikipedia.org/wiki/.ai https://en.wikipedia.org/wiki/Vince_Cate

A colorful character to say the least, and exactly the kind of person I'd expect to be running the ccTLD of a small Caribbean island.

1 more reply

elaus · 3y ago · 1 in thread

Thanks a lot for making this! Just last week I was trying out the transcription APIs of AWS and Google Cloud and they produced rather bad results for a German interview (wrong punctuation and capitalization, about 1 misheard word per sentence).

I didn't know OpenAI had an API for that as well, but now I was able to try it out and it's magnitudes better: Perfect spelling and only 1 wrong word in 2 minutes of audio (an abbreviation) that I was able to understand. It even filters out filler words!

You just saved me literally hours of work by showing the powers of OpenAI!

(Reading this back it sounds like an ad, but I'm in no way affiliated with any of those services. I'm just very happy.)

masukomi · 3y ago

note that you can run openAI's whisper locally. The language model and tools are open sourced. It is finicky to set up if you're not a python dev. Just wanted to let you know that it's an option and it works literally exactly as well. You can even choose if you want to sacrifice quality for speed of conversion. The experience of using it will just be a lot.... geekier. command line call that produces a text format with timestamps on every line.

paxys · 3y ago · 1 in thread

OpenAI isn’t running a charity. This “free” service is going to run into that reality sooner or later, so I’d suggest not using it for any real work and instead buying Whisper API tokens directly.

MrOwnPut · 3y ago

Or use this nice UI for now to trial, and when that reality comes... transition to the API if you still need to do what you need to do?

So I would suggest this project to try things out, then setup Whisper locally :)

But what do I know, I'm just in the ether...

osrec · 3y ago · 1 in thread

When you say free, do we still need to subscribe to other services?

mpociot (OP) · 3y ago

Well if you want to self-host it, then yes. You will need an OpenAI account/API key for it to work.

aktuel · 3y ago · 1 in thread

What languages does it support?

mpociot (OP) · 3y ago

It uses the OpenAI Whisper API, which according to their API documentation supports: Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

See https://platform.openai.com/docs/guides/speech-to-text/quick...

steeve · 3y ago

FYI: "Writeout uses the recently released OpenAI Whisper API to transcribe audio files. You can upload any audio file, and the application will send it through the OpenAI Whisper API using Laravel's queued jobs. Translation makes use of the new OpenAI Chat API and chunks the generated VTT file into smaller parts to fit them into the prompt context limit."
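
The chunking step described in that quote can be sketched roughly as follows. This is not Writeout's code (the project is PHP/Laravel); it is a minimal Python illustration that assumes cues are separated by blank lines and uses a character budget as a stand-in for the real token limit. The function name and `max_chars` default are hypothetical.

```python
def chunk_vtt_cues(vtt_text: str, max_chars: int = 2000) -> list[str]:
    """Group WebVTT cues into chunks of at most max_chars characters.

    A cue is never split; a single cue longer than max_chars
    ends up alone in its own (oversized) chunk.
    """
    # Blocks are separated by blank lines; keep only blocks with a timing line,
    # which drops the "WEBVTT" header.
    text = vtt_text.replace("\r\n", "\n")
    cues = [b.strip() for b in text.split("\n\n") if "-->" in b]

    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for cue in cues:
        added = len(cue) + (2 if current else 0)  # 2 = blank-line separator
        if current and size + added > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
            added = len(cue)
        current.append(cue)
        size += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk keeps its timing lines, so translated chunks can be concatenated back into one subtitle file in the original order.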

mofle · 3y ago

If anyone wants transcription locally (on-device) on macOS or iOS, I just released a free app for it: https://sindresorhus.com/aiko It runs Whisper on your device.

101008 · 3y ago

This works with Whisper behind the scenes and it is true: running Whisper locally is easy and this may not be needed (it even adds overhead, since Whisper gives me the .srt file directly, I can ask to use the tiny model to make it faster, etc).

BUT if my brother (accountant) needs something like this, he wouldn't be able to install Whisper: he wouldn't even be able to open Github. So I think frontend GUIs that run models behind the scenes are always welcome.

I think this would be much better if it would run Whisper on their server instead of using an external API, but that's their decision.

adithyasrin · 3y ago

Btw your imprint and privacy policy need margins, and their references to whatthediff need changing.

albert180 · 3y ago

I don't see any benefit over using Whisper locally or the API directly

dinkblam · 3y ago

does anyone know of a complete system or player to automatically generate subtitles based on speech recognition (apart from YouTube)?

there are a lot of older series/movies where the speech is hard to discern but no subtitles are available for download.

i have been thinking about creating an AutoSubtitle app for years, but haven't had a free day to tackle it - hope someone else beats me to it.

cvhashim04 · 3y ago

Good tool to translate and decipher the lyrics of ATL rappers like Future.

endisneigh · 3y ago

at the very least they should host the model and point their service to their own locally hosted whisper instance...

eternalban · 3y ago

Why not just use whisper.cpp locally?

SlickStef11 · 3y ago

Just tried it out and wow!

insane.

Free and open source.

Thank you!

killthebuddha · 3y ago

The number of people on HN b****ing about people shipping MVPs that build on OpenAI is hilarious and makes me feel like HN has definitely jumped the shark.

1 more reply
