Google Cloud Text-To-Speech Powered by DeepMind WaveNet Technology

Gatsky 8y ago

Come on now, it's a cloud machine learning product, you can access it using purely open source tools all running on an open source operating system if you like. No need to pronounce doom because the demo doesn't work in your browser.

shdh 8y ago

Just needs a WebComponents polyfill.

Other browser vendors don't support WebComponents out of the box (yet).

rhizome 8y ago

> Google used to at least generally support the idea of the open web

Yes, but history teaches us that they are terrible at product management. I mean, they could at least just call it "Chrome TTS" or something until they put the whole thing together.

[0]: https://developer.mozilla.org/en-US/docs/Web/Web_Components/...

thesandlord 8y ago

GCP person here. We are aware and are working on a fix!

(Should be as "simple" as a polyfill for webcomponents, but I don't want to put words in the team's mouth)

thesandlord 8y ago

Update: Should be fixed now!

pietroglyph 8y ago

That's probably because custom elements aren't supported in Firefox yet, unless you set "dom.webcomponents.enabled" and "dom.webcomponents.customelements.enabled" to true. It's supposed to be fully enabled in Firefox 60/61 according to MDN [0].

the8472 8y ago

Best viewed with Chrome Explorer™ 5.0, Optimized for 1024x768 truecolor screens.

ehsankia 8y ago

Sure, but should they not either be using polyfills or avoiding code that is only supported by one browser? I understand how a project like Allo Web would want to use the latest streaming API tech, but a documentation/product launch page doesn't need crazy new web tech.

gcb0 8y ago

> something from google not working on firefox.

Call me paranoid all you want, but that is their plan, just like it was Microsoft's plan for nothing on IE4/6 to work on Netscape. Heck, a third of my corporate sites already have DHTML popups saying "Use Chrome" when I reach them with Firefox.

qeternity 8y ago · 8 in thread

The average English word is 4.5 characters and the average English speaker speaks 110-150 words per minute. This means that at $16/1m characters, we can generate speech at a cost between $28.57-39/hr. Per Google's post, WaveNet now costs 50ms of TPU time per 1s of speech generated, meaning, at 100% utilization, a TPU can generate somewhere between $571.40-780/hr. Google's TPUs can be deployed (by third parties) at $6.50/hr. That's some sweet sweet margin.

teraflop 8y ago

I think your math is wrong by a factor of 60. Under your assumptions, one hour of speech is equivalent to 30-40 thousand characters, costing between $0.48-$0.65. That translates to revenue of $9.50-$12.96/hour per TPU.

I double checked, you are correct:

    16 * (4.5 * 110 * 60) / 1M = $0.475/hr

    16 * (4.5 * 150 * 60) / 1M = $0.648/hr

If you multiply by the number of 50 ms slices in one second (20), you do get $9.50-$12.96 per TPU-hour.
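The corrected arithmetic can be re-run as a quick sketch. The inputs are only the figures quoted in this thread ($16 per 1M characters, 4.5 characters per word, 110-150 words per minute, 50 ms of TPU time per second of speech); the function and constant names are made up for illustration.

```python
# Figures quoted in the thread; all names here are illustrative.
PRICE_PER_MILLION_CHARS = 16.0        # USD, WaveNet voices
CHARS_PER_WORD = 4.5                  # average English word length
TPU_SECONDS_PER_SPEECH_SECOND = 0.050


def cost_per_speech_hour(words_per_minute: float) -> float:
    """Price in USD of synthesizing one hour of speech at the given pace."""
    chars_per_hour = CHARS_PER_WORD * words_per_minute * 60
    return PRICE_PER_MILLION_CHARS * chars_per_hour / 1_000_000


def revenue_per_tpu_hour(words_per_minute: float) -> float:
    """Revenue in USD from one fully utilized TPU-hour."""
    # 50 ms of TPU time per second of speech means 20x real time.
    speech_hours_per_tpu_hour = 1 / TPU_SECONDS_PER_SPEECH_SECOND
    return cost_per_speech_hour(words_per_minute) * speech_hours_per_tpu_hour


if __name__ == "__main__":
    for wpm in (110, 150):
        print(wpm, round(cost_per_speech_hour(wpm), 3), round(revenue_per_tpu_hour(wpm), 2))
```

Against the $6.50/hr TPU price quoted above, that works out to roughly 1.5x to 2x the rental cost at full utilization.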

qeternity 8y ago

You're right, I decided to switch from seconds as base unit to hours as base unit half way through writing the comment, and screwed up the conversion. I was accounting for seconds when I was already in minutes.

paulgb 8y ago

Free SAAS business model: transform text by translating each word to its shortest homonym, and take a cut of the cost savings :)

froindt 8y ago

Google does a remarkably good job with pronunciation. I'm consistently impressed by it.

I was messing around with the ancient VBA text to speech system. If most TTS systems sucked as much as that one, you could also make a SAAS business for finding "typos" that make the word sound correct when pronounced.

vgt 8y ago

Nice analysis! That's the rub with doing TCO analysis. How much value do you put on a higher order of manageability?

You should also include a workload volatility component to be entirely fair. Your analysis assumes it's entirely steady state.

(Work at g)

z3t4 8y ago

"the cloud" is just a better DRM.

oh-kumudo 8y ago

How so? They don't sell the model itself, they sell the 'tickets' to allow you to take a picture of it. That is not DRM.

PostOnce 8y ago · 5 in thread

Am I wrong in thinking that the cost of generating (realistic-sounding, learned-model) speech on commodity hardware will be near-zero soon, largely negating the value of a SaaS?

I've been waiting a long time for decent sounding open source TTS software for narrating books to me, and now with deep learning it's either here or very near here, and the hardware is going to keep getting more performant at the same price. I guess that will be very appealing to businesses relying on TTS (e.g. call centers and phone robots and mobile apps with TTS, etc)

wslh 8y ago

Some companies never consider hiring people to build and maintain it; they prefer to pay for a subscription service like Google Cloud.

ariwilson 8y ago

What open source TTS is as good as Google Cloud?

PostOnce 8y ago

Check the "kate" samples: https://github.com/Kyubyong/speaker_adapted_tts

This is with 1 minute of audio and 10 minutes of training, which is crazy to me. Maybe it's not "as good", but it's very good, and free, and it will get better, faster, and cheaper quickly?

https://research.googleblog.com/2018/03/expressive-speech-sy...

paulryanrogers 8y ago

"... the hardware is going to keep getting more performant at the same price"

That is the hope but there are no guarantees. Perhaps specialized hardware can pick up where Moore's Law has tapered off.

PostOnce 8y ago

I don't mean general purpose hardware, I mean specifically neural hardware, whether that means GPUs (as we know them), GPUs with special hardware like the "tensor cores", TPUs, "neuromorphic chips" whatever that is, FPGAs becoming part of average computers, or something else.

There's no end in sight to the improvement of neural hardware, not like the wall x86 CPUs have hit anyway.

bufferoverflow 8y ago · 5 in thread

I wish they had some beautiful voices, not some of the most generic-sounding men and women.

forgot-my-pw 8y ago

Like Scarlett Johansson's? https://www.youtube.com/watch?v=3n5muEWaE_Q

jonknee 8y ago

They're working on it! Check out the samples here:

The last set is specifically interesting for your wish.

gene1974 8y ago

Meryl Streep should be worried right about now! (I heard that last set of renders... whoa!)

gene1974 8y ago

This is MY VOICE as rendered by lyrebird.ai - https://lyrebird.ai/g/hFj87pbl - So, tech is out there trying to replicate the actual CHARACTER of individual human voices.

aantix 8y ago

We need a virtual Barry White.

tristanj 8y ago · 4 in thread

Are there any voice samples?

joefourier 8y ago

You can try it yourself here, just make sure to select English (United States) and Voice type: WaveNet, as the other languages are not yet using the WaveNet system: https://cloud.google.com/text-to-speech/

alonmower 8y ago

It is fun to mismatch voices/languages to hear some hilariously stereotypical accents

michaelbuckbee 8y ago

If you switch between "Basic" and the "Wavenet" you can hear the difference noted in this announcement, it is noticeably much better.

dharma1 8y ago

Try changing the pitch by more than ±10 as well, it breaks up quite amusingly.

coryfklein 8y ago · 3 in thread

Google also announced today their Tacotron engine, which features new prosody modeling for speech generation. It allows them to generate speech that mimics personal intonation, accents, and rhythm, effectively mimicking an individual's "expression" in their speech.

HN discussion here: https://news.ycombinator.com/item?id=16691197

vosper 8y ago

I find it amusing that here we have all the corporatey buzzwords - "DeepMind WaveNet Technology" - but the other thing is called Tacotron.

gcb0 8y ago

If they named themselves DeepMind WaveNet Blockchain, they could have acquired Googlebet* by now.

* That's what I call Google/Alphabet when it doesn't matter which side of the tax-avoiding entity I am referring to.

[1] https://www.crockford.com/wrrrld/anguish.html

jjeaff 8y ago

They also announced their new fluerosinth-ai amplitudinal speech to text deep-cloudmind product, although I can't find a reference link...

ollin 8y ago · 3 in thread

Does anyone have a GitHub project for epub -> mp3 using this service yet (for automatic audiobook generation)? May make it myself if I have time but curious if anyone already has set it up.

EDIT: this is almost exactly their sample application (https://github.com/GoogleCloudPlatform/python-docs-samples/t...). Was able to get it working with epubs using pypandoc within the hour. Now just need to make it upload to Overcast...

EDIT 2: Can now convert epubs directly to mp3s on Overcast. Yay!
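For anyone attempting the same thing: a whole book has to be sent in pieces, since synthesis APIs generally cap the size of a single request. Below is a minimal sketch of the splitting step, assuming a 5000-character cap (check the service's current quotas for the real limit) and plain text already extracted from the epub. `chunk_text` and `MAX_CHARS` are hypothetical names, not part of any Google sample.

```python
import re

MAX_CHARS = 5000  # assumed per-request cap; verify against the service's quotas


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, breaking after sentences."""
    # Split after ., ! or ? followed by whitespace, keeping the punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the cap is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized separately and the resulting MP3 segments joined in order.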

Immortalin 8y ago

Shameless plug: https://auditus.cc

Uses Amazon Polly

technics256 8y ago

Is your code on github?

ollin 8y ago

The code was hacked together during a lecture so it's not very clean or robust, but here's the gist if you're trying to build something similar: https://gist.github.com/madebyollin/508930c86fa12e1a70e32d91...

(Overcast uploading not shown; that's a separate script using mechanize)

Jakob 8y ago · 3 in thread

Having an English text but setting the language to another one like German or French is hilarious.

You get e.g. ze Dscherman aczent or de frensch onehe.

jfno67 8y ago

I decided to do the opposite: French text with the WaveNet English voice. Pretty funny too.

pacaro 8y ago

Also feeding it text from the Anguish Languish[1] corpus is pretty good.

davidklemke 8y ago

I am ashamed to admit I spent far too much time doing exactly that. The Japanese one really cracked me up.

benjismith 8y ago · 2 in thread

Interesting! I'd love to see a thorough comparison with the Amazon Polly service...

https://aws.amazon.com/polly/

Polly is priced at $4 per million characters and the Google WaveNet voices are $16 (compared with the Google non-WaveNet voices, which are also $4).

After listening to a few samples from each service, the voice quality and prosody modeling seem roughly on par between Polly and WaveNet, or at least the differences I heard didn't seem to justify a 4x price multiplier.

But I'd love to hear an informed opinion from someone with more expertise...

jakozaur 8y ago

A lot of voice generation is a cost center (call centers that are outsourced to the cheapest location) with short sentences. I doubt the industry would pay a 4x price multiplier for that use case.

So in fact WaveNet competes more with voiceover and new use-cases such as voice assistants. Still I don't hear that much difference there today, but maybe WaveNet will improve in the future to human level sooner than the other models.

To me Polly is way behind WaveNet when it comes to realism. Polly is robotic, WN is fluid.

WheelsAtLarge 8y ago · 2 in thread

It's very good. The voices remind me of speech from real-life people with accents. It's good enough for voice-overs where previously real-life voices would be too expensive. I would say that it's better than Amazon's Polly when it's used to read long passages of text.

ghaff 8y ago

I don't know. They're good but they still sound robotic. For me, they work for applications where I sort of expect/accept that I'll get computer-generated speech anyway. But I wouldn't use them as a general substitute for a human speaking, even someone like me who doesn't exactly have a radio voice.

remir 8y ago

It's getting more and more human sounding. Take a look at this research (also from Google): https://research.googleblog.com/2018/03/expressive-speech-sy...

https://github.com/pndurette/gTTS

ryeguy_24 8y ago · 2 in thread

Based on the pricing of $16 per 1 million characters (roughly equal to a 400-500 page book), doesn't this severely threaten the voiceover market place? I just priced the cost of a human voiceover on VoiceBunny.com for a 400-page book and I got an average turnaround time of 90 days / $15K cost vs WaveNet's $16 cost and only 30 mins of computational time. That sounds like an interesting disruptor to me.

brad0 8y ago

It does if people are willing to listen to the voice for 10 hours+.

I could listen to this voice for a while, but the voice needs more emotion in it before it could be actually useful for long text.

dhon_ 8y ago

Tacotron (also by Google) looks promising in this area https://google.github.io/tacotron/publications/global_style_...

tmalsburg2 8y ago · 2 in thread

This is great, but there remain very difficult problems to be solved. The prosody generated by this is fairly generic and not informed by a true understanding of the text. Consider this sentence:

I have plans to leave.

If you stress the word "plans", the sentence means that the speaker is not necessarily intending to actually leave. However, when the stress is on "leave", the speaker definitely intends to leave. A human reader can easily infer the correct meaning from context but text-to-speech systems can't because they don't have any systematic understanding of the things being talked about and the social pragmatics of the discourse. As long as these issues aren't solved, text-to-speech systems will make mistakes. These mistakes will be easy to spot in some cases but can also have catastrophic consequences in other cases: "I have plans to bomb North Korea."

londons_explore 8y ago

Google has solved that here: https://news.ycombinator.com/item?id=16692559

tmalsburg2 8y ago

This is really cool, thanks for the link, but it solves a different problem.

StavrosK 8y ago · 2 in thread

Is there any API for generating speech that sounds like Google Now's assistant? The quality of that is much, much better than this new service.

panarky 8y ago

Yes, I use this:

Very simple and has the Google Assistant voice.

StavrosK 8y ago

Hmm, that seems much worse than Google Assistant as well. I think my mistake was that I had selected "Basic" instead of "WaveNet" for the voices (because it's only available for US English). WaveNet is much better.

daoudc 8y ago · 2 in thread

I had an idea this morning for a personalised "podcast" that could read out e.g. the weather in your area, any new and important emails, the headlines and first paragraph of top stories from your favourite sources and notifications from social media.

I think this is the missing thing that was needed to make this viable.

dragonwriter 8y ago

> I had an idea this morning for a personalised "podcast" that could read out e.g. the weather in your area, any new and important emails, the headlines and first paragraph of top stories from your favourite sources and notifications from social media.

Google Assistant already has all the pieces of that (maybe not all the social media connections one might want, I haven't looked much at that), and the ability to string them together.

nl 8y ago

Try this:

Hey Google, Tell me about my Day.

and

Hey Google, Tell me the news.

There's a way you can get the news added to your daily briefing (the first trigger), but I can't remember how now.

StavrosK 8y ago · 1 in thread

Here's a simple Python script that will fetch some sample audio using the request on the demo page and save it in a file:

https://www.pastery.net/nujfhw/

I have no idea what the rate limits are, so please don't abuse it, I wrote it because the demo didn't work in Firefox and I wanted to play around with it more extensively.
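For reference, the service's REST API takes a JSON request of its own, authenticated with your own credentials rather than the demo page's. Below is a sketch of building the request body and decoding the reply, assuming the `text:synthesize` method's field names (`input`, `voice`, `audioConfig`, `audioContent`) and an example voice name. Nothing here is taken from the pastery script, and no network call is made.

```python
import base64
import json


def build_synthesize_body(text: str,
                          language_code: str = "en-US",
                          voice_name: str = "en-US-Wavenet-D",
                          pitch: float = 0.0) -> str:
    """Return the JSON body for a text:synthesize request (field names assumed)."""
    body = {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3", "pitch": pitch},
    }
    return json.dumps(body)


def decode_audio(response_json: str) -> bytes:
    """Extract the base64-encoded audio bytes from a synthesize response."""
    return base64.b64decode(json.loads(response_json)["audioContent"])
```

The body would be POSTed to the service's `text:synthesize` endpoint, and the bytes returned by `decode_audio` written to an `.mp3` file.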

mxuribe 8y ago

Thanks, pretty neat script; works nicely!

joelthelion 8y ago · 1 in thread

I for one welcome our new wavenet telemarketing overlords...

dakna 8y ago

I set the text to

"I'm sorry Dave. I'm afraid I can't do that."

to be prepared for what's coming ...

remir 8y ago · 1 in thread

Imagine teaching these voices to sing. Something like DeepMind WaveNet Song Generator.

You upload your music to the cloud, set some parameters (genre, tempo, emotion, etc) and a bunch of lyrics and the thing will spit out awesome vocals for you.

severine 8y ago

That's a billion dollar idea, I wonder who'll do it, maybe you!

ImJasonH 8y ago · 1 in thread

Quick, someone remake Translation Party using Speech-to-Text-to-Speech-to-Text-to-Speech-ad-infinitum

https://cloud.google.com/text-to-speech/docs/quickstart https://cloud.google.com/speech/docs/sync-recognize

ImJasonH 8y ago

Never say I never gave you anything, HN: https://gist.github.com/ImJasonH/78c22b36944b8ec189456e67e63...

verelo 8y ago · 1 in thread

Is it just me, or would a demo really make this posting much more interesting?

Edit: There is one, on the actual Google Cloud Text-To-Speech page, so a few clicks in and you'll get one.

mintplant 8y ago

Doesn't show up on Firefox or Edge. Only a blank space where I assume the demo should be. Console suggests some sort of Polymer/WebComponents error.

tambourine_man 8y ago

The US English synthesized version is truly remarkable. Borderline scarily good.

The fact that the preview only seems to work on Chrome (and silently breaks everywhere else) is not cool, though.

[0]: https://developer.mozilla.org/en-US/docs/Web/Web_Components

neom 8y ago

As someone who struggles greatly with the written word, I'm so thankful to see this. For the last year or so I've poked around every few months to see if they'd opened this up more generally. I'd be more than happy to pay $30-60/month (more if it had Spritz) for high-quality, high-speed text-to-speech for the emails, documents and news articles I'd like to consume.

aviv 8y ago

I have not seen any mention of licensing or of whether you can cache and replay voice responses. Amazon Polly specifically allows caching.

kokimame 8y ago

I've been using Amazon Polly for a few months to make videos for language learners. The English voices powered by WaveNet are slightly better than Amazon's, but the default Japanese voice sounds much worse. Anyway, their pricing and platform are almost the same as Amazon's, so I definitely need to add another interface for this TTS to my app. You can listen to Amazon Polly voices in the video I made: https://www.youtube.com/watch?v=ysMp0k4oR5c

lysp 8y ago

I picked 3 random paragraphs from a random article on a local online news site.

The voices did sound quite natural and "news-readery", however the one issue I did find is adding a pause between words.

With the example phrase: "He bought himself a boat and then took it to his house". You often expect a small pause after the word "boat".

I was able to manually fix it by adding some commas and full stops, however the AI was not able to pick up those pauses naturally.

It sounded like someone was rushing through the speech instead of stopping occasionally to "take a breath".
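Besides punctuation, pauses can be requested explicitly with SSML's `<break>` element, which the service accepts as an alternative to plain text input. Here is a small sketch that inserts a pause after a chosen word. `with_pause_after` is a hypothetical helper, the 300 ms default is arbitrary, and how naturally a given voice renders the break is not something this shows.

```python
from xml.sax.saxutils import escape


def with_pause_after(sentence: str, word: str, pause_ms: int = 300) -> str:
    """Wrap sentence in <speak> and add a <break> after the first match of word."""
    out = []
    inserted = False
    for w in sentence.split():
        out.append(escape(w))  # escape &, < and > so the SSML stays well-formed
        if not inserted and w.strip(".,;:!?") == word:
            out.append(f'<break time="{pause_ms}ms"/>')
            inserted = True
    return "<speak>" + " ".join(out) + "</speak>"
```

Called on the example phrase with `word="boat"`, it places the break between "boat" and "and"; the result would be sent as SSML input instead of plain text.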

coryfklein 8y ago

The demo is available at https://cloud.google.com/text-to-speech/

Requires Chrome.

slaymaker1907 8y ago · 11 in thread

For any Google devs lurking out there, it doesn't seem to work at all in Firefox on Windows. It looks like it has something to do with custom web components with the following message:

ReferenceError: customElements is not defined

Also apparently some assertion errors with webcomponents (minified so line numbers not useful).

matthewmacleod 8y ago

Doesn't work in Safari. Doesn't work in Firefox. Doesn't work in Edge.

halfteatree 8y ago

> I don't want to be unreasonable, but Google used to at least generally support the idea of the open web.

Filing a bug report is good, but ranting on HN about how this is a sign of Google trying to steal the open Internet is at best unnecessary, and absolutely unreasonable.
