Show HN: JuliusJS – Speech recognition in JavaScript

(github.com)

202 points · zzmp · 11y ago · 52 comments

40 comments · 16 top-level

zzmp (OP) · 11y ago · 7 in thread

Creator here - I ported this over from the open-source Julius using emscripten. AMA

binarymax · 11y ago

Hi! Just curious on general emscripten work - how long does it take to port something like this? What are the main hurdles, and things that you can get stuck on?

zzmp (OP) · 11y ago

This was my first emscripten project - I would definitely recommend it. I found it to be a great tool. Two things to point out:

- If something is not documented well, it is probably tested well. If you can find the tests that cover it, you can usually get a good idea of how it works.
- There is no multithreading. I had to fake it by breaking up my loops with setTimeouts.

There is an IRC channel if you need a place to go for help. Also, feel free to PM.
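The setTimeout workaround described above (splitting a long computation into chunks and yielding to the event loop between them) looks roughly like this. It is a generic sketch, not code from JuliusJS; `processInChunks`, `chunkSize`, `handleItem`, and `onDone` are hypothetical names.

```javascript
// Generic sketch: "fake" concurrency by running a long loop in chunks and
// yielding to the event loop with setTimeout between chunks.
// processInChunks and its parameters are hypothetical, not part of JuliusJS.
function processInChunks(items, chunkSize, handleItem, onDone) {
  const size = Math.max(1, chunkSize); // guard against a zero-sized chunk
  let index = 0;

  function runChunk() {
    // Process at most `size` items synchronously.
    const end = Math.min(index + size, items.length);
    for (; index < end; index++) {
      handleItem(items[index], index);
    }
    if (index < items.length) {
      // Yield so the page can handle input, rendering, and other callbacks.
      setTimeout(runChunk, 0);
    } else {
      onDone();
    }
  }

  runChunk();
}

// Usage: sum a large array without blocking for the whole loop at once.
const samples = Array.from({ length: 10000 }, (_, i) => i % 7);
let total = 0;
processInChunks(samples, 1000, (x) => { total += x; }, () => {
  console.log("done, total =", total);
});
```

The first chunk runs immediately and each later chunk runs in its own timer callback, so the total work is the same but no single task holds the main thread for the full loop.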

hardwaresofton · 11y ago

This is absolutely amazing, great work. This will open up a whole new world of possibilities.

In my time using offline speech-recognition tech, I was never able to use Julius properly (docs were a little lacking), so I jumped on to CMUSphinx/pocketsphinx, but I think you've just done a huge amount of work to bring Julius out of obscurity (at least in my mind). Thanks very much!

[EDIT] - I really can't get across enough how awesome this is, please add a gittip link or bitcoin address or something

cjbprime · 11y ago

Nice, thanks for doing this! Are you aware of any sites using speech rec for e.g. page navigation?

zzmp (OP) · 11y ago

No, but I think it's a great idea. If you end up using JuliusJS for this, please let me know. I still need to make some example applications to showcase what JuliusJS can do - I'll have them listed in the README when they're done.

I also came across PocketSphinx, which has been around a little longer - that may have some users in the wild already, and I wouldn't be surprised if it was used for navigation somewhere.

[PocketSphinx] https://github.com/syl22-00/pocketsphinx.js/

moron4hire · 11y ago

I'm using Chrome's speech recognition engine for a virtual reality in WebGL thing I'm building. It's a bit annoying, as it requires a network connection, and is rather buggy (I end up crashing Chrome in about one of every ten sessions when using it, all of Chrome, every tab). Something like this would fit my needs a lot better.

ar7hur · 11y ago

The PIA Chrome extension enables you to open and manage some sites with speech. It was done at a hackathon using wit.ai

http://getpia.com

bubee · 11y ago · 2 in thread

Nice work. Can it return confidence scores? Say I want to load 3 commands in my page:

1. Click blue button
2. Scroll down in the yellow text area
3. Expand image of man

I feed those to the engine, and when somebody speaks, I get a confidence score on each word so I can determine, with a configurable level of certainty, that the user is using the command: {click: 0.9878 confidence, blue: 0.8789 confidence, button: 0.1889 confidence}

Something like that...

zzmp (OP) · 11y ago

It does post them back from the worker, but the Julius interface doesn't expose them (yet). The way that Julius deals with confidence scores is also a little different (they're not fractional), so you'd need to account for that.

I'll be sure to include them soon - it's probably just a few more lines of code, so you can expect them in the onrecognition function this afternoon.

zzmp (OP) · 11y ago

OK it's done! Not well documented yet - that will wait for another day - but you can now access the score through the `onrecognition` event.
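bubee's idea of accepting a command only when recognition is confident enough can be sketched independently of the engine. The input shape here (an array of `{ word, score }` objects) is a hypothetical stand-in: the thread does not document what the `onrecognition` score looks like, and the OP notes that Julius's scores are not fractional, so the sketch takes a caller-supplied `normalize` function to map raw scores into 0..1. `matchCommand` is a hypothetical helper, not part of JuliusJS.

```javascript
// Hypothetical sketch: choose the voice command that best matches a recognition
// result, and accept it only above a configurable confidence threshold.
// The { word, score } input shape is an assumption, not JuliusJS's documented API.
function matchCommand(recognized, commands, threshold, normalize = (s) => s) {
  // Keep the best normalized score seen for each (lower-cased) recognized word.
  const scores = new Map();
  for (const { word, score } of recognized) {
    const key = word.toLowerCase();
    scores.set(key, Math.max(scores.get(key) ?? 0, normalize(score)));
  }

  let best = null;
  for (const command of commands) {
    const words = command.toLowerCase().split(/\s+/).filter(Boolean);
    // A command word that was not recognized at all contributes 0.
    const total = words.reduce((sum, w) => sum + (scores.get(w) ?? 0), 0);
    const confidence = words.length > 0 ? total / words.length : 0;
    if (best === null || confidence > best.confidence) {
      best = { command, confidence };
    }
  }
  // Reject the best candidate if it falls below the threshold.
  return best !== null && best.confidence >= threshold ? best : null;
}

// Usage with the per-word scores from bubee's question.
const spoken = [
  { word: "CLICK", score: 0.9878 },
  { word: "BLUE", score: 0.8789 },
  { word: "BUTTON", score: 0.1889 },
];
const pageCommands = [
  "Click blue button",
  "Scroll down in the yellow text area",
  "Expand image of man",
];
// The mean of the three word scores is about 0.685, so "Click blue button" is accepted.
console.log(matchCommand(spoken, pageCommands, 0.6));
```

Averaging per-word scores is one design choice: it lets a single weak word ("button" here) be outweighed by strong ones. Using the minimum word score instead would be a stricter test.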

Gonzih · 11y ago · 2 in thread

Is there an online demo anywhere on the web?

zzmp (OP) · 11y ago

I'll try to whip one up today and post it to the README.

Gonzih · 11y ago

Great, thanks!

bikamonki · 11y ago · 2 in thread

Can this be used to detect a voice's unique digital signature? For example, could I just say my name to log into a website?

crimsonalucard · 11y ago

I don't think voices have unique signatures. Voices vary widely but they can still share identical properties and be imitated.

ar7hur · 11y ago

Voices do have a unique signature. The technology to identify it is called Speaker Identification. See for instance http://research.microsoft.com/en-us/projects/whisperid

> Each person's voice is different. Some sounds, like "s", sound about the same no matter who says them, but other sounds, like vowels, tend to differ a lot from person to person. We use a special way of representing sound, the cepstrum, that captures lots of information, including the characteristic way you pronounce your vowels. Of course, someone could imitate the way you talk; fortunately, the cepstrum also captures certain fundamental characteristics of voices that are impossible to change. For instance, the length of your vocal tract -- the place where sound is produced in your body -- cannot be changed, and different length vocal tracts tend to produce cepstra with different characteristics. By identifying both the way you talk, and the way your body produces sound, WhisperID can do a great job of figuring out who you are.
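The cepstrum mentioned in the quote has a short textbook definition: the real cepstrum of a frame is the inverse DFT of the log magnitude spectrum. The sketch below implements exactly that with a naive O(N²) DFT. It is for illustration only (a practical system would use an FFT and usually further processing), and it is not WhisperID's implementation; `realCepstrum` and its `epsilon` parameter are hypothetical names.

```javascript
// Real cepstrum of one frame: c[n] = IDFT( log |DFT(x)| )[n].
// Naive O(N^2) DFT for clarity; a practical implementation would use an FFT.
function realCepstrum(frame, epsilon = 1e-12) {
  const N = frame.length;

  // 1. Log magnitude spectrum.
  const logMag = new Array(N);
  for (let k = 0; k < N; k++) {
    let re = 0;
    let im = 0;
    for (let n = 0; n < N; n++) {
      const angle = (-2 * Math.PI * k * n) / N;
      re += frame[n] * Math.cos(angle);
      im += frame[n] * Math.sin(angle);
    }
    // epsilon avoids log(0) for bins with zero energy.
    logMag[k] = Math.log(Math.hypot(re, im) + epsilon);
  }

  // 2. Inverse DFT. For real input, log|X| is real and symmetric,
  //    so the imaginary part of the result is zero and only cosines are needed.
  const cepstrum = new Array(N);
  for (let n = 0; n < N; n++) {
    let sum = 0;
    for (let k = 0; k < N; k++) {
      sum += logMag[k] * Math.cos((2 * Math.PI * k * n) / N);
    }
    cepstrum[n] = sum / N;
  }
  return cepstrum;
}

// Usage: an impulse has a flat spectrum, so its cepstrum is (numerically) all zeros.
console.log(realCepstrum([1, 0, 0, 0, 0, 0, 0, 0]));
```

Low-index coefficients summarize the smooth spectral envelope shaped by the vocal tract, while the pitch of the voice shows up at higher indices, which is why cepstral features carry information about who is speaking.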

cue232s · 11y ago · 2 in thread

Does your application need a nodejs backend for this library to work?

prezjordan · 11y ago

Looks like they're just providing examples on serving the library on a webserver. It's all static JavaScript.

zzmp (OP) · 11y ago

Right, it's all done on the client, so it is server agnostic.

kelvin0 · 11y ago · 2 in thread

This is the kind of technological challenge which must be fun to complete. And it must be quite satisfying for the author. However, whenever I see a 'XYZ in pure javascript', I keep getting the impression we are only delaying the inevitable moment browsers have to move to a superior language. Kinda like how quickly ripping off a bandaid is better than slooowwwwllly removing it ....

evan_ · 11y ago

Seeing cool things written in JavaScript makes you think that JavaScript is doomed?

dubcanada · 11y ago

Not to be that guy, but it wasn't written in JS it was transpiled from C to JS.

ar7hur · 11y ago · 1 in thread

Thanks for sharing, nice work!

Quick question: the Julius website says there is no English acoustic model available [1]. How did you solve this? Do you provide a default acoustic model?

[1] http://julius.sourceforge.jp/en_index.php?q=en_grammar.html

zzmp (OP) · 11y ago

I used voxforge[1], a project made specifically to solve this problem:

> VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac).

VoxForge's sample grammar is provided as a default, in its own folder [2]. It would be nice to get a high-quality acoustic model, as voxforge's is not that comprehensive yet, but I couldn't find anything with the right licensing and zero cost. If anyone knows of one, I'd love to hear about it.

[1] http://www.voxforge.org/ [2] https://github.com/zzmp/juliusjs/tree/master/dist/voxforge

danso · 11y ago · 1 in thread

This is sweet...to get an idea of how much fun this could be for web apps, check out the Annyang library (https://www.talater.com/annyang/), which wraps around the Google Web voice recognition API...it works very well, but of course, is subject to Google's terms...so an open source system is very welcome

_pius · 11y ago

Strictly speaking, Annyang wraps the HTML5 speech recognition API, not Google's specifically.
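To illustrate _pius's point: the Web Speech API names its recognizer `SpeechRecognition`, while Chrome shipped it with a `webkit` prefix, so wrappers generally feature-detect both. A minimal sketch follows; `getSpeechRecognition` is a hypothetical helper (not Annyang's or JuliusJS's code), and the window-like object is passed in so the function can be exercised outside a browser.

```javascript
// Return the Web Speech API recognizer constructor if the environment has one.
// The spec name is SpeechRecognition; Chrome exposes webkitSpeechRecognition.
// getSpeechRecognition is a hypothetical helper, not part of Annyang or JuliusJS.
function getSpeechRecognition(win) {
  return win.SpeechRecognition || win.webkitSpeechRecognition || null;
}

// In a browser you might write:
//   const Recognition = getSpeechRecognition(window);
//   if (Recognition) {
//     const recognizer = new Recognition();
//     recognizer.onresult = (event) => console.log(event.results[0][0].transcript);
//     recognizer.start();
//   } else {
//     // No native recognizer: fall back to an in-browser engine such as JuliusJS.
//   }
```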

hugozap · 11y ago · 1 in thread

Great work, this will be another cool library I star on GitHub and never do anything about it :/

cue232s · 11y ago

LOL!! IKR

jergason · 11y ago · 1 in thread

I've played with pocketsphinx.js a fair amount, but this looks WAAAAAAAAAY easier to set up and consume. Nice work.

zzmp (OP) · 11y ago

pocketsphinx.js looks amazing, but there's definitely a barrier to entry if you've never worked with speech recognition before. That was my biggest goal in porting this tool - a nice, abstracted API. Glad you like it :)

sunsu · 11y ago · 1 in thread

Can you use any of the CMUSphinx compatible language models with this, or is there a tool to convert them to something Julius supports?

zzmp (OP) · 11y ago

I believe that Julius offers their own tools for conversions like these, but the best place to consult would be the JuliusBook [1].

[1] http://julius.sourceforge.jp/juliusbook/en/

borplk · 11y ago · 1 in thread

Genuinely not sure if the demo is a joke or not.

I said "hello" it said "DIAL OH OH".

I said "Apple" it said "GET KENT".

WTF?

zzmp (OP) · 11y ago

In order to get a more comprehensive vocabulary/grammar, you need to substitute out the sample that it comes with. There are instructions in the README. For the demo, it just uses the sample grammar that voxforge provides, which is (as you can see) fairly limited.
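Swapping the vocabulary means supplying your own Julius grammar. In Julius's grammar format, a `.grammar` file defines sentence structure over word categories, and a `.voca` file lists each word with its phoneme sequence under `% CATEGORY` headers; Julius's `mkdfa.pl` tool compiles the pair into the `.dfa`/`.dict` files the recognizer loads. The sketch below only renders `.voca` text from a JavaScript array. The `toVoca` helper and its input shape are hypothetical, the category names follow the style of VoxForge's sample, and real phoneme strings must come from the acoustic model's phone set. How JuliusJS loads the compiled files is covered in its README.

```javascript
// Render Julius .voca text from [{ name, words: [[word, phonemes], ...] }, ...].
// toVoca and its input shape are hypothetical; the output uses the .voca layout
// of a "% CATEGORY" header followed by one "word<TAB>phonemes" line per word.
// Arrays (rather than object keys) keep categories and words in the given order.
function toVoca(categories) {
  const blocks = categories.map(({ name, words }) =>
    [`% ${name}`, ...words.map(([word, phonemes]) => `${word}\t${phonemes}`)].join("\n")
  );
  // Separate categories with a blank line and end the file with a newline.
  return blocks.join("\n\n") + "\n";
}

// Usage: silence markers plus one verb, in the style of VoxForge's sample grammar.
const voca = toVoca([
  { name: "NS_B", words: [["<s>", "sil"]] },
  { name: "NS_E", words: [["</s>", "sil"]] },
  { name: "CALL_V", words: [["CALL", "k ao l"]] },
]);
console.log(voca);
```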

cssandjs · 11y ago · 1 in thread

This is awesome - who cares about Windows 10 or Linux, Javascript is the new OS.

zzmp (OP) · 11y ago

It's definitely a fun little toy...

http://node-os.com/

http://osjsv2.0o.no/

zzmp (OP) · 11y ago

There is now a (very rudimentary) demo on the GitHub page: zzmp.github.io/juliusjs

Much thanks to @iffy for writing the first pass.

It uses voxforge's sample vocabulary, so you'll need to say things like "Dial 1 2 3" or "Call Kenneth McDougall" for it to understand you, but the vocabulary is easily swapped out for your own projects, as explained in the README.

yeukhon · 11y ago

Pretty cool! When I did my project I had to use https://github.com/kn/speak.js which is an amazing library. The library still worked on Firefox 30 and 31 when I finished my project (and the project itself hasn't changed much for a year or two!).

I would definitely give this JuliusJS library a try. I am actually amazed that JuliusJS doesn't carry all the heavy data like speak.js does (though speak.js does support multiple languages). I love the fact that you state 100% client side!

CmonDev · 11y ago

Wow, speech recognition in a Turing-complete language. Amazing.
