Does anyone have experience using it? Is it any good?
- F-Droid: https://f-droid.org/packages/org.dicio.dicio_android/
- Source: https://github.com/Stypox/dicio-android
- HN: https://news.ycombinator.com/item?id=29762526
The accuracy of the English language recognition is not bad. I'm glad to see an implementation of Vosk for desktop Linux.
It was significantly better than the other FLOSS options I looked at--both in terms of getting it going initially & the quality of the speech to text results.
I tested it with a lightly modified version of this example script: https://github.com/alphacep/vosk-api/blob/master/python/exam...
What I found particularly interesting was when you have the "partial" recognition output shown in real-time you get to see how--at the end of a sentence--it may change a word earlier in the sentence in the final recognition output based on (I guess) the additional context of the full sentence.
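For anyone who wants to see those late corrections explicitly, here is a minimal sketch that compares the last partial result with the final one. It assumes the JSON shapes Vosk's Python recognizer returns (a `"partial"` key from `PartialResult()`, a `"text"` key from `Result()`). The helper name `changed_words` and the sample strings are made up for illustration, not taken from the example script.

```python
import json

def changed_words(partial_json: str, final_json: str) -> list[tuple[int, str, str]]:
    """Return (index, partial_word, final_word) wherever the two hypotheses differ.

    The comparison is positional, so it only covers the words both
    hypotheses have in common by index (a deliberate simplification).
    """
    partial = json.loads(partial_json).get("partial", "").split()
    final = json.loads(final_json).get("text", "").split()
    return [(i, p, f) for i, (p, f) in enumerate(zip(partial, final)) if p != f]

# Hypothetical strings in the shape of PartialResult() / Result() output.
last_partial = '{"partial": "i went to the see"}'
final = '{"text": "i went to the sea side"}'

print(changed_words(last_partial, final))  # [(4, 'see', 'sea')]
```

In a real loop you would keep the most recent `PartialResult()` string around and run the comparison whenever `AcceptWaveform()` signals the end of an utterance.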
(I just did a quick test again (with the installs from my testing last year) using an internal laptop microphone & the test script recognized a significant chunk of my speech (using a headset definitely improves things though) whereas with the same environment a test with `mic_vad_streaming` (from `DeepSpeech-examples-r0.9` with `deepspeech-0.9.0-models.pbmm`) failed to recognize any words at all.)
Woah, we really do write 2022.
At any rate, what I am trying to say is that if poorly documented software (i.e. software whose documentation is usually untested) is this common, then we are definitely doing something wrong. You should be able to follow the installation instructions and have it work, i.e. just read INSTALL or README and follow the instructions, like in the good old times!
You said it yourself: "I don't know why this is a hard bar to clear, but bravo.". It should not be, it should be expected, and it should be done. It should not be a magical or surprising thing.
This program outputs like a keyboard. And, in English at least, it works really well. I cannot believe it.
> just open a max this program output like a keyboard and in
> english at least it works really well i cannot believe it
The only problems that I see are:
1. Capitalization and punctuation.
2. Doesn't know what emacs is, so it got that wrong. A user-installed dictionary might help here.
3. "outputs" came out as "output". I just tried a few more times, and I got the same results. I suspect that like "emacs", the word "outputs" is not in the dictionary.
This should make my life a lot easier because I find myself going to my phone and using the dictation feature a lot recently. It's not as good as the one on my Android, but it's 95% of the way there.
> i'm throwing another hat in the ring as this technology totally working most
> of the time i used it to write this comment this should make my life ah lot
> easier because i find myself going to my phone and using third dictation
> feature a lot recently it's not as good as the one on my android for it's
> ninety five percent of the way fair
For use with no training that looks great. I'm sure that as I learn to speak more clearly, your 95% estimate is achievable.
This remap makes the "." key add a period after the previous word, capitalize the current word, then move on to the next word:
:noremap . bea.<esc>w~w
This could be similar to what YouTube does with its automatic subtitles. What do you guys say?
"...continuous large vocabulary transcription, zero-latency response with streaming API, reconfigurable vocabulary and speaker identification ... can also create subtitles for movies, transcription for lectures and interviews."
* https://github.com/alphacep/vosk-api/blob/master/python/exam...
* https://github.com/alphacep/vosk-api/blob/master/python/exam...
* https://github.com/alphacep/vosk-api/blob/master/python/exam...
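As a rough idea of what the subtitle use case involves, here is a minimal sketch that turns word-level timings into SRT cues. It assumes the result JSON you get when word output is enabled on the recognizer (`SetWords(True)`): a `"result"` list whose entries carry `word`, `start` and `end` in seconds. The helper names, the fixed `words_per_cue` grouping and the sample data are illustrative assumptions, not how the linked examples do it.

```python
import json

def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(result_json: str, words_per_cue: int = 7) -> str:
    """Group recognized words into fixed-size cues and render them as SRT."""
    words = json.loads(result_json).get("result", [])
    cues = []
    for n, i in enumerate(range(0, len(words), words_per_cue), start=1):
        chunk = words[i:i + words_per_cue]
        text = " ".join(w["word"] for w in chunk)
        start, end = srt_time(chunk[0]["start"]), srt_time(chunk[-1]["end"])
        cues.append(f"{n}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)

# Made-up sample in the assumed shape of a word-level result.
sample = json.dumps({
    "result": [
        {"word": "hello", "start": 0.5, "end": 0.9, "conf": 1.0},
        {"word": "world", "start": 1.0, "end": 1.5, "conf": 1.0},
    ],
    "text": "hello world",
})
print(words_to_srt(sample))
```

Splitting on a fixed word count is the crudest possible choice; breaking cues on pauses between words would probably read better.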
(Edit: Also, thanks for introducing me to "libretranslate", looks like an interesting project.)
The reason is probably something pragmatic, like perhaps a large enough corpus was available for that specifically.
evStart: start speaking in an erotic voice
evStop: stop speaking in an erotic voice
evQuery: query whether you are speaking in an erotic voice
evLinuxClassic: enable the inability to speak until Firefox is closed (experimental)
Meanwhile, the meat of this speech recognition is Vosk, which is just Apache-2: https://github.com/alphacep/vosk-api/blob/master/COPYING
> $ ./run.sh
> [+] Prompting for admin to set up Tobii udev rule
> [sudo] password for dotancohen:
That does not build trust. I would prefer an instruction on how to set up a udev rule, or better yet, I would prefer that requirement to be relaxed. What does it need more than standard microphone access that e.g. nerd-dictation or even Telegram need?
The AT&T website's text-to-speech, which outputs an audio file and was used in these anonymous publications, is good, but espeak is not. If I had something like this for European (and Russian and Arabic) languages as an open-source standalone tool, I would be happy :(
The project is called Larynx, and it is amazing: https://github.com/rhasspy/larynx/
I waxed lyrical about it recently in this thread about private alternatives to Alexa: https://news.ycombinator.com/item?id=29562526
I can only vouch for the quality/variety in English but it does note support for 50 voices over 9 languages, including all the first group of languages you mentioned, and also Russian. (I've "played" with all those languages to test them but can't really vouch for how a native speaker/listener might find it. :D )
It is miles ahead of any of the other Free/Open Source TTS solutions I've tried, including the ones you mentioned.
(It's still synthesized speech but the output quality is so good and the project is still extremely early days.)
And there's a range of options in accent & gender--which are in general sorely lacking in other FLOSS TTS options. (In terms of licensing, some voices are licensed more freely than others but the majority are without significant restriction.)
I like Larynx so much that I've been working on an editor for it to assist in "auditioning" & recording speech in a narrative context, e.g. game/film pre-viz.
I was looking for this type of tech for at least 2 years and I am glad it now exists.
FOSS is amazing!
I think of wax recording rolls - old days CDs, aka Phonograph cylinder: