It’s a sad, sad day when an organisation getting hundreds of millions in funding turns away from what it’s good at. The decline has begun in my eyes; it may not become apparent for a few years yet.
It would be nice if Mozilla could tell us what their focus is going to be, but I doubt that Mozilla management know at this point.
At this point I’m somewhat concerned that Firefox will be irrelevant in five years, and I don’t currently feel that Mozilla is communicating clearly that they still care about Firefox. I assume they must, but it would be comforting to know that Firefox is still at the core of Mozilla’s strategy.
I disagree precisely because of the point you make later: "I’m somewhat concerned that Firefox will be irrelevant in five years".
Functionality provided by deep learning is going to be an important component of many types of software interaction going forward. The logistics of this will be quite different from what we are used to in open source: the need to fund and coordinate compute, and to collect and handle data, will be far more central than in the past.
There is STT software, some of it mentioned in this thread, that matches or even beats DeepSpeech, but none of it is as ergonomic. Accounting for the value of time, this means it will be more cost-effective to outsource such capabilities to the cloud, which comes with trade-offs that are difficult to appreciate in the short term: https://news.ycombinator.com/item?id=24236489
I'd say DeepSpeech fits the mold of Mozilla as a company providing solutions to complicated software problems that better respect the user and their privacy.
In the old days, the most accurate TTS and STT models were built into the OS. These days, you need to call into the cloud to get the best stuff. In [1], the Internet Archive complains about the quality of their OCR software. It's not that OCR is so bad, it's that the best OCR is found on Google's and Microsoft's computers. It's possible to cobble something together using open source tools like EasyOCR or Tesseract+OpenCV, but that will only get you part of the way there. What makes the cloud offerings so good is that they have enough resources to devote to pre-processing pipelines, architecture tweaks, and settings that better handle edge cases. Most of the mass resides in edge cases.
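To make the pre-processing point concrete: open-source OCR pipelines typically binarize a scan before feeding it to an engine like Tesseract, and it's exactly these glue steps that the cloud providers polish for edge cases. Below is a minimal sketch of one such step, an Otsu-style global threshold, hand-rolled in plain NumPy purely for illustration (it is not code from any of the projects named above):

```python
import numpy as np

def binarize(gray: np.ndarray) -> np.ndarray:
    """Otsu-style global threshold: pick the cut between 'ink' and
    'paper' that maximizes between-class variance, then map every
    pixel to pure black (0) or pure white (255)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    cum = np.cumsum(hist)                      # pixels below each level
    cum_mean = np.cumsum(hist * np.arange(256))
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0 = cum[t - 1]          # weight of the dark class
        w1 = total - w0          # weight of the bright class
        if w0 == 0 or w1 == 0:
            continue
        m0 = cum_mean[t - 1] / w0
        m1 = (cum_mean[255] - cum_mean[t - 1]) / w1
        between = w0 * w1 * (m0 - m1) ** 2     # between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return np.where(gray >= best_t, 255, 0).astype(np.uint8)
```

A real pipeline would add deskewing, noise removal, and layout analysis on top; this single step is where most DIY efforts stop, and where the cloud offerings keep going.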
From my vantage point, the future looks to be one of software as thin layers built atop APIs that call into programs running on the servers of a handful of companies. You might not think this is a big deal, but this software will be the kind that scans the environment, writes the emails, completes the thoughts, and plans the calendars for the majority of humans.
[1] https://blog.archive.org/2020/08/21/can-you-help-us-make-the...
[0] - https://latecheckout.substack.com/p/the-guide-to-unbundling-...
EDIT: Add link
Google/Baidu, universities, and an assortment of Chinese/Japanese/Korean social media companies (Line, etc.) are posting the most compelling TTS research, models, and code. Mozilla's TTS system [2] is an amalgam of some of these models, but it lags pretty far behind state of the art.
Mozilla should focus on developing additional revenue streams. We can help them out by trying to get Congress / DOJ to strip Google of its ability to have and maintain a browser with which they entrench their search and advertising moat. I think they're clearly in antitrust/anticompetitive territory.
[1] I'm pretty familiar with this field as I wrote https://vo.codes and https://trumped.com TTS systems. Neither of those are state of the art in terms of mean opinion score (MOS), but they're incredibly efficient.
I also believe the Mozilla team was constrained by a lack of computing resources. They had just a single 8-GPU server or so.
This is worrying given that Google cripples the browser and web standards to favor its own search engine and advertising platform.
Killed the semantic web and semantic markup? Check.
Disabled APIs for blocking ads? Check.
Use Google.com as the default search? Yep.
Embrace and extend the web with AMP and instant apps? Bingo.
Auto log into your Google session or nag until users permit it? Absolutely.
Trying to destroy the notion of a URL? I thought those were cool.
Google is destroying the web and is about as anti-competitive as they come.
“Most of the technical changes were already landed, and we see no reason not to ship it. We’ll be releasing 1.0 soon and encourage everyone to update their applications”
So it looks like at least 1.0 is near and still gonna happen... I know these seem like dark times for Mozilla, but I believe they will survive. As I recall, the decline of Netscape was a pretty dark time, and out of that came Phoenix, er, Firefox, and here we are today... I’m sure Mozilla and many of its great projects will survive.
It’s not for lack of trying on their part, for sure, but it feels like just using their browser isn’t all there is to it any more.
For someone who found Linux in the '90s and watched the birth of Mozilla from the ashes of Netscape, that's a very strange thing to read.
This site is not Slashdot, I know. It always had another kind of relation to business and money. But still...
I have no idea why Mozilla should need a business model. Still less do I understand why we should think of one and agree on it.
How much money does it take to maintain a web browser? If it's a lot, maybe, just maybe, we should agree on a reduced feature set and refuse to use anything more complex. Some people here talk about text-mode browsers. I'm not so radical. Just keep it simple enough to be maintainable by a dozen volunteers.
Isn’t the main problem that users are not willing to pay for the browser they use?
Google Chrome is probably maintained by much more than 12 people, so if we restrict Firefox to that, everyone is just going to move to Chrome anyways.
Because developers aren't free and "let's get money from Google searches" is great until Google decides not to fund a competitor any more.
It'd be really awesome if they could develop a search engine or phone (I know they tried) that had an open standards / web-compatible development kit.
I want an anti-Google / anti-Apple. Something we own and can extend. Something that doesn't sell our data.
I'd also like to see Mozilla doing lobbying. Partnering with the EFF. We've strayed so far from the bright and open Internet of the '90s and '00s. It's depressing to think about how locked up and proprietary it's all become.
I'll buy Mozilla / Firefox merch. I'll pay a subscription.
edit: Talk to Shuttleworth. Fold Ubuntu in. I'll buy a Mozilla phone and a Mozilla laptop.
If they offered something like the services from mailbox.org or Librem One? I'd switch my Gmail account tomorrow, including the storage fees I'm paying on it, and I'd do it at triple the cost for not abusing my data. Hell, they already have the domain experience given their proximity to the Thunderbird devs.
Other good ones are https://github.com/daanzu/kaldi-active-grammar and https://talonvoice.com/
There are toolkits for research like https://github.com/kaldi-asr/kaldi, https://github.com/espnet/espnet, wav2letter, Espresso, Nvidia/Nemo, https://github.com/didi/athena. You can try them too if you want to go deep. Some of them have interesting capabilities.
When I use vosk-model-en-us-daanzu-20200328 the results are perfect on many of these tests, though it does not do punctuation or capitalization beyond apostrophes. IIRC there is another project on GitHub that can add basic formatting, though.
I am quite surprised by vosk's performance; it even handles odd words like Puget Sound well! I need to test it on more accented audio, but this is quite exciting.
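For anyone who wants to try this themselves, here is a minimal sketch of driving a local Vosk model (such as vosk-model-en-us-daanzu-20200328) from Python. The file paths and chunk size are assumptions; it expects a 16 kHz mono PCM WAV, which is what the models are trained on:

```python
import json
import wave

def result_text(result_json: str) -> str:
    """Pull the recognized text out of a Vosk result JSON string."""
    return json.loads(result_json).get("text", "")

def transcribe(wav_path: str, model_path: str) -> str:
    """Transcribe a 16 kHz mono PCM WAV file with a local Vosk model.

    Requires `pip install vosk` and an unpacked model directory,
    e.g. "vosk-model-en-us-daanzu-20200328" (path is an assumption).
    """
    from vosk import Model, KaldiRecognizer
    wf = wave.open(wav_path, "rb")
    rec = KaldiRecognizer(Model(model_path), wf.getframerate())
    pieces = []
    while True:
        data = wf.readframes(4000)   # feed audio in small chunks
        if len(data) == 0:
            break
        if rec.AcceptWaveform(data):            # end of an utterance
            pieces.append(result_text(rec.Result()))
    pieces.append(result_text(rec.FinalResult()))  # flush the tail
    return " ".join(p for p in pieces if p)
```

As noted above, the output is lowercase and unpunctuated (apostrophes aside), so any formatting has to be bolted on afterwards.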
E.g. some very active projects are:
* Kaldi (https://github.com/kaldi-asr/kaldi/) obviously, probably the most famous one, and most mature one. For standard hybrid NN-HMM models and also all their more recent lattice-free MMI (LF-MMI) models / training procedure. This is also heavily used in industry (not just research).
* ESPnet (https://github.com/espnet/espnet), for all kind of end-to-end models, like CTC, attention-based encoder-decoder (including Transformer), and transducer models.
* Espresso (https://github.com/freewym/espresso).
* Google Lingvo (https://github.com/tensorflow/lingvo). This is the open source release of Google's internal ASR system, and used by Google in production (their internal version of it, which is not too different).
* NVIDIA OpenSeq2Seq (https://github.com/NVIDIA/OpenSeq2Seq).
* Facebook Fairseq (https://github.com/pytorch/fairseq). Attention-based encoder-decoder models mostly.
* Facebook wav2letter (https://github.com/facebookresearch/wav2letter). ASG model/training.
* (RETURNN (https://github.com/rwth-i6/returnn) and RASR (https://github.com/rwth-i6/rasr), our own, although this is currently free for academic use only. It is used in production as well. Supports hybrid NN-HMM, CTC, end-to-end attention-based encoder-decoder, transducer, etc.)
And there are many more.
You will also find lots of ready-to-use trained models.
I sincerely hope we can help this project continue, and that Mozilla can help us do that.
Ensuring indigenous languages have digital representation is essential to their survival. Speech recognition and synthesis are a vital part of that. Indigenous communities are often ignored by Big Tech because they bring little financial value to their bottom lines, but financial bottom lines are not everything. Culture is more important. Open source tools like DeepSpeech allow communities to build the tools they need for themselves.
Māori have been working to help build tools for te reo Māori, and our project is at the forefront of using open source tools like DeepSpeech to revitalize the Māori language. The core of a good speech recognition system helps us in many practical ways, such as improved transcription, support for pronunciation, correct announcements in public transport, correct information on maps and in many other ways. We may well continue to support and use DeepSpeech if the project can continue.
But there are also many other communities around the world that may follow, such as the Kabyle people of Algeria, who are using DeepSpeech, or the Mohawk nation in North America, who have been looking into it.
By the way, we are working on our web presence, but for now this quick one-pager gives some idea of the work we are doing: https://papareo.nz.
https://discourse.mozilla.org/t/mozilla-org-wide-updates-imp...
That's a physical device with all computation and information local?
And you'd pay for an upfront cost of the device as well as a subscription?
What comes with the subscription? Updated data, new features? Just keeping the lights on?
Other options for similar assistants that can also use DS are Mycroft (https://mycroft-ai.gitbook.io/docs/using-mycroft-ai/customiz...) and DragonFire (https://github.com/DragonComputer/Dragonfire)
The actual forum post just says they don't know anything about the future of DeepSpeech yet, for those doing the same.
You could say that "keeping the lights on" is the same as being on hold.
There's no technology left.