That said, the examples are impressive and I can't wait for this level of understandability to become standard in my calls.
There's about equal male and female speakers, though codecs always have slight perceptual quality biases (in either direction) that depend on the pitch. Oh, and everything here is speech only.
It's true that bias questions often come up in ML context because fundamentally these algorithms do not work without data, but _all_ algorithms are designed by people, and _many_ can involve data in setting their parameters. Both of which can be sources of bias. ML is more known for it, I believe, because the _inductive_ biases are less than in traditional algorithms, and therefore are more keen to adopt biases present in the dataset.
"Pretends" is a problematic choice of words, because it anthropomorphizes the algorithm. It would be more accurate and less misleading to replace "pretends to be" with "approximates". But then it wouldn't serve your goal of (seeming to) establish a categorical difference between this approach and "regular algorithms", because that's what a regular algorithm does too.
I apologize, because the above might sound rude. It's not intended to be.
In my country, public restroom facilities replaced all the buttons and levers on faucets, towel dispensers, etc. with sensors that detect your hand under the faucet. Black people tell me they aren't able to easily use these restrooms. I was surprised when I heard this, but if you google this, it's apparently a thing.
Why does this happen? After all, the companies that made these products aren't obviously biased against black people (outwardly, anyway). So this sort of mistake must be easy to fall into, even for smart teams in good companies.
The answer ultimately boils down to ignorance. When we make hand detector sensors for faucets, we typically calibrate them with white people in mind. Of course different skin tones have different albedo and different reflectance properties, so sensors are less likely to fire. Some black folks have a workaround where they hold a (white) napkin in their hand to get the faucet to work.
How do we prevent this particular case from happening in the products we build? One approach is to ensure that the development teams for skin sensors have a wide variety of skin types. If the product development team had a black guy for example, he could say "hey, this doesn't work with my skin, we need to tune the threshold." Another approach is to ensure that different skin types are reflected in the data used to fit the skin statistical models we use. Today's push for "ethics in ML" is borne out of this second path as a direct desire to avoid these sorts of problems.
I like this handwashing example because it's immediately apparent to everyone. You don't have to "prioritize DEI programs" to understand the importance of making sure your skin detector works for all skin types. But, teams that already prioritize accessibility, user diversity, etc. are less likely to fall into these traps when conducting their ordinary business.
For this audio codec, I could imagine that voices outside the "standard English dialect" (e.g. thick accents, different voices) might take more bytes to encode the same signal. That would raise bandwidth requirements, worsen latency, and increase data costs for these users. If the codec is designed for a standard American audience, that's less of an issue, but codecs work best when they fit reasonably well for all kinds of human physiology.
I wouldn't consider this a matter of ethics, and more of a technology limitations or ignorance.
Machine learning, statistical models, procedural generation, literally an usage of heuristics are all being called "AI" nowadays which obfuscates the "boring" nature in favor of "exciting buzzword"
Selecting the quality of a video based on your download speed? That's "AI" now.
Yes, that was the reason for my comment. :)
It might be an interesting test to compare what people mishear with and without this kind of compensation.
(aKKtually, there IS opus codec, supported by pixel phones - google made it for VR/AR stuff. No one uses it, there are about ~1 headphone with opus support )
Realistically for most headphones people actually buy AAC (LC and HE) is more than good enough encoding quality for the audio the headphones can produce. Even if Opus was in the spec and Opus-supporting chips were common there would still be a hojillion Bluetooth devices in the wild that wouldn't support it.
It would be cool to have Opus in A2DP but it would take a BT SIG member that was really in love with it to get it in the profile.
Also, on my system (Android phone + BTR5/BTR15 Bluetooth DAC + Sennheiser H600) all options sound realy crappy compared to plain old usb, everything else is the same. LDAC 990kbps is less crappy, by sheer brute force. I suspect it's not only codec but other co-factors as well (like mandatory DSP on phone side)
Folks set up pools all the time, but somehow they never offer indemnification for completeness of the pool - because they can't.
See https://en.kangxin.com/html/2/218/219/220/11565.html for a few examples how the patent pool extortion scheme already went wrong in the past.
The existence of a patent pool does not mean there are valid patent claims against it. But yes, you may be technically correct by saying "patent free" rather than "not infringing on any valid patents". That said historically Opus has had claims against it by patents that looked valid but upon closer investigation didn't actually cover what the codec does.
Just looks like FUD to me. Meanwhile, the patent pools of competing technologies definitely still don't offer indemnification they cover all patents, but have no problem paying a bunch of people to spew exactly this kind of FUD - they're the ones who tried to set up this "patent pool" to begin with!
Looks like the right direction for embedded algos and it seems to be a pretty unexplored one, as compared to the current fashion to do ML E2E.
https://www.theverge.com/2013/8/6/4594482/xerox-copiers-rand...
I still think the pendulum will swing back there again, to have even better battery/larger models on mobile.
The quality left a little to be desired...!
$ ffmpeg -i female_ref.wav - acodec real_144 female_ref.ra
And if you can't support that I put it back to wav and posted it: http://9ol.es/female_ref-ra.wavThis was seen as "14.4" audio, for 14.4kb/s dialup in the mid-90s. The quality increase over those nearly 30 years for what you can get out of what's actually a fewer number of bytes is really impressive.
...How far can ML PLC "hallucinate" audio? A sound , a syllable, a whole word, half a sentence?
Can I trust anymore what I hear?
Very likely a coincidence.
https://opus-codec.org/release/stable/2024/03/04/libopus-1_5...
I'll definitely play around with these new ML features!
Interesting :)
AOO has already been used successfully in a few art projects. It's also used under hood by SonoBus. The Pd objects are already stable, I hope to finish the C/C++ API and add some code examples soon.
Any feedback from your side would of course be very appreciated.
But the encoders are already there: https://medium.com/@migel_95925/supercharging-jpeg-with-mach... https://compression.ai/
I'd be very interested if I could move a variety of mp3s, aacs and vorbis to Opus if I knew additional quality loss was minimal.