Opus 1.5 released: Opus gets a machine learning upgrade

(opus-codec.org)

387 points by summm 2y ago | 138 comments


luplex2y ago· 16 in thread

I wonder: did they address common ML ethics questions? Specifically: Are the ML algorithms better/worse on male than on female speech? How about different languages or dialects? Are they specifically tuned for speech at all, or do they also work well for music or birdsong?

That said, the examples are impressive and I can't wait for this level of understandability to become standard in my calls.

jmvalin2y ago

Quoting from our paper, training was done using "205 hours of 16-kHz speech from a combination of TTS datasets including more than 900 speakers in 34 languages and dialects". Mostly tested with English, but part of the idea of releasing early (none of that is standardized) is for people to try it out and report any issues.

There's about equal male and female speakers, though codecs always have slight perceptual quality biases (in either direction) that depend on the pitch. Oh, and everything here is speech only.

radarsat12y ago

This is an important question. However, I'd like to point out that similar biases can easily exist for non-ML, hand-tuned algorithms. Even in the latter case test sets and often even "training" and "validation" sets are used for finding good parameters. Any of these can be a source of bias, as can the ears of evaluators making these decisions.

It's true that bias questions often come up in the ML context because fundamentally these algorithms do not work without data, but _all_ algorithms are designed by people, and _many_ can involve data in setting their parameters. Both of these can be sources of bias. ML is better known for it, I believe, because its _inductive_ biases are weaker than those of traditional algorithms, so it is more prone to adopting biases present in the dataset.

MauranKilom2y ago

As a notable example, the MP3 format was hand-tuned to vocals based on "Tom's Diner" (i.e. a female voice). It has been accused of being biased towards female vocals as a result.

thomastjeffery2y ago

Usually regular algorithms aren't generating data that pretends to be raw data. That's the significant difference here.

shwaj2y ago

Can you precisely define what you mean by "generating" and "pretends", in such a way that this neural network does both these things, but a conventional modern audio codec doesn't?

"Pretends" is a problematic choice of words, because it anthropomorphizes the algorithm. It would be more accurate and less misleading to replace "pretends to be" with "approximates". But then it wouldn't serve your goal of (seeming to) establish a categorical difference between this approach and "regular algorithms", because that's what a regular algorithm does too.

I apologize, because the above might sound rude. It's not intended to be.

samus2y ago

Not really. Any lossy codec is generating data that pretends to be close to the raw data.

unixhero2y ago

Why is the ethics question important? It is a new feature for an audio codec, not new material to teach in your kids' curriculum.

nextaccountic2y ago

Because this gets deployed in the real world, affecting real people. Ethics don't exist only in kids' curricula.

unethical_ban2y ago

I get your point, but the questioner wasn't being rude or angry, only curious. I think it's a valid question, too. While it isn't as important to be neutral in this instance as, say, a crime prediction model or a hiring model, it should be boilerplate to consider ML inputs for identity neutrality.

gcr2y ago

This is a great question! Here's a related failure case that I think illustrates the issue.

In my country, public restroom facilities replaced all the buttons and levers on faucets, towel dispensers, etc. with sensors that detect your hand under the faucet. Black people tell me they aren't able to easily use these restrooms. I was surprised when I heard this, but if you google this, it's apparently a thing.

Why does this happen? After all, the companies that made these products aren't obviously biased against black people (outwardly, anyway). So this sort of mistake must be easy to fall into, even for smart teams in good companies.

The answer ultimately boils down to ignorance. When we make hand detector sensors for faucets, we typically calibrate them with white people in mind. Of course different skin tones have different albedo and different reflectance properties, so sensors are less likely to fire. Some black folks have a workaround where they hold a (white) napkin in their hand to get the faucet to work.

How do we prevent this particular case from happening in the products we build? One approach is to ensure that the development teams for skin sensors have a wide variety of skin types. If the product development team had a black guy, for example, he could say "hey, this doesn't work with my skin, we need to tune the threshold." Another approach is to ensure that different skin types are reflected in the data used to fit the skin statistical models we use. Today's push for "ethics in ML" is born out of this second path as a direct desire to avoid these sorts of problems.

I like this handwashing example because it's immediately apparent to everyone. You don't have to "prioritize DEI programs" to understand the importance of making sure your skin detector works for all skin types. But, teams that already prioritize accessibility, user diversity, etc. are less likely to fall into these traps when conducting their ordinary business.

For this audio codec, I could imagine that voices outside the "standard English dialect" (e.g. thick accents, different voices) might take more bytes to encode the same signal. That would raise bandwidth requirements, worsen latency, and increase data costs for these users. If the codec is designed for a standard American audience, that's less of an issue, but codecs work best when they fit reasonably well for all kinds of human physiology.

The_Colonel2y ago

Imagine you release a codec which optimizes for cis white male voices, so that every other kind of voice has perceptibly lower fidelity (at low bitrates). That would not go well...

panzi2y ago

Yeah, imagine a low bitrate situation where only English speaking men can still communicate. That would create quite a power imbalance.

overstay89302y ago

Meanwhile G.711 makes all dudes sound like disgruntled middle aged union workers

numpad02y ago

No offense/taken, but Codec2 seems to be affected a bit by this problem.

samus2y ago

This is actually a very technical question, since it means the audio codec might simply not work as well in practice as it could and should.

kolinko2y ago

As a person with a different native language/accent, I have to deal with this on a regular basis: assistants like Siri don't understand what I want to say, even though native speakers don't have that problem. Or, before the advent of UTF, websites and apps ignored special characters used in my language.

I wouldn't consider this a matter of ethics, but more one of technological limitations or ignorance.

mikae12y ago· 9 in thread

They’ll have my upvote just for writing ML instead of AI. Seriously, these are very exciting developments for audio compression.

wilg2y ago

This is something you really shouldn’t spend any cycles worrying about.

sergiotapia2y ago

I'd just like to interject for a moment. What you're referring to as AI, is in fact, Machine Learning, or as I've recently taken to calling it, Machine Learning plus Traditional AI methods.

claudiojulio2y ago

Machine Learning is Artificial Intelligence. Just look at Wikipedia: https://en.wikipedia.org/wiki/Artificial_intelligence

declaredapple2y ago

Many people are annoyed by the recent influx of calling everything "AI".

Machine learning, statistical models, procedural generation, literally any usage of heuristics are all being called "AI" nowadays, which obscures the "boring" nature in favor of an "exciting buzzword".

Selecting the quality of a video based on your download speed? That's "AI" now.

mook2y ago

On the other hand, it means that you can assume anything mentioning AI is overhyped and probably isn't as great as they claim. That can be slightly useful at times.

sitzkrieg2y ago

I'm quite tired of this. Every snake oil shop now calls any algorithm "AI" to sound hip and sophisticated.

mikae12y ago

> Many people are annoyed by the recent influx of calling everything "AI".

Yes, that was the reason for my comment. :)

samus2y ago

So are compilers and interpreters. The terminology changes, but since we still don't have a general, systematic, and precise definition of what "intelligence" means, the term AI is and always was ill-founded and a buzzword for investors. Sometimes, people get disillusioned, and that's how you get AI winters.

xcv1232y ago

Machine Learning is a subset of AI

travisporter2y ago· 6 in thread

Very cool. Seems like they addressed the problem of hallucination. It would be interesting to see an example of it hallucinating without redundancy and then being corrected with redundancy.

CharlesW2y ago

Isn't packet loss concealment (PLC) a form of hallucination? Not saying it's bad, just that it's still Making Shit Up™ in a statistically-credible way.

jmvalin2y ago

Well, there are different ways to make things up. We decided against using a pure generative model to avoid making up phonemes or words. Instead, we predict the expected acoustic features (using a regression loss), which means the model is able to continue a vowel. If unsure, it'll just pick the "middle point", which won't be something recognizable as a new word. That's in line with how traditional PLCs work. It just sounds better. The only generative part is the vocoder that reconstructs the waveform, but it's constrained to match the predicted spectrum so it can't hallucinate either.
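A minimal sketch of why a regression loss "picks the middle point", using made-up 2-D feature vectors rather than Opus's real acoustic features: when two continuations are equally plausible, the single prediction that minimizes expected squared error is their element-wise average, a blend rather than either specific continuation.

```python
# Hypothetical illustration only; these are not Opus's features or model.

def mse(pred, target):
    """Mean squared error between two equal-length feature vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def expected_loss(pred, outcomes):
    """Average MSE of one prediction over equally likely outcomes."""
    return sum(mse(pred, o) for o in outcomes) / len(outcomes)

def mse_optimal(outcomes):
    """The prediction minimizing expected MSE is the element-wise mean."""
    n = len(outcomes)
    return [sum(col) / n for col in zip(*outcomes)]

# Two hypothetical, equally plausible continuations of the current sound.
continuations = [[1.0, 4.0], [3.0, 0.0]]
middle = mse_optimal(continuations)

print(middle, expected_loss(middle, continuations))      # [2.0, 2.0] 2.5
print(expected_loss(continuations[0], continuations))    # 5.0
```

Committing to either continuation doubles the expected error here, which is why a regression-trained predictor drifts toward a neutral average instead of inventing a distinct new sound.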

derf_2y ago

The PLC intentionally fades off after around 100 ms so as not to cause misleading hallucinations. It is really just about filling small gaps.
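A rough sketch of a fade-out schedule like the one described, assuming 10 ms frames and a simple linear ramp (both are assumptions; the actual Opus PLC predicts new features and vocodes them rather than repeating a frame, and its fade curve may differ). Only the ~100 ms figure comes from the comment.

```python
FRAME_MS = 10   # assumption: 10 ms frames
FADE_MS = 100   # fade length quoted in the thread

def concealment_gain(lost_frame_index):
    """Gain for the n-th consecutive concealed frame (0-based):
    linear ramp from 1.0 down to 0.0 over FADE_MS, silence afterwards."""
    elapsed_ms = lost_frame_index * FRAME_MS
    return max(0.0, 1.0 - elapsed_ms / FADE_MS)

def conceal(last_good_frame, n_lost):
    """Hypothetical naive concealment: repeat the last good frame,
    scaled by the fade gain, for each lost frame."""
    return [[s * concealment_gain(i) for s in last_good_frame]
            for i in range(n_lost)]

gains = [concealment_gain(i) for i in range(12)]
```

After about ten lost frames the output is silent, so a long outage turns into a gap instead of an ever-growing extrapolation.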

skybrian2y ago

In a broader context, though, this happens all the time. You’d be surprised what people mishear in noisy conditions. (Or if they’re hard of hearing.) The only thing for it is to ask them to repeat back what they heard, when it matters.

It might be an interesting test to compare what people mishear with and without this kind of compensation.

a_wild_dandan2y ago

To borrow from Joscha Bach: if you like the output, it's called creativity. If you don't, it's called a hallucination.

Sonic6562y ago

There's something darkly funny about the idea of Opus acting psychotic because it glitched out or was fed something really complex. But you could argue that transparent lossy compression at 80 to 320 kbps is a controlled, deliriant-like hallucination, given that only a rare few can tell it apart from lossless.

out_of_protocol2y ago· 6 in thread

Why the hell is Opus still not in Bluetooth? Well, I know - sweet, sweet license fees.

(aKKtually, there IS an Opus codec, supported by Pixel phones; Google made it for VR/AR stuff. No one uses it: there's about ~1 headphone with Opus support.)

lxgr2y ago

As you already mention, it's already possible to use it. As for why hardware manufacturers don't actually use it, you can thank beautiful initiatives such as this: https://www.opuspool.com/ (previous HN discussion: https://news.ycombinator.com/item?id=33158475).

giantrobot2y ago

The BT SIG moves kind of slow and there's a really long tail of devices. Until there's a chip with native Opus support (that's as cheap as ones with AAC etc) you wouldn't get Opus support even if it was in the spec.

Realistically for most headphones people actually buy AAC (LC and HE) is more than good enough encoding quality for the audio the headphones can produce. Even if Opus was in the spec and Opus-supporting chips were common there would still be a hojillion Bluetooth devices in the wild that wouldn't support it.

It would be cool to have Opus in A2DP but it would take a BT SIG member that was really in love with it to get it in the profile.

out_of_protocol2y ago

They chose to make a totally new, inferior LC3 codec though.

Also, on my system (Android phone + BTR5/BTR15 Bluetooth DAC + Sennheiser H600), all options sound really crappy compared to plain old USB, with everything else the same. LDAC at 990 kbps is less crappy, by sheer brute force. I suspect it's not only the codec but other factors as well (like mandatory DSP on the phone side).

dogma11382y ago

Opus isn’t patent free, and what’s worse it’s not particularly clear who owns what. The biggest patent pool is currently OpusPool but it’s not the only one.

https://www.opuspool.com/

pgeorgi2y ago

No codec (or any other technical development, really - edit: except for 20+ year old stuff, and only if you don't add any, even "obvious", improvements) is known to be patent free, or clear on "who owns what."

Folks set up pools all the time, but somehow they never offer indemnification for completeness of the pool - because they can't.

See https://en.kangxin.com/html/2/218/219/220/11565.html for a few examples how the patent pool extortion scheme already went wrong in the past.

rockdoe2y ago

> Opus isn’t patent free

The existence of a patent pool does not mean there are valid patent claims against it. But yes, you may be technically correct in saying "patent free" rather than "not infringing on any valid patents". That said, historically Opus has had claims against it from patents that looked valid but, upon closer investigation, didn't actually cover what the codec does.

Just looks like FUD to me. Meanwhile, the patent pools of competing technologies definitely still don't offer indemnification that they cover all patents, but have no problem paying a bunch of people to spew exactly this kind of FUD - they're the ones who tried to set up this "patent pool" to begin with!

yalok2y ago· 5 in thread

The main limitation for such codecs is CPU/battery life - and I like how they applied ML sparingly here and there, combining it with the classic approach (non-ML algos) to achieve a better tradeoff of CPU vs quality. E.g. for better low bitrate support/LACE - "we went for a different approach: start with the tried-and-true postfilter idea and sprinkle just enough DNN magic on top of it." The key was not to feed raw audio samples to the NN - "The audio itself never goes through the DNN. The result is a small and very-low-complexity model (by DNN standards) that can run even on older phones."

Looks like the right direction for embedded algos, and it seems to be a pretty unexplored one compared to the current fashion of doing ML E2E.
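A minimal sketch of the "audio never goes through the DNN" idea: a classic pitch comb postfilter runs on the samples, while a tiny stand-in "model" only maps per-frame features to the filter gain. The comb-filter form, the feature vector, and the weights in `predict_postfilter_gain` are all hypothetical; this is not LACE's actual architecture.

```python
import math

def predict_postfilter_gain(features):
    """Stand-in for a small DNN: per-frame features in, comb gain out (0..0.5).
    Weights are hypothetical; no audio samples are ever passed in."""
    weights = [0.8, -0.3]
    bias = -0.1
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 0.5 / (1.0 + math.exp(-z))  # sigmoid squashed into (0, 0.5)

def comb_postfilter(frame, history, pitch_period, gain):
    """Classic pitch postfilter y[n] = x[n] + gain * x[n - T].
    `history` supplies past samples so n - T can reach the previous frame.
    (A real postfilter would also normalize energy; omitted here.)"""
    assert len(history) >= pitch_period, "need at least one pitch period of history"
    buf = list(history) + list(frame)
    offset = len(history)
    return [buf[offset + n] + gain * buf[offset + n - pitch_period]
            for n in range(len(frame))]

history = [0.0] * 8
frame = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
g = predict_postfilter_gain([0.6, 0.2])
out = comb_postfilter(frame, history, pitch_period=4, gain=g)
```

The per-sample work is a fixed, cheap filter; the learned part runs once per frame on a handful of numbers, which is where the low complexity comes from.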

indolering2y ago

It's a really smart application of ML: helping around the edges and not letting the ML algo invent phonemes or even whole words by accident. ML transcription has a similar trade-off of performing better on some benchmarks but also hallucinating results.

kolinko2y ago

A nice story about Xerox discovering this issue in 2013, when their copiers began slightly changing random numbers in copied documents

https://www.theverge.com/2013/8/6/4594482/xerox-copiers-rand...

cedilla2y ago

I don't think machine learning was involved there at all. As I understand it, it was an issue of a specifically implemented feature (reusing a single picture of a glyph for all other instances to save space) being turned on in archive-grade settings, despite the manual stating otherwise.
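A toy sketch of the glyph-reuse failure described above, with no ML involved: each glyph is replaced by an index into a dictionary, and a new entry is created only if no stored glyph is within `threshold` differing pixels. The 3x5 bitmaps and thresholds are hypothetical and this is not Xerox's implementation; it just shows how a loose match setting turns an 8 into a 6.

```python
def hamming(a, b):
    """Number of differing pixels between two equal-length bitmaps."""
    return sum(x != y for x, y in zip(a, b))

def encode_symbols(glyphs, threshold):
    """Pattern-matching substitution: reuse a stored glyph if it is
    within `threshold` pixels, otherwise add the glyph to the dictionary."""
    dictionary, indices = [], []
    for g in glyphs:
        match = next((i for i, d in enumerate(dictionary)
                      if hamming(g, d) <= threshold), None)
        if match is None:
            dictionary.append(g)
            match = len(dictionary) - 1
        indices.append(match)
    return dictionary, indices

def decode_symbols(dictionary, indices):
    return [dictionary[i] for i in indices]

# Hypothetical 3x5 digit bitmaps, rows concatenated; they differ in one pixel.
SIX   = "111" "100" "111" "101" "111"
EIGHT = "111" "101" "111" "101" "111"
page = [SIX, EIGHT, SIX]

strict = decode_symbols(*encode_symbols(page, threshold=0))
loose = decode_symbols(*encode_symbols(page, threshold=1))  # the 8 comes back as a 6
```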

h4x0rr2y ago

https://youtu.be/zXXmhxbQ-hk Interesting yet funny CCC video about this

yalok2y ago

fwiw, in the ASR/speech transcription world, it looks like the reverse to me - in the past, there was lots of custom non-ML code & separate ML models for audio modeling and language modeling - but current SOTA ASRs are all E2E, and that's what's used even in mobile applications, iiuc.

I still think the pendulum will swing back there again, to have even better battery/larger models on mobile.

Dwedit2y ago· 4 in thread

I just want to mention that getting such good speech quality at 9kbps by using NoLACE is absolutely insane.

qingcharles2y ago

I was the lead dev for a major music streaming startup in 1999. I was working from home as they didn't have offices yet. My cable connection got cut and my only remaining Internet was 9600bps through my Nokia 9000 serial port. I had to re-encode the whole music catalog at 8kbps WMA so I could stream it and continue testing all the production code.

The quality left a little to be desired...!

kristopolous2y ago

I wanted to see what it would sound like in comparison to a really early streaming audio codec, realaudio 1.0

    $ ffmpeg -i female_ref.wav -acodec real_144 female_ref.ra

And in case you can't play that, I converted it back to WAV and posted it: http://9ol.es/female_ref-ra.wav

This was known as "14.4" audio, for 14.4 kb/s dial-up in the mid-90s. The quality increase over those nearly 30 years, for what is actually a smaller number of bytes, is really impressive.

anthk2y ago

I used to listen to Opus-encoded avant-garde music from https://dir.xiph.org at 16 kb/s over a 2G connection, and it was usable once mplayer/mpv had cached the stream for nearly a minute.

recursive2y ago

I don't know any of the details but maybe the CPUs of the time would have struggled to stream the decoding.

aredox2y ago· 4 in thread

>That's why most codecs have packet loss concealment (PLC) that can fill in for missing packets with plausible audio that just extrapolates what was being said and avoids leaving a hole in the audio

...How far can ML PLC "hallucinate" audio? A sound, a syllable, a whole word, half a sentence?

Can I still trust what I hear?

jmvalin2y ago

What the PLC does is (vaguely) equivalent to momentarily freezing the image rather than showing a blank screen when packets are lost. If you're in the middle of a vowel, it'll continue the vowel (trying to follow the right energy) for about 100 ms before fading out. It's explicitly designed not to make up anything you didn't say -- for obvious reasons.

aredox2y ago

Reassuring - thanks for clearing that up.

samus2y ago

You never can when lossy compression is involved. It is commonly considered good practice to verify that the communication partner understood what was said, e.g., by restating, summarizing, asking for clarification, follow-up questions etc.

xyproto2y ago

It can already fill in all gaps and create all sorts of audio, but it may sound muddy and metallic. Give it a year, and then you can't trust what you hear anymore. Checking sources is a good idea in either case.

behnamoh2y ago· 3 in thread

Isn't it a strange coincidence that this shows up on HN while Claude Opus is also announced today and is on HN front page? I mean, what are the odds of seeing the word "Opus" twice in a day on one internet page?

mattnewton2y ago

Not that strange when you consider what “opus” means: a product of work, with the connotation of being large and artistically important. It’s Latin, so its phonemes are friendly to speakers of Romance languages, and it sounds very scientific and important to English-speaking ears. Basically the most generic name you can give your fine “work” in the western world.

behnamoh2y ago

Thanks for the definition. I like the word! I just haven't come across it in a long time, and seeing it twice on HN frontpage is bizarre!

declaredapple2y ago

Well it was released today

Very likely a coincidence.

https://opus-codec.org/release/stable/2024/03/04/libopus-1_5...

spacechild12y ago· 2 in thread

I'm using Opus as one of the main codecs in my peer-to-peer audio streaming library (https://git.iem.at/cm/aoo/ - still alpha), so this is very exciting news!

I'll definitely play around with these new ML features!

RossBencina2y ago

> peer-to-peer audio streaming library

Interesting :)

spacechild12y ago

Ha, now that is what I'd call a surprise! The "group" concept has obviously been influenced by oscgroups. And of course I'm using oscpack :)

AOO has already been used successfully in a few art projects. It's also used under the hood by SonoBus. The Pd objects are already stable; I hope to finish the C/C++ API and add some code examples soon.

Any feedback from your side would of course be very appreciated.

m3kw92y ago· 2 in thread

Some people hyping it as AGI on social media

samus2y ago

Sadly, I see it even on forums where one might think people have a background in technology...

m3kw92y ago

The tech background is likely IT and not AI. They used ChatGPT and thought it was conscious.

WithinReason2y ago· 2 in thread

Someone should add an ML decoder to JPEG

viraptor2y ago

You can't do that much on the decoding side (apart from the equivalent of passing the normally decoded result through a low percent img2img ML)

But the encoders are already there: https://medium.com/@migel_95925/supercharging-jpeg-with-mach... https://compression.ai/

WithinReason2y ago

You can more accurately invert the quantisation step
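One classical (non-ML) way to invert quantization more accurately, sketched under an assumed Laplacian prior on DCT coefficients: a standard JPEG decoder reconstructs a coefficient as `q * step` (the bin center), but if small coefficients are more likely, the conditional mean within the bin sits closer to zero. The prior and its rate `lam` are assumptions; an ML decoder would, in effect, learn a much richer version of this estimate.

```python
import math

def center_reconstruction(q, step):
    """Standard dequantization: the center of the quantization bin."""
    return q * step

def laplacian_reconstruction(q, step, lam):
    """Conditional mean of a coefficient with density ~ exp(-lam*|x|),
    given it fell in the bin [(|q|-0.5)*step, (|q|+0.5)*step]; sign restored."""
    if q == 0:
        return 0.0  # the zero bin is symmetric, so its mean is 0
    a = (abs(q) - 0.5) * step  # lower edge of the bin (positive side)
    w = step                   # bin width
    # Mean of an exponential distribution truncated to [a, a + w]
    mean = a + 1.0 / lam - w * math.exp(-lam * w) / (1.0 - math.exp(-lam * w))
    return math.copysign(mean, q)

r = laplacian_reconstruction(1, 10.0, 0.1)  # about 9.18 instead of 10.0
```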

frumiousirc2y ago· 1 in thread

How about adding a text "subtitle" stream to the mix? The encoder could use ML to perform speech-to-text. The decoder could then use the text, along with the audio surrounding the dropouts, to feed a conditional text-to-speech DNN. This way the network does not have to learn the harder problem of blindly interpolating across the dropouts from just the audio. The text stream is low bitrate, so it could carry substantial redundancy to increase the likelihood that any given (text) message is received.

jmvalin2y ago

Actually, what we're doing with DRED isn't that far from what you're suggesting. The difference is that we keep more information about the voice/intonation, and we don't need the latency that an ASR would otherwise add. In the end, the output is still synthesized from higher-level, efficiently compressed information.
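A simplified sketch of in-band redundancy in general (not DRED's actual bitstream, which uses a learned encoder): assume packet n carries frame n at full quality plus low-rate redundant copies of frames n-1 through n-`depth`. A lost frame is recoverable if any of the next `depth` packets arrives, which is how long bursts of loss can still be bridged, at the cost of waiting for those later packets.

```python
def recoverable_frames(received, depth):
    """received[n] is True if packet n arrived. Packet n is assumed to carry
    frame n plus redundant copies of frames n-1 ... n-depth.
    Returns 'primary', 'redundant' or 'lost' for each frame."""
    status = []
    last = len(received) - 1
    for m in range(len(received)):
        if received[m]:
            status.append("primary")
        elif any(received[n] for n in range(m + 1, min(m + depth, last) + 1)):
            status.append("redundant")
        else:
            status.append("lost")
    return status

received = [True, False, False, False, True, False, True]
shallow = recoverable_frames(received, depth=1)  # frames 1 and 2 are lost
deep = recoverable_frames(received, depth=3)     # every frame is recoverable
```

Deeper redundancy survives longer bursts, but recovering frame m from packet m+k means waiting for that packet, which a real receiver has to trade off against latency.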

h4x0rr2y ago· 1 in thread

Does this new Opus version close the gap to xHE-AAC, which is (was?) superior at lower bitrates?

AzzyHN2y ago

Depends on whether you're encoding speech or music.

rhdunn2y ago

I find the interplay between audio codecs, speech synthesis, and speech recognition fascinating. Advancements in one usually result in advancements in the others.

Sonic6562y ago

Love how Opus 1.5 is now actually transparent at 16 kbps for voice, and 96 kbps still beats 192 kbps MP3. Meanwhile, xHE-AAC still feels like it was farted out, since in the 96 to 256 kbps range it's legit worse than AAC-LC (Apple, FDK) at ~160 kbps.

brnt2y ago

What if there were a profile or setting that helps re-encode existing lossy formats without introducing too many more artifacts? A sizeable collection runs into this issue if its owner doesn't have (easily accessible) lossless masters.

I'd be very interested if I could move a variety of mp3s, aacs and vorbis to Opus if I knew additional quality loss was minimal.

cedilla2y ago

The quality at 80% packet loss is incredible. It's a strain to listen to but still understandable.

nimish2y ago

That 90% loss demo is bonkers. Completely comprehensible after maybe a second.

brcmthrowaway2y ago

This is game changing. When will H265 get a DL upgrade?

p1esk2y ago

Two unrelated “Opus” releases today, and both use ML. The other one is a new model from Anthropic.
