
ghoul2 · 4y ago

What codec did you use for your 8000 bps? In my experience, Opus is the only codec that can encode general audio at "acceptable" quality (for low values of acceptable) at that bitrate. There are speech codecs that can go below 2 kbps, but they are useless for non-speech audio (and are also very computationally expensive).

Opus is simply magic. IIRC, they now even have a 6 kbps mode.
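For a sense of scale, here is a rough sketch of how few bytes each encoded frame gets at the bitrates mentioned above. It assumes constant bitrate and 20 ms frames (the default Opus frame duration); the function name `bytes_per_frame` is made up for illustration.

```python
def bytes_per_frame(bitrate_bps: int, frame_ms: float = 20.0) -> float:
    """Average payload bytes available per frame at a constant bitrate."""
    # bits per frame = bitrate * frame duration in seconds; divide by 8 for bytes
    return bitrate_bps * frame_ms / 1000.0 / 8.0


if __name__ == "__main__":
    # Bitrates taken from the comment: 8000 bps, 6 kbps, and 2 kbps
    for bps in (8000, 6000, 2000):
        print(f"{bps} bps -> {bytes_per_frame(bps):.1f} bytes per 20 ms frame")
```

At 8000 bps that works out to 20 bytes for every 20 ms of audio, which is why "acceptable" general audio at that rate is remarkable.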


6 comments · 2 top-level

retrac · 4y ago

Deep-learning-based compression techniques may one day get speech down to several hundred bits per second, and non-speech audio to not much more. (They share the computational cost problem, though, even more so.) Google's Lyra seems to perform similarly to Opus for speech at less than half the bitrate: https://ai.googleblog.com/2021/02/lyra-new-very-low-bitrate-...

ghoul2 (OP) · 4y ago

I am aware of Lyra. It's pretty good. Very computationally expensive though - no audio application I have ever worked on had anywhere close to the power/thermal budget that would allow use of the deep-learning codecs. Maybe someday we will get very low-energy hardware accelerators for them, but until then, these are a non-starter for the things I work on.

The thing is (and maybe this is a nitpick), once you are down to several hundred bps for speech, it's getting to be more like speech-to-text (the encoder) and text-to-speech (the decoder) than an audio codec.

I am actually not aware of any non-speech audio codecs which can go that low. Any links?
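To put a rough number on the speech-to-text comparison above, here is a back-of-the-envelope sketch of the bitrate of plain text at speaking pace. All figures are assumptions for illustration, not measurements: 150 words per minute, 6 characters per word including the space, and 8 bits per uncompressed character. The function name is hypothetical.

```python
def text_bitrate_bps(words_per_minute: float,
                     chars_per_word: float = 6.0,
                     bits_per_char: float = 8.0) -> float:
    """Bitrate of an uncompressed transcript delivered at speaking pace."""
    words_per_second = words_per_minute / 60.0
    return words_per_second * chars_per_word * bits_per_char


if __name__ == "__main__":
    # Assumed speaking rate of 150 words per minute
    print(text_bitrate_bps(150))  # -> 120.0
```

Under those assumptions a bare transcript needs about 120 bps, so a codec at several hundred bps has only a few times the budget of the text itself to carry voice, timing, and intonation.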

vletal · 4y ago

When we have enough compute, or the models get much smaller. At this point it seems wasteful to use a gaming-level GPU to decode an audio stream.

kingcharles · 4y ago

It was Windows Media (WMA) at the time.

ghoul2 (OP) · 4y ago

Ah yes, should have guessed. WMA was pretty big in the early 2000s. Around 2004 I even had access to the official WMA source from Microsoft.

Spent about 4 months porting and then optimizing the WMA decoder on a proprietary very-low-power DSP. It was hard work, but I got it down to about 12 MHz for real-time decode on the DSP. The product this was intended for was going to be powered by a watch battery - hence the extreme optimization requirement. The source code, as received, used to consume ~400 MHz on an Athlon, IIRC. But with a lot of assistance from the DSP hardware and 4 months of elbow grease, it could decode in less than 12 MHz - for some profiles. Insane codebase!
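As a rough sanity check on those figures, the sketch below computes the clock-budget reduction and the cycles available per output sample. The 44.1 kHz sample rate is an assumption (the comment does not state one), the function names are made up, and cycles on an Athlon and on a DSP are not directly comparable, so treat the ratio as indicative only.

```python
def clock_reduction(before_mhz: float, after_mhz: float) -> float:
    """How many times smaller the clock budget became."""
    return before_mhz / after_mhz


def cycles_per_sample(clock_mhz: float, sample_rate_hz: float = 44_100.0) -> float:
    """Processor cycles available per output sample for real-time decode."""
    # sample_rate_hz defaults to 44.1 kHz, an assumed value
    return clock_mhz * 1_000_000.0 / sample_rate_hz


if __name__ == "__main__":
    print(f"reduction: about {clock_reduction(400, 12):.1f}x")
    print(f"budget at 12 MHz: about {cycles_per_sample(12):.0f} cycles per sample")
```

With the numbers from the comment this is roughly a 33x reduction, leaving a budget on the order of a few hundred cycles per sample at the assumed rate.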

kingcharles · 4y ago

Ah, I would have loved to see that. I never got to, even though I was a Windows Media MVP, NDA'd to the Moon, and the main tech guy in exec-level meetings at Microsoft about Windows Media, and the company I was working for was probably the biggest user of WMA in the world.

I wonder if the decoder source is in the Windows source code leaks? I guess Windows Media Player source will be in there. That'd be worth a look for me to see how utterly horrible it actually was.
