One practical reality it doesn't share is that your audio processing (or generation) code is often running on a bus shared with a ton of other modules, so you don't have the luxury of using "5.6ms" as your deadline for a 5.6ms buffer. Your responsibility, often, is just to be as performant as reasonably possible so that everything on the bus can be processed in those 5.6ms. The pressure is usually much higher than the buffer length suggests.
A much different experience from embedded programming, where 99% occupancy is no problem at all.
Assuming the context is a desktop OS (which is the context of TFA), I think that the main source of non-determinism is scheduling jitter (the time between the ideal start of your computation, and the time when the OS gives you the CPU to start the computation). Of course if you can't arrange exclusive or max-priority access to a CPU core you're also going to be competing with other processes. Then there is non-deterministic execution time on most modern CPUs due to cache timing effects, superscalar out-of-order instruction scheduling, inter-core synchronisation, and so on. So yeah, you're going to need some margin unless you're on dedicated hardware with deterministic compute (e.g. a DSP chip).
Most people learning audio programming aren't making a standalone audio app where they do all the processing, or at least not an interesting one. They're usually either making something like a plugin that ends up in somebody else's bus/graph, or something like a game or application that creates a bus/graph and shoves a bunch of different stuff into it.
I would love to see a modern take on the real-world risk of various operations that are technically nondeterministic. I wouldn’t be surprised if there are cases where the risk of >1ms latency is like 1e-30, and dogmatically following this advice might be overkill.
It depends on your appetite for risk and the cost of failure.
A big part of the problem is that general purpose computing systems (operating systems and hardware) are not engineered as real-time systems and there are rarely vendor guarantees with respect to real-time behavior. Under such circumstances, my position is that you need to code defensively. For example, if your operating system memory allocator does not guarantee a worst-case bound on execution time, do not use it in a real-time context.
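For illustration, a minimal sketch of that defensive style in Rust (the `Processor` type and the gain value are hypothetical, not from the comment above): do all allocation when the processor is constructed, and only indexing and arithmetic on the audio thread.

```rust
// Hypothetical real-time-safe processor: all allocation happens in `new`,
// none in `process`, so the audio thread never calls the OS allocator.
struct Processor {
    scratch: Vec<f32>, // preallocated once, never resized on the audio thread
}

impl Processor {
    fn new(max_block: usize) -> Self {
        // Allocation happens here, off the audio thread, sized for the
        // largest block the host can ever hand us.
        Processor { scratch: vec![0.0; max_block] }
    }

    // Called on the audio thread: indexing and arithmetic only, no allocation.
    fn process(&mut self, input: &[f32], output: &mut [f32]) {
        let n = input.len().min(self.scratch.len()).min(output.len());
        for i in 0..n {
            self.scratch[i] = input[i] * 0.5; // e.g. apply a fixed gain
            output[i] = self.scratch[i];
        }
    }
}

fn main() {
    let mut p = Processor::new(512);
    let input = [1.0_f32; 64];
    let mut output = [0.0_f32; 64];
    p.process(&input, &mut output);
    println!("first sample: {}", output[0]);
}
```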
I tend to agree, but...
From my recollection of using Zoom -- it has this bizarre but workable recovery method for network interruptions. Either the server or the client keeps some amount of the last input audio in a buffer. Then if the server detects connection problems at time 't', it grabs the buffer from t - 1 seconds all the way until the server detects better connectivity. Then it starts a race, playing back that stretch of the buffer to all clients at something like 1.5x speed. From what I remember, this algo typically wins the race and saves the client from having to repeat themselves.
That's not happening inside a DSP routine. But my point is that some clever engineer(s) at Zoom realized that missing deadlines in audio delivery does not necessarily mean "hosed." I'm also going to rankly speculate that every other video conferencing tool hard-coupled missing deadlines with "hosed," and that's why Zoom is the only one where I've ever experienced the benefit of that feature.
Indeed, like all real-time systems you need to think in terms of worst-case time complexity, not amortized complexity.
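A concrete (hypothetical) illustration in Rust: `Vec::push` is amortized O(1), but any single push may reallocate and copy the whole buffer, which is exactly the worst case a real-time thread can't absorb. Reserving capacity up front moves that cost off the hot path.

```rust
fn main() {
    // Amortized-friendly, worst-case-hostile: capacity grows (reallocating
    // and copying) at unpredictable pushes as the Vec fills up.
    let mut growing: Vec<u32> = Vec::new();
    let mut reallocations = 0;
    let mut last_cap = growing.capacity();
    for i in 0..1000 {
        growing.push(i);
        if growing.capacity() != last_cap {
            reallocations += 1; // this particular push paid the worst-case cost
            last_cap = growing.capacity();
        }
    }

    // Real-time-friendly: reserve once, so no push ever reallocates.
    let mut reserved: Vec<u32> = Vec::with_capacity(1000);
    let cap_before = reserved.capacity();
    for i in 0..1000 {
        reserved.push(i);
    }

    assert!(reallocations > 0);
    assert_eq!(reserved.capacity(), cap_before);
    println!("reallocations without reserve: {}", reallocations);
}
```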
An audio glitch is very annoying by comparison, especially if the application is a live musical instrument or something like that. Even the choppy rocket motor sounds of Kerbal Space Program (caused by garbage collector pauses) are infuriating.
It's kind of the difference between soft and hard real-time systems. Most audio applications don't strictly qualify as hard real time (where missing a deadline is as bad as a total failure), but failing a deadline is still much worse than in graphics.
You might sometimes build an app where (through your operating system) you connect directly with an input device and/or output device and then do all the audio processing yourself. In this case, you'd more or less control the whole bus and all the code processing samples on it and have a fairly true sense of your deadline. (The OS and drivers would still be introducing some overhead for mixing or resampling, etc, but that's generally of small concern and hard to avoid)
Often, though, you're either going to be building a bus and applying your own effects and some others (from your OS, from team members, from third party plugins/libraries, etc) or you're going to be writing some kind of effect/generator that gets inserted into somebody else's bus in something like a DAW or game. In all these cases, you need to assume that all processing code that isn't yours needs all the time that you can leave for it and just make your own code as efficient as is reasonable.
A bus is a shared medium of communication[1]. Often, busses are time-division multiplexed[2], so if you want to use the bus, but another module is already using it, you need to wait.
For example, if your audio buffers are ultimately submitted to a sound card over a PCI bus, the submission may need to wait for any ongoing transactions on the PCI bus, such as messages to a graphics card.
[1]: https://en.wikipedia.org/wiki/Bus_(computing)
[2]: https://en.wikipedia.org/wiki/Time-division_multiplexing
The cpal library in Rust is excellent for developing cross-platform desktop applications. I'm currently maintaining this library:
https://github.com/chaosprint/asak
It's a cross-platform audio recording/playback CLI tool with TUI. The source code is very simple to read. PRs are welcomed and I really hope Linux users can help to test and review new PRs :)
When developing Glicol (https://glicol.org), I documented my experience of "fighting" with real-time audio in the browser in this paper:
https://webaudioconf.com/_data/papers/pdf/2021/2021_8.pdf
Throughout the process, Paul Adenot's work was immensely helpful. I highly recommend his blog:
https://blog.paul.cx/post/profiling-firefox-real-time-media-...
I am currently writing a wasm audio module system, and hope to publish it here soon.
If your tempo drifts, then you're not going to hear the rhythm correctly. If you have a bit of latency on your instrument, it's like turning on a delay pedal where the only signal coming through is the delay.
One might assume that if you just follow audio programming guides then you can do all this, but you still need to have your system set up to handle real-time audio, in addition to your program.
It's all noticeable.
As a former developer of real time software, the usage of "real time" to mean "fast" makes me cringe a bit whenever I read it. If there's a TCP/IP stack in the middle of something, it's probably not "real time."
"real time" means there's a deadline. Soft real time means missing the deadline is a problem, possibly a bug, and quite bad. Hard real time means the "dead" part of "deadline" could be literal, either in terms of your program (a missed deadline is an irrecoverable error) or the humans that need the program to make the deadline are no longer alive.
Modern computers are ridiculously fast; relatively speaking, you don't need many resources to calculate a missile trajectory. So by "simply" making 100% sure that some calculation runs at a fixed rate, even with a GC cycle that has a deterministic upper bound (e.g. it walks the whole, non-resizable heap, but it will always take at most n seconds), you can pass the requirements. Though a desktop computer pretty much already gives up the hard part of hard real time, due to all the stuff that makes it fast: memory caching, CPU pipelining, branch prediction, ordinary OS scheduling, etc.
I suppose it's hard to make guarantees with different environments and hardware, but I realized when we (non-realtime people) ship software we don't really have guarantees for when our functions run.
- You are at the mercy of the browser. If browser engineers mess up the audio thread or garbage collection, even the most resilient web audio app breaks. It happens.
- Security mitigations prevent or restrict use of some useful APIs. For example, SharedArrayBuffer and high resolution clocks.
It's worth noting that these are practically the only cases where extreme real-time audio programming measures are necessary.
If you're making, for example, a video game the requirements aren't actually that steep. You can trivially trade latency for consistency. You don't need to do all your audio processing inside a 5ms window. You need to provide an audio buffer every 5 milliseconds. You can easily queue up N buffers to smooth out any variance.
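That trade can be made explicit: with N buffers of B frames queued at sample rate R, the added worst-case latency is N·B/R. A small sketch (the specific numbers are illustrative, not from the comment above):

```rust
// Latency added by a queue of `num_buffers` buffers, each holding
// `frames_per_buffer` frames, at a given sample rate.
fn queued_latency_ms(num_buffers: u32, frames_per_buffer: u32, sample_rate: u32) -> f64 {
    (num_buffers as f64 * frames_per_buffer as f64 / sample_rate as f64) * 1000.0
}

fn main() {
    // One 256-frame buffer at 48 kHz is ~5.3 ms of latency; queuing 4 of
    // them smooths out jitter at the cost of ~21.3 ms total.
    println!("{:.1} ms", queued_latency_ms(1, 256, 48_000));
    println!("{:.1} ms", queued_latency_ms(4, 256, 48_000));
}
```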
Highly optimized competitive video games average like ~100ms of audio latency [1]. Some slightly better. Some in the 150ms and even 200ms range. Input latency is hyper optimized, but people rarely pay attention to audio latency. My testing indicates that ~50ms is sufficient.
Audio programming is fun. But you can inject latency to smooth out jitter in almost all use cases that don't involve a live musical instrument.
Yes, background sound in games can be handled with very large buffers, but most players expect music-performance-like latency for action-driven sound.
Musicians have keenly trained ears. I would imagine they're much more sensitive to audio latency than even a pro gamer, never mind the average Joe off the street.
Where latency really matters is when you have a musical instrument that plays a sound and it's connected to a monitor. If those sounds are separated by more than 8ms or so the difference will be super noticeable to anyone, including Joe off the street.
I'd be interested for someone to run a user study on MIDI keyboard latency. I'd bet $3.50 that anything under 40 milliseconds would be sufficient. Maybe 30 milliseconds. I'd be utterly shocked if it needed to be 8 milliseconds. And I'd be extremely shocked if every popular MIDI keyboard on the market actually hit that level of latency.
I would love to see a UI system that has predictable low-latency real-time perf, so you could confidently achieve something like single frame latency on 144Hz display.
A graphics micro-stutter not so much.
> I'm not aware of any UI toolkits designed with real-time in mind.
What would be the point? The human eye can only notice so much FPS (gamers might disagree with their 244 FPS displays).
The insight is that with two threads contending on one lock, there are efficient ways to build the lock that minimize CPU use on the non-realtime thread.
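One common shape of that idea, sketched in Rust (this is just one pattern, not necessarily the lock design the parent has in mind): the real-time thread only ever `try_lock`s and falls back to a cached value, so it never blocks behind the other thread.

```rust
use std::sync::{Arc, Mutex};

// Shared parameters updated by a UI/control thread.
struct Params {
    gain: f32,
}

// Runs on the audio thread. If the control thread happens to hold the
// lock, we skip the update and keep using the value cached from a
// previous callback instead of blocking.
fn audio_callback(shared: &Arc<Mutex<Params>>, cached_gain: &mut f32, buf: &mut [f32]) {
    if let Ok(p) = shared.try_lock() {
        *cached_gain = p.gain;
    }
    for s in buf.iter_mut() {
        *s *= *cached_gain;
    }
}

fn main() {
    let shared = Arc::new(Mutex::new(Params { gain: 0.5 }));
    let mut cached = 1.0_f32;
    let mut buf = [1.0_f32; 4];
    audio_callback(&shared, &mut cached, &mut buf);
    println!("{:?}", buf);
}
```

Dedicated wait-free structures (SPSC ring buffers, triple buffers) push this further, but the try-lock-with-fallback pattern already keeps the audio thread from ever waiting.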
DDMF's VirtualAudioStream does that. It allows you to create virtual audio devices with chains of arbitrary VST plugins, and there are thousands of free and paid VST plugins for everything. I'm using VirtualAudioStream to put a Waves noise-cancelling plugin and a good compressor between my mic and Zoom. It increases latency, of course.
I think so. TBH I'm quite new to the world of DSPs so I don't know the right terminology. The purpose of the DSP (which I should've mentioned in my original post now that I think of it) is to tweak the speakers on my laptop - there are for example ways to "fake" bass (through missing harmonics), or have dynamically changing bass. I'll have a look at VirtualAudioStream, thanks for the recommendation.
Text based: SuperCollider, Csound, Chuck
What are you even talking about, man.
Obviously the happy case is when all the audio processing is done in a DSP where scheduling is deterministic, but it's rare to be able to count on that. Part of the problem is that modern computers are so fast that people expect them to handle audio tasks without breathing hard. But that speed is usually measured as throughput rather than worst-case latency.
The advice I'd give to anybody building audio today is to relentlessly measure all potential sources of scheduling jitter end-to-end. Once you know that, it becomes clearer how to address it.