I am ashamed to admit this took me a long time to properly understand. For further reading I'd recommend:
https://people.xiph.org/~xiphmont/demo/neil-young.html https://www.youtube.com/watch?v=cIQ9IXSUzuM
So if you want to take a continuous (analog) signal, digitize it, then convert back to analog, you are fundamentally adding latency. And if you want to do DSP operations on a digital signal, you also generally add some latency. And the higher the sampling rate, the lower the latency you can achieve, because you can use more compact approximations of sinc that are still good enough below 20kHz.
None of this matters, at least in principle, for audio streaming over the Internet or for a stored library: there is a ton of latency, and up to a few ms extra is irrelevant as long as it’s managed correctly when synchronizing different devices. But for live sound, or for a potentially long chain of DSP effects, I can easily imagine this making a difference, especially at 44.1ksps.
I don’t work in audio or DSP, and I haven’t extensively experimented. And I haven’t run the numbers. But I suspect that a couple passes of DSP effects or digitization at 44.1ksps may become audible to ordinary humans in terms of added latency if there are multiple different speakers with different effects or if A/V sync is carelessly involved.
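For a rough sense of the numbers: a symmetric (linear-phase) FIR filter with N taps delays the signal by (N - 1)/2 samples. The sketch below converts that to milliseconds. The tap counts are made-up illustrative values, not measurements of any real converter or resampler, and `fir_delay_ms` is a hypothetical helper name.

```python
def fir_delay_ms(num_taps, sample_rate_hz):
    """Group delay of a symmetric (linear-phase) FIR filter, in milliseconds."""
    return (num_taps - 1) / 2 / sample_rate_hz * 1000.0

# Hypothetical tap counts, chosen only for illustration.
for taps in (63, 255, 1023):
    for rate in (44_100, 48_000, 96_000):
        print(f"{taps:5d} taps @ {rate:6d} Hz -> {fir_delay_ms(taps, rate):6.3f} ms")
```

With the same tap count the delay shrinks as the rate goes up. Whether several such stages in a row add up to something audible is the open question raised above.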
Now you could play it back wrong by emitting a sharp pulse f_s times per second with the indicated level. This will have a lot of frequency content above 20kHz and, in fact, above f_s/2. It will sound all kinds of nasty. In fact, it’s what you get by multiplying the time-domain signal by a pulse train, which is equivalent to convolving the frequency-domain signal with some sort of comb, and the result is not pretty.
Or you do what the sampling theorem says and emit a sinc-shaped pulse for each sample, and you get exactly the original signal. Except that sinc pulses are infinitely long in both directions.
[0] Energy is proportional to pressure squared. You’re sampling pressure, not energy.
[1] This is necessary to prevent aliasing. If you feed this algorithm a signal at f_s/2 + 5kHz, it would come back out at f_s/2 - 5kHz, which may be audible.
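A minimal sketch of the folding arithmetic, assuming an ideal sampler with no anti-aliasing filter in front of it (`alias_frequency` is a hypothetical helper name):

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a real tone at f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz            # sampling cannot distinguish f from f + k*fs
    return min(f, fs_hz - f)    # fold into the range 0 .. fs/2

fs = 44_100
print(alias_frequency(fs / 2 + 5_000, fs))  # 17050.0, i.e. fs/2 - 5 kHz
```

A tone 5 kHz above f_s/2 folds back to 5 kHz below it, which at 44.1ksps lands well inside the audible band.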
At least I don't have tinnitus.
Here's my test,
```fish
set -l sample ~/Music/your_sample_song.flac # NOTE: Maybe clip a 30s sample beforehand
set -l borked /tmp/borked.flac # WARN: Will get overwritten (but more likely won't exist yet)
cp -f $sample $borked
for i in (seq 10)
    echo "$i: Resampling to 44.1kHz..."
    ffmpeg -i $borked -ar 44100 -y $borked.tmp.flac 2>/dev/null
    mv $borked.tmp.flac $borked
    echo "$i: Resampling to 48kHz..."
    ffmpeg -i $borked -ar 48000 -y $borked.tmp.flac 2>/dev/null
    mv $borked.tmp.flac $borked
end
echo "Playing original $sample"
ffplay -nodisp -autoexit $sample 2>/dev/null
echo "Playing borked file $borked"
ffplay -nodisp -autoexit $borked 2>/dev/null
echo "Diffing..."
set -l spec_config 's=2048x1024:start=0:stop=22000:scale=log:legend=1'
ffmpeg -i $sample -lavfi showspectrumpic=$spec_config /tmp/sample.png -y 2>/dev/null
ffmpeg -i $borked -lavfi showspectrumpic=$spec_config /tmp/borked.png -y 2>/dev/null
echo "Spectrograms,"
ls -l /tmp/sample.png /tmp/borked.png
```
I imagine the noise increases when one of the supports fails and the filament starts oscillating, leading to mechanical stress and failure
(not that it makes a difference, just thinking out loud)
That is not true... A 22kHz signal only has 2 data points per cycle of a sinusoidal waveform. Those 2 points could be anywhere, i.e. you could read 0 both times the waveform is sampled... See the Nyquist theorem.
From memory, changing the sample rate can cause other issues with sample aliasing due to the algorithms used...
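A small sketch of the edge case described here, assuming an ideal sampler. A sine at exactly f_s/2 with zero phase is sampled only at its zero crossings. A 22 kHz tone at 44.1 kHz sits slightly below f_s/2, so the sample points drift through the waveform instead. This is why the sampling theorem asks for content strictly below f_s/2. `sample_sine` is a hypothetical helper.

```python
import math

def sample_sine(freq_hz, fs_hz, count, phase=0.0):
    """Return `count` samples of sin(2*pi*f*t + phase) taken at rate fs_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz + phase) for n in range(count)]

fs = 44_100
at_nyquist = sample_sine(fs / 2, fs, 256)     # every sample lands on a zero crossing
below_nyquist = sample_sine(22_000, fs, 256)  # slightly under fs/2: samples are not all zero

print(max(abs(s) for s in at_nyquist))     # numerically zero (float rounding only)
print(max(abs(s) for s in below_nyquist))  # close to full amplitude
```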
Reducing the sample rate could cause aliasing. Oversampling shouldn't.
I buy loads of DJ music on Bandcamp and "downsample" (I think the term is) to 16bit if they only offer 24bit for smaller size and wider compatibility.
What? No. All bandwidth limited signal is. Which means periodic. Causal signals like audio can be approximated, with tradeoffs. Such as pre-ringing (look at sinc(x), used to reconstruct sampled signal — how much energy is in the limb preceding the x=0.)
Is the approximation achieved by filtering the 44.1kHz DAC good enough? Yes, yes it is. But the math is way more involved (i.e. beyond me) than simply "Nyquist".
This popular myth that limited frequencies we can hear and limited frequencies in Fourier transform sense is the same thing is quite irritating.
the article explains why.
tldr: formula for regenerating signal at time t uses an infinite amount of samples in the past and future.
Is there a reason the solution that "works very well" for images isn't/can't be applied to audio?
There is this website that painstakingly compares many resampling algorithms from all sorts of software:
Try its mirror if you can't access it: https://megapro17.github.io/src/index.html
The only one that says it is a cubic interpolation is the "Renoise 2.8.0 (cubic)" one, the spectrogram isn't very promising with all sorts of noise, intermodulation and aliasing issues. And, by switching to the 1khz tone spectrum view you can see some harmonics creeping up.
When I used to mess with trackers I would sometimes choose different interpolations and bicubic definitely still colored the sound, with sometimes enjoyable results. Obviously you don't want that as a general resampler...
Ehhm, yeah, duh? You don't resample unless there is a clear need, and even then you don't upsample and only downsample, and you tell anyone that tries to convince you otherwise to go away and find the original (analog) source, so you can do a proper transfer.
> given sufficient computing resources, we can resample 44.1 kHz to 48 kHz perfectly. No loss, no inaccuracies.
and then further
> Your smartphone probably can resample 44.1 kHz to 48 kHz in such a way that the errors are undetectable even in theory, because they are smaller than the noise floor. Proper audio equipment can certainly do so.
That is you don't need the original source to do a proper transfer. The author is simply noting
> Although this conversion can be done in such a way as to produce no audible errors, it's hard to be sure it actually is.
That is, re-sampling is not a bad idea in this case because it's not going to introduce any sort of error if done properly; it's just that the author notes you cannot trust any random given re-sampler to do so.
Therefore if you do need to resample, you can do so without the analog source, as long as you have a re-sampler you can trust, or do it yourself.
I'm working on a game. My game stores audio files as 44.1kHz .ogg files. If my game is the only thing playing audio, then great, the system sound mixer can configure the DAC to work in 44.1kHz mode.
But if other software is trying to play 48kHz sound files at the same time? Either my game has to resample from 44.1kHz to 48kHz before sending it to the system, or the system sound mixer needs to resample it to 48kHz, or the system sound mixer needs to resample the other software from 48kHz to 44.1kHz.
Unless I'm missing something?
Those two examples emerged independently, like rail standards or any number of other standards one can cite. That's really just the top of the rabbit-hole, since there are 8-20 "standard" audio sample rates, depending on how you count.
This isn't really a drawback, and it does provide flexibility when making tradeoffs for low bitrates (e.g. 8 kHz narrowband voice is fine for most use cases) and for other authoring/editing vs. distribution choices.
44.1khz exists because it was the lowest technically practical speed and was an optimization for processing speed and storage space.
48khz exists because it syncs with video easily — I’ve also heard it allows for more tolerance in the anti-aliasing filter.
44.1 kHz never really went away because CDs continued using it, allowing them to take any existing 44.1 kHz content as well as to fit slightly more audio per disc.
At the end of the day, the resampling between the two doesn't really matter and is more of a minor inconvenience than anything. There are also lots of other sampling rates which were in use for other things too.
Because of greed.
Early audio manufacturers (SONY notably) used 48kHz for profession-grade audio equipment, that would be used in studios or TV stations, and degraded 44.1khz audio for consumer devices. Typically you would pay an order of magnitude more for the 48kHz version of the hardware.
48khz is better for creating and mixing audio. You cannot practically mix audio at 44.1khz without doing very slight damage to audible high frequencies. But enough to make a difference. If you were creating for consumer devices, you would mix at 48Khz, and then downsample to 44.1khz during final mastering, since conversion from 48kHz to 44.1kHz can be done theoretically (and practically) perfectly. (Opinions of the OP notwithstanding).
I think it's safe to say that the 44.1kHz sampling rate was maliciously selected specifically because it is just low enough that perfect playback is still possible, but perfect mixing is practically not possible. And obviously maliciously chosen to be a rate with no convenient common divisor with 48kHz, which would have allowed easy and cheap perfect realtime resampling. Had Sony chosen 44.0kHz, it would be trivially easy to do sample rate conversion to 48kHz in realtime even with primitive hardware available in the late 1970s. That extra .1kHz is transparently obvious malice and greed in plain sight.
Presumably SONY would sell you the software or hardware to perform perfect non-realtime conversion of audio from 48khz to 44.1khz for a few tens of thousands of dollars. Not remotely subtle how greedy all of this was.
There has been no serious reason to use 44.1kHz instead of 48kHz for about 50 years, at least from a technology point of view. (And no real reason to EVER use 44.1khz instead of 48kHz other than GREED).
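For reference, the ratios involved can be checked with Python's standard library. A rational resampler conceptually upsamples by the numerator and decimates by the denominator, so smaller numbers mean a simpler filter bank. This sketch only shows the arithmetic and makes no claim about anyone's motives.

```python
from fractions import Fraction
from math import gcd

for src in (44_100, 44_000):
    ratio = Fraction(48_000, src)  # automatically reduced to lowest terms
    print(f"{src} -> 48000: gcd = {gcd(src, 48_000)}, "
          f"ratio = {ratio.numerator}/{ratio.denominator}")
```

44.1 kHz to 48 kHz reduces to 160/147, while the hypothetical 44.0 kHz rate would reduce to 12/11.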
Most stuff on the internet ripped from CD is 44.1. 48 is getting more common. We’re like smack in the middle of the 75 year transition period to 48kHz.
For new projects, I use 48, because my mics are 32bit (float!)/48kHz.
The first CD player didn't have the compute power to upsample perfectly, but modern devices certainly do.
As an example, let's say I change frequency in Audacity and press the play button. Does Audacity now go and inspect whether anything else on my system is making any sound?
In PulseAudio you can choose the resample method you want to use for the whole mixing daemon, but I don't think that's an option in Windows/macOS
It is also the job of the operating system or its supporting parts to allow applications to configure audio devices to specific sample rates if that's what the application needs.
It's fine to just take whatever you get if you are a game app, and either allow the OS to resample, or do the resampling yourself on the fly.
Not so fine if you are authoring audio, where the audio device rate ABSOLUTELY has to match the rate of content that's being created. It is NOT acceptable to have the OS doing resampling when that's the case.
Audacity allows you to force the sample rate of the input and output devices on both Windows and Linux. Much easier on Windows; utterly chaotic and bug-filled and miserable and unpredictable on Linux (although up-to-date versions of Pipewire can almost mostly sometimes do the right thing, usually).
I think that's the point? In practice the OS (or its supporting parts) resample audio all the time. It's "under the hood" but the only way to actually avoid it would be to limit all audio files and playback systems to a single rate.
If you have a mixer at 48KHz you'll get minor quantization noise but if it's compressed already it's not going to do any more damage than compression already has.
https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampli...
I suppose the option you're missing is you could try to get pristine captures of your samples at every possible sample rate you need / want to support on the host system.
My reply was from an audio mastering perspective.
> Although this conversion can be done in such a way as to produce no audible errors, it's hard to be sure it actually is.
That is, you should verify the re-sampler you are using, or implement one yourself, in order to be sure it is done correctly, and with today's hardware that is easily possible.
From an information theory perspective, this is like putting a smaller pipe right through the middle of a bigger one. The channel capacity is the only variable that is changing and we are increasing it.
For example if you watch a 24fps film on a 60fps screen, in contrast to a 120fps screen
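The video analogy can be made concrete with integer arithmetic: count how many display refreshes each film frame is held for. This is a minimal sketch, and `refreshes_per_frame` is a hypothetical helper.

```python
def refreshes_per_frame(film_fps, display_hz, frames):
    """How many display refreshes each of the first `frames` film frames is held for."""
    # Frame k owns the refreshes in [k*display/film, (k+1)*display/film).
    return [(k + 1) * display_hz // film_fps - k * display_hz // film_fps
            for k in range(frames)]

print(refreshes_per_frame(24, 60, 8))   # [2, 3, 2, 3, 2, 3, 2, 3]
print(refreshes_per_frame(24, 120, 8))  # [5, 5, 5, 5, 5, 5, 5, 5]
```

At 60 Hz the hold time alternates between 2 and 3 refreshes (uneven cadence). At 120 Hz every frame gets exactly 5, because 120 is an integer multiple of 24.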
The issues are that 1) resampling has a performance and latency cost, 2) better resampling has a higher performance and latency cost
Also, for decades upsampling on ingest and downsampling on egress has been standard practice for DSP because it reduces audible artifacts from truncation and other rounding techniques.
Finally, most recorded sound does not have an original analog source because of the access digital recording has created…youtube for example.
Then at the operating system level, rather than mixing everything to a single audio stream at a single sample rate, you group the streams that are at, or at a multiple of, either 44.1kHz or 48kHz, and finally send both groups to this "dual DAC", thus eliminating the need to resample any 44.1kHz or 48kHz stream, and vastly simplifying the resampling of any stream whose rate is a multiple of one of these.
You'd just resample both at 192kHz and run it into 192kHz DAC. The "headroom" means you don't need to use the very CPU intensive "perfect" resample.
For a sampled signal, if you know the sampling satisfied Nyquist (i.e., there was no frequency content above fs/2) then the original signal can be reproduced exactly at any point in time using sinc interpolation. Unfortunately that theoretically requires an infinite length sample, but the kernel can be bounded based on accuracy requirements or other limiting factors (such as the noise which was mentioned). Other interpolation techniques should be viewed as approximations to sinc.
Sinc interpolation is available on most oscilloscopes and is useful when the sample rate is sufficient but not greatly higher than the signal of interest.
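A minimal sketch of bounded sinc interpolation as described above. The kernel half-width of 32 samples and the Hann taper are assumptions picked for illustration, not values from any particular oscilloscope or resampler.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x)/(pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def sinc_interpolate(samples, t, half_width=32):
    """Estimate the signal at fractional sample index t from nearby samples.

    half_width bounds the kernel; a Hann window tapers the truncated sinc
    so that cutting it off does not add large ripple.
    """
    center = math.floor(t)
    total = 0.0
    for n in range(center - half_width + 1, center + half_width + 1):
        if 0 <= n < len(samples):
            d = t - n
            window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
            total += samples[n] * sinc(d) * window
    return total

fs = 48_000
tone = [math.sin(2 * math.pi * 1_000 * n / fs) for n in range(512)]
t = 200.5  # halfway between two samples
estimate = sinc_interpolate(tone, t)
exact = math.sin(2 * math.pi * 1_000 * t / fs)
print(estimate, exact)
```

Widening `half_width` trades more computation (and more latency in a streaming setting) for a smaller error.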
This sounds contradictory - what would be the precision that can be heard in a test then?
I.e. no one cares.
If you want to change the number of slices of pizza, you can't simply just make 160x more pizza out of thin air.
Personally I'd just do a cubic resample if absolutely required (ideally you don't resample ofc); it's fast and straightforward.
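For comparison, here is a sketch of what a cubic resample can look like, using Catmull-Rom interpolation (one common choice of cubic, assumed here). It applies no anti-alias filtering, which is one reason cubic resamplers can colour the sound compared with a windowed-sinc design.

```python
import math

def catmull_rom(p0, p1, p2, p3, frac):
    """Cubic (Catmull-Rom) interpolation between p1 and p2, frac in [0, 1)."""
    return p1 + 0.5 * frac * (
        (p2 - p0)
        + frac * (2 * p0 - 5 * p1 + 4 * p2 - p3
                  + frac * (3 * (p1 - p2) + p3 - p0)))

def resample_cubic(samples, src_rate, dst_rate):
    """Resample by cubic interpolation. No anti-alias filtering is applied."""
    last = len(samples) - 1
    out_len = int(last * dst_rate / src_rate) + 1
    out = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate      # position in source samples
        k = min(int(pos), last)
        frac = pos - k
        # Clamp neighbour indices at the edges of the buffer.
        p0, p1 = samples[max(k - 1, 0)], samples[k]
        p2, p3 = samples[min(k + 1, last)], samples[min(k + 2, last)]
        out.append(catmull_rom(p0, p1, p2, p3, frac))
    return out

tone_441 = [math.sin(2 * math.pi * 1_000 * n / 44_100) for n in range(441)]
tone_48 = resample_cubic(tone_441, 44_100, 48_000)
worst = max(abs(tone_48[i] - math.sin(2 * math.pi * 1_000 * i / 48_000))
            for i in range(10, 460))
print(len(tone_48), worst)
```

A 1 kHz tone is an easy case. The error grows quickly for content closer to f_s/2.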
Edit: serves me right for posting, I gotta get off this site.
Makes me think of GPS where the signal is below the noise floor. Which still blows my mind, real RF black magic.
As an aside, G.711 codecs use a kind of log scale with only four bits of mantissa per sample, but small signal values get much smaller quantization steps.
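A sketch of the idea using the continuous mu-law companding curve with mu = 255. This is not the bit-exact segmented G.711 encoder, and the uniform 8-bit rounding step is a simplification for illustration.

```python
import math

MU = 255.0  # mu-law parameter used by G.711 mu-law

def mu_compress(x):
    """Map x in [-1, 1] to [-1, 1] on a logarithmic curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize_8bit(y):
    """Round a value in [-1, 1] to one of 255 evenly spaced levels."""
    return round(y * 127) / 127

# Size of one quantization step near zero and near full scale.
small_step = mu_expand(1 / 127) - mu_expand(0.0)
large_step = mu_expand(1.0) - mu_expand(126 / 127)
print(small_step, large_step)

for x in (0.001, 0.01, 0.1, 1.0):
    decoded = mu_expand(quantize_8bit(mu_compress(x)))
    print(f"x={x:<6} decoded={decoded:.6f} abs_error={abs(decoded - x):.6f}")
```

The step near zero is a few hundred times smaller than the step near full scale, which is how quiet passages keep usable resolution with so few bits.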
source- wrote dithering code for digital images
further, the halftone technique developed in the 1880s by Georg Meisenbach — breaking images into dots to simulate shades of gray — was called autotype, not dithering. The term dithering was later adopted in digital imaging and computing, particularly in the 1960s, when engineers applied the concept of adding noise to reduce color banding.
If it's really good AI upsampling, you might get qualitatively "better" sounding audio than the original but still technically deviates from the original baseline by ~8%. Conversely, there'll be technically "correct" upsampling results with higher overall alignment with the original that can sound awful. There's still a lot to audio processing that's more art than science.