Granular Audio Synthesis

(blog.demofox.org)

153 points · unsatchmo · 8y ago · 33 comments

jeffreyrogers · 8y ago

This is pretty neat. One frustrating thing I found while doing some audio programming recently is how hard it was working with different audio formats. Most of the libraries I found for doing so were GPL or required a commercial license.

Atrix256 · 8y ago

Does this source code help you much with that? It only deals with wave files but can read / write 8, 16, 24 or 32 bit wave files, at whatever sample rate, with however many channels.

I really wish someone would make a header-only C++ audio library, that would be soooo nice.
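
As a rough illustration of what handling those bit depths involves, here is a minimal sketch that converts one little-endian integer PCM sample to a float. `pcmToFloat` is a hypothetical helper, not code from the linked repository, and it assumes integer PCM only (32-bit wave files can also hold floating-point samples, which this does not cover).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the article's source): convert one
// little-endian integer PCM sample of 1 to 4 bytes to a float in [-1, 1).
float pcmToFloat(const unsigned char* bytes, std::size_t bytesPerSample)
{
    assert(bytesPerSample >= 1 && bytesPerSample <= 4);

    // 8-bit wave data is unsigned and centered on 128.
    if (bytesPerSample == 1)
        return (static_cast<int>(bytes[0]) - 128) / 128.0f;

    // 16/24/32-bit data is signed. Pack the bytes into the top of a 32-bit
    // word so the sample's sign bit lands in bit 31.
    std::uint32_t word = 0;
    for (std::size_t i = 0; i < bytesPerSample; ++i)
        word |= static_cast<std::uint32_t>(bytes[i]) << (8 * (i + 4 - bytesPerSample));

    // Reinterpret as signed (two's complement assumed) and scale by 2^31.
    return static_cast<float>(static_cast<std::int32_t>(word) / 2147483648.0);
}
```

Shifting every depth to the top of a 32-bit word means one scale factor works for 16, 24 and 32 bits; only the unsigned 8-bit case needs its own branch.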

jeffreyrogers · 8y ago

Wave files are pretty easy to deal with because the format is simple and the data isn't usually compressed. It's all the other formats that make this hard. Actually, it's probably not that hard, but parsing file formats isn't really a fun programming task (for me at least).

radarsat1 · 8y ago

libsndfile is LGPL

jeffreyrogers · 8y ago

Thanks. I actually came across libsndfile, but for some reason thought it was GPL instead of LGPL. Ideally there'd be a BSD licensed library, but LGPL is usable.

recentdarkness · 8y ago

Title should be “Granular Audio Synthesis”

Couldn’t find anything about C++ in that article on a quick scan - feel free to correct me

FraKtus · 8y ago

It's here https://github.com/Atrix256/GranularSynth. But unfortunately it does not go very deep into the detail, and there is no real motivation on why C++ would be cool to do that processing...

Atrix256 · 8y ago

I'm the author and didn't make this post on yc, but yes, the implementation is in 680 lines of standalone c++

The point of the article isn't about c++ or why it's a good language for doing this sort of thing, but I'm a real time graphics and game engine programmer, so it's my language of choice.

iammyIP · 8y ago

Because this kind of processing needs to run fast, almost all resources and tutorials are in c/cpp, and security is not a concern.

sctb · 8y ago

We've reverted the submitted title of “Granular Synthesis in C++” to that of the article.

luk32 · 8y ago

How does granular analysis differ from PCM representation, Fournier transformation and sampling? Or is it a different name for the same thing? I think it's natural to whoever has worked with sound on a PC.

It's probably debatable, but I don't agree with the statement that shortening the "sound" changes pitch. It depends on your representation of the sound. If you represent it as a function of amplitude vs time then scaling the time axis does change pitch.

This makes a sensational tone about a fallacy. No instrument plays sound faster or slower to make it shorter or longer.... It just stops playing it or doesn't. If one thinks about the phenomenon this way, it becomes natural why you cannot compress time, to play shorter sounds.

teilo · 8y ago

You don't seem to have a very good grasp of this subject and don't appear to have read the article very carefully. The only viable alternative to PCM is DSD, which failed to gain any traction for good reasons. So for all practical purposes, sampling and PCM are the same thing. You also throw in Fourier (not Fournier) transformation for good measure, which is relevant to additive synthesis, but not to granular synthesis, which is the topic of this article.

> I don't agree with the statement that shortening the "sound" changes pitch. It depends on your representation of the sound. If you represent it as a function of amplitude vs time then scaling the time axis does change pitch.

The only relevant "representation" is digital audio, which by definition is encoded as amplitude over time regardless of encoding technique. To lengthen time without changing pitch or pitch without changing time requires manipulation of the audio data. That manipulation is either done by granular synthesis, or by utilizing a Fast Fourier Transform to decompose the audio into its component waveforms, changing the frequencies or shortening the wave components, and recomposing them back to a composite waveform. This article is about granular synthesis, which requires far less computation than FFT.

> No instrument plays sound faster or slower to make it shorter or longer....

Irrelevant. We aren't dealing with physical instruments, but with digital audio.

There is nothing in the least fallacious or sensational about this article.
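
A minimal sketch of the granular route to changing length without changing pitch, assuming mono float samples. `granularStretch` and its parameters are hypothetical names, not taken from the article's source; the Hann envelope with 50% overlap is one common choice among many.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: make `input` `stretch` times as long (2.0 = twice as
// long, 0.5 = half as long) without resampling. Grains are copied at their
// original rate, so pitch is kept; only where they are read from changes.
std::vector<float> granularStretch(const std::vector<float>& input,
                                   double stretch, std::size_t grainSize)
{
    assert(stretch > 0.0 && grainSize >= 2 && grainSize % 2 == 0);
    assert(input.size() >= grainSize);
    const double pi = 3.14159265358979323846;
    const std::size_t outHop = grainSize / 2;              // 50% overlap
    const std::size_t lastStart = input.size() - grainSize;
    const std::size_t outLen = static_cast<std::size_t>(input.size() * stretch);
    std::vector<float> output(outLen, 0.0f);

    for (std::size_t outPos = 0; outPos < outLen; outPos += outHop) {
        // Read position advances 1/stretch times as fast as the write
        // position, so grains get repeated (stretch > 1) or skipped (< 1).
        const std::size_t inPos =
            std::min(static_cast<std::size_t>(outPos / stretch), lastStart);
        for (std::size_t i = 0; i < grainSize && outPos + i < outLen; ++i) {
            // Hann envelope; at 50% overlap neighbouring envelopes sum to 1.
            const double env = 0.5 - 0.5 * std::cos(2.0 * pi * i / grainSize);
            output[outPos + i] += static_cast<float>(env * input[inPos + i]);
        }
    }
    return output;
}
```

Calling `granularStretch(samples, 2.0, 1024)` would double the duration. The first half-grain fades in because it has no predecessor to overlap with.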

tabtab · 8y ago

Fourier-based techniques (FBT) and sample slicing (SC) may be similar if doing "raw" transformations, but FBT can potentially be cleaner, or at least easier to clean up. If you use raw "bit-maps" for FBT, yes it will be choppy like SC, but one can use regression or regression-like curve-fitting to give FBT smooth time/frequency curves to synthesize against, sounding more natural. There are down-sides to using regression, but for typical voice and music, those won't matter much.

One rough area for curve-fitting is white-noise-esque sounds (WNES) like the letter "s" or "h" and tambourines. The processor can perhaps detect if WNES exceed a threshold, and use other techniques such as SC instead.

It's roughly comparable to JPEG versus GIF images. JPEG is better (more faithful) at gradual shades while GIF is better at edges. A better compression algorithm perhaps would use each where it does best per given image. However, that would come at the cost of algorithm complexity and compression/decompression processing time.

Atrix256 · 8y ago

By playing a sound faster I mean changing its sample rate, without doing anything else.
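
To make that concrete, here is a small sketch of the equivalent operation at a fixed output rate: stepping through the samples at a fractional rate. `resample` is a hypothetical helper and linear interpolation is an assumption chosen for brevity, not necessarily what the article uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: play `input` back `speed` times faster by reading it
// at a fractional step with linear interpolation. The result is shorter by
// a factor of `speed`, and every waveform period shrinks by the same
// factor, which is why the pitch rises along with the speed.
std::vector<float> resample(const std::vector<float>& input, double speed)
{
    assert(speed > 0.0 && !input.empty());
    const std::size_t outLen = static_cast<std::size_t>(input.size() / speed);
    std::vector<float> output(outLen);
    for (std::size_t n = 0; n < outLen; ++n) {
        const double pos = n * speed;                       // fractional read position
        const std::size_t i0 = static_cast<std::size_t>(pos);
        const std::size_t i1 = std::min(i0 + 1, input.size() - 1);
        const double frac = pos - static_cast<double>(i0);
        output[n] = static_cast<float>((1.0 - frac) * input[i0] + frac * input[i1]);
    }
    return output;
}
```

With `speed = 2.0` the output simply keeps every second sample, so a tone with a 100-sample period comes out with a 50-sample period: one octave up and half as long.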

jedimastert · 8y ago

I've seen a lot of people commenting about the artifacts you hear when the samples are stretched. These happen because of phasing issues, where frequencies in each of the grains are interfering with one another.

I'm surprised I don't see it mentioned here, but there's a rather interesting extension to this technique made by Paul Nasca[0], which mitigates these artifacts by (1) carefully choosing the size and placement of grains and (2) randomly changing the phase of each grain before recombining. You can see the algorithm here[1].

The results are absolutely incredible. You can end up slowing a sample down by 800% or more with no artifacts. For example, here[2] is the Windows 95 startup sound extended to be a little over 6 minutes long. The reverb you hear isn't added, that's just what it sounds like.

Also, if you didn't notice from the page, it's one of the default plug-ins in Audacity.

[0]: http://www.paulnasca.com/
[1]: http://www.paulnasca.com/algorithms-created-by-me#TOC-PaulSt...
[2]: https://www.youtube.com/watch?v=FsJdplLB1Bs
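
A rough sketch of step (2) only, applied to a single grain: keep each frequency bin's magnitude and replace its phase with a random one. This is not Paul Nasca's code; `dft` and `randomizePhases` are hypothetical helpers, the naive O(N^2) transform stands in for a real FFT, and the windowing and overlap-add of a full implementation are left out.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <random>
#include <vector>

using Spectrum = std::vector<std::complex<double>>;

// Naive O(N^2) DFT of a real grain; fine for a demo, too slow for real use.
Spectrum dft(const std::vector<double>& x)
{
    const double pi = 3.14159265358979323846;
    const std::size_t n = x.size();
    Spectrum X(n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t t = 0; t < n; ++t)
            X[k] += x[t] * std::polar(1.0, -2.0 * pi * double(k * t) / double(n));
    return X;
}

// Keep every bin's magnitude, give it a random phase, and resynthesize.
std::vector<double> randomizePhases(const std::vector<double>& grain, unsigned seed)
{
    const double pi = 3.14159265358979323846;
    const std::size_t n = grain.size();
    assert(n >= 2 && n % 2 == 0);
    Spectrum X = dft(grain);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> phase(0.0, 2.0 * pi);

    // Bins 0 (DC) and n/2 (Nyquist) are left untouched so they stay real.
    for (std::size_t k = 1; k < n / 2; ++k) {
        X[k] = std::polar(std::abs(X[k]), phase(rng));
        X[n - k] = std::conj(X[k]);   // Hermitian symmetry keeps the output real
    }

    // Inverse DFT, again the naive way.
    std::vector<double> out(n);
    for (std::size_t t = 0; t < n; ++t) {
        std::complex<double> sum = 0.0;
        for (std::size_t k = 0; k < n; ++k)
            sum += X[k] * std::polar(1.0, 2.0 * pi * double(k * t) / double(n));
        out[t] = sum.real() / double(n);
    }
    return out;
}
```

The output grain has the same magnitude spectrum as the input but a different waveform, so overlapping grains no longer line up in phase in a regular, audible pattern.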

mushishi · 8y ago

You made my day, thank you! I've been using Ableton Live's warping mechanism with Complex Pro settings, and this seems like a really promising alternative.

MrScruff · 8y ago

There's a recently released VST/AU version of Paul's Stretch under active development.

https://xenakios.wordpress.com/paulxstretch-plugin/

amelius · 8y ago

Perhaps a better way of looking at it is this. Basically, a sound triggers hair cells in the ear. A single harmonic tone triggers a single group of hair cells. Through modeling, you can compute which hair cells are triggered at what moment for a given signal. Your task is then to compute a new signal for which the same hair cells are triggered but faster.

yoklov · 8y ago

I think their description is a much more actionable description of it than yours, to be honest.

amelius · 8y ago

Yes, it would require some math, but I suspect you'd get superior results. For example, you can replace the "modeling" by an FFT applied to small time-slices (i.e., this determines which hair cells get triggered at a given time). Now you have to stitch these slices together without introducing spurious frequencies, which is the difficult part.

mgeorgoulo · 8y ago

Very good results and embarrassingly easy to implement!

The very stretched waveform did contain some audible artifacts, but I think other methods like FFT would introduce some as well.

This kind of trick works because our hearing is frequency-based. So the crucial thing is to preserve the frequencies and it is going to sound exactly the same.

Spatial mapping of frequencies in the human ear here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2394499/ (see fig 5.)

Trying this with an image for example wouldn't work, because our vision is sample-based. Imagine splitting an image in tiny fragments and repeating/interpolating them on top of one another.

jancsika · 8y ago

> What this does is make it so you can put any grain next to any other grain, and they should fit together pretty decently. This gives you C0 continuity by the way, but higher order discontinuities still affect the quality of the result. So, while this method is fast, it isn’t the highest quality. I didn’t try it personally, so am unsure how it affects the quality in practice.

It's not just about continuity. It also removes an entire set of concerns from the process.

For example-- suppose someone analyzes an audio recording, splits it into grains, then does some fancy re-organization based on the timbral content of the recording/grains.

Now suppose they are subjectively unhappy with the result. Perhaps it sounds "wimpy," "fluttery," or some other such vague complaint. Is that sound due to a) their process of re-organizing the grains, b) the quality of the original recording, c) the envelopes they used, or d) something else entirely?

If instead one uses grains which begin and end at zero, the answer can't be (c), because there are no envelopes. I can say that the quality sounds fine in the few examples I've heard that use this technique.

I'd imagine the reason the latter isn't used as often is because it's simply more difficult to program if each grain can be an arbitrary size (or at least not quantized).
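
A minimal sketch of the extra bookkeeping that variable-size grains need, assuming mono float samples. `grainBoundaries` and `minGrain` are hypothetical names; cutting only at rising zero crossings is one possible choice, made here so that the slope direction matches wherever two grains are joined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: return the indices where the signal crosses from
// negative to non-negative, keeping only crossings at least `minGrain`
// samples after the previous cut. Slicing the signal at these indices
// yields variable-length grains that start and end close to zero, so no
// envelope is needed to hide the joins.
std::vector<std::size_t> grainBoundaries(const std::vector<float>& x,
                                         std::size_t minGrain)
{
    assert(minGrain > 0);
    std::vector<std::size_t> cuts;
    for (std::size_t i = 1; i < x.size(); ++i) {
        const bool rising = x[i - 1] < 0.0f && x[i] >= 0.0f;
        const bool farEnough = cuts.empty() || i - cuts.back() >= minGrain;
        if (rising && farEnough)
            cuts.push_back(i);
    }
    return cuts;
}
```

Each grain is then the half-open range between two consecutive cuts, which is where the arbitrary grain sizes mentioned above come from.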

aidenn0 · 8y ago

I think the speed-up sounds much better than the slow-down. With the slow-down there are very noticeable artifacts; I'm not sure if it's because of the envelope they choose or just because repeating a grain adds harmonics.

vladimirralev · 8y ago

As far as I see this is basically naive TDHS (Time Domain Harmonic Scaling). It's a great starter project as an intro to audio-effect coding, since you can visually observe where you go wrong and where the noise comes from at the edges. Just great for learning how audio works for beginners. It's very rare to have an audio effects algorithm so cool and so easy to observe without special analysis tools.

Some more famous algorithms that work this way and are similarly easy to implement are TDHS and PSOLA. They all work in the time domain but find different ways to smooth out the discontinuities and to make more extreme shifts sound better.
