Understanding Convolutions in Probability (opens in new tab)

(countbayesie.com)

123 pointsCountBayesie3y ago23 comments

23 comments

As the author mentions, convolutions for signal processing and convolutions for probability are essentially equivalent things in two different contexts. It is interesting that the same concept is behind the multiplication of polynomials. In the discrete case we have the parallel of the vectors representing ...

  the (discretized) probability density <-> the (discrete) signal intensity <-> the coefficients of the polynomial

... in the respective context.

If we look at the discrete case, since the operation of convolution is computationally expensive (O(n^3)), people often rely on the Fast Fourier Transform Algorithm (FFT), that runs in O(n log(n)). An excellent and very enjoyable video on the FFT applied to the polynomial multiplication is https://youtu.be/h7apO7q16V0?t=1.

sfpotter3y ago

Discrete convolution is O(n^2).

H8crilA3y ago

That's the whole magic of FFT, it can be done in O(n*log(n)). And this is a practical algorithm, unlike the many fascinating algorithms for multiplying matrices. GMP will switch to FFT for multiplying very large numbers, for example, as numbers laid out in digits (whether binary or decimal) are polynomials if you look at them from the right angle. For processing audio and video it very quickly becomes economical to do the convolution in the frequency domain.

EDIT: never mind, I just figured out that you pointed a typo in the parent comment. Straightforward implementation is indeed O(n^2)

squaredot3y ago

Yes, you're right.

YouWhy3y ago

A minor thought I found inspiring many years ago: the Central Limit Theorem may be considered a statement about dynamic processes in the functional space of distributions.

If you consider the operator of convolution (rescaling for unit variance), the normal distribution is the only attractor (under many "natural" choices of metrics).

Sniffnoy3y ago

It's actually not the only attractor -- it's just the only attractor once you restrict to finite variance. If you allow infinite variance, there can be others: https://en.wikipedia.org/wiki/Stable_distribution

jesuslop3y ago

Makes sense, if you have a random process with IID variables, the distributions d are the same, so you can view the partial sums process (∑Xi) dynamic as advancing by convolving with d.

In the generating function pic, the distribution is borne as coefficients of polynomials or of formal power series, and convolution is polynomial multiplication. One example of your step is multiplying by (x+1)/2 (Bernoulli trial), and that gives approaches to the gaussian by chunks of normalized binomial coefficients.

Other related limit, described as a functional central limit theorem is given by Donsker's theorem [1], giving the passage from the discrete situation (random walk) to the continous (Brownian motion).

[1] https://en.wikipedia.org/wiki/Donsker%27s_theorem

amitport3y ago

Can you elaborate / add reference?

fjkdlsjflkds3y ago

"Central Limit Theorem" says that "adding (bounded variance) stuff together makes it converge to a Gaussian" (roughly). On the other hand, when you are adding random variable together, you are actually convolving their densities. Thus, what the CLT says is that a gaussian is some sort of attractor/fixed point of the "dynamic process" of convolving (finite energy/bounded variance) distributions.

1 more reply

hackandthink3y ago

"At a high level, in probability, a convolution is the way to determine the distribution of the sum of two random variables."

Probability of a certain sum value s is:

sum of probabilities of all (a,b) with a + b = s

(a: value from first input distribution. b: value from second input distribution)

with probability (a,b) = probability(a) * probability(b)

rustybolt3y ago

Mandatory link to a 3blue1brown video: https://m.youtube.com/watch?v=KuXjwB4LzSA

For me, the way to understanding convolutions was to to put sample rand(0, 1) a bunch of times and bucket the samples, then plot the number of samples in a bucket. You get a more-or-less flat graph. Now if you add or multiply two variables, you get different shapes. Once you spent enough effort on trying to understand why, you derive convolutions.

hackernewds3y ago

You seem to have replied without reading the article. This is linked right away in it.

froh3y ago

yes, this is the video linked in the first paragraph of the Original article.

roumenguha3y ago

I loved this article, it helped me understand: https://betterexplained.com/articles/intuitive-convolution/

brrrrrm3y ago

A somewhat related, visual guide to discrete 2d convolutions: https://ezyang.github.io/convolution-visualizer/

thecleaner3y ago

Small correction: this is correlation operator. Deep learning wise they (correlation and convolution) are equivalent but in probability it will lead to incorrect results.

wnoise3y ago

To elaborate, though the page does mention the distinction:

(Cross-)correlation is convolution with a reversed kernel (second signal) (or vice-versa, of course). (For discrete signals, reversing the kernel is just a swapping the indexes around; which makes absolutely no difference for deep-learning). Convolution is more "natural" because it's abelian, whereas swapping signal and kernel in cross-correlation time-reverses the result.

Koshkin3y ago

> a convolution is the way to determine the distribution of the sum of two random variables

Crucially, independent (i.e., uncorrelated) random variables.

(At the other extreme, the sum of maximally correlated random variables is computed as a "comonotonic sum" instead).

roumenguha3y ago

https://betterexplained.com/articles/intuitive-convolution/

ABeeSea3y ago

Nice jazz pun.

mathematicaster3y ago

Came to say the same!

j / k navigate · click thread line to collapse

23 comments

squaredot3y ago

  the (discretized) probability density <-> the (discrete) signal intensity <-> the coefficients of the polynomial

... in the respective context.

sfpotter3y ago

Discrete convolution is O(n^2).

H8crilA3y ago

EDIT: never mind, I just figured out that you pointed a typo in the parent comment. Straightforward implementation is indeed O(n^2)

squaredot3y ago

Yes, you're right.

YouWhy3y ago

A minor thought I found inspiring many years ago: the Central Limit Theorem may be considered a statement about dynamic processes in the functional space of distributions.

If you consider the operator of convolution (rescaling for unit variance), the normal distribution is the only attractor (under many "natural" choices of metrics).

Sniffnoy3y ago

jesuslop3y ago

Makes sense, if you have a random process with IID variables, the distributions d are the same, so you can view the partial sums process (∑Xi) dynamic as advancing by convolving with d.

[1] https://en.wikipedia.org/wiki/Donsker%27s_theorem

amitport3y ago

Can you elaborate / add reference?

fjkdlsjflkds3y ago

1 more reply

hackandthink3y ago

"At a high level, in probability, a convolution is the way to determine the distribution of the sum of two random variables."

Probability of a certain sum value s is:

sum of probabilities of all (a,b) with a + b = s

(a: value from first input distribution. b: value from second input distribution)

with probability (a,b) = probability(a) * probability(b)

rustybolt3y ago

Mandatory link to a 3blue1brown video: https://m.youtube.com/watch?v=KuXjwB4LzSA

hackernewds3y ago

You seem to have replied without reading the article. This is linked right away in it.

froh3y ago

yes, this is the video linked in the first paragraph of the Original article.

roumenguha3y ago

I loved this article, it helped me understand: https://betterexplained.com/articles/intuitive-convolution/

brrrrrm3y ago

A somewhat related, visual guide to discrete 2d convolutions: https://ezyang.github.io/convolution-visualizer/

thecleaner3y ago

Small correction: this is correlation operator. Deep learning wise they (correlation and convolution) are equivalent but in probability it will lead to incorrect results.

wnoise3y ago

To elaborate, though the page does mention the distinction:

Koshkin3y ago

> a convolution is the way to determine the distribution of the sum of two random variables

Crucially, independent (i.e., uncorrelated) random variables.

(At the other extreme, the sum of maximally correlated random variables is computed as a "comonotonic sum" instead).

roumenguha3y ago

https://betterexplained.com/articles/intuitive-convolution/

ABeeSea3y ago

Nice jazz pun.

mathematicaster3y ago

Came to say the same!

j / k navigate · click thread line to collapse