"Convolution with a kernel K" describes a system whose impulse response is K. In discrete time, suppose you have K=[1,2] and convolve [0,1,2,0] with it: you wind up with [0,1,4,4,0].
Correlation with a kernel K is convolution with K time-reversed (i.e. [2,1]): you'd get [0,2,5,2,0]. Note that 5: right there, the input signal "lines up just right" with the kernel (2×2 + 1×1 = 5). That's why it's called correlation: its output is big when the input looks like the kernel.
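For instance (a minimal numpy check; note numpy's `correlate` is the sliding dot product without the kernel flip):

```python
import numpy as np

x = np.array([0, 1, 2, 0])
K = np.array([1, 2])

# Convolution flips the kernel before the sliding dot product.
conv = np.convolve(x, K)                  # full-length output
# Correlation slides the kernel as-is.
corr = np.correlate(x, K, mode="full")

print(conv)   # [0 1 4 4 0]
print(corr)   # [0 2 5 2 0]
```

The correlation output peaks (at 5) exactly where the input patch [1,2] matches the kernel [1,2].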
It's a binary operator on functions that yields a third function. It has a lot of useful properties and equivalences; for instance, it can be computed as the inverse Fourier transform of the product of the two functions' Fourier transforms (although that's a roundabout way to define it).
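That's the convolution theorem: convolution in time is elementwise multiplication in frequency. A quick numpy sketch, zero-padding to the full output length so the circular FFT convolution matches the linear one:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 0.0])
K = np.array([1.0, 2.0])

n = len(x) + len(K) - 1                 # full linear-convolution length
via_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(K, n), n)

print(np.allclose(via_fft, np.convolve(x, K)))   # True
```

For long signals this roundabout route is actually the fast one: FFT-based convolution is O(n log n) versus O(n·m) for the direct sliding sum.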
You're actually introduced to convolution in middle school when you're taught how to multiply binomials to build a polynomial (at my middle school they called it "FOIL"): the coefficients of the product are the convolution of the two coefficient sequences.
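Concretely, multiplying two polynomials convolves their coefficient sequences, which is exactly what `np.convolve` computes:

```python
import numpy as np

# (x + 1)(x + 2) = x^2 + 3x + 2
a = [1, 1]                  # coefficients of x + 1 (highest degree first)
b = [1, 2]                  # coefficients of x + 2
product = np.convolve(a, b)
print(product)              # [1 3 2] -> x^2 + 3x + 2
```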
That doesn't help one understand what it is at all. Convolution in DL is simply a set of dot products between patches of the input and a bunch of filters. Each resulting dot product is a measure of similarity between a patch and a filter. That's all there is to it.
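A minimal sketch of that view (a hypothetical helper, 1-D for brevity): extract every patch, then take its dot product with each filter. Note that DL frameworks actually compute correlation (no kernel flip) under the name "convolution":

```python
import numpy as np

def conv1d_valid(signal, filters):
    # One row per patch of the input, one column per filter:
    # each entry is a patch-filter dot product (a similarity score).
    k = filters.shape[1]
    patches = np.stack([signal[i:i + k] for i in range(len(signal) - k + 1)])
    return patches @ filters.T

signal = np.array([0.0, 1.0, 2.0, 0.0])
filters = np.array([[2.0, 1.0],
                    [1.0, 2.0]])
out = conv1d_valid(signal, filters)   # shape (3 patches, 2 filters)
# The second filter [1, 2] scores highest (5.0) on the patch [1, 2],
# the patch that looks most like it.
```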
The idea behind convolution in deep learning is that, if a particular pattern of pixels is meaningful, then it is probably also meaningful if you shift the whole thing in some direction. So you can force some layers of the network to share weights across positions, making their response the same under translation, and they'll be faster to pick up some sorts of patterns.
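That translation property (equivariance) is easy to check numerically: shift the input, and the convolution's response shifts by the same amount. A small sketch:

```python
import numpy as np

pattern = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0])
shifted = np.roll(pattern, 2)        # same pattern, two steps to the right
kernel = np.array([1.0, 2.0, 1.0])

a = np.convolve(pattern, kernel)
b = np.convolve(shifted, kernel)

# The response to the shifted pattern is the shifted response.
print(np.allclose(b[2:], a[:-2]))    # True
```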
It's faster because it reduces the dimensionality of the inputs down to something manageable (hundreds or low thousands). You can replace convolutions with most other types of dimensionality reduction (including other types of layers), and outside of image tasks you'll get very similar or even better performance.
(Even before that it was used in signal processing.)